* [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB
@ 2018-07-03  7:19 Eric Auger
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 01/15] linux-headers: header update for KVM/ARM KVM_ARM_GET_MAX_VM_PHYS_SHIFT Eric Auger
                   ` (16 more replies)
  0 siblings, 17 replies; 62+ messages in thread
From: Eric Auger @ 2018-07-03  7:19 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, agraf, david, drjones, wei

This series aims at supporting PCDIMM/NVDIMM instantiation in
machvirt at a 2TB guest physical address.

This is achieved in 3 steps:
1) support more than 40b IPA/GPA
2) support PCDIMM instantiation
3) support NVDIMM instantiation

This series reuses/rebases patches initially submitted by Shameer in [1]
and Kwangwoo in [2].

I put all the parts together for consistency and because of their
dependencies; however, as soon as the kernel dependency is resolved
we can consider upstreaming them separately.

Support more than 40b IPA/GPA [ patches 1 - 5 ]
-----------------------------------------------
was "[RFC 0/6] KVM/ARM: Dynamic and larger GPA size"

At the moment the guest physical address space is limited to 40b
due to KVM limitations. [0] lifts this limitation and allows
creating a VM with up to a 52b GPA address space.

With this series, QEMU creates a virt VM with the max IPA range
reported by the host kernel, or 40b by default.

This choice can be overridden by using the -machine kvm-type=<bits>
option with bits within [40, 52]. If <bits> is not supported by
the host, the legacy 40b value is used.
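
For instance (an illustrative invocation; every option other than
kvm-type is a placeholder):

  qemu-system-aarch64 -M virt,accel=kvm,kvm-type=48 -cpu host -m 1G ...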

Currently the EDK2 FW also hardcodes the max number of GPA bits to
40. This will need to be fixed.

PCDIMM Support [ patches 6 - 11 ]
---------------------------------
was "[RFC 0/5] ARM virt: Support PC-DIMM at 2TB"

We instantiate the device_memory region at 2TB. Using it obviously
requires at least 42b of IPA/GPA. While its max capacity is currently
limited to 2TB, the actual size depends on the initial guest RAM size
and the maxmem parameter.
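
For example, a guest with 4GB of initial RAM and a 16GB DIMM
cold-plugged into the device memory region could be started as
follows (a sketch only; ids, sizes and the kvm-type value are
illustrative):

  qemu-system-aarch64 -M virt,accel=kvm,kvm-type=48 -cpu host \
      -m 4G,maxmem=32G,slots=2 \
      -object memory-backend-ram,id=mem1,size=16G \
      -device pc-dimm,id=dimm1,memdev=mem1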

Actual hot-plug and hot-unplug of PC-DIMMs is not supported, due to
the lack of support for those features on bare metal.

NVDIMM support [ patches 12 - 15 ]
----------------------------------

Once the memory hotplug framework is in place, it is fairly
straightforward to add support for NVDIMM. The machine "nvdimm"
option turns the capability on.
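
For example (a sketch only; the backing file path, ids and sizes are
illustrative):

  qemu-system-aarch64 -M virt,accel=kvm,kvm-type=48,nvdimm=on -cpu host \
      -m 4G,maxmem=32G,slots=2 \
      -object memory-backend-file,id=nvmem1,share=on,mem-path=/path/to/nvdimm.img,size=16G \
      -device nvdimm,id=nv1,memdev=nvmem1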

References:

[0] [PATCH v3 00/20] arm64: Dynamic & 52bit IPA support
https://www.spinics.net/lists/kernel/msg2841735.html

[1] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
http://patchwork.ozlabs.org/cover/914694/

[2] [RFC PATCH 0/3] add nvdimm support on AArch64 virt platform
https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04599.html

Tests:
- On Cavium Gigabyte, a 48b VM was created.
- Migration tests were performed between a kernel supporting the
  feature and a destination kernel not supporting it.
- Tests with ACPI: to overcome the limitation of the EDK2 FW, the
  virt memory map was hacked to move the device memory below 1TB.

This series can be found at:
https://github.com/eauger/qemu/tree/v2.12.0-dimm-2tb-v3

History:

v2 -> v3:
- fix pc_q35 and pc_piix compilation error
- Kwangwoo's email address is not valid anymore, so it was removed

v1 -> v2:
- kvm_get_max_vm_phys_shift moved in arch specific file
- addition of NVDIMM part
- single series
- rebase on David's refactoring

v1:
- was "[RFC 0/6] KVM/ARM: Dynamic and larger GPA size"
- was "[RFC 0/5] ARM virt: Support PC-DIMM at 2TB"

Best Regards

Eric


Eric Auger (9):
  linux-headers: header update for KVM/ARM KVM_ARM_GET_MAX_VM_PHYS_SHIFT
  hw/boards: Add a MachineState parameter to kvm_type callback
  kvm: add kvm_arm_get_max_vm_phys_shift
  hw/arm/virt: support kvm_type property
  hw/arm/virt: handle max_vm_phys_shift conflicts on migration
  hw/arm/virt: Allocate device_memory
  acpi: move build_srat_hotpluggable_memory to generic ACPI source
  hw/arm/boot: Expose the pmem nodes in the DT
  hw/arm/virt: Add nvdimm and nvdimm-persistence options

Kwangwoo Lee (2):
  nvdimm: use configurable ACPI IO base and size
  hw/arm/virt: Add nvdimm hot-plug infrastructure

Shameer Kolothum (4):
  hw/arm/virt: Add memory hotplug framework
  hw/arm/boot: introduce fdt_add_memory_node helper
  hw/arm/boot: Expose the PC-DIMM nodes in the DT
  hw/arm/virt-acpi-build: Add PC-DIMM in SRAT

 accel/kvm/kvm-all.c                            |   2 +-
 default-configs/arm-softmmu.mak                |   4 +
 hw/acpi/aml-build.c                            |  51 ++++
 hw/acpi/nvdimm.c                               |  28 ++-
 hw/arm/boot.c                                  | 123 +++++++--
 hw/arm/virt-acpi-build.c                       |  10 +
 hw/arm/virt.c                                  | 330 ++++++++++++++++++++++---
 hw/i386/acpi-build.c                           |  49 ----
 hw/i386/pc_piix.c                              |   8 +-
 hw/i386/pc_q35.c                               |   8 +-
 hw/ppc/mac_newworld.c                          |   2 +-
 hw/ppc/mac_oldworld.c                          |   2 +-
 hw/ppc/spapr.c                                 |   2 +-
 include/hw/acpi/aml-build.h                    |   3 +
 include/hw/arm/arm.h                           |   2 +
 include/hw/arm/virt.h                          |   7 +
 include/hw/boards.h                            |   2 +-
 include/hw/mem/nvdimm.h                        |  12 +
 include/standard-headers/linux/virtio_config.h |  16 +-
 linux-headers/asm-mips/unistd.h                |  18 +-
 linux-headers/asm-powerpc/kvm.h                |   1 +
 linux-headers/linux/kvm.h                      |  16 ++
 target/arm/kvm.c                               |   9 +
 target/arm/kvm_arm.h                           |  16 ++
 24 files changed, 597 insertions(+), 124 deletions(-)

-- 
2.5.5


* [Qemu-devel] [RFC v3 01/15] linux-headers: header update for KVM/ARM KVM_ARM_GET_MAX_VM_PHYS_SHIFT
  2018-07-03  7:19 [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB Eric Auger
@ 2018-07-03  7:19 ` Eric Auger
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 02/15] hw/boards: Add a MachineState parameter to kvm_type callback Eric Auger
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 62+ messages in thread
From: Eric Auger @ 2018-07-03  7:19 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, agraf, david, drjones, wei

This is a header update against

git://linux-arm.org/linux-skp.git ipa52/v3

to get the KVM_ARM_GET_MAX_VM_PHYS_SHIFT ioctl. This allows retrieving
the IPA address range KVM supports.
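
A minimal userspace sketch of the intended usage (error handling
elided; it assumes a kernel built from the branch above, since neither
define exists in mainline headers yet):

  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  int main(void)
  {
      int kvm = open("/dev/kvm", O_RDWR);
      /* log2 of the max IPA size supported by the host, e.g. 48 */
      int shift = ioctl(kvm, KVM_ARM_GET_MAX_VM_PHYS_SHIFT, 0);
      /* the shift is then encoded in the KVM_CREATE_VM type argument */
      int vmfd = ioctl(kvm, KVM_CREATE_VM, KVM_VM_TYPE_ARM_PHYS_SHIFT(shift));

      printf("max IPA bits: %d, vm fd: %d\n", shift, vmfd);
      return vmfd < 0;
  }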

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 include/standard-headers/linux/virtio_config.h | 16 ++++++++++++----
 linux-headers/asm-mips/unistd.h                | 18 ++++++++++++------
 linux-headers/asm-powerpc/kvm.h                |  1 +
 linux-headers/linux/kvm.h                      | 16 ++++++++++++++++
 4 files changed, 41 insertions(+), 10 deletions(-)

diff --git a/include/standard-headers/linux/virtio_config.h b/include/standard-headers/linux/virtio_config.h
index b777069..0b19436 100644
--- a/include/standard-headers/linux/virtio_config.h
+++ b/include/standard-headers/linux/virtio_config.h
@@ -45,11 +45,14 @@
 /* We've given up on this device. */
 #define VIRTIO_CONFIG_S_FAILED		0x80
 
-/* Some virtio feature bits (currently bits 28 through 32) are reserved for the
- * transport being used (eg. virtio_ring), the rest are per-device feature
- * bits. */
+/*
+ * Virtio feature bits VIRTIO_TRANSPORT_F_START through
+ * VIRTIO_TRANSPORT_F_END are reserved for the transport
+ * being used (e.g. virtio_ring, virtio_pci etc.), the
+ * rest are per-device feature bits.
+ */
 #define VIRTIO_TRANSPORT_F_START	28
-#define VIRTIO_TRANSPORT_F_END		34
+#define VIRTIO_TRANSPORT_F_END		38
 
 #ifndef VIRTIO_CONFIG_NO_LEGACY
 /* Do we get callbacks when the ring is completely used, even if we've
@@ -71,4 +74,9 @@
  * this is for compatibility with legacy systems.
  */
 #define VIRTIO_F_IOMMU_PLATFORM		33
+
+/*
+ * Does the device support Single Root I/O Virtualization?
+ */
+#define VIRTIO_F_SR_IOV			37
 #endif /* _LINUX_VIRTIO_CONFIG_H */
diff --git a/linux-headers/asm-mips/unistd.h b/linux-headers/asm-mips/unistd.h
index 9bfef7f..d4a85ef 100644
--- a/linux-headers/asm-mips/unistd.h
+++ b/linux-headers/asm-mips/unistd.h
@@ -388,17 +388,19 @@
 #define __NR_pkey_alloc			(__NR_Linux + 364)
 #define __NR_pkey_free			(__NR_Linux + 365)
 #define __NR_statx			(__NR_Linux + 366)
+#define __NR_rseq			(__NR_Linux + 367)
+#define __NR_io_pgetevents		(__NR_Linux + 368)
 
 
 /*
  * Offset of the last Linux o32 flavoured syscall
  */
-#define __NR_Linux_syscalls		366
+#define __NR_Linux_syscalls		368
 
 #endif /* _MIPS_SIM == _MIPS_SIM_ABI32 */
 
 #define __NR_O32_Linux			4000
-#define __NR_O32_Linux_syscalls		366
+#define __NR_O32_Linux_syscalls		368
 
 #if _MIPS_SIM == _MIPS_SIM_ABI64
 
@@ -733,16 +735,18 @@
 #define __NR_pkey_alloc			(__NR_Linux + 324)
 #define __NR_pkey_free			(__NR_Linux + 325)
 #define __NR_statx			(__NR_Linux + 326)
+#define __NR_rseq			(__NR_Linux + 327)
+#define __NR_io_pgetevents		(__NR_Linux + 328)
 
 /*
  * Offset of the last Linux 64-bit flavoured syscall
  */
-#define __NR_Linux_syscalls		326
+#define __NR_Linux_syscalls		328
 
 #endif /* _MIPS_SIM == _MIPS_SIM_ABI64 */
 
 #define __NR_64_Linux			5000
-#define __NR_64_Linux_syscalls		326
+#define __NR_64_Linux_syscalls		328
 
 #if _MIPS_SIM == _MIPS_SIM_NABI32
 
@@ -1081,15 +1085,17 @@
 #define __NR_pkey_alloc			(__NR_Linux + 328)
 #define __NR_pkey_free			(__NR_Linux + 329)
 #define __NR_statx			(__NR_Linux + 330)
+#define __NR_rseq			(__NR_Linux + 331)
+#define __NR_io_pgetevents		(__NR_Linux + 332)
 
 /*
  * Offset of the last N32 flavoured syscall
  */
-#define __NR_Linux_syscalls		330
+#define __NR_Linux_syscalls		332
 
 #endif /* _MIPS_SIM == _MIPS_SIM_NABI32 */
 
 #define __NR_N32_Linux			6000
-#define __NR_N32_Linux_syscalls		330
+#define __NR_N32_Linux_syscalls		332
 
 #endif /* _ASM_UNISTD_H */
diff --git a/linux-headers/asm-powerpc/kvm.h b/linux-headers/asm-powerpc/kvm.h
index 833ed9a..1b32b56 100644
--- a/linux-headers/asm-powerpc/kvm.h
+++ b/linux-headers/asm-powerpc/kvm.h
@@ -633,6 +633,7 @@ struct kvm_ppc_cpu_char {
 #define KVM_REG_PPC_PSSCR	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xbd)
 
 #define KVM_REG_PPC_DEC_EXPIRY	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xbe)
+#define KVM_REG_PPC_ONLINE	(KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xbf)
 
 /* Transactional Memory checkpointed state:
  * This is all GPRs, all VSX regs and a subset of SPRs
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 98f389a..0a90115 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -751,6 +751,16 @@ struct kvm_ppc_resize_hpt {
 #define KVM_S390_SIE_PAGE_OFFSET 1
 
 /*
+ * On arm/arm64, machine type can be used to request the physical
+ * address size for the VM. Bits [7-0] have been reserved for the
+ * PA size shift (i.e, log2(PA_Size)). For backward compatibility,
+ * value 0 implies the default IPA size, which is 40bits.
+ */
+#define KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK	0xff
+#define KVM_VM_TYPE_ARM_PHYS_SHIFT(x)		\
+	((x) & KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK)
+
+/*
  * ioctls for /dev/kvm fds:
  */
 #define KVM_GET_API_VERSION       _IO(KVMIO,   0x00)
@@ -775,6 +785,12 @@ struct kvm_ppc_resize_hpt {
 #define KVM_GET_MSR_FEATURE_INDEX_LIST    _IOWR(KVMIO, 0x0a, struct kvm_msr_list)
 
 /*
+ * Get the maximum physical address size supported by the host.
+ * Returns log2(Max-Physical-Address-Size)
+ */
+#define KVM_ARM_GET_MAX_VM_PHYS_SHIFT	_IO(KVMIO, 0x0b)
+
+/*
  * Extension capability list.
  */
 #define KVM_CAP_IRQCHIP	  0
-- 
2.5.5


* [Qemu-devel] [RFC v3 02/15] hw/boards: Add a MachineState parameter to kvm_type callback
  2018-07-03  7:19 [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB Eric Auger
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 01/15] linux-headers: header update for KVM/ARM KVM_ARM_GET_MAX_VM_PHYS_SHIFT Eric Auger
@ 2018-07-03  7:19 ` Eric Auger
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 03/15] kvm: add kvm_arm_get_max_vm_phys_shift Eric Auger
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 62+ messages in thread
From: Eric Auger @ 2018-07-03  7:19 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, agraf, david, drjones, wei

On ARM, the kvm_type will be resolved by querying the KVMState.
Let's add the MachineState handle to the callback so that we
can retrieve the KVMState handle: in kvm_init(), when the callback
is called, the kvm_state global is not yet set.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
[ppc parts]
---
 accel/kvm/kvm-all.c   | 2 +-
 hw/ppc/mac_newworld.c | 2 +-
 hw/ppc/mac_oldworld.c | 2 +-
 hw/ppc/spapr.c        | 2 +-
 include/hw/boards.h   | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index eb7db92..37ac834 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1549,7 +1549,7 @@ static int kvm_init(MachineState *ms)
 
     kvm_type = qemu_opt_get(qemu_get_machine_opts(), "kvm-type");
     if (mc->kvm_type) {
-        type = mc->kvm_type(kvm_type);
+        type = mc->kvm_type(ms, kvm_type);
     } else if (kvm_type) {
         ret = -EINVAL;
         fprintf(stderr, "Invalid argument kvm-type=%s\n", kvm_type);
diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
index ff715ff..606b827 100644
--- a/hw/ppc/mac_newworld.c
+++ b/hw/ppc/mac_newworld.c
@@ -511,7 +511,7 @@ static void ppc_core99_init(MachineState *machine)
     qemu_register_boot_set(fw_cfg_boot_set, fw_cfg);
 }
 
-static int core99_kvm_type(const char *arg)
+static int core99_kvm_type(MachineState *ms, const char *arg)
 {
     /* Always force PR KVM */
     return 2;
diff --git a/hw/ppc/mac_oldworld.c b/hw/ppc/mac_oldworld.c
index 4608bab..1211fcd 100644
--- a/hw/ppc/mac_oldworld.c
+++ b/hw/ppc/mac_oldworld.c
@@ -363,7 +363,7 @@ static void ppc_heathrow_init(MachineState *machine)
     qemu_register_boot_set(fw_cfg_boot_set, fw_cfg);
 }
 
-static int heathrow_kvm_type(const char *arg)
+static int heathrow_kvm_type(MachineState *ms, const char *arg)
 {
     /* Always force PR KVM */
     return 2;
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index b32b971..db6b018 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2841,7 +2841,7 @@ static void spapr_machine_init(MachineState *machine)
     }
 }
 
-static int spapr_kvm_type(const char *vm_type)
+static int spapr_kvm_type(MachineState *ms, const char *vm_type)
 {
     if (!vm_type) {
         return 0;
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 79069dd..3ac9594 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -173,7 +173,7 @@ struct MachineClass {
     void (*init)(MachineState *state);
     void (*reset)(void);
     void (*hot_add_cpu)(const int64_t id, Error **errp);
-    int (*kvm_type)(const char *arg);
+    int (*kvm_type)(MachineState *ms, const char *arg);
 
     BlockInterfaceType block_default_type;
     int units_per_default_bus;
-- 
2.5.5


* [Qemu-devel] [RFC v3 03/15] kvm: add kvm_arm_get_max_vm_phys_shift
  2018-07-03  7:19 [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB Eric Auger
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 01/15] linux-headers: header update for KVM/ARM KVM_ARM_GET_MAX_VM_PHYS_SHIFT Eric Auger
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 02/15] hw/boards: Add a MachineState parameter to kvm_type callback Eric Auger
@ 2018-07-03  7:19 ` Eric Auger
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 04/15] hw/arm/virt: support kvm_type property Eric Auger
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 62+ messages in thread
From: Eric Auger @ 2018-07-03  7:19 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, agraf, david, drjones, wei

Add the kvm_arm_get_max_vm_phys_shift() helper that returns the
log2 of the maximum IPA size supported by KVM. This capability
needs to be known to create the VM with the correct max IPA size
(the kvm_type passed along with the KVM_CREATE_VM ioctl).

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v1 -> v2:
- put this in ARM specific code
---
 target/arm/kvm.c     |  9 +++++++++
 target/arm/kvm_arm.h | 16 ++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 65f867d..7d501ca 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -18,6 +18,7 @@
 #include "qemu/error-report.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/kvm.h"
+#include "sysemu/kvm_int.h"
 #include "kvm_arm.h"
 #include "cpu.h"
 #include "trace.h"
@@ -154,6 +155,13 @@ void kvm_arm_set_cpu_features_from_host(ARMCPU *cpu)
     env->features = arm_host_cpu_features.features;
 }
 
+int kvm_arm_get_max_vm_phys_shift(MachineState *ms)
+{
+    KVMState *s = KVM_STATE(ms->accelerator);
+
+    return kvm_ioctl(s, KVM_ARM_GET_MAX_VM_PHYS_SHIFT, 0);
+}
+
 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
     /* For ARM interrupt delivery is always asynchronous,
@@ -713,3 +721,4 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
 {
     return (data - 32) & 0xffff;
 }
+
diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index 863f205..ee973f6 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -183,6 +183,17 @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf);
 void kvm_arm_set_cpu_features_from_host(ARMCPU *cpu);
 
 /**
+ * kvm_arm_get_max_vm_phys_shift - Returns log2 of the max IPA size
+ * supported by KVM
+ *
+ * @s: Machine state handle
+ *
+ * Return the max number of IPA bits or a negative value if
+ * the host kernel does not expose this value.
+ */
+int kvm_arm_get_max_vm_phys_shift(MachineState *ms);
+
+/**
  * kvm_arm_sync_mpstate_to_kvm
  * @cpu: ARMCPU
  *
@@ -214,6 +225,11 @@ static inline void kvm_arm_set_cpu_features_from_host(ARMCPU *cpu)
     cpu->host_cpu_probe_failed = true;
 }
 
+static inline int kvm_arm_get_max_vm_phys_shift(MachineState *ms)
+{
+    return -ENOENT;
+}
+
 static inline int kvm_arm_vgic_probe(void)
 {
     return 0;
-- 
2.5.5


* [Qemu-devel] [RFC v3 04/15] hw/arm/virt: support kvm_type property
  2018-07-03  7:19 [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB Eric Auger
                   ` (2 preceding siblings ...)
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 03/15] kvm: add kvm_arm_get_max_vm_phys_shift Eric Auger
@ 2018-07-03  7:19 ` Eric Auger
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 05/15] hw/arm/virt: handle max_vm_phys_shift conflicts on migration Eric Auger
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 62+ messages in thread
From: Eric Auger @ 2018-07-03  7:19 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, agraf, david, drjones, wei

The kvm-type property is currently used to pass
a user parameter to KVM_CREATE_VM. This matches
the way KVM/ARM expects to receive the max_vm_phys_shift
parameter.

This patch adds support for the kvm-type property in
machvirt and also implements the machine class kvm_type()
callback so that it either returns the kvm-type value
provided by the user or the max_vm_phys_shift
exposed by KVM.

For instance, userspace can use the following option to
instantiate a 42b IPA guest: -machine kvm-type=42

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/arm/virt.c         | 44 ++++++++++++++++++++++++++++++++++++++++++++
 include/hw/arm/virt.h |  1 +
 2 files changed, 45 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 281ddcd..04a32de 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1649,6 +1649,21 @@ static void virt_set_iommu(Object *obj, const char *value, Error **errp)
     }
 }
 
+static char *virt_get_kvm_type(Object *obj, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    return g_strdup(vms->kvm_type);
+}
+
+static void virt_set_kvm_type(Object *obj, const char *value, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    g_free(vms->kvm_type);
+    vms->kvm_type = g_strdup(value);
+}
+
 static CpuInstanceProperties
 virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
 {
@@ -1710,6 +1725,31 @@ static HotplugHandler *virt_machine_get_hotplug_handler(MachineState *machine,
     return NULL;
 }
 
+static int virt_kvm_type(MachineState *ms, const char *type_str)
+{
+    int max_vm_phys_shift, ret = 0;
+    uint64_t type;
+
+    if (!type_str) {
+        max_vm_phys_shift = kvm_arm_get_max_vm_phys_shift(ms);
+        if (max_vm_phys_shift < 0) {
+            goto out;
+        }
+    } else {
+        type = g_ascii_strtoll(type_str, NULL, 0);
+        type &= 0xFF;
+        max_vm_phys_shift = (int)type;
+        if (max_vm_phys_shift < 40 || max_vm_phys_shift > 52) {
+            warn_report("valid kvm-type type values are within [40, 52]:"
+                        " option is ignored and VM is created with 40b IPA");
+            goto out;
+        }
+    }
+    ret = max_vm_phys_shift;
+out:
+    return ret;
+}
+
 static void virt_machine_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
@@ -1733,6 +1773,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
     mc->cpu_index_to_instance_props = virt_cpu_index_to_props;
     mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a15");
     mc->get_default_cpu_node_id = virt_get_default_cpu_node_id;
+    mc->kvm_type = virt_kvm_type;
     assert(!mc->get_hotplug_handler);
     mc->get_hotplug_handler = virt_machine_get_hotplug_handler;
     hc->plug = virt_machine_device_plug_cb;
@@ -1826,6 +1867,9 @@ static void virt_3_0_instance_init(Object *obj)
                                     "Valid values are none and smmuv3",
                                     NULL);
 
+    object_property_add_str(obj, "kvm-type",
+                            virt_get_kvm_type, virt_set_kvm_type, NULL);
+
     vms->memmap = a15memmap;
     vms->irqmap = a15irqmap;
 }
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 9a870cc..1a90ffc 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -124,6 +124,7 @@ typedef struct {
     uint32_t msi_phandle;
     uint32_t iommu_phandle;
     int psci_conduit;
+    char *kvm_type;
 } VirtMachineState;
 
 #define VIRT_ECAM_ID(high) (high ? VIRT_PCIE_ECAM_HIGH : VIRT_PCIE_ECAM)
-- 
2.5.5


* [Qemu-devel] [RFC v3 05/15] hw/arm/virt: handle max_vm_phys_shift conflicts on migration
  2018-07-03  7:19 [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB Eric Auger
                   ` (3 preceding siblings ...)
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 04/15] hw/arm/virt: support kvm_type property Eric Auger
@ 2018-07-03  7:19 ` Eric Auger
  2018-07-03 18:41   ` David Hildenbrand
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory Eric Auger
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 62+ messages in thread
From: Eric Auger @ 2018-07-03  7:19 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, agraf, david, drjones, wei

When migrating a VM, we must make sure the destination host
supports at least as many IPA bits as the source. Otherwise the
migration must fail.

We add a VMState infrastructure to machvirt. On pre_save(),
the current source max_vm_phys_shift is saved.

On the destination, we cannot use this information when creating the
VM. The VM is created using the max value reported by the
destination host (or the value inherited from kvm-type). However, on
post_load() we can check that this value is compatible with the
saved source value. For instance, migrating a guest created with
48 IPA bits to a destination host whose kernel only supports 44
will fail in post_load().

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/arm/virt.c         | 37 +++++++++++++++++++++++++++++++++++++
 include/hw/arm/virt.h |  2 ++
 2 files changed, 39 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 04a32de..5a4d0bf 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1316,6 +1316,40 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
     return arm_cpu_mp_affinity(idx, clustersz);
 }
 
+static int virt_post_load(void *opaque, int version_id)
+{
+    VirtMachineState *vms = (VirtMachineState *)opaque;
+
+    if (vms->max_vm_phys_shift < vms->source_max_vm_phys_shift) {
+        error_report("This host kernel only supports %d IPA bits whereas "
+                     "the guest requires %d GPA bits", vms->max_vm_phys_shift,
+                     vms->source_max_vm_phys_shift);
+        return -1;
+    }
+    return 0;
+}
+
+static int virt_pre_save(void *opaque)
+{
+    VirtMachineState *vms = (VirtMachineState *)opaque;
+
+    vms->source_max_vm_phys_shift = vms->max_vm_phys_shift;
+    return 0;
+}
+
+static const VMStateDescription vmstate_virt = {
+    .name = "virt",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .post_load = virt_post_load,
+    .pre_save = virt_pre_save,
+    .fields = (VMStateField[]) {
+        VMSTATE_INT32(source_max_vm_phys_shift, VirtMachineState),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+
 static void machvirt_init(MachineState *machine)
 {
     VirtMachineState *vms = VIRT_MACHINE(machine);
@@ -1537,6 +1571,7 @@ static void machvirt_init(MachineState *machine)
 
     vms->machine_done.notify = virt_machine_done;
     qemu_add_machine_init_done_notifier(&vms->machine_done);
+    vmstate_register(NULL, 0, &vmstate_virt, vms);
 }
 
 static bool virt_get_secure(Object *obj, Error **errp)
@@ -1727,6 +1762,7 @@ static HotplugHandler *virt_machine_get_hotplug_handler(MachineState *machine,
 
 static int virt_kvm_type(MachineState *ms, const char *type_str)
 {
+    VirtMachineState *vms = VIRT_MACHINE(ms);
     int max_vm_phys_shift, ret = 0;
     uint64_t type;
 
@@ -1747,6 +1783,7 @@ static int virt_kvm_type(MachineState *ms, const char *type_str)
     }
     ret = max_vm_phys_shift;
 out:
+    vms->max_vm_phys_shift = (max_vm_phys_shift > 0) ? ret : 40;
     return ret;
 }
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 1a90ffc..91f6de2 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -125,6 +125,8 @@ typedef struct {
     uint32_t iommu_phandle;
     int psci_conduit;
     char *kvm_type;
+    int32_t max_vm_phys_shift;
+    int32_t source_max_vm_phys_shift;
 } VirtMachineState;
 
 #define VIRT_ECAM_ID(high) (high ? VIRT_PCIE_ECAM_HIGH : VIRT_PCIE_ECAM)
-- 
2.5.5


* [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory
  2018-07-03  7:19 [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB Eric Auger
                   ` (4 preceding siblings ...)
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 05/15] hw/arm/virt: handle max_vm_phys_shift conflicts on migration Eric Auger
@ 2018-07-03  7:19 ` Eric Auger
  2018-07-03 18:25   ` David Hildenbrand
  2018-07-18 13:05   ` Igor Mammedov
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 07/15] hw/arm/virt: Add memory hotplug framework Eric Auger
                   ` (10 subsequent siblings)
  16 siblings, 2 replies; 62+ messages in thread
From: Eric Auger @ 2018-07-03  7:19 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, agraf, david, drjones, wei

We define a new hotpluggable RAM region (aka. device memory).
Its base is at the 2TB GPA. This obviously requires 42b IPA support
in KVM/ARM, FW and the guest kernel. At the moment the device
memory region is limited to a max of 2TB.

This is largely inspired by the device memory initialization in
the pc machine code.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
---
 hw/arm/virt.c         | 104 ++++++++++++++++++++++++++++++++++++--------------
 include/hw/arm/arm.h  |   2 +
 include/hw/arm/virt.h |   1 +
 3 files changed, 79 insertions(+), 28 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 5a4d0bf..6fefb78 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -59,6 +59,7 @@
 #include "qapi/visitor.h"
 #include "standard-headers/linux/input.h"
 #include "hw/arm/smmuv3.h"
+#include "hw/acpi/acpi.h"
 
 #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
     static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
@@ -94,34 +95,25 @@
 
 #define PLATFORM_BUS_NUM_IRQS 64
 
-/* RAM limit in GB. Since VIRT_MEM starts at the 1GB mark, this means
- * RAM can go up to the 256GB mark, leaving 256GB of the physical
- * address space unallocated and free for future use between 256G and 512G.
- * If we need to provide more RAM to VMs in the future then we need to:
- *  * allocate a second bank of RAM starting at 2TB and working up
- *  * fix the DT and ACPI table generation code in QEMU to correctly
- *    report two split lumps of RAM to the guest
- *  * fix KVM in the host kernel to allow guests with >40 bit address spaces
- * (We don't want to fill all the way up to 512GB with RAM because
- * we might want it for non-RAM purposes later. Conversely it seems
- * reasonable to assume that anybody configuring a VM with a quarter
- * of a terabyte of RAM will be doing it on a host with more than a
- * terabyte of physical address space.)
- */
-#define RAMLIMIT_GB 255
-#define RAMLIMIT_BYTES (RAMLIMIT_GB * 1024ULL * 1024 * 1024)
+#define SZ_64K 0x10000
+#define SZ_1G (1024ULL * 1024 * 1024)
 
 /* Addresses and sizes of our components.
- * 0..128MB is space for a flash device so we can run bootrom code such as UEFI.
- * 128MB..256MB is used for miscellaneous device I/O.
- * 256MB..1GB is reserved for possible future PCI support (ie where the
- * PCI memory window will go if we add a PCI host controller).
- * 1GB and up is RAM (which may happily spill over into the
- * high memory region beyond 4GB).
- * This represents a compromise between how much RAM can be given to
- * a 32 bit VM and leaving space for expansion and in particular for PCI.
- * Note that devices should generally be placed at multiples of 0x10000,
+ * 0..128MB is space for a flash device so we can run bootrom code such as UEFI,
+ * 128MB..256MB is used for miscellaneous device I/O,
+ * 256MB..1GB is used for PCI host controller,
+ * 1GB..256GB is RAM (not hotpluggable),
+ * 256GB..512GB: is left for device I/O (non RAM purpose),
+ * 512GB..1TB: high mem PCI MMIO region,
+ * 2TB..4TB is used for hot-pluggable DIMM (assumes 42b GPA is supported).
+ *
+ * Note that IO devices should generally be placed at multiples of 0x10000,
  * to accommodate guests using 64K pages.
+ *
+ * Conversely it seems reasonable to assume that anybody configuring a VM
+ * with a quarter of a terabyte of RAM will be doing it on a host with more
+ * than a terabyte of physical address space.)
+ *
  */
 static const MemMapEntry a15memmap[] = {
     /* Space up to 0x8000000 is reserved for a boot ROM */
@@ -148,12 +140,13 @@ static const MemMapEntry a15memmap[] = {
     [VIRT_PCIE_MMIO] =          { 0x10000000, 0x2eff0000 },
     [VIRT_PCIE_PIO] =           { 0x3eff0000, 0x00010000 },
     [VIRT_PCIE_ECAM] =          { 0x3f000000, 0x01000000 },
-    [VIRT_MEM] =                { 0x40000000, RAMLIMIT_BYTES },
+    [VIRT_MEM] =                { SZ_1G , 255 * SZ_1G },
     /* Additional 64 MB redist region (can contain up to 512 redistributors) */
     [VIRT_GIC_REDIST2] =        { 0x4000000000ULL, 0x4000000 },
     [VIRT_PCIE_ECAM_HIGH] =     { 0x4010000000ULL, 0x10000000 },
     /* Second PCIe window, 512GB wide at the 512GB boundary */
-    [VIRT_PCIE_MMIO_HIGH] =   { 0x8000000000ULL, 0x8000000000ULL },
+    [VIRT_PCIE_MMIO_HIGH] =     { 512 * SZ_1G, 512 * SZ_1G },
+    [VIRT_HOTPLUG_MEM] =        { 2048 * SZ_1G, 2048 * SZ_1G },
 };
 
 static const int a15irqmap[] = {
@@ -1223,6 +1216,58 @@ static void create_secure_ram(VirtMachineState *vms,
     g_free(nodename);
 }
 
+static void create_device_memory(VirtMachineState *vms, MemoryRegion *sysmem)
+{
+    MachineState *ms = MACHINE(vms);
+    uint64_t device_memory_size;
+    uint64_t align = SZ_64K;
+
+    /* always allocate the device memory information */
+    ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
+
+    if (vms->max_vm_phys_shift < 42) {
+        /* device memory starts at 2TB whereas this VM supports less than
+         * 2TB GPA */
+        if (ms->maxram_size > ms->ram_size || ms->ram_slots) {
+            MachineClass *mc = MACHINE_GET_CLASS(ms);
+
+            error_report("\"-memory 'slots|maxmem'\" is not supported by %s "
+                         "since KVM does not support more than 41b IPA",
+                         mc->name);
+            exit(EXIT_FAILURE);
+        }
+        return;
+    }
+
+    if (ms->ram_slots > ACPI_MAX_RAM_SLOTS) {
+        error_report("unsupported number of memory slots: %"PRIu64,
+                     ms->ram_slots);
+        exit(EXIT_FAILURE);
+    }
+
+    if (QEMU_ALIGN_UP(ms->maxram_size, align) != ms->maxram_size) {
+        error_report("maximum memory size must be aligned to multiple of 0x%"
+                     PRIx64, align);
+            exit(EXIT_FAILURE);
+    }
+
+    ms->device_memory->base = vms->memmap[VIRT_HOTPLUG_MEM].base;
+    device_memory_size = ms->maxram_size - ms->ram_size;
+
+    if (device_memory_size > vms->memmap[VIRT_HOTPLUG_MEM].size) {
+        error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT,
+                         ms->maxram_size);
+        exit(EXIT_FAILURE);
+    }
+
+    memory_region_init(&ms->device_memory->mr, OBJECT(vms),
+                       "device-memory", device_memory_size);
+    memory_region_add_subregion(sysmem, ms->device_memory->base,
+                                &ms->device_memory->mr);
+    vms->bootinfo.device_memory_start = ms->device_memory->base;
+    vms->bootinfo.device_memory_size = device_memory_size;
+}
+
 static void *machvirt_dtb(const struct arm_boot_info *binfo, int *fdt_size)
 {
     const VirtMachineState *board = container_of(binfo, VirtMachineState,
@@ -1430,7 +1475,8 @@ static void machvirt_init(MachineState *machine)
     vms->smp_cpus = smp_cpus;
 
     if (machine->ram_size > vms->memmap[VIRT_MEM].size) {
-        error_report("mach-virt: cannot model more than %dGB RAM", RAMLIMIT_GB);
+        error_report("mach-virt: cannot model more than %dGB RAM",
+                     (int)(vms->memmap[VIRT_MEM].size / SZ_1G));
         exit(1);
     }
 
@@ -1525,6 +1571,8 @@ static void machvirt_init(MachineState *machine)
                                          machine->ram_size);
     memory_region_add_subregion(sysmem, vms->memmap[VIRT_MEM].base, ram);
 
+    create_device_memory(vms, sysmem);
+
     create_flash(vms, sysmem, secure_sysmem ? secure_sysmem : sysmem);
 
     create_gic(vms, pic);
diff --git a/include/hw/arm/arm.h b/include/hw/arm/arm.h
index ffed392..76269e6 100644
--- a/include/hw/arm/arm.h
+++ b/include/hw/arm/arm.h
@@ -116,6 +116,8 @@ struct arm_boot_info {
     bool secure_board_setup;
 
     arm_endianness endianness;
+    hwaddr device_memory_start;
+    hwaddr device_memory_size;
 };
 
 /**
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 91f6de2..173938d 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -78,6 +78,7 @@ enum {
     VIRT_GPIO,
     VIRT_SECURE_UART,
     VIRT_SECURE_MEM,
+    VIRT_HOTPLUG_MEM,
 };
 
 typedef enum VirtIOMMUType {
-- 
2.5.5


* [Qemu-devel] [RFC v3 07/15] hw/arm/virt: Add memory hotplug framework
  2018-07-03  7:19 [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB Eric Auger
                   ` (5 preceding siblings ...)
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory Eric Auger
@ 2018-07-03  7:19 ` Eric Auger
  2018-07-03 18:28   ` David Hildenbrand
  2018-07-03 18:44   ` David Hildenbrand
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 08/15] hw/arm/boot: introduce fdt_add_memory_node helper Eric Auger
                   ` (9 subsequent siblings)
  16 siblings, 2 replies; 62+ messages in thread
From: Eric Auger @ 2018-07-03  7:19 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, agraf, david, drjones, wei

From: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

This patch adds the memory hot-plug/hot-unplug infrastructure
to machvirt.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>

---

v1 -> v2:
- s/virt_dimm_plug|unplug/virt_memory_plug|unplug
- s/pc_dimm_memory_plug/pc_dimm_plug
- reworded title and commit message
- added pre_plug cb
- don't handle get_memory_region failure anymore
---
 default-configs/arm-softmmu.mak |  2 ++
 hw/arm/virt.c                   | 68 ++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index 834d45c..28fe8f3 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -152,3 +152,5 @@ CONFIG_PCI_DESIGNWARE=y
 CONFIG_STRONGARM=y
 CONFIG_HIGHBANK=y
 CONFIG_MUSICPAL=y
+CONFIG_MEM_HOTPLUG=y
+
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 6fefb78..7190962 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -60,6 +60,8 @@
 #include "standard-headers/linux/input.h"
 #include "hw/arm/smmuv3.h"
 #include "hw/acpi/acpi.h"
+#include "hw/mem/pc-dimm.h"
+#include "hw/mem/nvdimm.h"
 
 #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
     static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
@@ -1785,6 +1787,53 @@ static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
     return ms->possible_cpus;
 }
 
+static void virt_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
+                                 Error **errp)
+{
+    const bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
+
+    if (is_nvdimm) {
+        error_setg(errp, "nvdimm is not yet supported");
+        return;
+    }
+}
+
+static void virt_memory_plug(HotplugHandler *hotplug_dev,
+                             DeviceState *dev, Error **errp)
+{
+    PCDIMMDevice *dimm = PC_DIMM(dev);
+    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
+    MemoryRegion *mr = ddc->get_memory_region(dimm, &error_abort);
+    Error *local_err = NULL;
+    uint64_t align;
+
+    if (memory_region_get_alignment(mr)) {
+        align = memory_region_get_alignment(mr);
+    } else {
+        /* by default we align on 64KB page size */
+        align = SZ_64K;
+    }
+
+    pc_dimm_plug(dev, MACHINE(hotplug_dev), align, &local_err);
+
+    error_propagate(errp, local_err);
+}
+
+static void virt_memory_unplug(HotplugHandler *hotplug_dev,
+                               DeviceState *dev, Error **errp)
+{
+    pc_dimm_unplug(dev, MACHINE(hotplug_dev));
+    object_unparent(OBJECT(dev));
+}
+
+static void virt_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
+                                            DeviceState *dev, Error **errp)
+{
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        virt_memory_pre_plug(hotplug_dev, dev, errp);
+    }
+}
+
 static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
                                         DeviceState *dev, Error **errp)
 {
@@ -1796,12 +1845,27 @@ static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
                                      SYS_BUS_DEVICE(dev));
         }
     }
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+            virt_memory_plug(hotplug_dev, dev, errp);
+    }
+}
+
+static void virt_machine_device_unplug_cb(HotplugHandler *hotplug_dev,
+                                          DeviceState *dev, Error **errp)
+{
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        virt_memory_unplug(hotplug_dev, dev, errp);
+    } else {
+        error_setg(errp, "device unplug request for unsupported device"
+                   " type: %s", object_get_typename(OBJECT(dev)));
+    }
 }
 
 static HotplugHandler *virt_machine_get_hotplug_handler(MachineState *machine,
                                                         DeviceState *dev)
 {
-    if (object_dynamic_cast(OBJECT(dev), TYPE_SYS_BUS_DEVICE)) {
+    if (object_dynamic_cast(OBJECT(dev), TYPE_SYS_BUS_DEVICE) ||
+       (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM))) {
         return HOTPLUG_HANDLER(machine);
     }
 
@@ -1861,7 +1925,9 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
     mc->kvm_type = virt_kvm_type;
     assert(!mc->get_hotplug_handler);
     mc->get_hotplug_handler = virt_machine_get_hotplug_handler;
+    hc->pre_plug = virt_machine_device_pre_plug_cb;
     hc->plug = virt_machine_device_plug_cb;
+    hc->unplug = virt_machine_device_unplug_cb;
 }
 
 static const TypeInfo virt_machine_info = {
-- 
2.5.5


* [Qemu-devel] [RFC v3 08/15] hw/arm/boot: introduce fdt_add_memory_node helper
  2018-07-03  7:19 [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB Eric Auger
                   ` (6 preceding siblings ...)
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 07/15] hw/arm/virt: Add memory hotplug framework Eric Auger
@ 2018-07-03  7:19 ` Eric Auger
  2018-07-18 14:04   ` Igor Mammedov
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 09/15] hw/arm/boot: Expose the PC-DIMM nodes in the DT Eric Auger
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 62+ messages in thread
From: Eric Auger @ 2018-07-03  7:19 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, agraf, david, drjones, wei

From: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

We introduce a helper to create a memory node.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

---

v1 -> v2:
- nop of existing /memory nodes was already handled
---
 hw/arm/boot.c | 54 ++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 34 insertions(+), 20 deletions(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index e09201c..5243a25 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -413,6 +413,36 @@ static void set_kernel_args_old(const struct arm_boot_info *info,
     }
 }
 
+static int fdt_add_memory_node(void *fdt, uint32_t acells, hwaddr mem_base,
+                               uint32_t scells, hwaddr mem_len,
+                               int numa_node_id)
+{
+    char *nodename = NULL;
+    int ret;
+
+    nodename = g_strdup_printf("/memory@%" PRIx64, mem_base);
+    qemu_fdt_add_subnode(fdt, nodename);
+    qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
+    ret = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg", acells, mem_base,
+                                       scells, mem_len);
+    if (ret < 0) {
+        fprintf(stderr, "couldn't set %s/reg\n", nodename);
+        goto out;
+    }
+    if (numa_node_id < 0) {
+        goto out;
+    }
+
+    ret = qemu_fdt_setprop_cell(fdt, nodename, "numa-node-id", numa_node_id);
+    if (ret < 0) {
+        fprintf(stderr, "couldn't set %s/numa-node-id\n", nodename);
+    }
+
+out:
+    g_free(nodename);
+    return ret;
+}
+
 static void fdt_add_psci_node(void *fdt)
 {
     uint32_t cpu_suspend_fn;
@@ -492,7 +522,6 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
     void *fdt = NULL;
     int size, rc, n = 0;
     uint32_t acells, scells;
-    char *nodename;
     unsigned int i;
     hwaddr mem_base, mem_len;
     char **node_path;
@@ -566,35 +595,20 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
         mem_base = binfo->loader_start;
         for (i = 0; i < nb_numa_nodes; i++) {
             mem_len = numa_info[i].node_mem;
-            nodename = g_strdup_printf("/memory@%" PRIx64, mem_base);
-            qemu_fdt_add_subnode(fdt, nodename);
-            qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
-            rc = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg",
-                                              acells, mem_base,
-                                              scells, mem_len);
+            rc = fdt_add_memory_node(fdt, acells, mem_base,
+                                     scells, mem_len, i);
             if (rc < 0) {
-                fprintf(stderr, "couldn't set %s/reg for node %d\n", nodename,
-                        i);
                 goto fail;
             }
 
-            qemu_fdt_setprop_cell(fdt, nodename, "numa-node-id", i);
             mem_base += mem_len;
-            g_free(nodename);
         }
     } else {
-        nodename = g_strdup_printf("/memory@%" PRIx64, binfo->loader_start);
-        qemu_fdt_add_subnode(fdt, nodename);
-        qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
-
-        rc = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg",
-                                          acells, binfo->loader_start,
-                                          scells, binfo->ram_size);
+        rc = fdt_add_memory_node(fdt, acells, binfo->loader_start,
+                                 scells, binfo->ram_size, -1);
         if (rc < 0) {
-            fprintf(stderr, "couldn't set %s reg\n", nodename);
             goto fail;
         }
-        g_free(nodename);
     }
 
     rc = fdt_path_offset(fdt, "/chosen");
-- 
2.5.5


* [Qemu-devel] [RFC v3 09/15] hw/arm/boot: Expose the PC-DIMM nodes in the DT
  2018-07-03  7:19 [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB Eric Auger
                   ` (7 preceding siblings ...)
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 08/15] hw/arm/boot: introduce fdt_add_memory_node helper Eric Auger
@ 2018-07-03  7:19 ` Eric Auger
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 10/15] acpi: move build_srat_hotpluggable_memory to generic ACPI source Eric Auger
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 62+ messages in thread
From: Eric Auger @ 2018-07-03  7:19 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, agraf, david, drjones, wei

From: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

This patch adds memory nodes corresponding to the PC-DIMM regions.

NV_DIMM and ACPI_NVDIMM configs are not yet set for ARM, so we
don't need to care about NV-DIMM at this stage.
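
For illustration, a 1GB DIMM cold-plugged at the 2TB device memory
base would yield a node along these lines (illustrative values,
assuming two address cells and two size cells):

  memory@20000000000 {
          device_type = "memory";
          reg = <0x200 0x00000000 0x0 0x40000000>;
          numa-node-id = <0>;
  };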

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v1 -> v2:
- added qapi_free_MemoryDeviceInfoList and simplify the loop
---
 hw/arm/boot.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 5243a25..2c7d558 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -19,6 +19,7 @@
 #include "sysemu/numa.h"
 #include "hw/boards.h"
 #include "hw/loader.h"
+#include "hw/mem/memory-device.h"
 #include "elf.h"
 #include "sysemu/device_tree.h"
 #include "qemu/config-file.h"
@@ -516,6 +517,35 @@ static void fdt_add_psci_node(void *fdt)
     qemu_fdt_setprop_cell(fdt, "/psci", "migrate", migrate_fn);
 }
 
+static int fdt_add_hotpluggable_memory_nodes(void *fdt,
+                                             uint64_t base, uint64_t len,
+                                             uint32_t acells, uint32_t scells) {
+    MemoryDeviceInfoList *info, *info_list = qmp_memory_device_list();
+    MemoryDeviceInfo *mi;
+    PCDIMMDeviceInfo *di;
+    bool is_nvdimm;
+    int ret = 0;
+
+    for (info = info_list; info != NULL; info = info->next) {
+        mi = info->value;
+        is_nvdimm = (mi->type == MEMORY_DEVICE_INFO_KIND_NVDIMM);
+        di = !is_nvdimm ? mi->u.dimm.data : mi->u.nvdimm.data;
+
+        if (is_nvdimm) {
+            ret = -ENOENT; /* NV-DIMM not yet supported */
+        } else {
+            ret = fdt_add_memory_node(fdt, acells, di->addr,
+                                      scells, di->size, di->node);
+        }
+        if (ret < 0) {
+            goto out;
+        }
+    }
+out:
+    qapi_free_MemoryDeviceInfoList(info_list);
+    return ret;
+}
+
 int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
                  hwaddr addr_limit, AddressSpace *as)
 {
@@ -611,6 +641,14 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
         }
     }
 
+    rc = fdt_add_hotpluggable_memory_nodes(fdt, binfo->device_memory_start,
+                                           binfo->device_memory_size,
+                                           acells, scells);
+    if (rc < 0) {
+            fprintf(stderr, "couldn't add hotpluggable memory nodes\n");
+            goto fail;
+    }
+
     rc = fdt_path_offset(fdt, "/chosen");
     if (rc < 0) {
         qemu_fdt_add_subnode(fdt, "/chosen");
-- 
2.5.5


* [Qemu-devel] [RFC v3 10/15] acpi: move build_srat_hotpluggable_memory to generic ACPI source
  2018-07-03  7:19 [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB Eric Auger
                   ` (8 preceding siblings ...)
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 09/15] hw/arm/boot: Expose the PC-DIMM nodes in the DT Eric Auger
@ 2018-07-03  7:19 ` Eric Auger
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 11/15] hw/arm/virt-acpi-build: Add PC-DIMM in SRAT Eric Auger
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 62+ messages in thread
From: Eric Auger @ 2018-07-03  7:19 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, agraf, david, drjones, wei

We plan to reuse build_srat_hotpluggable_memory() for ARM so
let's move the function to aml-build.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/acpi/aml-build.c         | 51 +++++++++++++++++++++++++++++++++++++++++++++
 hw/i386/acpi-build.c        | 49 -------------------------------------------
 include/hw/acpi/aml-build.h |  3 +++
 3 files changed, 54 insertions(+), 49 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 1e43cd7..167fb6b 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -22,6 +22,7 @@
 #include "qemu/osdep.h"
 #include <glib/gprintf.h>
 #include "hw/acpi/aml-build.h"
+#include "hw/mem/memory-device.h"
 #include "qemu/bswap.h"
 #include "qemu/bitops.h"
 #include "sysemu/numa.h"
@@ -1802,3 +1803,53 @@ build_hdr:
     build_header(linker, tbl, (void *)(tbl->data + fadt_start),
                  "FACP", tbl->len - fadt_start, f->rev, oem_id, oem_table_id);
 }
+
+void build_srat_hotpluggable_memory(GArray *table_data, uint64_t base,
+                                    uint64_t len, int default_node)
+{
+    MemoryDeviceInfoList *info_list = qmp_memory_device_list();
+    MemoryDeviceInfoList *info;
+    MemoryDeviceInfo *mi;
+    PCDIMMDeviceInfo *di;
+    uint64_t end = base + len, cur, size;
+    bool is_nvdimm;
+    AcpiSratMemoryAffinity *numamem;
+    MemoryAffinityFlags flags;
+
+    for (cur = base, info = info_list;
+         cur < end;
+         cur += size, info = info->next) {
+        numamem = acpi_data_push(table_data, sizeof *numamem);
+
+        if (!info) {
+            build_srat_memory(numamem, cur, end - cur, default_node,
+                              MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
+            break;
+        }
+
+        mi = info->value;
+        is_nvdimm = (mi->type == MEMORY_DEVICE_INFO_KIND_NVDIMM);
+        di = !is_nvdimm ? mi->u.dimm.data : mi->u.nvdimm.data;
+
+        if (cur < di->addr) {
+            build_srat_memory(numamem, cur, di->addr - cur, default_node,
+                              MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
+            numamem = acpi_data_push(table_data, sizeof *numamem);
+        }
+
+        size = di->size;
+
+        flags = MEM_AFFINITY_ENABLED;
+        if (di->hotpluggable) {
+            flags |= MEM_AFFINITY_HOTPLUGGABLE;
+        }
+        if (is_nvdimm) {
+            flags |= MEM_AFFINITY_NON_VOLATILE;
+        }
+
+        build_srat_memory(numamem, di->addr, size, di->node, flags);
+    }
+
+    qapi_free_MemoryDeviceInfoList(info_list);
+}
+
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 9bc6d97..fcebd02 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2251,55 +2251,6 @@ build_tpm2(GArray *table_data, BIOSLinker *linker, GArray *tcpalog)
 #define HOLE_640K_START  (640 * 1024)
 #define HOLE_640K_END   (1024 * 1024)
 
-static void build_srat_hotpluggable_memory(GArray *table_data, uint64_t base,
-                                           uint64_t len, int default_node)
-{
-    MemoryDeviceInfoList *info_list = qmp_memory_device_list();
-    MemoryDeviceInfoList *info;
-    MemoryDeviceInfo *mi;
-    PCDIMMDeviceInfo *di;
-    uint64_t end = base + len, cur, size;
-    bool is_nvdimm;
-    AcpiSratMemoryAffinity *numamem;
-    MemoryAffinityFlags flags;
-
-    for (cur = base, info = info_list;
-         cur < end;
-         cur += size, info = info->next) {
-        numamem = acpi_data_push(table_data, sizeof *numamem);
-
-        if (!info) {
-            build_srat_memory(numamem, cur, end - cur, default_node,
-                              MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
-            break;
-        }
-
-        mi = info->value;
-        is_nvdimm = (mi->type == MEMORY_DEVICE_INFO_KIND_NVDIMM);
-        di = !is_nvdimm ? mi->u.dimm.data : mi->u.nvdimm.data;
-
-        if (cur < di->addr) {
-            build_srat_memory(numamem, cur, di->addr - cur, default_node,
-                              MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
-            numamem = acpi_data_push(table_data, sizeof *numamem);
-        }
-
-        size = di->size;
-
-        flags = MEM_AFFINITY_ENABLED;
-        if (di->hotpluggable) {
-            flags |= MEM_AFFINITY_HOTPLUGGABLE;
-        }
-        if (is_nvdimm) {
-            flags |= MEM_AFFINITY_NON_VOLATILE;
-        }
-
-        build_srat_memory(numamem, di->addr, size, di->node, flags);
-    }
-
-    qapi_free_MemoryDeviceInfoList(info_list);
-}
-
 static void
 build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
 {
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 6c36903..4c2ca13 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -416,4 +416,7 @@ void build_slit(GArray *table_data, BIOSLinker *linker);
 
 void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
                 const char *oem_id, const char *oem_table_id);
+
+void build_srat_hotpluggable_memory(GArray *table_data, uint64_t base,
+                                    uint64_t len, int default_node);
 #endif
-- 
2.5.5


* [Qemu-devel] [RFC v3 11/15] hw/arm/virt-acpi-build: Add PC-DIMM in SRAT
  2018-07-03  7:19 [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB Eric Auger
                   ` (9 preceding siblings ...)
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 10/15] acpi: move build_srat_hotpluggable_memory to generic ACPI source Eric Auger
@ 2018-07-03  7:19 ` Eric Auger
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 12/15] nvdimm: use configurable ACPI IO base and size Eric Auger
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 62+ messages in thread
From: Eric Auger @ 2018-07-03  7:19 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, agraf, david, drjones, wei

From: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

Generate Memory Affinity Structures for PC-DIMM ranges.
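
For illustration, a sketch of the single Memory Affinity entry that the
reused build_srat_hotpluggable_memory() loop emits for one NVDIMM
(base, size and node are made-up example values):

    numamem = acpi_data_push(table_data, sizeof *numamem);
    build_srat_memory(numamem, 0x20000000000ULL /* 2TB base */,
                      0x40000000ULL /* 1GB */, 0 /* node */,
                      MEM_AFFINITY_ENABLED | MEM_AFFINITY_HOTPLUGGABLE |
                      MEM_AFFINITY_NON_VOLATILE);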

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v1 -> v2:
- build_srat_hotpluggable_memory moved to aml-build
---
 hw/arm/virt-acpi-build.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 6ea47e2..0915391 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -568,6 +568,10 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
         mem_base += numa_info[i].node_mem;
     }
 
+    build_srat_hotpluggable_memory(table_data,
+                                   vms->bootinfo.device_memory_start,
+                                   vms->bootinfo.device_memory_size, 0);
+
     build_header(linker, table_data, (void *)(table_data->data + srat_start),
                  "SRAT", table_data->len - srat_start, 3, NULL, NULL);
 }
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [Qemu-devel] [RFC v3 12/15] nvdimm: use configurable ACPI IO base and size
  2018-07-03  7:19 [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB Eric Auger
                   ` (10 preceding siblings ...)
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 11/15] hw/arm/virt-acpi-build: Add PC-DIMM in SRAT Eric Auger
@ 2018-07-03  7:19 ` Eric Auger
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 13/15] hw/arm/virt: Add nvdimm hot-plug infrastructure Eric Auger
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 62+ messages in thread
From: Eric Auger @ 2018-07-03  7:19 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, agraf, david, drjones, wei

From: Kwangwoo Lee <kwangwoo.lee@sk.com>

This patch uses a configurable IO base and size to create the NPIO AML
for the ACPI NFIT. Since architectures such as AArch64 do not use
port-mapped IO, a configurable IO base is required to create a correct
mapping of the ACPI IO address and size.
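
Condensed, the new switch amounts to selecting the AML region space from
the configured IO type when the common _DSM method is built (a sketch
mirroring the diff below):

    AmlRegionSpace rs = (state->dsm_io.type == NVDIMM_ACPI_IO_PORT) ?
                        AML_SYSTEM_IO : AML_SYSTEM_MEMORY;
    aml_append(method, aml_operation_region(NVDIMM_DSM_IOPORT, rs,
               aml_int(state->dsm_io.base), state->dsm_io.len));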

Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v2 -> v3:
- s/size/len in pc_piix.c and pc_q35.c
---
 hw/acpi/nvdimm.c        | 28 +++++++++++++++++++---------
 hw/i386/pc_piix.c       |  8 +++++++-
 hw/i386/pc_q35.c        |  8 +++++++-
 include/hw/mem/nvdimm.h | 12 ++++++++++++
 4 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 27eeb66..17d7146 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -929,8 +929,8 @@ void nvdimm_init_acpi_state(AcpiNVDIMMState *state, MemoryRegion *io,
                             FWCfgState *fw_cfg, Object *owner)
 {
     memory_region_init_io(&state->io_mr, owner, &nvdimm_dsm_ops, state,
-                          "nvdimm-acpi-io", NVDIMM_ACPI_IO_LEN);
-    memory_region_add_subregion(io, NVDIMM_ACPI_IO_BASE, &state->io_mr);
+                          "nvdimm-acpi-io", state->dsm_io.len);
+    memory_region_add_subregion(io, state->dsm_io.base, &state->io_mr);
 
     state->dsm_mem = g_array_new(false, true /* clear */, 1);
     acpi_data_push(state->dsm_mem, sizeof(NvdimmDsmIn));
@@ -959,12 +959,14 @@ void nvdimm_init_acpi_state(AcpiNVDIMMState *state, MemoryRegion *io,
 
 #define NVDIMM_QEMU_RSVD_UUID   "648B9CF2-CDA1-4312-8AD9-49C4AF32BD62"
 
-static void nvdimm_build_common_dsm(Aml *dev)
+static void nvdimm_build_common_dsm(Aml *dev,
+                                    AcpiNVDIMMState *acpi_nvdimm_state)
 {
     Aml *method, *ifctx, *function, *handle, *uuid, *dsm_mem, *elsectx2;
     Aml *elsectx, *unsupport, *unpatched, *expected_uuid, *uuid_invalid;
     Aml *pckg, *pckg_index, *pckg_buf, *field, *dsm_out_buf, *dsm_out_buf_size;
     uint8_t byte_list[1];
+    AmlRegionSpace rs;
 
     method = aml_method(NVDIMM_COMMON_DSM, 5, AML_SERIALIZED);
     uuid = aml_arg(0);
@@ -975,9 +977,16 @@ static void nvdimm_build_common_dsm(Aml *dev)
 
     aml_append(method, aml_store(aml_name(NVDIMM_ACPI_MEM_ADDR), dsm_mem));
 
+    if (acpi_nvdimm_state->dsm_io.type == NVDIMM_ACPI_IO_PORT) {
+        rs = AML_SYSTEM_IO;
+    } else {
+        rs = AML_SYSTEM_MEMORY;
+    }
+
     /* map DSM memory and IO into ACPI namespace. */
-    aml_append(method, aml_operation_region(NVDIMM_DSM_IOPORT, AML_SYSTEM_IO,
-               aml_int(NVDIMM_ACPI_IO_BASE), NVDIMM_ACPI_IO_LEN));
+    aml_append(method, aml_operation_region(NVDIMM_DSM_IOPORT, rs,
+               aml_int(acpi_nvdimm_state->dsm_io.base),
+               acpi_nvdimm_state->dsm_io.len));
     aml_append(method, aml_operation_region(NVDIMM_DSM_MEMORY,
                AML_SYSTEM_MEMORY, dsm_mem, sizeof(NvdimmDsmIn)));
 
@@ -1260,7 +1269,8 @@ static void nvdimm_build_nvdimm_devices(Aml *root_dev, uint32_t ram_slots)
 }
 
 static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
-                              BIOSLinker *linker, GArray *dsm_dma_arrea,
+                              BIOSLinker *linker,
+                              AcpiNVDIMMState *acpi_nvdimm_state,
                               uint32_t ram_slots)
 {
     Aml *ssdt, *sb_scope, *dev;
@@ -1288,7 +1298,7 @@ static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
      */
     aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0012")));
 
-    nvdimm_build_common_dsm(dev);
+    nvdimm_build_common_dsm(dev, acpi_nvdimm_state);
 
     /* 0 is reserved for root device. */
     nvdimm_build_device_dsm(dev, 0);
@@ -1307,7 +1317,7 @@ static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
                                                NVDIMM_ACPI_MEM_ADDR);
 
     bios_linker_loader_alloc(linker,
-                             NVDIMM_DSM_MEM_FILE, dsm_dma_arrea,
+                             NVDIMM_DSM_MEM_FILE, acpi_nvdimm_state->dsm_mem,
                              sizeof(NvdimmDsmIn), false /* high memory */);
     bios_linker_loader_add_pointer(linker,
         ACPI_BUILD_TABLE_FILE, mem_addr_offset, sizeof(uint32_t),
@@ -1329,7 +1339,7 @@ void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
         return;
     }
 
-    nvdimm_build_ssdt(table_offsets, table_data, linker, state->dsm_mem,
+    nvdimm_build_ssdt(table_offsets, table_data, linker, state,
                       ram_slots);
 
     device_list = nvdimm_get_device_list();
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index d357907..95dde50 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -296,7 +296,13 @@ static void pc_init1(MachineState *machine,
     }
 
     if (pcms->acpi_nvdimm_state.is_enabled) {
-        nvdimm_init_acpi_state(&pcms->acpi_nvdimm_state, system_io,
+        AcpiNVDIMMState *acpi_nvdimm_state = &pcms->acpi_nvdimm_state;
+
+        acpi_nvdimm_state->dsm_io.type = NVDIMM_ACPI_IO_PORT;
+        acpi_nvdimm_state->dsm_io.base = NVDIMM_ACPI_IO_BASE;
+        acpi_nvdimm_state->dsm_io.len = NVDIMM_ACPI_IO_LEN;
+
+        nvdimm_init_acpi_state(acpi_nvdimm_state, system_io,
                                pcms->fw_cfg, OBJECT(pcms));
     }
 }
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 1a73e18..98e9d08 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -276,7 +276,13 @@ static void pc_q35_init(MachineState *machine)
     pc_nic_init(pcmc, isa_bus, host_bus);
 
     if (pcms->acpi_nvdimm_state.is_enabled) {
-        nvdimm_init_acpi_state(&pcms->acpi_nvdimm_state, system_io,
+        AcpiNVDIMMState *acpi_nvdimm_state = &pcms->acpi_nvdimm_state;
+
+        acpi_nvdimm_state->dsm_io.type = NVDIMM_ACPI_IO_PORT;
+        acpi_nvdimm_state->dsm_io.base = NVDIMM_ACPI_IO_BASE;
+        acpi_nvdimm_state->dsm_io.len = NVDIMM_ACPI_IO_LEN;
+
+        nvdimm_init_acpi_state(acpi_nvdimm_state, system_io,
                                pcms->fw_cfg, OBJECT(pcms));
     }
 }
diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
index c5c9b3c..af8a5fd 100644
--- a/include/hw/mem/nvdimm.h
+++ b/include/hw/mem/nvdimm.h
@@ -123,6 +123,17 @@ struct NvdimmFitBuffer {
 };
 typedef struct NvdimmFitBuffer NvdimmFitBuffer;
 
+typedef enum {
+    NVDIMM_ACPI_IO_PORT,
+    NVDIMM_ACPI_IO_MEMORY,
+} AcpiNVDIMMIOType;
+
+typedef struct AcpiNVDIMMIOEntry {
+    AcpiNVDIMMIOType type;
+    hwaddr base;
+    hwaddr len;
+} AcpiNVDIMMIOEntry;
+
 struct AcpiNVDIMMState {
     /* detect if NVDIMM support is enabled. */
     bool is_enabled;
@@ -140,6 +151,7 @@ struct AcpiNVDIMMState {
      */
     int32_t persistence;
     char    *persistence_string;
+    AcpiNVDIMMIOEntry dsm_io;
 };
 typedef struct AcpiNVDIMMState AcpiNVDIMMState;
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [Qemu-devel] [RFC v3 13/15] hw/arm/virt: Add nvdimm hot-plug infrastructure
  2018-07-03  7:19 [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB Eric Auger
                   ` (11 preceding siblings ...)
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 12/15] nvdimm: use configurable ACPI IO base and size Eric Auger
@ 2018-07-03  7:19 ` Eric Auger
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 14/15] hw/arm/boot: Expose the pmem nodes in the DT Eric Auger
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 62+ messages in thread
From: Eric Auger @ 2018-07-03  7:19 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, agraf, david, drjones, wei

From: Kwangwoo Lee <kwangwoo.lee@sk.com>

Pre-plug and plug handlers are prepared for NVDIMM support.
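
A condensed view of the plug path wired up below (sketch; error
propagation elided):

    pc_dimm_plug(dev, MACHINE(hotplug_dev), align, &local_err);
    if (!local_err && object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM)) {
        /* refresh the NFIT FIT buffer so the guest sees the new device */
        nvdimm_plug(&vms->acpi_nvdimm_state);
    }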

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
---
 default-configs/arm-softmmu.mak |  2 ++
 hw/arm/virt-acpi-build.c        |  6 ++++++
 hw/arm/virt.c                   | 23 +++++++++++++++++++++++
 include/hw/arm/virt.h           |  3 +++
 4 files changed, 34 insertions(+)

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index 28fe8f3..9f49a6a 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -153,4 +153,6 @@ CONFIG_STRONGARM=y
 CONFIG_HIGHBANK=y
 CONFIG_MUSICPAL=y
 CONFIG_MEM_HOTPLUG=y
+CONFIG_NVDIMM=y
+CONFIG_ACPI_NVDIMM=y
 
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 0915391..f18bb5c 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -804,6 +804,7 @@ static
 void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
 {
     VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
+    MachineState *ms = MACHINE(vms);
     GArray *table_offsets;
     unsigned dsdt, xsdt;
     GArray *tables_blob = tables->table_data;
@@ -844,6 +845,11 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
         }
     }
 
+    if (vms->acpi_nvdimm_state.is_enabled) {
+        nvdimm_build_acpi(table_offsets, tables_blob, tables->linker,
+                          &vms->acpi_nvdimm_state, ms->ram_slots);
+    }
+
     if (its_class_name() && !vmc->no_its) {
         acpi_add_table(table_offsets, tables_blob);
         build_iort(tables_blob, tables->linker, vms);
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 7190962..51f42cd 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -135,6 +135,7 @@ static const MemMapEntry a15memmap[] = {
     [VIRT_GPIO] =               { 0x09030000, 0x00001000 },
     [VIRT_SECURE_UART] =        { 0x09040000, 0x00001000 },
     [VIRT_SMMU] =               { 0x09050000, 0x00020000 },
+    [VIRT_ACPI_IO] =            { 0x09070000, 0x00010000 },
     [VIRT_MMIO] =               { 0x0a000000, 0x00000200 },
     /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
     [VIRT_PLATFORM_BUS] =       { 0x0c000000, 0x02000000 },
@@ -1607,6 +1608,18 @@ static void machvirt_init(MachineState *machine)
 
     create_platform_bus(vms, pic);
 
+    if (vms->acpi_nvdimm_state.is_enabled) {
+        AcpiNVDIMMState *acpi_nvdimm_state = &vms->acpi_nvdimm_state;
+
+        acpi_nvdimm_state->dsm_io.type = NVDIMM_ACPI_IO_MEMORY;
+        acpi_nvdimm_state->dsm_io.base =
+                vms->memmap[VIRT_ACPI_IO].base + NVDIMM_ACPI_IO_BASE;
+        acpi_nvdimm_state->dsm_io.len = NVDIMM_ACPI_IO_LEN;
+
+        nvdimm_init_acpi_state(acpi_nvdimm_state, sysmem,
+                               vms->fw_cfg, OBJECT(vms));
+    }
+
     vms->bootinfo.ram_size = machine->ram_size;
     vms->bootinfo.kernel_filename = machine->kernel_filename;
     vms->bootinfo.kernel_cmdline = machine->kernel_cmdline;
@@ -1801,9 +1814,11 @@ static void virt_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
 static void virt_memory_plug(HotplugHandler *hotplug_dev,
                              DeviceState *dev, Error **errp)
 {
+    VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
     PCDIMMDevice *dimm = PC_DIMM(dev);
     PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
     MemoryRegion *mr = ddc->get_memory_region(dimm, &error_abort);
+    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
     Error *local_err = NULL;
     uint64_t align;
 
@@ -1815,7 +1830,15 @@ static void virt_memory_plug(HotplugHandler *hotplug_dev,
     }
 
     pc_dimm_plug(dev, MACHINE(hotplug_dev), align, &local_err);
+    if (local_err) {
+        goto out;
+    }
 
+    if (is_nvdimm) {
+        nvdimm_plug(&vms->acpi_nvdimm_state);
+    }
+
+out:
     error_propagate(errp, local_err);
 }
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 173938d..2cabdbe 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -37,6 +37,7 @@
 #include "hw/arm/arm.h"
 #include "sysemu/kvm.h"
 #include "hw/intc/arm_gicv3_common.h"
+#include "hw/mem/nvdimm.h"
 
 #define NUM_GICV2M_SPIS       64
 #define NUM_VIRTIO_TRANSPORTS 32
@@ -79,6 +80,7 @@ enum {
     VIRT_SECURE_UART,
     VIRT_SECURE_MEM,
     VIRT_HOTPLUG_MEM,
+    VIRT_ACPI_IO,
 };
 
 typedef enum VirtIOMMUType {
@@ -128,6 +130,7 @@ typedef struct {
     char *kvm_type;
     int32_t max_vm_phys_shift;
     int32_t source_max_vm_phys_shift;
+    AcpiNVDIMMState acpi_nvdimm_state;
 } VirtMachineState;
 
 #define VIRT_ECAM_ID(high) (high ? VIRT_PCIE_ECAM_HIGH : VIRT_PCIE_ECAM)
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [Qemu-devel] [RFC v3 14/15] hw/arm/boot: Expose the pmem nodes in the DT
  2018-07-03  7:19 [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB Eric Auger
                   ` (12 preceding siblings ...)
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 13/15] hw/arm/virt: Add nvdimm hot-plug infrastructure Eric Auger
@ 2018-07-03  7:19 ` Eric Auger
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 15/15] hw/arm/virt: Add nvdimm and nvdimm-persistence options Eric Auger
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 62+ messages in thread
From: Eric Auger @ 2018-07-03  7:19 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, agraf, david, drjones, wei

In the case of NV-DIMM slots, let's add /pmem DT nodes.
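
Assuming a single 1GB NVDIMM plugged at the 2TB device memory base on
NUMA node 0 (example values only, with 2 address and 2 size cells), the
generated node would look roughly like:

    pmem@20000000000 {
            compatible = "pmem-region";
            reg = <0x200 0x00000000 0x0 0x40000000>;
            numa-node-id = <0x0>;
    };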

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/arm/boot.c | 33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 2c7d558..3381c66 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -444,6 +444,36 @@ out:
     return ret;
 }
 
+static int fdt_add_pmem_node(void *fdt, uint32_t acells, hwaddr mem_base,
+                             uint32_t scells, hwaddr mem_len,
+                             int numa_node_id)
+{
+    char *nodename = NULL;
+    int ret;
+
+    nodename = g_strdup_printf("/pmem@%" PRIx64, mem_base);
+    qemu_fdt_add_subnode(fdt, nodename);
+    qemu_fdt_setprop_string(fdt, nodename, "compatible", "pmem-region");
+    ret = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg", acells, mem_base,
+                                       scells, mem_len);
+    if (ret < 0) {
+        fprintf(stderr, "couldn't set %s/reg\n", nodename);
+        goto out;
+    }
+    if (numa_node_id < 0) {
+        goto out;
+    }
+
+    ret = qemu_fdt_setprop_cell(fdt, nodename, "numa-node-id", numa_node_id);
+    if (ret < 0) {
+        fprintf(stderr, "couldn't set %s/numa-node-id\n", nodename);
+    }
+
+out:
+    g_free(nodename);
+    return ret;
+}
+
 static void fdt_add_psci_node(void *fdt)
 {
     uint32_t cpu_suspend_fn;
@@ -532,7 +562,8 @@ static int fdt_add_hotpluggable_memory_nodes(void *fdt,
         di = !is_nvdimm ? mi->u.dimm.data : mi->u.nvdimm.data;
 
         if (is_nvdimm) {
-            ret = -ENOENT; /* NV-DIMM not yet supported */
+            ret = fdt_add_pmem_node(fdt, acells, di->addr,
+                                    scells, di->size, di->node);
         } else {
             ret = fdt_add_memory_node(fdt, acells, di->addr,
                                       scells, di->size, di->node);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [Qemu-devel] [RFC v3 15/15] hw/arm/virt: Add nvdimm and nvdimm-persistence options
  2018-07-03  7:19 [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB Eric Auger
                   ` (13 preceding siblings ...)
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 14/15] hw/arm/boot: Expose the pmem nodes in the DT Eric Auger
@ 2018-07-03  7:19 ` Eric Auger
  2018-07-18 14:08 ` [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB Igor Mammedov
  2018-10-03 13:49 ` Auger Eric
  16 siblings, 0 replies; 62+ messages in thread
From: Eric Auger @ 2018-07-03  7:19 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, agraf, david, drjones, wei

The machine option "nvdimm" allows NVDIMM support to be turned on; the
"nvdimm-persistence" option selects the persistence domain ("cpu" or
"mem-ctrl").
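
A possible invocation exercising both options (illustrative backend path
and sizes, not taken from this series):

    qemu-system-aarch64 -M virt,nvdimm=on,nvdimm-persistence=cpu \
        -m 4G,maxmem=8G,slots=2 \
        -object memory-backend-file,id=mem1,share=on,mem-path=/tmp/nv1,size=1G \
        -device nvdimm,id=nv1,memdev=mem1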

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/arm/virt.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 57 insertions(+), 3 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 51f42cd..13e6dec 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1762,6 +1762,47 @@ static void virt_set_kvm_type(Object *obj, const char *value, Error **errp)
     vms->kvm_type = g_strdup(value);
 }
 
+static bool virt_get_nvdimm(Object *obj, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    return vms->acpi_nvdimm_state.is_enabled;
+}
+
+static void virt_set_nvdimm(Object *obj, bool value, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    vms->acpi_nvdimm_state.is_enabled = value;
+}
+
+static char *virt_get_nvdimm_persistence(Object *obj, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    return g_strdup(vms->acpi_nvdimm_state.persistence_string);
+}
+
+static void virt_set_nvdimm_persistence(Object *obj, const char *value,
+                                        Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+    AcpiNVDIMMState *nvdimm_state = &vms->acpi_nvdimm_state;
+
+    if (strcmp(value, "cpu") == 0) {
+        nvdimm_state->persistence = 3;
+    } else if (strcmp(value, "mem-ctrl") == 0) {
+        nvdimm_state->persistence = 2;
+    } else {
+        error_report("-machine nvdimm-persistence=%s: unsupported option",
+                     value);
+        exit(EXIT_FAILURE);
+    }
+
+    g_free(nvdimm_state->persistence_string);
+    nvdimm_state->persistence_string = g_strdup(value);
+}
+
 static CpuInstanceProperties
 virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
 {
@@ -1804,10 +1845,10 @@ static void virt_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
                                  Error **errp)
 {
     const bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
+    VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
 
-    if (is_nvdimm) {
-        error_setg(errp, "nvdimm is not yet supported");
-        return;
+    if (is_nvdimm && !vms->acpi_nvdimm_state.is_enabled) {
+        error_setg(errp, "nvdimm is not enabled: missing 'nvdimm' in '-M'");
     }
 }
 
@@ -2044,6 +2085,19 @@ static void virt_3_0_instance_init(Object *obj)
     object_property_add_str(obj, "kvm-type",
                             virt_get_kvm_type, virt_set_kvm_type, NULL);
 
+    object_property_add_bool(obj, "nvdimm",
+                             virt_get_nvdimm, virt_set_nvdimm, NULL);
+    object_property_set_description(obj, "nvdimm",
+                                         "Set on/off to enable/disable NVDIMM "
+                                         "instantiation", NULL);
+
+    object_property_add_str(obj, "nvdimm-persistence",
+                            virt_get_nvdimm_persistence,
+                            virt_set_nvdimm_persistence, NULL);
+    object_property_set_description(obj, "nvdimm-persistence",
+                                    "Set NVDIMM persistence. "
+                                    "Valid values are cpu and mem-ctrl", NULL);
+
     vms->memmap = a15memmap;
     vms->irqmap = a15irqmap;
 }
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory Eric Auger
@ 2018-07-03 18:25   ` David Hildenbrand
  2018-07-03 19:27     ` Auger Eric
  2018-07-18 13:05   ` Igor Mammedov
  1 sibling, 1 reply; 62+ messages in thread
From: David Hildenbrand @ 2018-07-03 18:25 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo
  Cc: dgilbert, agraf, david, drjones, wei

On 03.07.2018 09:19, Eric Auger wrote:
> We define a new hotpluggable RAM region (aka. device memory).
> Its base is 2TB GPA. This obviously requires 42b IPA support
> in KVM/ARM, FW and guest kernel. At the moment the device
> memory region is max 2TB.

Maybe a stupid question, but why exactly does it have to start at 2TB
(and not e.g. at 1TB)?

> 
> This is largely inspired of device memory initialization in
> pc machine code.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
> ---
>  hw/arm/virt.c         | 104 ++++++++++++++++++++++++++++++++++++--------------
>  include/hw/arm/arm.h  |   2 +
>  include/hw/arm/virt.h |   1 +
>  3 files changed, 79 insertions(+), 28 deletions(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 5a4d0bf..6fefb78 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -59,6 +59,7 @@
>  #include "qapi/visitor.h"
>  #include "standard-headers/linux/input.h"
>  #include "hw/arm/smmuv3.h"
> +#include "hw/acpi/acpi.h"
>  
>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
>      static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
> @@ -94,34 +95,25 @@
>  
>  #define PLATFORM_BUS_NUM_IRQS 64
>  
> -/* RAM limit in GB. Since VIRT_MEM starts at the 1GB mark, this means
> - * RAM can go up to the 256GB mark, leaving 256GB of the physical
> - * address space unallocated and free for future use between 256G and 512G.
> - * If we need to provide more RAM to VMs in the future then we need to:
> - *  * allocate a second bank of RAM starting at 2TB and working up
> - *  * fix the DT and ACPI table generation code in QEMU to correctly
> - *    report two split lumps of RAM to the guest
> - *  * fix KVM in the host kernel to allow guests with >40 bit address spaces
> - * (We don't want to fill all the way up to 512GB with RAM because
> - * we might want it for non-RAM purposes later. Conversely it seems
> - * reasonable to assume that anybody configuring a VM with a quarter
> - * of a terabyte of RAM will be doing it on a host with more than a
> - * terabyte of physical address space.)
> - */
> -#define RAMLIMIT_GB 255
> -#define RAMLIMIT_BYTES (RAMLIMIT_GB * 1024ULL * 1024 * 1024)
> +#define SZ_64K 0x10000
> +#define SZ_1G (1024ULL * 1024 * 1024)
>  
>  /* Addresses and sizes of our components.
> - * 0..128MB is space for a flash device so we can run bootrom code such as UEFI.
> - * 128MB..256MB is used for miscellaneous device I/O.
> - * 256MB..1GB is reserved for possible future PCI support (ie where the
> - * PCI memory window will go if we add a PCI host controller).
> - * 1GB and up is RAM (which may happily spill over into the
> - * high memory region beyond 4GB).
> - * This represents a compromise between how much RAM can be given to
> - * a 32 bit VM and leaving space for expansion and in particular for PCI.
> - * Note that devices should generally be placed at multiples of 0x10000,
> + * 0..128MB is space for a flash device so we can run bootrom code such as UEFI,
> + * 128MB..256MB is used for miscellaneous device I/O,
> + * 256MB..1GB is used for PCI host controller,
> + * 1GB..256GB is RAM (not hotpluggable),
> + * 256GB..512GB: is left for device I/O (non RAM purpose),
> + * 512GB..1TB: high mem PCI MMIO region,
> + * 2TB..4TB is used for hot-pluggable DIMM (assumes 42b GPA is supported).
> + *
> + * Note that IO devices should generally be placed at multiples of 0x10000,
>   * to accommodate guests using 64K pages.
> + *
> + * Conversely it seems reasonable to assume that anybody configuring a VM
> + * with a quarter of a terabyte of RAM will be doing it on a host with more
> + * than a terabyte of physical address space.
> + *
>   */
>  static const MemMapEntry a15memmap[] = {
>      /* Space up to 0x8000000 is reserved for a boot ROM */
> @@ -148,12 +140,13 @@ static const MemMapEntry a15memmap[] = {
>      [VIRT_PCIE_MMIO] =          { 0x10000000, 0x2eff0000 },
>      [VIRT_PCIE_PIO] =           { 0x3eff0000, 0x00010000 },
>      [VIRT_PCIE_ECAM] =          { 0x3f000000, 0x01000000 },
> -    [VIRT_MEM] =                { 0x40000000, RAMLIMIT_BYTES },
> +    [VIRT_MEM] =                { SZ_1G, 255 * SZ_1G },
>      /* Additional 64 MB redist region (can contain up to 512 redistributors) */
>      [VIRT_GIC_REDIST2] =        { 0x4000000000ULL, 0x4000000 },
>      [VIRT_PCIE_ECAM_HIGH] =     { 0x4010000000ULL, 0x10000000 },
>      /* Second PCIe window, 512GB wide at the 512GB boundary */
> -    [VIRT_PCIE_MMIO_HIGH] =   { 0x8000000000ULL, 0x8000000000ULL },
> +    [VIRT_PCIE_MMIO_HIGH] =     { 512 * SZ_1G, 512 * SZ_1G },
> +    [VIRT_HOTPLUG_MEM] =        { 2048 * SZ_1G, 2048 * SZ_1G },
>  };
>  
>  static const int a15irqmap[] = {
> @@ -1223,6 +1216,58 @@ static void create_secure_ram(VirtMachineState *vms,
>      g_free(nodename);
>  }
>  
> +static void create_device_memory(VirtMachineState *vms, MemoryRegion *sysmem)
> +{
> +    MachineState *ms = MACHINE(vms);
> +    uint64_t device_memory_size;
> +    uint64_t align = SZ_64K;
> +
> +    /* always allocate the device memory information */
> +    ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
> +
> +    if (vms->max_vm_phys_shift < 42) {
> +        /* device memory starts at 2TB whereas this VM supports less than
> +         * 2TB GPA */
> +        if (ms->maxram_size > ms->ram_size || ms->ram_slots) {
> +            MachineClass *mc = MACHINE_GET_CLASS(ms);
> +
> +            error_report("\"-memory 'slots|maxmem'\" is not supported by %s "
> +                         "since KVM does not support more than 41b IPA",
> +                         mc->name);
> +            exit(EXIT_FAILURE);
> +        }
> +        return;
> +    }
> +
> +    if (ms->ram_slots > ACPI_MAX_RAM_SLOTS) {
> +        error_report("unsupported number of memory slots: %"PRIu64,
> +                     ms->ram_slots);
> +        exit(EXIT_FAILURE);
> +    }
> +
> +    if (QEMU_ALIGN_UP(ms->maxram_size, align) != ms->maxram_size) {
> +        error_report("maximum memory size must be aligned to multiple of 0x%"
> +                     PRIx64, align);
> +        exit(EXIT_FAILURE);
> +    }
> +
> +    ms->device_memory->base = vms->memmap[VIRT_HOTPLUG_MEM].base;
> +    device_memory_size = ms->maxram_size - ms->ram_size;
> +
> +    if (device_memory_size > vms->memmap[VIRT_HOTPLUG_MEM].size) {
> +        error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT,
> +                         ms->maxram_size);
> +        exit(EXIT_FAILURE);
> +    }
> +
> +    memory_region_init(&ms->device_memory->mr, OBJECT(vms),
> +                       "device-memory", device_memory_size);
> +    memory_region_add_subregion(sysmem, ms->device_memory->base,
> +                                &ms->device_memory->mr);
> +    vms->bootinfo.device_memory_start = ms->device_memory->base;
> +    vms->bootinfo.device_memory_size = device_memory_size;
> +}
> +
>  static void *machvirt_dtb(const struct arm_boot_info *binfo, int *fdt_size)
>  {
>      const VirtMachineState *board = container_of(binfo, VirtMachineState,
> @@ -1430,7 +1475,8 @@ static void machvirt_init(MachineState *machine)
>      vms->smp_cpus = smp_cpus;
>  
>      if (machine->ram_size > vms->memmap[VIRT_MEM].size) {
> -        error_report("mach-virt: cannot model more than %dGB RAM", RAMLIMIT_GB);
> +        error_report("mach-virt: cannot model more than %dGB RAM",
> +                     (int)(vms->memmap[VIRT_MEM].size / SZ_1G));
>          exit(1);
>      }
>  
> @@ -1525,6 +1571,8 @@ static void machvirt_init(MachineState *machine)
>                                           machine->ram_size);
>      memory_region_add_subregion(sysmem, vms->memmap[VIRT_MEM].base, ram);
>  
> +    create_device_memory(vms, sysmem);
> +
>      create_flash(vms, sysmem, secure_sysmem ? secure_sysmem : sysmem);
>  
>      create_gic(vms, pic);
> diff --git a/include/hw/arm/arm.h b/include/hw/arm/arm.h
> index ffed392..76269e6 100644
> --- a/include/hw/arm/arm.h
> +++ b/include/hw/arm/arm.h
> @@ -116,6 +116,8 @@ struct arm_boot_info {
>      bool secure_board_setup;
>  
>      arm_endianness endianness;
> +    hwaddr device_memory_start;
> +    hwaddr device_memory_size;
>  };
>  
>  /**
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index 91f6de2..173938d 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -78,6 +78,7 @@ enum {
>      VIRT_GPIO,
>      VIRT_SECURE_UART,
>      VIRT_SECURE_MEM,
> +    VIRT_HOTPLUG_MEM,
>  };
>  
>  typedef enum VirtIOMMUType {
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 07/15] hw/arm/virt: Add memory hotplug framework
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 07/15] hw/arm/virt: Add memory hotplug framework Eric Auger
@ 2018-07-03 18:28   ` David Hildenbrand
  2018-07-03 19:28     ` Auger Eric
  2018-07-03 18:44   ` David Hildenbrand
  1 sibling, 1 reply; 62+ messages in thread
From: David Hildenbrand @ 2018-07-03 18:28 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo
  Cc: dgilbert, agraf, david, drjones, wei

On 03.07.2018 09:19, Eric Auger wrote:
> From: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> 
> This patch adds the memory hot-plug/hot-unplug infrastructure
> in machvirt.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
> 
> ---
> 
> v1 -> v2:
> - s/virt_dimm_plug|unplug/virt_memory_plug|unplug
> - s/pc_dimm_memory_plug/pc_dimm_plug
> - reworded title and commit message
> - added pre_plug cb
> - don't handle get_memory_region failure anymore
> ---
>  default-configs/arm-softmmu.mak |  2 ++
>  hw/arm/virt.c                   | 68 ++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 69 insertions(+), 1 deletion(-)
> 
> diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
> index 834d45c..28fe8f3 100644
> --- a/default-configs/arm-softmmu.mak
> +++ b/default-configs/arm-softmmu.mak
> @@ -152,3 +152,5 @@ CONFIG_PCI_DESIGNWARE=y
>  CONFIG_STRONGARM=y
>  CONFIG_HIGHBANK=y
>  CONFIG_MUSICPAL=y
> +CONFIG_MEM_HOTPLUG=y
> +
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 6fefb78..7190962 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -60,6 +60,8 @@
>  #include "standard-headers/linux/input.h"
>  #include "hw/arm/smmuv3.h"
>  #include "hw/acpi/acpi.h"
> +#include "hw/mem/pc-dimm.h"
> +#include "hw/mem/nvdimm.h"
>  
>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
>      static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
> @@ -1785,6 +1787,53 @@ static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
>      return ms->possible_cpus;
>  }
>  
> +static void virt_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> +                                 Error **errp)
> +{
> +    const bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
> +
> +    if (is_nvdimm) {
> +        error_setg(errp, "nvdimm is not yet supported");
> +        return;
> +    }
> +}
> +
> +static void virt_memory_plug(HotplugHandler *hotplug_dev,
> +                             DeviceState *dev, Error **errp)
> +{
> +    PCDIMMDevice *dimm = PC_DIMM(dev);
> +    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
> +    MemoryRegion *mr = ddc->get_memory_region(dimm, &error_abort);
> +    Error *local_err = NULL;
> +    uint64_t align;
> +
> +    if (memory_region_get_alignment(mr)) {
> +        align = memory_region_get_alignment(mr);
> +    } else {
> +        /* by default we align on 64KB page size */
> +        align = SZ_64K;
> +    }

After my latest refactoring is applied:

1. memory_region_get_alignment(mr) will never be 0
2. alignment detection will be handled internally

So once you rebase to that version, just pass NULL for "*legacy_align"
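
For example (a sketch only; the exact helper name and signature depend
on that refactoring series, the point being that NULL lets the core
derive the alignment from the DIMM's memory region):

    pc_dimm_pre_plug(dimm, MACHINE(hotplug_dev), NULL /* legacy_align */,
                     &local_err);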


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 05/15] hw/arm/virt: handle max_vm_phys_shift conflicts on migration
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 05/15] hw/arm/virt: handle max_vm_phys_shift conflicts on migration Eric Auger
@ 2018-07-03 18:41   ` David Hildenbrand
  2018-07-03 19:32     ` Auger Eric
  0 siblings, 1 reply; 62+ messages in thread
From: David Hildenbrand @ 2018-07-03 18:41 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo
  Cc: dgilbert, agraf, david, drjones, wei

On 03.07.2018 09:19, Eric Auger wrote:
> When migrating a VM, we must make sure the destination host
> supports as many IPA bits as the source. Otherwise the migration
> must fail.
> 
> We add a VMState infrastructure to machvirt. On pre_save(),
> the current source max_vm_phys_shift is saved.
> 
> On destination, we cannot use this information when creating the
> VM. The VM is created using the max value reported by the
> destination host - or the kvm_type inherited value -. However on
> post_load() we can check that this value is compatible with the
> source saved value.

Just wondering, how exactly is the guest able to detect the 40b (e.g. vs
42b) configuration?

> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>  hw/arm/virt.c         | 37 +++++++++++++++++++++++++++++++++++++
>  include/hw/arm/virt.h |  2 ++
>  2 files changed, 39 insertions(+)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 04a32de..5a4d0bf 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1316,6 +1316,40 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
>      return arm_cpu_mp_affinity(idx, clustersz);
>  }
>  
> +static int virt_post_load(void *opaque, int version_id)
> +{
> +    VirtMachineState *vms = (VirtMachineState *)opaque;
> +
> +    if (vms->max_vm_phys_shift < vms->source_max_vm_phys_shift) {
> +        error_report("This host kernel only supports %d IPA bits whereas "
> +                     "the guest requires %d GPA bits", vms->max_vm_phys_shift,
> +                     vms->source_max_vm_phys_shift);
> +        return -1;
> +    }
> +    return 0;
> +}
> +
> +static int virt_pre_save(void *opaque)
> +{
> +    VirtMachineState *vms = (VirtMachineState *)opaque;
> +
> +    vms->source_max_vm_phys_shift = vms->max_vm_phys_shift;
> +    return 0;
> +}
> +
> +static const VMStateDescription vmstate_virt = {
> +    .name = "virt",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .post_load = virt_post_load,
> +    .pre_save = virt_pre_save,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_INT32(source_max_vm_phys_shift, VirtMachineState),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +
>  static void machvirt_init(MachineState *machine)
>  {
>      VirtMachineState *vms = VIRT_MACHINE(machine);
> @@ -1537,6 +1571,7 @@ static void machvirt_init(MachineState *machine)
>  
>      vms->machine_done.notify = virt_machine_done;
>      qemu_add_machine_init_done_notifier(&vms->machine_done);
> +    vmstate_register(NULL, 0, &vmstate_virt, vms);
>  }
>  
>  static bool virt_get_secure(Object *obj, Error **errp)
> @@ -1727,6 +1762,7 @@ static HotplugHandler *virt_machine_get_hotplug_handler(MachineState *machine,
>  
>  static int virt_kvm_type(MachineState *ms, const char *type_str)
>  {
> +    VirtMachineState *vms = VIRT_MACHINE(ms);
>      int max_vm_phys_shift, ret = 0;
>      uint64_t type;
>  
> @@ -1747,6 +1783,7 @@ static int virt_kvm_type(MachineState *ms, const char *type_str)
>      }
>      ret = max_vm_phys_shift;
>  out:
> +    vms->max_vm_phys_shift = (max_vm_phys_shift > 0) ? ret : 40;
>      return ret;
>  }
>  
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index 1a90ffc..91f6de2 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -125,6 +125,8 @@ typedef struct {
>      uint32_t iommu_phandle;
>      int psci_conduit;
>      char *kvm_type;
> +    int32_t max_vm_phys_shift;
> +    int32_t source_max_vm_phys_shift;
>  } VirtMachineState;
>  
>  #define VIRT_ECAM_ID(high) (high ? VIRT_PCIE_ECAM_HIGH : VIRT_PCIE_ECAM)
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 07/15] hw/arm/virt: Add memory hotplug framework
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 07/15] hw/arm/virt: Add memory hotplug framework Eric Auger
  2018-07-03 18:28   ` David Hildenbrand
@ 2018-07-03 18:44   ` David Hildenbrand
  2018-07-03 19:34     ` Auger Eric
  1 sibling, 1 reply; 62+ messages in thread
From: David Hildenbrand @ 2018-07-03 18:44 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo
  Cc: dgilbert, agraf, david, drjones, wei

On 03.07.2018 09:19, Eric Auger wrote:
> From: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> 
> This patch adds the memory hot-plug/hot-unplug infrastructure
> in machvirt.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
> 
> ---
> 
> v1 -> v2:
> - s/virt_dimm_plug|unplug/virt_memory_plug|unplug
> - s/pc_dimm_memory_plug/pc_dimm_plug
> - reworded title and commit message
> - added pre_plug cb
> - don't handle get_memory_region failure anymore
> ---
>  default-configs/arm-softmmu.mak |  2 ++
>  hw/arm/virt.c                   | 68 ++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 69 insertions(+), 1 deletion(-)
> 
> diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
> index 834d45c..28fe8f3 100644
> --- a/default-configs/arm-softmmu.mak
> +++ b/default-configs/arm-softmmu.mak
> @@ -152,3 +152,5 @@ CONFIG_PCI_DESIGNWARE=y
>  CONFIG_STRONGARM=y
>  CONFIG_HIGHBANK=y
>  CONFIG_MUSICPAL=y
> +CONFIG_MEM_HOTPLUG=y
> +
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 6fefb78..7190962 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -60,6 +60,8 @@
>  #include "standard-headers/linux/input.h"
>  #include "hw/arm/smmuv3.h"
>  #include "hw/acpi/acpi.h"
> +#include "hw/mem/pc-dimm.h"
> +#include "hw/mem/nvdimm.h"
>  
>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
>      static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
> @@ -1785,6 +1787,53 @@ static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
>      return ms->possible_cpus;
>  }
>  
> +static void virt_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> +                                 Error **errp)
> +{
> +    const bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
> +
> +    if (is_nvdimm) {
> +        error_setg(errp, "nvdimm is not yet supported");
> +        return;
> +    }

You mention that actual hotplug is not supported, only coldplug.
Wouldn't this be the right place to check for that? (only skimmed over
your patches, how do you handle that?)


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory
  2018-07-03 18:25   ` David Hildenbrand
@ 2018-07-03 19:27     ` Auger Eric
  2018-07-04 12:05       ` David Hildenbrand
  0 siblings, 1 reply; 62+ messages in thread
From: Auger Eric @ 2018-07-03 19:27 UTC (permalink / raw)
  To: David Hildenbrand, eric.auger.pro, qemu-devel, qemu-arm,
	peter.maydell, shameerali.kolothum.thodi, imammedo
  Cc: dgilbert, agraf, david, drjones, wei

Hi David,
On 07/03/2018 08:25 PM, David Hildenbrand wrote:
> On 03.07.2018 09:19, Eric Auger wrote:
>> We define a new hotpluggable RAM region (aka. device memory).
>> Its base is 2TB GPA. This obviously requires 42b IPA support
>> in KVM/ARM, FW and guest kernel. At the moment the device
>> memory region is max 2TB.
> 
> Maybe a stupid question, but why exactly does it have to start at 2TB
> (and not e.g. at 1TB)?
Not a stupid question. See tentative answer below.
> 
>>
>> This is largely inspired of device memory initialization in
>> pc machine code.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
>> ---
>>  hw/arm/virt.c         | 104 ++++++++++++++++++++++++++++++++++++--------------
>>  include/hw/arm/arm.h  |   2 +
>>  include/hw/arm/virt.h |   1 +
>>  3 files changed, 79 insertions(+), 28 deletions(-)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index 5a4d0bf..6fefb78 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -59,6 +59,7 @@
>>  #include "qapi/visitor.h"
>>  #include "standard-headers/linux/input.h"
>>  #include "hw/arm/smmuv3.h"
>> +#include "hw/acpi/acpi.h"
>>  
>>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
>>      static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
>> @@ -94,34 +95,25 @@
>>  
>>  #define PLATFORM_BUS_NUM_IRQS 64
>>  
>> -/* RAM limit in GB. Since VIRT_MEM starts at the 1GB mark, this means
>> - * RAM can go up to the 256GB mark, leaving 256GB of the physical
>> - * address space unallocated and free for future use between 256G and 512G.
>> - * If we need to provide more RAM to VMs in the future then we need to:
>> - *  * allocate a second bank of RAM starting at 2TB and working up
I acknowledge this comment was the main justification. Now if you look at

Principles of ARM Memory Maps
http://infocenter.arm.com/help/topic/com.arm.doc.den0001c/DEN0001C_principles_of_arm_memory_maps.pdf
chapter 2.3, you will find that when adding PA bits you always leave
room for reserved regions and mapped IO.

On the other hand, if you look at chapter 5, "Proposed 44-bit and 48-bit
Address Maps", we should logically put the additional RAM at 8TB if we
want to comply with that doc.

Peter, was there any other justification why we should put the RAM at 2TB?

Thanks

Eric


>> - *  * fix the DT and ACPI table generation code in QEMU to correctly
>> - *    report two split lumps of RAM to the guest
>> - *  * fix KVM in the host kernel to allow guests with >40 bit address spaces
>> - * (We don't want to fill all the way up to 512GB with RAM because
>> - * we might want it for non-RAM purposes later. Conversely it seems
>> - * reasonable to assume that anybody configuring a VM with a quarter
>> - * of a terabyte of RAM will be doing it on a host with more than a
>> - * terabyte of physical address space.)
>> - */
>> -#define RAMLIMIT_GB 255
>> -#define RAMLIMIT_BYTES (RAMLIMIT_GB * 1024ULL * 1024 * 1024)
>> +#define SZ_64K 0x10000
>> +#define SZ_1G (1024ULL * 1024 * 1024)
>>  
>>  /* Addresses and sizes of our components.
>> - * 0..128MB is space for a flash device so we can run bootrom code such as UEFI.
>> - * 128MB..256MB is used for miscellaneous device I/O.
>> - * 256MB..1GB is reserved for possible future PCI support (ie where the
>> - * PCI memory window will go if we add a PCI host controller).
>> - * 1GB and up is RAM (which may happily spill over into the
>> - * high memory region beyond 4GB).
>> - * This represents a compromise between how much RAM can be given to
>> - * a 32 bit VM and leaving space for expansion and in particular for PCI.
>> - * Note that devices should generally be placed at multiples of 0x10000,
>> + * 0..128MB is space for a flash device so we can run bootrom code such as UEFI,
>> + * 128MB..256MB is used for miscellaneous device I/O,
>> + * 256MB..1GB is used for PCI host controller,
>> + * 1GB..256GB is RAM (not hotpluggable),
>> + * 256GB..512GB: is left for device I/O (non RAM purpose),
>> + * 512GB..1TB: high mem PCI MMIO region,
>> + * 2TB..4TB is used for hot-pluggable DIMM (assumes 42b GPA is supported).
>> + *
>> + * Note that IO devices should generally be placed at multiples of 0x10000,
>>   * to accommodate guests using 64K pages.
>> + *
>> + * Conversely it seems reasonable to assume that anybody configuring a VM
>> + * with a quarter of a terabyte of RAM will be doing it on a host with more
>> + * than a terabyte of physical address space.
>> + *
>>   */
>>  static const MemMapEntry a15memmap[] = {
>>      /* Space up to 0x8000000 is reserved for a boot ROM */
>> @@ -148,12 +140,13 @@ static const MemMapEntry a15memmap[] = {
>>      [VIRT_PCIE_MMIO] =          { 0x10000000, 0x2eff0000 },
>>      [VIRT_PCIE_PIO] =           { 0x3eff0000, 0x00010000 },
>>      [VIRT_PCIE_ECAM] =          { 0x3f000000, 0x01000000 },
>> -    [VIRT_MEM] =                { 0x40000000, RAMLIMIT_BYTES },
>> +    [VIRT_MEM] =                { SZ_1G, 255 * SZ_1G },
>>      /* Additional 64 MB redist region (can contain up to 512 redistributors) */
>>      [VIRT_GIC_REDIST2] =        { 0x4000000000ULL, 0x4000000 },
>>      [VIRT_PCIE_ECAM_HIGH] =     { 0x4010000000ULL, 0x10000000 },
>>      /* Second PCIe window, 512GB wide at the 512GB boundary */
>> -    [VIRT_PCIE_MMIO_HIGH] =   { 0x8000000000ULL, 0x8000000000ULL },
>> +    [VIRT_PCIE_MMIO_HIGH] =     { 512 * SZ_1G, 512 * SZ_1G },
>> +    [VIRT_HOTPLUG_MEM] =        { 2048 * SZ_1G, 2048 * SZ_1G },
>>  };
>>  
>>  static const int a15irqmap[] = {
>> @@ -1223,6 +1216,58 @@ static void create_secure_ram(VirtMachineState *vms,
>>      g_free(nodename);
>>  }
>>  
>> +static void create_device_memory(VirtMachineState *vms, MemoryRegion *sysmem)
>> +{
>> +    MachineState *ms = MACHINE(vms);
>> +    uint64_t device_memory_size;
>> +    uint64_t align = SZ_64K;
>> +
>> +    /* always allocate the device memory information */
>> +    ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
>> +
>> +    if (vms->max_vm_phys_shift < 42) {
>> +        /* device memory starts at 2TB whereas this VM supports less than
>> +         * 2TB GPA */
>> +        if (ms->maxram_size > ms->ram_size || ms->ram_slots) {
>> +            MachineClass *mc = MACHINE_GET_CLASS(ms);
>> +
>> +            error_report("\"-memory 'slots|maxmem'\" is not supported by %s "
>> +                         "since KVM does not support more than 41b IPA",
>> +                         mc->name);
>> +            exit(EXIT_FAILURE);
>> +        }
>> +        return;
>> +    }
>> +
>> +    if (ms->ram_slots > ACPI_MAX_RAM_SLOTS) {
>> +        error_report("unsupported number of memory slots: %"PRIu64,
>> +                     ms->ram_slots);
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    if (QEMU_ALIGN_UP(ms->maxram_size, align) != ms->maxram_size) {
>> +        error_report("maximum memory size must be aligned to multiple of 0x%"
>> +                     PRIx64, align);
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    ms->device_memory->base = vms->memmap[VIRT_HOTPLUG_MEM].base;
>> +    device_memory_size = ms->maxram_size - ms->ram_size;
>> +
>> +    if (device_memory_size > vms->memmap[VIRT_HOTPLUG_MEM].size) {
>> +        error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT,
>> +                         ms->maxram_size);
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    memory_region_init(&ms->device_memory->mr, OBJECT(vms),
>> +                       "device-memory", device_memory_size);
>> +    memory_region_add_subregion(sysmem, ms->device_memory->base,
>> +                                &ms->device_memory->mr);
>> +    vms->bootinfo.device_memory_start = ms->device_memory->base;
>> +    vms->bootinfo.device_memory_size = device_memory_size;
>> +}
>> +
>>  static void *machvirt_dtb(const struct arm_boot_info *binfo, int *fdt_size)
>>  {
>>      const VirtMachineState *board = container_of(binfo, VirtMachineState,
>> @@ -1430,7 +1475,8 @@ static void machvirt_init(MachineState *machine)
>>      vms->smp_cpus = smp_cpus;
>>  
>>      if (machine->ram_size > vms->memmap[VIRT_MEM].size) {
>> -        error_report("mach-virt: cannot model more than %dGB RAM", RAMLIMIT_GB);
>> +        error_report("mach-virt: cannot model more than %dGB RAM",
>> +                     (int)(vms->memmap[VIRT_MEM].size / SZ_1G));
>>          exit(1);
>>      }
>>  
>> @@ -1525,6 +1571,8 @@ static void machvirt_init(MachineState *machine)
>>                                           machine->ram_size);
>>      memory_region_add_subregion(sysmem, vms->memmap[VIRT_MEM].base, ram);
>>  
>> +    create_device_memory(vms, sysmem);
>> +
>>      create_flash(vms, sysmem, secure_sysmem ? secure_sysmem : sysmem);
>>  
>>      create_gic(vms, pic);
>> diff --git a/include/hw/arm/arm.h b/include/hw/arm/arm.h
>> index ffed392..76269e6 100644
>> --- a/include/hw/arm/arm.h
>> +++ b/include/hw/arm/arm.h
>> @@ -116,6 +116,8 @@ struct arm_boot_info {
>>      bool secure_board_setup;
>>  
>>      arm_endianness endianness;
>> +    hwaddr device_memory_start;
>> +    hwaddr device_memory_size;
>>  };
>>  
>>  /**
>> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
>> index 91f6de2..173938d 100644
>> --- a/include/hw/arm/virt.h
>> +++ b/include/hw/arm/virt.h
>> @@ -78,6 +78,7 @@ enum {
>>      VIRT_GPIO,
>>      VIRT_SECURE_UART,
>>      VIRT_SECURE_MEM,
>> +    VIRT_HOTPLUG_MEM,
>>  };
>>  
>>  typedef enum VirtIOMMUType {
>>
> 
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 07/15] hw/arm/virt: Add memory hotplug framework
  2018-07-03 18:28   ` David Hildenbrand
@ 2018-07-03 19:28     ` Auger Eric
  0 siblings, 0 replies; 62+ messages in thread
From: Auger Eric @ 2018-07-03 19:28 UTC (permalink / raw)
  To: David Hildenbrand, eric.auger.pro, qemu-devel, qemu-arm,
	peter.maydell, shameerali.kolothum.thodi, imammedo
  Cc: dgilbert, agraf, david, drjones, wei

Hi David,

On 07/03/2018 08:28 PM, David Hildenbrand wrote:
> On 03.07.2018 09:19, Eric Auger wrote:
>> From: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
>>
>> This patch adds the memory hot-plug/hot-unplug infrastructure
>> in machvirt.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
>> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
>>
>> ---
>>
>> v1 -> v2:
>> - s/virt_dimm_plug|unplug/virt_memory_plug|unplug
>> - s/pc_dimm_memory_plug/pc_dimm_plug
>> - reworded title and commit message
>> - added pre_plug cb
>> - don't handle get_memory_region failure anymore
>> ---
>>  default-configs/arm-softmmu.mak |  2 ++
>>  hw/arm/virt.c                   | 68 ++++++++++++++++++++++++++++++++++++++++-
>>  2 files changed, 69 insertions(+), 1 deletion(-)
>>
>> diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
>> index 834d45c..28fe8f3 100644
>> --- a/default-configs/arm-softmmu.mak
>> +++ b/default-configs/arm-softmmu.mak
>> @@ -152,3 +152,5 @@ CONFIG_PCI_DESIGNWARE=y
>>  CONFIG_STRONGARM=y
>>  CONFIG_HIGHBANK=y
>>  CONFIG_MUSICPAL=y
>> +CONFIG_MEM_HOTPLUG=y
>> +
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index 6fefb78..7190962 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -60,6 +60,8 @@
>>  #include "standard-headers/linux/input.h"
>>  #include "hw/arm/smmuv3.h"
>>  #include "hw/acpi/acpi.h"
>> +#include "hw/mem/pc-dimm.h"
>> +#include "hw/mem/nvdimm.h"
>>  
>>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
>>      static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
>> @@ -1785,6 +1787,53 @@ static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
>>      return ms->possible_cpus;
>>  }
>>  
>> +static void virt_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>> +                                 Error **errp)
>> +{
>> +    const bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>> +
>> +    if (is_nvdimm) {
>> +        error_setg(errp, "nvdimm is not yet supported");
>> +        return;
>> +    }
>> +}
>> +
>> +static void virt_memory_plug(HotplugHandler *hotplug_dev,
>> +                             DeviceState *dev, Error **errp)
>> +{
>> +    PCDIMMDevice *dimm = PC_DIMM(dev);
>> +    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
>> +    MemoryRegion *mr = ddc->get_memory_region(dimm, &error_abort);
>> +    Error *local_err = NULL;
>> +    uint64_t align;
>> +
>> +    if (memory_region_get_alignment(mr)) {
>> +        align = memory_region_get_alignment(mr);
>> +    } else {
>> +        /* by default we align on 64KB page size */
>> +        align = SZ_64K;
>> +    }
> 
> After my latest refactoring is applied:
> 
> 1. memory_region_get_alignment(mr) will never be 0
> 2. alignment detection will be handled internally
> 
> So once you rebase to that version, just pass NULL for "*legacy_align"

Agreed. Thanks

Eric
> 
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 05/15] hw/arm/virt: handle max_vm_phys_shift conflicts on migration
  2018-07-03 18:41   ` David Hildenbrand
@ 2018-07-03 19:32     ` Auger Eric
  2018-07-04 11:53       ` David Hildenbrand
  0 siblings, 1 reply; 62+ messages in thread
From: Auger Eric @ 2018-07-03 19:32 UTC (permalink / raw)
  To: David Hildenbrand, eric.auger.pro, qemu-devel, qemu-arm,
	peter.maydell, shameerali.kolothum.thodi, imammedo
  Cc: wei, drjones, david, dgilbert, agraf

Hi David,
On 07/03/2018 08:41 PM, David Hildenbrand wrote:
> On 03.07.2018 09:19, Eric Auger wrote:
>> When migrating a VM, we must make sure the destination host
>> supports as many IPA bits as the source. Otherwise the migration
>> must fail.
>>
>> We add a VMState infrastructure to machvirt. On pre_save(),
>> the current source max_vm_phys_shift is saved.
>>
>> On destination, we cannot use this information when creating the
>> VM. The VM is created using the max value reported by the
>> destination host - or the kvm_type inherited value -. However on
>> post_load() we can check that this value is compatible with the
>> source saved value.
> 
> Just wondering, how exactly is the guest able to detect the 40b (e.g. vs
> 42b) configuration?

The source IPA size is saved in the VMState. On post_load we check it
against the current IPA size (corresponding to the maximum the
destination KVM supports). The destination IPA size is chosen before
the destination VM is created. If the destination IPA size is less
than the source IPA size, we fail the migration.

Hope this helps

Thanks

Eric

> 
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>>  hw/arm/virt.c         | 37 +++++++++++++++++++++++++++++++++++++
>>  include/hw/arm/virt.h |  2 ++
>>  2 files changed, 39 insertions(+)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index 04a32de..5a4d0bf 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -1316,6 +1316,40 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
>>      return arm_cpu_mp_affinity(idx, clustersz);
>>  }
>>  
>> +static int virt_post_load(void *opaque, int version_id)
>> +{
>> +    VirtMachineState *vms = (VirtMachineState *)opaque;
>> +
>> +    if (vms->max_vm_phys_shift < vms->source_max_vm_phys_shift) {
>> +        error_report("This host kernel only supports %d IPA bits whereas "
>> +                     "the guest requires %d GPA bits", vms->max_vm_phys_shift,
>> +                     vms->source_max_vm_phys_shift);
>> +        return -1;
>> +    }
>> +    return 0;
>> +}
>> +
>> +static int virt_pre_save(void *opaque)
>> +{
>> +    VirtMachineState *vms = (VirtMachineState *)opaque;
>> +
>> +    vms->source_max_vm_phys_shift = vms->max_vm_phys_shift;
>> +    return 0;
>> +}
>> +
>> +static const VMStateDescription vmstate_virt = {
>> +    .name = "virt",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .post_load = virt_post_load,
>> +    .pre_save = virt_pre_save,
>> +    .fields = (VMStateField[]) {
>> +        VMSTATE_INT32(source_max_vm_phys_shift, VirtMachineState),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>> +
>>  static void machvirt_init(MachineState *machine)
>>  {
>>      VirtMachineState *vms = VIRT_MACHINE(machine);
>> @@ -1537,6 +1571,7 @@ static void machvirt_init(MachineState *machine)
>>  
>>      vms->machine_done.notify = virt_machine_done;
>>      qemu_add_machine_init_done_notifier(&vms->machine_done);
>> +    vmstate_register(NULL, 0, &vmstate_virt, vms);
>>  }
>>  
>>  static bool virt_get_secure(Object *obj, Error **errp)
>> @@ -1727,6 +1762,7 @@ static HotplugHandler *virt_machine_get_hotplug_handler(MachineState *machine,
>>  
>>  static int virt_kvm_type(MachineState *ms, const char *type_str)
>>  {
>> +    VirtMachineState *vms = VIRT_MACHINE(ms);
>>      int max_vm_phys_shift, ret = 0;
>>      uint64_t type;
>>  
>> @@ -1747,6 +1783,7 @@ static int virt_kvm_type(MachineState *ms, const char *type_str)
>>      }
>>      ret = max_vm_phys_shift;
>>  out:
>> +    vms->max_vm_phys_shift = (max_vm_phys_shift > 0) ? ret : 40;
>>      return ret;
>>  }
>>  
>> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
>> index 1a90ffc..91f6de2 100644
>> --- a/include/hw/arm/virt.h
>> +++ b/include/hw/arm/virt.h
>> @@ -125,6 +125,8 @@ typedef struct {
>>      uint32_t iommu_phandle;
>>      int psci_conduit;
>>      char *kvm_type;
>> +    int32_t max_vm_phys_shift;
>> +    int32_t source_max_vm_phys_shift;
>>  } VirtMachineState;
>>  
>>  #define VIRT_ECAM_ID(high) (high ? VIRT_PCIE_ECAM_HIGH : VIRT_PCIE_ECAM)
>>
> 
> 


* Re: [Qemu-devel] [RFC v3 07/15] hw/arm/virt: Add memory hotplug framework
  2018-07-03 18:44   ` David Hildenbrand
@ 2018-07-03 19:34     ` Auger Eric
  2018-07-04 11:47       ` David Hildenbrand
  0 siblings, 1 reply; 62+ messages in thread
From: Auger Eric @ 2018-07-03 19:34 UTC (permalink / raw)
  To: David Hildenbrand, eric.auger.pro, qemu-devel, qemu-arm,
	peter.maydell, shameerali.kolothum.thodi, imammedo
  Cc: dgilbert, agraf, david, drjones, wei

Hi David,

On 07/03/2018 08:44 PM, David Hildenbrand wrote:
> On 03.07.2018 09:19, Eric Auger wrote:
>> From: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
>>
>> This patch adds the memory hot-plug/hot-unplug infrastructure
>> in machvirt.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
>> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
>>
>> ---
>>
>> v1 -> v2:
>> - s/virt_dimm_plug|unplug/virt_memory_plug|unplug
>> - s/pc_dimm_memory_plug/pc_dimm_plug
>> - reworded title and commit message
>> - added pre_plug cb
>> - don't handle get_memory_region failure anymore
>> ---
>>  default-configs/arm-softmmu.mak |  2 ++
>>  hw/arm/virt.c                   | 68 ++++++++++++++++++++++++++++++++++++++++-
>>  2 files changed, 69 insertions(+), 1 deletion(-)
>>
>> diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
>> index 834d45c..28fe8f3 100644
>> --- a/default-configs/arm-softmmu.mak
>> +++ b/default-configs/arm-softmmu.mak
>> @@ -152,3 +152,5 @@ CONFIG_PCI_DESIGNWARE=y
>>  CONFIG_STRONGARM=y
>>  CONFIG_HIGHBANK=y
>>  CONFIG_MUSICPAL=y
>> +CONFIG_MEM_HOTPLUG=y
>> +
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index 6fefb78..7190962 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -60,6 +60,8 @@
>>  #include "standard-headers/linux/input.h"
>>  #include "hw/arm/smmuv3.h"
>>  #include "hw/acpi/acpi.h"
>> +#include "hw/mem/pc-dimm.h"
>> +#include "hw/mem/nvdimm.h"
>>  
>>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
>>      static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
>> @@ -1785,6 +1787,53 @@ static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
>>      return ms->possible_cpus;
>>  }
>>  
>> +static void virt_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>> +                                 Error **errp)
>> +{
>> +    const bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>> +
>> +    if (is_nvdimm) {
>> +        error_setg(errp, "nvdimm is not yet supported");
>> +        return;
>> +    }
> 
> You mention that actual hotplug is not supported, only coldplug.
> Wouldn't this be the right place to check for that? (only skimmed over
> your patches, how do you handle that?)
At the moment I don't check it. I have not yet looked at ways to
discriminate between the two cases.

Thanks

Eric
> 
> 


* Re: [Qemu-devel] [RFC v3 07/15] hw/arm/virt: Add memory hotplug framework
  2018-07-03 19:34     ` Auger Eric
@ 2018-07-04 11:47       ` David Hildenbrand
  0 siblings, 0 replies; 62+ messages in thread
From: David Hildenbrand @ 2018-07-04 11:47 UTC (permalink / raw)
  To: Auger Eric, eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo
  Cc: dgilbert, agraf, david, drjones, wei

On 03.07.2018 21:34, Auger Eric wrote:
> Hi David,
> 
> On 07/03/2018 08:44 PM, David Hildenbrand wrote:
>> On 03.07.2018 09:19, Eric Auger wrote:
>>> From: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
>>>
>>> This patch adds the memory hot-plug/hot-unplug infrastructure
>>> in machvirt.
>>>
>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
>>> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
>>>
>>> ---
>>>
>>> v1 -> v2:
>>> - s/virt_dimm_plug|unplug/virt_memory_plug|unplug
>>> - s/pc_dimm_memory_plug/pc_dimm_plug
>>> - reworded title and commit message
>>> - added pre_plug cb
>>> - don't handle get_memory_region failure anymore
>>> ---
>>>  default-configs/arm-softmmu.mak |  2 ++
>>>  hw/arm/virt.c                   | 68 ++++++++++++++++++++++++++++++++++++++++-
>>>  2 files changed, 69 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
>>> index 834d45c..28fe8f3 100644
>>> --- a/default-configs/arm-softmmu.mak
>>> +++ b/default-configs/arm-softmmu.mak
>>> @@ -152,3 +152,5 @@ CONFIG_PCI_DESIGNWARE=y
>>>  CONFIG_STRONGARM=y
>>>  CONFIG_HIGHBANK=y
>>>  CONFIG_MUSICPAL=y
>>> +CONFIG_MEM_HOTPLUG=y
>>> +
>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>>> index 6fefb78..7190962 100644
>>> --- a/hw/arm/virt.c
>>> +++ b/hw/arm/virt.c
>>> @@ -60,6 +60,8 @@
>>>  #include "standard-headers/linux/input.h"
>>>  #include "hw/arm/smmuv3.h"
>>>  #include "hw/acpi/acpi.h"
>>> +#include "hw/mem/pc-dimm.h"
>>> +#include "hw/mem/nvdimm.h"
>>>  
>>>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
>>>      static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
>>> @@ -1785,6 +1787,53 @@ static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
>>>      return ms->possible_cpus;
>>>  }
>>>  
>>> +static void virt_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>> +                                 Error **errp)
>>> +{
>>> +    const bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>>> +
>>> +    if (is_nvdimm) {
>>> +        error_setg(errp, "nvdimm is not yet supported");
>>> +        return;
>>> +    }
>>
>> You mention that actual hotplug is not supported, only coldplug.
>> Wouldn't this be the right place to check for that? (only skimmed over
>> your patches, how do you handle that?)
> At the moment I don't check it. I have not yet looked at ways to
> discriminate between the two cases.

Looking at dev->hotplugged should be enough I guess.
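
Something like this, as a rough sketch on top of your pre_plug callback
(only the dev->hotplugged check is new, the rest is quoted from the patch):

static void virt_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
                                 Error **errp)
{
    const bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);

    if (is_nvdimm) {
        error_setg(errp, "nvdimm is not yet supported");
        return;
    }

    /* only coldplug (device created on the command line) is supported
     * so far, so reject devices plugged at runtime */
    if (dev->hotplugged) {
        error_setg(errp, "memory hotplug is not yet supported");
    }
}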

> 
> Thanks
> 
> Eric
>>
>>


-- 

Thanks,

David / dhildenb


* Re: [Qemu-devel] [RFC v3 05/15] hw/arm/virt: handle max_vm_phys_shift conflicts on migration
  2018-07-03 19:32     ` Auger Eric
@ 2018-07-04 11:53       ` David Hildenbrand
  2018-07-04 12:50         ` Auger Eric
  0 siblings, 1 reply; 62+ messages in thread
From: David Hildenbrand @ 2018-07-04 11:53 UTC (permalink / raw)
  To: Auger Eric, eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo
  Cc: wei, drjones, david, dgilbert, agraf

On 03.07.2018 21:32, Auger Eric wrote:
> Hi David,
> On 07/03/2018 08:41 PM, David Hildenbrand wrote:
>> On 03.07.2018 09:19, Eric Auger wrote:
>>> When migrating a VM, we must make sure the destination host
>>> supports as many IPA bits as the source. Otherwise the migration
>>> must fail.
>>>
>>> We add a VMState infrastructure to machvirt. On pre_save(),
>>> the current source max_vm_phys_shift is saved.
>>>
>>> On destination, we cannot use this information when creating the
>>> VM. The VM is created using the max value reported by the
>>> destination host - or the kvm_type inherited value -. However on
>>> post_load() we can check that this value is compatible with the
>>> source saved value.
>>
>> Just wondering, how exactly is the guest able to detect the 42b (e.g. vs
>> 43b) configuration?
> 
> the source IPA size is saved in the VMState. When restoring it on
> post_load we check against the current IPA size (corresponding to the
> max the destination KVM does support). The destination IPA size is
> chosen before creating the destination VM. If the destination IPA size
> is less than the source IPA size, we fail the migration.
> 
> Hope this helps

No, I asked if the *guest* is able to distinguish e.g. 43 from 44 or if
the device memory setup is sufficient.

Once you create the machine, you set up device memory (using the maxmem
parameter).

From that, you directly know how big the largest guest physical address
will be (e.g. 2TB + (maxram_size - ram_size)). You can check that
against max_vm_phys_shift and error out.

During migration, source and destination have to have the same qemu
cmdline, especially the same maxmem parameter. So you would catch an
invalid setup on the destination, without manually migrating and checking
max_vm_phys_shift.

However (that's why I am asking) if the guest can spot the difference
between e.g. 43 and 44, then you should migrate and check. If it is
implicitly handled by device memory position and size, you should not
migrate it.
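
Concretely, a minimal sketch of that early check (the 2TB base and the
max_vm_phys_shift field are the ones from your series; clz64() is from
qemu/host-utils.h):

if (machine->maxram_size > machine->ram_size) {
    uint64_t device_mem_size = machine->maxram_size - machine->ram_size;
    /* the device memory region is based at 2TB in this series */
    uint64_t highest_gpa = (2ULL << 40) + device_mem_size - 1;

    if (64 - clz64(highest_gpa) > vms->max_vm_phys_shift) {
        error_report("maxmem requires %d GPA bits but the host only "
                     "supports %d", 64 - clz64(highest_gpa),
                     vms->max_vm_phys_shift);
        exit(1);
    }
}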

> 
> Thanks
> 
> Eric
> 
>>


-- 

Thanks,

David / dhildenb


* Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory
  2018-07-03 19:27     ` Auger Eric
@ 2018-07-04 12:05       ` David Hildenbrand
  2018-07-05 11:42         ` Auger Eric
  0 siblings, 1 reply; 62+ messages in thread
From: David Hildenbrand @ 2018-07-04 12:05 UTC (permalink / raw)
  To: Auger Eric, eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo
  Cc: dgilbert, agraf, david, drjones, wei

On 03.07.2018 21:27, Auger Eric wrote:
> Hi David,
> On 07/03/2018 08:25 PM, David Hildenbrand wrote:
>> On 03.07.2018 09:19, Eric Auger wrote:
>>> We define a new hotpluggable RAM region (aka. device memory).
>>> Its base is 2TB GPA. This obviously requires 42b IPA support
>>> in KVM/ARM, FW and guest kernel. At the moment the device
>>> memory region is max 2TB.
>>
>> Maybe a stupid question, but why exactly does it have to start at 2TB
>> (and not e.g. at 1TB)?
> not a stupid question. See tentative answer below.
>>
>>>
>>> This is largely inspired by device memory initialization in
>>> pc machine code.
>>>
>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
>>> ---
>>>  hw/arm/virt.c         | 104 ++++++++++++++++++++++++++++++++++++--------------
>>>  include/hw/arm/arm.h  |   2 +
>>>  include/hw/arm/virt.h |   1 +
>>>  3 files changed, 79 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>>> index 5a4d0bf..6fefb78 100644
>>> --- a/hw/arm/virt.c
>>> +++ b/hw/arm/virt.c
>>> @@ -59,6 +59,7 @@
>>>  #include "qapi/visitor.h"
>>>  #include "standard-headers/linux/input.h"
>>>  #include "hw/arm/smmuv3.h"
>>> +#include "hw/acpi/acpi.h"
>>>  
>>>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
>>>      static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
>>> @@ -94,34 +95,25 @@
>>>  
>>>  #define PLATFORM_BUS_NUM_IRQS 64
>>>  
>>> -/* RAM limit in GB. Since VIRT_MEM starts at the 1GB mark, this means
>>> - * RAM can go up to the 256GB mark, leaving 256GB of the physical
>>> - * address space unallocated and free for future use between 256G and 512G.
>>> - * If we need to provide more RAM to VMs in the future then we need to:
>>> - *  * allocate a second bank of RAM starting at 2TB and working up
> I acknowledge this comment was the main justification. Now if you look at
> 
> Principles of ARM Memory Maps
> http://infocenter.arm.com/help/topic/com.arm.doc.den0001c/DEN0001C_principles_of_arm_memory_maps.pdf
> chapter 2.3 you will find that when adding PA bits, you always leave
> space for reserved space and mapped IO.

Thanks for the pointer!

So ... we can fit

a) 2GB at 2GB
b) 32GB at 32GB
c) 512GB at 512GB
d) 8TB at 8TB
e) 128TB at 128TB

(this is a nice rule of thumb if I understand it correctly :) )

We should strive for device memory (maxram_size - ram_size) to fit
exactly into one of these slots (otherwise things get nasty).

Depending on the ram_size, we might have simpler setups and can support
more configurations, no?

E.g. ram_size <= 34GB, device_memory <= 512GB
-> move ram into a) and b)
-> move device memory into c)

We should make up our mind right from the beginning how our setup will
look, so we can avoid (possibly complicated) compatibility handling
later on.
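
(Side note: if I read the pattern right, each slot is 16 times the previous
one, so picking the smallest slot that fits a given device memory size is
trivial. Purely illustrative, the helper name is made up:)

/* smallest slot of the 2GB@2GB, 32GB@32GB, 512GB@512GB, 8TB@8TB, ...
 * series that can hold size bytes */
static uint64_t slot_base_for(uint64_t size)
{
    uint64_t base = 2ULL << 30;    /* 2GB */

    while (base < size) {
        base *= 16;                /* 32GB, 512GB, 8TB, 128TB, ... */
    }
    return base;
}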

> 
> On the other hand, if you look at chapter 5, "Proposed 44-bit and 48bit
> Address Maps", we should logically put the additional RAM at 8TB if we
> want to comply with that doc.

I agree, 2TB is in the reserved area.

> 
> Peter, was there any other justification why we should put the RAM at 2TB?
> 
> Thanks
> 
> Eric


-- 

Thanks,

David / dhildenb


* Re: [Qemu-devel] [RFC v3 05/15] hw/arm/virt: handle max_vm_phys_shift conflicts on migration
  2018-07-04 11:53       ` David Hildenbrand
@ 2018-07-04 12:50         ` Auger Eric
  0 siblings, 0 replies; 62+ messages in thread
From: Auger Eric @ 2018-07-04 12:50 UTC (permalink / raw)
  To: David Hildenbrand, eric.auger.pro, qemu-devel, qemu-arm,
	peter.maydell, shameerali.kolothum.thodi, imammedo
  Cc: wei, agraf, drjones, dgilbert, david

Hi David,
On 07/04/2018 01:53 PM, David Hildenbrand wrote:
> On 03.07.2018 21:32, Auger Eric wrote:
>> Hi David,
>> On 07/03/2018 08:41 PM, David Hildenbrand wrote:
>>> On 03.07.2018 09:19, Eric Auger wrote:
>>>> When migrating a VM, we must make sure the destination host
>>>> supports as many IPA bits as the source. Otherwise the migration
>>>> must fail.
>>>>
>>>> We add a VMState infrastructure to machvirt. On pre_save(),
>>>> the current source max_vm_phys_shift is saved.
>>>>
>>>> On destination, we cannot use this information when creating the
>>>> VM. The VM is created using the max value reported by the
>>>> destination host - or the kvm_type inherited value -. However on
>>>> post_load() we can check that this value is compatible with the
>>>> source saved value.
>>>
>>> Just wondering, how exactly is the guest able to detect the 42b (e.g. vs
>>> 43b) configuration?
>>
>> the source IPA size is saved in the VMState. When restoring it on
>> post_load we check against the current IPA size (corresponding to the
>> max the destination KVM does support). The destination IPA size is
>> chosen before creating the destination VM. If the destination IPA size
>> is less than the source IPA size, we fail the migration.
>>
>> Hope this helps
> 
> No, I asked if the *guest* is able to distinguish e.g. 43 from 44 or if
> the device memory setup is sufficient.
> 
> Once you create the machine, you setup device memory (using the maxmem
> parameter).
> 
> From that, you directly know how big the largest guest physical address
> will be (e.g. 2TB + (maxram_size - ram_size)). You can check that
> against max_vm_phys_shift and error out.

Ah OK, I didn't catch your question. Yes indeed your method is simpler. At
the moment I don't think the guest can tell the difference. But the
guest sees the CPU PARange, which is currently fixed, as far as I
understand it; also the guest is GPA-limited at compile time with a
given CONFIG_ARM64_PA_BITS=X config.

So we come back to Dave's remark: if we make the CPU PARange match
max_vm_phys_shift and make the former dynamic, then the guest can see it.
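
For reference, the guest-visible ID_AA64MMFR0_EL1.PARange field can only
encode a handful of discrete sizes, so an intermediate value such as 43
is not even representable there (sketch of the ARMv8 encodings):

static const unsigned int parange_to_bits[] = {
    [0] = 32, [1] = 36, [2] = 40, [3] = 42,
    [4] = 44, [5] = 48, [6] = 52,    /* 52 requires ARMv8.2-LPA */
};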

Thanks

Eric
> 
> During migration, source and destination have to have the same qemu
> cmdline, especially same maxmem parameter. So you would catch an invalid
> setup on the destination, without manually migrating and checking
> max_vm_phys_shift.
> 
> However (that's why I am asking) if the guest can spot the difference
> between e.g. 43 and 44, then you should migrate and check. If it is
> implicitly handled by device memory position and size, you should not
> migrate it.
> 
>>
>> Thanks
>>
>> Eric
>>
>>>
> 
> 


* Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory
  2018-07-04 12:05       ` David Hildenbrand
@ 2018-07-05 11:42         ` Auger Eric
  2018-07-05 11:54           ` David Hildenbrand
  0 siblings, 1 reply; 62+ messages in thread
From: Auger Eric @ 2018-07-05 11:42 UTC (permalink / raw)
  To: David Hildenbrand, eric.auger.pro, qemu-devel, qemu-arm,
	peter.maydell, shameerali.kolothum.thodi, imammedo
  Cc: wei, drjones, david, dgilbert, agraf

Hi David,

On 07/04/2018 02:05 PM, David Hildenbrand wrote:
> On 03.07.2018 21:27, Auger Eric wrote:
>> Hi David,
>> On 07/03/2018 08:25 PM, David Hildenbrand wrote:
>>> On 03.07.2018 09:19, Eric Auger wrote:
>>>> We define a new hotpluggable RAM region (aka. device memory).
>>>> Its base is 2TB GPA. This obviously requires 42b IPA support
>>>> in KVM/ARM, FW and guest kernel. At the moment the device
>>>> memory region is max 2TB.
>>>
>>> Maybe a stupid question, but why exactly does it have to start at 2TB
>>> (and not e.g. at 1TB)?
>> not a stupid question. See tentative answer below.
>>>
>>>>
>>>> This is largely inspired by device memory initialization in
>>>> pc machine code.
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
>>>> ---
>>>>  hw/arm/virt.c         | 104 ++++++++++++++++++++++++++++++++++++--------------
>>>>  include/hw/arm/arm.h  |   2 +
>>>>  include/hw/arm/virt.h |   1 +
>>>>  3 files changed, 79 insertions(+), 28 deletions(-)
>>>>
>>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>>>> index 5a4d0bf..6fefb78 100644
>>>> --- a/hw/arm/virt.c
>>>> +++ b/hw/arm/virt.c
>>>> @@ -59,6 +59,7 @@
>>>>  #include "qapi/visitor.h"
>>>>  #include "standard-headers/linux/input.h"
>>>>  #include "hw/arm/smmuv3.h"
>>>> +#include "hw/acpi/acpi.h"
>>>>  
>>>>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
>>>>      static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
>>>> @@ -94,34 +95,25 @@
>>>>  
>>>>  #define PLATFORM_BUS_NUM_IRQS 64
>>>>  
>>>> -/* RAM limit in GB. Since VIRT_MEM starts at the 1GB mark, this means
>>>> - * RAM can go up to the 256GB mark, leaving 256GB of the physical
>>>> - * address space unallocated and free for future use between 256G and 512G.
>>>> - * If we need to provide more RAM to VMs in the future then we need to:
>>>> - *  * allocate a second bank of RAM starting at 2TB and working up
>> I acknowledge this comment was the main justification. Now if you look at
>>
>> Principles of ARM Memory Maps
>> http://infocenter.arm.com/help/topic/com.arm.doc.den0001c/DEN0001C_principles_of_arm_memory_maps.pdf
>> chapter 2.3 you will find that when adding PA bits, you always leave
>> space for reserved space and mapped IO.
> 
> Thanks for the pointer!
> 
> So ... we can fit
> 
> a) 2GB at 2GB
> b) 32GB at 32GB
> c) 512GB at 512GB
> d) 8TB at 8TB
> e) 128TB at 128TB
> 
> (this is a nice rule of thumb if I understand it correctly :) )
> 
> We should strive for device memory (maxram_size - ram_size) to fit
> exactly into one of these slots (otherwise things get nasty).
> 
> Depending on the ram_size, we might have simpler setups and can support
> more configurations, no?
> 
> E.g. ram_size <= 34GB, device_memory <= 512GB
> -> move ram into a) and b)
> -> move device memory into c)

The issue is that machvirt doesn't comply with that document.
At the moment we have:
0 -> 1GB       MMIO
1GB -> 256GB   RAM
256GB -> 512GB theoretically reserved for IO, but most of it is free.
512GB -> 1TB   reserved for the ECAM MMIO range. This is the top of our
existing 40b GPA space.

We don't want to change this address map due to legacy reasons.
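
To restate it as a table (purely illustrative: the struct and names are
made up, this is not the actual memmap array in hw/arm/virt.c):

static const struct {
    uint64_t base, size;
    const char *use;
} legacy_layout[] = {
    { 0,            1ULL << 30,   "MMIO (GIC, UART, virtio, PCIe low)" },
    { 1ULL << 30,   255ULL << 30, "RAM" },
    { 256ULL << 30, 256ULL << 30, "reserved for IO, mostly free" },
    { 512ULL << 30, 512ULL << 30, "ECAM MMIO range" },
};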

Another question: do you know if it would be possible to have the
device_memory region split into several discontinuous segments?

Thanks

Eric
> 
> We should make up our mind right from the beginning how our setup will
> look, so we can avoid (possibly complicated) compatibility handling
> later on.
> 
>>
>> On the other hand, if you look at chapter 5, "Proposed 44-bit and 48bit
>> Address Maps", we should logically put the additional RAM at 8TB if we
>> want to comply with that doc.
> 
> I agree, 2TB es in the reserved area.
> 
>>
>> Peter, was there any other justification why we should put the RAM at 2TB?
>>
>> Thanks
>>
>> Eric
> 
> 


* Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory
  2018-07-05 11:42         ` Auger Eric
@ 2018-07-05 11:54           ` David Hildenbrand
  2018-07-05 12:00             ` Auger Eric
  0 siblings, 1 reply; 62+ messages in thread
From: David Hildenbrand @ 2018-07-05 11:54 UTC (permalink / raw)
  To: Auger Eric, eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo
  Cc: wei, drjones, david, dgilbert, agraf

On 05.07.2018 13:42, Auger Eric wrote:
> Hi David,
> 
> On 07/04/2018 02:05 PM, David Hildenbrand wrote:
>> On 03.07.2018 21:27, Auger Eric wrote:
>>> Hi David,
>>> On 07/03/2018 08:25 PM, David Hildenbrand wrote:
>>>> On 03.07.2018 09:19, Eric Auger wrote:
>>>>> We define a new hotpluggable RAM region (aka. device memory).
>>>>> Its base is 2TB GPA. This obviously requires 42b IPA support
>>>>> in KVM/ARM, FW and guest kernel. At the moment the device
>>>>> memory region is max 2TB.
>>>>
>>>> Maybe a stupid question, but why exactly does it have to start at 2TB
>>>> (and not e.g. at 1TB)?
>>> not a stupid question. See tentative answer below.
>>>>
>>>>>
>>>>> This is largely inspired by device memory initialization in
>>>>> pc machine code.
>>>>>
>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
>>>>> ---
>>>>>  hw/arm/virt.c         | 104 ++++++++++++++++++++++++++++++++++++--------------
>>>>>  include/hw/arm/arm.h  |   2 +
>>>>>  include/hw/arm/virt.h |   1 +
>>>>>  3 files changed, 79 insertions(+), 28 deletions(-)
>>>>>
>>>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>>>>> index 5a4d0bf..6fefb78 100644
>>>>> --- a/hw/arm/virt.c
>>>>> +++ b/hw/arm/virt.c
>>>>> @@ -59,6 +59,7 @@
>>>>>  #include "qapi/visitor.h"
>>>>>  #include "standard-headers/linux/input.h"
>>>>>  #include "hw/arm/smmuv3.h"
>>>>> +#include "hw/acpi/acpi.h"
>>>>>  
>>>>>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
>>>>>      static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
>>>>> @@ -94,34 +95,25 @@
>>>>>  
>>>>>  #define PLATFORM_BUS_NUM_IRQS 64
>>>>>  
>>>>> -/* RAM limit in GB. Since VIRT_MEM starts at the 1GB mark, this means
>>>>> - * RAM can go up to the 256GB mark, leaving 256GB of the physical
>>>>> - * address space unallocated and free for future use between 256G and 512G.
>>>>> - * If we need to provide more RAM to VMs in the future then we need to:
>>>>> - *  * allocate a second bank of RAM starting at 2TB and working up
>>> I acknowledge this comment was the main justification. Now if you look at
>>>
>>> Principles of ARM Memory Maps
>>> http://infocenter.arm.com/help/topic/com.arm.doc.den0001c/DEN0001C_principles_of_arm_memory_maps.pdf
>>> chapter 2.3 you will find that when adding PA bits, you always leave
>>> space for reserved space and mapped IO.
>>
>> Thanks for the pointer!
>>
>> So ... we can fit
>>
>> a) 2GB at 2GB
>> b) 32GB at 32GB
>> c) 512GB at 512GB
>> d) 8TB at 8TB
>> e) 128TB at 128TB
>>
>> (this is a nice rule of thumb if I understand it correctly :) )
>>
>> We should strive for device memory (maxram_size - ram_size) to fit
>> exactly into one of these slots (otherwise things get nasty).
>>
>> Depending on the ram_size, we might have simpler setups and can support
>> more configurations, no?
>>
>> E.g. ram_size <= 34GB, device_memory <= 512GB
>> -> move ram into a) and b)
>> -> move device memory into c)
> 
> The issue is machvirt doesn't comply with that document.
> At the moment we have
> 0 -> 1GB MMIO
> 1GB -> 256GB RAM
> 256GB -> 512GB is theoretically reserved for IO but most is free.
> 512GB -> 1T is reserved for ECAM MMIO range. This is the top of our
> existing 40b GPA space.
> 
> We don't want to change this address map due to legacy reasons.
> 

Thanks, good to know!

> Another question! do you know if it would be possible to have
> device_memory region split into several discontinuous segments?

It can be implemented for sure, but I would try to avoid that, as it
makes certain configurations impossible (and very end user unfriendly).

E.g. (numbers completely made up, but it should show what I mean)

-m 20G,maxmem=120G:
-> Try to add a DIMM with 100G -> error.
-> But we can add e.g. two DIMMs with 40G and 60G.

This exposes internal details to the end user. And the end user has no
idea what is going on.

So I think we should try our best to keep that area consecutive.

> 
> Thanks
> 
> Eric


-- 

Thanks,

David / dhildenb


* Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory
  2018-07-05 11:54           ` David Hildenbrand
@ 2018-07-05 12:00             ` Auger Eric
  2018-07-05 12:09               ` David Hildenbrand
  0 siblings, 1 reply; 62+ messages in thread
From: Auger Eric @ 2018-07-05 12:00 UTC (permalink / raw)
  To: David Hildenbrand, eric.auger.pro, qemu-devel, qemu-arm,
	peter.maydell, shameerali.kolothum.thodi, imammedo
  Cc: wei, drjones, david, dgilbert, agraf

Hi David,

On 07/05/2018 01:54 PM, David Hildenbrand wrote:
> On 05.07.2018 13:42, Auger Eric wrote:
>> Hi David,
>>
>> On 07/04/2018 02:05 PM, David Hildenbrand wrote:
>>> On 03.07.2018 21:27, Auger Eric wrote:
>>>> Hi David,
>>>> On 07/03/2018 08:25 PM, David Hildenbrand wrote:
>>>>> On 03.07.2018 09:19, Eric Auger wrote:
>>>>>> We define a new hotpluggable RAM region (aka. device memory).
>>>>>> Its base is 2TB GPA. This obviously requires 42b IPA support
>>>>>> in KVM/ARM, FW and guest kernel. At the moment the device
>>>>>> memory region is max 2TB.
>>>>>
>>>>> Maybe a stupid question, but why exactly does it have to start at 2TB
>>>>> (and not e.g. at 1TB)?
>>>> not a stupid question. See tentative answer below.
>>>>>
>>>>>>
>>>>>> This is largely inspired by device memory initialization in
>>>>>> pc machine code.
>>>>>>
>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>>> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
>>>>>> ---
>>>>>>  hw/arm/virt.c         | 104 ++++++++++++++++++++++++++++++++++++--------------
>>>>>>  include/hw/arm/arm.h  |   2 +
>>>>>>  include/hw/arm/virt.h |   1 +
>>>>>>  3 files changed, 79 insertions(+), 28 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>>>>>> index 5a4d0bf..6fefb78 100644
>>>>>> --- a/hw/arm/virt.c
>>>>>> +++ b/hw/arm/virt.c
>>>>>> @@ -59,6 +59,7 @@
>>>>>>  #include "qapi/visitor.h"
>>>>>>  #include "standard-headers/linux/input.h"
>>>>>>  #include "hw/arm/smmuv3.h"
>>>>>> +#include "hw/acpi/acpi.h"
>>>>>>  
>>>>>>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
>>>>>>      static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
>>>>>> @@ -94,34 +95,25 @@
>>>>>>  
>>>>>>  #define PLATFORM_BUS_NUM_IRQS 64
>>>>>>  
>>>>>> -/* RAM limit in GB. Since VIRT_MEM starts at the 1GB mark, this means
>>>>>> - * RAM can go up to the 256GB mark, leaving 256GB of the physical
>>>>>> - * address space unallocated and free for future use between 256G and 512G.
>>>>>> - * If we need to provide more RAM to VMs in the future then we need to:
>>>>>> - *  * allocate a second bank of RAM starting at 2TB and working up
>>>> I acknowledge this comment was the main justification. Now if you look at
>>>>
>>>> Principles of ARM Memory Maps
>>>> http://infocenter.arm.com/help/topic/com.arm.doc.den0001c/DEN0001C_principles_of_arm_memory_maps.pdf
>>>> chapter 2.3 you will find that when adding PA bits, you always leave
>>>> space for reserved space and mapped IO.
>>>
>>> Thanks for the pointer!
>>>
>>> So ... we can fit
>>>
>>> a) 2GB at 2GB
>>> b) 32GB at 32GB
>>> c) 512GB at 512GB
>>> d) 8TB at 8TB
>>> e) 128TB at 128TB
>>>
>>> (this is a nice rule of thumb if I understand it correctly :) )
>>>
>>> We should strive for device memory (maxram_size - ram_size) to fit
>>> exactly into one of these slots (otherwise things get nasty).
>>>
>>> Depending on the ram_size, we might have simpler setups and can support
>>> more configurations, no?
>>>
>>> E.g. ram_size <= 34GB, device_memory <= 512GB
>>> -> move ram into a) and b)
>>> -> move device memory into c)
>>
>> The issue is machvirt doesn't comply with that document.
>> At the moment we have
>> 0 -> 1GB MMIO
>> 1GB -> 256GB RAM
>> 256GB -> 512GB is theoretically reserved for IO but most is free.
>> 512GB -> 1T is reserved for ECAM MMIO range. This is the top of our
>> existing 40b GPA space.
>>
>> We don't want to change this address map due to legacy reasons.
>>
> 
> Thanks, good to know!
> 
>> Another question! do you know if it would be possible to have
>> device_memory region split into several discontinuous segments?
> 
> It can be implemented for sure, but I would try to avoid that, as it
> makes certain configurations impossible (and very end user unfriendly).
> 
> E.g. (numbers completely made up, but it should show what I mean)
> 
> -m 20G,maxmem=120G:
> -> Try to add a DIMM with 100G -> error.
> -> But we can add e.g. two DIMMs with 40G and 60G.
> 
> This exposes internal details to the end user. And the end user has no
> idea what is going on.
> 
> So I think we should try our best to keep that area consecutive.

Actually I didn't sufficiently detail the context. I would like
1) one segment to be exposed to the end user through the slot|maxmem stuff
(what this series targets) and
2) another segment used to instantiate PC-DIMMs for internal needs, as a
replacement for part of the 1GB -> 256GB static RAM. This was the purpose
of Shameer's original series

[1] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
http://patchwork.ozlabs.org/cover/914694/
This approach is not yet validated though.

The rationale is that sometimes you must have "holes" in RAM, as some GPAs
match reserved IOVAs for assigned devices.
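
For context, the host advertises those reserved windows per IOMMU group in
sysfs. A minimal reader sketch (the group number and the example line are
made up; on our hosts the "msi" window is the one reserved by the SMMU
driver):

FILE *f = fopen("/sys/kernel/iommu_groups/0/reserved_regions", "r");
uint64_t start, end;
char type[32];

if (f) {
    /* one line per window: "<start> <end> <type>", e.g.
     * "0x8000000 0x80fffff msi" */
    while (fscanf(f, "%" SCNx64 " %" SCNx64 " %31s",
                  &start, &end, type) == 3) {
        printf("reserved IOVA [0x%" PRIx64 ", 0x%" PRIx64 "] (%s)\n",
               start, end, type);
    }
    fclose(f);
}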

Thanks

Eric

> 
>>
>> Thanks
>>
>> Eric
> 
> 


* Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory
  2018-07-05 12:00             ` Auger Eric
@ 2018-07-05 12:09               ` David Hildenbrand
  2018-07-05 12:17                 ` Auger Eric
  0 siblings, 1 reply; 62+ messages in thread
From: David Hildenbrand @ 2018-07-05 12:09 UTC (permalink / raw)
  To: Auger Eric, eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo
  Cc: wei, drjones, david, dgilbert, agraf

On 05.07.2018 14:00, Auger Eric wrote:
> Hi David,
> 
> On 07/05/2018 01:54 PM, David Hildenbrand wrote:
>> On 05.07.2018 13:42, Auger Eric wrote:
>>> Hi David,
>>>
>>> On 07/04/2018 02:05 PM, David Hildenbrand wrote:
>>>> On 03.07.2018 21:27, Auger Eric wrote:
>>>>> Hi David,
>>>>> On 07/03/2018 08:25 PM, David Hildenbrand wrote:
>>>>>> On 03.07.2018 09:19, Eric Auger wrote:
>>>>>>> We define a new hotpluggable RAM region (aka. device memory).
>>>>>>> Its base is 2TB GPA. This obviously requires 42b IPA support
>>>>>>> in KVM/ARM, FW and guest kernel. At the moment the device
>>>>>>> memory region is max 2TB.
>>>>>>
>>>>>> Maybe a stupid question, but why exactly does it have to start at 2TB
>>>>>> (and not e.g. at 1TB)?
>>>>> not a stupid question. See tentative answer below.
>>>>>>
>>>>>>>
>>>>>>> This is largely inspired by device memory initialization in
>>>>>>> pc machine code.
>>>>>>>
>>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>>>> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
>>>>>>> ---
>>>>>>>  hw/arm/virt.c         | 104 ++++++++++++++++++++++++++++++++++++--------------
>>>>>>>  include/hw/arm/arm.h  |   2 +
>>>>>>>  include/hw/arm/virt.h |   1 +
>>>>>>>  3 files changed, 79 insertions(+), 28 deletions(-)
>>>>>>>
>>>>>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>>>>>>> index 5a4d0bf..6fefb78 100644
>>>>>>> --- a/hw/arm/virt.c
>>>>>>> +++ b/hw/arm/virt.c
>>>>>>> @@ -59,6 +59,7 @@
>>>>>>>  #include "qapi/visitor.h"
>>>>>>>  #include "standard-headers/linux/input.h"
>>>>>>>  #include "hw/arm/smmuv3.h"
>>>>>>> +#include "hw/acpi/acpi.h"
>>>>>>>  
>>>>>>>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
>>>>>>>      static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
>>>>>>> @@ -94,34 +95,25 @@
>>>>>>>  
>>>>>>>  #define PLATFORM_BUS_NUM_IRQS 64
>>>>>>>  
>>>>>>> -/* RAM limit in GB. Since VIRT_MEM starts at the 1GB mark, this means
>>>>>>> - * RAM can go up to the 256GB mark, leaving 256GB of the physical
>>>>>>> - * address space unallocated and free for future use between 256G and 512G.
>>>>>>> - * If we need to provide more RAM to VMs in the future then we need to:
>>>>>>> - *  * allocate a second bank of RAM starting at 2TB and working up
>>>>> I acknowledge this comment was the main justification. Now if you look at
>>>>>
>>>>> Principles of ARM Memory Maps
>>>>> http://infocenter.arm.com/help/topic/com.arm.doc.den0001c/DEN0001C_principles_of_arm_memory_maps.pdf
>>>>> chapter 2.3 you will find that when adding PA bits, you always leave
>>>>> space for reserved space and mapped IO.
>>>>
>>>> Thanks for the pointer!
>>>>
>>>> So ... we can fit
>>>>
>>>> a) 2GB at 2GB
>>>> b) 32GB at 32GB
>>>> c) 512GB at 512GB
>>>> d) 8TB at 8TB
>>>> e) 128TB at 128TB
>>>>
>>>> (this is a nice rule of thumb if I understand it correctly :) )
>>>>
>>>> We should strive for device memory (maxram_size - ram_size) to fit
>>>> exactly into one of these slots (otherwise things get nasty).
>>>>
>>>> Depending on the ram_size, we might have simpler setups and can support
>>>> more configurations, no?
>>>>
>>>> E.g. ram_size <= 34GB, device_memory <= 512GB
>>>> -> move ram into a) and b)
>>>> -> move device memory into c)
>>>
>>> The issue is machvirt doesn't comply with that document.
>>> At the moment we have
>>> 0 -> 1GB MMIO
>>> 1GB -> 256GB RAM
>>> 256GB -> 512GB is theoretically reserved for IO but most is free.
>>> 512GB -> 1T is reserved for ECAM MMIO range. This is the top of our
>>> existing 40b GPA space.
>>>
>>> We don't want to change this address map due to legacy reasons.
>>>
>>
>> Thanks, good to know!
>>
>>> Another question! do you know if it would be possible to have
>>> device_memory region split into several discontinuous segments?
>>
>> It can be implemented for sure, but I would try to avoid that, as it
>> makes certain configurations impossible (and very end user unfriendly).
>>
>> E.g. (numbers completely made up, but it should show what I mean)
>>
>> -m 20G,maxmem=120G:
>> -> Try to add a DIMM with 100G -> error.
>> -> But we can add e.g. two DIMMs with 40G and 60G.
>>
>> This exposes internal details to the end user. And the end user has no
>> idea what is going on.
>>
>> So I think we should try our best to keep that area consecutive.
> 
> Actually I didn't sufficiently detail the context. I would like
> 1) 1 segment to be exposed to the end-user through slot|maxmem stuff
> (what this series targets) and
> 2) another segment used to instantiate PC-DIMM for internal needs as
> replacement of part of the 1GB -> 256GB static RAM. This was the purpose
> of Shameer's original series

I am not sure if PC-DIMMs are exactly what you want for internal purposes.

> 
> [1] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
> http://patchwork.ozlabs.org/cover/914694/
> This approach is not yet validated though.
> 
> The rationale is sometimes you must have "holes" in RAM as some GPAs
> match reserved IOVAs for assigned devices.

So if I understand it correctly, all you want is some memory region that
a) contains only initially defined memory
b) can have some holes in it

This is exactly what x86 already does (pc_memory_init): Simply construct
your own memory region leaving holes in it.


memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
                         0, pcms->below_4g_mem_size);
memory_region_add_subregion(system_memory, 0, ram_below_4g);
...
if (pcms->above_4g_mem_size > 0)
    memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
    ...
    memory_region_add_subregion(system_memory, 0x100000000ULL,
    ...

They "indicate" these different GPA areas using the e820 map to the guest.

Would that also work for you?

> 
> Thanks
> 
> Eric
> 
>>
>>>
>>> Thanks
>>>
>>> Eric
>>
>>


-- 

Thanks,

David / dhildenb


* Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory
  2018-07-05 12:09               ` David Hildenbrand
@ 2018-07-05 12:17                 ` Auger Eric
  2018-07-05 13:19                   ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 62+ messages in thread
From: Auger Eric @ 2018-07-05 12:17 UTC (permalink / raw)
  To: David Hildenbrand, eric.auger.pro, qemu-devel, qemu-arm,
	peter.maydell, shameerali.kolothum.thodi, imammedo
  Cc: wei, drjones, david, dgilbert, agraf

Hi David,

On 07/05/2018 02:09 PM, David Hildenbrand wrote:
> On 05.07.2018 14:00, Auger Eric wrote:
>> Hi David,
>>
>> On 07/05/2018 01:54 PM, David Hildenbrand wrote:
>>> On 05.07.2018 13:42, Auger Eric wrote:
>>>> Hi David,
>>>>
>>>> On 07/04/2018 02:05 PM, David Hildenbrand wrote:
>>>>> On 03.07.2018 21:27, Auger Eric wrote:
>>>>>> Hi David,
>>>>>> On 07/03/2018 08:25 PM, David Hildenbrand wrote:
>>>>>>> On 03.07.2018 09:19, Eric Auger wrote:
>>>>>>>> We define a new hotpluggable RAM region (aka. device memory).
>>>>>>>> Its base is 2TB GPA. This obviously requires 42b IPA support
>>>>>>>> in KVM/ARM, FW and guest kernel. At the moment the device
>>>>>>>> memory region is max 2TB.
>>>>>>>
>>>>>>> Maybe a stupid question, but why exactly does it have to start at 2TB
>>>>>>> (and not e.g. at 1TB)?
>>>>>> not a stupid question. See tentative answer below.
>>>>>>>
>>>>>>>>
>>>>>>>> This is largely inspired by device memory initialization in
>>>>>>>> pc machine code.
>>>>>>>>
>>>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>>>>> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
>>>>>>>> ---
>>>>>>>>  hw/arm/virt.c         | 104 ++++++++++++++++++++++++++++++++++++--------------
>>>>>>>>  include/hw/arm/arm.h  |   2 +
>>>>>>>>  include/hw/arm/virt.h |   1 +
>>>>>>>>  3 files changed, 79 insertions(+), 28 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>>>>>>>> index 5a4d0bf..6fefb78 100644
>>>>>>>> --- a/hw/arm/virt.c
>>>>>>>> +++ b/hw/arm/virt.c
>>>>>>>> @@ -59,6 +59,7 @@
>>>>>>>>  #include "qapi/visitor.h"
>>>>>>>>  #include "standard-headers/linux/input.h"
>>>>>>>>  #include "hw/arm/smmuv3.h"
>>>>>>>> +#include "hw/acpi/acpi.h"
>>>>>>>>  
>>>>>>>>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
>>>>>>>>      static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
>>>>>>>> @@ -94,34 +95,25 @@
>>>>>>>>  
>>>>>>>>  #define PLATFORM_BUS_NUM_IRQS 64
>>>>>>>>  
>>>>>>>> -/* RAM limit in GB. Since VIRT_MEM starts at the 1GB mark, this means
>>>>>>>> - * RAM can go up to the 256GB mark, leaving 256GB of the physical
>>>>>>>> - * address space unallocated and free for future use between 256G and 512G.
>>>>>>>> - * If we need to provide more RAM to VMs in the future then we need to:
>>>>>>>> - *  * allocate a second bank of RAM starting at 2TB and working up
>>>>>> I acknowledge this comment was the main justification. Now if you look at
>>>>>>
>>>>>> Principles of ARM Memory Maps
>>>>>> http://infocenter.arm.com/help/topic/com.arm.doc.den0001c/DEN0001C_principles_of_arm_memory_maps.pdf
>>>>>> chapter 2.3 you will find that when adding PA bits, you always leave
>>>>>> space for reserved space and mapped IO.
>>>>>
>>>>> Thanks for the pointer!
>>>>>
>>>>> So ... we can fit
>>>>>
>>>>> a) 2GB at 2GB
>>>>> b) 32GB at 32GB
>>>>> c) 512GB at 512GB
>>>>> d) 8TB at 8TB
>>>>> e) 128TB at 128TB
>>>>>
>>>>> (this is a nice rule of thumb if I understand it correctly :) )
>>>>>
>>>>> We should strive for device memory (maxram_size - ram_size) to fit
>>>>> exactly into one of these slots (otherwise things get nasty).
>>>>>
>>>>> Depending on the ram_size, we might have simpler setups and can support
>>>>> more configurations, no?
>>>>>
>>>>> E.g. ram_size <= 34GB, device_memory <= 512GB
>>>>> -> move ram into a) and b)
>>>>> -> move device memory into c)
>>>>
>>>> The issue is machvirt doesn't comply with that document.
>>>> At the moment we have
>>>> 0 -> 1GB MMIO
>>>> 1GB -> 256GB RAM
>>>> 256GB -> 512GB is theoretically reserved for IO but most is free.
>>>> 512GB -> 1T is reserved for ECAM MMIO range. This is the top of our
>>>> existing 40b GPA space.
>>>>
>>>> We don't want to change this address map due to legacy reasons.
>>>>
>>>
>>> Thanks, good to know!
>>>
>>>> Another question! do you know if it would be possible to have
>>>> device_memory region split into several discontinuous segments?
>>>
>>> It can be implemented for sure, but I would try to avoid that, as it
>>> makes certain configurations impossible (and very end user unfriendly).
>>>
>>> E.g. (numbers completely made up, but it should show what I mean)
>>>
>>> -m 20G,maxmem=120G:
>>> -> Try to add a DIMM with 100G -> error.
>>> -> But we can add e.g. two DIMMs with 40G and 60G.
>>>
>>> This exposes internal details to the end user. And the end user has no
>>> idea what is going on.
>>>
>>> So I think we should try our best to keep that area consecutive.
>>
>> Actually I didn't sufficiently detail the context. I would like
>> 1) 1 segment to be exposed to the end-user through slot|maxmem stuff
>> (what this series targets) and
>> 2) another segment used to instantiate PC-DIMM for internal needs as
>> replacement of part of the 1GB -> 256GB static RAM. This was the purpose
>> of Shameer's original series
> 
> I am not sure if PC-DIMMs are exactly what you want for internal purposes.
> 
>>
>> [1] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
>> http://patchwork.ozlabs.org/cover/914694/
>> This approach is not yet validated though.
>>
>> The rationale is sometimes you must have "holes" in RAM as some GPAs
>> match reserved IOVAs for assigned devices.
> 
> So if I understand it correctly, all you want is some memory region that
> a) contains only initially defined memory
> b) can have some holes in it
> 
> This is exactly what x86 already does (pc_memory_init): Simply construct
> your own memory region leaving holes in it.
> 
> 
> memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
>                          0, pcms->below_4g_mem_size);
> memory_region_add_subregion(system_memory, 0, ram_below_4g);
> ...
> if (pcms->above_4g_mem_size > 0)
>     memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
>     ...
>     memory_region_add_subregion(system_memory, 0x100000000ULL,
>     ...
> 
> They "indicate" these different GPA areas using the e820 map to the guest.
> 
> Would that also work for you?

I would tentatively say yes. In fact I am not sure that, if we were
to actually put holes in the 1GB-256GB RAM segment, PC-DIMM would be the
natural choice. Also I think the reserved IOVA issue impacts the
device_memory region area as well. I am skeptical that we can put holes in
the static RAM and device_memory regions like that.

Thanks!

Eric
> 
>>
>> Thanks
>>
>> Eric
>>
>>>
>>>>
>>>> Thanks
>>>>
>>>> Eric
>>>
>>>
> 
> 


* Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory
  2018-07-05 12:17                 ` Auger Eric
@ 2018-07-05 13:19                   ` Shameerali Kolothum Thodi
  2018-07-05 14:27                     ` Auger Eric
  0 siblings, 1 reply; 62+ messages in thread
From: Shameerali Kolothum Thodi @ 2018-07-05 13:19 UTC (permalink / raw)
  To: Auger Eric, David Hildenbrand, eric.auger.pro, qemu-devel,
	qemu-arm, peter.maydell, imammedo
  Cc: wei, drjones, david, dgilbert, agraf


> -----Original Message-----
> From: Auger Eric [mailto:eric.auger@redhat.com]
> Sent: 05 July 2018 13:18
> To: David Hildenbrand <david@redhat.com>; eric.auger.pro@gmail.com;
> qemu-devel@nongnu.org; qemu-arm@nongnu.org; peter.maydell@linaro.org;
> Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> imammedo@redhat.com
> Cc: wei@redhat.com; drjones@redhat.com; david@gibson.dropbear.id.au;
> dgilbert@redhat.com; agraf@suse.de
> Subject: Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate
> device_memory
> 
> Hi David,
> 
> On 07/05/2018 02:09 PM, David Hildenbrand wrote:
> > On 05.07.2018 14:00, Auger Eric wrote:
> >> Hi David,
> >>
> >> On 07/05/2018 01:54 PM, David Hildenbrand wrote:
> >>> On 05.07.2018 13:42, Auger Eric wrote:
> >>>> Hi David,
> >>>>
> >>>> On 07/04/2018 02:05 PM, David Hildenbrand wrote:
> >>>>> On 03.07.2018 21:27, Auger Eric wrote:
> >>>>>> Hi David,
> >>>>>> On 07/03/2018 08:25 PM, David Hildenbrand wrote:
> >>>>>>> On 03.07.2018 09:19, Eric Auger wrote:
> >>>>>>>> We define a new hotpluggable RAM region (aka. device memory).
> >>>>>>>> Its base is 2TB GPA. This obviously requires 42b IPA support
> >>>>>>>> in KVM/ARM, FW and guest kernel. At the moment the device
> >>>>>>>> memory region is max 2TB.
> >>>>>>>
> >>>>>>> Maybe a stupid question, but why exactly does it have to start at 2TB
> >>>>>>> (and not e.g. at 1TB)?
> >>>>>> not a stupid question. See tentative answer below.
> >>>>>>>
> >>>>>>>>
> >>>>>>>> This is largely inspired by device memory initialization in
> >>>>>>>> pc machine code.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >>>>>>>> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
> >>>>>>>> ---
> >>>>>>>>  hw/arm/virt.c         | 104
> ++++++++++++++++++++++++++++++++++++--------------
> >>>>>>>>  include/hw/arm/arm.h  |   2 +
> >>>>>>>>  include/hw/arm/virt.h |   1 +
> >>>>>>>>  3 files changed, 79 insertions(+), 28 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> >>>>>>>> index 5a4d0bf..6fefb78 100644
> >>>>>>>> --- a/hw/arm/virt.c
> >>>>>>>> +++ b/hw/arm/virt.c
> >>>>>>>> @@ -59,6 +59,7 @@
> >>>>>>>>  #include "qapi/visitor.h"
> >>>>>>>>  #include "standard-headers/linux/input.h"
> >>>>>>>>  #include "hw/arm/smmuv3.h"
> >>>>>>>> +#include "hw/acpi/acpi.h"
> >>>>>>>>
> >>>>>>>>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
> >>>>>>>>      static void virt_##major##_##minor##_class_init(ObjectClass *oc,
> \
> >>>>>>>> @@ -94,34 +95,25 @@
> >>>>>>>>
> >>>>>>>>  #define PLATFORM_BUS_NUM_IRQS 64
> >>>>>>>>
> >>>>>>>> -/* RAM limit in GB. Since VIRT_MEM starts at the 1GB mark, this
> means
> >>>>>>>> - * RAM can go up to the 256GB mark, leaving 256GB of the physical
> >>>>>>>> - * address space unallocated and free for future use between 256G
> and 512G.
> >>>>>>>> - * If we need to provide more RAM to VMs in the future then we
> need to:
> >>>>>>>> - *  * allocate a second bank of RAM starting at 2TB and working up
> >>>>>> I acknowledge this comment was the main justification. Now if you look
> at
> >>>>>>
> >>>>>> Principles of ARM Memory Maps
> >>>>>>
> http://infocenter.arm.com/help/topic/com.arm.doc.den0001c/DEN0001C_princ
> iples_of_arm_memory_maps.pdf
> >>>>>> chapter 2.3 you will find that when adding PA bits, you always leave
> >>>>>> space for reserved space and mapped IO.
> >>>>>
> >>>>> Thanks for the pointer!
> >>>>>
> >>>>> So ... we can fit
> >>>>>
> >>>>> a) 2GB at 2GB
> >>>>> b) 32GB at 32GB
> >>>>> c) 512GB at 512GB
> >>>>> d) 8TB at 8TB
> >>>>> e) 128TB at 128TB
> >>>>>
> >>>>> (this is a nice rule of thumb if I understand it correctly :) )
> >>>>>
> >>>>> We should strive for device memory (maxram_size - ram_size) to fit
> >>>>> exactly into one of these slots (otherwise things get nasty).
> >>>>>
> >>>>> Depending on the ram_size, we might have simpler setups and can
> support
> >>>>> more configurations, no?
> >>>>>
> >>>>> E.g. ram_size <= 34GB, device_memory <= 512GB
> >>>>> -> move ram into a) and b)
> >>>>> -> move device memory into c)
> >>>>
> >>>> The issue is machvirt doesn't comply with that document.
> >>>> At the moment we have
> >>>> 0 -> 1GB MMIO
> >>>> 1GB -> 256GB RAM
> >>>> 256GB -> 512GB is theoretically reserved for IO but most is free.
> >>>> 512GB -> 1T is reserved for ECAM MMIO range. This is the top of our
> >>>> existing 40b GPA space.
> >>>>
> >>>> We don't want to change this address map due to legacy reasons.
> >>>>
> >>>
> >>> Thanks, good to know!
> >>>
> >>>> Another question! do you know if it would be possible to have
> >>>> device_memory region split into several discontinuous segments?
> >>>
> >>> It can be implemented for sure, but I would try to avoid that, as it
> >>> makes certain configurations impossible (and very end user unfriendly).
> >>>
> >>> E.g. (numbers completely made up, but it should show what I mean)
> >>>
> >>> -m 20G,maxmem=120G:
> >>> -> Try to add a DIMM with 100G -> error.
> >>> -> But we can add e.g. two DIMMs with 40G and 60G.
> >>>
> >>> This exposes internal details to the end user. And the end user has no
> >>> idea what is going on.
> >>>
> >>> So I think we should try our best to keep that area consecutive.
> >>
> >> Actually I didn't sufficiently detail the context. I would like
> >> 1) 1 segment to be exposed to the end-user through slot|maxmem stuff
> >> (what this series targets) and
> >> 2) another segment used to instantiate PC-DIMM for internal needs as
> >> replacement of part of the 1GB -> 256GB static RAM. This was the purpose
> >> of Shameer's original series
> >
> > I am not sure if PC-DIMMs are exactly what you want for internal purposes.
> >
> >>
> >> [1] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
> >> http://patchwork.ozlabs.org/cover/914694/
> >> This approach is not yet validated though.
> >>
> >> The rationale is sometimes you must have "holes" in RAM as some GPAs
> >> match reserved IOVAs for assigned devices.
> >
> > So if I understand it correctly, all you want is some memory region that
> > a) contains only initially defined memory
> > b) can have some holes in it
> >
> > This is exactly what x86 already does (pc_memory_init): Simply construct
> > your own memory region leaving holes in it.
> >
> >
> > memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
> >                          0, pcms->below_4g_mem_size);
> > memory_region_add_subregion(system_memory, 0, ram_below_4g);
> > ...
> > if (pcms->above_4g_mem_size > 0)
> >     memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
> >     ...
> >     memory_region_add_subregion(system_memory, 0x100000000ULL,
> >     ...
> >
> > They "indicate" these different GPA areas using the e820 map to the guest.
> >
> > Would that also work for you?
> 
> I would tentatively say yes. Effectively I am not sure that if we were
> to actually put holes in the 1G-256GB RAM segment, PC-DIMM would be the
> natural choice. Also the reserved IOVA issue impacts the device_memory
> region area I think. I am skeptical about the fact we can put holes in
> static RAM and device_memory regions like that.

The first approach [1] we had to address the holes in memory was the
memory-alias method mentioned above. Based on Drew's review, the
pc-dimm way of handling it was introduced. I think the main argument was
that it would be useful when we eventually support hotplug. But since that
is added anyway as part of this series, I am not sure we get any other
benefit from modeling it as pc-dimm. Maybe I am missing something here.

Thanks,
Shameer

[1]. https://lists.gnu.org/archive/html/qemu-arm/2018-04/msg00243.html


> Thanks!
> 
> Eric
> >
> >>
> >> Thanks
> >>
> >> Eric
> >>
> >>>
> >>>>
> >>>> Thanks
> >>>>
> >>>> Eric
> >>>
> >>>
> >
> >


* Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory
  2018-07-05 13:19                   ` Shameerali Kolothum Thodi
@ 2018-07-05 14:27                     ` Auger Eric
  2018-07-11 13:17                       ` Igor Mammedov
  0 siblings, 1 reply; 62+ messages in thread
From: Auger Eric @ 2018-07-05 14:27 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, David Hildenbrand, eric.auger.pro,
	qemu-devel, qemu-arm, peter.maydell, imammedo
  Cc: wei, agraf, drjones, dgilbert, david

Hi Shameer,

On 07/05/2018 03:19 PM, Shameerali Kolothum Thodi wrote:
> 
>> -----Original Message-----
>> From: Auger Eric [mailto:eric.auger@redhat.com]
>> Sent: 05 July 2018 13:18
>> To: David Hildenbrand <david@redhat.com>; eric.auger.pro@gmail.com;
>> qemu-devel@nongnu.org; qemu-arm@nongnu.org; peter.maydell@linaro.org;
>> Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
>> imammedo@redhat.com
>> Cc: wei@redhat.com; drjones@redhat.com; david@gibson.dropbear.id.au;
>> dgilbert@redhat.com; agraf@suse.de
>> Subject: Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate
>> device_memory
>>
>> Hi David,
>>
>> On 07/05/2018 02:09 PM, David Hildenbrand wrote:
>>> On 05.07.2018 14:00, Auger Eric wrote:
>>>> Hi David,
>>>>
>>>> On 07/05/2018 01:54 PM, David Hildenbrand wrote:
>>>>> On 05.07.2018 13:42, Auger Eric wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> On 07/04/2018 02:05 PM, David Hildenbrand wrote:
>>>>>>> On 03.07.2018 21:27, Auger Eric wrote:
>>>>>>>> Hi David,
>>>>>>>> On 07/03/2018 08:25 PM, David Hildenbrand wrote:
>>>>>>>>> On 03.07.2018 09:19, Eric Auger wrote:
>>>>>>>>>> We define a new hotpluggable RAM region (aka. device memory).
>>>>>>>>>> Its base is 2TB GPA. This obviously requires 42b IPA support
>>>>>>>>>> in KVM/ARM, FW and guest kernel. At the moment the device
>>>>>>>>>> memory region is max 2TB.
>>>>>>>>>
>>>>>>>>> Maybe a stupid question, but why exactly does it have to start at 2TB
>>>>>>>>> (and not e.g. at 1TB)?
>>>>>>>> not a stupid question. See tentative answer below.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This is largely inspired by device memory initialization in
>>>>>>>>>> pc machine code.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>>>>>>> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
>>>>>>>>>> ---
>>>>>>>>>>  hw/arm/virt.c         | 104
>> ++++++++++++++++++++++++++++++++++++--------------
>>>>>>>>>>  include/hw/arm/arm.h  |   2 +
>>>>>>>>>>  include/hw/arm/virt.h |   1 +
>>>>>>>>>>  3 files changed, 79 insertions(+), 28 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>>>>>>>>>> index 5a4d0bf..6fefb78 100644
>>>>>>>>>> --- a/hw/arm/virt.c
>>>>>>>>>> +++ b/hw/arm/virt.c
>>>>>>>>>> @@ -59,6 +59,7 @@
>>>>>>>>>>  #include "qapi/visitor.h"
>>>>>>>>>>  #include "standard-headers/linux/input.h"
>>>>>>>>>>  #include "hw/arm/smmuv3.h"
>>>>>>>>>> +#include "hw/acpi/acpi.h"
>>>>>>>>>>
>>>>>>>>>>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
>>>>>>>>>>      static void virt_##major##_##minor##_class_init(ObjectClass *oc,
>> \
>>>>>>>>>> @@ -94,34 +95,25 @@
>>>>>>>>>>
>>>>>>>>>>  #define PLATFORM_BUS_NUM_IRQS 64
>>>>>>>>>>
>>>>>>>>>> -/* RAM limit in GB. Since VIRT_MEM starts at the 1GB mark, this
>> means
>>>>>>>>>> - * RAM can go up to the 256GB mark, leaving 256GB of the physical
>>>>>>>>>> - * address space unallocated and free for future use between 256G
>> and 512G.
>>>>>>>>>> - * If we need to provide more RAM to VMs in the future then we
>> need to:
>>>>>>>>>> - *  * allocate a second bank of RAM starting at 2TB and working up
>>>>>>>> I acknowledge this comment was the main justification. Now if you look
>> at
>>>>>>>>
>>>>>>>> Principles of ARM Memory Maps
>>>>>>>>
>> http://infocenter.arm.com/help/topic/com.arm.doc.den0001c/DEN0001C_princ
>> iples_of_arm_memory_maps.pdf
>>>>>>>> chapter 2.3 you will find that when adding PA bits, you always leave
>>>>>>>> space for reserved space and mapped IO.
>>>>>>>
>>>>>>> Thanks for the pointer!
>>>>>>>
>>>>>>> So ... we can fit
>>>>>>>
>>>>>>> a) 2GB at 2GB
>>>>>>> b) 32GB at 32GB
>>>>>>> c) 512GB at 512GB
>>>>>>> d) 8TB at 8TB
>>>>>>> e) 128TB at 128TB
>>>>>>>
>>>>>>> (this is a nice rule of thumb if I understand it correctly :) )
>>>>>>>
>>>>>>> We should strive for device memory (maxram_size - ram_size) to fit
>>>>>>> exactly into one of these slots (otherwise things get nasty).
>>>>>>>
>>>>>>> Depending on the ram_size, we might have simpler setups and can
>> support
>>>>>>> more configurations, no?
>>>>>>>
>>>>>>> E.g. ram_size <= 34GB, device_memory <= 512GB
>>>>>>> -> move ram into a) and b)
>>>>>>> -> move device memory into c)
>>>>>>
>>>>>> The issue is machvirt doesn't comply with that document.
>>>>>> At the moment we have
>>>>>> 0 -> 1GB MMIO
>>>>>> 1GB -> 256GB RAM
>>>>>> 256GB -> 512GB is theoretically reserved for IO but most is free.
>>>>>> 512GB -> 1T is reserved for ECAM MMIO range. This is the top of our
>>>>>> existing 40b GPA space.
>>>>>>
>>>>>> We don't want to change this address map due to legacy reasons.
>>>>>>
>>>>>
>>>>> Thanks, good to know!
>>>>>
>>>>>> Another question! do you know if it would be possible to have
>>>>>> device_memory region split into several discontinuous segments?
>>>>>
>>>>> It can be implemented for sure, but I would try to avoid that, as it
>>>>> makes certain configurations impossible (and very end user unfriendly).
>>>>>
>>>>> E.g. (numbers completely made up, but it should show what I mean)
>>>>>
>>>>> -m 20G,maxmem=120G:
>>>>> -> Try to add a DIMM with 100G -> error.
>>>>> -> But we can add e.g. two DIMMs with 40G and 60G.
>>>>>
>>>>> This exposes internal details to the end user. And the end user has no
>>>>> idea what is going on.
>>>>>
>>>>> So I think we should try our best to keep that area consecutive.
>>>>
>>>> Actually I didn't sufficiently detail the context. I would like
>>>> 1) 1 segment to be exposed to the end-user through slot|maxmem stuff
>>>> (what this series targets) and
>>>> 2) another segment used to instantiate PC-DIMM for internal needs as
>>>> replacement of part of the 1GB -> 256GB static RAM. This was the purpose
>>>> of Shameer's original series
>>>
>>> I am not sure if PC-DIMMs are exactly what you want for internal purposes.
>>>
>>>>
>>>> [1] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
>>>> http://patchwork.ozlabs.org/cover/914694/
>>>> This approach is not yet validated though.
>>>>
>>>> The rationale is sometimes you must have "holes" in RAM as some GPAs
>>>> match reserved IOVAs for assigned devices.
>>>
>>> So if I understand it correctly, all you want is some memory region that
>>> a) contains only initially defined memory
>>> b) can have some holes in it
>>>
>>> This is exactly what x86 already does (pc_memory_init): Simply construct
>>> your own memory region leaving holes in it.
>>>
>>>
>>> memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
>>>                          0, pcms->below_4g_mem_size);
>>> memory_region_add_subregion(system_memory, 0, ram_below_4g);
>>> ...
>>> if (pcms->above_4g_mem_size > 0)
>>>     memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
>>>     ...
>>>     memory_region_add_subregion(system_memory, 0x100000000ULL,
>>>     ...
>>>
>>> They "indicate" these different GPA areas using the e820 map to the guest.
>>>
>>> Would that also work for you?
>>
>> I would tentatively say yes. Effectively I am not sure that if we were
>> to actually put holes in the 1G-256GB RAM segment, PC-DIMM would be the
>> natural choice. Also the reserved IOVA issue impacts the device_memory
>> region area I think. I am skeptical about the fact we can put holes in
>> static RAM and device_memory regions like that.
> 
> The first approach[1] we had to address the holes in memory was using
> the memory alias way mentioned above.  And based on Drew's review, the
> pc-dimm way of handling was introduced. I think the main argument was that
> it will be useful when we eventually support hotplug.

That's my understanding too.

 But since that is added
> anyway as part of this series, I am not sure we have any other benefit in
> modeling it as pc-dimm. May be I am missing something here.

I tentatively agree with you. I was trying to understand whether the
device_memory region would fit the original needs too, but I think the
standard alias approach is better suited to hole creation.

Thanks

Eric
> 
> Thanks,
> Shameer
> 
> [1]. https://lists.gnu.org/archive/html/qemu-arm/2018-04/msg00243.html
> 
> 
>> Thanks!
>>
>> Eric
>>>
>>>>
>>>> Thanks
>>>>
>>>> Eric
>>>>
>>>>>
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Eric
>>>>>
>>>>>
>>>
>>>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory
  2018-07-05 14:27                     ` Auger Eric
@ 2018-07-11 13:17                       ` Igor Mammedov
  2018-07-12 14:22                         ` Auger Eric
  0 siblings, 1 reply; 62+ messages in thread
From: Igor Mammedov @ 2018-07-11 13:17 UTC (permalink / raw)
  To: Auger Eric
  Cc: Shameerali Kolothum Thodi, David Hildenbrand, eric.auger.pro,
	qemu-devel, qemu-arm, peter.maydell, wei, agraf, drjones,
	dgilbert, david

On Thu, 5 Jul 2018 16:27:05 +0200
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Shameer,
> 
> On 07/05/2018 03:19 PM, Shameerali Kolothum Thodi wrote:
> >   
> >> -----Original Message-----
> >> From: Auger Eric [mailto:eric.auger@redhat.com]
> >> Sent: 05 July 2018 13:18
> >> To: David Hildenbrand <david@redhat.com>; eric.auger.pro@gmail.com;
> >> qemu-devel@nongnu.org; qemu-arm@nongnu.org; peter.maydell@linaro.org;
> >> Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> >> imammedo@redhat.com
> >> Cc: wei@redhat.com; drjones@redhat.com; david@gibson.dropbear.id.au;
> >> dgilbert@redhat.com; agraf@suse.de
> >> Subject: Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate
> >> device_memory
> >>
> >> Hi David,
> >>
> >> On 07/05/2018 02:09 PM, David Hildenbrand wrote:  
> >>> On 05.07.2018 14:00, Auger Eric wrote:  
> >>>> Hi David,
> >>>>
> >>>> On 07/05/2018 01:54 PM, David Hildenbrand wrote:  
> >>>>> On 05.07.2018 13:42, Auger Eric wrote:  
> >>>>>> Hi David,
> >>>>>>
> >>>>>> On 07/04/2018 02:05 PM, David Hildenbrand wrote:  
> >>>>>>> On 03.07.2018 21:27, Auger Eric wrote:  
> >>>>>>>> Hi David,
> >>>>>>>> On 07/03/2018 08:25 PM, David Hildenbrand wrote:  
> >>>>>>>>> On 03.07.2018 09:19, Eric Auger wrote:  
> >>>>>>>>>> We define a new hotpluggable RAM region (aka. device memory).
> >>>>>>>>>> Its base is 2TB GPA. This obviously requires 42b IPA support
> >>>>>>>>>> in KVM/ARM, FW and guest kernel. At the moment the device
> >>>>>>>>>> memory region is max 2TB.  
> >>>>>>>>>
> >>>>>>>>> Maybe a stupid question, but why exactly does it have to start at 2TB
> >>>>>>>>> (and not e.g. at 1TB)?  
> >>>>>>>> not a stupid question. See tentative answer below.  
> >>>>>>>>>  
> >>>>>>>>>>
> >>>>>>>>>> This is largely inspired of device memory initialization in
> >>>>>>>>>> pc machine code.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >>>>>>>>>> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
> >>>>>>>>>> ---
> >>>>>>>>>>  hw/arm/virt.c         | 104  
> >> ++++++++++++++++++++++++++++++++++++--------------  
> >>>>>>>>>>  include/hw/arm/arm.h  |   2 +
> >>>>>>>>>>  include/hw/arm/virt.h |   1 +
> >>>>>>>>>>  3 files changed, 79 insertions(+), 28 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> >>>>>>>>>> index 5a4d0bf..6fefb78 100644
> >>>>>>>>>> --- a/hw/arm/virt.c
> >>>>>>>>>> +++ b/hw/arm/virt.c
> >>>>>>>>>> @@ -59,6 +59,7 @@
> >>>>>>>>>>  #include "qapi/visitor.h"
> >>>>>>>>>>  #include "standard-headers/linux/input.h"
> >>>>>>>>>>  #include "hw/arm/smmuv3.h"
> >>>>>>>>>> +#include "hw/acpi/acpi.h"
> >>>>>>>>>>
> >>>>>>>>>>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
> >>>>>>>>>>      static void virt_##major##_##minor##_class_init(ObjectClass *oc,  
> >> \  
> >>>>>>>>>> @@ -94,34 +95,25 @@
> >>>>>>>>>>
> >>>>>>>>>>  #define PLATFORM_BUS_NUM_IRQS 64
> >>>>>>>>>>
> >>>>>>>>>> -/* RAM limit in GB. Since VIRT_MEM starts at the 1GB mark, this  
> >> means  
> >>>>>>>>>> - * RAM can go up to the 256GB mark, leaving 256GB of the physical
> >>>>>>>>>> - * address space unallocated and free for future use between 256G  
> >> and 512G.  
> >>>>>>>>>> - * If we need to provide more RAM to VMs in the future then we  
> >> need to:  
> >>>>>>>>>> - *  * allocate a second bank of RAM starting at 2TB and working up  
> >>>>>>>> I acknowledge this comment was the main justification. Now if you look  
> >> at  
> >>>>>>>>
> >>>>>>>> Principles of ARM Memory Maps
> >>>>>>>>  
> >> http://infocenter.arm.com/help/topic/com.arm.doc.den0001c/DEN0001C_princ
> >> iples_of_arm_memory_maps.pdf  
> >>>>>>>> chapter 2.3 you will find that when adding PA bits, you always leave
> >>>>>>>> space for reserved space and mapped IO.  
> >>>>>>>
> >>>>>>> Thanks for the pointer!
> >>>>>>>
> >>>>>>> So ... we can fit
> >>>>>>>
> >>>>>>> a) 2GB at 2GB
> >>>>>>> b) 32GB at 32GB
> >>>>>>> c) 512GB at 512GB
> >>>>>>> d) 8TB at 8TB
> >>>>>>> e) 128TB at 128TB
> >>>>>>>
> >>>>>>> (this is a nice rule of thumb if I understand it correctly :) )
> >>>>>>>
> >>>>>>> We should strive for device memory (maxram_size - ram_size) to fit
> >>>>>>> exactly into one of these slots (otherwise things get nasty).
> >>>>>>>
> >>>>>>> Depending on the ram_size, we might have simpler setups and can  
> >> support  
> >>>>>>> more configurations, no?
> >>>>>>>
> >>>>>>> E.g. ram_size <= 34GB, device_memory <= 512GB  
> >>>>>>> -> move ram into a) and b)
> >>>>>>> -> move device memory into c)  
> >>>>>>
> >>>>>> The issue is machvirt doesn't comply with that document.
> >>>>>> At the moment we have
> >>>>>> 0 -> 1GB MMIO
> >>>>>> 1GB -> 256GB RAM
> >>>>>> 256GB -> 512GB is theoretically reserved for IO but most is free.
> >>>>>> 512GB -> 1T is reserved for ECAM MMIO range. This is the top of our
> >>>>>> existing 40b GPA space.
> >>>>>>
> >>>>>> We don't want to change this address map due to legacy reasons.
> >>>>>>  
> >>>>>
> >>>>> Thanks, good to know!
> >>>>>  
> >>>>>> Another question! do you know if it would be possible to have
> >>>>>> device_memory region split into several discontinuous segments?  
> >>>>>
> >>>>> It can be implemented for sure, but I would try to avoid that, as it
> >>>>> makes certain configurations impossible (and very end user unfriendly).
> >>>>>
> >>>>> E.g. (numbers completely made up, but it should show what I mean)
> >>>>>
> >>>>> -m 20G,maxmem=120G:  
> >>>>> -> Try to add a DIMM with 100G -> error.
> >>>>> -> But we can add e.g. two DIMMs with 40G and 60G.  
> >>>>>
> >>>>> This exposes internal details to the end user. And the end user has no
> >>>>> idea what is going on.
> >>>>>
> >>>>> So I think we should try our best to keep that area consecutive.  
> >>>>
> >>>> Actually I didn't sufficiently detail the context. I would like
> >>>> 1) 1 segment to be exposed to the end-user through slot|maxmem stuff
> >>>> (what this series targets) and
> >>>> 2) another segment used to instantiate PC-DIMM for internal needs as
> >>>> replacement of part of the 1GB -> 256GB static RAM. This was the purpose
> >>>> of Shameer's original series  
> >>>
> >>> I am not sure if PC-DIMMs are exactly what you want for internal purposes.
> >>>  
> >>>>
> >>>> [1] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
> >>>> http://patchwork.ozlabs.org/cover/914694/
> >>>> This approach is not yet validated though.
> >>>>
> >>>> The rationale is sometimes you must have "holes" in RAM as some GPAs
> >>>> match reserved IOVAs for assigned devices.  
> >>>
> >>> So if I understand it correctly, all you want is some memory region that
> >>> a) contains only initially defined memory
> >>> b) can have some holes in it
> >>>
> >>> This is exactly what x86 already does (pc_memory_init): Simply construct
> >>> your own memory region leaving holes in it.
> >>>
> >>>
> >>> memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
> >>>                          0, pcms->below_4g_mem_size);
> >>> memory_region_add_subregion(system_memory, 0, ram_below_4g);
> >>> ...
> >>> if (pcms->above_4g_mem_size > 0)
> >>>     memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
> >>>     ...
> >>>     memory_region_add_subregion(system_memory, 0x100000000ULL,
> >>>     ...
> >>>
> >>> They "indicate" these different GPA areas using the e820 map to the guest.
> >>>
> >>> Would that also work for you?  
> >>
> >> I would tentatively say yes. Effectively I am not sure that if we were
> >> to actually put holes in the 1G-256GB RAM segment, PC-DIMM would be the
> >> natural choice. Also the reserved IOVA issue impacts the device_memory
> >> region area I think. I am skeptical about the fact we can put holes in
> >> static RAM and device_memory regions like that.
Could we just use a single device_memory region for both initial+hotpluggable
RAM if we make the base RAM address dynamic?
In this case RAM could start wherever there is free space for maxmem
(if there is free space in lowmem, put device_memory there, otherwise put
it somewhere in high mem) and we won't care whether there is a conflicting
IOVA or not.
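
To make that concrete, a rough sketch in the spirit of the x86 snippet
above (not actual virt code; the device_memory_base field is made up
here, the rest are existing QEMU calls):

MemoryRegion *dev_mem = g_new(MemoryRegion, 1);

/* one container covering initial + hotpluggable RAM, sized for maxmem */
memory_region_init(dev_mem, OBJECT(machine), "device-memory",
                   machine->maxram_size);
/* base chosen at init time, wherever it does not clash with anything */
memory_region_add_subregion(get_system_memory(),
                            vms->device_memory_base, dev_mem);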

*I don't have a clue about iommus, so here goes a stupid question*
I agree with Peter that the whole IOVA thing looks broken when the host
layout dictates the guest's one.
Shouldn't there be an iommu somewhere that would remap the host map into
a guest-specific one? (so the guest would model the board we need and be
migratable instead of mimicking host hw)


> > The first approach[1] we had to address the holes in memory was using
> > the memory alias way mentioned above.  And based on Drew's review, the
> > pc-dimm way of handling was introduced. I think the main argument was that
> > it will be useful when we eventually support hotplug.  
> 
> That's my understanding too.
not only hotplug,

  a RAM memory region that's split by aliases is difficult to handle
  as it creates nonlinear GPA<->HVA mapping instead of
  1:1 mapping of pc-dimm,
  so if one needs to build HVA<->GPA map for a given MemoryRegion
  in case of aliases one would have to get list of MemorySections
  that belong to it and build map from that vs (addr + offset) in
  case of simple 1:1 mapping.

  complicated machine specific SRAT/e820 code due to holes
  /grep 'the memory map is a bit tricky'/
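
To put the nonlinear-mapping point in code form, a toy example (plain C,
made-up types and helper names; QEMU's real MemoryRegionSection handling
is more involved):

#include <stdint.h>

/* 1:1 pc-dimm case: a single (gpa_base, hva_base) pair is enough */
static inline uint64_t gpa_to_hva_linear(uint64_t hva_base,
                                         uint64_t gpa_base, uint64_t gpa)
{
    return hva_base + (gpa - gpa_base);
}

/* alias case: one has to search the sections the region was split into */
typedef struct { uint64_t gpa, hva, size; } Section;

static uint64_t gpa_to_hva_aliased(const Section *s, int n, uint64_t gpa)
{
    for (int i = 0; i < n; i++) {
        if (gpa >= s[i].gpa && gpa - s[i].gpa < s[i].size) {
            return s[i].hva + (gpa - s[i].gpa);
        }
    }
    return 0; /* GPA falls into a hole */
}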

>  But since that is added
> > anyway as part of this series, I am not sure we have any other benefit in
> > modeling it as pc-dimm. May be I am missing something here.  
> 
> I tentatively agree with you. I was trying to understand whether the
> device_memory region would fit the original needs too, but I think the
> standard alias approach is better suited to hole creation.
Aliases are an easy way to start with, but as compat knobs grow
(based on PC experience, grep 'Calculate ram split')
it's quite a pain to maintain a manual, implicit alias layout
without breaking it by accident.
We probably won't be able to get rid of aliases on PC for legacy
reasons, but why introduce the same pain to the virt board.

Well, the magical conversion from -m X to 2..y memory regions (aliases or not)
isn't going to be easy in either case, especially if one takes into
account "-numa memdev|mem".
I'd rather use a single pc-dimm approach for both /initial and hotpluggable RAM/
and then use the device_memory framework to enumerate RAM wherever needed (ACPI/DT)
in a uniform way.
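
For illustration only: modeling initial RAM as a pc-dimm would reuse the
existing options, e.g.

  -object memory-backend-ram,id=mem0,size=4G \
  -device pc-dimm,id=dimm0,memdev=mem0

(ids made up), and the board would then enumerate device_memory once for
ACPI/DT instead of special-casing initial RAM.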


> Thanks
> 
> Eric
> > 
> > Thanks,
> > Shameer
> > 
> > [1]. https://lists.gnu.org/archive/html/qemu-arm/2018-04/msg00243.html
> > 
> >   
> >> Thanks!
> >>
> >> Eric  
> >>>  
> >>>>
> >>>> Thanks
> >>>>
> >>>> Eric
> >>>>  
> >>>>>  
> >>>>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>> Eric  
> >>>>>
> >>>>>  
> >>>
> >>>  

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory
  2018-07-11 13:17                       ` Igor Mammedov
@ 2018-07-12 14:22                         ` Auger Eric
  2018-07-12 14:45                           ` Andrew Jones
  0 siblings, 1 reply; 62+ messages in thread
From: Auger Eric @ 2018-07-12 14:22 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: wei, peter.maydell, drjones, David Hildenbrand, qemu-devel,
	Shameerali Kolothum Thodi, agraf, qemu-arm, david, dgilbert,
	eric.auger.pro

Hi Igor,

On 07/11/2018 03:17 PM, Igor Mammedov wrote:
> On Thu, 5 Jul 2018 16:27:05 +0200
> Auger Eric <eric.auger@redhat.com> wrote:
> 
>> Hi Shameer,
>>
>> On 07/05/2018 03:19 PM, Shameerali Kolothum Thodi wrote:
>>>   
>>>> -----Original Message-----
>>>> From: Auger Eric [mailto:eric.auger@redhat.com]
>>>> Sent: 05 July 2018 13:18
>>>> To: David Hildenbrand <david@redhat.com>; eric.auger.pro@gmail.com;
>>>> qemu-devel@nongnu.org; qemu-arm@nongnu.org; peter.maydell@linaro.org;
>>>> Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
>>>> imammedo@redhat.com
>>>> Cc: wei@redhat.com; drjones@redhat.com; david@gibson.dropbear.id.au;
>>>> dgilbert@redhat.com; agraf@suse.de
>>>> Subject: Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate
>>>> device_memory
>>>>
>>>> Hi David,
>>>>
>>>> On 07/05/2018 02:09 PM, David Hildenbrand wrote:  
>>>>> On 05.07.2018 14:00, Auger Eric wrote:  
>>>>>> Hi David,
>>>>>>
>>>>>> On 07/05/2018 01:54 PM, David Hildenbrand wrote:  
>>>>>>> On 05.07.2018 13:42, Auger Eric wrote:  
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> On 07/04/2018 02:05 PM, David Hildenbrand wrote:  
>>>>>>>>> On 03.07.2018 21:27, Auger Eric wrote:  
>>>>>>>>>> Hi David,
>>>>>>>>>> On 07/03/2018 08:25 PM, David Hildenbrand wrote:  
>>>>>>>>>>> On 03.07.2018 09:19, Eric Auger wrote:  
>>>>>>>>>>>> We define a new hotpluggable RAM region (aka. device memory).
>>>>>>>>>>>> Its base is 2TB GPA. This obviously requires 42b IPA support
>>>>>>>>>>>> in KVM/ARM, FW and guest kernel. At the moment the device
>>>>>>>>>>>> memory region is max 2TB.  
>>>>>>>>>>>
>>>>>>>>>>> Maybe a stupid question, but why exactly does it have to start at 2TB
>>>>>>>>>>> (and not e.g. at 1TB)?  
>>>>>>>>>> not a stupid question. See tentative answer below.  
>>>>>>>>>>>  
>>>>>>>>>>>>
>>>>>>>>>>>> This is largely inspired of device memory initialization in
>>>>>>>>>>>> pc machine code.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>>>>>>>>> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>>  hw/arm/virt.c         | 104  
>>>> ++++++++++++++++++++++++++++++++++++--------------  
>>>>>>>>>>>>  include/hw/arm/arm.h  |   2 +
>>>>>>>>>>>>  include/hw/arm/virt.h |   1 +
>>>>>>>>>>>>  3 files changed, 79 insertions(+), 28 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>>>>>>>>>>>> index 5a4d0bf..6fefb78 100644
>>>>>>>>>>>> --- a/hw/arm/virt.c
>>>>>>>>>>>> +++ b/hw/arm/virt.c
>>>>>>>>>>>> @@ -59,6 +59,7 @@
>>>>>>>>>>>>  #include "qapi/visitor.h"
>>>>>>>>>>>>  #include "standard-headers/linux/input.h"
>>>>>>>>>>>>  #include "hw/arm/smmuv3.h"
>>>>>>>>>>>> +#include "hw/acpi/acpi.h"
>>>>>>>>>>>>
>>>>>>>>>>>>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
>>>>>>>>>>>>      static void virt_##major##_##minor##_class_init(ObjectClass *oc,  
>>>> \  
>>>>>>>>>>>> @@ -94,34 +95,25 @@
>>>>>>>>>>>>
>>>>>>>>>>>>  #define PLATFORM_BUS_NUM_IRQS 64
>>>>>>>>>>>>
>>>>>>>>>>>> -/* RAM limit in GB. Since VIRT_MEM starts at the 1GB mark, this  
>>>> means  
>>>>>>>>>>>> - * RAM can go up to the 256GB mark, leaving 256GB of the physical
>>>>>>>>>>>> - * address space unallocated and free for future use between 256G  
>>>> and 512G.  
>>>>>>>>>>>> - * If we need to provide more RAM to VMs in the future then we  
>>>> need to:  
>>>>>>>>>>>> - *  * allocate a second bank of RAM starting at 2TB and working up  
>>>>>>>>>> I acknowledge this comment was the main justification. Now if you look  
>>>> at  
>>>>>>>>>>
>>>>>>>>>> Principles of ARM Memory Maps
>>>>>>>>>>  
>>>> http://infocenter.arm.com/help/topic/com.arm.doc.den0001c/DEN0001C_princ
>>>> iples_of_arm_memory_maps.pdf  
>>>>>>>>>> chapter 2.3 you will find that when adding PA bits, you always leave
>>>>>>>>>> space for reserved space and mapped IO.  
>>>>>>>>>
>>>>>>>>> Thanks for the pointer!
>>>>>>>>>
>>>>>>>>> So ... we can fit
>>>>>>>>>
>>>>>>>>> a) 2GB at 2GB
>>>>>>>>> b) 32GB at 32GB
>>>>>>>>> c) 512GB at 512GB
>>>>>>>>> d) 8TB at 8TB
>>>>>>>>> e) 128TB at 128TB
>>>>>>>>>
>>>>>>>>> (this is a nice rule of thumb if I understand it correctly :) )
>>>>>>>>>
>>>>>>>>> We should strive for device memory (maxram_size - ram_size) to fit
>>>>>>>>> exactly into one of these slots (otherwise things get nasty).
>>>>>>>>>
>>>>>>>>> Depending on the ram_size, we might have simpler setups and can  
>>>> support  
>>>>>>>>> more configurations, no?
>>>>>>>>>
>>>>>>>>> E.g. ram_size <= 34GB, device_memory <= 512GB  
>>>>>>>>> -> move ram into a) and b)
>>>>>>>>> -> move device memory into c)  
>>>>>>>>
>>>>>>>> The issue is machvirt doesn't comply with that document.
>>>>>>>> At the moment we have
>>>>>>>> 0 -> 1GB MMIO
>>>>>>>> 1GB -> 256GB RAM
>>>>>>>> 256GB -> 512GB is theoretically reserved for IO but most is free.
>>>>>>>> 512GB -> 1T is reserved for ECAM MMIO range. This is the top of our
>>>>>>>> existing 40b GPA space.
>>>>>>>>
>>>>>>>> We don't want to change this address map due to legacy reasons.
>>>>>>>>  
>>>>>>>
>>>>>>> Thanks, good to know!
>>>>>>>  
>>>>>>>> Another question! do you know if it would be possible to have
>>>>>>>> device_memory region split into several discontinuous segments?  
>>>>>>>
>>>>>>> It can be implemented for sure, but I would try to avoid that, as it
>>>>>>> makes certain configurations impossible (and very end user unfriendly).
>>>>>>>
>>>>>>> E.g. (numbers completely made up, but it should show what I mean)
>>>>>>>
>>>>>>> -m 20G,maxmem=120G:  
>>>>>>> -> Try to add a DIMM with 100G -> error.
>>>>>>> -> But we can add e.g. two DIMMs with 40G and 60G.  
>>>>>>>
>>>>>>> This exposes internal details to the end user. And the end user has no
>>>>>>> idea what is going on.
>>>>>>>
>>>>>>> So I think we should try our best to keep that area consecutive.  
>>>>>>
>>>>>> Actually I didn't sufficiently detail the context. I would like
>>>>>> 1) 1 segment to be exposed to the end-user through slot|maxmem stuff
>>>>>> (what this series targets) and
>>>>>> 2) another segment used to instantiate PC-DIMM for internal needs as
>>>>>> replacement of part of the 1GB -> 256GB static RAM. This was the purpose
>>>>>> of Shameer's original series  
>>>>>
>>>>> I am not sure if PC-DIMMs are exactly what you want for internal purposes.
>>>>>  
>>>>>>
>>>>>> [1] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
>>>>>> http://patchwork.ozlabs.org/cover/914694/
>>>>>> This approach is not yet validated though.
>>>>>>
>>>>>> The rationale is sometimes you must have "holes" in RAM as some GPAs
>>>>>> match reserved IOVAs for assigned devices.  
>>>>>
>>>>> So if I understand it correctly, all you want is some memory region that
>>>>> a) contains only initially defined memory
>>>>> b) can have some holes in it
>>>>>
>>>>> This is exactly what x86 already does (pc_memory_init): Simply construct
>>>>> your own memory region leaving holes in it.
>>>>>
>>>>>
>>>>> memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
>>>>>                          0, pcms->below_4g_mem_size);
>>>>> memory_region_add_subregion(system_memory, 0, ram_below_4g);
>>>>> ...
>>>>> if (pcms->above_4g_mem_size > 0)
>>>>>     memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
>>>>>     ...
>>>>>     memory_region_add_subregion(system_memory, 0x100000000ULL,
>>>>>     ...
>>>>>
>>>>> They "indicate" these different GPA areas using the e820 map to the guest.
>>>>>
>>>>> Would that also work for you?  
>>>>
>>>> I would tentatively say yes. Effectively I am not sure that if we were
>>>> to actually put holes in the 1G-256GB RAM segment, PC-DIMM would be the
>>>> natural choice. Also the reserved IOVA issue impacts the device_memory
>>>> region area I think. I am skeptical about the fact we can put holes in
>>>> static RAM and device_memory regions like that.
> Could we just use a single device_memory region for both initial+hotpluggable
> RAM if we make the base RAM address dynamic?
This assumes the FW supports a dynamic RAM base. If I understand correctly
this is not the case. Also there is the problem of migration: how
would you migrate between guests whose RAM is not laid out at the same
place? I understood hotplug memory relied on a specific device_memory
region. So do you mean we would have 2 contiguous regions?
> In this case RAM could start wherever there is free space for maxmem
> (if there is free space in lowmem, put device_memory there, otherwise put
> it somewhere in high mem) and we won't care whether there is a conflicting
> IOVA or not.
> 
> *I don't have a clue about iommus, so here goes a stupid question*
> I agree with Peter that the whole IOVA thing looks broken when the host
> layout dictates the guest's one.
> Shouldn't there be an iommu somewhere that would remap the host map into
> a guest-specific one? (so the guest would model the board we need and be
> migratable instead of mimicking host hw)
The issue is related to IOVAs programmed by the guest into the host
assigned devices. DMA requests issued by the assigned devices using
those IOVAs are supposed to reach the guest RAM. But due to the host
topology they won't (host PCI host bridge windows or MSI reserved
regions). Adding a vIOMMU on the guest side effectively allows using
IOVAs != GPAs, but the guest is exposed to that change. This extra
translation stage adds a huge performance penalty. And eventually the
IOVA allocator used in the guest should theoretically be aware of the
host reserved regions as well.
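
For reference, the host already exposes those reserved regions through
sysfs, which is what any placement logic would have to consume. A
standalone sketch (the group number and the printing are illustrative;
the sysfs node itself is the standard Linux interface):

#include <stdio.h>

int main(void)
{
    unsigned long long start, end;
    char type[32];
    FILE *f = fopen("/sys/kernel/iommu_groups/0/reserved_regions", "r");

    if (!f) {
        return 1;
    }
    /* each line is "<start> <end> <type>", e.g. an MSI doorbell window */
    while (fscanf(f, "%llx %llx %31s", &start, &end, type) == 3) {
        printf("reserved: 0x%llx-0x%llx (%s)\n", start, end, type);
    }
    fclose(f);
    return 0;
}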

> 
> 
>>> The first approach[1] we had to address the holes in memory was using
>>> the memory alias way mentioned above.  And based on Drew's review, the
>>> pc-dimm way of handling was introduced. I think the main argument was that
>>> it will be useful when we eventually support hotplug.  
>>
>> That's my understanding too.
> not only hotplug,
> 
>   a RAM memory region that's split by aliases is difficult to handle
>   as it creates nonlinear GPA<->HVA mapping instead of
>   1:1 mapping of pc-dimm,
>   so if one needs to build HVA<->GPA map for a given MemoryRegion
>   in case of aliases one would have to get list of MemorySections
>   that belong to it and build map from that vs (addr + offset) in
>   case of simple 1:1 mapping.
> 
>   complicated machine specific SRAT/e820 code due to holes
>   /grep 'the memory map is a bit tricky'/
> 
>>  But since that is added
>>> anyway as part of this series, I am not sure we have any other benefit in
>>> modeling it as pc-dimm. May be I am missing something here.  
>>
>> I tentatively agree with you. I was trying to understand whether the
>> device_memory region would fit the original needs too, but I think the
>> standard alias approach is better suited to hole creation.
> Aliases are an easy way to start with, but as compat knobs grow
> (based on PC experience, grep 'Calculate ram split')
> it's quite a pain to maintain a manual, implicit alias layout
> without breaking it by accident.
> We probably won't be able to get rid of aliases on PC for legacy
> reasons, but why introduce the same pain to the virt board.
> 
> Well, the magical conversion from -m X to 2..y memory regions (aliases or not)
> isn't going to be easy in either case, especially if one takes into
> account "-numa memdev|mem".
> I'd rather use a single pc-dimm approach for both /initial and hotpluggable RAM/
> and then use the device_memory framework to enumerate RAM wherever needed (ACPI/DT)
> in a uniform way.
We have 2 problems:
- support more RAM. This can be achieved by adding a single memory
region based on DIMMs
- manage IOVA reserved regions. I don't think we have a consensus on the
solution at the moment. What about migration between 2 guests having a
different memory topology?

Thanks

Eric
> 
> 
>> Thanks
>>
>> Eric
>>>
>>> Thanks,
>>> Shameer
>>>
>>> [1]. https://lists.gnu.org/archive/html/qemu-arm/2018-04/msg00243.html
>>>
>>>   
>>>> Thanks!
>>>>
>>>> Eric  
>>>>>  
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Eric
>>>>>>  
>>>>>>>  
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Eric  
>>>>>>>
>>>>>>>  
>>>>>
>>>>>  
> 
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory
  2018-07-12 14:22                         ` Auger Eric
@ 2018-07-12 14:45                           ` Andrew Jones
  2018-07-12 14:53                             ` Auger Eric
  0 siblings, 1 reply; 62+ messages in thread
From: Andrew Jones @ 2018-07-12 14:45 UTC (permalink / raw)
  To: Auger Eric
  Cc: Igor Mammedov, wei, peter.maydell, David Hildenbrand, qemu-devel,
	Shameerali Kolothum Thodi, agraf, qemu-arm, eric.auger.pro,
	dgilbert, david

On Thu, Jul 12, 2018 at 04:22:05PM +0200, Auger Eric wrote:
> Hi Igor,
> 
> On 07/11/2018 03:17 PM, Igor Mammedov wrote:
> > On Thu, 5 Jul 2018 16:27:05 +0200
> > Auger Eric <eric.auger@redhat.com> wrote:
> > 
> >> Hi Shameer,
> >>
> >> On 07/05/2018 03:19 PM, Shameerali Kolothum Thodi wrote:
> >>>   
> >>>> -----Original Message-----
> >>>> From: Auger Eric [mailto:eric.auger@redhat.com]
> >>>> Sent: 05 July 2018 13:18
> >>>> To: David Hildenbrand <david@redhat.com>; eric.auger.pro@gmail.com;
> >>>> qemu-devel@nongnu.org; qemu-arm@nongnu.org; peter.maydell@linaro.org;
> >>>> Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> >>>> imammedo@redhat.com
> >>>> Cc: wei@redhat.com; drjones@redhat.com; david@gibson.dropbear.id.au;
> >>>> dgilbert@redhat.com; agraf@suse.de
> >>>> Subject: Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate
> >>>> device_memory
> >>>>
> >>>> Hi David,
> >>>>
> >>>> On 07/05/2018 02:09 PM, David Hildenbrand wrote:  
> >>>>> On 05.07.2018 14:00, Auger Eric wrote:  
> >>>>>> Hi David,
> >>>>>>
> >>>>>> On 07/05/2018 01:54 PM, David Hildenbrand wrote:  
> >>>>>>> On 05.07.2018 13:42, Auger Eric wrote:  
> >>>>>>>> Hi David,
> >>>>>>>>
> >>>>>>>> On 07/04/2018 02:05 PM, David Hildenbrand wrote:  
> >>>>>>>>> On 03.07.2018 21:27, Auger Eric wrote:  
> >>>>>>>>>> Hi David,
> >>>>>>>>>> On 07/03/2018 08:25 PM, David Hildenbrand wrote:  
> >>>>>>>>>>> On 03.07.2018 09:19, Eric Auger wrote:  
> >>>>>>>>>>>> We define a new hotpluggable RAM region (aka. device memory).
> >>>>>>>>>>>> Its base is 2TB GPA. This obviously requires 42b IPA support
> >>>>>>>>>>>> in KVM/ARM, FW and guest kernel. At the moment the device
> >>>>>>>>>>>> memory region is max 2TB.  
> >>>>>>>>>>>
> >>>>>>>>>>> Maybe a stupid question, but why exactly does it have to start at 2TB
> >>>>>>>>>>> (and not e.g. at 1TB)?  
> >>>>>>>>>> not a stupid question. See tentative answer below.  
> >>>>>>>>>>>  
> >>>>>>>>>>>>
> >>>>>>>>>>>> This is largely inspired of device memory initialization in
> >>>>>>>>>>>> pc machine code.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >>>>>>>>>>>> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
> >>>>>>>>>>>> ---
> >>>>>>>>>>>>  hw/arm/virt.c         | 104  
> >>>> ++++++++++++++++++++++++++++++++++++--------------  
> >>>>>>>>>>>>  include/hw/arm/arm.h  |   2 +
> >>>>>>>>>>>>  include/hw/arm/virt.h |   1 +
> >>>>>>>>>>>>  3 files changed, 79 insertions(+), 28 deletions(-)
> >>>>>>>>>>>>
> >>>>>>>>>>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> >>>>>>>>>>>> index 5a4d0bf..6fefb78 100644
> >>>>>>>>>>>> --- a/hw/arm/virt.c
> >>>>>>>>>>>> +++ b/hw/arm/virt.c
> >>>>>>>>>>>> @@ -59,6 +59,7 @@
> >>>>>>>>>>>>  #include "qapi/visitor.h"
> >>>>>>>>>>>>  #include "standard-headers/linux/input.h"
> >>>>>>>>>>>>  #include "hw/arm/smmuv3.h"
> >>>>>>>>>>>> +#include "hw/acpi/acpi.h"
> >>>>>>>>>>>>
> >>>>>>>>>>>>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
> >>>>>>>>>>>>      static void virt_##major##_##minor##_class_init(ObjectClass *oc,  
> >>>> \  
> >>>>>>>>>>>> @@ -94,34 +95,25 @@
> >>>>>>>>>>>>
> >>>>>>>>>>>>  #define PLATFORM_BUS_NUM_IRQS 64
> >>>>>>>>>>>>
> >>>>>>>>>>>> -/* RAM limit in GB. Since VIRT_MEM starts at the 1GB mark, this  
> >>>> means  
> >>>>>>>>>>>> - * RAM can go up to the 256GB mark, leaving 256GB of the physical
> >>>>>>>>>>>> - * address space unallocated and free for future use between 256G  
> >>>> and 512G.  
> >>>>>>>>>>>> - * If we need to provide more RAM to VMs in the future then we  
> >>>> need to:  
> >>>>>>>>>>>> - *  * allocate a second bank of RAM starting at 2TB and working up  
> >>>>>>>>>> I acknowledge this comment was the main justification. Now if you look  
> >>>> at  
> >>>>>>>>>>
> >>>>>>>>>> Principles of ARM Memory Maps
> >>>>>>>>>>  
> >>>> http://infocenter.arm.com/help/topic/com.arm.doc.den0001c/DEN0001C_princ
> >>>> iples_of_arm_memory_maps.pdf  
> >>>>>>>>>> chapter 2.3 you will find that when adding PA bits, you always leave
> >>>>>>>>>> space for reserved space and mapped IO.  
> >>>>>>>>>
> >>>>>>>>> Thanks for the pointer!
> >>>>>>>>>
> >>>>>>>>> So ... we can fit
> >>>>>>>>>
> >>>>>>>>> a) 2GB at 2GB
> >>>>>>>>> b) 32GB at 32GB
> >>>>>>>>> c) 512GB at 512GB
> >>>>>>>>> d) 8TB at 8TB
> >>>>>>>>> e) 128TB at 128TB
> >>>>>>>>>
> >>>>>>>>> (this is a nice rule of thumb if I understand it correctly :) )
> >>>>>>>>>
> >>>>>>>>> We should strive for device memory (maxram_size - ram_size) to fit
> >>>>>>>>> exactly into one of these slots (otherwise things get nasty).
> >>>>>>>>>
> >>>>>>>>> Depending on the ram_size, we might have simpler setups and can  
> >>>> support  
> >>>>>>>>> more configurations, no?
> >>>>>>>>>
> >>>>>>>>> E.g. ram_size <= 34GB, device_memory <= 512GB  
> >>>>>>>>> -> move ram into a) and b)
> >>>>>>>>> -> move device memory into c)  
> >>>>>>>>
> >>>>>>>> The issue is machvirt doesn't comply with that document.
> >>>>>>>> At the moment we have
> >>>>>>>> 0 -> 1GB MMIO
> >>>>>>>> 1GB -> 256GB RAM
> >>>>>>>> 256GB -> 512GB is theoretically reserved for IO but most is free.
> >>>>>>>> 512GB -> 1T is reserved for ECAM MMIO range. This is the top of our
> >>>>>>>> existing 40b GPA space.
> >>>>>>>>
> >>>>>>>> We don't want to change this address map due to legacy reasons.
> >>>>>>>>  
> >>>>>>>
> >>>>>>> Thanks, good to know!
> >>>>>>>  
> >>>>>>>> Another question! do you know if it would be possible to have
> >>>>>>>> device_memory region split into several discontinuous segments?  
> >>>>>>>
> >>>>>>> It can be implemented for sure, but I would try to avoid that, as it
> >>>>>>> makes certain configurations impossible (and very end user unfriendly).
> >>>>>>>
> >>>>>>> E.g. (numbers completely made up, but it should show what I mean)
> >>>>>>>
> >>>>>>> -m 20G,maxmem=120G:  
> >>>>>>> -> Try to add a DIMM with 100G -> error.
> >>>>>>> -> But we can add e.g. two DIMMs with 40G and 60G.  
> >>>>>>>
> >>>>>>> This exposes internal details to the end user. And the end user has no
> >>>>>>> idea what is going on.
> >>>>>>>
> >>>>>>> So I think we should try our best to keep that area consecutive.  
> >>>>>>
> >>>>>> Actually I didn't sufficiently detail the context. I would like
> >>>>>> 1) 1 segment to be exposed to the end-user through slot|maxmem stuff
> >>>>>> (what this series targets) and
> >>>>>> 2) another segment used to instantiate PC-DIMM for internal needs as
> >>>>>> replacement of part of the 1GB -> 256GB static RAM. This was the purpose
> >>>>>> of Shameer's original series  
> >>>>>
> >>>>> I am not sure if PC-DIMMs are exactly what you want for internal purposes.
> >>>>>  
> >>>>>>
> >>>>>> [1] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
> >>>>>> http://patchwork.ozlabs.org/cover/914694/
> >>>>>> This approach is not yet validated though.
> >>>>>>
> >>>>>> The rationale is sometimes you must have "holes" in RAM as some GPAs
> >>>>>> match reserved IOVAs for assigned devices.  
> >>>>>
> >>>>> So if I understand it correctly, all you want is some memory region that
> >>>>> a) contains only initially defined memory
> >>>>> b) can have some holes in it
> >>>>>
> >>>>> This is exactly what x86 already does (pc_memory_init): Simply construct
> >>>>> your own memory region leaving holes in it.
> >>>>>
> >>>>>
> >>>>> memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
> >>>>>                          0, pcms->below_4g_mem_size);
> >>>>> memory_region_add_subregion(system_memory, 0, ram_below_4g);
> >>>>> ...
> >>>>> if (pcms->above_4g_mem_size > 0)
> >>>>>     memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
> >>>>>     ...
> >>>>>     memory_region_add_subregion(system_memory, 0x100000000ULL,
> >>>>>     ...
> >>>>>
> >>>>> They "indicate" these different GPA areas using the e820 map to the guest.
> >>>>>
> >>>>> Would that also work for you?  
> >>>>
> >>>> I would tentatively say yes. Effectively I am not sure that if we were
> >>>> to actually put holes in the 1G-256GB RAM segment, PC-DIMM would be the
> >>>> natural choice. Also the reserved IOVA issue impacts the device_memory
> >>>> region area I think. I am skeptical about the fact we can put holes in
> >>>> static RAM and device_memory regions like that.
> > Could we just use a single device_memory region for both initial+hotpluggable
> > RAM if we make the base RAM address dynamic?
> This assumes the FW supports a dynamic RAM base. If I understand correctly
> this is not the case. 

It's not currently the case, but I've added prototyping this near the top
of my TODO. So stay tuned.

> Also there is the problem of migration: how
> would you migrate between guests whose RAM is not laid out at the same
> place?

I'm not sure what you mean here. Boot a guest with a new memory map,
probably by explicitly asking for it with a new machine property,
which means a new virt machine version. Then migrate at will to any
host that supports that machine type.

> I understood hotplug memory relied on a specific device_memory
> region. So do you mean we would have 2 contiguous regions?

I think Igor wants one contiguous region for RAM, where additional
space can be reserved for hotplugging.
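
(With today's knobs that would look something like
"-m 4G,slots=4,maxmem=512G" -- numbers picked at random -- i.e. 4G of
initial RAM plus headroom up to 512G kept aside for later DIMMs.)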

> > In this case RAM could start wherever there is free space for maxmem
> > (if there is free space in lowmem, put device_memory there, otherwise put
> > it somewhere in high mem) and we won't care whether there is a conflicting
> > IOVA or not.
> > 
> > *I don't have a clue about iommus, so here goes a stupid question*
> > I agree with Peter that the whole IOVA thing looks broken when the host
> > layout dictates the guest's one.
> > Shouldn't there be an iommu somewhere that would remap the host map into
> > a guest-specific one? (so the guest would model the board we need and be
> > migratable instead of mimicking host hw)
> The issue is related to IOVAs programmed by the guest into the host
> assigned devices. DMA requests issued by the assigned devices using
> those IOVAs are supposed to reach the guest RAM. But due to the host
> topology they won't (host PCI host bridge windows or MSI reserved
> regions). Adding a vIOMMU on the guest side effectively allows using
> IOVAs != GPAs, but the guest is exposed to that change. This extra
> translation stage adds a huge performance penalty. And eventually the
> IOVA allocator used in the guest should theoretically be aware of the
> host reserved regions as well.
> 
> > 
> > 
> >>> The first approach[1] we had to address the holes in memory was using
> >>> the memory alias way mentioned above.  And based on Drew's review, the
> >>> pc-dimm way of handling was introduced. I think the main argument was that
> >>> it will be useful when we eventually support hotplug.  
> >>
> >> That's my understanding too.
> > not only hotplug,
> > 
> >   a RAM memory region that's split by aliases is difficult to handle
> >   as it creates nonlinear GPA<->HVA mapping instead of
> >   1:1 mapping of pc-dimm,
> >   so if one needs to build HVA<->GPA map for a given MemoryRegion
> >   in case of aliases one would have to get list of MemorySections
> >   that belong to it and build map from that vs (addr + offset) in
> >   case of simple 1:1 mapping.
> > 
> >   complicated machine specific SRAT/e820 code due to holes
> >   /grep 'the memory map is a bit tricky'/
> > 
> >>  But since that is added
> >>> anyway as part of this series, I am not sure we have any other benefit in
> >>> modeling it as pc-dimm. May be I am missing something here.  
> >>
> >> I tentatively agree with you. I was trying to understand whether the
> >> device_memory region would fit the original needs too, but I think the
> >> standard alias approach is better suited to hole creation.
> > Aliases are an easy way to start with, but as compat knobs grow
> > (based on PC experience, grep 'Calculate ram split')
> > it's quite a pain to maintain a manual, implicit alias layout
> > without breaking it by accident.
> > We probably won't be able to get rid of aliases on PC for legacy
> > reasons, but why introduce the same pain to the virt board.
> > 
> > Well, the magical conversion from -m X to 2..y memory regions (aliases or not)
> > isn't going to be easy in either case, especially if one takes into
> > account "-numa memdev|mem".
> > I'd rather use a single pc-dimm approach for both /initial and hotpluggable RAM/
> > and then use the device_memory framework to enumerate RAM wherever needed (ACPI/DT)
> > in a uniform way.
> We have 2 problems:
> - support more RAM. This can be achieved by adding a single memory
> region based on DIMMs
> - manage IOVA reserved regions. I don't think we have a consensus on the
> solution at the moment. What about migration between 2 guests having a
> different memory topology?

With a dynamic RAM base (pretty easy to do in QEMU, but requires FW change
- now on my TODO), then one only needs to pick a contiguous region within
the guest physical address limits that has the requested size and does not
overlap any host reserved regions (I think). I'm still not sure what the
migration concern is.
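
Something like the following first-fit scan, say (standalone C sketch;
the reserved-region list would come from the host's reserved_regions
files and the constants are placeholders):

#include <stdint.h>

#define RAM_BASE_MIN 0x40000000ULL          /* virt's current 1GB base */

typedef struct { uint64_t start, end; } Region;  /* [start, end) */

/* resv[] must be sorted by start and non-overlapping */
static uint64_t find_ram_base(uint64_t size, uint64_t pa_limit,
                              const Region *resv, int nr_resv)
{
    uint64_t base = RAM_BASE_MIN;

    for (int i = 0; i < nr_resv; i++) {
        if (base + size <= resv[i].start) {
            break;                  /* the gap below this region fits */
        }
        if (base < resv[i].end) {
            base = resv[i].end;     /* bump past the conflicting region */
        }
    }
    return (base + size <= pa_limit) ? base : 0;   /* 0 means no fit */
}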

Thanks,
drew

> 
> Thanks
> 
> Eric
> > 
> > 
> >> Thanks
> >>
> >> Eric
> >>>
> >>> Thanks,
> >>> Shameer
> >>>
> >>> [1]. https://lists.gnu.org/archive/html/qemu-arm/2018-04/msg00243.html
> >>>
> >>>   
> >>>> Thanks!
> >>>>
> >>>> Eric  
> >>>>>  
> >>>>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>> Eric
> >>>>>>  
> >>>>>>>  
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>>
> >>>>>>>> Eric  
> >>>>>>>
> >>>>>>>  
> >>>>>
> >>>>>  
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory
  2018-07-12 14:45                           ` Andrew Jones
@ 2018-07-12 14:53                             ` Auger Eric
  2018-07-12 15:15                               ` Andrew Jones
  2018-07-18 13:00                               ` Igor Mammedov
  0 siblings, 2 replies; 62+ messages in thread
From: Auger Eric @ 2018-07-12 14:53 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Igor Mammedov, wei, peter.maydell, David Hildenbrand, qemu-devel,
	Shameerali Kolothum Thodi, agraf, qemu-arm, eric.auger.pro,
	dgilbert, david

Hi Drew,

On 07/12/2018 04:45 PM, Andrew Jones wrote:
> On Thu, Jul 12, 2018 at 04:22:05PM +0200, Auger Eric wrote:
>> Hi Igor,
>>
>> On 07/11/2018 03:17 PM, Igor Mammedov wrote:
>>> On Thu, 5 Jul 2018 16:27:05 +0200
>>> Auger Eric <eric.auger@redhat.com> wrote:
>>>
>>>> Hi Shameer,
>>>>
>>>> On 07/05/2018 03:19 PM, Shameerali Kolothum Thodi wrote:
>>>>>   
>>>>>> -----Original Message-----
>>>>>> From: Auger Eric [mailto:eric.auger@redhat.com]
>>>>>> Sent: 05 July 2018 13:18
>>>>>> To: David Hildenbrand <david@redhat.com>; eric.auger.pro@gmail.com;
>>>>>> qemu-devel@nongnu.org; qemu-arm@nongnu.org; peter.maydell@linaro.org;
>>>>>> Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
>>>>>> imammedo@redhat.com
>>>>>> Cc: wei@redhat.com; drjones@redhat.com; david@gibson.dropbear.id.au;
>>>>>> dgilbert@redhat.com; agraf@suse.de
>>>>>> Subject: Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate
>>>>>> device_memory
>>>>>>
>>>>>> Hi David,
>>>>>>
>>>>>> On 07/05/2018 02:09 PM, David Hildenbrand wrote:  
>>>>>>> On 05.07.2018 14:00, Auger Eric wrote:  
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> On 07/05/2018 01:54 PM, David Hildenbrand wrote:  
>>>>>>>>> On 05.07.2018 13:42, Auger Eric wrote:  
>>>>>>>>>> Hi David,
>>>>>>>>>>
>>>>>>>>>> On 07/04/2018 02:05 PM, David Hildenbrand wrote:  
>>>>>>>>>>> On 03.07.2018 21:27, Auger Eric wrote:  
>>>>>>>>>>>> Hi David,
>>>>>>>>>>>> On 07/03/2018 08:25 PM, David Hildenbrand wrote:  
>>>>>>>>>>>>> On 03.07.2018 09:19, Eric Auger wrote:  
>>>>>>>>>>>>>> We define a new hotpluggable RAM region (aka. device memory).
>>>>>>>>>>>>>> Its base is 2TB GPA. This obviously requires 42b IPA support
>>>>>>>>>>>>>> in KVM/ARM, FW and guest kernel. At the moment the device
>>>>>>>>>>>>>> memory region is max 2TB.  
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe a stupid question, but why exactly does it have to start at 2TB
>>>>>>>>>>>>> (and not e.g. at 1TB)?  
>>>>>>>>>>>> not a stupid question. See tentative answer below.  
>>>>>>>>>>>>>  
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is largely inspired of device memory initialization in
>>>>>>>>>>>>>> pc machine code.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>>>>>>>>>>> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>  hw/arm/virt.c         | 104  
>>>>>> ++++++++++++++++++++++++++++++++++++--------------  
>>>>>>>>>>>>>>  include/hw/arm/arm.h  |   2 +
>>>>>>>>>>>>>>  include/hw/arm/virt.h |   1 +
>>>>>>>>>>>>>>  3 files changed, 79 insertions(+), 28 deletions(-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>>>>>>>>>>>>>> index 5a4d0bf..6fefb78 100644
>>>>>>>>>>>>>> --- a/hw/arm/virt.c
>>>>>>>>>>>>>> +++ b/hw/arm/virt.c
>>>>>>>>>>>>>> @@ -59,6 +59,7 @@
>>>>>>>>>>>>>>  #include "qapi/visitor.h"
>>>>>>>>>>>>>>  #include "standard-headers/linux/input.h"
>>>>>>>>>>>>>>  #include "hw/arm/smmuv3.h"
>>>>>>>>>>>>>> +#include "hw/acpi/acpi.h"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
>>>>>>>>>>>>>>      static void virt_##major##_##minor##_class_init(ObjectClass *oc,  
>>>>>> \  
>>>>>>>>>>>>>> @@ -94,34 +95,25 @@
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  #define PLATFORM_BUS_NUM_IRQS 64
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -/* RAM limit in GB. Since VIRT_MEM starts at the 1GB mark, this  
>>>>>> means  
>>>>>>>>>>>>>> - * RAM can go up to the 256GB mark, leaving 256GB of the physical
>>>>>>>>>>>>>> - * address space unallocated and free for future use between 256G  
>>>>>> and 512G.  
>>>>>>>>>>>>>> - * If we need to provide more RAM to VMs in the future then we  
>>>>>> need to:  
>>>>>>>>>>>>>> - *  * allocate a second bank of RAM starting at 2TB and working up  
>>>>>>>>>>>> I acknowledge this comment was the main justification. Now if you look  
>>>>>> at  
>>>>>>>>>>>>
>>>>>>>>>>>> Principles of ARM Memory Maps
>>>>>>>>>>>>  
>>>>>> http://infocenter.arm.com/help/topic/com.arm.doc.den0001c/DEN0001C_princ
>>>>>> iples_of_arm_memory_maps.pdf  
>>>>>>>>>>>> chapter 2.3 you will find that when adding PA bits, you always leave
>>>>>>>>>>>> space for reserved space and mapped IO.  
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the pointer!
>>>>>>>>>>>
>>>>>>>>>>> So ... we can fit
>>>>>>>>>>>
>>>>>>>>>>> a) 2GB at 2GB
>>>>>>>>>>> b) 32GB at 32GB
>>>>>>>>>>> c) 512GB at 512GB
>>>>>>>>>>> d) 8TB at 8TB
>>>>>>>>>>> e) 128TB at 128TB
>>>>>>>>>>>
>>>>>>>>>>> (this is a nice rule of thumb if I understand it correctly :) )
>>>>>>>>>>>
>>>>>>>>>>> We should strive for device memory (maxram_size - ram_size) to fit
>>>>>>>>>>> exactly into one of these slots (otherwise things get nasty).
>>>>>>>>>>>
>>>>>>>>>>> Depending on the ram_size, we might have simpler setups and can  
>>>>>> support  
>>>>>>>>>>> more configurations, no?
>>>>>>>>>>>
>>>>>>>>>>> E.g. ram_size <= 34GB, device_memory <= 512GB  
>>>>>>>>>>> -> move ram into a) and b)
>>>>>>>>>>> -> move device memory into c)  
>>>>>>>>>>
>>>>>>>>>> The issue is machvirt doesn't comply with that document.
>>>>>>>>>> At the moment we have
>>>>>>>>>> 0 -> 1GB MMIO
>>>>>>>>>> 1GB -> 256GB RAM
>>>>>>>>>> 256GB -> 512GB is theoretically reserved for IO but most is free.
>>>>>>>>>> 512GB -> 1T is reserved for ECAM MMIO range. This is the top of our
>>>>>>>>>> existing 40b GPA space.
>>>>>>>>>>
>>>>>>>>>> We don't want to change this address map due to legacy reasons.
>>>>>>>>>>  
>>>>>>>>>
>>>>>>>>> Thanks, good to know!
>>>>>>>>>  
>>>>>>>>>> Another question! do you know if it would be possible to have
>>>>>>>>>> device_memory region split into several discontinuous segments?  
>>>>>>>>>
>>>>>>>>> It can be implemented for sure, but I would try to avoid that, as it
>>>>>>>>> makes certain configurations impossible (and very end user unfriendly).
>>>>>>>>>
>>>>>>>>> E.g. (numbers completely made up, but it should show what I mean)
>>>>>>>>>
>>>>>>>>> -m 20G,maxmem=120G:  
>>>>>>>>> -> Try to add a DIMM with 100G -> error.
>>>>>>>>> -> But we can add e.g. two DIMMs with 40G and 60G.  
>>>>>>>>>
>>>>>>>>> This exposes internal details to the end user. And the end user has no
>>>>>>>>> idea what is going on.
>>>>>>>>>
>>>>>>>>> So I think we should try our best to keep that area consecutive.  
>>>>>>>>
>>>>>>>> Actually I didn't sufficiently detail the context. I would like
>>>>>>>> 1) 1 segment to be exposed to the end-user through slot|maxmem stuff
>>>>>>>> (what this series targets) and
>>>>>>>> 2) another segment used to instantiate PC-DIMM for internal needs as
>>>>>>>> replacement of part of the 1GB -> 256GB static RAM. This was the purpose
>>>>>>>> of Shameer's original series  
>>>>>>>
>>>>>>> I am not sure if PC-DIMMs are exactly what you want for internal purposes.
>>>>>>>  
>>>>>>>>
>>>>>>>> [1] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
>>>>>>>> http://patchwork.ozlabs.org/cover/914694/
>>>>>>>> This approach is not yet validated though.
>>>>>>>>
>>>>>>>> The rationale is sometimes you must have "holes" in RAM as some GPAs
>>>>>>>> match reserved IOVAs for assigned devices.  
>>>>>>>
>>>>>>> So if I understand it correctly, all you want is some memory region that
>>>>>>> a) contains only initially defined memory
>>>>>>> b) can have some holes in it
>>>>>>>
>>>>>>> This is exactly what x86 already does (pc_memory_init): Simply construct
>>>>>>> your own memory region leaving holes in it.
>>>>>>>
>>>>>>>
>>>>>>> memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
>>>>>>>                          0, pcms->below_4g_mem_size);
>>>>>>> memory_region_add_subregion(system_memory, 0, ram_below_4g);
>>>>>>> ...
>>>>>>> if (pcms->above_4g_mem_size > 0)
>>>>>>>     memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
>>>>>>>     ...
>>>>>>>     memory_region_add_subregion(system_memory, 0x100000000ULL,
>>>>>>>     ...
>>>>>>>
>>>>>>> They "indicate" these different GPA areas using the e820 map to the guest.
>>>>>>>
>>>>>>> Would that also work for you?  
>>>>>>
>>>>>> I would tentatively say yes. Effectively I am not sure that if we were
>>>>>> to actually put holes in the 1G-256GB RAM segment, PC-DIMM would be the
>>>>>> natural choice. Also the reserved IOVA issue impacts the device_memory
>>>>>> region area I think. I am skeptical about the fact we can put holes in
>>>>>> static RAM and device_memory regions like that.
>>> Could we just use a single device_memory region for both initial+hotpluggable
>>> RAM if we make the base RAM address dynamic?
>> This assumes the FW supports a dynamic RAM base. If I understand correctly
>> this is not the case. 
> 
> It's not currently the case, but I've added prototyping this near the top
> of my TODO. So stay tuned.
ok
> 
>> Also there is the problem of migration: how
>> would you migrate between guests whose RAM is not laid out at the same
>> place?
> 
> I'm not sure what you mean here. Boot a guest with a new memory map,
> probably by explicitly asking for it with a new machine property,
> which means a new virt machine version. Then migrate at will to any
> host that supports that machine type.
My concern was rather about holes in the memory map matching reserved
regions.
> 
>> I understood hotplug memory relied on a specific device_memory
>> region. So do you mean we would have 2 contiguous regions?
> 
> I think Igor wants one contiguous region for RAM, where additional
> space can be reserved for hotplugging.
This is not compliant with the 2012 ARM white paper, although I don't
really know if that document truly is authoritative (I did not get any
reply).

Thanks

Eric
> 
>>> In this case RAM could start wherever there is free space for maxmem
>>> (if there is free space in lowmem, put device_memory there, otherwise put
>>> it somewhere in high mem) and we won't care whether there is a conflicting
>>> IOVA or not.
>>>
>>> *I don't have a clue about iommus, so here goes a stupid question*
>>> I agree with Peter that the whole IOVA thing looks broken when the host
>>> layout dictates the guest's one.
>>> Shouldn't there be an iommu somewhere that would remap the host map into
>>> a guest-specific one? (so the guest would model the board we need and be
>>> migratable instead of mimicking host hw)
>> The issue is related to IOVAs programmed by the guest into the host
>> assigned devices. DMA requests issued by the assigned devices using
>> those IOVAs are supposed to reach the guest RAM. But due to the host
>> topology they won't (host PCI host bridge windows or MSI reserved
>> regions). Adding a vIOMMU on the guest side effectively allows using
>> IOVAs != GPAs, but the guest is exposed to that change. This extra
>> translation stage adds a huge performance penalty. And eventually the
>> IOVA allocator used in the guest should theoretically be aware of the
>> host reserved regions as well.
>>
>>>
>>>
>>>>> The first approach[1] we had to address the holes in memory was using
>>>>> the memory alias way mentioned above.  And based on Drew's review, the
>>>>> pc-dimm way of handling was introduced. I think the main argument was that
>>>>> it will be useful when we eventually support hotplug.  
>>>>
>>>> That's my understanding too.
>>> not only hotplug,
>>>
>>>   a RAM memory region that's split by aliases is difficult to handle
>>>   as it creates nonlinear GPA<->HVA mapping instead of
>>>   1:1 mapping of pc-dimm,
>>>   so if one needs to build HVA<->GPA map for a given MemoryRegion
>>>   in case of aliases one would have to get list of MemorySections
>>>   that belong to it and build map from that vs (addr + offset) in
>>>   case of simple 1:1 mapping.
>>>
>>>   complicated machine specific SRAT/e820 code due to holes
>>>   /grep 'the memory map is a bit tricky'/
>>>
>>>>  But since that is added
>>>>> anyway as part of this series, I am not sure we have any other benefit in
>>>>> modeling it as pc-dimm. May be I am missing something here.  
>>>>
>>>> I tentatively agree with you. I was trying to understand whether the
>>>> device_memory region would fit the original needs too, but I think the
>>>> standard alias approach is better suited to hole creation.
>>> Aliases are an easy way to start with, but as compat knobs grow
>>> (based on PC experience, grep 'Calculate ram split')
>>> it's quite a pain to maintain a manual, implicit alias layout
>>> without breaking it by accident.
>>> We probably won't be able to get rid of aliases on PC for legacy
>>> reasons, but why introduce the same pain to the virt board.
>>>
>>> Well, the magical conversion from -m X to 2..y memory regions (aliases or not)
>>> isn't going to be easy in either case, especially if one takes into
>>> account "-numa memdev|mem".
>>> I'd rather use a single pc-dimm approach for both /initial and hotpluggable RAM/
>>> and then use the device_memory framework to enumerate RAM wherever needed (ACPI/DT)
>>> in a uniform way.
>> We have 2 problems:
>> - support more RAM. This can be achieved by adding a single memory
>> region based on DIMMs
>> - manage IOVA reserved regions. I don't think we have a consensus on the
>> solution at the moment. What about migration between 2 guests having a
>> different memory topology?
> 
> With a dynamic RAM base (pretty easy to do in QEMU, but requires FW change
> - now on my TODO), then one only needs to pick a contiguous region within
> the guest physical address limits that has the requested size and does not
> overlap any host reserved regions (I think). I'm still not sure what the
> migration concern is.
> 
> Thanks,
> drew
> 
>>
>> Thanks
>>
>> Eric
>>>
>>>
>>>> Thanks
>>>>
>>>> Eric
>>>>>
>>>>> Thanks,
>>>>> Shameer
>>>>>
>>>>> [1]. https://lists.gnu.org/archive/html/qemu-arm/2018-04/msg00243.html
>>>>>
>>>>>   
>>>>>> Thanks!
>>>>>>
>>>>>> Eric  
>>>>>>>  
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Eric
>>>>>>>>  
>>>>>>>>>  
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> Eric  
>>>>>>>>>
>>>>>>>>>  
>>>>>>>
>>>>>>>  
>>>
>>>
>>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory
  2018-07-12 14:53                             ` Auger Eric
@ 2018-07-12 15:15                               ` Andrew Jones
  2018-07-18 13:00                               ` Igor Mammedov
  1 sibling, 0 replies; 62+ messages in thread
From: Andrew Jones @ 2018-07-12 15:15 UTC (permalink / raw)
  To: Auger Eric
  Cc: wei, peter.maydell, David Hildenbrand, qemu-devel,
	Shameerali Kolothum Thodi, agraf, qemu-arm, Igor Mammedov, david,
	dgilbert, eric.auger.pro

On Thu, Jul 12, 2018 at 04:53:01PM +0200, Auger Eric wrote:
> Hi Drew,
> 
> On 07/12/2018 04:45 PM, Andrew Jones wrote:
> > On Thu, Jul 12, 2018 at 04:22:05PM +0200, Auger Eric wrote:
> >> Hi Igor,
> >>
> >> On 07/11/2018 03:17 PM, Igor Mammedov wrote:
> >>> On Thu, 5 Jul 2018 16:27:05 +0200
> >>> Auger Eric <eric.auger@redhat.com> wrote:
> >>>
> >>>> Hi Shameer,
> >>>>
> >>>> On 07/05/2018 03:19 PM, Shameerali Kolothum Thodi wrote:
> >>>>>   
> >>>>>> -----Original Message-----
> >>>>>> From: Auger Eric [mailto:eric.auger@redhat.com]
> >>>>>> Sent: 05 July 2018 13:18
> >>>>>> To: David Hildenbrand <david@redhat.com>; eric.auger.pro@gmail.com;
> >>>>>> qemu-devel@nongnu.org; qemu-arm@nongnu.org; peter.maydell@linaro.org;
> >>>>>> Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> >>>>>> imammedo@redhat.com
> >>>>>> Cc: wei@redhat.com; drjones@redhat.com; david@gibson.dropbear.id.au;
> >>>>>> dgilbert@redhat.com; agraf@suse.de
> >>>>>> Subject: Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate
> >>>>>> device_memory
> >>>>>>
> >>>>>> Hi David,
> >>>>>>
> >>>>>> On 07/05/2018 02:09 PM, David Hildenbrand wrote:  
> >>>>>>> On 05.07.2018 14:00, Auger Eric wrote:  
> >>>>>>>> Hi David,
> >>>>>>>>
> >>>>>>>> On 07/05/2018 01:54 PM, David Hildenbrand wrote:  
> >>>>>>>>> On 05.07.2018 13:42, Auger Eric wrote:  
> >>>>>>>>>> Hi David,
> >>>>>>>>>>
> >>>>>>>>>> On 07/04/2018 02:05 PM, David Hildenbrand wrote:  
> >>>>>>>>>>> On 03.07.2018 21:27, Auger Eric wrote:  
> >>>>>>>>>>>> Hi David,
> >>>>>>>>>>>> On 07/03/2018 08:25 PM, David Hildenbrand wrote:  
> >>>>>>>>>>>>> On 03.07.2018 09:19, Eric Auger wrote:  
> >>>>>>>>>>>>>> We define a new hotpluggable RAM region (aka. device memory).
> >>>>>>>>>>>>>> Its base is 2TB GPA. This obviously requires 42b IPA support
> >>>>>>>>>>>>>> in KVM/ARM, FW and guest kernel. At the moment the device
> >>>>>>>>>>>>>> memory region is max 2TB.  
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Maybe a stupid question, but why exactly does it have to start at 2TB
> >>>>>>>>>>>>> (and not e.g. at 1TB)?  
> >>>>>>>>>>>> not a stupid question. See tentative answer below.  
> >>>>>>>>>>>>>  
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This is largely inspired by device memory initialization in
> >>>>>>>>>>>>>> pc machine code.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >>>>>>>>>>>>>> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
> >>>>>>>>>>>>>> ---
> >>>>>>>>>>>>>>  hw/arm/virt.c         | 104  
> >>>>>> ++++++++++++++++++++++++++++++++++++--------------  
> >>>>>>>>>>>>>>  include/hw/arm/arm.h  |   2 +
> >>>>>>>>>>>>>>  include/hw/arm/virt.h |   1 +
> >>>>>>>>>>>>>>  3 files changed, 79 insertions(+), 28 deletions(-)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> >>>>>>>>>>>>>> index 5a4d0bf..6fefb78 100644
> >>>>>>>>>>>>>> --- a/hw/arm/virt.c
> >>>>>>>>>>>>>> +++ b/hw/arm/virt.c
> >>>>>>>>>>>>>> @@ -59,6 +59,7 @@
> >>>>>>>>>>>>>>  #include "qapi/visitor.h"
> >>>>>>>>>>>>>>  #include "standard-headers/linux/input.h"
> >>>>>>>>>>>>>>  #include "hw/arm/smmuv3.h"
> >>>>>>>>>>>>>> +#include "hw/acpi/acpi.h"
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
> >>>>>>>>>>>>>>      static void virt_##major##_##minor##_class_init(ObjectClass *oc,  
> >>>>>> \  
> >>>>>>>>>>>>>> @@ -94,34 +95,25 @@
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>  #define PLATFORM_BUS_NUM_IRQS 64
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -/* RAM limit in GB. Since VIRT_MEM starts at the 1GB mark, this  
> >>>>>> means  
> >>>>>>>>>>>>>> - * RAM can go up to the 256GB mark, leaving 256GB of the physical
> >>>>>>>>>>>>>> - * address space unallocated and free for future use between 256G  
> >>>>>> and 512G.  
> >>>>>>>>>>>>>> - * If we need to provide more RAM to VMs in the future then we  
> >>>>>> need to:  
> >>>>>>>>>>>>>> - *  * allocate a second bank of RAM starting at 2TB and working up  
> >>>>>>>>>>>> I acknowledge this comment was the main justification. Now if you look  
> >>>>>> at  
> >>>>>>>>>>>>
> >>>>>>>>>>>> Principles of ARM Memory Maps
> >>>>>>>>>>>>  
> >>>>>> http://infocenter.arm.com/help/topic/com.arm.doc.den0001c/DEN0001C_princ
> >>>>>> iples_of_arm_memory_maps.pdf  
> >>>>>>>>>>>> chapter 2.3 you will find that when adding PA bits, you always leave
> >>>>>>>>>>>> space for reserved space and mapped IO.  
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks for the pointer!
> >>>>>>>>>>>
> >>>>>>>>>>> So ... we can fit
> >>>>>>>>>>>
> >>>>>>>>>>> a) 2GB at 2GB
> >>>>>>>>>>> b) 32GB at 32GB
> >>>>>>>>>>> c) 512GB at 512GB
> >>>>>>>>>>> d) 8TB at 8TB
> >>>>>>>>>>> e) 128TB at 128TB
> >>>>>>>>>>>
> >>>>>>>>>>> (this is a nice rule of thumb if I understand it correctly :) )
> >>>>>>>>>>>
> >>>>>>>>>>> We should strive for device memory (maxram_size - ram_size) to fit
> >>>>>>>>>>> exactly into one of these slots (otherwise things get nasty).
> >>>>>>>>>>>
> >>>>>>>>>>> Depending on the ram_size, we might have simpler setups and can  
> >>>>>> support  
> >>>>>>>>>>> more configurations, no?
> >>>>>>>>>>>
> >>>>>>>>>>> E.g. ram_size <= 34GB, device_memory <= 512GB  
> >>>>>>>>>>> -> move ram into a) and b)
> >>>>>>>>>>> -> move device memory into c)  
> >>>>>>>>>>
> >>>>>>>>>> The issue is machvirt doesn't comply with that document.
> >>>>>>>>>> At the moment we have
> >>>>>>>>>> 0 -> 1GB MMIO
> >>>>>>>>>> 1GB -> 256GB RAM
> >>>>>>>>>> 256GB -> 512GB is theoretically reserved for IO but most is free.
> >>>>>>>>>> 512GB -> 1T is reserved for ECAM MMIO range. This is the top of our
> >>>>>>>>>> existing 40b GPA space.
> >>>>>>>>>>
> >>>>>>>>>> We don't want to change this address map due to legacy reasons.
> >>>>>>>>>>  
> >>>>>>>>>
> >>>>>>>>> Thanks, good to know!
> >>>>>>>>>  
> >>>>>>>>>> Another question! do you know if it would be possible to have
> >>>>>>>>>> device_memory region split into several discontinuous segments?  
> >>>>>>>>>
> >>>>>>>>> It can be implemented for sure, but I would try to avoid that, as it
> >>>>>>>>> makes certain configurations impossible (and very end user unfriendly).
> >>>>>>>>>
> >>>>>>>>> E.g. (numbers completely made up, but it should show what I mean)
> >>>>>>>>>
> >>>>>>>>> -m 20G,maxmem=120G:  
> >>>>>>>>> -> Try to add a DIMM with 100G -> error.
> >>>>>>>>> -> But we can add e.g. two DIMMs with 40G and 60G.  
> >>>>>>>>>
> >>>>>>>>> This exposes internal details to the end user. And the end user has no
> >>>>>>>>> idea what is going on.
> >>>>>>>>>
> >>>>>>>>> So I think we should try our best to keep that area consecutive.  
> >>>>>>>>
> >>>>>>>> Actually I didn't sufficiently detail the context. I would like
> >>>>>>>> 1) 1 segment to be exposed to the end-user through slot|maxmem stuff
> >>>>>>>> (what this series targets) and
> >>>>>>>> 2) another segment used to instantiate PC-DIMM for internal needs as
> >>>>>>>> replacement of part of the 1GB -> 256GB static RAM. This was the purpose
> >>>>>>>> of Shameer's original series  
> >>>>>>>
> >>>>>>> I am not sure if PC-DIMMs are exactly what you want for internal purposes.
> >>>>>>>  
> >>>>>>>>
> >>>>>>>> [1] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
> >>>>>>>> http://patchwork.ozlabs.org/cover/914694/
> >>>>>>>> This approach is not yet validated though.
> >>>>>>>>
> >>>>>>>> The rationale is sometimes you must have "holes" in RAM as some GPAs
> >>>>>>>> match reserved IOVAs for assigned devices.  
> >>>>>>>
> >>>>>>> So if I understand it correctly, all you want is some memory region that
> >>>>>>> a) contains only initially defined memory
> >>>>>>> b) can have some holes in it
> >>>>>>>
> >>>>>>> This is exactly what x86 already does (pc_memory_init): Simply construct
> >>>>>>> your own memory region leaving holes in it.
> >>>>>>>
> >>>>>>>
> >>>>>>> memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
> >>>>>>>                          0, pcms->below_4g_mem_size);
> >>>>>>> memory_region_add_subregion(system_memory, 0, ram_below_4g);
> >>>>>>> ...
> >>>>>>> if (pcms->above_4g_mem_size > 0)
> >>>>>>>     memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
> >>>>>>>     ...
> >>>>>>>     memory_region_add_subregion(system_memory, 0x100000000ULL,
> >>>>>>>     ...
> >>>>>>>
> >>>>>>> They "indicate" these different GPA areas using the e820 map to the guest.
> >>>>>>>
> >>>>>>> Would that also work for you?  
> >>>>>>
> >>>>>> I would tentatively say yes. Effectively I am not sure that, if we were
> >>>>>> to actually put holes in the 1G-256GB RAM segment, PC-DIMM would be the
> >>>>>> natural choice. Also the reserved IOVA issue impacts the device_memory
> >>>>>> region area, I think. I am skeptical about whether we can put holes in
> >>>>>> static RAM and device_memory regions like that.
> >>> Could we just use a single device_memory region for both
> >>> initial+hotpluggable RAM if we make the base RAM address dynamic?
> >> This assumes the FW supports a dynamic RAM base. If I understand
> >> correctly this is not the case.
> > 
> > It's not currently the case, but I've added prototyping this near the top
> > of my TODO. So stay tuned.
> ok
> > 
> >> Also there is the problem of migration. How
> >> would you migrate between guests whose RAM is not laid out at the same
> >> place?
> > 
> > I'm not sure what you mean here. Boot a guest with a new memory map,
> > probably by explicitly asking for it with a new machine property,
> > which means a new virt machine version. Then migrate at will to any
> > host that supports that machine type.
> My concern was rather about holes in the memory map matching reserved
> regions.

Oh, I see. I don't think the reserved-host-memory-regions-messing-up-guest
problem can ever be solved for the migration case where the destination host
doesn't have strictly an equal set or a subset of the source host's reserved
regions. So there's nothing we can do but force upper management layers to
maintain migration candidate lists in these environments. A pre-check
could also be added, allowing migration to error out early when an overlap
is detected.
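Something along these lines (untested sketch, all names made up), run on the
destination before accepting the incoming migration:

    #include <stdint.h>
    #include <stdbool.h>

    typedef struct GPARange {
        uint64_t base;
        uint64_t size;
    } GPARange;

    static bool ranges_overlap(const GPARange *a, const GPARange *b)
    {
        return a->base < b->base + b->size && b->base < a->base + a->size;
    }

    /* guest_ram[]: RAM ranges advertised by the source,
     * resv[]: reserved regions enumerated on the destination host */
    static int precheck_reserved_regions(const GPARange *guest_ram, int nr_ram,
                                         const GPARange *resv, int nr_resv)
    {
        for (int i = 0; i < nr_ram; i++) {
            for (int j = 0; j < nr_resv; j++) {
                if (ranges_overlap(&guest_ram[i], &resv[j])) {
                    return -1; /* overlap: fail the migration early */
                }
            }
        }
        return 0;
    }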

> > 
> >> I understood hotplug memory relied on a specific device_memory
> >> region. So do you mean we would have 2 contiguous regions?
> > 
> > I think Igor wants one contiguous region for RAM, where additional
> > space can be reserved for hotplugging.
> This is not compliant with the 2012 ARM white paper, although I don't
> really know if this document truly is a reference (I did not get any reply).
> 
> Thanks
> 
> Eric
> > 
> >>> In this case RAM could start wherever there is free space for maxmem
> >>> (if there is free space in lowmem, put device_memory there, otherwise put
> >>> it somewhere in high mem) and we won't care whether there are reserved
> >>> IOVAs or not.
> >>>
> >>> *I don't have a clue about iommus, so here goes a stupid question*
> >>> I agree with Peter that the whole IOVA thing looks broken when the host
> >>> layout dictates the guest's one.
> >>> Shouldn't there be an iommu somewhere that would remap the host map into
> >>> a guest-specific one? (so the guest would model the board we need and be
> >>> migratable instead of mimicking host hw)
> >> The issue is related to IOVAs programmed by the guest into the host
> >> assigned devices. DMA requests issued by the assigned devices using
> >> those IOVAs are supposed to reach the guest RAM. But due to the host
> >> topology they won't (host PCI host bridge windows or MSI reserved
> >> regions). Adding a vIOMMU on the guest side effectively allows IOVAs
> >> != GPAs, but the guest is exposed to that change. This extra
> >> translation stage also adds a huge performance penalty. And eventually
> >> the IOVA allocator used in the guest should theoretically be aware of
> >> the host reserved regions as well.
> >>
> >>>
> >>>
> >>>>> The first approach[1] we had to address the holes in memory was using
> >>>>> the memory alias way mentioned above. And based on Drew's review, the
> >>>>> pc-dimm way of handling it was introduced. I think the main argument was
> >>>>> that it would be useful when we eventually support hotplug.
> >>>>
> >>>> That's my understanding too.
> >>> not only hotplug:
> >>>
> >>>   a RAM memory region that's split by aliases is difficult to handle,
> >>>   as it creates a nonlinear GPA<->HVA mapping instead of the
> >>>   1:1 mapping of pc-dimm;
> >>>   so if one needs to build an HVA<->GPA map for a given MemoryRegion,
> >>>   in the alias case one would have to get the list of MemorySections
> >>>   that belong to it and build the map from that, vs (addr + offset) in
> >>>   the simple 1:1 mapping case.
> >>>
> >>>   there is also complicated machine-specific SRAT/e820 code due to holes
> >>>   /grep 'the memory map is a bit tricky'/
> >>>
> >>>>  But since that is added
> >>>>> anyway as part of this series, I am not sure we have any other benefit in
> >>>>> modeling it as pc-dimm. Maybe I am missing something here.
> >>>>
> >>>> I tentatively agree with you. I was trying to understand if the
> >>>> device_memory region was fitting the original needs too, but I think the
> >>>> standard alias approach is better suited to hole creation.
> >>> Aliases are an easy way to start with, but as compat knobs grow
> >>> (based on PC experience, grep 'Calculate ram split')
> >>> it's quite a pain to maintain a manual implicit alias layout
> >>> without breaking it by accident.
> >>> We probably won't be able to get rid of aliases on PC for legacy
> >>> reasons, but why introduce the same pain to the virt board?
> >>>
> >>> Well, a magical conversion from -m X to 2..y memory regions (aliases or
> >>> not) isn't going to be easy in either case, especially if one takes into
> >>> account "-numa memdev|mem".
> >>> I'd rather use a single pc-dimm approach for both /initial and
> >>> hotpluggable RAM/ and then use the device_memory framework to enumerate
> >>> RAM wherever needed (ACPI/DT) in a uniform way.
> >> We have 2 problems:
> >> - support more RAM. This can be achieved by adding a single memory
> >> region based on DIMMs;
> >> - manage IOVA reserved regions. I don't think we have a consensus on the
> >> solution at the moment. What about migration between 2 guests having a
> >> different memory topology?
> > 
> > With a dynamic RAM base (pretty easy to do in QEMU, but requires FW change
> > - now on my TODO), then one only needs to pick a contiguous region within
> > the guest physical address limits that has the requested size and does not
> > overlap any host reserved regions (I think). I'm still not sure what the
> > migration concern is.
> > 
> > Thanks,
> > drew
> > 
> >>
> >> Thanks
> >>
> >> Eric
> >>>
> >>>
> >>>> Thanks
> >>>>
> >>>> Eric
> >>>>>
> >>>>> Thanks,
> >>>>> Shameer
> >>>>>
> >>>>> [1]. https://lists.gnu.org/archive/html/qemu-arm/2018-04/msg00243.html
> >>>>>
> >>>>>   
> >>>>>> Thanks!
> >>>>>>
> >>>>>> Eric  
> >>>>>>>  
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>>
> >>>>>>>> Eric
> >>>>>>>>  
> >>>>>>>>>  
> >>>>>>>>>>
> >>>>>>>>>> Thanks
> >>>>>>>>>>
> >>>>>>>>>> Eric  
> >>>>>>>>>
> >>>>>>>>>  
> >>>>>>>
> >>>>>>>  
> >>>
> >>>
> >>
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory
  2018-07-12 14:53                             ` Auger Eric
  2018-07-12 15:15                               ` Andrew Jones
@ 2018-07-18 13:00                               ` Igor Mammedov
  2018-08-08  9:33                                 ` Auger Eric
  1 sibling, 1 reply; 62+ messages in thread
From: Igor Mammedov @ 2018-07-18 13:00 UTC (permalink / raw)
  To: Auger Eric
  Cc: Andrew Jones, wei, peter.maydell, David Hildenbrand, qemu-devel,
	Shameerali Kolothum Thodi, agraf, qemu-arm, david, dgilbert,
	eric.auger.pro

On Thu, 12 Jul 2018 16:53:01 +0200
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Drew,
> 
> On 07/12/2018 04:45 PM, Andrew Jones wrote:
> > On Thu, Jul 12, 2018 at 04:22:05PM +0200, Auger Eric wrote:  
> >> Hi Igor,
> >>
> >> On 07/11/2018 03:17 PM, Igor Mammedov wrote:  
> >>> On Thu, 5 Jul 2018 16:27:05 +0200
> >>> Auger Eric <eric.auger@redhat.com> wrote:
> >>>  
> >>>> Hi Shameer,
> >>>>
> >>>> On 07/05/2018 03:19 PM, Shameerali Kolothum Thodi wrote:  
> >>>>>     
> >>>>>> -----Original Message-----
> >>>>>> From: Auger Eric [mailto:eric.auger@redhat.com]
> >>>>>> Sent: 05 July 2018 13:18
> >>>>>> To: David Hildenbrand <david@redhat.com>; eric.auger.pro@gmail.com;
> >>>>>> qemu-devel@nongnu.org; qemu-arm@nongnu.org; peter.maydell@linaro.org;
> >>>>>> Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> >>>>>> imammedo@redhat.com
> >>>>>> Cc: wei@redhat.com; drjones@redhat.com; david@gibson.dropbear.id.au;
> >>>>>> dgilbert@redhat.com; agraf@suse.de
> >>>>>> Subject: Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate
> >>>>>> device_memory
> >>>>>>
> >>>>>> Hi David,
> >>>>>>
> >>>>>> On 07/05/2018 02:09 PM, David Hildenbrand wrote:    
> >>>>>>> On 05.07.2018 14:00, Auger Eric wrote:    
> >>>>>>>> Hi David,
> >>>>>>>>
> >>>>>>>> On 07/05/2018 01:54 PM, David Hildenbrand wrote:    
> >>>>>>>>> On 05.07.2018 13:42, Auger Eric wrote:    
> >>>>>>>>>> Hi David,
> >>>>>>>>>>
> >>>>>>>>>> On 07/04/2018 02:05 PM, David Hildenbrand wrote:    
> >>>>>>>>>>> On 03.07.2018 21:27, Auger Eric wrote:    
> >>>>>>>>>>>> Hi David,
> >>>>>>>>>>>> On 07/03/2018 08:25 PM, David Hildenbrand wrote:    
> >>>>>>>>>>>>> On 03.07.2018 09:19, Eric Auger wrote:    
> >>>>>>>>>>>>>> We define a new hotpluggable RAM region (aka. device memory).
> >>>>>>>>>>>>>> Its base is 2TB GPA. This obviously requires 42b IPA support
> >>>>>>>>>>>>>> in KVM/ARM, FW and guest kernel. At the moment the device
> >>>>>>>>>>>>>> memory region is max 2TB.    
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Maybe a stupid question, but why exactly does it have to start at 2TB
> >>>>>>>>>>>>> (and not e.g. at 1TB)?    
> >>>>>>>>>>>> not a stupid question. See tentative answer below.    
> >>>>>>>>>>>>>    
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This is largely inspired by device memory initialization in
> >>>>>>>>>>>>>> pc machine code.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >>>>>>>>>>>>>> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
> >>>>>>>>>>>>>> ---
> >>>>>>>>>>>>>>  hw/arm/virt.c         | 104    
> >>>>>> ++++++++++++++++++++++++++++++++++++--------------    
> >>>>>>>>>>>>>>  include/hw/arm/arm.h  |   2 +
> >>>>>>>>>>>>>>  include/hw/arm/virt.h |   1 +
> >>>>>>>>>>>>>>  3 files changed, 79 insertions(+), 28 deletions(-)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> >>>>>>>>>>>>>> index 5a4d0bf..6fefb78 100644
> >>>>>>>>>>>>>> --- a/hw/arm/virt.c
> >>>>>>>>>>>>>> +++ b/hw/arm/virt.c
> >>>>>>>>>>>>>> @@ -59,6 +59,7 @@
> >>>>>>>>>>>>>>  #include "qapi/visitor.h"
> >>>>>>>>>>>>>>  #include "standard-headers/linux/input.h"
> >>>>>>>>>>>>>>  #include "hw/arm/smmuv3.h"
> >>>>>>>>>>>>>> +#include "hw/acpi/acpi.h"
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
> >>>>>>>>>>>>>>      static void virt_##major##_##minor##_class_init(ObjectClass *oc,    
> >>>>>> \    
> >>>>>>>>>>>>>> @@ -94,34 +95,25 @@
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>  #define PLATFORM_BUS_NUM_IRQS 64
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -/* RAM limit in GB. Since VIRT_MEM starts at the 1GB mark, this    
> >>>>>> means    
> >>>>>>>>>>>>>> - * RAM can go up to the 256GB mark, leaving 256GB of the physical
> >>>>>>>>>>>>>> - * address space unallocated and free for future use between 256G    
> >>>>>> and 512G.    
> >>>>>>>>>>>>>> - * If we need to provide more RAM to VMs in the future then we    
> >>>>>> need to:    
> >>>>>>>>>>>>>> - *  * allocate a second bank of RAM starting at 2TB and working up    
> >>>>>>>>>>>> I acknowledge this comment was the main justification. Now if you look    
> >>>>>> at    
> >>>>>>>>>>>>
> >>>>>>>>>>>> Principles of ARM Memory Maps
> >>>>>>>>>>>>    
> >>>>>> http://infocenter.arm.com/help/topic/com.arm.doc.den0001c/DEN0001C_princ
> >>>>>> iples_of_arm_memory_maps.pdf    
> >>>>>>>>>>>> chapter 2.3 you will find that when adding PA bits, you always leave
> >>>>>>>>>>>> space for reserved space and mapped IO.    
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks for the pointer!
> >>>>>>>>>>>
> >>>>>>>>>>> So ... we can fit
> >>>>>>>>>>>
> >>>>>>>>>>> a) 2GB at 2GB
> >>>>>>>>>>> b) 32GB at 32GB
> >>>>>>>>>>> c) 512GB at 512GB
> >>>>>>>>>>> d) 8TB at 8TB
> >>>>>>>>>>> e) 128TB at 128TB
> >>>>>>>>>>>
> >>>>>>>>>>> (this is a nice rule of thumb if I understand it correctly :) )
> >>>>>>>>>>>
> >>>>>>>>>>> We should strive for device memory (maxram_size - ram_size) to fit
> >>>>>>>>>>> exactly into one of these slots (otherwise things get nasty).
> >>>>>>>>>>>
> >>>>>>>>>>> Depending on the ram_size, we might have simpler setups and can    
> >>>>>> support    
> >>>>>>>>>>> more configurations, no?
> >>>>>>>>>>>
> >>>>>>>>>>> E.g. ram_size <= 34GB, device_memory <= 512GB    
> >>>>>>>>>>> -> move ram into a) and b)
> >>>>>>>>>>> -> move device memory into c)    
> >>>>>>>>>>
> >>>>>>>>>> The issue is machvirt doesn't comply with that document.
> >>>>>>>>>> At the moment we have
> >>>>>>>>>> 0 -> 1GB MMIO
> >>>>>>>>>> 1GB -> 256GB RAM
> >>>>>>>>>> 256GB -> 512GB is theoretically reserved for IO but most is free.
> >>>>>>>>>> 512GB -> 1T is reserved for ECAM MMIO range. This is the top of our
> >>>>>>>>>> existing 40b GPA space.
> >>>>>>>>>>
> >>>>>>>>>> We don't want to change this address map due to legacy reasons.
[...]

> >> Also there is the problem of migration. How
> >> would you migrate between guests whose RAM is not laid out at the same
> >> place?
> > 
> > I'm not sure what you mean here. Boot a guest with a new memory map,
> > probably by explicitly asking for it with a new machine property,
> > which means a new virt machine version. Then migrate at will to any
> > host that supports that machine type.  
> My concern was rather about holes in the memory map matching reserved
> regions.
> >   
> >> I understood hotplug memory relied on a specific device_memory
> >> region. So do you mean we would have 2 contiguous regions?  
> > 
> > I think Igor wants one contiguous region for RAM, where additional
> > space can be reserved for hotplugging.  
> This is not compliant with the 2012 ARM white paper, although I don't
> really know if this document truly is a reference (I did not get any reply).
It's up to QEMU to pick the layout: if maxmem fits (up to 256GB) we could
accommodate the legacy requirement and put a single device_memory region in
the 1GB-256GB GPA gap; if it's more, we can move the whole device_memory
region to 2TB, 8TB ...
That keeps things manageable for us and fits the specs (if such exist).
We should make the selection of the next RAM base deterministic, if
possible, when the layout changes due to maxram size or IOVA, so that we
won't need compat knobs/checks to keep the machine migratable.
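e.g. roughly (untested sketch, thresholds made up for illustration):

    #include <stdint.h>

    #define GiB (1024ULL * 1024 * 1024)
    #define TiB (1024 * GiB)

    /* derive the device_memory base from the -m/maxmem arguments only,
     * so the same command line always yields the same guest layout */
    static uint64_t pick_device_memory_base(uint64_t ram_size,
                                            uint64_t maxram_size)
    {
        uint64_t hotplug_size = maxram_size - ram_size;

        if (maxram_size <= 255 * GiB) {
            return GiB + ram_size;     /* fits in the 1GB..256GB gap */
        } else if (hotplug_size <= 2 * TiB) {
            return 2 * TiB;            /* the 2TB..4TB slot */
        }
        return 8 * TiB;                /* next slot up */
    }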

[...]

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory Eric Auger
  2018-07-03 18:25   ` David Hildenbrand
@ 2018-07-18 13:05   ` Igor Mammedov
  2018-08-08  9:33     ` Auger Eric
  1 sibling, 1 reply; 62+ messages in thread
From: Igor Mammedov @ 2018-07-18 13:05 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, dgilbert, agraf, david,
	drjones, wei

On Tue,  3 Jul 2018 09:19:49 +0200
Eric Auger <eric.auger@redhat.com> wrote:

> We define a new hotpluggable RAM region (aka. device memory).
> Its base is 2TB GPA. This obviously requires 42b IPA support
> in KVM/ARM, FW and guest kernel. At the moment the device
> memory region is max 2TB.
> 
> This is largely inspired by device memory initialization in
> pc machine code.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
> ---
>  hw/arm/virt.c         | 104 ++++++++++++++++++++++++++++++++++++--------------
>  include/hw/arm/arm.h  |   2 +
>  include/hw/arm/virt.h |   1 +
>  3 files changed, 79 insertions(+), 28 deletions(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 5a4d0bf..6fefb78 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -59,6 +59,7 @@
>  #include "qapi/visitor.h"
>  #include "standard-headers/linux/input.h"
>  #include "hw/arm/smmuv3.h"
> +#include "hw/acpi/acpi.h"
>  
>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
>      static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
> @@ -94,34 +95,25 @@
>  
>  #define PLATFORM_BUS_NUM_IRQS 64
>  
> -/* RAM limit in GB. Since VIRT_MEM starts at the 1GB mark, this means
> - * RAM can go up to the 256GB mark, leaving 256GB of the physical
> - * address space unallocated and free for future use between 256G and 512G.
> - * If we need to provide more RAM to VMs in the future then we need to:
> - *  * allocate a second bank of RAM starting at 2TB and working up
> - *  * fix the DT and ACPI table generation code in QEMU to correctly
> - *    report two split lumps of RAM to the guest
> - *  * fix KVM in the host kernel to allow guests with >40 bit address spaces
> - * (We don't want to fill all the way up to 512GB with RAM because
> - * we might want it for non-RAM purposes later. Conversely it seems
> - * reasonable to assume that anybody configuring a VM with a quarter
> - * of a terabyte of RAM will be doing it on a host with more than a
> - * terabyte of physical address space.)
> - */
> -#define RAMLIMIT_GB 255
> -#define RAMLIMIT_BYTES (RAMLIMIT_GB * 1024ULL * 1024 * 1024)
> +#define SZ_64K 0x10000
> +#define SZ_1G (1024ULL * 1024 * 1024)
>  
>  /* Addresses and sizes of our components.
> - * 0..128MB is space for a flash device so we can run bootrom code such as UEFI.
> - * 128MB..256MB is used for miscellaneous device I/O.
> - * 256MB..1GB is reserved for possible future PCI support (ie where the
> - * PCI memory window will go if we add a PCI host controller).
> - * 1GB and up is RAM (which may happily spill over into the
> - * high memory region beyond 4GB).
> - * This represents a compromise between how much RAM can be given to
> - * a 32 bit VM and leaving space for expansion and in particular for PCI.
> - * Note that devices should generally be placed at multiples of 0x10000,
> + * 0..128MB is space for a flash device so we can run bootrom code such as UEFI,
> + * 128MB..256MB is used for miscellaneous device I/O,
> + * 256MB..1GB is used for PCI host controller,
> + * 1GB..256GB is RAM (not hotpluggable),
> + * 256GB..512GB: is left for device I/O (non RAM purpose),
> + * 512GB..1TB: high mem PCI MMIO region,
> + * 2TB..4TB is used for hot-pluggable DIMM (assumes 42b GPA is supported).
> + *
> + * Note that IO devices should generally be placed at multiples of 0x10000,
>   * to accommodate guests using 64K pages.
> + *
> + * Conversely it seems reasonable to assume that anybody configuring a VM
> + * with a quarter of a terabyte of RAM will be doing it on a host with more
> + * than a terabyte of physical address space.)
> + *
>   */
>  static const MemMapEntry a15memmap[] = {
>      /* Space up to 0x8000000 is reserved for a boot ROM */
> @@ -148,12 +140,13 @@ static const MemMapEntry a15memmap[] = {
>      [VIRT_PCIE_MMIO] =          { 0x10000000, 0x2eff0000 },
>      [VIRT_PCIE_PIO] =           { 0x3eff0000, 0x00010000 },
>      [VIRT_PCIE_ECAM] =          { 0x3f000000, 0x01000000 },
> -    [VIRT_MEM] =                { 0x40000000, RAMLIMIT_BYTES },
> +    [VIRT_MEM] =                { SZ_1G , 255 * SZ_1G },
>      /* Additional 64 MB redist region (can contain up to 512 redistributors) */
>      [VIRT_GIC_REDIST2] =        { 0x4000000000ULL, 0x4000000 },
>      [VIRT_PCIE_ECAM_HIGH] =     { 0x4010000000ULL, 0x10000000 },
>      /* Second PCIe window, 512GB wide at the 512GB boundary */
> -    [VIRT_PCIE_MMIO_HIGH] =   { 0x8000000000ULL, 0x8000000000ULL },
> +    [VIRT_PCIE_MMIO_HIGH] =     { 512 * SZ_1G, 512 * SZ_1G },
> +    [VIRT_HOTPLUG_MEM] =        { 2048 * SZ_1G, 2048 * SZ_1G },
>  };
>  
>  static const int a15irqmap[] = {
> @@ -1223,6 +1216,58 @@ static void create_secure_ram(VirtMachineState *vms,
>      g_free(nodename);
>  }
>  
> +static void create_device_memory(VirtMachineState *vms, MemoryRegion *sysmem)
> +{
> +    MachineState *ms = MACHINE(vms);
> +    uint64_t device_memory_size;
> +    uint64_t align = SZ_64K;
> +
> +    /* always allocate the device memory information */
> +    ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
> +
> +    if (vms->max_vm_phys_shift < 42) {
> +        /* device memory starts at 2TB whereas this VM supports less than
> +         * 2TB GPA */
> +        if (ms->maxram_size > ms->ram_size || ms->ram_slots) {
> +            MachineClass *mc = MACHINE_GET_CLASS(ms);
> +
> +            error_report("\"-memory 'slots|maxmem'\" is not supported by %s "
> +                         "since KVM does not support more than 41b IPA",
> +                         mc->name);
> +            exit(EXIT_FAILURE);
> +        }
> +        return;
> +    }
> +
> +    if (ms->ram_slots > ACPI_MAX_RAM_SLOTS) {
> +        error_report("unsupported number of memory slots: %"PRIu64,
> +                     ms->ram_slots);
> +        exit(EXIT_FAILURE);
> +    }
> +
> +    if (QEMU_ALIGN_UP(ms->maxram_size, align) != ms->maxram_size) {
> +        error_report("maximum memory size must be aligned to multiple of 0x%"
> +                     PRIx64, align);
> +            exit(EXIT_FAILURE);
> +    }
> +
> +    ms->device_memory->base = vms->memmap[VIRT_HOTPLUG_MEM].base;
> +    device_memory_size = ms->maxram_size - ms->ram_size;
> +
> +    if (device_memory_size > vms->memmap[VIRT_HOTPLUG_MEM].size) {
> +        error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT,
> +                         ms->maxram_size);
> +        exit(EXIT_FAILURE);
> +    }
> +
> +    memory_region_init(&ms->device_memory->mr, OBJECT(vms),
> +                       "device-memory", device_memory_size);
> +    memory_region_add_subregion(sysmem, ms->device_memory->base,
> +                                &ms->device_memory->mr);

> +    vms->bootinfo.device_memory_start = ms->device_memory->base;
> +    vms->bootinfo.device_memory_size = device_memory_size;
why do we need to duplicate it in bootinfo?
(I'd try to avoid using bootinfo and use the original source instead
where it's needed)
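e.g. (untested fragment, assuming the usual virt/boot context is in scope)
the DT code could read the machine state directly:

    /* instead of copies in bootinfo, consume ms->device_memory where the
     * /memory nodes are built */
    MachineState *ms = MACHINE(vms);

    if (ms->device_memory) {
        fdt_add_memory_node(fdt, acells, ms->device_memory->base,
                            scells,
                            memory_region_size(&ms->device_memory->mr), -1);
    }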


> +}
> +
>  static void *machvirt_dtb(const struct arm_boot_info *binfo, int *fdt_size)
>  {
>      const VirtMachineState *board = container_of(binfo, VirtMachineState,
> @@ -1430,7 +1475,8 @@ static void machvirt_init(MachineState *machine)
>      vms->smp_cpus = smp_cpus;
>  
>      if (machine->ram_size > vms->memmap[VIRT_MEM].size) {
> -        error_report("mach-virt: cannot model more than %dGB RAM", RAMLIMIT_GB);
> +        error_report("mach-virt: cannot model more than %dGB RAM",
> +                     (int)(vms->memmap[VIRT_MEM].size / SZ_1G));
>          exit(1);
>      }
>  
> @@ -1525,6 +1571,8 @@ static void machvirt_init(MachineState *machine)
>                                           machine->ram_size);
>      memory_region_add_subregion(sysmem, vms->memmap[VIRT_MEM].base, ram);
>  
> +    create_device_memory(vms, sysmem);
> +
>      create_flash(vms, sysmem, secure_sysmem ? secure_sysmem : sysmem);
>  
>      create_gic(vms, pic);
> diff --git a/include/hw/arm/arm.h b/include/hw/arm/arm.h
> index ffed392..76269e6 100644
> --- a/include/hw/arm/arm.h
> +++ b/include/hw/arm/arm.h
> @@ -116,6 +116,8 @@ struct arm_boot_info {
>      bool secure_board_setup;
>  
>      arm_endianness endianness;
> +    hwaddr device_memory_start;
> +    hwaddr device_memory_size;
>  };
>  
>  /**
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index 91f6de2..173938d 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -78,6 +78,7 @@ enum {
>      VIRT_GPIO,
>      VIRT_SECURE_UART,
>      VIRT_SECURE_MEM,
> +    VIRT_HOTPLUG_MEM,
>  };
>  
>  typedef enum VirtIOMMUType {

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 08/15] hw/arm/boot: introduce fdt_add_memory_node helper
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 08/15] hw/arm/boot: introduce fdt_add_memory_node helper Eric Auger
@ 2018-07-18 14:04   ` Igor Mammedov
  2018-08-08  9:44     ` Auger Eric
  0 siblings, 1 reply; 62+ messages in thread
From: Igor Mammedov @ 2018-07-18 14:04 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, dgilbert, agraf, david,
	drjones, wei

On Tue,  3 Jul 2018 09:19:51 +0200
Eric Auger <eric.auger@redhat.com> wrote:

> From: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> 
> We introduce a helper to create a memory node.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> 
> ---
> 
> v1 -> v2:
> - nop of existing /memory nodes was already handled
> ---
>  hw/arm/boot.c | 54 ++++++++++++++++++++++++++++++++++--------------------
>  1 file changed, 34 insertions(+), 20 deletions(-)
> 
> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> index e09201c..5243a25 100644
> --- a/hw/arm/boot.c
> +++ b/hw/arm/boot.c
> @@ -413,6 +413,36 @@ static void set_kernel_args_old(const struct arm_boot_info *info,
>      }
>  }
>  
> +static int fdt_add_memory_node(void *fdt, uint32_t acells, hwaddr mem_base,
> +                               uint32_t scells, hwaddr mem_len,
> +                               int numa_node_id)
> +{
> +    char *nodename = NULL;
> +    int ret;
> +
> +    nodename = g_strdup_printf("/memory@%" PRIx64, mem_base);
> +    qemu_fdt_add_subnode(fdt, nodename);
> +    qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
> +    ret = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg", acells, mem_base,
> +                                       scells, mem_len);
> +    if (ret < 0) {
> +        fprintf(stderr, "couldn't set %s/reg\n", nodename);
> +        goto out;
> +    }
> +    if (numa_node_id < 0) {
> +        goto out;
> +    }
> +
> +    ret = qemu_fdt_setprop_cell(fdt, nodename, "numa-node-id", numa_node_id);
> +    if (ret < 0) {
> +        fprintf(stderr, "couldn't set %s/numa-node-id\n", nodename);
> +    }
> +
> +out:
> +    g_free(nodename);
> +    return ret;
> +}
> +

An unrelated question from the hotplug PoV:
is the entry size fixed?
Can we estimate the exact size for #slots DIMMs and reserve it in advance
in the FDT 'rom'?
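Per the flattened-tree format the cost of a /memory node should indeed be
fixed once acells/scells are known; rough estimate (untested
back-of-the-envelope, the ALIGN4 helper is made up):

    #include <stdint.h>

    #define ALIGN4(x) (((x) + 3) & ~3)

    /* FDT_BEGIN_NODE + padded node name, then "device_type" and "reg"
     * properties (each: token + len + nameoff + padded value), then
     * FDT_END_NODE; the property name strings land in the strings block
     * only once. Add 16 bytes when a numa-node-id cell is present. */
    static int memory_node_size_estimate(uint32_t acells, uint32_t scells)
    {
        int size = 0;

        size += 4 + ALIGN4(sizeof("memory@ffffffffffff")); /* node name */
        size += 12 + ALIGN4(sizeof("memory"));              /* device_type */
        size += 12 + (acells + scells) * 4;                 /* reg */
        size += 4;                                          /* FDT_END_NODE */
        return size; /* 76 bytes with acells = scells = 2 */
    }

so reserving #slots * that estimate up front looks doable.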

>  static void fdt_add_psci_node(void *fdt)
>  {
>      uint32_t cpu_suspend_fn;
> @@ -492,7 +522,6 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
>      void *fdt = NULL;
>      int size, rc, n = 0;
>      uint32_t acells, scells;
> -    char *nodename;
>      unsigned int i;
>      hwaddr mem_base, mem_len;
>      char **node_path;
> @@ -566,35 +595,20 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
>          mem_base = binfo->loader_start;
>          for (i = 0; i < nb_numa_nodes; i++) {
>              mem_len = numa_info[i].node_mem;
> -            nodename = g_strdup_printf("/memory@%" PRIx64, mem_base);
> -            qemu_fdt_add_subnode(fdt, nodename);
> -            qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
> -            rc = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg",
> -                                              acells, mem_base,
> -                                              scells, mem_len);
> +            rc = fdt_add_memory_node(fdt, acells, mem_base,
> +                                     scells, mem_len, i);
>              if (rc < 0) {
> -                fprintf(stderr, "couldn't set %s/reg for node %d\n", nodename,
> -                        i);
>                  goto fail;
>              }
>  
> -            qemu_fdt_setprop_cell(fdt, nodename, "numa-node-id", i);
>              mem_base += mem_len;
> -            g_free(nodename);
>          }
>      } else {
> -        nodename = g_strdup_printf("/memory@%" PRIx64, binfo->loader_start);
> -        qemu_fdt_add_subnode(fdt, nodename);
> -        qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
> -
> -        rc = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg",
> -                                          acells, binfo->loader_start,
> -                                          scells, binfo->ram_size);
> +        rc = fdt_add_memory_node(fdt, acells, binfo->loader_start,
> +                                 scells, binfo->ram_size, -1);
>          if (rc < 0) {
> -            fprintf(stderr, "couldn't set %s reg\n", nodename);
>              goto fail;
>          }
> -        g_free(nodename);
>      }
>  
>      rc = fdt_path_offset(fdt, "/chosen");
Nice cleanup, but I won't stop here just yet if hotplug is to be considered.

I see arm_load_dtb() as a hack called from every board,
where we dump everything that might be related to the DTB regardless
of whether it's generic for every board or board-specific stuff.

Could we split it into several logical parts that each do a single thing
and are preferably used only where they are actually needed?
Something along the following lines
(the cleanups/refactoring should be a series separate from the pcdimm one, as
it's self-sufficient, would be easier to review/merge, and could simplify the
follow-up pcdimm series):

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index e09201c..9c41efd 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -486,9 +486,6 @@ static void fdt_add_psci_node(void *fdt)
     qemu_fdt_setprop_cell(fdt, "/psci", "migrate", migrate_fn);
 }
 
-int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
-                 hwaddr addr_limit, AddressSpace *as)
-{
...

@@ -1158,9 +1158,14 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
     }
 
     if (!info->skip_dtb_autoload && have_dtb(info)) {
-        if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as) < 0) {
-            exit(1);
-        }
+        load_dtb_from_file() /* reuse generic machine_get_dtb() ??? */
+        create_dtb_memory_nodes() /* non numa variant */
+        /* move out mach-virt specific binfo->get_dtb into the board */
+        /* move out modify_dtb(), which is a vexpress hack, into vexpress */
+        /* move out fdt_add_psci_node() into mach-virt */
+        create_dtb_initrd_kernel_nodes()
+        dump_fdt()
+        rom_add_blob_fixed_as()
     }
 }
 
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 281ddcd..7686abf 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1285,9 +1285,12 @@ void virt_machine_done(Notifier *notifier, void *data)
                                        vms->memmap[VIRT_PLATFORM_BUS].size,
                                        vms->irqmap[VIRT_PLATFORM_BUS]);
     }
-    if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as) < 0) {
-        exit(1);
-    }
+    load_dtb_from_file()/get_dtb() stuff
+    virt_create_dtb_memory_nodes() /* incl. numa variant and later pcdimm nodes */
+    fdt_add_psci_node()                         
+    create_dtb_initrd_kernel_nodes()                                         
+    dump_fdt()                                                               
+    rom_add_blob_fixed_as()
 
     virt_acpi_setup(vms);
     virt_build_smbios(vms);

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB
  2018-07-03  7:19 [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB Eric Auger
                   ` (14 preceding siblings ...)
  2018-07-03  7:19 ` [Qemu-devel] [RFC v3 15/15] hw/arm/virt: Add nvdimm and nvdimm-persistence options Eric Auger
@ 2018-07-18 14:08 ` Igor Mammedov
  2018-10-18 12:56   ` Auger Eric
  2018-10-03 13:49 ` Auger Eric
  16 siblings, 1 reply; 62+ messages in thread
From: Igor Mammedov @ 2018-07-18 14:08 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, dgilbert, agraf, david,
	drjones, wei

On Tue,  3 Jul 2018 09:19:43 +0200
Eric Auger <eric.auger@redhat.com> wrote:

> This series aims at supporting PCDIMM/NVDIMM intantiation in
> machvirt at 2TB guest physical address.
> 
> This is achieved in 3 steps:
> 1) support more than 40b IPA/GPA
will it work for TCG as well?
(important from a 'make check' PoV, and also for cases where there is no ARM
system available to test/play with the feature)



> 2) support PCDIMM instantiation
> 3) support NVDIMM instantiation
> 
> This series reuses/rebases patches initially submitted by Shameer in [1]
> and Kwangwoo in [2].
> 
> I put all parts all together for consistency and due to dependencies
> however as soon as the kernel dependency is resolved we can consider
> upstreaming them separately.
> 
> Support more than 40b IPA/GPA [ patches 1 - 5 ]
> -----------------------------------------------
> was "[RFC 0/6] KVM/ARM: Dynamic and larger GPA size"
> 
> At the moment the guest physical address space is limited to 40b
> due to KVM limitations. [0] bumps this limitation and allows to
> create a VM with up to 52b GPA address space.
> 
> With this series, QEMU creates a virt VM with the max IPA range
> reported by the host kernel or 40b by default.
> 
> This choice can be overriden by using the -machine kvm-type=<bits>
> option with bits within [40, 52]. If <bits> are not supported by
> the host, the legacy 40b value is used.
> 
> Currently the EDK2 FW also hardcodes the max number of GPA bits to
> 40. This will need to be fixed.
> 
> PCDIMM Support [ patches 6 - 11 ]
> ---------------------------------
> was "[RFC 0/5] ARM virt: Support PC-DIMM at 2TB"
> 
> We instantiate the device_memory at 2TB. Using it obviously requires
> at least 42b of IPA/GPA. While its max capacity is currently limited
> to 2TB, the actual size depends on the initial guest RAM size and
> maxmem parameter.
> 
> Actual hot-plug and hot-unplug of PC-DIMM is not suported due to lack
> of support of those features in baremetal.
> 
> NVDIMM support [ patches 12 - 15 ]
> ----------------------------------
> 
> Once the memory hotplug framework is in place it is fairly
> straightforward to add support for NVDIMM. the machine "nvdimm" option
> turns the capability on.
> 
> Best Regards
> 
> Eric
> 
> References:
> 
> [0] [PATCH v3 00/20] arm64: Dynamic & 52bit IPA support
> https://www.spinics.net/lists/kernel/msg2841735.html
> 
> [1] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
> http://patchwork.ozlabs.org/cover/914694/
> 
> [2] [RFC PATCH 0/3] add nvdimm support on AArch64 virt platform
> https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04599.html
> 
> Tests:
> - On Cavium Gigabyte, a 48b VM was created.
> - Migration tests were performed between kernel supporting the
>   feature and destination kernel not suporting it
> - test with ACPI: to overcome the limitation of EDK2 FW, virt
>   memory map was hacked to move the device memory below 1TB.
> 
> This series can be found at:
> https://github.com/eauger/qemu/tree/v2.12.0-dimm-2tb-v3
> 
> History:
> 
> v2 -> v3:
> - fix pc_q35 and pc_piix compilation error
> - kwangwoo's email being not valid anymore, remove his address
> 
> v1 -> v2:
> - kvm_get_max_vm_phys_shift moved in arch specific file
> - addition of NVDIMM part
> - single series
> - rebase on David's refactoring
> 
> v1:
> - was "[RFC 0/6] KVM/ARM: Dynamic and larger GPA size"
> - was "[RFC 0/5] ARM virt: Support PC-DIMM at 2TB"
> 
> Best Regards
> 
> Eric
> 
> 
> Eric Auger (9):
>   linux-headers: header update for KVM/ARM KVM_ARM_GET_MAX_VM_PHYS_SHIFT
>   hw/boards: Add a MachineState parameter to kvm_type callback
>   kvm: add kvm_arm_get_max_vm_phys_shift
>   hw/arm/virt: support kvm_type property
>   hw/arm/virt: handle max_vm_phys_shift conflicts on migration
>   hw/arm/virt: Allocate device_memory
>   acpi: move build_srat_hotpluggable_memory to generic ACPI source
>   hw/arm/boot: Expose the pmem nodes in the DT
>   hw/arm/virt: Add nvdimm and nvdimm-persistence options
> 
> Kwangwoo Lee (2):
>   nvdimm: use configurable ACPI IO base and size
>   hw/arm/virt: Add nvdimm hot-plug infrastructure
> 
> Shameer Kolothum (4):
>   hw/arm/virt: Add memory hotplug framework
>   hw/arm/boot: introduce fdt_add_memory_node helper
>   hw/arm/boot: Expose the PC-DIMM nodes in the DT
>   hw/arm/virt-acpi-build: Add PC-DIMM in SRAT
> 
>  accel/kvm/kvm-all.c                            |   2 +-
>  default-configs/arm-softmmu.mak                |   4 +
>  hw/acpi/aml-build.c                            |  51 ++++
>  hw/acpi/nvdimm.c                               |  28 ++-
>  hw/arm/boot.c                                  | 123 +++++++--
>  hw/arm/virt-acpi-build.c                       |  10 +
>  hw/arm/virt.c                                  | 330 ++++++++++++++++++++++---
>  hw/i386/acpi-build.c                           |  49 ----
>  hw/i386/pc_piix.c                              |   8 +-
>  hw/i386/pc_q35.c                               |   8 +-
>  hw/ppc/mac_newworld.c                          |   2 +-
>  hw/ppc/mac_oldworld.c                          |   2 +-
>  hw/ppc/spapr.c                                 |   2 +-
>  include/hw/acpi/aml-build.h                    |   3 +
>  include/hw/arm/arm.h                           |   2 +
>  include/hw/arm/virt.h                          |   7 +
>  include/hw/boards.h                            |   2 +-
>  include/hw/mem/nvdimm.h                        |  12 +
>  include/standard-headers/linux/virtio_config.h |  16 +-
>  linux-headers/asm-mips/unistd.h                |  18 +-
>  linux-headers/asm-powerpc/kvm.h                |   1 +
>  linux-headers/linux/kvm.h                      |  16 ++
>  target/arm/kvm.c                               |   9 +
>  target/arm/kvm_arm.h                           |  16 ++
>  24 files changed, 597 insertions(+), 124 deletions(-)
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory
  2018-07-18 13:00                               ` Igor Mammedov
@ 2018-08-08  9:33                                 ` Auger Eric
  2018-08-09  8:45                                   ` Igor Mammedov
  0 siblings, 1 reply; 62+ messages in thread
From: Auger Eric @ 2018-08-08  9:33 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Andrew Jones, wei, peter.maydell, David Hildenbrand, qemu-devel,
	Shameerali Kolothum Thodi, agraf, qemu-arm, david, dgilbert,
	eric.auger.pro

Hi Igor,

On 07/18/2018 03:00 PM, Igor Mammedov wrote:
[...]
>>>
>>> I think Igor wants one contiguous region for RAM, where additional
>>> space can be reserved for hotplugging.  
>> This is not compliant with the 2012 ARM white paper, although I don't
>> really know if this document truly is a reference (I did not get any reply).
> It's up to QEMU to pick the layout: if maxmem fits (up to 256GB) we could
> accommodate the legacy requirement and put a single device_memory region in
> the 1GB-256GB GPA gap; if it's more, we can move the whole device_memory
> region to 2TB, 8TB ...
> That keeps things manageable for us and fits the specs (if such exist).
> We should make the selection of the next RAM base deterministic, if
> possible, when the layout changes due to maxram size or IOVA, so that we
> won't need compat knobs/checks to keep the machine migratable.
Sorry for the delay. I was out of the office these past weeks.

OK, understood. Your preferred approach is to have a contiguous memory
region (initial + hotplug). So this depends on the FW's capability to
support a flexible RAM base. Let's see how this dependency gets resolved.

This series does not bump the non-hotpluggable memory region limit, which
is still 255GB. The only way to add more memory is through PCDIMM or
NVDIMM (max 2TB atm). To do so you need to add the ,maxmem and ,slots
options, which need to be the same on both source and dest, right, plus
the PCDIMM/NVDIMM device option lines? Also the series checks that the
destination has at least the same IPA range capability as the source,
which determines whether the requested device_memory size can be
accommodated. At the moment I fail to see what other compat knobs I must
be prepared to handle.
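i.e. something like this on both ends (illustrative command line only,
sizes and ids made up):

    qemu-system-aarch64 -M virt,kvm-type=44 \
        -m 16G,maxmem=1040G,slots=2 \
        -object memory-backend-ram,id=mem1,size=1T \
        -device pc-dimm,id=dimm1,memdev=mem1 ...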

Thanks

Eric
> 
> [...]
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory
  2018-07-18 13:05   ` Igor Mammedov
@ 2018-08-08  9:33     ` Auger Eric
  0 siblings, 0 replies; 62+ messages in thread
From: Auger Eric @ 2018-08-08  9:33 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, dgilbert, agraf, david,
	drjones, wei

Hi Igor,
On 07/18/2018 03:05 PM, Igor Mammedov wrote:
> On Tue,  3 Jul 2018 09:19:49 +0200
> Eric Auger <eric.auger@redhat.com> wrote:
> 
>> We define a new hotpluggable RAM region (aka. device memory).
>> Its base is 2TB GPA. This obviously requires 42b IPA support
>> in KVM/ARM, FW and guest kernel. At the moment the device
>> memory region is max 2TB.
>>
>> This is largely inspired by device memory initialization in
>> pc machine code.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
>> ---
>>  hw/arm/virt.c         | 104 ++++++++++++++++++++++++++++++++++++--------------
>>  include/hw/arm/arm.h  |   2 +
>>  include/hw/arm/virt.h |   1 +
>>  3 files changed, 79 insertions(+), 28 deletions(-)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index 5a4d0bf..6fefb78 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -59,6 +59,7 @@
>>  #include "qapi/visitor.h"
>>  #include "standard-headers/linux/input.h"
>>  #include "hw/arm/smmuv3.h"
>> +#include "hw/acpi/acpi.h"
>>  
>>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
>>      static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
>> @@ -94,34 +95,25 @@
>>  
>>  #define PLATFORM_BUS_NUM_IRQS 64
>>  
>> -/* RAM limit in GB. Since VIRT_MEM starts at the 1GB mark, this means
>> - * RAM can go up to the 256GB mark, leaving 256GB of the physical
>> - * address space unallocated and free for future use between 256G and 512G.
>> - * If we need to provide more RAM to VMs in the future then we need to:
>> - *  * allocate a second bank of RAM starting at 2TB and working up
>> - *  * fix the DT and ACPI table generation code in QEMU to correctly
>> - *    report two split lumps of RAM to the guest
>> - *  * fix KVM in the host kernel to allow guests with >40 bit address spaces
>> - * (We don't want to fill all the way up to 512GB with RAM because
>> - * we might want it for non-RAM purposes later. Conversely it seems
>> - * reasonable to assume that anybody configuring a VM with a quarter
>> - * of a terabyte of RAM will be doing it on a host with more than a
>> - * terabyte of physical address space.)
>> - */
>> -#define RAMLIMIT_GB 255
>> -#define RAMLIMIT_BYTES (RAMLIMIT_GB * 1024ULL * 1024 * 1024)
>> +#define SZ_64K 0x10000
>> +#define SZ_1G (1024ULL * 1024 * 1024)
>>  
>>  /* Addresses and sizes of our components.
>> - * 0..128MB is space for a flash device so we can run bootrom code such as UEFI.
>> - * 128MB..256MB is used for miscellaneous device I/O.
>> - * 256MB..1GB is reserved for possible future PCI support (ie where the
>> - * PCI memory window will go if we add a PCI host controller).
>> - * 1GB and up is RAM (which may happily spill over into the
>> - * high memory region beyond 4GB).
>> - * This represents a compromise between how much RAM can be given to
>> - * a 32 bit VM and leaving space for expansion and in particular for PCI.
>> - * Note that devices should generally be placed at multiples of 0x10000,
>> + * 0..128MB is space for a flash device so we can run bootrom code such as UEFI,
>> + * 128MB..256MB is used for miscellaneous device I/O,
>> + * 256MB..1GB is used for PCI host controller,
>> + * 1GB..256GB is RAM (not hotpluggable),
>> + * 256GB..512GB: is left for device I/O (non RAM purpose),
>> + * 512GB..1TB: high mem PCI MMIO region,
>> + * 2TB..4TB is used for hot-pluggable DIMM (assumes 42b GPA is supported).
>> + *
>> + * Note that IO devices should generally be placed at multiples of 0x10000,
>>   * to accommodate guests using 64K pages.
>> + *
>> + * Conversely it seems reasonable to assume that anybody configuring a VM
>> + * with a quarter of a terabyte of RAM will be doing it on a host with more
>> + * than a terabyte of physical address space.)
>> + *
>>   */
>>  static const MemMapEntry a15memmap[] = {
>>      /* Space up to 0x8000000 is reserved for a boot ROM */
>> @@ -148,12 +140,13 @@ static const MemMapEntry a15memmap[] = {
>>      [VIRT_PCIE_MMIO] =          { 0x10000000, 0x2eff0000 },
>>      [VIRT_PCIE_PIO] =           { 0x3eff0000, 0x00010000 },
>>      [VIRT_PCIE_ECAM] =          { 0x3f000000, 0x01000000 },
>> -    [VIRT_MEM] =                { 0x40000000, RAMLIMIT_BYTES },
>> +    [VIRT_MEM] =                { SZ_1G , 255 * SZ_1G },
>>      /* Additional 64 MB redist region (can contain up to 512 redistributors) */
>>      [VIRT_GIC_REDIST2] =        { 0x4000000000ULL, 0x4000000 },
>>      [VIRT_PCIE_ECAM_HIGH] =     { 0x4010000000ULL, 0x10000000 },
>>      /* Second PCIe window, 512GB wide at the 512GB boundary */
>> -    [VIRT_PCIE_MMIO_HIGH] =   { 0x8000000000ULL, 0x8000000000ULL },
>> +    [VIRT_PCIE_MMIO_HIGH] =     { 512 * SZ_1G, 512 * SZ_1G },
>> +    [VIRT_HOTPLUG_MEM] =        { 2048 * SZ_1G, 2048 * SZ_1G },
>>  };
>>  
>>  static const int a15irqmap[] = {
>> @@ -1223,6 +1216,58 @@ static void create_secure_ram(VirtMachineState *vms,
>>      g_free(nodename);
>>  }
>>  
>> +static void create_device_memory(VirtMachineState *vms, MemoryRegion *sysmem)
>> +{
>> +    MachineState *ms = MACHINE(vms);
>> +    uint64_t device_memory_size;
>> +    uint64_t align = SZ_64K;
>> +
>> +    /* always allocate the device memory information */
>> +    ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
>> +
>> +    if (vms->max_vm_phys_shift < 42) {
>> +        /* device memory starts at 2TB whereas this VM supports less than
>> +         * 2TB GPA */
>> +        if (ms->maxram_size > ms->ram_size || ms->ram_slots) {
>> +            MachineClass *mc = MACHINE_GET_CLASS(ms);
>> +
>> +            error_report("\"-memory 'slots|maxmem'\" is not supported by %s "
>> +                         "since KVM does not support more than 41b IPA",
>> +                         mc->name);
>> +            exit(EXIT_FAILURE);
>> +        }
>> +        return;
>> +    }
>> +
>> +    if (ms->ram_slots > ACPI_MAX_RAM_SLOTS) {
>> +        error_report("unsupported number of memory slots: %"PRIu64,
>> +                     ms->ram_slots);
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    if (QEMU_ALIGN_UP(ms->maxram_size, align) != ms->maxram_size) {
>> +        error_report("maximum memory size must be aligned to multiple of 0x%"
>> +                     PRIx64, align);
>> +            exit(EXIT_FAILURE);
>> +    }
>> +
>> +    ms->device_memory->base = vms->memmap[VIRT_HOTPLUG_MEM].base;
>> +    device_memory_size = ms->maxram_size - ms->ram_size;
>> +
>> +    if (device_memory_size > vms->memmap[VIRT_HOTPLUG_MEM].size) {
>> +        error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT,
>> +                         ms->maxram_size);
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    memory_region_init(&ms->device_memory->mr, OBJECT(vms),
>> +                       "device-memory", device_memory_size);
>> +    memory_region_add_subregion(sysmem, ms->device_memory->base,
>> +                                &ms->device_memory->mr);
> 
>> +    vms->bootinfo.device_memory_start = ms->device_memory->base;
>> +    vms->bootinfo.device_memory_size = device_memory_size;
> why do we need to duplicate it in bootinfo?
> (I'd try to avoid using bootinfo and use the original source instead
> where it's needed)
agreed. Not needed.
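
Something like this should do instead, as a rough sketch (assuming the
DT-building code can reach the MachineState via qdev_get_machine()):

    MachineState *ms = MACHINE(qdev_get_machine());

    if (ms->device_memory) {
        hwaddr base = ms->device_memory->base;
        uint64_t size = memory_region_size(&ms->device_memory->mr);

        fdt_add_memory_node(fdt, acells, base, scells, size, -1);
    }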

Thanks

Eric
> 
> 
>> +}
>> +
>>  static void *machvirt_dtb(const struct arm_boot_info *binfo, int *fdt_size)
>>  {
>>      const VirtMachineState *board = container_of(binfo, VirtMachineState,
>> @@ -1430,7 +1475,8 @@ static void machvirt_init(MachineState *machine)
>>      vms->smp_cpus = smp_cpus;
>>  
>>      if (machine->ram_size > vms->memmap[VIRT_MEM].size) {
>> -        error_report("mach-virt: cannot model more than %dGB RAM", RAMLIMIT_GB);
>> +        error_report("mach-virt: cannot model more than %dGB RAM",
>> +                     (int)(vms->memmap[VIRT_MEM].size / SZ_1G));
>>          exit(1);
>>      }
>>  
>> @@ -1525,6 +1571,8 @@ static void machvirt_init(MachineState *machine)
>>                                           machine->ram_size);
>>      memory_region_add_subregion(sysmem, vms->memmap[VIRT_MEM].base, ram);
>>  
>> +    create_device_memory(vms, sysmem);
>> +
>>      create_flash(vms, sysmem, secure_sysmem ? secure_sysmem : sysmem);
>>  
>>      create_gic(vms, pic);
>> diff --git a/include/hw/arm/arm.h b/include/hw/arm/arm.h
>> index ffed392..76269e6 100644
>> --- a/include/hw/arm/arm.h
>> +++ b/include/hw/arm/arm.h
>> @@ -116,6 +116,8 @@ struct arm_boot_info {
>>      bool secure_board_setup;
>>  
>>      arm_endianness endianness;
>> +    hwaddr device_memory_start;
>> +    hwaddr device_memory_size;
>>  };
>>  
>>  /**
>> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
>> index 91f6de2..173938d 100644
>> --- a/include/hw/arm/virt.h
>> +++ b/include/hw/arm/virt.h
>> @@ -78,6 +78,7 @@ enum {
>>      VIRT_GPIO,
>>      VIRT_SECURE_UART,
>>      VIRT_SECURE_MEM,
>> +    VIRT_HOTPLUG_MEM,
>>  };
>>  
>>  typedef enum VirtIOMMUType {
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 08/15] hw/arm/boot: introduce fdt_add_memory_node helper
  2018-07-18 14:04   ` Igor Mammedov
@ 2018-08-08  9:44     ` Auger Eric
  2018-08-09  8:57       ` Igor Mammedov
  0 siblings, 1 reply; 62+ messages in thread
From: Auger Eric @ 2018-08-08  9:44 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, dgilbert, agraf, david,
	drjones, wei

Hi Igor,

On 07/18/2018 04:04 PM, Igor Mammedov wrote:
> On Tue,  3 Jul 2018 09:19:51 +0200
> Eric Auger <eric.auger@redhat.com> wrote:
> 
>> From: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
>>
>> We introduce a helper to create a memory node.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
>>
>> ---
>>
>> v1 -> v2:
>> - nop of existing /memory nodes was already handled
>> ---
>>  hw/arm/boot.c | 54 ++++++++++++++++++++++++++++++++++--------------------
>>  1 file changed, 34 insertions(+), 20 deletions(-)
>>
>> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
>> index e09201c..5243a25 100644
>> --- a/hw/arm/boot.c
>> +++ b/hw/arm/boot.c
>> @@ -413,6 +413,36 @@ static void set_kernel_args_old(const struct arm_boot_info *info,
>>      }
>>  }
>>  
>> +static int fdt_add_memory_node(void *fdt, uint32_t acells, hwaddr mem_base,
>> +                               uint32_t scells, hwaddr mem_len,
>> +                               int numa_node_id)
>> +{
>> +    char *nodename = NULL;
>> +    int ret;
>> +
>> +    nodename = g_strdup_printf("/memory@%" PRIx64, mem_base);
>> +    qemu_fdt_add_subnode(fdt, nodename);
>> +    qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
>> +    ret = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg", acells, mem_base,
>> +                                       scells, mem_len);
>> +    if (ret < 0) {
>> +        fprintf(stderr, "couldn't set %s/reg\n", nodename);
>> +        goto out;
>> +    }
>> +    if (numa_node_id < 0) {
>> +        goto out;
>> +    }
>> +
>> +    ret = qemu_fdt_setprop_cell(fdt, nodename, "numa-node-id", numa_node_id);
>> +    if (ret < 0) {
>> +        fprintf(stderr, "couldn't set %s/numa-node-id\n", nodename);
>> +    }
>> +
>> +out:
>> +    g_free(nodename);
>> +    return ret;
>> +}
>> +
> 
> an unrelated question from the hotplug POV:
> is the entry size fixed?
Sorry, I don't get which entry you are referring to.
> can we estimate the exact size for #slots DIMMs and reserve it in advance
> in the FDT 'rom'?
Not sure I get your drift either.

patch "[RFC v3 09/15] hw/arm/boot: Expose the PC-DIMM nodes in the DT"
builds a DT node for each DIMM, by enumerating the MemoryDeviceInfoList.
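
Roughly, the enumeration in that patch looks like this (simplified
sketch, error handling omitted):

    MemoryDeviceInfoList *info_list = qmp_memory_device_list(NULL);
    MemoryDeviceInfoList *info;

    for (info = info_list; info; info = info->next) {
        if (info->value->type == MEMORY_DEVICE_INFO_KIND_DIMM) {
            PCDIMMDeviceInfo *di = info->value->u.dimm.data;

            /* one /memory node per plugged DIMM */
            fdt_add_memory_node(fdt, acells, di->addr, scells,
                                di->size, di->node);
        }
    }
    qapi_free_MemoryDeviceInfoList(info_list);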

> 
>>  static void fdt_add_psci_node(void *fdt)
>>  {
>>      uint32_t cpu_suspend_fn;
>> @@ -492,7 +522,6 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
>>      void *fdt = NULL;
>>      int size, rc, n = 0;
>>      uint32_t acells, scells;
>> -    char *nodename;
>>      unsigned int i;
>>      hwaddr mem_base, mem_len;
>>      char **node_path;
>> @@ -566,35 +595,20 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
>>          mem_base = binfo->loader_start;
>>          for (i = 0; i < nb_numa_nodes; i++) {
>>              mem_len = numa_info[i].node_mem;
>> -            nodename = g_strdup_printf("/memory@%" PRIx64, mem_base);
>> -            qemu_fdt_add_subnode(fdt, nodename);
>> -            qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
>> -            rc = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg",
>> -                                              acells, mem_base,
>> -                                              scells, mem_len);
>> +            rc = fdt_add_memory_node(fdt, acells, mem_base,
>> +                                     scells, mem_len, i);
>>              if (rc < 0) {
>> -                fprintf(stderr, "couldn't set %s/reg for node %d\n", nodename,
>> -                        i);
>>                  goto fail;
>>              }
>>  
>> -            qemu_fdt_setprop_cell(fdt, nodename, "numa-node-id", i);
>>              mem_base += mem_len;
>> -            g_free(nodename);
>>          }
>>      } else {
>> -        nodename = g_strdup_printf("/memory@%" PRIx64, binfo->loader_start);
>> -        qemu_fdt_add_subnode(fdt, nodename);
>> -        qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
>> -
>> -        rc = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg",
>> -                                          acells, binfo->loader_start,
>> -                                          scells, binfo->ram_size);
>> +        rc = fdt_add_memory_node(fdt, acells, binfo->loader_start,
>> +                                 scells, binfo->ram_size, -1);
>>          if (rc < 0) {
>> -            fprintf(stderr, "couldn't set %s reg\n", nodename);
>>              goto fail;
>>          }
>> -        g_free(nodename);
>>      }
>>  
>>      rc = fdt_path_offset(fdt, "/chosen");
> nice cleanup, but I won't stop here just yet if hotplug is to be considered.
> 
> I see arm_load_dtb() as a hack called from every board,
> where we dump everything that might be related to the DTB regardless
> of whether it's generic for every board or board-specific stuff.
> 
> Could we split it into several logical parts that each do a single thing
> and preferably are used only where they are actually needed?
> Something along the following lines:
> (cleanups/refactoring should be separate from the pcdimm series as it's
> self-sufficient; it would be easier to review/merge and could simplify
> the follow-up pcdimm series):
The refactoring of arm_load_dtb() may be relevant indeed but I prefer to
keep it out of the scope of this series. Please feel free to send a
separate series. As you advise, I will send this very patch separately too.

Thanks

Eric
> 
> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> index e09201c..9c41efd 100644
> --- a/hw/arm/boot.c
> +++ b/hw/arm/boot.c
> @@ -486,9 +486,6 @@ static void fdt_add_psci_node(void *fdt)
>      qemu_fdt_setprop_cell(fdt, "/psci", "migrate", migrate_fn);
>  }
>  
> -int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
> -                 hwaddr addr_limit, AddressSpace *as)
> -{
> ...
> 
> @@ -1158,9 +1158,14 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
>      }
>  
>      if (!info->skip_dtb_autoload && have_dtb(info)) {
> -        if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as) < 0) {
> -            exit(1);
> -        }
> +        load_dtb_from_file() /* reuse generic machine_get_dtb() ??? */
> +        create_dtb_memory_nodes() /* non-numa variant */
> +        /* move out mach-virt specific binfo->get_dtb into the board */
> +        /* move out modify_dtb(), which is a vexpress hack, into vexpress */
> +        /* move out fdt_add_psci_node() into mach-virt */
> +        create_dtb_initrd_kernel_nodes()
> +        dump_fdt()
> +        rom_add_blob_fixed_as()
>      }
>  }
>  
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 281ddcd..7686abf 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1285,9 +1285,12 @@ void virt_machine_done(Notifier *notifier, void *data)
>                                         vms->memmap[VIRT_PLATFORM_BUS].size,
>                                         vms->irqmap[VIRT_PLATFORM_BUS]);
>      }
> -    if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as) < 0) {
> -        exit(1);
> -    }
> +    load_dtb_from_file()/get_dtb() stuff
> +    virt_create_dtb_memory_nodes() /* incl. numa variant and later pcdimm nodes */
> +    fdt_add_psci_node()                         
> +    create_dtb_initrd_kernel_nodes()                                         
> +    dump_fdt()                                                               
> +    rom_add_blob_fixed_as()
>  
>      virt_acpi_setup(vms);
>      virt_build_smbios(vms);
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory
  2018-08-08  9:33                                 ` Auger Eric
@ 2018-08-09  8:45                                   ` Igor Mammedov
  2018-08-09  9:54                                     ` Auger Eric
  0 siblings, 1 reply; 62+ messages in thread
From: Igor Mammedov @ 2018-08-09  8:45 UTC (permalink / raw)
  To: Auger Eric
  Cc: Andrew Jones, wei, peter.maydell, David Hildenbrand, qemu-devel,
	Shameerali Kolothum Thodi, agraf, qemu-arm, david, dgilbert,
	eric.auger.pro

On Wed, 8 Aug 2018 11:33:23 +0200
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Igor,
> 
> On 07/18/2018 03:00 PM, Igor Mammedov wrote:
> [...]
> >>>
> >>> I think Igor wants one contiguous region for RAM, where additional
> >>> space can be reserved for hotplugging.    
> >> This is not compliant with the 2012 ARM white paper, although I don't really
> >> know if this document truly is a reference (did not get any reply).
> > it's up to QEMU to pick the layout: if we have maxmem (up to 256GB) we could
> > accommodate the legacy requirement and put a single device_memory in the
> > 1GB-256GB GPA gap; if it's more we can move the whole device_memory to 2TB, 8TB ...
> > that keeps things manageable for us and fits the specs (if such exist).
> > We should make the selection of the next RAM base deterministic if possible
> > when the layout changes due to maxram size or IOVA, so that we won't need
> > compat knobs/checks to keep the machine migratable.
> Sorry for the delay. I was out of the office these past weeks.
> 
> OK, understood. Your preferred approach is to have a contiguous memory
> region (initial + hotplug). So this depends on the FW capability to
> support a flexible RAM base. Let's see how this dependency gets resolved.
I think Drew has already had a look at the FW side of the issue and has
a prototype to work with.
Once he's back in the office he plans to work on upstreaming the EDK2
and QEMU parts.
 
> This series does not bump the non-hotpluggable memory region limit,
> which is still limited to 255GB. The only way to add more memory is
> through PCDIMM or NVDIMM (max 2TB atm). To do so you need to add the
> ,maxmem and ,slots options, which need to be on both source and dest,
> right, plus the PCDIMM/NVDIMM device option lines? Also the series
> checks that the destination has at least the same IPA range capability
> as the source, which determines whether the requested device_memory
> size can be accommodated. At the moment I fail to see what other compat
> knobs I must be prepared to handle.
it looks the same to me.

We might use the presence of the slots/maxmem options as a knob to switch
to a new all-DIMM layout (initial + hotplug) with a floating RAM base.
That way guests/FW that are designed to work with a fixed RAM base will
work just fine by default, and guests/FW that need mem hotplug, large RAM
or VFIO holes will use a floating RAM base.
Does it seem reasonable?
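
A rough sketch of the knob (memmap_legacy/memmap_floating are made-up
names, not existing symbols):

    /* pick the layout from the presence of maxmem/slots */
    bool want_hotplug = ms->maxram_size > ms->ram_size || ms->ram_slots;

    vms->memmap = want_hotplug ? memmap_floating : memmap_legacy;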


> Thanks
> 
> Eric
> > 
> > [...]
> >   

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 08/15] hw/arm/boot: introduce fdt_add_memory_node helper
  2018-08-08  9:44     ` Auger Eric
@ 2018-08-09  8:57       ` Igor Mammedov
  0 siblings, 0 replies; 62+ messages in thread
From: Igor Mammedov @ 2018-08-09  8:57 UTC (permalink / raw)
  To: Auger Eric
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, dgilbert, agraf, david,
	drjones, wei

On Wed, 8 Aug 2018 11:44:14 +0200
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Igor,
> 
> On 07/18/2018 04:04 PM, Igor Mammedov wrote:
> > On Tue,  3 Jul 2018 09:19:51 +0200
> > Eric Auger <eric.auger@redhat.com> wrote:
> >   
> >> From: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> >>
> >> We introduce a helper to create a memory node.
> >>
> >> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> >>
> >> ---
> >>
> >> v1 -> v2:
> >> - nop of existing /memory nodes was already handled
> >> ---
> >>  hw/arm/boot.c | 54 ++++++++++++++++++++++++++++++++++--------------------
> >>  1 file changed, 34 insertions(+), 20 deletions(-)
> >>
> >> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> >> index e09201c..5243a25 100644
> >> --- a/hw/arm/boot.c
> >> +++ b/hw/arm/boot.c
> >> @@ -413,6 +413,36 @@ static void set_kernel_args_old(const struct arm_boot_info *info,
> >>      }
> >>  }
> >>  
> >> +static int fdt_add_memory_node(void *fdt, uint32_t acells, hwaddr mem_base,
> >> +                               uint32_t scells, hwaddr mem_len,
> >> +                               int numa_node_id)
> >> +{
> >> +    char *nodename = NULL;
> >> +    int ret;
> >> +
> >> +    nodename = g_strdup_printf("/memory@%" PRIx64, mem_base);
> >> +    qemu_fdt_add_subnode(fdt, nodename);
> >> +    qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
> >> +    ret = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg", acells, mem_base,
> >> +                                       scells, mem_len);
> >> +    if (ret < 0) {
> >> +        fprintf(stderr, "couldn't set %s/reg\n", nodename);
> >> +        goto out;
> >> +    }
> >> +    if (numa_node_id < 0) {
> >> +        goto out;
> >> +    }
> >> +
> >> +    ret = qemu_fdt_setprop_cell(fdt, nodename, "numa-node-id", numa_node_id);
> >> +    if (ret < 0) {
> >> +        fprintf(stderr, "couldn't set %s/numa-node-id\n", nodename);
> >> +    }
> >> +
> >> +out:
> >> +    g_free(nodename);
> >> +    return ret;
> >> +}
> >> +  
> > 
> > an unrelated question from the hotplug POV:
> > is the entry size fixed?
> Sorry, I don't get which entry you are referring to.
> > can we estimate the exact size for #slots DIMMs and reserve it in advance
> > in the FDT 'rom'?
> Not sure I get your drift either.
> 
> patch "[RFC v3 09/15] hw/arm/boot: Expose the PC-DIMM nodes in the DT"
> builds a DT node for each DIMM, by enumerating the MemoryDeviceInfoList.

In the case of hotplug we do not care about adding a DTB node at runtime
(the guest won't see it anyway). However if we reboot the machine it's
reasonable to regenerate the DTB so the guest would see the previously
hotplugged DIMMs. The problem is that the DTB is stored in a fixed-size
'rom' that's copied into the guest's RAM, so we should reserve space for
the possible slots in advance or switch to another mechanism to provide
the DTB to the guest (it could be a memory region mapped outside of RAM).
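
A rough sketch of the reservation idea (MEMORY_NODE_SIZE_EST is a
made-up per-node estimate, not an existing symbol):

    /* grow the blob so a regenerated DTB can hold one /memory
     * node per DIMM slot without overflowing the fixed-size rom */
    int newsize = fdt_totalsize(fdt) + ms->ram_slots * MEMORY_NODE_SIZE_EST;

    fdt = g_realloc(fdt, newsize);
    fdt_open_into(fdt, fdt, newsize);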

[...]

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory
  2018-08-09  8:45                                   ` Igor Mammedov
@ 2018-08-09  9:54                                     ` Auger Eric
  0 siblings, 0 replies; 62+ messages in thread
From: Auger Eric @ 2018-08-09  9:54 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: wei, peter.maydell, Andrew Jones, David Hildenbrand, qemu-devel,
	Shameerali Kolothum Thodi, agraf, qemu-arm, eric.auger.pro,
	dgilbert, david

Hi Igor,
On 08/09/2018 10:45 AM, Igor Mammedov wrote:
> On Wed, 8 Aug 2018 11:33:23 +0200
> Auger Eric <eric.auger@redhat.com> wrote:
> 
>> Hi Igor,
>>
>> On 07/18/2018 03:00 PM, Igor Mammedov wrote:
>> [...]
>>>>>
>>>>> I think Igor wants one contiguous region for RAM, where additional
>>>>> space can be reserved for hotplugging.    
>>>> This is not compliant with the 2012 ARM white paper, although I don't really
>>>> know if this document truly is a reference (did not get any reply).
>>> it's up to QEMU to pick the layout: if we have maxmem (up to 256GB) we could
>>> accommodate the legacy requirement and put a single device_memory in the
>>> 1GB-256GB GPA gap; if it's more we can move the whole device_memory to 2TB, 8TB ...
>>> that keeps things manageable for us and fits the specs (if such exist).
>>> We should make the selection of the next RAM base deterministic if possible
>>> when the layout changes due to maxram size or IOVA, so that we won't need
>>> compat knobs/checks to keep the machine migratable.
>> Sorry for the delay. I was out of the office these past weeks.
>>
>> OK, understood. Your preferred approach is to have a contiguous memory
>> region (initial + hotplug). So this depends on the FW capability to
>> support a flexible RAM base. Let's see how this dependency gets resolved.
> I think Drew has already had a look at the FW side of the issue and has
> a prototype to work with.
> Once he's back in the office he plans to work on upstreaming the EDK2
> and QEMU parts.
>  
>> This series does not bump the non-hotpluggable memory region limit,
>> which is still limited to 255GB. The only way to add more memory is
>> through PCDIMM or NVDIMM (max 2TB atm). To do so you need to add the
>> ,maxmem and ,slots options, which need to be on both source and dest,
>> right, plus the PCDIMM/NVDIMM device option lines? Also the series
>> checks that the destination has at least the same IPA range capability
>> as the source, which determines whether the requested device_memory
>> size can be accommodated. At the moment I fail to see what other compat
>> knobs I must be prepared to handle.
> it looks the same to me.
> 
> We might use the presence of the slots/maxmem options as a knob to switch
> to a new all-DIMM layout (initial + hotplug) with a floating RAM base.
> That way guests/FW that are designed to work with a fixed RAM base will
> work just fine by default, and guests/FW that need mem hotplug, large RAM
> or VFIO holes will use a floating RAM base.
> Does it seem reasonable?

Yep, personally I don't have a strong opinion regarding using a single
contiguous RAM range or separate ones. I had the impression that both
were feasible, but I also understand the concern about potential
migration compat issues you may have encountered in the past on the pc
machine. As long as Peter is OK with it, we can investigate the floating
RAM base solution and I will work closely with Drew to rebase this work.

Thanks

Eric
> 
> 
>> Thanks
>>
>> Eric
>>>
>>> [...]
>>>   
> 
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB
  2018-07-03  7:19 [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB Eric Auger
                   ` (15 preceding siblings ...)
  2018-07-18 14:08 ` [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB Igor Mammedov
@ 2018-10-03 13:49 ` Auger Eric
  2018-10-03 14:13   ` Dr. David Alan Gilbert
  2018-10-04 11:11   ` Igor Mammedov
  16 siblings, 2 replies; 62+ messages in thread
From: Auger Eric @ 2018-10-03 13:49 UTC (permalink / raw)
  To: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, agraf, david, drjones, wei, Laszlo Ersek

Hi,

On 7/3/18 9:19 AM, Eric Auger wrote:
> This series aims at supporting PCDIMM/NVDIMM instantiation in
> machvirt at 2TB guest physical address.
> 
> This is achieved in 3 steps:
> 1) support more than 40b IPA/GPA
> 2) support PCDIMM instantiation
> 3) support NVDIMM instantiation

While respinning this series I have some general questions that arise
when thinking about extending the RAM on mach-virt:

At the moment mach-virt offers 255GB max initial RAM starting at 1GB
("-m " option).

This series does not touch this initial RAM and only targets adding
device memory (usable for PCDIMM, NVDIMM, virtio-mem, virtio-pmem) in
the 3.1 machine, located at 2TB. The 3.0 address map top currently is at
1TB (legacy aarch32 LPAE limit) so it would leave 1TB for IO or PCI. Is it OK?

- Putting device memory at 2TB means only ARMv8/aarch64 would benefit
from it. Is it an issue? i.e. no device memory for ARMv7 or
ARMv8/aarch32. Do we need to put effort into supporting more memory and
memory devices for those configs? There is less than 256GB free in the
existing 1TB mach-virt memory map anyway.

- Is it OK to rely only on device memory to extend the existing 255GB
RAM or would we need additional initial memory? Device memory usage
induces a more complex command line, so this puts a constraint on upper
layers. Is it acceptable though?
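
For illustration, the kind of command line this implies (sizes and ids
are made up):

    -m 16G,slots=2,maxmem=272G \
    -object memory-backend-ram,id=mem1,size=256G \
    -device pc-dimm,id=dimm1,memdev=mem1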

- I revisited the series so that the max IPA size shift would get
automatically computed according to the top address reached by the
device memory, i.e. 2TB + (maxram_size - ram_size). So we would not need
any additional kvm-type or explicit vm-phys-shift option to select the
correct max IPA shift (or any CPU phys-bits as suggested by Dave). This
also assumes we don't put anything beyond the device memory. Is that OK?
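
A sketch of the computation I have in mind (clz64() comes from
qemu/host-utils.h):

    /* highest GPA we place: top of the device memory region */
    hwaddr top = vms->memmap[VIRT_HOTPLUG_MEM].base +
                 (ms->maxram_size - ms->ram_size);
    /* smallest IPA shift that covers it, 40b legacy minimum */
    int max_ipa_shift = MAX(64 - clz64(top - 1), 40);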

- Igor told me he was concerned about the split-memory RAM model as it
caused a lot of trouble regarding compat/migration on the PC machine. After
having studied the pc machine code I now wonder if we can compare the PC
compat issues with the ones we could encounter on ARM with the proposed
split memory model.

On PC there are many knobs to tune the RAM layout
- max_ram_below_4g option tunes how much RAM we want below 4G
- gigabyte_align to force 3GB versus 3.5GB lowmem limit if ram_size >
max_ram_below_4g
- plus the usual ram_size which affects the rest of the initial ram
- plus the maxram_size, slots which affect the size of the device memory
- the device memory is just behind the initial RAM, aligned to 1GB

Note the initial RAM and the device memory may be disjoint due to
misalignment of the initial RAM size against 1GB.

On ARM, we would have the 3.0 virt machine supporting only initial RAM from
1GB to 256GB. The 3.1 (or beyond ;-)) virt machine would support the same
initial RAM + device memory from 2TB to 4TB.

With that memory split and the different machine type, I don't see any
major hurdle with respect to migration. Am I missing something?

An alternative to the split model is having a floating RAM base for
contiguous initial + device memory (contiguity actually depends on
initial RAM size alignment too). This requires significant changes in FW
and also potentially impacts the legacy virt address map as we need to
pass the floating RAM base address in some way (using an SRAM at 1GB or
using fw_cfg). Is it worth the effort? Also, Peter/Laszlo mentioned their
reluctance to move the RAM earlier
(https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg03172.html).
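
As a rough sketch of the fw_cfg variant ("etc/ram-base" is a made-up
item name):

    /* advertise a floating RAM base to the FW via fw_cfg */
    uint64_t ram_base = cpu_to_le64(vms->memmap[VIRT_MEM].base);

    fw_cfg_add_file(fw_cfg, "etc/ram-base",
                    g_memdup(&ram_base, sizeof(ram_base)),
                    sizeof(ram_base));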

Your feedback on those points is really welcome!

Thanks

Eric

> [...]

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB
  2018-10-03 13:49 ` Auger Eric
@ 2018-10-03 14:13   ` Dr. David Alan Gilbert
  2018-10-03 14:42     ` Auger Eric
  2018-10-04 11:11   ` Igor Mammedov
  1 sibling, 1 reply; 62+ messages in thread
From: Dr. David Alan Gilbert @ 2018-10-03 14:13 UTC (permalink / raw)
  To: Auger Eric
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david, agraf, david,
	drjones, wei, Laszlo Ersek, Ard Biesheuvel

* Auger Eric (eric.auger@redhat.com) wrote:
> Hi,
> 
> On 7/3/18 9:19 AM, Eric Auger wrote:
> > This series aims at supporting PCDIMM/NVDIMM instantiation in
> > machvirt at 2TB guest physical address.
> > 
> > This is achieved in 3 steps:
> > 1) support more than 40b IPA/GPA
> > 2) support PCDIMM instantiation
> > 3) support NVDIMM instantiation
> 
> While respinning this series I have some general questions that arise
> when thinking about extending the RAM on mach-virt:
> 
> At the moment mach-virt offers 255GB max initial RAM starting at 1GB
> ("-m " option).
> 
> This series does not touch this initial RAM and only targets adding
> device memory (usable for PCDIMM, NVDIMM, virtio-mem, virtio-pmem) in
> the 3.1 machine, located at 2TB. The 3.0 address map top currently is at
> 1TB (legacy aarch32 LPAE limit) so it would leave 1TB for IO or PCI. Is it OK?

Is there a reason not to make this configurable?
It sounds a perfectly reasonable number, but you wouldn't be too
surprised if someone came along with a pile of huge GPUs.

> - Putting device memory at 2TB means only ARMv8/aarch64 would benefit
> from it. Is it an issue? i.e. no device memory for ARMv7 or
> ARMv8/aarch32. Do we need to put effort into supporting more memory and
> memory devices for those configs? There is less than 256GB free in the
> existing 1TB mach-virt memory map anyway.

They can always explicitly specify an address on a pc-dimm's addr
property, can't they?

> - Is it OK to rely only on device memory to extend the existing 255GB
> RAM or would we need additional initial memory? Device memory usage
> induces a more complex command line, so this puts a constraint on upper
> layers. Is it acceptable though?

Check with a libvirt person?

> - I revisited the series so that the max IPA size shift would get
> automatically computed according to the top address reached by the
> device memory, i.e. 2TB + (maxram_size - ram_size). So we would not need
> any additional kvm-type or explicit vm-phys-shift option to select the
> correct max IPA shift (or any CPU phys-bits as suggested by Dave). This
> also assumes we don't put anything beyond the device memory. Is that OK?

Generically that probably sounds OK; be careful about how complex that
calculation gets, otherwise it might turn into a complex thing where you
have to be careful about the effect of changing it (e.g. if changing it
causes migration issues).

> - Igor told me he was concerned about the split-memory RAM model as it
> caused a lot of trouble regarding compat/migration on the PC machine. After
> having studied the pc machine code I now wonder if we can compare the PC
> compat issues with the ones we could encounter on ARM with the proposed
> split memory model.
> 
> On PC there are many knobs to tune the RAM layout
> - max_ram_below_4g option tunes how much RAM we want below 4G
> - gigabyte_align to force 3GB versus 3.5GB lowmem limit if ram_size >
> max_ram_below_4g
> - plus the usual ram_size which affects the rest of the initial ram
> - plus the maxram_size, slots which affect the size of the device memory
> - the device memory is just behind the initial RAM, aligned to 1GB
> 
> Note the initial RAM and the device memory may be disjoint due to
> misalignment of the initial RAM size against 1GB.
> 
> On ARM, we would have the 3.0 virt machine supporting only initial RAM from
> 1GB to 256GB. The 3.1 (or beyond ;-)) virt machine would support the same
> initial RAM + device memory from 2TB to 4TB.
> 
> With that memory split and the different machine type, I don't see any
> major hurdle with respect to migration. Am I missing something?

A lot of those knobs are there to keep migration compatibility, i.e. to
keep behaviour the same across migrations.

Dave

> An alternative to the split model is having a floating RAM base for
> contiguous initial + device memory (contiguity actually depends on
> initial RAM size alignment too). This requires significant changes in FW
> and also potentially impacts the legacy virt address map as we need to
> pass the floating RAM base address in some way (using an SRAM at 1GB or
> using fw_cfg). Is it worth the effort? Also, Peter/Laszlo mentioned their
> reluctance to move the RAM earlier
> (https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg03172.html).
> 
> Your feedback on those points is really welcome!
> 
> Thanks
> 
> Eric
> 
> > [...]
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB
  2018-10-03 14:13   ` Dr. David Alan Gilbert
@ 2018-10-03 14:42     ` Auger Eric
  2018-10-03 14:46       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 62+ messages in thread
From: Auger Eric @ 2018-10-03 14:42 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david, agraf, david,
	drjones, wei, Laszlo Ersek, Ard Biesheuvel

Hi Dave,

On 10/3/18 4:13 PM, Dr. David Alan Gilbert wrote:
> * Auger Eric (eric.auger@redhat.com) wrote:
>> Hi,
>>
>> On 7/3/18 9:19 AM, Eric Auger wrote:
>>> This series aims at supporting PCDIMM/NVDIMM instantiation in
>>> machvirt at 2TB guest physical address.
>>>
>>> This is achieved in 3 steps:
>>> 1) support more than 40b IPA/GPA
>>> 2) support PCDIMM instantiation
>>> 3) support NVDIMM instantiation
>>
>> While respinning this series I have some general questions that arise
>> when thinking about extending the RAM on mach-virt:
>>
>> At the moment mach-virt offers 255GB max initial RAM starting at 1GB
>> ("-m " option).
>>
>> This series does not touch this initial RAM and only targets adding
>> device memory (usable for PCDIMM, NVDIMM, virtio-mem, virtio-pmem) in
>> the 3.1 machine, located at 2TB. The 3.0 address map top currently is at
>> 1TB (legacy aarch32 LPAE limit) so it would leave 1TB for IO or PCI. Is it OK?
> 
> Is there a reason not to make this configurable?
> It sounds a perfectly reasonable number, but you wouldn't be too
> surprised if someone came along with a pile of huge GPUs.

GPUs consume PCI MMIO regions, right? (We have a high mem PCI MMIO
region at [512GB, 1TB].)

You mean having an option to define the base address of the device
memory? Well, it was just a matter of not having too many knobs.

> 
>> - Putting device memory at 2TB means only ARMv8/aarch64 would benefit
>> from it. Is it an issue? i.e. no device memory for ARMv7 or
>> ARMv8/aarch32. Do we need to put effort into supporting more memory and
>> memory devices for those configs? There is less than 256GB free in the
>> existing 1TB mach-virt memory map anyway.
> 
> They can always explicitly specify an address on a pc-dimm's addr
> property, can't they?

If an address is passed it must be within [2TB, 4TB]. This is checked in
memory_device_get_free_addr(). So no way.
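
Roughly, the check there looks like this (simplified, not the verbatim
code):

    /* reject addresses outside the device memory region */
    if (addr < address_space_start ||
        addr + size > address_space_start + address_space_size) {
        error_setg(errp, "address out of the device memory range");
        return 0;
    }
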
> 
>> - Is it OK to rely only on device memory to extend the existing 255GB
>> RAM or would we need additional initial memory? Device memory usage
>> induces a more complex command line, so this puts a constraint on upper
>> layers. Is it acceptable though?
> 
> Check with a libvirt person?
definitely ;-)
> 
>> - I revisited the series so that the max IPA size shift would get
>> automatically computed according to the top address reached by the
>> device memory, i.e. 2TB + (maxram_size - ram_size). So we would not need
>> any additional kvm-type or explicit vm-phys-shift option to select the
>> correct max IPA shift (or any CPU phys-bits as suggested by Dave). This
>> also assumes we don't put anything beyond the device memory. Is that OK?
> 
> Generically that probably sounds OK; be careful about how complex that
> calculation gets, otherwise it might turn into a complex thing where you
> have to be careful about the effect of changing it (e.g. if changing it
> causes migration issues).

the function that does this computation would be a class function that
can be changed per virt version.
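
Something like (sketch, max_ipa_shift is a hypothetical member):

    struct VirtMachineClass {
        MachineClass parent;
        /* ... existing fields ... */
        /* per-version hook, overridden by newer machine types */
        uint32_t (*max_ipa_shift)(VirtMachineState *vms);
    };
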
> 
>> - Igor told me he was concerned about the split-memory RAM model as it
>> caused a lot of trouble regarding compat/migration on the PC machine. After
>> having studied the pc machine code I now wonder if we can compare the PC
>> compat issues with the ones we could encounter on ARM with the proposed
>> split memory model.
>>
>> On PC there are many knobs to tune the RAM layout
>> - max_ram_below_4g option tunes how much RAM we want below 4G
>> - gigabyte_align to force 3GB versus 3.5GB lowmem limit if ram_size >
>> max_ram_below_4g
>> - plus the usual ram_size which affects the rest of the initial ram
>> - plus the maxram_size, slots which affect the size of the device memory
>> - the device memory is just behind the initial RAM, aligned to 1GB
>>
>> Note the initial RAM and the device memory may be disjoint due to
>> misalignment of the initial RAM size against 1GB.
>>
>> On ARM, we would have the 3.0 virt machine supporting only initial RAM from
>> 1GB to 256GB. The 3.1 (or beyond ;-)) virt machine would support the same
>> initial RAM + device memory from 2TB to 4TB.
>>
>> With that memory split and the different machine type, I don't see any
>> major hurdle with respect to migration. Am I missing something?
> 
> A lot of those knobs are there to keep migration compatibility, i.e. to
> keep behaviour the same across migrations.
OK

Thank you for your input.

Eric
> 
> Dave
> 
>> An alternative to the split model is having a floating RAM base for
>> contiguous initial + device memory (contiguity actually depends on
>> initial RAM size alignment too). This requires significant changes in FW
>> and also potentially impacts the legacy virt address map as we need to
>> pass the floating RAM base address in some way (using an SRAM at 1GB or
>> using fw_cfg). Is it worth the effort? Also, Peter/Laszlo mentioned their
>> reluctance to move the RAM earlier
>> (https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg03172.html).
>>
>> Your feedback on those points is really welcome!
>>
>> Thanks
>>
>> Eric
>>
>>> [...]
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB
  2018-10-03 14:42     ` Auger Eric
@ 2018-10-03 14:46       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 62+ messages in thread
From: Dr. David Alan Gilbert @ 2018-10-03 14:46 UTC (permalink / raw)
  To: Auger Eric
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david, agraf, david,
	drjones, wei, Laszlo Ersek, Ard Biesheuvel

* Auger Eric (eric.auger@redhat.com) wrote:
> Hi Dave,
> 
> On 10/3/18 4:13 PM, Dr. David Alan Gilbert wrote:
> > * Auger Eric (eric.auger@redhat.com) wrote:
> >> Hi,
> >>
> >> On 7/3/18 9:19 AM, Eric Auger wrote:
> >>> This series aims at supporting PCDIMM/NVDIMM instantiation in
> >>> machvirt at 2TB guest physical address.
> >>>
> >>> This is achieved in 3 steps:
> >>> 1) support more than 40b IPA/GPA
> >>> 2) support PCDIMM instantiation
> >>> 3) support NVDIMM instantiation
> >>
> >> While respinning this series I have some general questions that arise
> >> when thinking about extending the RAM on mach-virt:
> >>
> >> At the moment mach-virt offers 255GB max initial RAM starting at 1GB
> >> ("-m " option).
> >>
> >> This series does not touch this initial RAM and only targets adding
> >> device memory (usable for PCDIMM, NVDIMM, virtio-mem, virtio-pmem) in
> >> the 3.1 machine, located at 2TB. The 3.0 address map top currently is at
> >> 1TB (legacy aarch32 LPAE limit) so it would leave 1TB for IO or PCI. Is it OK?
> > 
> > Is there a reason not to make this configurable?
> > It sounds a perfectly reasonable number, but you wouldn't be too
> > surprised if someone came along with a pile of huge GPUs.
> 
> GPUs consume PCI MMIO regions, right? (We have a high mem PCI MMIO
> region at [512GB, 1TB].)

Yeah, I think so.

> You mean having an option to define the base address of the device
> memory? Well, it was just a matter of not having too many knobs.

What's wrong with lots of knobs!

> > 
> >> - Putting device memory at 2TB means only ARMv8/aarch64 would benefit
> >> from it. Is it an issue? i.e. no device memory for ARMv7 or
> >> ARMv8/aarch32. Do we need to put effort into supporting more memory and
> >> memory devices for those configs? There is less than 256GB free in the
> >> existing 1TB mach-virt memory map anyway.
> > 
> > They can always explicitly specify an address on a pc-dimm's addr
> > property, can't they?
> 
> If an address is passed it must be within [2TB, 4TB]. This is checked in
> memory_device_get_free_addr(). So no way.

OK.

Dave

> >> - Is it OK to rely only on device memory to extend the existing 255GB
> >> RAM or would we need additional initial memory? Device memory usage
> >> induces a more complex command line, so this puts a constraint on upper
> >> layers. Is it acceptable though?
> > 
> > Check with a libvirt person?
> definitely ;-)
> > 
> >> - I revisited the series so that the max IPA size shift would get
> >> automatically computed according to the top address reached by the
> >> device memory, i.e. 2TB + (maxram_size - ram_size). So we would not need
> >> any additional kvm-type or explicit vm-phys-shift option to select the
> >> correct max IPA shift (or any CPU phys-bits as suggested by Dave). This
> >> also assumes we don't put anything beyond the device memory. Is that OK?
> > 
> > Generically that probably sounds OK; be careful about how complex that
> > calculation gets, otherwise it might turn into a complex thing where you
> > have to be careful about the effect of changing it (e.g. if changing it
> > causes migration issues).
> 
> the function that does this computation would be a class function that
> can be changed per virt version.
> > 
> >> - Igor told me he was concerned about the split-memory RAM model as it
> >> caused a lot of trouble regarding compat/migration on the PC machine. After
> >> having studied the pc machine code I now wonder if we can compare the PC
> >> compat issues with the ones we could encounter on ARM with the proposed
> >> split memory model.
> >>
> >> On PC there are many knobs to tune the RAM layout
> >> - max_ram_below_4g option tunes how much RAM we want below 4G
> >> - gigabyte_align to force 3GB versus 3.5GB lowmem limit if ram_size >
> >> max_ram_below_4g
> >> - plus the usual ram_size which affects the rest of the initial ram
> >> - plus the maxram_size, slots which affect the size of the device memory
> >> - the device memory is just behind the initial RAM, aligned to 1GB
> >>
> >> Note the initial RAM and the device memory may be disjoint due to
> >> misalignment of the initial RAM size against 1GB.
> >>
> >> On ARM, we would have the 3.0 virt machine supporting only initial RAM from
> >> 1GB to 256GB. The 3.1 (or beyond ;-)) virt machine would support the same
> >> initial RAM + device memory from 2TB to 4TB.
> >>
> >> With that memory split and the different machine type, I don't see any
> >> major hurdle with respect to migration. Am I missing something?
> > 
> > A lot of those knobs are there to keep migration compatibility, i.e. to
> > keep behaviour the same across migrations.
> OK
> 
> Thank you for your input.
> 
> Eric
> > 
> > Dave
> > 
> >> An alternative to the split model is a floating RAM base with
> >> contiguous initial + device memory (contiguity actually depends on
> >> initial RAM size alignment too). This requires significant changes in FW
> >> and also potentially impacts the legacy virt address map, as we need to
> >> pass the floating RAM base address in some way (using an SRAM at 1GB or
> >> using fw_cfg). Is it worth the effort? Also, Peter/Laszlo mentioned their
> >> reluctance to move the RAM earlier
> >> (https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg03172.html).
> >>
> >> Your feedback on those points is really welcome!
> >>
> >> Thanks
> >>
> >> Eric
> >>
> >>> [...]
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB
  2018-10-03 13:49 ` Auger Eric
  2018-10-03 14:13   ` Dr. David Alan Gilbert
@ 2018-10-04 11:11   ` Igor Mammedov
  2018-10-04 11:32     ` Auger Eric
  1 sibling, 1 reply; 62+ messages in thread
From: Igor Mammedov @ 2018-10-04 11:11 UTC (permalink / raw)
  To: Auger Eric
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, wei, drjones, Ard Biesheuvel,
	Dr. David Alan Gilbert, agraf, Laszlo Ersek, david

On Wed, 3 Oct 2018 15:49:03 +0200
Auger Eric <eric.auger@redhat.com> wrote:

> Hi,
> 
> On 7/3/18 9:19 AM, Eric Auger wrote:
> > This series aims at supporting PCDIMM/NVDIMM instantiation in
> > machvirt at 2TB guest physical address.
> > 
> > This is achieved in 3 steps:
> > 1) support more than 40b IPA/GPA
> > 2) support PCDIMM instantiation
> > 3) support NVDIMM instantiation  
> 
> While respinning this series I have some general questions that came up
> when thinking about extending the RAM on mach-virt:
> 
> At the moment mach-virt offers 255GB max initial RAM starting at 1GB
> ("-m " option).
> 
> This series does not touch this initial RAM and only aims to add
> device memory (usable for PCDIMM, NVDIMM, virtio-mem, virtio-pmem) in
> the 3.1 machine, located at 2TB. The 3.0 address map top is currently at 1TB
> (the legacy aarch32 LPAE limit), so it would leave 1TB for IO or PCI. Is that OK?
> 
> - Putting device memory at 2TB means only ARMv8/aarch64 would benefit
> from it. Is it an issue? I.e. no device memory for ARMv7 or
> ARMv8/aarch32. Do we need to put effort into supporting more memory and
> memory devices for those configs? There is less than 256GB free in the
> existing 1TB mach-virt memory map anyway.
> 
> - is it OK to rely only on device memory to extend the existing 255 GB
> RAM, or would we need additional initial memory? Device memory usage
> induces a more complex command line, so this puts a constraint on upper
> layers. Is it acceptable though?
> 
> - I revisited the series so that the max IPA size shift would get
> automatically computed from the top address reached by the
> device memory, i.e. 2TB + (maxram_size - ram_size). So we would not need
> any additional kvm-type or explicit vm-phys-shift option to select the
> correct max IPA shift (or any CPU phys-bits as suggested by Dave). This
> also assumes we don't put anything beyond the device memory. Is that OK?
> 
> - Igor told me he was concerned about the split-memory RAM model as it
> caused a lot of trouble regarding compat/migration on the PC machine. After
> having studied the pc machine code I now wonder how the PC compat issues
> compare with the ones we could encounter on ARM with the proposed
> split memory model.
that's not the only issue.

For example, since initial memory isn't modeled as a device
(i.e. it's just a plain memory region), there is a bunch of numa
code to deal with it. If initial memory were replaced by pc-dimm,
we would drop some of it, and if we deprecated the old '-numa mem' we
should be able to drop most of it (the newer '-numa memdev' maps
directly onto the pc-dimm model).
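
A minimal sketch of the two forms, with made-up node count and sizes:

  # legacy '-numa mem', not backed by any device:
  -m 4G -numa node,nodeid=0,mem=2G -numa node,nodeid=1,mem=2G

  # '-numa memdev', which maps onto the backend/device model:
  -m 4G \
  -object memory-backend-ram,id=m0,size=2G -numa node,nodeid=0,memdev=m0 \
  -object memory-backend-ram,id=m1,size=2G -numa node,nodeid=1,memdev=m1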

 
> On PC there are many knobs to tune the RAM layout
> - max_ram_below_4g option tunes how much RAM we want below 4G
> - gigabyte_align to force 3GB versus 3.5GB lowmem limit if ram_size >
> max_ram_below_4g
> - plus the usual ram_size which affects the rest of the initial ram
> - plus the maxram_size, slots which affect the size of the device memory
> - the device memory is just behind the initial RAM, aligned to 1GB
> 
> Note the initial RAM and the device memory may be disjoint due to
> misalignment of the initial RAM size against 1GB
> 
> On ARM, we would have the 3.0 virt machine supporting only initial RAM from
> 1GB to 256 GB. The 3.1 (or beyond ;-)) virt machine would support the same
> initial RAM + device memory from 2TB to 4TB.
> 
> With that memory split and the different machine types, I don't see any
> major hurdle with respect to migration. Am I missing something?
Later on, someone with a need to punch holes in the fixed initial RAM/device
memory will start making it complex.

> An alternative to the split model is a floating RAM base with
> contiguous initial + device memory (contiguity actually depends on
> initial RAM size alignment too). This requires significant changes in FW
> and also potentially impacts the legacy virt address map, as we need to
> pass the floating RAM base address in some way (using an SRAM at 1GB or
> using fw_cfg). Is it worth the effort? Also, Peter/Laszlo mentioned their
> reluctance to move the RAM earlier
Drew is working on it, let's see the outcome first.

We may actually try to implement a single region that uses pc-dimm for
all memory (including initial) and still be compatible with the legacy
layout, as long as legacy mode sticks to the current RAM limit and the
device memory region is put at the current RAM base.
When a flexible RAM base is available, we will move that region to the
non-legacy layout at 2TB (or wherever).
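
Roughly, as I read the proposal, the two layouts would be:

  legacy:         one pc-dimm-backed region at the current 1GB RAM base,
                  capped at the current 255GB initial RAM limit
  flexible base:  the same region relocated to 2TB and sized up to maxmem

with the guest-visible contents unchanged in legacy mode.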

> (https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg03172.html).
> 
> Your feedback on those points is really welcome!
> 
> Thanks
> 
> Eric
> 
> > [...]
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB
  2018-10-04 11:11   ` Igor Mammedov
@ 2018-10-04 11:32     ` Auger Eric
  2018-10-04 12:02       ` David Hildenbrand
  2018-10-04 13:16       ` Igor Mammedov
  0 siblings, 2 replies; 62+ messages in thread
From: Auger Eric @ 2018-10-04 11:32 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, wei, drjones, Ard Biesheuvel,
	Dr. David Alan Gilbert, agraf, Laszlo Ersek, david

Hi Igor,

On 10/4/18 1:11 PM, Igor Mammedov wrote:
> On Wed, 3 Oct 2018 15:49:03 +0200
> Auger Eric <eric.auger@redhat.com> wrote:
> 
>> Hi,
>>
>> On 7/3/18 9:19 AM, Eric Auger wrote:
>>> [...]
>>
>> - Igor told me he was concerned about the split-memory RAM model as it
>> caused a lot of trouble regarding compat/migration on the PC machine. After
>> having studied the pc machine code I now wonder how the PC compat issues
>> compare with the ones we could encounter on ARM with the proposed
>> split memory model.
> that's not the only issue.
> 
> For example, since initial memory isn't modeled as a device
> (i.e. it's just a plain memory region), there is a bunch of numa
> code to deal with it. If initial memory were replaced by pc-dimm,
> we would drop some of it, and if we deprecated the old '-numa mem' we
> should be able to drop most of it (the newer '-numa memdev' maps
> directly onto the pc-dimm model).
see my comment below.
> 
>  
>> On PC there are many knobs to tune the RAM layout
>> - max_ram_below_4g option tunes how much RAM we want below 4G
>> - gigabyte_align to force 3GB versus 3.5GB lowmem limit if ram_size >
>> max_ram_below_4g
>> - plus the usual ram_size which affects the rest of the initial ram
>> - plus the maxram_size, slots which affect the size of the device memory
>> - the device memory is just behind the initial RAM, aligned to 1GB
>>
>> Note the initial RAM and the device memory may be disjoint due to
>> misalignment of the initial RAM size against 1GB
>>
>> On ARM, we would have the 3.0 virt machine supporting only initial RAM from
>> 1GB to 256 GB. The 3.1 (or beyond ;-)) virt machine would support the same
>> initial RAM + device memory from 2TB to 4TB.
>>
>> With that memory split and the different machine types, I don't see any
>> major hurdle with respect to migration. Am I missing something?
> Later on, someone with a need to punch holes in the fixed initial RAM/device
> memory will start making it complex.
Support for host reserved regions is not acked yet, but that's a valid
argument.
> 
>> An alternative to the split model is a floating RAM base with
>> contiguous initial + device memory (contiguity actually depends on
>> initial RAM size alignment too). This requires significant changes in FW
>> and also potentially impacts the legacy virt address map, as we need to
>> pass the floating RAM base address in some way (using an SRAM at 1GB or
>> using fw_cfg). Is it worth the effort? Also, Peter/Laszlo mentioned their
>> reluctance to move the RAM earlier
> Drew is working on it, let's see the outcome first.
> 
> We may actually try to implement a single region that uses pc-dimm for
> all memory (including initial) and still be compatible with the legacy
> layout, as long as legacy mode sticks to the current RAM limit and the
> device memory region is put at the current RAM base.
> When a flexible RAM base is available, we will move that region to the
> non-legacy layout at 2TB (or wherever).

Oh, I did not understand that you wanted to also replace the initial
memory with device memory. So we would switch from a pure static initial
RAM setup to a pure dynamic device memory setup. That looks like quite a
drastic change to me. As mentioned, I am concerned about complicating the
qemu cmd line, and I asked the libvirt guys about the induced pain.

Thank you for your feedback

Eric


> 
>> (https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg03172.html).
>>
>> Your feedback on those points is really welcome!
>>
>> Thanks
>>
>> Eric
>>
>>> [...]
>>
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB
  2018-10-04 11:32     ` Auger Eric
@ 2018-10-04 12:02       ` David Hildenbrand
  2018-10-04 12:07         ` Auger Eric
  2018-10-04 13:16       ` Igor Mammedov
  1 sibling, 1 reply; 62+ messages in thread
From: David Hildenbrand @ 2018-10-04 12:02 UTC (permalink / raw)
  To: Auger Eric, Igor Mammedov
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, wei, drjones, Ard Biesheuvel,
	Dr. David Alan Gilbert, agraf, Laszlo Ersek, david

>>> [...]
>> Drew is working on it, let's see the outcome first.
>>
>> We may actually try to implement a single region that uses pc-dimm for
>> all memory (including initial) and still be compatible with the legacy
>> layout, as long as legacy mode sticks to the current RAM limit and the
>> device memory region is put at the current RAM base.
>> When a flexible RAM base is available, we will move that region to the
>> non-legacy layout at 2TB (or wherever).
> 
> Oh, I did not understand that you wanted to also replace the initial
> memory with device memory. So we would switch from a pure static initial
> RAM setup to a pure dynamic device memory setup. That looks like quite a
> drastic change to me. As mentioned, I am concerned about complicating the
> qemu cmd line, and I asked the libvirt guys about the induced pain.

One idea was to create internal memory devices (e.g. a "memory chip")
that get created and placed automatically in guest physical address
space. These devices would not require a change on the cmdline; they
would be created automatically from the existing parameters.

The machine device memory region would then be one big region for both
internal memory devices and external ("plugged") memory devices, a.k.a.
dimms.

I guess that will require more work to be done.
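
(A rough sketch of what that could mean in practice; the "memory chip"
device is hypothetical and the numbers made up: an unchanged

  -m 4G,slots=2,maxmem=8G

would make the machine itself instantiate a 4G memory-chip device at
the bottom of the device memory region, leaving 4G of the region for
user-visible pc-dimm/nvdimm devices.)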

> 
> Thank you for your feedback
> 
> Eric


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB
  2018-10-04 12:02       ` David Hildenbrand
@ 2018-10-04 12:07         ` Auger Eric
  0 siblings, 0 replies; 62+ messages in thread
From: Auger Eric @ 2018-10-04 12:07 UTC (permalink / raw)
  To: David Hildenbrand, Igor Mammedov
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, wei, drjones, Ard Biesheuvel,
	Dr. David Alan Gilbert, agraf, Laszlo Ersek, david

Hi David,

On 10/4/18 2:02 PM, David Hildenbrand wrote:
>>>> [...]
>>
>> Oh, I did not understand that you wanted to also replace the initial
>> memory with device memory. So we would switch from a pure static initial
>> RAM setup to a pure dynamic device memory setup. That looks like quite a
>> drastic change to me. As mentioned, I am concerned about complicating the
>> qemu cmd line, and I asked the libvirt guys about the induced pain.
> 
> One idea was to create internal memory devices (e.g. a "memory chip")
> that get created and placed automatically in guest physical address
> space. These devices would not require a change on the cmdline; they
> would be created automatically from the existing parameters.
> 
> The machine device memory region would then be one big region for both
> internal memory devices and external ("plugged") memory devices, a.k.a.
> dimms.
> 
> I guess that will require more work to be done.

OK, interesting. Yes, this adds some more work to the pile ...

Thanks

Eric
> 
>>
>> Thank you for your feedback
>>
>> Eric
> 
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB
  2018-10-04 11:32     ` Auger Eric
  2018-10-04 12:02       ` David Hildenbrand
@ 2018-10-04 13:16       ` Igor Mammedov
  2018-10-04 14:16         ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 62+ messages in thread
From: Igor Mammedov @ 2018-10-04 13:16 UTC (permalink / raw)
  To: Auger Eric
  Cc: wei, peter.maydell, drjones, david, Ard Biesheuvel, qemu-devel,
	shameerali.kolothum.thodi, Dr. David Alan Gilbert, agraf,
	qemu-arm, david, Laszlo Ersek, eric.auger.pro

On Thu, 4 Oct 2018 13:32:26 +0200
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Igor,
> 
> On 10/4/18 1:11 PM, Igor Mammedov wrote:
> > On Wed, 3 Oct 2018 15:49:03 +0200
> > Auger Eric <eric.auger@redhat.com> wrote:
> >   
> >> [...]
> > Drew is working on it, let's see the outcome first.
> > 
> > We may actually try to implement a single region that uses pc-dimm for
> > all memory (including initial) and still be compatible with the legacy
> > layout, as long as legacy mode sticks to the current RAM limit and the
> > device memory region is put at the current RAM base.
> > When a flexible RAM base is available, we will move that region to the
> > non-legacy layout at 2TB (or wherever).
> 
> Oh, I did not understand that you wanted to also replace the initial
> memory with device memory. So we would switch from a pure static initial
> RAM setup to a pure dynamic device memory setup. That looks like quite a
> drastic change to me. As mentioned, I am concerned about complicating the
> qemu cmd line, and I asked the libvirt guys about the induced pain.
Converting initial RAM to the memory device model, beyond the current
limits and within a single RAM zone, is the reason why the flexible RAM
base idea was brought in. That way we'd end up with a single way to
instantiate RAM (modeled after bare-metal machines) and the possibility
to use hotplug/nvdimm/... with initial RAM, without any huge refactoring
(+compat knobs) on top later.

A 2-region solution is easier to hack together right now; if we add more
regions and leave initial RAM as is, there is no point in bothering with
a flexible RAM base, but it won't lead us to uniform RAM handling and
won't simplify anything.

Considering the virt board doesn't have the compat RAM layout baggage of
x86, it only looks drastic; in reality it might turn out to be a simple
refactoring.

As for the complicated CLI: for compat reasons we will be forced to
support '-m size=!0' anyway, and we should be able to translate that
implicitly into a dimm. In addition, with dimms as initial memory, users
would have a choice to ditch "-numa (mem|memdev)" altogether and do
  -m 0,slots=X,maxmem=Y -device pc-dimm,node=x...
and the related '-numa' would become a compat shim translating into a
similar set of dimm devices under the hood.
(looks like too much fantasy :))
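
Spelled out in full, that hypothetical invocation might look like this
(values illustrative; a pc-dimm takes its RAM from a memdev backend):

  -m 0,slots=2,maxmem=8G \
  -object memory-backend-ram,id=m0,size=4G \
  -device pc-dimm,id=dimm0,memdev=m0,node=0 \
  -object memory-backend-ram,id=m1,size=4G \
  -device pc-dimm,id=dimm1,memdev=m1,node=1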

The possible complications I see on the QEMU side are in handling the
legacy '-numa mem'. Easiest would be to deprecate it and then do the
conversion, or to work around it by replacing it with a pc-dimm-like
device that's treated like the plain memory region we have now.

> 
> Thank you for your feedback
> 
> Eric
> 
> 
> > [...]
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB
  2018-10-04 13:16       ` Igor Mammedov
@ 2018-10-04 14:16         ` Dr. David Alan Gilbert
  2018-10-05  8:18           ` Igor Mammedov
  0 siblings, 1 reply; 62+ messages in thread
From: Dr. David Alan Gilbert @ 2018-10-04 14:16 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Auger Eric, wei, peter.maydell, drjones, david, Ard Biesheuvel,
	qemu-devel, shameerali.kolothum.thodi, agraf, qemu-arm, david,
	Laszlo Ersek, eric.auger.pro

* Igor Mammedov (imammedo@redhat.com) wrote:
> On Thu, 4 Oct 2018 13:32:26 +0200
> Auger Eric <eric.auger@redhat.com> wrote:
> 
> > Hi Igor,
> > 
> > On 10/4/18 1:11 PM, Igor Mammedov wrote:
> > > On Wed, 3 Oct 2018 15:49:03 +0200
> > > Auger Eric <eric.auger@redhat.com> wrote:
> > > [...]
> > 
> > Oh, I did not understand that you wanted to also replace the initial
> > memory with device memory. So we would switch from a pure static initial
> > RAM setup to a pure dynamic device memory setup. That looks like quite a
> > drastic change to me. As mentioned, I am concerned about complicating the
> > qemu cmd line, and I asked the libvirt guys about the induced pain.
> Converting initial RAM to the memory device model, beyond the current
> limits and within a single RAM zone, is the reason why the flexible RAM
> base idea was brought in. That way we'd end up with a single way to
> instantiate RAM (modeled after bare-metal machines) and the possibility
> to use hotplug/nvdimm/... with initial RAM, without any huge refactoring
> (+compat knobs) on top later.
> 
> A 2-region solution is easier to hack together right now; if we add more
> regions and leave initial RAM as is, there is no point in bothering with
> a flexible RAM base, but it won't lead us to uniform RAM handling and
> won't simplify anything.
> 
> Considering the virt board doesn't have the compat RAM layout baggage of
> x86, it only looks drastic; in reality it might turn out to be a simple
> refactoring.
> 
> As for the complicated CLI: for compat reasons we will be forced to
> support '-m size=!0' anyway, and we should be able to translate that
> implicitly into a dimm. In addition, with dimms as initial memory, users
> would have a choice to ditch "-numa (mem|memdev)" altogether and do
>   -m 0,slots=X,maxmem=Y -device pc-dimm,node=x...
> and the related '-numa' would become a compat shim translating into a
> similar set of dimm devices under the hood.
> (looks like too much fantasy :))
> 
> The possible complications I see on the QEMU side are in handling the
> legacy '-numa mem'. Easiest would be to deprecate it and then do the
> conversion, or to work around it by replacing it with a pc-dimm-like
> device that's treated like the plain memory region we have now.

And watch out for any migration compatibility issues with the naming of
the RAMBlocks, if virt is at the point where it cares about that
compatibility.
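
(For context, the migration stream identifies RAM by RAMBlock name; the
names below are only illustrative, and the backend one depends on the
object id:

  mach-virt.ram    # the virt board's initial RAM block today
  /objects/m0      # a memory-backend with id=m0 backing a pc-dimm

so turning initial RAM into a pc-dimm changes the block name and would
need compat handling for cross-version migration.)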

Dave

> > 
> > Thank you for your feedback
> > 
> > Eric
> > 
> > 
> > >   
> > >> (https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg03172.html).
> > >>
> > >> Your feedback on those points is really welcome!
> > >>
> > >> Thanks
> > >>
> > >> Eric
> > >>  
> > >>>
> > >>> This series reuses/rebases patches initially submitted by Shameer in [1]
> > >>> and Kwangwoo in [2].
> > >>>
> > >>> I put all parts all together for consistency and due to dependencies
> > >>> however as soon as the kernel dependency is resolved we can consider
> > >>> upstreaming them separately.
> > >>>
> > >>> Support more than 40b IPA/GPA [ patches 1 - 5 ]
> > >>> -----------------------------------------------
> > >>> was "[RFC 0/6] KVM/ARM: Dynamic and larger GPA size"
> > >>>
> > >>> At the moment the guest physical address space is limited to 40b
> > >>> due to KVM limitations. [0] bumps this limitation and allows to
> > >>> create a VM with up to 52b GPA address space.
> > >>>
> > >>> With this series, QEMU creates a virt VM with the max IPA range
> > >>> reported by the host kernel or 40b by default.
> > >>>
> > >>> This choice can be overridden by using the -machine kvm-type=<bits>
> > >>> option with bits within [40, 52]. If <bits> are not supported by
> > >>> the host, the legacy 40b value is used.
> > >>>
> > >>> Currently the EDK2 FW also hardcodes the max number of GPA bits to
> > >>> 40. This will need to be fixed.
> > >>>
> > >>> PCDIMM Support [ patches 6 - 11 ]
> > >>> ---------------------------------
> > >>> was "[RFC 0/5] ARM virt: Support PC-DIMM at 2TB"
> > >>>
> > >>> We instantiate the device_memory at 2TB. Using it obviously requires
> > >>> at least 42b of IPA/GPA. While its max capacity is currently limited
> > >>> to 2TB, the actual size depends on the initial guest RAM size and
> > >>> maxmem parameter.
> > >>>
> > >>> Actual hot-plug and hot-unplug of PC-DIMM is not suported due to lack
> > >>> of support of those features in baremetal.
> > >>>
> > >>> NVDIMM support [ patches 12 - 15 ]
> > >>> ----------------------------------
> > >>>
> > >>> Once the memory hotplug framework is in place it is fairly
> > >>> straightforward to add support for NVDIMM. the machine "nvdimm" option
> > >>> turns the capability on.
> > >>>
> > >>> Best Regards
> > >>>
> > >>> Eric
> > >>>
> > >>> References:
> > >>>
> > >>> [0] [PATCH v3 00/20] arm64: Dynamic & 52bit IPA support
> > >>> https://www.spinics.net/lists/kernel/msg2841735.html
> > >>>
> > >>> [1] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
> > >>> http://patchwork.ozlabs.org/cover/914694/
> > >>>
> > >>> [2] [RFC PATCH 0/3] add nvdimm support on AArch64 virt platform
> > >>> https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04599.html
> > >>>
> > >>> Tests:
> > >>> - On Cavium Gigabyte, a 48b VM was created.
> > >>> - Migration tests were performed between kernel supporting the
> > >>>   feature and destination kernel not suporting it
> > >>> - test with ACPI: to overcome the limitation of EDK2 FW, virt
> > >>>   memory map was hacked to move the device memory below 1TB.
> > >>>
> > >>> This series can be found at:
> > >>> https://github.com/eauger/qemu/tree/v2.12.0-dimm-2tb-v3
> > >>>
> > >>> History:
> > >>>
> > >>> v2 -> v3:
> > >>> - fix pc_q35 and pc_piix compilation error
> > >>> - kwangwoo's email being not valid anymore, remove his address
> > >>>
> > >>> v1 -> v2:
> > >>> - kvm_get_max_vm_phys_shift moved in arch specific file
> > >>> - addition of NVDIMM part
> > >>> - single series
> > >>> - rebase on David's refactoring
> > >>>
> > >>> v1:
> > >>> - was "[RFC 0/6] KVM/ARM: Dynamic and larger GPA size"
> > >>> - was "[RFC 0/5] ARM virt: Support PC-DIMM at 2TB"
> > >>>
> > >>> Best Regards
> > >>>
> > >>> Eric
> > >>>
> > >>>
> > >>> Eric Auger (9):
> > >>>   linux-headers: header update for KVM/ARM KVM_ARM_GET_MAX_VM_PHYS_SHIFT
> > >>>   hw/boards: Add a MachineState parameter to kvm_type callback
> > >>>   kvm: add kvm_arm_get_max_vm_phys_shift
> > >>>   hw/arm/virt: support kvm_type property
> > >>>   hw/arm/virt: handle max_vm_phys_shift conflicts on migration
> > >>>   hw/arm/virt: Allocate device_memory
> > >>>   acpi: move build_srat_hotpluggable_memory to generic ACPI source
> > >>>   hw/arm/boot: Expose the pmem nodes in the DT
> > >>>   hw/arm/virt: Add nvdimm and nvdimm-persistence options
> > >>>
> > >>> Kwangwoo Lee (2):
> > >>>   nvdimm: use configurable ACPI IO base and size
> > >>>   hw/arm/virt: Add nvdimm hot-plug infrastructure
> > >>>
> > >>> Shameer Kolothum (4):
> > >>>   hw/arm/virt: Add memory hotplug framework
> > >>>   hw/arm/boot: introduce fdt_add_memory_node helper
> > >>>   hw/arm/boot: Expose the PC-DIMM nodes in the DT
> > >>>   hw/arm/virt-acpi-build: Add PC-DIMM in SRAT
> > >>>
> > >>>  accel/kvm/kvm-all.c                            |   2 +-
> > >>>  default-configs/arm-softmmu.mak                |   4 +
> > >>>  hw/acpi/aml-build.c                            |  51 ++++
> > >>>  hw/acpi/nvdimm.c                               |  28 ++-
> > >>>  hw/arm/boot.c                                  | 123 +++++++--
> > >>>  hw/arm/virt-acpi-build.c                       |  10 +
> > >>>  hw/arm/virt.c                                  | 330 ++++++++++++++++++++++---
> > >>>  hw/i386/acpi-build.c                           |  49 ----
> > >>>  hw/i386/pc_piix.c                              |   8 +-
> > >>>  hw/i386/pc_q35.c                               |   8 +-
> > >>>  hw/ppc/mac_newworld.c                          |   2 +-
> > >>>  hw/ppc/mac_oldworld.c                          |   2 +-
> > >>>  hw/ppc/spapr.c                                 |   2 +-
> > >>>  include/hw/acpi/aml-build.h                    |   3 +
> > >>>  include/hw/arm/arm.h                           |   2 +
> > >>>  include/hw/arm/virt.h                          |   7 +
> > >>>  include/hw/boards.h                            |   2 +-
> > >>>  include/hw/mem/nvdimm.h                        |  12 +
> > >>>  include/standard-headers/linux/virtio_config.h |  16 +-
> > >>>  linux-headers/asm-mips/unistd.h                |  18 +-
> > >>>  linux-headers/asm-powerpc/kvm.h                |   1 +
> > >>>  linux-headers/linux/kvm.h                      |  16 ++
> > >>>  target/arm/kvm.c                               |   9 +
> > >>>  target/arm/kvm_arm.h                           |  16 ++
> > >>>  24 files changed, 597 insertions(+), 124 deletions(-)
> > >>>     
> > >>  
> > >   
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB
  2018-10-04 14:16         ` Dr. David Alan Gilbert
@ 2018-10-05  8:18           ` Igor Mammedov
  0 siblings, 0 replies; 62+ messages in thread
From: Igor Mammedov @ 2018-10-05  8:18 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Auger Eric, wei, peter.maydell, drjones, david, Ard Biesheuvel,
	qemu-devel, shameerali.kolothum.thodi, agraf, qemu-arm, david,
	Laszlo Ersek, eric.auger.pro

On Thu, 4 Oct 2018 15:16:13 +0100
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Igor Mammedov (imammedo@redhat.com) wrote:
> > On Thu, 4 Oct 2018 13:32:26 +0200
> > Auger Eric <eric.auger@redhat.com> wrote:
> >   
> > > Hi Igor,
> > > 
> > > On 10/4/18 1:11 PM, Igor Mammedov wrote:  
> > > > On Wed, 3 Oct 2018 15:49:03 +0200
> > > > Auger Eric <eric.auger@redhat.com> wrote:
> > > >     
> > > >> Hi,
> > > >>
> > > >> On 7/3/18 9:19 AM, Eric Auger wrote:    
> > > >>> This series aims at supporting PCDIMM/NVDIMM instantiation in
> > > >>> machvirt at 2TB guest physical address.
> > > >>>
> > > >>> This is achieved in 3 steps:
> > > >>> 1) support more than 40b IPA/GPA
> > > >>> 2) support PCDIMM instantiation
> > > >>> 3) support NVDIMM instantiation      
> > > >>
> > > >> While respinning this series I have some general questions that
> > > >> arise when thinking about extending the RAM on mach-virt:
> > > >>
> > > >> At the moment mach-virt offers at most 255GB of initial RAM,
> > > >> starting at 1GB (the "-m" option).
> > > >>
> > > >> This series does not touch this initial RAM and only aims to add
> > > >> device memory (usable for PCDIMM, NVDIMM, virtio-mem, virtio-pmem)
> > > >> in the 3.1 machine, located at 2TB. The 3.0 address map top
> > > >> currently is at 1TB (legacy AArch32 LPAE limit), so that would
> > > >> leave 1TB for IO or PCI. Is that OK?
> > > >>
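Roughly, the address map under discussion would then look like this (a
sketch; the high PCIe MMIO placement is recalled from the virt memmap,
not taken from this thread):

    0GB   ..   1GB : flash, GIC, UART, PCIe ECAM/MMIO, other I/O
    1GB   .. 256GB : initial RAM ("-m", up to 255GB)
    512GB ..   1TB : high PCIe MMIO window (3.0 address map top)
    2TB   ..  +2TB : device memory (PC-DIMM/NVDIMM, this series)
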
> > > >> - Putting device memory at 2TB means only ARMv8/aarch64 would
> > > >> benefit from it. Is that an issue? I.e. no device memory for ARMv7
> > > >> or ARMv8/aarch32. Do we need to put effort into supporting more
> > > >> memory and memory devices for those configs? There is less than
> > > >> 256GB free in the existing 1TB mach-virt memory map anyway.
> > > >>
> > > >> - Is it OK to rely only on device memory to extend the existing
> > > >> 255GB RAM, or would we need additional initial memory? Device
> > > >> memory usage induces a more complex command line, so this puts a
> > > >> constraint on upper layers. Is that acceptable though?
> > > >>
> > > >> - I revisited the series so that the max IPA size shift would get
> > > >> automatically computed according to the top address reached by the
> > > >> device memory, i.e. 2TB + (maxram_size - ram_size). So we would not
> > > >> need any additional kvm-type or explicit vm-phys-shift option to
> > > >> select the correct max IPA shift (or any CPU phys-bits as suggested
> > > >> by Dave). This also assumes we don't put anything beyond the device
> > > >> memory. Is that OK?
> > > >>
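A minimal sketch of that computation, assuming the device memory base is
fixed at 2TB (TiB, clz64() and MAX() are QEMU helpers; ms is the
MachineState):

    /* top of the guest physical map = device memory base + its size */
    hwaddr device_mem_base = 2 * TiB;
    hwaddr top = device_mem_base + (ms->maxram_size - ms->ram_size);
    /* smallest IPA shift covering 'top', never below the legacy 40b */
    int max_vm_phys_shift = MAX(40, 64 - clz64(top - 1));
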
> > > >> - Igor told me he was concerned about the split-memory RAM model
> > > >> as it caused a lot of trouble regarding compat/migration on the PC
> > > >> machine. After having studied the pc machine code I now wonder if
> > > >> we can compare the PC compat issues with the ones we could
> > > >> encounter on ARM with the proposed split memory model.
> > > > That's not the only issue.
> > > > 
> > > > For example, since initial memory isn't modeled as a device
> > > > (i.e. it's just a plain memory region), there is a bunch of NUMA
> > > > code to deal with it. If initial memory were replaced by pc-dimm,
> > > > we could drop some of it, and if we deprecated the old '-numa mem'
> > > > we should be able to drop most of it (the newer '-numa memdev'
> > > > maps directly onto the pc-dimm model).
> > > see my comment below.  
> > > > 
> > > >      
> > > >> On PC there are many knobs to tune the RAM layout:
> > > >> - max_ram_below_4g tunes how much RAM we want below 4G
> > > >> - gigabyte_align forces a 3GB versus 3.5GB lowmem limit if
> > > >> ram_size > max_ram_below_4g
> > > >> - plus the usual ram_size, which affects the rest of the initial RAM
> > > >> - plus the maxram_size and slots, which affect the size of the
> > > >> device memory
> > > >> - the device memory is just behind the initial RAM, aligned to 1GB
> > > >>
> > > >> Note the initial RAM and the device memory may be disjoint due to
> > > >> misalignment of the initial RAM size against 1GB.
> > > >>
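For comparison, the PC machine derives its device memory base like this
(paraphrased from hw/i386/pc.c of that era, so treat it as a sketch):

    /* device memory sits above the end of initial RAM, 1GiB-aligned;
     * a non-1GiB-aligned initial RAM size leaves a hole in between */
    machine->device_memory->base =
        ROUND_UP(0x100000000ULL + above_4g_mem_size, 1 * GiB);
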
> > > >> On ARM, we would have the 3.0 virt machine supporting only initial
> > > >> RAM from 1GB to 256GB. The 3.1 (or beyond ;-)) virt machine would
> > > >> support the same initial RAM plus device memory from 2TB to 4TB.
> > > >>
> > > >> With that memory split and the different machine types, I don't see
> > > >> any major hurdle with respect to migration. Am I missing something?
> > > > Later on, someone with a need to punch holes in the fixed initial
> > > > RAM/device memory will start making it complex.
> > > Support for host reserved regions is not acked yet, but that's a
> > > valid argument.
> > > >     
> > > >> The alternative to the split model is having a floating RAM base
> > > >> for contiguous initial + device memory (contiguity actually depends
> > > >> on initial RAM size alignment too). This requires significant
> > > >> changes in FW and also potentially impacts the legacy virt address
> > > >> map, as we need to pass the floating RAM base address in some way
> > > >> (using an SRAM at 1GB, or using fw_cfg). Is it worth the effort?
> > > >> Also, Peter/Laszlo earlier mentioned their reluctance to move the RAM
> > > >> (https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg03172.html).
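For instance, passing that base through fw_cfg could look roughly like
this ("etc/ram-base" is a made-up file name, not an existing interface):

    /* hypothetical: expose the floating RAM base to the firmware;
     * fw_cfg_add_file() does not copy, so the buffer must persist */
    uint64_t *ram_base = g_malloc(sizeof(*ram_base));
    *ram_base = cpu_to_le64(vms->memmap[VIRT_MEM].base);
    fw_cfg_add_file(fw_cfg, "etc/ram-base", ram_base, sizeof(*ram_base));
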
> > > > Drew is working on it, let's see the outcome first.
> > > > 
> > > > We may actually try to implement a single region that uses pc-dimm
> > > > for all memory (including initial) and still be compatible with the
> > > > legacy layout, as long as legacy mode sticks to the current RAM
> > > > limit and the device memory region is put at the current RAM base.
> > > > When a flexible RAM base is available, we will move that region to
> > > > the non-legacy layout at 2TB (or wherever).
> > > 
> > > Oh I did not understand you wanted to also replace the initial memory
> > > with device memory. So we would switch from a purely static initial
> > > RAM setup to a purely dynamic device memory setup. That looks like
> > > quite a drastic change to me. As mentioned, I am concerned about
> > > complicating the QEMU cmd line and I asked the libvirt guys about the
> > > induced pain.
> > Converting initial RAM to the memory device model, beyond the current
> > limits and within a single RAM zone, is the reason the flexible RAM
> > idea was brought in. That way we'd end up with a single way to
> > instantiate RAM (modeled after bare-metal machines) and the possibility
> > to use hotplug/nvdimm/... with initial RAM without any huge refactoring
> > (+compat knobs) on top later.
> > 
> > The 2-region solution is easier to hack together right now, if there
> > are more regions and we leave initial RAM as is (there is no point in
> > bothering with a flexible RAM base then), but it won't lead us to
> > uniform RAM handling and won't simplify anything.
> > 
> > Considering the virt board doesn't have the compat RAM layout baggage
> > of x86, it only looks drastic; in reality it might turn out to be a
> > simple refactoring.
> > 
> > As for the complicated CLI: for compat reasons we will be forced to
> > support '-m size=!0', and we should be able to translate that
> > implicitly into a dimm. In addition, with dimms as initial memory,
> > users would have the choice to ditch "-numa (mem|memdev)" altogether
> > and do
> >   -m 0,slots=X,maxmem=Y -device pc-dimm,node=x...
> > and the related '-numa' options would become a compat shim translating
> > into a similar set of dimm devices under the hood.
> > (looks like too much fantasy :))
> > 
> > The possible complications I see on the QEMU side are in handling
> > legacy '-numa mem'. Easiest would be to deprecate it and then do the
> > conversion, or work around it by replacing it with a pc-dimm-like
> > device that's treated like the memory region we have now.
> 
> And there are the migration compatibility issues around the naming of
> the RAMBlocks; that is, if virt is at the point where it cares about
> that compatibility.
That's what I meant: let's remove migration altogether and make life simpler :)

Jokes aside, the '-numa memdev' based variant isn't an issue: we would
map those memdevs to dimms, i.e. the RAMBlocks stay the same. But for
'-numa mem' or a NUMA-less '-m X' we would need to make up a way to
create RAMBlocks with the same ids.
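For reference, the mismatch comes from where those ids originate (per
hw/arm/virt.c around QEMU 3.0; the memdev path is what QOM generates for
a CLI -object):

    /* initial RAM: the RAMBlock id is the region name, "mach-virt.ram" */
    memory_region_allocate_system_memory(ram, NULL, "mach-virt.ram",
                                         machine->ram_size);
    /* a memdev-backed dimm instead gets the backend's canonical QOM
     * path as its RAMBlock id, e.g. "/objects/mem0" */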

If the whole ARM conversion turns out to be successful, it would be less
scary to do the same for x86/ppc/... and drop a bunch of ad-hoc NUMA code.

> 
> Dave
> 
> [...]


* Re: [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB
  2018-07-18 14:08 ` [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB Igor Mammedov
@ 2018-10-18 12:56   ` Auger Eric
  0 siblings, 0 replies; 62+ messages in thread
From: Auger Eric @ 2018-10-18 12:56 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, dgilbert, agraf, david,
	drjones, wei

Hi Igor,

On 7/18/18 4:08 PM, Igor Mammedov wrote:
> On Tue,  3 Jul 2018 09:19:43 +0200
> Eric Auger <eric.auger@redhat.com> wrote:
> 
>> This series aims at supporting PCDIMM/NVDIMM instantiation in
>> machvirt at 2TB guest physical address.
>>
>> This is achieved in 3 steps:
>> 1) support more than 40b IPA/GPA
> will it work for TCG as well?
> /important from a 'make check' pov, and maybe in cases where there is
> no ARM system available to test/play with the feature/
> 

Sorry, I missed this comment.

On a TCG guest, the ID_AA64MMFR0_EL1.PARange ID register field is the
limiting factor for the machine, as it reports the supported physical
address range (target/arm/cpu64.c):

aarch64_a53_initfn hardcodes the PA range to 40 bits:
	cpu->id_aa64mmfr0 = 0x00001122
aarch64_a57_initfn hardcodes the PA range to 44 bits:
	cpu->id_aa64mmfr0 = 0x00001124

For TCG guests we may add support for a phys-bits option which would
allow setting the PARange instead of hardcoding it.
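For reference, a minimal sketch of how those values decode (PARange is
bits [3:0] of ID_AA64MMFR0_EL1; extract64() is QEMU's bitfield helper):

    /* ID_AA64MMFR0_EL1.PARange encoding, per the ARM ARM */
    static const int parange_bits[] = { 32, 36, 40, 42, 44, 48, 52 };
    unsigned parange = extract64(cpu->id_aa64mmfr0, 0, 4);
    int pa_bits = parange_bits[parange];   /* 0x2 -> 40b, 0x4 -> 44b */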

Thanks

Eric

>> [...]


end of thread, other threads:[~2018-10-18 12:56 UTC | newest]

Thread overview: 62+ messages
2018-07-03  7:19 [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB Eric Auger
2018-07-03  7:19 ` [Qemu-devel] [RFC v3 01/15] linux-headers: header update for KVM/ARM KVM_ARM_GET_MAX_VM_PHYS_SHIFT Eric Auger
2018-07-03  7:19 ` [Qemu-devel] [RFC v3 02/15] hw/boards: Add a MachineState parameter to kvm_type callback Eric Auger
2018-07-03  7:19 ` [Qemu-devel] [RFC v3 03/15] kvm: add kvm_arm_get_max_vm_phys_shift Eric Auger
2018-07-03  7:19 ` [Qemu-devel] [RFC v3 04/15] hw/arm/virt: support kvm_type property Eric Auger
2018-07-03  7:19 ` [Qemu-devel] [RFC v3 05/15] hw/arm/virt: handle max_vm_phys_shift conflicts on migration Eric Auger
2018-07-03 18:41   ` David Hildenbrand
2018-07-03 19:32     ` Auger Eric
2018-07-04 11:53       ` David Hildenbrand
2018-07-04 12:50         ` Auger Eric
2018-07-03  7:19 ` [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory Eric Auger
2018-07-03 18:25   ` David Hildenbrand
2018-07-03 19:27     ` Auger Eric
2018-07-04 12:05       ` David Hildenbrand
2018-07-05 11:42         ` Auger Eric
2018-07-05 11:54           ` David Hildenbrand
2018-07-05 12:00             ` Auger Eric
2018-07-05 12:09               ` David Hildenbrand
2018-07-05 12:17                 ` Auger Eric
2018-07-05 13:19                   ` Shameerali Kolothum Thodi
2018-07-05 14:27                     ` Auger Eric
2018-07-11 13:17                       ` Igor Mammedov
2018-07-12 14:22                         ` Auger Eric
2018-07-12 14:45                           ` Andrew Jones
2018-07-12 14:53                             ` Auger Eric
2018-07-12 15:15                               ` Andrew Jones
2018-07-18 13:00                               ` Igor Mammedov
2018-08-08  9:33                                 ` Auger Eric
2018-08-09  8:45                                   ` Igor Mammedov
2018-08-09  9:54                                     ` Auger Eric
2018-07-18 13:05   ` Igor Mammedov
2018-08-08  9:33     ` Auger Eric
2018-07-03  7:19 ` [Qemu-devel] [RFC v3 07/15] hw/arm/virt: Add memory hotplug framework Eric Auger
2018-07-03 18:28   ` David Hildenbrand
2018-07-03 19:28     ` Auger Eric
2018-07-03 18:44   ` David Hildenbrand
2018-07-03 19:34     ` Auger Eric
2018-07-04 11:47       ` David Hildenbrand
2018-07-03  7:19 ` [Qemu-devel] [RFC v3 08/15] hw/arm/boot: introduce fdt_add_memory_node helper Eric Auger
2018-07-18 14:04   ` Igor Mammedov
2018-08-08  9:44     ` Auger Eric
2018-08-09  8:57       ` Igor Mammedov
2018-07-03  7:19 ` [Qemu-devel] [RFC v3 09/15] hw/arm/boot: Expose the PC-DIMM nodes in the DT Eric Auger
2018-07-03  7:19 ` [Qemu-devel] [RFC v3 10/15] acpi: move build_srat_hotpluggable_memory to generic ACPI source Eric Auger
2018-07-03  7:19 ` [Qemu-devel] [RFC v3 11/15] hw/arm/virt-acpi-build: Add PC-DIMM in SRAT Eric Auger
2018-07-03  7:19 ` [Qemu-devel] [RFC v3 12/15] nvdimm: use configurable ACPI IO base and size Eric Auger
2018-07-03  7:19 ` [Qemu-devel] [RFC v3 13/15] hw/arm/virt: Add nvdimm hot-plug infrastructure Eric Auger
2018-07-03  7:19 ` [Qemu-devel] [RFC v3 14/15] hw/arm/boot: Expose the pmem nodes in the DT Eric Auger
2018-07-03  7:19 ` [Qemu-devel] [RFC v3 15/15] hw/arm/virt: Add nvdimm and nvdimm-persistence options Eric Auger
2018-07-18 14:08 ` [Qemu-devel] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB Igor Mammedov
2018-10-18 12:56   ` Auger Eric
2018-10-03 13:49 ` Auger Eric
2018-10-03 14:13   ` Dr. David Alan Gilbert
2018-10-03 14:42     ` Auger Eric
2018-10-03 14:46       ` Dr. David Alan Gilbert
2018-10-04 11:11   ` Igor Mammedov
2018-10-04 11:32     ` Auger Eric
2018-10-04 12:02       ` David Hildenbrand
2018-10-04 12:07         ` Auger Eric
2018-10-04 13:16       ` Igor Mammedov
2018-10-04 14:16         ` Dr. David Alan Gilbert
2018-10-05  8:18           ` Igor Mammedov
