All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 0/5] virtio-mem: block size and address-assignment optimizations
@ 2020-09-23 11:38 David Hildenbrand
  2020-09-23 11:38 ` [PATCH v1 1/5] virtio-mem: Probe THP size to determine default block size David Hildenbrand
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: David Hildenbrand @ 2020-09-23 11:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: David Hildenbrand

Let's try to detect the actual THP size and use it as default block size
(unless the page size of the backend is bigger). Handle large block sizes
better, avoiding a virtio-spec violation and optimizing address
auto-detection.

David Hildenbrand (5):
  virtio-mem: Probe THP size to determine default block size
  virtio-mem: Check that "memaddr" is multiples of the block size
  memory-device: Support big alignment requirements
  memory-device: Add get_min_alignment() callback
  virito-mem: Implement get_min_alignment()

 hw/mem/memory-device.c         | 20 ++++---
 hw/virtio/virtio-mem-pci.c     | 14 +++++
 hw/virtio/virtio-mem.c         | 95 ++++++++++++++++++++++++++++++++--
 include/hw/mem/memory-device.h | 11 ++++
 4 files changed, 130 insertions(+), 10 deletions(-)

-- 
2.26.2



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v1 1/5] virtio-mem: Probe THP size to determine default block size
  2020-09-23 11:38 [PATCH v1 0/5] virtio-mem: block size and address-assignment optimizations David Hildenbrand
@ 2020-09-23 11:38 ` David Hildenbrand
  2020-09-25 13:46   ` Pankaj Gupta
  2020-09-23 11:38 ` [PATCH v1 2/5] virtio-mem: Check that "memaddr" is multiples of the " David Hildenbrand
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2020-09-23 11:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Michael S. Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Wei Yang, Igor Mammedov

Let's allow a minimum block size of 1 MiB in all configurations. Use
a default block size based on the THP size, and warn if something
smaller is configured by the user.

VIRTIO_MEM only supports Linux (depends on LINUX), so we can probe the
THP size unconditionally.

For now we only support virtio-mem on x86-64 - there isn't a user-visiable
change (x86-64 only supports 2 MiB THP on the PMD level) - the default
was, and will be 2 MiB.

If we ever have THP on the PUD level (e.g., 1 GiB THP on x86-64), we
expect to have a trigger to explicitly opt-in for the new THP granularity.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/virtio/virtio-mem.c | 82 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 78 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 8fbec77ccc..58098686ee 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -33,10 +33,70 @@
 #include "trace.h"
 
 /*
- * Use QEMU_VMALLOC_ALIGN, so no THP will have to be split when unplugging
- * memory (e.g., 2MB on x86_64).
+ * Let's not allow blocks smaller than 1 MiB, for example, to keep the tracking
+ * bitmap small.
  */
-#define VIRTIO_MEM_MIN_BLOCK_SIZE ((uint32_t)QEMU_VMALLOC_ALIGN)
+#define VIRTIO_MEM_MIN_BLOCK_SIZE ((uint32_t)(1 * MiB))
+
+/*
+ * We want to have a reasonable default block size such that
+ * 1. We avoid splitting THPs when unplugging memory, which degrades
+ *    performance.
+ * 2. We avoid placing THPs for plugged blocks that also cover unplugged
+ *    blocks.
+ *
+ * The actual THP size might differ between Linux kernels, so we try to probe
+ * it. In the future (if we ever run into issues regarding 2.), we might want
+ * to disable THP in case we fail to properly probe the THP size, or if the
+ * block size is configured smaller than the THP size.
+ */
+static uint32_t default_block_size;
+
+#define HPAGE_PMD_SIZE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
+static uint32_t virtio_mem_default_block_size(void)
+{
+    gchar *content = NULL;
+    const char *endptr;
+    uint64_t tmp;
+
+    if (default_block_size) {
+        return default_block_size;
+    }
+
+    /*
+     * Try to probe the actual THP size, fallback to (sane but eventually
+     * incorrect) default sizes.
+     */
+    if (g_file_get_contents(HPAGE_PMD_SIZE_PATH, &content, NULL, NULL) &&
+        !qemu_strtou64(content, &endptr, 0, &tmp) &&
+        (!endptr || *endptr == '\n')) {
+        /*
+         * Sanity-check the value, if it's too big (e.g., aarch64 with 64k base
+         * pages) or weird, fallback to something smaller.
+         */
+        if (!tmp || !is_power_of_2(tmp) || tmp > 16 * MiB) {
+            warn_report("Detected a THP size of %" PRIx64
+                        " MiB, falling back to 1 MiB.", tmp / MiB);
+            default_block_size = 1 * MiB;
+        } else {
+            default_block_size = tmp;
+        }
+    } else {
+#if defined(__x86_64__) || defined(__arm__) || defined(__aarch64__) || \
+    defined(__powerpc64__)
+        default_block_size = 2 * MiB;
+#else
+        /* fallback to 1 MiB (e.g., the THP size on s390x) */
+        default_block_size = 1 * MiB;
+#endif
+        warn_report("Could not detect THP size, falling back to %" PRIx64
+                    "  MiB.", default_block_size / MiB);
+    }
+
+    g_free(content);
+    return default_block_size;
+}
+
 /*
  * Size the usable region bigger than the requested size if possible. Esp.
  * Linux guests will only add (aligned) memory blocks in case they fully
@@ -437,6 +497,15 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
     rb = vmem->memdev->mr.ram_block;
     page_size = qemu_ram_pagesize(rb);
 
+    /*
+     * If the block size wasn't configured by the user, use a sane default. This
+     * allows using hugetlbfs backends with a pagesize bigger than the detected
+     * THP size without manual intervention/configuration.
+     */
+    if (!vmem->block_size) {
+        vmem->block_size = MAX(page_size, virtio_mem_default_block_size());
+    }
+
     if (vmem->block_size < page_size) {
         error_setg(errp, "'%s' property has to be at least the page size (0x%"
                    PRIx64 ")", VIRTIO_MEM_BLOCK_SIZE_PROP, page_size);
@@ -760,6 +829,12 @@ static void virtio_mem_set_block_size(Object *obj, Visitor *v, const char *name,
         error_setg(errp, "'%s' property has to be a power of two", name);
         return;
     }
+
+    if (value < virtio_mem_default_block_size()) {
+        warn_report("'%s' property is smaller than the default block size "
+                    "(detected THP size) of %" PRIx64 " MiB", name,
+                    virtio_mem_default_block_size() / MiB);
+    }
     vmem->block_size = value;
 }
 
@@ -810,7 +885,6 @@ static void virtio_mem_instance_init(Object *obj)
 {
     VirtIOMEM *vmem = VIRTIO_MEM(obj);
 
-    vmem->block_size = VIRTIO_MEM_MIN_BLOCK_SIZE;
     notifier_list_init(&vmem->size_change_notifiers);
     vmem->precopy_notifier.notify = virtio_mem_precopy_notify;
 
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v1 2/5] virtio-mem: Check that "memaddr" is multiples of the block size
  2020-09-23 11:38 [PATCH v1 0/5] virtio-mem: block size and address-assignment optimizations David Hildenbrand
  2020-09-23 11:38 ` [PATCH v1 1/5] virtio-mem: Probe THP size to determine default block size David Hildenbrand
@ 2020-09-23 11:38 ` David Hildenbrand
  2020-09-25 13:57   ` Pankaj Gupta
  2020-09-23 11:38 ` [PATCH v1 3/5] memory-device: Support big alignment requirements David Hildenbrand
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2020-09-23 11:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Michael S. Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Wei Yang, Igor Mammedov

The spec requires us to set the "addr" in guest physical address space to
multiples of the block size. In some cases, this is not the case right
now: For example, when starting a VM with 4 GiB boot memory and a
virtio-mem device with a block size of 2 GiB, "memaddr" will be
auto-assigned to 0x140000000 / 5 GiB.

We'll try to improve auto-assignment for memory devices next, to avoid
bailing out in case memory device code selects a bad address.

Note: The Linux driver doesn't support such big block sizes yet.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/virtio/virtio-mem.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 58098686ee..716eddd792 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -515,6 +515,11 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
                    ")", VIRTIO_MEM_REQUESTED_SIZE_PROP,
                    VIRTIO_MEM_BLOCK_SIZE_PROP, vmem->block_size);
         return;
+    } else if (!QEMU_IS_ALIGNED(vmem->addr, vmem->block_size)) {
+        error_setg(errp, "'%s' property has to be multiples of '%s' (0x%" PRIx64
+                   ")", VIRTIO_MEM_ADDR_PROP, VIRTIO_MEM_BLOCK_SIZE_PROP,
+                   vmem->block_size);
+        return;
     } else if (!QEMU_IS_ALIGNED(memory_region_size(&vmem->memdev->mr),
                                 vmem->block_size)) {
         error_setg(errp, "'%s' property memdev size has to be multiples of"
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v1 3/5] memory-device: Support big alignment requirements
  2020-09-23 11:38 [PATCH v1 0/5] virtio-mem: block size and address-assignment optimizations David Hildenbrand
  2020-09-23 11:38 ` [PATCH v1 1/5] virtio-mem: Probe THP size to determine default block size David Hildenbrand
  2020-09-23 11:38 ` [PATCH v1 2/5] virtio-mem: Check that "memaddr" is multiples of the " David Hildenbrand
@ 2020-09-23 11:38 ` David Hildenbrand
  2020-09-25 14:18   ` Pankaj Gupta
  2020-09-23 11:38 ` [PATCH v1 4/5] memory-device: Add get_min_alignment() callback David Hildenbrand
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2020-09-23 11:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Michael S. Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Wei Yang, Igor Mammedov

Let's warn instead of bailing out - the worst thing that can happen is
that we'll fail hot/coldplug later. The user got warned, and this should
be rare.

This will be necessary for memory devices with rather big (user-defined)
alignment requirements - say a virtio-mem device with a 2G block size -
which will become important, for example, when supporting vfio in the
future.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/mem/memory-device.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
index 4bc9cf0917..8a736f1a26 100644
--- a/hw/mem/memory-device.c
+++ b/hw/mem/memory-device.c
@@ -119,9 +119,10 @@ static uint64_t memory_device_get_free_addr(MachineState *ms,
 
     /* start of address space indicates the maximum alignment we expect */
     if (!QEMU_IS_ALIGNED(range_lob(&as), align)) {
-        error_setg(errp, "the alignment (0x%" PRIx64 ") is not supported",
-                   align);
-        return 0;
+        warn_report("the alignment (0x%" PRIx64 ") exceeds the expected"
+                    " maximum alignment, memory will get fragmented and not"
+                    " all 'maxmem' might be usable for memory devices.",
+                    align);
     }
 
     memory_device_check_addable(ms, size, &err);
@@ -151,7 +152,7 @@ static uint64_t memory_device_get_free_addr(MachineState *ms,
             return 0;
         }
     } else {
-        if (range_init(&new, range_lob(&as), size)) {
+        if (range_init(&new, QEMU_ALIGN_UP(range_lob(&as), align), size)) {
             error_setg(errp, "can't add memory device, device too big");
             return 0;
         }
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v1 4/5] memory-device: Add get_min_alignment() callback
  2020-09-23 11:38 [PATCH v1 0/5] virtio-mem: block size and address-assignment optimizations David Hildenbrand
                   ` (2 preceding siblings ...)
  2020-09-23 11:38 ` [PATCH v1 3/5] memory-device: Support big alignment requirements David Hildenbrand
@ 2020-09-23 11:38 ` David Hildenbrand
  2020-09-26 21:07   ` Pankaj Gupta
  2020-09-23 11:39 ` [PATCH v1 5/5] virito-mem: Implement get_min_alignment() David Hildenbrand
  2020-09-25 10:16 ` [PATCH v1 0/5] virtio-mem: block size and address-assignment optimizations David Hildenbrand
  5 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2020-09-23 11:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Michael S. Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Wei Yang, Igor Mammedov

Will be used by virtio-mem to express special alignment requirements due
to manually configured, big block sizes. This avoids failing later when
realizing, because auto-detection wasn't able to assign a properly
aligned address.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/mem/memory-device.c         | 11 +++++++++--
 include/hw/mem/memory-device.h | 11 +++++++++++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
index 8a736f1a26..cf0627fd01 100644
--- a/hw/mem/memory-device.c
+++ b/hw/mem/memory-device.c
@@ -259,7 +259,7 @@ void memory_device_pre_plug(MemoryDeviceState *md, MachineState *ms,
 {
     const MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(md);
     Error *local_err = NULL;
-    uint64_t addr, align;
+    uint64_t addr, align = 0;
     MemoryRegion *mr;
 
     mr = mdc->get_memory_region(md, &local_err);
@@ -267,7 +267,14 @@ void memory_device_pre_plug(MemoryDeviceState *md, MachineState *ms,
         goto out;
     }
 
-    align = legacy_align ? *legacy_align : memory_region_get_alignment(mr);
+    if (legacy_align) {
+        align = *legacy_align;
+    } else {
+        if (mdc->get_min_alignment) {
+            align = mdc->get_min_alignment(md);
+        }
+        align = MAX(align, memory_region_get_alignment(mr));
+    }
     addr = mdc->get_addr(md);
     addr = memory_device_get_free_addr(ms, !addr ? NULL : &addr, align,
                                        memory_region_size(mr), &local_err);
diff --git a/include/hw/mem/memory-device.h b/include/hw/mem/memory-device.h
index cde52e83c9..563893854a 100644
--- a/include/hw/mem/memory-device.h
+++ b/include/hw/mem/memory-device.h
@@ -88,6 +88,17 @@ struct MemoryDeviceClass {
      */
     MemoryRegion *(*get_memory_region)(MemoryDeviceState *md, Error **errp);
 
+    /*
+     * Optional: Return the desired minimum alignment of the device in guest
+     * physical address space, ignoring the alignment requirements of the
+     * memory region (e.g., based on the page size). The final alignment is
+     * computed by selecting the maximum of both alignments.
+     *
+     * Called when plugging the memory device to detect the required alignment
+     * during address assignment.
+     */
+    uint64_t (*get_min_alignment)(const MemoryDeviceState *md);
+
     /*
      * Translate the memory device into #MemoryDeviceInfo.
      */
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v1 5/5] virito-mem: Implement get_min_alignment()
  2020-09-23 11:38 [PATCH v1 0/5] virtio-mem: block size and address-assignment optimizations David Hildenbrand
                   ` (3 preceding siblings ...)
  2020-09-23 11:38 ` [PATCH v1 4/5] memory-device: Add get_min_alignment() callback David Hildenbrand
@ 2020-09-23 11:39 ` David Hildenbrand
  2020-09-26 21:23   ` Pankaj Gupta
  2020-09-25 10:16 ` [PATCH v1 0/5] virtio-mem: block size and address-assignment optimizations David Hildenbrand
  5 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2020-09-23 11:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Michael S. Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Wei Yang, Igor Mammedov

This allows auto-assignment of a properly aligned address in guest
physical address space. For example, when specifying a 2GB block size
for a virtio-mem device with 10GB with a memory setup "-m 4G, 20G",
we'll no longer fail when realizing.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/virtio/virtio-mem-pci.c | 14 ++++++++++++++
 hw/virtio/virtio-mem.c     |  8 ++++++++
 2 files changed, 22 insertions(+)

diff --git a/hw/virtio/virtio-mem-pci.c b/hw/virtio/virtio-mem-pci.c
index 590cec041b..2bfa2474fb 100644
--- a/hw/virtio/virtio-mem-pci.c
+++ b/hw/virtio/virtio-mem-pci.c
@@ -75,6 +75,19 @@ static void virtio_mem_pci_fill_device_info(const MemoryDeviceState *md,
     info->type = MEMORY_DEVICE_INFO_KIND_VIRTIO_MEM;
 }
 
+static uint64_t virtio_mem_pci_get_min_alignment(const MemoryDeviceState *md)
+{
+    /*
+     * If no block size was configured, returns the default block size.
+     * Before the device was realized, this might be smaller than the
+     * final block size (because it ignores the page size of the memory region).
+     * However, the alignment of the memory region properly considers the
+     * page size of the memory region.
+     */
+    return object_property_get_uint(OBJECT(md), VIRTIO_MEM_BLOCK_SIZE_PROP,
+                                    &error_abort);
+}
+
 static void virtio_mem_pci_size_change_notify(Notifier *notifier, void *data)
 {
     VirtIOMEMPCI *pci_mem = container_of(notifier, VirtIOMEMPCI,
@@ -109,6 +122,7 @@ static void virtio_mem_pci_class_init(ObjectClass *klass, void *data)
     mdc->get_plugged_size = virtio_mem_pci_get_plugged_size;
     mdc->get_memory_region = virtio_mem_pci_get_memory_region;
     mdc->fill_device_info = virtio_mem_pci_fill_device_info;
+    mdc->get_min_alignment = virtio_mem_pci_get_min_alignment;
 }
 
 static void virtio_mem_pci_instance_init(Object *obj)
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 716eddd792..d8222153cf 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -805,6 +805,14 @@ static void virtio_mem_get_block_size(Object *obj, Visitor *v, const char *name,
     const VirtIOMEM *vmem = VIRTIO_MEM(obj);
     uint64_t value = vmem->block_size;
 
+    /*
+     * If not configured by the user (and we're not realized yet), use the
+     * default block size.
+     */
+    if (!value) {
+        value = virtio_mem_default_block_size();
+    }
+
     visit_type_size(v, name, &value, errp);
 }
 
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 0/5] virtio-mem: block size and address-assignment optimizations
  2020-09-23 11:38 [PATCH v1 0/5] virtio-mem: block size and address-assignment optimizations David Hildenbrand
                   ` (4 preceding siblings ...)
  2020-09-23 11:39 ` [PATCH v1 5/5] virito-mem: Implement get_min_alignment() David Hildenbrand
@ 2020-09-25 10:16 ` David Hildenbrand
  5 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2020-09-25 10:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Igor Mammedov, Pankaj Gupta, Wei Yang, Dr . David Alan Gilbert,
	Michael S. Tsirkin

On 23.09.20 13:38, David Hildenbrand wrote:
> Let's try to detect the actual THP size and use it as default block size
> (unless the page size of the backend is bigger). Handle large block sizes
> better, avoiding a virtio-spec violation and optimizing address
> auto-detection.
> 
> David Hildenbrand (5):
>   virtio-mem: Probe THP size to determine default block size
>   virtio-mem: Check that "memaddr" is multiples of the block size
>   memory-device: Support big alignment requirements
>   memory-device: Add get_min_alignment() callback
>   virito-mem: Implement get_min_alignment()
> 
>  hw/mem/memory-device.c         | 20 ++++---
>  hw/virtio/virtio-mem-pci.c     | 14 +++++
>  hw/virtio/virtio-mem.c         | 95 ++++++++++++++++++++++++++++++++--
>  include/hw/mem/memory-device.h | 11 ++++
>  4 files changed, 130 insertions(+), 10 deletions(-)
> 

Just noticed that spring cleaning due to meson change removed my
cc-cmd.sh script, so adding people manually to the cover for context.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 1/5] virtio-mem: Probe THP size to determine default block size
  2020-09-23 11:38 ` [PATCH v1 1/5] virtio-mem: Probe THP size to determine default block size David Hildenbrand
@ 2020-09-25 13:46   ` Pankaj Gupta
  2020-09-28  8:58     ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: Pankaj Gupta @ 2020-09-25 13:46 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Igor Mammedov, Wei Yang, Qemu Developers,
	Dr . David Alan Gilbert, Michael S. Tsirkin

> Let's allow a minimum block size of 1 MiB in all configurations. Use
> a default block size based on the THP size, and warn if something
> smaller is configured by the user.
>
> VIRTIO_MEM only supports Linux (depends on LINUX), so we can probe the
> THP size unconditionally.
>
> For now we only support virtio-mem on x86-64 - there isn't a user-visiable
> change (x86-64 only supports 2 MiB THP on the PMD level) - the default
> was, and will be 2 MiB.
>
> If we ever have THP on the PUD level (e.g., 1 GiB THP on x86-64), we
> expect to have a trigger to explicitly opt-in for the new THP granularity.
>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Wei Yang <richardw.yang@linux.intel.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/virtio/virtio-mem.c | 82 +++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 78 insertions(+), 4 deletions(-)
>
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index 8fbec77ccc..58098686ee 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -33,10 +33,70 @@
>  #include "trace.h"
>
>  /*
> - * Use QEMU_VMALLOC_ALIGN, so no THP will have to be split when unplugging
> - * memory (e.g., 2MB on x86_64).
> + * Let's not allow blocks smaller than 1 MiB, for example, to keep the tracking
> + * bitmap small.
>   */
> -#define VIRTIO_MEM_MIN_BLOCK_SIZE ((uint32_t)QEMU_VMALLOC_ALIGN)
> +#define VIRTIO_MEM_MIN_BLOCK_SIZE ((uint32_t)(1 * MiB))
> +
> +/*
> + * We want to have a reasonable default block size such that
> + * 1. We avoid splitting THPs when unplugging memory, which degrades
> + *    performance.
> + * 2. We avoid placing THPs for plugged blocks that also cover unplugged
> + *    blocks.
> + *
> + * The actual THP size might differ between Linux kernels, so we try to probe
> + * it. In the future (if we ever run into issues regarding 2.), we might want
> + * to disable THP in case we fail to properly probe the THP size, or if the
> + * block size is configured smaller than the THP size.
> + */
> +static uint32_t default_block_size;
> +
> +#define HPAGE_PMD_SIZE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
> +static uint32_t virtio_mem_default_block_size(void)
> +{
> +    gchar *content = NULL;
> +    const char *endptr;
> +    uint64_t tmp;
> +
> +    if (default_block_size) {
> +        return default_block_size;
> +    }
> +
> +    /*
> +     * Try to probe the actual THP size, fallback to (sane but eventually
> +     * incorrect) default sizes.
> +     */
> +    if (g_file_get_contents(HPAGE_PMD_SIZE_PATH, &content, NULL, NULL) &&
> +        !qemu_strtou64(content, &endptr, 0, &tmp) &&
> +        (!endptr || *endptr == '\n')) {
> +        /*
> +         * Sanity-check the value, if it's too big (e.g., aarch64 with 64k base
> +         * pages) or weird, fallback to something smaller.
> +         */
> +        if (!tmp || !is_power_of_2(tmp) || tmp > 16 * MiB) {
> +            warn_report("Detected a THP size of %" PRIx64
> +                        " MiB, falling back to 1 MiB.", tmp / MiB);
> +            default_block_size = 1 * MiB;

Probably use macro "VIRTIO_MEM_MIN_BLOCK_SIZE"
> +        } else {
> +            default_block_size = tmp;
> +        }
> +    } else {
> +#if defined(__x86_64__) || defined(__arm__) || defined(__aarch64__) || \
> +    defined(__powerpc64__)
> +        default_block_size = 2 * MiB;
> +#else
> +        /* fallback to 1 MiB (e.g., the THP size on s390x) */
> +        default_block_size = 1 * MiB;
> +#endif

Maybe we can declare this macro near to "VIRTIO_MEM_MIN_BLOCK_SIZE
((uint32_t)(1 * MiB))"
or club into one, just a thought.

> +        warn_report("Could not detect THP size, falling back to %" PRIx64
> +                    "  MiB.", default_block_size / MiB);
> +    }
> +
> +    g_free(content);
> +    return default_block_size;
> +}
> +
>  /*
>   * Size the usable region bigger than the requested size if possible. Esp.
>   * Linux guests will only add (aligned) memory blocks in case they fully
> @@ -437,6 +497,15 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
>      rb = vmem->memdev->mr.ram_block;
>      page_size = qemu_ram_pagesize(rb);
>
> +    /*
> +     * If the block size wasn't configured by the user, use a sane default. This
> +     * allows using hugetlbfs backends with a pagesize bigger than the detected
> +     * THP size without manual intervention/configuration.
> +     */
> +    if (!vmem->block_size) {
> +        vmem->block_size = MAX(page_size, virtio_mem_default_block_size());
> +    }
> +
>      if (vmem->block_size < page_size) {
>          error_setg(errp, "'%s' property has to be at least the page size (0x%"
>                     PRIx64 ")", VIRTIO_MEM_BLOCK_SIZE_PROP, page_size);
> @@ -760,6 +829,12 @@ static void virtio_mem_set_block_size(Object *obj, Visitor *v, const char *name,
>          error_setg(errp, "'%s' property has to be a power of two", name);
>          return;
>      }
> +
> +    if (value < virtio_mem_default_block_size()) {
> +        warn_report("'%s' property is smaller than the default block size "
> +                    "(detected THP size) of %" PRIx64 " MiB", name,
> +                    virtio_mem_default_block_size() / MiB);
> +    }
>      vmem->block_size = value;
>  }
>
> @@ -810,7 +885,6 @@ static void virtio_mem_instance_init(Object *obj)
>  {
>      VirtIOMEM *vmem = VIRTIO_MEM(obj);
>
> -    vmem->block_size = VIRTIO_MEM_MIN_BLOCK_SIZE;
>      notifier_list_init(&vmem->size_change_notifiers);
>      vmem->precopy_notifier.notify = virtio_mem_precopy_notify;
>
> --
> 2.26.2
>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 2/5] virtio-mem: Check that "memaddr" is multiples of the block size
  2020-09-23 11:38 ` [PATCH v1 2/5] virtio-mem: Check that "memaddr" is multiples of the " David Hildenbrand
@ 2020-09-25 13:57   ` Pankaj Gupta
  0 siblings, 0 replies; 15+ messages in thread
From: Pankaj Gupta @ 2020-09-25 13:57 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Igor Mammedov, Wei Yang, Qemu Developers,
	Dr . David Alan Gilbert, Michael S. Tsirkin

> The spec requires us to set the "addr" in guest physical address space to
> multiples of the block size. In some cases, this is not the case right
> now: For example, when starting a VM with 4 GiB boot memory and a
> virtio-mem device with a block size of 2 GiB, "memaddr" will be
> auto-assigned to 0x140000000 / 5 GiB.
>
> We'll try to improve auto-assignment for memory devices next, to avoid
> bailing out in case memory device code selects a bad address.
>
> Note: The Linux driver doesn't support such big block sizes yet.
>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Wei Yang <richardw.yang@linux.intel.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/virtio/virtio-mem.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index 58098686ee..716eddd792 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -515,6 +515,11 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
>                     ")", VIRTIO_MEM_REQUESTED_SIZE_PROP,
>                     VIRTIO_MEM_BLOCK_SIZE_PROP, vmem->block_size);
>          return;
> +    } else if (!QEMU_IS_ALIGNED(vmem->addr, vmem->block_size)) {
> +        error_setg(errp, "'%s' property has to be multiples of '%s' (0x%" PRIx64
> +                   ")", VIRTIO_MEM_ADDR_PROP, VIRTIO_MEM_BLOCK_SIZE_PROP,
> +                   vmem->block_size);
> +        return;
>      } else if (!QEMU_IS_ALIGNED(memory_region_size(&vmem->memdev->mr),
>                                  vmem->block_size)) {
>          error_setg(errp, "'%s' property memdev size has to be multiples of"
> --
> 2.26.2

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 3/5] memory-device: Support big alignment requirements
  2020-09-23 11:38 ` [PATCH v1 3/5] memory-device: Support big alignment requirements David Hildenbrand
@ 2020-09-25 14:18   ` Pankaj Gupta
  0 siblings, 0 replies; 15+ messages in thread
From: Pankaj Gupta @ 2020-09-25 14:18 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Igor Mammedov, Wei Yang, Qemu Developers,
	Dr . David Alan Gilbert, Michael S. Tsirkin

> Let's warn instead of bailing out - the worst thing that can happen is
> that we'll fail hot/coldplug later. The user got warned, and this should
> be rare.
>
> This will be necessary for memory devices with rather big (user-defined)
> alignment requirements - say a virtio-mem device with a 2G block size -
> which will become important, for example, when supporting vfio in the
> future.
>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Wei Yang <richardw.yang@linux.intel.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/mem/memory-device.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
> index 4bc9cf0917..8a736f1a26 100644
> --- a/hw/mem/memory-device.c
> +++ b/hw/mem/memory-device.c
> @@ -119,9 +119,10 @@ static uint64_t memory_device_get_free_addr(MachineState *ms,
>
>      /* start of address space indicates the maximum alignment we expect */
>      if (!QEMU_IS_ALIGNED(range_lob(&as), align)) {
> -        error_setg(errp, "the alignment (0x%" PRIx64 ") is not supported",
> -                   align);
> -        return 0;
> +        warn_report("the alignment (0x%" PRIx64 ") exceeds the expected"
> +                    " maximum alignment, memory will get fragmented and not"
> +                    " all 'maxmem' might be usable for memory devices.",
> +                    align);
>      }
>
>      memory_device_check_addable(ms, size, &err);
> @@ -151,7 +152,7 @@ static uint64_t memory_device_get_free_addr(MachineState *ms,
>              return 0;
>          }
>      } else {
> -        if (range_init(&new, range_lob(&as), size)) {
> +        if (range_init(&new, QEMU_ALIGN_UP(range_lob(&as), align), size)) {
>              error_setg(errp, "can't add memory device, device too big");
>              return 0;
>          }

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 4/5] memory-device: Add get_min_alignment() callback
  2020-09-23 11:38 ` [PATCH v1 4/5] memory-device: Add get_min_alignment() callback David Hildenbrand
@ 2020-09-26 21:07   ` Pankaj Gupta
  0 siblings, 0 replies; 15+ messages in thread
From: Pankaj Gupta @ 2020-09-26 21:07 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Igor Mammedov, Wei Yang, Qemu Developers,
	Dr . David Alan Gilbert, Michael S. Tsirkin

> Will be used by virtio-mem to express special alignment requirements due
> to manually configured, big block sizes. This avoids failing later when
> realizing, because auto-detection wasn't able to assign a properly
> aligned address.
>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Wei Yang <richardw.yang@linux.intel.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/mem/memory-device.c         | 11 +++++++++--
>  include/hw/mem/memory-device.h | 11 +++++++++++
>  2 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
> index 8a736f1a26..cf0627fd01 100644
> --- a/hw/mem/memory-device.c
> +++ b/hw/mem/memory-device.c
> @@ -259,7 +259,7 @@ void memory_device_pre_plug(MemoryDeviceState *md, MachineState *ms,
>  {
>      const MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(md);
>      Error *local_err = NULL;
> -    uint64_t addr, align;
> +    uint64_t addr, align = 0;
>      MemoryRegion *mr;
>
>      mr = mdc->get_memory_region(md, &local_err);
> @@ -267,7 +267,14 @@ void memory_device_pre_plug(MemoryDeviceState *md, MachineState *ms,
>          goto out;
>      }
>
> -    align = legacy_align ? *legacy_align : memory_region_get_alignment(mr);
> +    if (legacy_align) {
> +        align = *legacy_align;
> +    } else {
> +        if (mdc->get_min_alignment) {
> +            align = mdc->get_min_alignment(md);
> +        }
> +        align = MAX(align, memory_region_get_alignment(mr));
> +    }
>      addr = mdc->get_addr(md);
>      addr = memory_device_get_free_addr(ms, !addr ? NULL : &addr, align,
>                                         memory_region_size(mr), &local_err);
> diff --git a/include/hw/mem/memory-device.h b/include/hw/mem/memory-device.h
> index cde52e83c9..563893854a 100644
> --- a/include/hw/mem/memory-device.h
> +++ b/include/hw/mem/memory-device.h
> @@ -88,6 +88,17 @@ struct MemoryDeviceClass {
>       */
>      MemoryRegion *(*get_memory_region)(MemoryDeviceState *md, Error **errp);
>
> +    /*
> +     * Optional: Return the desired minimum alignment of the device in guest
> +     * physical address space, ignoring the alignment requirements of the
> +     * memory region (e.g., based on the page size). The final alignment is
> +     * computed by selecting the maximum of both alignments.
> +     *
> +     * Called when plugging the memory device to detect the required alignment
> +     * during address assignment.
> +     */
> +    uint64_t (*get_min_alignment)(const MemoryDeviceState *md);
> +
>      /*
>       * Translate the memory device into #MemoryDeviceInfo.
>       */
> --

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 5/5] virito-mem: Implement get_min_alignment()
  2020-09-23 11:39 ` [PATCH v1 5/5] virito-mem: Implement get_min_alignment() David Hildenbrand
@ 2020-09-26 21:23   ` Pankaj Gupta
  0 siblings, 0 replies; 15+ messages in thread
From: Pankaj Gupta @ 2020-09-26 21:23 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Igor Mammedov, Wei Yang, Qemu Developers,
	Dr . David Alan Gilbert, Michael S. Tsirkin

> This allows auto-assignment of a properly aligned address in guest
> physical address space. For example, when specifying a 2GB block size
> for a virtio-mem device with 10GB with a memory setup "-m 4G, 20G",
> we'll no longer fail when realizing.
>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Wei Yang <richardw.yang@linux.intel.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/virtio/virtio-mem-pci.c | 14 ++++++++++++++
>  hw/virtio/virtio-mem.c     |  8 ++++++++
>  2 files changed, 22 insertions(+)
>
> diff --git a/hw/virtio/virtio-mem-pci.c b/hw/virtio/virtio-mem-pci.c
> index 590cec041b..2bfa2474fb 100644
> --- a/hw/virtio/virtio-mem-pci.c
> +++ b/hw/virtio/virtio-mem-pci.c
> @@ -75,6 +75,19 @@ static void virtio_mem_pci_fill_device_info(const MemoryDeviceState *md,
>      info->type = MEMORY_DEVICE_INFO_KIND_VIRTIO_MEM;
>  }
>
> +static uint64_t virtio_mem_pci_get_min_alignment(const MemoryDeviceState *md)
> +{
> +    /*
> +     * If no block size was configured, returns the default block size.
> +     * Before the device was realized, this might be smaller than the
> +     * final block size (because it ignores the page size of the memory region).
> +     * However, the alignment of the memory region properly considers the
> +     * page size of the memory region.
> +     */
> +    return object_property_get_uint(OBJECT(md), VIRTIO_MEM_BLOCK_SIZE_PROP,
> +                                    &error_abort);
> +}
> +
>  static void virtio_mem_pci_size_change_notify(Notifier *notifier, void *data)
>  {
>      VirtIOMEMPCI *pci_mem = container_of(notifier, VirtIOMEMPCI,
> @@ -109,6 +122,7 @@ static void virtio_mem_pci_class_init(ObjectClass *klass, void *data)
>      mdc->get_plugged_size = virtio_mem_pci_get_plugged_size;
>      mdc->get_memory_region = virtio_mem_pci_get_memory_region;
>      mdc->fill_device_info = virtio_mem_pci_fill_device_info;
> +    mdc->get_min_alignment = virtio_mem_pci_get_min_alignment;
>  }
>
>  static void virtio_mem_pci_instance_init(Object *obj)
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index 716eddd792..d8222153cf 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -805,6 +805,14 @@ static void virtio_mem_get_block_size(Object *obj, Visitor *v, const char *name,
>      const VirtIOMEM *vmem = VIRTIO_MEM(obj);
>      uint64_t value = vmem->block_size;
>
> +    /*
> +     * If not configured by the user (and we're not realized yet), use the
> +     * default block size.
> +     */
> +    if (!value) {
> +        value = virtio_mem_default_block_size();
> +    }
> +
>      visit_type_size(v, name, &value, errp);
>  }
>

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 1/5] virtio-mem: Probe THP size to determine default block size
  2020-09-25 13:46   ` Pankaj Gupta
@ 2020-09-28  8:58     ` David Hildenbrand
  2020-09-28  9:31       ` Pankaj Gupta
  0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2020-09-28  8:58 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: Igor Mammedov, Wei Yang, Qemu Developers,
	Dr . David Alan Gilbert, Michael S. Tsirkin

On 25.09.20 15:46, Pankaj Gupta wrote:
>> Let's allow a minimum block size of 1 MiB in all configurations. Use
>> a default block size based on the THP size, and warn if something
>> smaller is configured by the user.
>>
>> VIRTIO_MEM only supports Linux (depends on LINUX), so we can probe the
>> THP size unconditionally.
>>
>> For now we only support virtio-mem on x86-64 - there isn't a user-visiable
>> change (x86-64 only supports 2 MiB THP on the PMD level) - the default
>> was, and will be 2 MiB.
>>
>> If we ever have THP on the PUD level (e.g., 1 GiB THP on x86-64), we
>> expect to have a trigger to explicitly opt-in for the new THP granularity.
>>
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Wei Yang <richardw.yang@linux.intel.com>
>> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> Cc: Igor Mammedov <imammedo@redhat.com>
>> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  hw/virtio/virtio-mem.c | 82 +++++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 78 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
>> index 8fbec77ccc..58098686ee 100644
>> --- a/hw/virtio/virtio-mem.c
>> +++ b/hw/virtio/virtio-mem.c
>> @@ -33,10 +33,70 @@
>>  #include "trace.h"
>>
>>  /*
>> - * Use QEMU_VMALLOC_ALIGN, so no THP will have to be split when unplugging
>> - * memory (e.g., 2MB on x86_64).
>> + * Let's not allow blocks smaller than 1 MiB, for example, to keep the tracking
>> + * bitmap small.
>>   */
>> -#define VIRTIO_MEM_MIN_BLOCK_SIZE ((uint32_t)QEMU_VMALLOC_ALIGN)
>> +#define VIRTIO_MEM_MIN_BLOCK_SIZE ((uint32_t)(1 * MiB))
>> +
>> +/*
>> + * We want to have a reasonable default block size such that
>> + * 1. We avoid splitting THPs when unplugging memory, which degrades
>> + *    performance.
>> + * 2. We avoid placing THPs for plugged blocks that also cover unplugged
>> + *    blocks.
>> + *
>> + * The actual THP size might differ between Linux kernels, so we try to probe
>> + * it. In the future (if we ever run into issues regarding 2.), we might want
>> + * to disable THP in case we fail to properly probe the THP size, or if the
>> + * block size is configured smaller than the THP size.
>> + */
>> +static uint32_t default_block_size;
>> +
>> +#define HPAGE_PMD_SIZE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
>> +static uint32_t virtio_mem_default_block_size(void)
>> +{
>> +    gchar *content = NULL;
>> +    const char *endptr;
>> +    uint64_t tmp;
>> +
>> +    if (default_block_size) {
>> +        return default_block_size;
>> +    }
>> +
>> +    /*
>> +     * Try to probe the actual THP size, fallback to (sane but eventually
>> +     * incorrect) default sizes.
>> +     */
>> +    if (g_file_get_contents(HPAGE_PMD_SIZE_PATH, &content, NULL, NULL) &&
>> +        !qemu_strtou64(content, &endptr, 0, &tmp) &&
>> +        (!endptr || *endptr == '\n')) {
>> +        /*
>> +         * Sanity-check the value, if it's too big (e.g., aarch64 with 64k base
>> +         * pages) or weird, fallback to something smaller.
>> +         */
>> +        if (!tmp || !is_power_of_2(tmp) || tmp > 16 * MiB) {
>> +            warn_report("Detected a THP size of %" PRIx64
>> +                        " MiB, falling back to 1 MiB.", tmp / MiB);
>> +            default_block_size = 1 * MiB;
> 
> Probably use macro "VIRTIO_MEM_MIN_BLOCK_SIZE"

Indeed.

>> +        } else {
>> +            default_block_size = tmp;
>> +        }
>> +    } else {
>> +#if defined(__x86_64__) || defined(__arm__) || defined(__aarch64__) || \
>> +    defined(__powerpc64__)
>> +        default_block_size = 2 * MiB;
>> +#else
>> +        /* fallback to 1 MiB (e.g., the THP size on s390x) */
>> +        default_block_size = 1 * MiB;
>> +#endif
> 
> Maybe we can declare this macro near to "VIRTIO_MEM_MIN_BLOCK_SIZE
> ((uint32_t)(1 * MiB))"
> or club into one, just a thought.

I decided to not use VIRTIO_MEM_MIN_BLOCK_SIZE here to not accidentally
mess up the s390x THP size when ever wanting to decrease
VIRTIO_MEM_MIN_BLOCK_SIZE. But as we have a comment here, people should
know whats happening when ever changing VIRTIO_MEM_MIN_BLOCK_SIZE.

Thanks!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 1/5] virtio-mem: Probe THP size to determine default block size
  2020-09-28  8:58     ` David Hildenbrand
@ 2020-09-28  9:31       ` Pankaj Gupta
  2020-09-28  9:36         ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: Pankaj Gupta @ 2020-09-28  9:31 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Igor Mammedov, Wei Yang, Qemu Developers,
	Dr . David Alan Gilbert, Michael S. Tsirkin

> >> Let's allow a minimum block size of 1 MiB in all configurations. Use
> >> a default block size based on the THP size, and warn if something
> >> smaller is configured by the user.
> >>
> >> VIRTIO_MEM only supports Linux (depends on LINUX), so we can probe the
> >> THP size unconditionally.
> >>
> >> For now we only support virtio-mem on x86-64 - there isn't a user-visiable
> >> change (x86-64 only supports 2 MiB THP on the PMD level) - the default
> >> was, and will be 2 MiB.
> >>
> >> If we ever have THP on the PUD level (e.g., 1 GiB THP on x86-64), we
> >> expect to have a trigger to explicitly opt-in for the new THP granularity.
> >>
> >> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> >> Cc: Wei Yang <richardw.yang@linux.intel.com>
> >> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >> Cc: Igor Mammedov <imammedo@redhat.com>
> >> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> ---
> >>  hw/virtio/virtio-mem.c | 82 +++++++++++++++++++++++++++++++++++++++---
> >>  1 file changed, 78 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> >> index 8fbec77ccc..58098686ee 100644
> >> --- a/hw/virtio/virtio-mem.c
> >> +++ b/hw/virtio/virtio-mem.c
> >> @@ -33,10 +33,70 @@
> >>  #include "trace.h"
> >>
> >>  /*
> >> - * Use QEMU_VMALLOC_ALIGN, so no THP will have to be split when unplugging
> >> - * memory (e.g., 2MB on x86_64).
> >> + * Let's not allow blocks smaller than 1 MiB, for example, to keep the tracking
> >> + * bitmap small.
> >>   */
> >> -#define VIRTIO_MEM_MIN_BLOCK_SIZE ((uint32_t)QEMU_VMALLOC_ALIGN)
> >> +#define VIRTIO_MEM_MIN_BLOCK_SIZE ((uint32_t)(1 * MiB))
> >> +
> >> +/*
> >> + * We want to have a reasonable default block size such that
> >> + * 1. We avoid splitting THPs when unplugging memory, which degrades
> >> + *    performance.
> >> + * 2. We avoid placing THPs for plugged blocks that also cover unplugged
> >> + *    blocks.
> >> + *
> >> + * The actual THP size might differ between Linux kernels, so we try to probe
> >> + * it. In the future (if we ever run into issues regarding 2.), we might want
> >> + * to disable THP in case we fail to properly probe the THP size, or if the
> >> + * block size is configured smaller than the THP size.
> >> + */
> >> +static uint32_t default_block_size;
> >> +
> >> +#define HPAGE_PMD_SIZE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
> >> +static uint32_t virtio_mem_default_block_size(void)
> >> +{
> >> +    gchar *content = NULL;
> >> +    const char *endptr;
> >> +    uint64_t tmp;
> >> +
> >> +    if (default_block_size) {
> >> +        return default_block_size;
> >> +    }
> >> +
> >> +    /*
> >> +     * Try to probe the actual THP size, fallback to (sane but eventually
> >> +     * incorrect) default sizes.
> >> +     */
> >> +    if (g_file_get_contents(HPAGE_PMD_SIZE_PATH, &content, NULL, NULL) &&
> >> +        !qemu_strtou64(content, &endptr, 0, &tmp) &&
> >> +        (!endptr || *endptr == '\n')) {
> >> +        /*
> >> +         * Sanity-check the value, if it's too big (e.g., aarch64 with 64k base
> >> +         * pages) or weird, fallback to something smaller.
> >> +         */
> >> +        if (!tmp || !is_power_of_2(tmp) || tmp > 16 * MiB) {
> >> +            warn_report("Detected a THP size of %" PRIx64
> >> +                        " MiB, falling back to 1 MiB.", tmp / MiB);
> >> +            default_block_size = 1 * MiB;
> >
> > Probably use macro "VIRTIO_MEM_MIN_BLOCK_SIZE"
>
> Indeed.
>
> >> +        } else {
> >> +            default_block_size = tmp;
> >> +        }
> >> +    } else {
> >> +#if defined(__x86_64__) || defined(__arm__) || defined(__aarch64__) || \
> >> +    defined(__powerpc64__)
> >> +        default_block_size = 2 * MiB;
> >> +#else
> >> +        /* fallback to 1 MiB (e.g., the THP size on s390x) */
> >> +        default_block_size = 1 * MiB;
> >> +#endif
> >
> > Maybe we can declare this macro near to "VIRTIO_MEM_MIN_BLOCK_SIZE
> > ((uint32_t)(1 * MiB))"
> > or club into one, just a thought.
>
> I decided to not use VIRTIO_MEM_MIN_BLOCK_SIZE here to not accidentally
> mess up the s390x THP size when ever wanting to decrease
> VIRTIO_MEM_MIN_BLOCK_SIZE. But as we have a comment here, people should
> know whats happening when ever changing VIRTIO_MEM_MIN_BLOCK_SIZE.

Thanks for answering. Makes sense.

Overall the patch looks good to me.
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 1/5] virtio-mem: Probe THP size to determine default block size
  2020-09-28  9:31       ` Pankaj Gupta
@ 2020-09-28  9:36         ` David Hildenbrand
  0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2020-09-28  9:36 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: Igor Mammedov, Wei Yang, Qemu Developers,
	Dr . David Alan Gilbert, Michael S. Tsirkin

On 28.09.20 11:31, Pankaj Gupta wrote:
>>>> Let's allow a minimum block size of 1 MiB in all configurations. Use
>>>> a default block size based on the THP size, and warn if something
>>>> smaller is configured by the user.
>>>>
>>>> VIRTIO_MEM only supports Linux (depends on LINUX), so we can probe the
>>>> THP size unconditionally.
>>>>
>>>> For now we only support virtio-mem on x86-64 - there isn't a user-visiable
>>>> change (x86-64 only supports 2 MiB THP on the PMD level) - the default
>>>> was, and will be 2 MiB.
>>>>
>>>> If we ever have THP on the PUD level (e.g., 1 GiB THP on x86-64), we
>>>> expect to have a trigger to explicitly opt-in for the new THP granularity.
>>>>
>>>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>>>> Cc: Wei Yang <richardw.yang@linux.intel.com>
>>>> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>> Cc: Igor Mammedov <imammedo@redhat.com>
>>>> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>>  hw/virtio/virtio-mem.c | 82 +++++++++++++++++++++++++++++++++++++++---
>>>>  1 file changed, 78 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
>>>> index 8fbec77ccc..58098686ee 100644
>>>> --- a/hw/virtio/virtio-mem.c
>>>> +++ b/hw/virtio/virtio-mem.c
>>>> @@ -33,10 +33,70 @@
>>>>  #include "trace.h"
>>>>
>>>>  /*
>>>> - * Use QEMU_VMALLOC_ALIGN, so no THP will have to be split when unplugging
>>>> - * memory (e.g., 2MB on x86_64).
>>>> + * Let's not allow blocks smaller than 1 MiB, for example, to keep the tracking
>>>> + * bitmap small.
>>>>   */
>>>> -#define VIRTIO_MEM_MIN_BLOCK_SIZE ((uint32_t)QEMU_VMALLOC_ALIGN)
>>>> +#define VIRTIO_MEM_MIN_BLOCK_SIZE ((uint32_t)(1 * MiB))
>>>> +
>>>> +/*
>>>> + * We want to have a reasonable default block size such that
>>>> + * 1. We avoid splitting THPs when unplugging memory, which degrades
>>>> + *    performance.
>>>> + * 2. We avoid placing THPs for plugged blocks that also cover unplugged
>>>> + *    blocks.
>>>> + *
>>>> + * The actual THP size might differ between Linux kernels, so we try to probe
>>>> + * it. In the future (if we ever run into issues regarding 2.), we might want
>>>> + * to disable THP in case we fail to properly probe the THP size, or if the
>>>> + * block size is configured smaller than the THP size.
>>>> + */
>>>> +static uint32_t default_block_size;
>>>> +
>>>> +#define HPAGE_PMD_SIZE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
>>>> +static uint32_t virtio_mem_default_block_size(void)
>>>> +{
>>>> +    gchar *content = NULL;
>>>> +    const char *endptr;
>>>> +    uint64_t tmp;
>>>> +
>>>> +    if (default_block_size) {
>>>> +        return default_block_size;
>>>> +    }
>>>> +
>>>> +    /*
>>>> +     * Try to probe the actual THP size, fallback to (sane but eventually
>>>> +     * incorrect) default sizes.
>>>> +     */
>>>> +    if (g_file_get_contents(HPAGE_PMD_SIZE_PATH, &content, NULL, NULL) &&
>>>> +        !qemu_strtou64(content, &endptr, 0, &tmp) &&
>>>> +        (!endptr || *endptr == '\n')) {
>>>> +        /*
>>>> +         * Sanity-check the value, if it's too big (e.g., aarch64 with 64k base
>>>> +         * pages) or weird, fallback to something smaller.
>>>> +         */
>>>> +        if (!tmp || !is_power_of_2(tmp) || tmp > 16 * MiB) {
>>>> +            warn_report("Detected a THP size of %" PRIx64
>>>> +                        " MiB, falling back to 1 MiB.", tmp / MiB);
>>>> +            default_block_size = 1 * MiB;
>>>
>>> Probably use macro "VIRTIO_MEM_MIN_BLOCK_SIZE"
>>
>> Indeed.
>>
>>>> +        } else {
>>>> +            default_block_size = tmp;
>>>> +        }
>>>> +    } else {
>>>> +#if defined(__x86_64__) || defined(__arm__) || defined(__aarch64__) || \
>>>> +    defined(__powerpc64__)
>>>> +        default_block_size = 2 * MiB;
>>>> +#else
>>>> +        /* fallback to 1 MiB (e.g., the THP size on s390x) */
>>>> +        default_block_size = 1 * MiB;
>>>> +#endif
>>>
>>> Maybe we can declare this macro near to "VIRTIO_MEM_MIN_BLOCK_SIZE
>>> ((uint32_t)(1 * MiB))"
>>> or club into one, just a thought.
>>
>> I decided to not use VIRTIO_MEM_MIN_BLOCK_SIZE here to not accidentally
>> mess up the s390x THP size when ever wanting to decrease
>> VIRTIO_MEM_MIN_BLOCK_SIZE. But as we have a comment here, people should
>> know whats happening when ever changing VIRTIO_MEM_MIN_BLOCK_SIZE.
> 
> Thanks for answering. Makes sense.
> 
> Overall the patch looks good to me.
> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> 

I'll do some tweaks to the patch to make something weird (but possible)
be handled correctly (and make it overall look nicer):

It's possible to have a THP size that's bigger than a hugetlbfs page
size. In that case, we want to use the hugetlbfs page size instead (and
don't want to warn).

(I'll drop the RB for now, because it would be good if you gave it
another look - thanks!)

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-09-28  9:38 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-23 11:38 [PATCH v1 0/5] virtio-mem: block size and address-assignment optimizations David Hildenbrand
2020-09-23 11:38 ` [PATCH v1 1/5] virtio-mem: Probe THP size to determine default block size David Hildenbrand
2020-09-25 13:46   ` Pankaj Gupta
2020-09-28  8:58     ` David Hildenbrand
2020-09-28  9:31       ` Pankaj Gupta
2020-09-28  9:36         ` David Hildenbrand
2020-09-23 11:38 ` [PATCH v1 2/5] virtio-mem: Check that "memaddr" is multiples of the " David Hildenbrand
2020-09-25 13:57   ` Pankaj Gupta
2020-09-23 11:38 ` [PATCH v1 3/5] memory-device: Support big alignment requirements David Hildenbrand
2020-09-25 14:18   ` Pankaj Gupta
2020-09-23 11:38 ` [PATCH v1 4/5] memory-device: Add get_min_alignment() callback David Hildenbrand
2020-09-26 21:07   ` Pankaj Gupta
2020-09-23 11:39 ` [PATCH v1 5/5] virito-mem: Implement get_min_alignment() David Hildenbrand
2020-09-26 21:23   ` Pankaj Gupta
2020-09-25 10:16 ` [PATCH v1 0/5] virtio-mem: block size and address-assignment optimizations David Hildenbrand

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.