All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: qemu-devel@nongnu.org
Cc: Peter Maydell <peter.maydell@linaro.org>,
	Pankaj Gupta <pankaj.gupta.linux@gmail.com>,
	David Hildenbrand <david@redhat.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	Wei Yang <richardw.yang@linux.intel.com>,
	Igor Mammedov <imammedo@redhat.com>
Subject: [PULL 04/38] virtio-mem: Probe THP size to determine default block size
Date: Tue, 3 Nov 2020 09:34:11 -0500	[thread overview]
Message-ID: <20201103142306.71782-5-mst@redhat.com> (raw)
In-Reply-To: <20201103142306.71782-1-mst@redhat.com>

From: David Hildenbrand <david@redhat.com>

Let's allow a minimum block size of 1 MiB in all configurations. Select
the default block size based on
- The page size of the memory backend.
- The THP size if the memory backend size corresponds to the real host
  page size.
- The global minimum of 1 MiB.
and warn if something smaller is configured by the user.

VIRTIO_MEM only supports Linux (depends on LINUX), so we can probe the
THP size unconditionally.

For now we only support virtio-mem on x86-64 - there isn't a user-visible
change (x86-64 only supports 2 MiB THP on the PMD level) - the default
was, and will be 2 MiB.

If we ever have THP on the PUD level (e.g., 1 GiB THP on x86-64), we
expect it to be more transparent - e.g., to only optimize fully populated
ranges unless explicitly told /configured otherwise (in contrast to PMD
THP).

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20201008083029.9504-4-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/virtio-mem.c | 105 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 101 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 461ac68ee8..655824ff81 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -33,10 +33,83 @@
 #include "trace.h"
 
 /*
- * Use QEMU_VMALLOC_ALIGN, so no THP will have to be split when unplugging
- * memory (e.g., 2MB on x86_64).
+ * Let's not allow blocks smaller than 1 MiB, for example, to keep the tracking
+ * bitmap small.
  */
-#define VIRTIO_MEM_MIN_BLOCK_SIZE ((uint32_t)QEMU_VMALLOC_ALIGN)
+#define VIRTIO_MEM_MIN_BLOCK_SIZE ((uint32_t)(1 * MiB))
+
+#if defined(__x86_64__) || defined(__arm__) || defined(__aarch64__) || \
+    defined(__powerpc64__)
+#define VIRTIO_MEM_DEFAULT_THP_SIZE ((uint32_t)(2 * MiB))
+#else
+        /* fallback to 1 MiB (e.g., the THP size on s390x) */
+#define VIRTIO_MEM_DEFAULT_THP_SIZE VIRTIO_MEM_MIN_BLOCK_SIZE
+#endif
+
+/*
+ * We want to have a reasonable default block size such that
+ * 1. We avoid splitting THPs when unplugging memory, which degrades
+ *    performance.
+ * 2. We avoid placing THPs for plugged blocks that also cover unplugged
+ *    blocks.
+ *
+ * The actual THP size might differ between Linux kernels, so we try to probe
+ * it. In the future (if we ever run into issues regarding 2.), we might want
+ * to disable THP in case we fail to properly probe the THP size, or if the
+ * block size is configured smaller than the THP size.
+ */
+static uint32_t thp_size;
+
+#define HPAGE_PMD_SIZE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
+static uint32_t virtio_mem_thp_size(void)
+{
+    gchar *content = NULL;
+    const char *endptr;
+    uint64_t tmp;
+
+    if (thp_size) {
+        return thp_size;
+    }
+
+    /*
+     * Try to probe the actual THP size, fallback to (sane but eventually
+     * incorrect) default sizes.
+     */
+    if (g_file_get_contents(HPAGE_PMD_SIZE_PATH, &content, NULL, NULL) &&
+        !qemu_strtou64(content, &endptr, 0, &tmp) &&
+        (!endptr || *endptr == '\n')) {
+        /*
+         * Sanity-check the value, if it's too big (e.g., aarch64 with 64k base
+         * pages) or weird, fallback to something smaller.
+         */
+        if (!tmp || !is_power_of_2(tmp) || tmp > 16 * MiB) {
+            warn_report("Read unsupported THP size: %" PRIx64, tmp);
+        } else {
+            thp_size = tmp;
+        }
+    }
+
+    if (!thp_size) {
+        thp_size = VIRTIO_MEM_DEFAULT_THP_SIZE;
+        warn_report("Could not detect THP size, falling back to %" PRIx64
+                    "  MiB.", thp_size / MiB);
+    }
+
+    g_free(content);
+    return thp_size;
+}
+
+static uint64_t virtio_mem_default_block_size(RAMBlock *rb)
+{
+    const uint64_t page_size = qemu_ram_pagesize(rb);
+
+    /* We can have hugetlbfs with a page size smaller than the THP size. */
+    if (page_size == qemu_real_host_page_size) {
+        return MAX(page_size, virtio_mem_thp_size());
+    }
+    return MAX(page_size, VIRTIO_MEM_MIN_BLOCK_SIZE);
+}
+
 /*
  * Size the usable region bigger than the requested size if possible. Esp.
  * Linux guests will only add (aligned) memory blocks in case they fully
@@ -443,10 +516,23 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
     rb = vmem->memdev->mr.ram_block;
     page_size = qemu_ram_pagesize(rb);
 
+    /*
+     * If the block size wasn't configured by the user, use a sane default. This
+     * allows using hugetlbfs backends of any page size without manual
+     * intervention.
+     */
+    if (!vmem->block_size) {
+        vmem->block_size = virtio_mem_default_block_size(rb);
+    }
+
     if (vmem->block_size < page_size) {
         error_setg(errp, "'%s' property has to be at least the page size (0x%"
                    PRIx64 ")", VIRTIO_MEM_BLOCK_SIZE_PROP, page_size);
         return;
+    } else if (vmem->block_size < virtio_mem_default_block_size(rb)) {
+        warn_report("'%s' property is smaller than the default block size (%"
+                    PRIx64 " MiB)", VIRTIO_MEM_BLOCK_SIZE_PROP,
+                    virtio_mem_default_block_size(rb) / MiB);
     } else if (!QEMU_IS_ALIGNED(vmem->requested_size, vmem->block_size)) {
         error_setg(errp, "'%s' property has to be multiples of '%s' (0x%" PRIx64
                    ")", VIRTIO_MEM_REQUESTED_SIZE_PROP,
@@ -742,6 +828,18 @@ static void virtio_mem_get_block_size(Object *obj, Visitor *v, const char *name,
     const VirtIOMEM *vmem = VIRTIO_MEM(obj);
     uint64_t value = vmem->block_size;
 
+    /*
+     * If not configured by the user (and we're not realized yet), use the
+     * default block size we would use with the current memory backend.
+     */
+    if (!value) {
+        if (vmem->memdev && memory_region_is_ram(&vmem->memdev->mr)) {
+            value = virtio_mem_default_block_size(vmem->memdev->mr.ram_block);
+        } else {
+            value = virtio_mem_thp_size();
+        }
+    }
+
     visit_type_size(v, name, &value, errp);
 }
 
@@ -821,7 +919,6 @@ static void virtio_mem_instance_init(Object *obj)
 {
     VirtIOMEM *vmem = VIRTIO_MEM(obj);
 
-    vmem->block_size = VIRTIO_MEM_MIN_BLOCK_SIZE;
     notifier_list_init(&vmem->size_change_notifiers);
     vmem->precopy_notifier.notify = virtio_mem_precopy_notify;
 
-- 
MST



  parent reply	other threads:[~2020-11-03 14:38 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-03 14:33 [PULL 00/38] pc,pci,vhost,virtio: fixes Michael S. Tsirkin
2020-11-03 14:34 ` [PULL 01/38] pc: comment style fixup Michael S. Tsirkin
2020-11-03 14:34 ` [PULL 02/38] virtio-mem: Make sure "addr" is always multiples of the block size Michael S. Tsirkin
2020-11-03 14:34 ` [PULL 03/38] virtio-mem: Make sure "usable_region_size" " Michael S. Tsirkin
2020-11-03 14:34 ` Michael S. Tsirkin [this message]
2020-11-03 14:34 ` [PULL 05/38] memory-device: Support big alignment requirements Michael S. Tsirkin
2020-11-03 14:34 ` [PULL 06/38] memory-device: Add get_min_alignment() callback Michael S. Tsirkin
2020-11-03 14:34 ` [PULL 07/38] virito-mem: Implement get_min_alignment() Michael S. Tsirkin
2020-11-03 14:34 ` [PULL 08/38] hw/acpi : Don't use '#' flag of printf format Michael S. Tsirkin
2020-11-03 14:34 ` [PULL 09/38] hw/acpi : add space before the open parenthesis '(' Michael S. Tsirkin
2020-11-03 14:34 ` [PULL 10/38] hw/acpi : add spaces around operator Michael S. Tsirkin
2020-11-03 14:34 ` [PULL 11/38] hw/virtio/vhost-backend: Fix Coverity CID 1432871 Michael S. Tsirkin
2020-11-03 14:34 ` [PULL 12/38] hw/smbios: Fix leaked fd in save_opt_one() error path Michael S. Tsirkin
2020-11-03 14:34 ` [PULL 13/38] virtio-iommu: Fix virtio_iommu_mr() Michael S. Tsirkin
2020-11-03 14:34 ` [PULL 14/38] virtio-iommu: Store memory region in endpoint struct Michael S. Tsirkin
2020-11-03 14:34 ` [PULL 15/38] virtio-iommu: Add memory notifiers for map/unmap Michael S. Tsirkin
2020-11-03 14:34 ` [PULL 16/38] virtio-iommu: Call memory notifiers in attach/detach Michael S. Tsirkin
2020-11-03 14:34 ` [PULL 17/38] virtio-iommu: Add replay() memory region callback Michael S. Tsirkin
2020-11-03 14:34 ` [PULL 18/38] virtio-iommu: Add notify_flag_changed() " Michael S. Tsirkin
2020-11-03 14:34 ` [PULL 19/38] memory: Add interface to set iommu page size mask Michael S. Tsirkin
2020-11-03 14:34 ` [PULL 20/38] vfio: Set IOMMU page size as per host supported page size Michael S. Tsirkin
2020-11-03 14:34 ` [PULL 21/38] virtio-iommu: Set supported page size mask Michael S. Tsirkin
2020-11-03 14:34 ` [PULL 22/38] vfio: Don't issue full 2^64 unmap Michael S. Tsirkin
2020-11-03 14:35 ` [PULL 23/38] vhost-vdpa: Add qemu_close in vhost_vdpa_cleanup Michael S. Tsirkin
2020-11-03 14:35 ` [PULL 24/38] net: Add vhost-vdpa in show_netdevs() Michael S. Tsirkin
2020-11-03 14:35 ` [PULL 25/38] Revert "vhost-blk: set features before setting inflight feature" Michael S. Tsirkin
2020-11-03 14:35 ` [PULL 26/38] vhost-blk: set features before setting inflight feature Michael S. Tsirkin
2020-11-03 14:35 ` [PULL 27/38] libvhost-user: follow QEMU comment style Michael S. Tsirkin
2020-11-03 14:35 ` [PULL 28/38] configure: introduce --enable-vhost-user-blk-server Michael S. Tsirkin
2020-11-03 14:35 ` [PULL 29/38] block/export: make vhost-user-blk config space little-endian Michael S. Tsirkin
2020-11-03 14:35 ` [PULL 30/38] block/export: fix vhost-user-blk get_config() information leak Michael S. Tsirkin
2020-11-03 14:35 ` [PULL 31/38] contrib/vhost-user-blk: fix " Michael S. Tsirkin
2020-11-03 14:35 ` [PULL 32/38] test: new qTest case to test the vhost-user-blk-server Michael S. Tsirkin
2020-11-03 14:35 ` [PULL 33/38] tests/qtest: add multi-queue test case to vhost-user-blk-test Michael S. Tsirkin
2020-11-03 14:35 ` [PULL 34/38] libqtest: add qtest_socket_server() Michael S. Tsirkin
2020-11-03 14:35 ` [PULL 35/38] vhost-user-blk-test: rename destroy_drive() to destroy_file() Michael S. Tsirkin
2020-11-03 14:35 ` [PULL 36/38] vhost-user-blk-test: close fork child file descriptors Michael S. Tsirkin
2020-11-03 14:35 ` [PULL 37/38] vhost-user-blk-test: drop unused return value Michael S. Tsirkin
2020-11-03 14:35 ` [PULL 38/38] vhost-user-blk-test: fix races by using fd passing Michael S. Tsirkin
2020-11-03 15:59 ` [PULL 00/38] pc,pci,vhost,virtio: fixes Peter Maydell
2020-11-03 21:40   ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201103142306.71782-5-mst@redhat.com \
    --to=mst@redhat.com \
    --cc=david@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=pankaj.gupta.linux@gmail.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=richardw.yang@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.