* [PATCH v2 00/16] Ram blocks with resizable anonymous allocations under POSIX
From: David Hildenbrand @ 2020-02-12 13:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Dr. David Alan Gilbert, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

We already allow resizable ram blocks for anonymous memory; however, they
are not actually resized: all memory is mmap'ed R/W, including the memory
exceeding the used_length, up to the max_length.

When resizing, effectively only the boundary is moved. This series
implements actually resizable anonymous allocations and makes use of them
in resizable ram blocks where possible. Memory exceeding the used_length
will be inaccessible. Ram block notifiers in particular require care.
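
For context, this is roughly the interface the series builds on (hedged
sketch only; see memory_region_init_resizeable_ram() in memory.c and
qemu_ram_resize() in exec.c for the real signatures; "resized_cb" is a
placeholder name for the per-block resize callback):

    /* create: accessible up to used_length, reserved up to max_length */
    memory_region_init_resizeable_ram(mr, owner, "blob", used_length,
                                      max_length, resized_cb, &error_fatal);

    /* later: move the used_length boundary */
    qemu_ram_resize(mr->ram_block, new_used_length, &error_fatal);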

Having actually resizable anonymous allocations (via mmap hackery) makes it
possible to reserve a big region of virtual address space and grow the
accessible/usable part on demand. Even if "/proc/sys/vm/overcommit_memory"
is set to "never" under Linux, huge reservations will succeed. If there is
not enough memory when resizing (to populate parts of the reserved region),
the resize will fail. Only the actually used size is accounted for by the
OS.
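
The underlying technique is roughly the following (simplified sketch only;
the real code lives in util/mmap-alloc.c and util/oslib-posix.c and
additionally handles alignment, shrinking, huge pages and shared memory;
the function names here are made up):

    #include <errno.h>
    #include <sys/mman.h>

    /* Reserve max_size of virtual address space, without any accounting. */
    static void *reserve_region(size_t max_size)
    {
        return mmap(NULL, max_size, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    }

    /* Make the first used_size bytes accessible; this is the step that can
     * fail with ENOMEM when overcommit is disabled. */
    static int grow_region(void *addr, size_t used_size)
    {
        void *p = mmap(addr, used_size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

        return p == MAP_FAILED ? -errno : 0;
    }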

For example, virtio-mem [1] wants to reserve big resizable memory regions
and grow the usable part on demand. I think this change is worth sending
out individually, accompanied by a bunch of minor fixes and cleanups.

Notably, memory notifiers already handle resizing by first removing the old
region and then re-adding the resized region. prealloc is currently not
possible with resizable ram blocks. mlock() should continue to work as is.
Resizing is currently rare and must only happen at the start of an incoming
migration or during resets. No code path (except the HAX and SEV ram block
notifiers) should access memory outside of the usable range - and if we
ever find one, it has to be fixed (I did not identify any).
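
As a rough illustration (not the exact interface introduced by this series;
the helper names are made up), a notifier consumer can map a resize onto
its existing add/remove handling:

    static void my_ram_block_resized(RAMBlockNotifier *n, void *host,
                                     size_t old_size, size_t new_size)
    {
        /* Tear down the registration of the old size ... */
        my_unregister_region(host, old_size);
        /* ... and re-register the resized region. */
        my_register_region(host, new_size);
    }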

v1 -> v2:
- Add "util: vfio-helpers: Fix qemu_vfio_close()"
- Add "util: vfio-helpers: Remove Error parameter from
       qemu_vfio_undo_mapping()"
- Add "util: vfio-helpers: Factor out removal from
       qemu_vfio_undo_mapping()"
- "util/mmap-alloc: ..."
 -- Minor changes due to review feedback (e.g., assert alignment, return
    bool when resizing)
- "util: vfio-helpers: Implement ram_block_resized()"
 -- Reserve max_size in the IOVA address space.
 -- On resize, undo old mapping and do new mapping. We can later implement
    a new ioctl to resize the mapping directly.
- "numa: Teach ram block notifiers about resizable ram blocks"
 -- Pass size/max_size to ram block notifiers, which makes things easier and
    cleaner
- "exec: Ram blocks with resizable anonymous allocations under POSIX"
 -- Adapt to new ram block notifiers
 -- Shrink after notifying. Always trigger ram block notifiers on resizes
 -- Add a safety net that all ram block notifiers registered at runtime
    support resizes.

[1] https://lore.kernel.org/kvm/20191212171137.13872-1-david@redhat.com/

David Hildenbrand (16):
  util: vfio-helpers: Factor out and fix processing of existing ram
    blocks
  util: vfio-helpers: Fix qemu_vfio_close()
  util: vfio-helpers: Remove Error parameter from
    qemu_vfio_undo_mapping()
  util: vfio-helpers: Factor out removal from qemu_vfio_undo_mapping()
  exec: Factor out setting ram settings (madvise ...) into
    qemu_ram_apply_settings()
  exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap()
  exec: Drop "shared" parameter from ram_block_add()
  util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize()
  util/mmap-alloc: Factor out reserving of a memory region to
    mmap_reserve()
  util/mmap-alloc: Factor out populating of memory to mmap_populate()
  util/mmap-alloc: Prepare for resizable mmaps
  util/mmap-alloc: Implement resizable mmaps
  numa: Teach ram block notifiers about resizable ram blocks
  util: vfio-helpers: Implement ram_block_resized()
  util: oslib: Resizable anonymous allocations under POSIX
  exec: Ram blocks with resizable anonymous allocations under POSIX

 exec.c                     | 104 +++++++++++++++++++----
 hw/core/numa.c             |  53 +++++++++++-
 hw/i386/xen/xen-mapcache.c |   7 +-
 include/exec/cpu-common.h  |   3 +
 include/exec/memory.h      |   8 ++
 include/exec/ramlist.h     |  14 +++-
 include/qemu/mmap-alloc.h  |  21 +++--
 include/qemu/osdep.h       |   6 +-
 stubs/ram-block.c          |  20 -----
 target/i386/hax-mem.c      |   5 +-
 target/i386/sev.c          |  18 ++--
 util/mmap-alloc.c          | 165 +++++++++++++++++++++++--------------
 util/oslib-posix.c         |  37 ++++++++-
 util/oslib-win32.c         |  14 ++++
 util/trace-events          |   9 +-
 util/vfio-helpers.c        | 145 +++++++++++++++++++++-----------
 16 files changed, 450 insertions(+), 179 deletions(-)

-- 
2.24.1




* [PATCH v2 01/16] virtio-mem: Prototype
From: David Hildenbrand @ 2020-02-12 13:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Dr. David Alan Gilbert, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/virtio/Kconfig              |  11 +
 hw/virtio/Makefile.objs        |   1 +
 hw/virtio/virtio-mem.c         | 805 +++++++++++++++++++++++++++++++++
 include/hw/virtio/virtio-mem.h |  83 ++++
 qapi/misc.json                 |  39 +-
 5 files changed, 938 insertions(+), 1 deletion(-)
 create mode 100644 hw/virtio/virtio-mem.c
 create mode 100644 include/hw/virtio/virtio-mem.h

diff --git a/hw/virtio/Kconfig b/hw/virtio/Kconfig
index f87def27a6..638fe120b1 100644
--- a/hw/virtio/Kconfig
+++ b/hw/virtio/Kconfig
@@ -42,3 +42,14 @@ config VIRTIO_PMEM
     depends on VIRTIO
     depends on VIRTIO_PMEM_SUPPORTED
     select MEM_DEVICE
+
+config VIRTIO_MEM_SUPPORTED
+    bool
+
+config VIRTIO_MEM
+    bool
+    default y
+    depends on VIRTIO
+    depends on LINUX
+    depends on VIRTIO_MEM_SUPPORTED
+    select MEM_DEVICE
diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
index de0f5fc39b..3ed94c84d7 100644
--- a/hw/virtio/Makefile.objs
+++ b/hw/virtio/Makefile.objs
@@ -17,6 +17,7 @@ obj-$(CONFIG_VIRTIO_PMEM) += virtio-pmem.o
 common-obj-$(call land,$(CONFIG_VIRTIO_PMEM),$(CONFIG_VIRTIO_PCI)) += virtio-pmem-pci.o
 obj-$(call land,$(CONFIG_VHOST_USER_FS),$(CONFIG_VIRTIO_PCI)) += vhost-user-fs-pci.o
 obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock.o
+obj-$(CONFIG_VIRTIO_MEM) += virtio-mem.o
 
 ifeq ($(CONFIG_VIRTIO_PCI),y)
 obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock-pci.o
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
new file mode 100644
index 0000000000..2f759578fe
--- /dev/null
+++ b/hw/virtio/virtio-mem.c
@@ -0,0 +1,805 @@
+/*
+ * Virtio MEM device
+ *
+ * Copyright (C) 2018-2019 Red Hat, Inc.
+ *
+ * Authors:
+ *  David Hildenbrand <david@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qemu/iov.h"
+#include "qemu/cutils.h"
+#include "qemu/error-report.h"
+#include "qemu/units.h"
+#include "sysemu/kvm.h"
+#include "sysemu/numa.h"
+#include "sysemu/balloon.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/reset.h"
+#include "hw/virtio/virtio.h"
+#include "hw/virtio/virtio-bus.h"
+#include "hw/virtio/virtio-access.h"
+#include "hw/virtio/virtio-mem.h"
+#include "qapi/error.h"
+#include "qapi/visitor.h"
+#include "exec/ram_addr.h"
+#include "migration/postcopy-ram.h"
+#include "migration/misc.h"
+#include "hw/boards.h"
+#include "hw/qdev-properties.h"
+#include "config-devices.h"
+
+/*
+ * Use QEMU_VMALLOC_ALIGN, so no THP will have to be split when unplugging
+ * memory.
+ */
+#define VIRTIO_MEM_DEFAULT_BLOCK_SIZE QEMU_VMALLOC_ALIGN
+#define VIRTIO_MEM_MIN_BLOCK_SIZE QEMU_VMALLOC_ALIGN
+/*
+ * Size the usable region slightly bigger than the requested size if
+ * possible. This allows guests to make use of most requested memory even
+ * if the memory region in guest physical memory has strange alignment.
+ * E.g. x86-64 has alignment requirements for sections of 128 MiB.
+ */
+#define VIRTIO_MEM_USABLE_EXTENT (256 * MiB)
+
+static bool virtio_mem_busy(void)
+{
+    /*
+     * Better not to mess with dumps and migration - especially when
+     * resizing memory regions. Also, RDMA migration pins all memory.
+     */
+    if (!migration_is_idle()) {
+        return true;
+    }
+    if (dump_in_progress()) {
+        return true;
+    }
+    /*
+     * We can't use madvise(DONTNEED) with e.g. certain VFIO devices, and
+     * resizing memory regions might be problematic as well. Worse, this
+     * might change suddenly, e.g. when hotplugging a VFIO device.
+     */
+    if (qemu_balloon_is_inhibited()) {
+        return true;
+    }
+    return false;
+}
+
+static bool virtio_mem_test_bitmap(VirtIOMEM *vm, uint64_t start_gpa,
+                                   uint64_t size, bool plug)
+{
+    uint64_t bit = (start_gpa - vm->addr) / vm->block_size;
+
+    g_assert(QEMU_IS_ALIGNED(start_gpa, vm->block_size));
+    g_assert(QEMU_IS_ALIGNED(size, vm->block_size));
+    g_assert(vm->bitmap);
+
+    while (size) {
+        g_assert(bit < vm->bitmap_size);
+
+        if (plug && !test_bit(bit, vm->bitmap)) {
+            return false;
+        } else if (!plug && test_bit(bit, vm->bitmap)) {
+            return false;
+        }
+        size -= vm->block_size;
+        bit++;
+    }
+    return true;
+}
+
+static void virtio_mem_set_bitmap(VirtIOMEM *vm, uint64_t start_gpa,
+                                  uint64_t size, bool plug)
+{
+    const uint64_t bit = (start_gpa - vm->addr) / vm->block_size;
+    const uint64_t nbits = size / vm->block_size;
+
+    g_assert(QEMU_IS_ALIGNED(start_gpa, vm->block_size));
+    g_assert(QEMU_IS_ALIGNED(size, vm->block_size));
+    g_assert(vm->bitmap);
+
+    if (plug) {
+        bitmap_set(vm->bitmap, bit, nbits);
+    } else {
+        bitmap_clear(vm->bitmap, bit, nbits);
+    }
+}
+
+static void virtio_mem_set_block_state(VirtIOMEM *vm, uint64_t start_gpa,
+                                       uint64_t size, bool plug)
+{
+    const uint64_t offset = start_gpa - vm->addr;
+
+    g_assert(start_gpa + size > start_gpa);
+    g_assert(QEMU_IS_ALIGNED(start_gpa, vm->block_size));
+    g_assert(size && QEMU_IS_ALIGNED(size, vm->block_size));
+    if (!plug) {
+        ram_block_discard_range(vm->memdev->mr.ram_block, offset, size);
+    }
+
+    virtio_mem_set_bitmap(vm, start_gpa, size, plug);
+}
+
+static void virtio_mem_send_response(VirtIOMEM *vm, VirtQueueElement *elem,
+                                     struct virtio_mem_resp *resp)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(vm);
+    VirtQueue *vq = vm->vq;
+
+    iov_from_buf(elem->in_sg, elem->in_num, 0, resp, sizeof(*resp));
+
+    virtqueue_push(vq, elem, sizeof(*resp));
+    virtio_notify(vdev, vq);
+}
+
+static void virtio_mem_send_response_simple(VirtIOMEM *vm,
+                                            VirtQueueElement *elem,
+                                            uint16_t type)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(vm);
+    struct virtio_mem_resp resp = {};
+
+    virtio_stw_p(vdev, &resp.type, type);
+    virtio_mem_send_response(vm, elem, &resp);
+}
+
+static void virtio_mem_bad_request(VirtIOMEM *vm, const char *msg)
+{
+    virtio_error(VIRTIO_DEVICE(vm), "virtio-mem protocol violation: %s", msg);
+}
+
+static bool virtio_mem_valid_range(VirtIOMEM *vm, uint64_t gpa, uint64_t size)
+{
+    /* address properly aligned? */
+    if (!QEMU_IS_ALIGNED(gpa, vm->block_size)) {
+            return false;
+    }
+
+    /* reasonable size */
+    if (gpa + size <= gpa || size == 0) {
+        return false;
+    }
+
+    /* start address in usable range? */
+    if (gpa < vm->addr ||
+        gpa >= vm->addr + vm->usable_region_size) {
+        return false;
+    }
+
+    /* end address in usable range? */
+    if (gpa + size - 1 >= vm->addr + vm->usable_region_size) {
+        return false;
+    }
+    return true;
+}
+
+static int virtio_mem_state_change_request(VirtIOMEM *vm, uint64_t gpa,
+                                           uint16_t nb_blocks, bool plug)
+{
+    const uint64_t size = nb_blocks * vm->block_size;
+
+    if (!virtio_mem_valid_range(vm, gpa, size)) {
+        return VIRTIO_MEM_RESP_ERROR;
+    }
+
+    /* trying to plug more than requested */
+    if (plug && (vm->size + size > vm->requested_size)) {
+        return VIRTIO_MEM_RESP_NACK;
+    }
+
+    /* sometimes we cannot discard blocks */
+    if (virtio_mem_busy()) {
+        return VIRTIO_MEM_RESP_BUSY;
+    }
+
+    /* test if really all blocks are in the opposite state */
+    if (!virtio_mem_test_bitmap(vm, gpa, size, !plug)) {
+        return VIRTIO_MEM_RESP_ERROR;
+    }
+
+    /* update the block state */
+    virtio_mem_set_block_state(vm, gpa, size, plug);
+
+    /* update the size */
+    if (plug) {
+        vm->size += size;
+    } else {
+        vm->size -= size;
+    }
+    return VIRTIO_MEM_RESP_ACK;
+}
+
+static void virtio_mem_plug_request(VirtIOMEM *vm, VirtQueueElement *elem,
+                                    struct virtio_mem_req *req)
+{
+    const uint64_t gpa = le64_to_cpu(req->u.plug.addr);
+    const uint16_t nb_blocks = le16_to_cpu(req->u.plug.nb_blocks);
+    uint16_t type;
+
+    type = virtio_mem_state_change_request(vm, gpa, nb_blocks, true);
+    virtio_mem_send_response_simple(vm, elem, type);
+}
+
+static void virtio_mem_unplug_request(VirtIOMEM *vm, VirtQueueElement *elem,
+                                      struct virtio_mem_req *req)
+{
+    const uint64_t gpa = le64_to_cpu(req->u.unplug.addr);
+    const uint16_t nb_blocks = le16_to_cpu(req->u.unplug.nb_blocks);
+    uint16_t type;
+
+    type = virtio_mem_state_change_request(vm, gpa, nb_blocks, false);
+    virtio_mem_send_response_simple(vm, elem, type);
+}
+
+/*
+ * Unplug all memory and shrink the usable region.
+ */
+static void virtio_mem_unplug_all(VirtIOMEM *vm)
+{
+    if (vm->size) {
+        virtio_mem_set_block_state(vm, vm->addr,
+                                   memory_region_size(&vm->memdev->mr), false);
+        vm->size = 0;
+    }
+    vm->usable_region_size = MIN(memory_region_size(&vm->memdev->mr),
+                                 vm->requested_size + VIRTIO_MEM_USABLE_EXTENT);
+}
+
+static void virtio_mem_unplug_all_request(VirtIOMEM *vm, VirtQueueElement *elem)
+{
+
+    if (virtio_mem_busy()) {
+        virtio_mem_send_response_simple(vm, elem,  VIRTIO_MEM_RESP_BUSY);
+        return;
+    }
+
+    virtio_mem_unplug_all(vm);
+    virtio_mem_send_response_simple(vm, elem,  VIRTIO_MEM_RESP_ACK);
+}
+
+static void virtio_mem_state_request(VirtIOMEM *vm, VirtQueueElement *elem,
+                                     struct virtio_mem_req *req)
+{
+    const uint64_t gpa = le64_to_cpu(req->u.state.addr);
+    const uint16_t nb_blocks = le16_to_cpu(req->u.state.nb_blocks);
+    const uint64_t size = nb_blocks * vm->block_size;
+    VirtIODevice *vdev = VIRTIO_DEVICE(vm);
+    struct virtio_mem_resp resp = {};
+
+    if (!virtio_mem_valid_range(vm, gpa, size)) {
+        virtio_mem_send_response_simple(vm, elem, VIRTIO_MEM_RESP_ERROR);
+        return;
+    }
+
+    virtio_stw_p(vdev, &resp.type, VIRTIO_MEM_RESP_ACK);
+    if (virtio_mem_test_bitmap(vm, gpa, size, true)) {
+        virtio_stw_p(vdev, &resp.u.state.state, VIRTIO_MEM_STATE_PLUGGED);
+    } else if (virtio_mem_test_bitmap(vm, gpa, size, false)) {
+        virtio_stw_p(vdev, &resp.u.state.state, VIRTIO_MEM_STATE_UNPLUGGED);
+    } else {
+        virtio_stw_p(vdev, &resp.u.state.state, VIRTIO_MEM_STATE_MIXED);
+    }
+    virtio_mem_send_response(vm, elem, &resp);
+}
+
+static void virtio_mem_handle_request(VirtIODevice *vdev, VirtQueue *vq)
+{
+    const int len = sizeof(struct virtio_mem_req);
+    VirtIOMEM *vm = VIRTIO_MEM(vdev);
+    VirtQueueElement *elem;
+    struct virtio_mem_req req;
+    uint64_t type;
+
+    elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+    if (!elem) {
+        return;
+    }
+
+    if (iov_to_buf(elem->out_sg, elem->out_num, 0, &req, len) < len) {
+        virtio_mem_bad_request(vm, "invalid request size");
+        goto out_free;
+    }
+
+    if (iov_size(elem->in_sg, elem->in_num) < sizeof(struct virtio_mem_resp)) {
+        virtio_mem_bad_request(vm, "not enough space for response");
+        goto out_free;
+    }
+
+    type = le16_to_cpu(req.type);
+    switch (type) {
+    case VIRTIO_MEM_REQ_PLUG:
+        virtio_mem_plug_request(vm, elem, &req);
+        break;
+    case VIRTIO_MEM_REQ_UNPLUG:
+        virtio_mem_unplug_request(vm, elem, &req);
+        break;
+    case VIRTIO_MEM_REQ_UNPLUG_ALL:
+        virtio_mem_unplug_all_request(vm, elem);
+        break;
+    case VIRTIO_MEM_REQ_STATE:
+        virtio_mem_state_request(vm, elem, &req);
+        break;
+    default:
+        virtio_mem_bad_request(vm, "unknown request type");
+        goto out_free;
+    }
+
+out_free:
+    g_free(elem);
+}
+
+static void virtio_mem_get_config(VirtIODevice *vdev, uint8_t *config_data)
+{
+    VirtIOMEM *vm = VIRTIO_MEM(vdev);
+    struct virtio_mem_config *config = (void *) config_data;
+
+    config->block_size = cpu_to_le32(vm->block_size);
+    config->node_id = cpu_to_le16(vm->node);
+    config->requested_size = cpu_to_le64(vm->requested_size);
+    config->plugged_size = cpu_to_le64(vm->size);
+    config->addr = cpu_to_le64(vm->addr);
+    config->region_size = cpu_to_le64(memory_region_size(&vm->memdev->mr));
+    config->usable_region_size = cpu_to_le64(vm->usable_region_size);
+}
+
+static uint64_t virtio_mem_get_features(VirtIODevice *vdev, uint64_t features,
+                                        Error **errp)
+{
+    MachineState *ms = MACHINE(qdev_get_machine());
+
+    if (ms->numa_state) {
+#if defined(CONFIG_ACPI)
+        virtio_add_feature(&features, VIRTIO_MEM_F_ACPI_PXM);
+#endif
+    }
+    return features;
+}
+
+static void virtio_mem_system_reset(void *opaque)
+{
+    VirtIOMEM *vm = VIRTIO_MEM(opaque);
+
+    /*
+     * During usual resets, we will unplug all memory and shrink the usable
+     * region size. This is, however, not possible in all scenarios. Then,
+     * the guest has to deal with this manually (VIRTIO_MEM_REQ_UNPLUG_ALL).
+     */
+    if (virtio_mem_busy()) {
+        return;
+    }
+
+    virtio_mem_unplug_all(vm);
+}
+
+static int virtio_mem_postcopy_notifier(NotifierWithReturn *notifier,
+                                        void *opaque)
+{
+    struct PostcopyNotifyData *pnd = opaque;
+
+    /*
+     * TODO: We cannot use madvise(DONTNEED) with concurrent postcopy. While
+     *       we can simply tell the guest to retry plug/unplug requests later,
+     *       system resets + restoring the unplugged state during migration
+     *       require more thought.
+     *
+     *       We will have to delay such activity until postcopy is finished
+     *       (it notifies us via its notifier) and then restore the unplugged
+     *       state. When we switch to userfaultfd (WP), we will temporarily
+     *       have to unregister our userfaultfd handler when postcopy is
+     *       about to start and reregister when postcopy is finished.
+     */
+    switch (pnd->reason) {
+    case POSTCOPY_NOTIFY_PROBE:
+        error_setg(pnd->errp, "virtio-mem does not support postcopy yet");
+        return -ENOENT;
+    default:
+        break;
+    }
+    return 0;
+}
+
+static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
+{
+    MachineState *ms = MACHINE(qdev_get_machine());
+    int nb_numa_nodes = ms->numa_state ? ms->numa_state->num_nodes : 0;
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VirtIOMEM *vm = VIRTIO_MEM(dev);
+    Error *local_err = NULL;
+    uint64_t page_size;
+
+    /* verify the memdev */
+    if (!vm->memdev) {
+        error_setg(&local_err, "'%s' property must be set",
+                   VIRTIO_MEM_MEMDEV_PROP);
+        goto out;
+    } else if (host_memory_backend_is_mapped(vm->memdev)) {
+        char *path = object_get_canonical_path_component(OBJECT(vm->memdev));
+
+        error_setg(&local_err, "can't use already busy memdev: %s", path);
+        g_free(path);
+        goto out;
+    }
+
+    /* verify the node */
+    if ((nb_numa_nodes && vm->node >= nb_numa_nodes) ||
+        (!nb_numa_nodes && vm->node)) {
+        error_setg(&local_err, "Property '%s' has value '%" PRIu32
+                   "', which exceeds the number of numa nodes: %d",
+                   VIRTIO_MEM_NODE_PROP, vm->node,
+                   nb_numa_nodes ? nb_numa_nodes : 1);
+        goto out;
+    }
+
+    /* mmap/madvise changes have to be reflected in guest physical memory */
+    if (kvm_enabled() && !kvm_has_sync_mmu()) {
+        error_set(&local_err, ERROR_CLASS_KVM_MISSING_CAP,
+                  "Using KVM without synchronous MMU, virtio-mem unavailable");
+        goto out;
+    }
+
+    /*
+     * TODO: madvise(DONTNEED) does not work with mlock. We might be able
+     * to temporarily unlock and relock at the right places to make it work.
+     */
+    if (enable_mlock) {
+        error_setg(&local_err, "Memory is locked, virtio-mem unavailable");
+        goto out;
+    }
+
+    g_assert(memory_region_is_ram(&vm->memdev->mr));
+    g_assert(!memory_region_is_rom(&vm->memdev->mr));
+    g_assert(vm->memdev->mr.ram_block);
+
+    /*
+     * TODO: Huge pages under Linux don't support the zero page, therefore
+     * dump and migration could result in a high memory consumption. Disallow
+     * it.
+     */
+    page_size = qemu_ram_pagesize(vm->memdev->mr.ram_block);
+    if (page_size != getpagesize()) {
+        error_setg(&local_err, "'%s' page size (0x%" PRIx64 ") not supported",
+                   VIRTIO_MEM_MEMDEV_PROP, page_size);
+        goto out;
+    }
+
+    /* now that memdev and block_size are fixed, verify the properties */
+    if (vm->block_size < page_size) {
+        error_setg(&local_err, "'%s' has to be at least the page size (0x%"
+                   PRIx64 ")", VIRTIO_MEM_BLOCK_SIZE_PROP, page_size);
+        goto out;
+    } else if (!QEMU_IS_ALIGNED(vm->requested_size, vm->block_size)) {
+        error_setg(&local_err, "'%s' has to be multiples of '%s' (0x%" PRIx32
+                   ")", VIRTIO_MEM_REQUESTED_SIZE_PROP,
+                   VIRTIO_MEM_BLOCK_SIZE_PROP, vm->block_size);
+        goto out;
+    } else if (!QEMU_IS_ALIGNED(memory_region_size(&vm->memdev->mr),
+                                vm->block_size)) {
+        error_setg(&local_err, "'%s' size has to be multiples of '%s' (0x%"
+                   PRIx32 ")", VIRTIO_MEM_MEMDEV_PROP,
+                   VIRTIO_MEM_BLOCK_SIZE_PROP, vm->block_size);
+        goto out;
+    }
+
+    /*
+     * If possible, we size the usable region a little bit bigger than the
+     * requested size, so the guest has more flexibility.
+     */
+    vm->usable_region_size = MIN(memory_region_size(&vm->memdev->mr),
+                                 vm->requested_size + VIRTIO_MEM_USABLE_EXTENT);
+
+    /* allocate the bitmap for tracking the state of a block */
+    vm->bitmap_size = memory_region_size(&vm->memdev->mr) / vm->block_size;
+    vm->bitmap = bitmap_new(vm->bitmap_size);
+
+    /* all memory is unplugged initially */
+    virtio_mem_set_block_state(vm, vm->addr,
+                               memory_region_size(&vm->memdev->mr), false);
+
+    /* setup the virtqueue */
+    virtio_init(vdev, TYPE_VIRTIO_MEM, VIRTIO_ID_MEM,
+                sizeof(struct virtio_mem_config));
+    vm->vq = virtio_add_queue(vdev, 128, virtio_mem_handle_request);
+
+    host_memory_backend_set_mapped(vm->memdev, true);
+    vmstate_register_ram(&vm->memdev->mr, DEVICE(vm));
+    vm->postcopy_notifier.notify = virtio_mem_postcopy_notifier;
+    postcopy_add_notifier(&vm->postcopy_notifier);
+    qemu_register_reset(virtio_mem_system_reset, vm);
+out:
+    error_propagate(errp, local_err);
+}
+
+static void virtio_mem_device_unrealize(DeviceState *dev, Error **errp)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VirtIOMEM *vm = VIRTIO_MEM(dev);
+
+    qemu_unregister_reset(virtio_mem_system_reset, vm);
+    postcopy_remove_notifier(&vm->postcopy_notifier);
+    vmstate_unregister_ram(&vm->memdev->mr, DEVICE(vm));
+    host_memory_backend_set_mapped(vm->memdev, false);
+    virtio_del_queue(vdev, 0);
+    virtio_cleanup(vdev);
+    g_free(vm->bitmap);
+}
+
+static int virtio_mem_pre_save(void *opaque)
+{
+    VirtIOMEM *vm = VIRTIO_MEM(opaque);
+
+    vm->migration_addr = vm->addr;
+    vm->migration_block_size = vm->block_size;
+
+    return 0;
+}
+
+static int virtio_mem_restore_unplugged(VirtIOMEM *vm)
+{
+    unsigned long bit;
+    uint64_t gpa;
+
+    /*
+     * Called after all migrated memory has been restored, but before postcopy
+     * is enabled. Either way, we have to restore our state from the bitmap
+     * first.
+     */
+    bit = find_first_zero_bit(vm->bitmap, vm->bitmap_size);
+    while (bit < vm->bitmap_size) {
+        gpa = vm->addr + bit * vm->block_size;
+
+        virtio_mem_set_block_state(vm, gpa, vm->block_size, false);
+        bit = find_next_zero_bit(vm->bitmap, vm->bitmap_size, bit + 1);
+    }
+
+    return 0;
+}
+
+static int virtio_mem_post_load(void *opaque, int version_id)
+{
+    VirtIOMEM *vm = VIRTIO_MEM(opaque);
+
+    if (vm->migration_block_size != vm->block_size) {
+        error_report("'%s' doesn't match", VIRTIO_MEM_BLOCK_SIZE_PROP);
+        return -EINVAL;
+    }
+    if (vm->migration_addr != vm->addr) {
+        error_report("'%s' doesn't match", VIRTIO_MEM_ADDR_PROP);
+        return -EINVAL;
+    }
+    return virtio_mem_restore_unplugged(vm);
+}
+
+static const VMStateDescription vmstate_virtio_mem_device = {
+    .name = "virtio-mem-device",
+    .minimum_version_id = 1,
+    .version_id = 1,
+    .pre_save = virtio_mem_pre_save,
+    .post_load = virtio_mem_post_load,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT64(usable_region_size, VirtIOMEM),
+        VMSTATE_UINT64(size, VirtIOMEM),
+        VMSTATE_UINT64(requested_size, VirtIOMEM),
+        VMSTATE_UINT64(migration_addr, VirtIOMEM),
+        VMSTATE_UINT32(migration_block_size, VirtIOMEM),
+        VMSTATE_BITMAP(bitmap, VirtIOMEM, 0, bitmap_size),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static const VMStateDescription vmstate_virtio_mem = {
+    .name = "virtio-mem",
+    .minimum_version_id = 1,
+    .version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_VIRTIO_DEVICE,
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static void virtio_mem_fill_device_info(const VirtIOMEM *vmem,
+                                        VirtioMEMDeviceInfo *vi)
+{
+    vi->memaddr = vmem->addr;
+    vi->node = vmem->node;
+    vi->requested_size = vmem->requested_size;
+    vi->size = vmem->size;
+    vi->max_size = memory_region_size(&vmem->memdev->mr);
+    vi->block_size = vmem->block_size;
+    vi->memdev = object_get_canonical_path(OBJECT(vmem->memdev));
+}
+
+static MemoryRegion *virtio_mem_get_memory_region(VirtIOMEM *vmem, Error **errp)
+{
+    if (!vmem->memdev) {
+        error_setg(errp, "'%s' property must be set", VIRTIO_MEM_MEMDEV_PROP);
+        return NULL;
+    }
+
+    return &vmem->memdev->mr;
+}
+
+static void virtio_mem_get_size(Object *obj, Visitor *v, const char *name,
+                                void *opaque, Error **errp)
+{
+    const VirtIOMEM *vm = VIRTIO_MEM(obj);
+    uint64_t value = vm->size;
+
+    visit_type_size(v, name, &value, errp);
+}
+
+static void virtio_mem_get_requested_size(Object *obj, Visitor *v,
+                                          const char *name, void *opaque,
+                                          Error **errp)
+{
+    const VirtIOMEM *vm = VIRTIO_MEM(obj);
+    uint64_t value = vm->requested_size;
+
+    visit_type_size(v, name, &value, errp);
+}
+
+static void virtio_mem_set_requested_size(Object *obj, Visitor *v,
+                                          const char *name, void *opaque,
+                                          Error **errp)
+{
+    VirtIOMEM *vm = VIRTIO_MEM(obj);
+    Error *local_err = NULL;
+    uint64_t value;
+
+    visit_type_size(v, name, &value, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    /* Growing the usable region might later not be possible, disallow it. */
+    if (virtio_mem_busy() && value > vm->requested_size) {
+        error_setg(errp, "'%s' cannot be increased while migrating,"
+                   " while dumping, or when certain vfio devices are used.",
+                   name);
+        return;
+    }
+
+    /*
+     * The block size and memory backend are not fixed until the device is
+     * realized. realize() will verify these properties then.
+     */
+    if (DEVICE(obj)->realized) {
+        if (!QEMU_IS_ALIGNED(value, vm->block_size)) {
+            error_setg(errp, "'%s' has to be multiples of '%s' (0x%" PRIx32
+                       ")", name, VIRTIO_MEM_BLOCK_SIZE_PROP,
+                       vm->block_size);
+            return;
+        } else if (value > memory_region_size(&vm->memdev->mr)) {
+            error_setg(errp, "'%s' cannot exceed the memory backend size"
+                       "(0x%" PRIx64 ")", name,
+                       memory_region_size(&vm->memdev->mr));
+            return;
+        }
+
+        if (value != vm->requested_size) {
+            uint64_t tmp_size;
+
+            vm->requested_size = value;
+
+            /* Grow the usable region if required */
+            tmp_size = MIN(memory_region_size(&vm->memdev->mr),
+                           vm->requested_size + VIRTIO_MEM_USABLE_EXTENT);
+            vm->usable_region_size = MAX(vm->usable_region_size, tmp_size);
+        }
+        /*
+         * Trigger a config update so the guest gets notified. We trigger
+         * even if the size didn't change (especially helpful for debugging).
+         */
+        virtio_notify_config(VIRTIO_DEVICE(vm));
+    } else {
+        vm->requested_size = value;
+    }
+}
+
+static void virtio_mem_get_block_size(Object *obj, Visitor *v, const char *name,
+                                      void *opaque, Error **errp)
+{
+    const VirtIOMEM *vm = VIRTIO_MEM(obj);
+    uint64_t value = vm->block_size;
+
+    visit_type_size(v, name, &value, errp);
+}
+
+static void virtio_mem_set_block_size(Object *obj, Visitor *v, const char *name,
+                                      void *opaque, Error **errp)
+{
+    VirtIOMEM *vm = VIRTIO_MEM(obj);
+    Error *local_err = NULL;
+    uint64_t value;
+
+    if (DEVICE(obj)->realized) {
+        error_setg(errp, "'%s' cannot be changed", name);
+        return;
+    }
+
+    visit_type_size(v, name, &value, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    if (value > UINT32_MAX) {
+        error_setg(errp, "'%s' has to be smaller than 0x%" PRIx32, name,
+                   UINT32_MAX);
+        return;
+    } else if (value < VIRTIO_MEM_MIN_BLOCK_SIZE) {
+        error_setg(errp, "'%s' has to be at least 0x%" PRIx32, name,
+                   VIRTIO_MEM_MIN_BLOCK_SIZE);
+        return;
+    } else if (!is_power_of_2(value)) {
+        error_setg(errp, "'%s' has to be a power of two", name);
+        return;
+    }
+    vm->block_size = value;
+}
+
+static void virtio_mem_instance_init(Object *obj)
+{
+    VirtIOMEM *vm = VIRTIO_MEM(obj);
+
+    vm->block_size = VIRTIO_MEM_DEFAULT_BLOCK_SIZE;
+
+    object_property_add(obj, VIRTIO_MEM_SIZE_PROP, "size", virtio_mem_get_size,
+                        NULL, NULL, NULL, &error_abort);
+    object_property_add(obj, VIRTIO_MEM_REQUESTED_SIZE_PROP, "size",
+                        virtio_mem_get_requested_size,
+                        virtio_mem_set_requested_size, NULL, NULL,
+                        &error_abort);
+    object_property_add(obj, VIRTIO_MEM_BLOCK_SIZE_PROP, "size",
+                        virtio_mem_get_block_size, virtio_mem_set_block_size,
+                        NULL, NULL, &error_abort);
+}
+
+static Property virtio_mem_properties[] = {
+    DEFINE_PROP_UINT64(VIRTIO_MEM_ADDR_PROP, VirtIOMEM, addr, 0),
+    DEFINE_PROP_UINT32(VIRTIO_MEM_NODE_PROP, VirtIOMEM, node, 0),
+    DEFINE_PROP_LINK(VIRTIO_MEM_MEMDEV_PROP, VirtIOMEM, memdev,
+                     TYPE_MEMORY_BACKEND, HostMemoryBackend *),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void virtio_mem_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
+    VirtIOMEMClass *vmc = VIRTIO_MEM_CLASS(klass);
+
+    device_class_set_props(dc, virtio_mem_properties);
+    dc->vmsd = &vmstate_virtio_mem;
+
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+    vdc->realize = virtio_mem_device_realize;
+    vdc->unrealize = virtio_mem_device_unrealize;
+    vdc->get_config = virtio_mem_get_config;
+    vdc->get_features = virtio_mem_get_features;
+    vdc->vmsd = &vmstate_virtio_mem_device;
+
+    vmc->fill_device_info = virtio_mem_fill_device_info;
+    vmc->get_memory_region = virtio_mem_get_memory_region;
+}
+
+static const TypeInfo virtio_mem_info = {
+    .name = TYPE_VIRTIO_MEM,
+    .parent = TYPE_VIRTIO_DEVICE,
+    .instance_size = sizeof(VirtIOMEM),
+    .instance_init = virtio_mem_instance_init,
+    .class_init = virtio_mem_class_init,
+    .class_size = sizeof(VirtIOMEMClass),
+};
+
+static void virtio_register_types(void)
+{
+    type_register_static(&virtio_mem_info);
+}
+
+type_init(virtio_register_types)
diff --git a/include/hw/virtio/virtio-mem.h b/include/hw/virtio/virtio-mem.h
new file mode 100644
index 0000000000..0a0d75ad6c
--- /dev/null
+++ b/include/hw/virtio/virtio-mem.h
@@ -0,0 +1,83 @@
+/*
+ * Virtio MEM device
+ *
+ * Copyright (C) 2018-2019 Red Hat, Inc.
+ *
+ * Authors:
+ *  David Hildenbrand <david@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef HW_VIRTIO_MEM_H
+#define HW_VIRTIO_MEM_H
+
+#include "standard-headers/linux/virtio_mem.h"
+#include "hw/virtio/virtio.h"
+#include "qapi/qapi-types-misc.h"
+#include "sysemu/hostmem.h"
+
+#define TYPE_VIRTIO_MEM "virtio-mem"
+
+#define VIRTIO_MEM(obj) \
+        OBJECT_CHECK(VirtIOMEM, (obj), TYPE_VIRTIO_MEM)
+#define VIRTIO_MEM_CLASS(oc) \
+        OBJECT_CLASS_CHECK(VirtIOMEMClass, (oc), TYPE_VIRTIO_MEM)
+#define VIRTIO_MEM_GET_CLASS(obj) \
+        OBJECT_GET_CLASS(VirtIOMEMClass, (obj), TYPE_VIRTIO_MEM)
+
+#define VIRTIO_MEM_MEMDEV_PROP "memdev"
+#define VIRTIO_MEM_NODE_PROP "node"
+#define VIRTIO_MEM_SIZE_PROP "size"
+#define VIRTIO_MEM_REQUESTED_SIZE_PROP "requested-size"
+#define VIRTIO_MEM_BLOCK_SIZE_PROP "block-size"
+#define VIRTIO_MEM_ADDR_PROP "memaddr"
+
+typedef struct VirtIOMEM {
+    VirtIODevice parent_obj;
+
+    /* guest -> host request queue */
+    VirtQueue *vq;
+
+    /* postcopy notifier */
+    NotifierWithReturn postcopy_notifier;
+
+    /* bitmap used to track unplugged memory */
+    int32_t bitmap_size;
+    unsigned long *bitmap;
+
+    /* assigned memory backend and memory region */
+    HostMemoryBackend *memdev;
+
+    /* NUMA node */
+    uint32_t node;
+
+    /* assigned address of the region in guest physical memory */
+    uint64_t addr;
+    uint64_t migration_addr;
+
+    /* usable region size (<= region_size) */
+    uint64_t usable_region_size;
+
+    /* actual size (how much the guest plugged) */
+    uint64_t size;
+
+    /* requested size */
+    uint64_t requested_size;
+
+    /* block size and alignment */
+    uint32_t block_size;
+    uint32_t migration_block_size;
+} VirtIOMEM;
+
+typedef struct VirtIOMEMClass {
+    /* private */
+    VirtIODevice parent;
+
+    /* public */
+    void (*fill_device_info)(const VirtIOMEM *vmen, VirtioMEMDeviceInfo *vi);
+    MemoryRegion *(*get_memory_region)(VirtIOMEM *vmem, Error **errp);
+} VirtIOMEMClass;
+
+#endif
diff --git a/qapi/misc.json b/qapi/misc.json
index 33b94e3589..cbbb8a35e1 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -1557,19 +1557,56 @@
           }
 }
 
+##
+# @VirtioMEMDeviceInfo:
+#
+# VirtioMEMDevice state information
+#
+# @id: device's ID
+#
+# @memaddr: physical address in memory, where device is mapped
+#
+# @requested-size: the user requested size of the device
+#
+# @size: the (current) size of memory that the device provides
+#
+# @max-size: the maximum size of memory that the device can provide
+#
+# @block-size: the block size of memory that the device provides
+#
+# @node: NUMA node number where device is assigned to
+#
+# @memdev: memory backend linked with the region
+#
+# Since: 4.1
+##
+{ 'struct': 'VirtioMEMDeviceInfo',
+  'data': { '*id': 'str',
+            'memaddr': 'size',
+            'requested-size': 'size',
+            'size': 'size',
+            'max-size': 'size',
+            'block-size': 'size',
+            'node': 'int',
+            'memdev': 'str'
+          }
+}
+
 ##
 # @MemoryDeviceInfo:
 #
 # Union containing information about a memory device
 #
 # nvdimm is included since 2.12. virtio-pmem is included since 4.1.
+# virtio-mem is included since 4.2.
 #
 # Since: 2.1
 ##
 { 'union': 'MemoryDeviceInfo',
   'data': { 'dimm': 'PCDIMMDeviceInfo',
             'nvdimm': 'PCDIMMDeviceInfo',
-            'virtio-pmem': 'VirtioPMEMDeviceInfo'
+            'virtio-pmem': 'VirtioPMEMDeviceInfo',
+            'virtio-mem': 'VirtioMEMDeviceInfo'
           }
 }
 
-- 
2.24.1




* [PATCH v2 02/16] virtio-pci: Proxy for virtio-mem
From: David Hildenbrand @ 2020-02-12 13:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Dr. David Alan Gilbert, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/virtio/Makefile.objs    |   1 +
 hw/virtio/virtio-mem-pci.c | 136 +++++++++++++++++++++++++++++++++++++
 hw/virtio/virtio-mem-pci.h |  33 +++++++++
 include/hw/pci/pci.h       |   1 +
 4 files changed, 171 insertions(+)
 create mode 100644 hw/virtio/virtio-mem-pci.c
 create mode 100644 hw/virtio/virtio-mem-pci.h

diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
index 3ed94c84d7..3f8a281d36 100644
--- a/hw/virtio/Makefile.objs
+++ b/hw/virtio/Makefile.objs
@@ -18,6 +18,7 @@ common-obj-$(call land,$(CONFIG_VIRTIO_PMEM),$(CONFIG_VIRTIO_PCI)) += virtio-pme
 obj-$(call land,$(CONFIG_VHOST_USER_FS),$(CONFIG_VIRTIO_PCI)) += vhost-user-fs-pci.o
 obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock.o
 obj-$(CONFIG_VIRTIO_MEM) += virtio-mem.o
+common-obj-$(call land,$(CONFIG_VIRTIO_MEM),$(CONFIG_VIRTIO_PCI)) += virtio-mem-pci.o
 
 ifeq ($(CONFIG_VIRTIO_PCI),y)
 obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock-pci.o
diff --git a/hw/virtio/virtio-mem-pci.c b/hw/virtio/virtio-mem-pci.c
new file mode 100644
index 0000000000..d3a2c99492
--- /dev/null
+++ b/hw/virtio/virtio-mem-pci.c
@@ -0,0 +1,136 @@
+/*
+ * Virtio MEM PCI device
+ *
+ * Copyright (C) 2018-2019 Red Hat, Inc.
+ *
+ * Authors:
+ *  David Hildenbrand <david@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include "virtio-mem-pci.h"
+#include "hw/mem/memory-device.h"
+#include "qapi/error.h"
+
+static void virtio_mem_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
+{
+    VirtIOMEMPCI *mem_pci = VIRTIO_MEM_PCI(vpci_dev);
+    DeviceState *vdev = DEVICE(&mem_pci->vdev);
+
+    qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
+    object_property_set_bool(OBJECT(vdev), true, "realized", errp);
+}
+
+static void virtio_mem_pci_set_addr(MemoryDeviceState *md, uint64_t addr,
+                                     Error **errp)
+{
+    object_property_set_uint(OBJECT(md), addr, VIRTIO_MEM_ADDR_PROP, errp);
+}
+
+static uint64_t virtio_mem_pci_get_addr(const MemoryDeviceState *md)
+{
+    return object_property_get_uint(OBJECT(md), VIRTIO_MEM_ADDR_PROP,
+                                    &error_abort);
+}
+
+static MemoryRegion *virtio_mem_pci_get_memory_region(MemoryDeviceState *md,
+                                                      Error **errp)
+{
+    VirtIOMEMPCI *pci_mem = VIRTIO_MEM_PCI(md);
+    VirtIOMEM *vmem = VIRTIO_MEM(&pci_mem->vdev);
+    VirtIOMEMClass *vmc = VIRTIO_MEM_GET_CLASS(vmem);
+
+    return vmc->get_memory_region(vmem, errp);
+}
+
+static uint64_t virtio_mem_pci_get_plugged_size(const MemoryDeviceState *md,
+                                                 Error **errp)
+{
+    VirtIOMEMPCI *pci_mem = VIRTIO_MEM_PCI(md);
+    VirtIOMEM *mem = VIRTIO_MEM(&pci_mem->vdev);
+    VirtIOMEMClass *vpc = VIRTIO_MEM_GET_CLASS(mem);
+    MemoryRegion *mr = vpc->get_memory_region(mem, errp);
+
+    /* the plugged size corresponds to the region size */
+    return mr ? memory_region_size(mr) : 0;
+}
+
+static void virtio_mem_pci_fill_device_info(const MemoryDeviceState *md,
+                                             MemoryDeviceInfo *info)
+{
+    VirtioMEMDeviceInfo *vi = g_new0(VirtioMEMDeviceInfo, 1);
+    VirtIOMEMPCI *pci_mem = VIRTIO_MEM_PCI(md);
+    VirtIOMEM *mem = VIRTIO_MEM(&pci_mem->vdev);
+    VirtIOMEMClass *vpc = VIRTIO_MEM_GET_CLASS(mem);
+    DeviceState *dev = DEVICE(md);
+
+    if (dev->id) {
+        vi->has_id = true;
+        vi->id = g_strdup(dev->id);
+    }
+
+    /* let the real device handle everything else */
+    vpc->fill_device_info(mem, vi);
+
+    info->u.virtio_mem.data = vi;
+    info->type = MEMORY_DEVICE_INFO_KIND_VIRTIO_MEM;
+}
+
+static void virtio_mem_pci_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
+    PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
+    MemoryDeviceClass *mdc = MEMORY_DEVICE_CLASS(klass);
+
+    k->realize = virtio_mem_pci_realize;
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+    pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
+    pcidev_k->device_id = PCI_DEVICE_ID_VIRTIO_MEM;
+    pcidev_k->revision = VIRTIO_PCI_ABI_VERSION;
+    pcidev_k->class_id = PCI_CLASS_OTHERS;
+
+    mdc->get_addr = virtio_mem_pci_get_addr;
+    mdc->set_addr = virtio_mem_pci_set_addr;
+    mdc->get_plugged_size = virtio_mem_pci_get_plugged_size;
+    mdc->get_memory_region = virtio_mem_pci_get_memory_region;
+    mdc->fill_device_info = virtio_mem_pci_fill_device_info;
+}
+
+static void virtio_mem_pci_instance_init(Object *obj)
+{
+    VirtIOMEMPCI *dev = VIRTIO_MEM_PCI(obj);
+
+    virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+                                TYPE_VIRTIO_MEM);
+    object_property_add_alias(obj, VIRTIO_MEM_BLOCK_SIZE_PROP,
+                              OBJECT(&dev->vdev),
+                              VIRTIO_MEM_BLOCK_SIZE_PROP, &error_abort);
+    object_property_add_alias(obj, VIRTIO_MEM_SIZE_PROP, OBJECT(&dev->vdev),
+                              VIRTIO_MEM_SIZE_PROP, &error_abort);
+    object_property_add_alias(obj, VIRTIO_MEM_REQUESTED_SIZE_PROP,
+                              OBJECT(&dev->vdev),
+                              VIRTIO_MEM_REQUESTED_SIZE_PROP, &error_abort);
+}
+
+static const VirtioPCIDeviceTypeInfo virtio_mem_pci_info = {
+    .base_name = TYPE_VIRTIO_MEM_PCI,
+    .generic_name = "virtio-mem-pci",
+    .instance_size = sizeof(VirtIOMEMPCI),
+    .instance_init = virtio_mem_pci_instance_init,
+    .class_init = virtio_mem_pci_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_MEMORY_DEVICE },
+        { }
+    },
+};
+
+static void virtio_mem_pci_register_types(void)
+{
+    virtio_pci_types_register(&virtio_mem_pci_info);
+}
+type_init(virtio_mem_pci_register_types)
diff --git a/hw/virtio/virtio-mem-pci.h b/hw/virtio/virtio-mem-pci.h
new file mode 100644
index 0000000000..bef1c188cf
--- /dev/null
+++ b/hw/virtio/virtio-mem-pci.h
@@ -0,0 +1,33 @@
+/*
+ * Virtio MEM PCI device
+ *
+ * Copyright (C) 2018-2019 Red Hat, Inc.
+ *
+ * Authors:
+ *  David Hildenbrand <david@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_VIRTIO_MEM_PCI_H
+#define QEMU_VIRTIO_MEM_PCI_H
+
+#include "hw/virtio/virtio-pci.h"
+#include "hw/virtio/virtio-mem.h"
+
+typedef struct VirtIOMEMPCI VirtIOMEMPCI;
+
+/*
+ * virtio-mem-pci: This extends VirtioPCIProxy.
+ */
+#define TYPE_VIRTIO_MEM_PCI "virtio-mem-pci-base"
+#define VIRTIO_MEM_PCI(obj) \
+        OBJECT_CHECK(VirtIOMEMPCI, (obj), TYPE_VIRTIO_MEM_PCI)
+
+struct VirtIOMEMPCI {
+    VirtIOPCIProxy parent_obj;
+    VirtIOMEM vdev;
+};
+
+#endif /* QEMU_VIRTIO_MEM_PCI_H */
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 2acd8321af..54c21a265e 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -86,6 +86,7 @@ extern bool pci_available;
 #define PCI_DEVICE_ID_VIRTIO_9P          0x1009
 #define PCI_DEVICE_ID_VIRTIO_VSOCK       0x1012
 #define PCI_DEVICE_ID_VIRTIO_PMEM        0x1013
+#define PCI_DEVICE_ID_VIRTIO_MEM         0x1014
 
 #define PCI_VENDOR_ID_REDHAT             0x1b36
 #define PCI_DEVICE_ID_REDHAT_BRIDGE      0x0001
-- 
2.24.1




* [PATCH v2 03/16] hmp: Handle virtio-mem when printing memory device infos
From: David Hildenbrand @ 2020-02-12 13:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Dr. David Alan Gilbert, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

Print the memory device info just like we do for other memory devices.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 monitor/hmp-cmds.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 558fe06b8f..798aead52e 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -2542,6 +2542,7 @@ void hmp_info_memory_devices(Monitor *mon, const QDict *qdict)
     MemoryDeviceInfoList *info_list = qmp_query_memory_devices(&err);
     MemoryDeviceInfoList *info;
     VirtioPMEMDeviceInfo *vpi;
+    VirtioMEMDeviceInfo *vmi;
     MemoryDeviceInfo *value;
     PCDIMMDeviceInfo *di;
 
@@ -2576,6 +2577,21 @@ void hmp_info_memory_devices(Monitor *mon, const QDict *qdict)
                 monitor_printf(mon, "  size: %" PRIu64 "\n", vpi->size);
                 monitor_printf(mon, "  memdev: %s\n", vpi->memdev);
                 break;
+            case MEMORY_DEVICE_INFO_KIND_VIRTIO_MEM:
+                vmi = value->u.virtio_mem.data;
+                monitor_printf(mon, "Memory device [%s]: \"%s\"\n",
+                               MemoryDeviceInfoKind_str(value->type),
+                               vmi->id ? vmi->id : "");
+                monitor_printf(mon, "  memaddr: 0x%" PRIx64 "\n", vmi->memaddr);
+                monitor_printf(mon, "  node: %" PRId64 "\n", vmi->node);
+                monitor_printf(mon, "  requested-size: %" PRIu64 "\n",
+                               vmi->requested_size);
+                monitor_printf(mon, "  size: %" PRIu64 "\n", vmi->size);
+                monitor_printf(mon, "  max-size: %" PRIu64 "\n", vmi->max_size);
+                monitor_printf(mon, "  block-size: %" PRIu64 "\n",
+                               vmi->block_size);
+                monitor_printf(mon, "  memdev: %s\n", vmi->memdev);
+                break;
             default:
                 g_assert_not_reached();
             }
-- 
2.24.1




* [PATCH v2 04/16] numa: Handle virtio-mem in NUMA stats
From: David Hildenbrand @ 2020-02-12 13:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Dr. David Alan Gilbert, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

Account the memory to the configured node.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/core/numa.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 601cf9f603..4deb27ebee 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -855,10 +855,11 @@ static void numa_stat_memory_devices(NumaNodeMem node_mem[])
 {
     MemoryDeviceInfoList *info_list = qmp_memory_device_list();
     MemoryDeviceInfoList *info;
-    PCDIMMDeviceInfo     *pcdimm_info;
     VirtioPMEMDeviceInfo *vpi;
+    VirtioMEMDeviceInfo *vmi;
 
     for (info = info_list; info; info = info->next) {
+        PCDIMMDeviceInfo *pcdimm_info = NULL;
         MemoryDeviceInfo *value = info->value;
 
         if (value) {
@@ -877,6 +878,11 @@ static void numa_stat_memory_devices(NumaNodeMem node_mem[])
                 node_mem[0].node_mem += vpi->size;
                 node_mem[0].node_plugged_mem += vpi->size;
                 break;
+            case MEMORY_DEVICE_INFO_KIND_VIRTIO_MEM:
+                vmi = value->u.virtio_mem.data;
+                node_mem[vmi->node].node_mem += vmi->size;
+                node_mem[vmi->node].node_plugged_mem += vmi->size;
+                break;
             default:
                 g_assert_not_reached();
             }
-- 
2.24.1




* [PATCH v2 05/16] pc: Support for virtio-mem-pci
From: David Hildenbrand @ 2020-02-12 13:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S. Tsirkin, David Hildenbrand,
	Dr. David Alan Gilbert, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/i386/Kconfig |  1 +
 hw/i386/pc.c    | 42 ++++++++++++++++++++++++------------------
 2 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index cdc851598c..e8ce582edd 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -35,6 +35,7 @@ config PC
     select ACPI_PCI
     select ACPI_VMGENID
     select VIRTIO_PMEM_SUPPORTED
+    select VIRTIO_MEM_SUPPORTED
 
 config PC_PCI
     bool
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 2ddce4230a..ed8850f31d 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -85,6 +85,7 @@
 #include "hw/net/ne2000-isa.h"
 #include "standard-headers/asm-x86/bootparam.h"
 #include "hw/virtio/virtio-pmem-pci.h"
+#include "hw/virtio/virtio-mem-pci.h"
 #include "hw/mem/memory-device.h"
 #include "sysemu/replay.h"
 #include "qapi/qmp/qerror.h"
@@ -1648,8 +1649,8 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
     numa_cpu_pre_plug(cpu_slot, dev, errp);
 }
 
-static void pc_virtio_pmem_pci_pre_plug(HotplugHandler *hotplug_dev,
-                                        DeviceState *dev, Error **errp)
+static void pc_virtio_md_pci_pre_plug(HotplugHandler *hotplug_dev,
+                                      DeviceState *dev, Error **errp)
 {
     HotplugHandler *hotplug_dev2 = qdev_get_bus_hotplug_handler(dev);
     Error *local_err = NULL;
@@ -1660,7 +1661,7 @@ static void pc_virtio_pmem_pci_pre_plug(HotplugHandler *hotplug_dev,
          * order. This should never be the case on x86, however better add
          * a safety net.
          */
-        error_setg(errp, "virtio-pmem-pci not supported on this bus.");
+        error_setg(errp, "virtio based memory devices not supported on this bus.");
         return;
     }
     /*
@@ -1675,8 +1676,8 @@ static void pc_virtio_pmem_pci_pre_plug(HotplugHandler *hotplug_dev,
     error_propagate(errp, local_err);
 }
 
-static void pc_virtio_pmem_pci_plug(HotplugHandler *hotplug_dev,
-                                    DeviceState *dev, Error **errp)
+static void pc_virtio_md_pci_plug(HotplugHandler *hotplug_dev,
+                                  DeviceState *dev, Error **errp)
 {
     HotplugHandler *hotplug_dev2 = qdev_get_bus_hotplug_handler(dev);
     Error *local_err = NULL;
@@ -1694,15 +1695,15 @@ static void pc_virtio_pmem_pci_plug(HotplugHandler *hotplug_dev,
     error_propagate(errp, local_err);
 }
 
-static void pc_virtio_pmem_pci_unplug_request(HotplugHandler *hotplug_dev,
-                                              DeviceState *dev, Error **errp)
+static void pc_virtio_md_pci_unplug_request(HotplugHandler *hotplug_dev,
+                                            DeviceState *dev, Error **errp)
 {
     /* We don't support virtio pmem hot unplug */
     error_setg(errp, "virtio pmem device unplug not supported.");
 }
 
-static void pc_virtio_pmem_pci_unplug(HotplugHandler *hotplug_dev,
-                                      DeviceState *dev, Error **errp)
+static void pc_virtio_md_pci_unplug(HotplugHandler *hotplug_dev,
+                                    DeviceState *dev, Error **errp)
 {
     /* We don't support virtio pmem hot unplug */
 }
@@ -1714,8 +1715,9 @@ static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
         pc_memory_pre_plug(hotplug_dev, dev, errp);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
         pc_cpu_pre_plug(hotplug_dev, dev, errp);
-    } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI)) {
-        pc_virtio_pmem_pci_pre_plug(hotplug_dev, dev, errp);
+    } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
+               object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
+        pc_virtio_md_pci_pre_plug(hotplug_dev, dev, errp);
     }
 }
 
@@ -1726,8 +1728,9 @@ static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
         pc_memory_plug(hotplug_dev, dev, errp);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
         pc_cpu_plug(hotplug_dev, dev, errp);
-    } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI)) {
-        pc_virtio_pmem_pci_plug(hotplug_dev, dev, errp);
+    } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
+               object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
+        pc_virtio_md_pci_plug(hotplug_dev, dev, errp);
     }
 }
 
@@ -1738,8 +1741,9 @@ static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
         pc_memory_unplug_request(hotplug_dev, dev, errp);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
         pc_cpu_unplug_request_cb(hotplug_dev, dev, errp);
-    } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI)) {
-        pc_virtio_pmem_pci_unplug_request(hotplug_dev, dev, errp);
+    } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
+               object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
+        pc_virtio_md_pci_unplug_request(hotplug_dev, dev, errp);
     } else {
         error_setg(errp, "acpi: device unplug request for not supported device"
                    " type: %s", object_get_typename(OBJECT(dev)));
@@ -1753,8 +1757,9 @@ static void pc_machine_device_unplug_cb(HotplugHandler *hotplug_dev,
         pc_memory_unplug(hotplug_dev, dev, errp);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
         pc_cpu_unplug_cb(hotplug_dev, dev, errp);
-    } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI)) {
-        pc_virtio_pmem_pci_unplug(hotplug_dev, dev, errp);
+    } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
+               object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
+        pc_virtio_md_pci_unplug(hotplug_dev, dev, errp);
     } else {
         error_setg(errp, "acpi: device unplug for not supported device"
                    " type: %s", object_get_typename(OBJECT(dev)));
@@ -1766,7 +1771,8 @@ static HotplugHandler *pc_get_hotplug_handler(MachineState *machine,
 {
     if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) ||
         object_dynamic_cast(OBJECT(dev), TYPE_CPU) ||
-        object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI)) {
+        object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
+        object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
         return HOTPLUG_HANDLER(machine);
     }
 
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 06/16] exec: Provide owner when resizing memory region
  2020-02-12 13:35 [PATCH v2 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
                   ` (4 preceding siblings ...)
  2020-02-12 13:35 ` [PATCH v2 05/16] pc: Support for virtio-mem-pci David Hildenbrand
@ 2020-02-12 13:35 ` David Hildenbrand
  2020-02-12 13:35 ` [PATCH v2 07/16] memory: Add memory_region_max_size() and memory_region_is_resizable() David Hildenbrand
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

Let's pass the owner in the callback. While touching it, introduce a
typedef for the callback.
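
As a rough illustration (not part of this patch; the device type, field and
callback names below are made up), a callback can now recover its owning
device directly from @owner instead of matching on the id string or relying
on globals:

    /* hypothetical device-specific resize callback */
    static void my_dev_ram_resized(Object *owner, const char *id,
                                   uint64_t length, void *host)
    {
        MyDevState *s = MY_DEV(owner);

        /* remember the new usable length for later bookkeeping */
        s->usable_size = length;
    }

    memory_region_init_resizeable_ram(&s->mr, OBJECT(s), "my-dev-ram",
                                      initial_size, max_size,
                                      my_dev_ram_resized, &error_fatal);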

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 exec.c                  | 13 +++++--------
 hw/core/loader.c        |  3 ++-
 include/exec/memory.h   |  7 ++++---
 include/exec/ram_addr.h |  4 +---
 include/exec/ramblock.h |  3 ++-
 memory.c                |  4 +---
 6 files changed, 15 insertions(+), 19 deletions(-)

diff --git a/exec.c b/exec.c
index 71e32dcc11..5bc9b231c4 100644
--- a/exec.c
+++ b/exec.c
@@ -2193,7 +2193,8 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
 
     memory_region_set_size(block->mr, newsize);
     if (block->resized) {
-        block->resized(block->idstr, newsize, block->host);
+        block->resized(memory_region_owner(block->mr), block->idstr, newsize,
+                       block->host);
     }
 
     /*
@@ -2476,9 +2477,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
 
 static
 RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
-                                  void (*resized)(const char*,
-                                                  uint64_t length,
-                                                  void *host),
+                                  memory_region_resized_fn resized,
                                   void *host, bool resizeable, bool share,
                                   MemoryRegion *mr, Error **errp)
 {
@@ -2529,10 +2528,8 @@ RAMBlock *qemu_ram_alloc(ram_addr_t size, bool share,
 }
 
 RAMBlock *qemu_ram_alloc_resizeable(ram_addr_t size, ram_addr_t maxsz,
-                                     void (*resized)(const char*,
-                                                     uint64_t length,
-                                                     void *host),
-                                     MemoryRegion *mr, Error **errp)
+                                    memory_region_resized_fn resized,
+                                    MemoryRegion *mr, Error **errp)
 {
     return qemu_ram_alloc_internal(size, maxsz, resized, NULL, true,
                                    false, mr, errp);
diff --git a/hw/core/loader.c b/hw/core/loader.c
index d1b78f60cd..59fb1620f1 100644
--- a/hw/core/loader.c
+++ b/hw/core/loader.c
@@ -912,7 +912,8 @@ static void rom_insert(Rom *rom)
     QTAILQ_INSERT_TAIL(&roms, rom, next);
 }
 
-static void fw_cfg_resized(const char *id, uint64_t length, void *host)
+static void fw_cfg_resized(Object *owner, const char *id, uint64_t length,
+                           void *host)
 {
     if (fw_cfg) {
         fw_cfg_modify_file(fw_cfg, id + strlen("/rom@"), host, length);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 19417943a2..9f02bb7830 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -846,6 +846,9 @@ void memory_region_init_ram_shared_nomigrate(MemoryRegion *mr,
                                              bool share,
                                              Error **errp);
 
+typedef void (*memory_region_resized_fn)(Object *owner, const char *id,
+                                         uint64_t length, void *host);
+
 /**
  * memory_region_init_resizeable_ram:  Initialize memory region with resizeable
  *                                     RAM.  Accesses into the region will
@@ -870,9 +873,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
                                        const char *name,
                                        uint64_t size,
                                        uint64_t max_size,
-                                       void (*resized)(const char*,
-                                                       uint64_t length,
-                                                       void *host),
+                                       memory_region_resized_fn resized,
                                        Error **errp);
 #ifdef CONFIG_POSIX
 
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 5e59a3d8d7..0ee3126361 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -128,9 +128,7 @@ RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
 RAMBlock *qemu_ram_alloc(ram_addr_t size, bool share, MemoryRegion *mr,
                          Error **errp);
 RAMBlock *qemu_ram_alloc_resizeable(ram_addr_t size, ram_addr_t max_size,
-                                    void (*resized)(const char*,
-                                                    uint64_t length,
-                                                    void *host),
+                                    memory_region_resized_fn resized,
                                     MemoryRegion *mr, Error **errp);
 void qemu_ram_free(RAMBlock *block);
 
diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 07d50864d8..437b8f82ea 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -21,6 +21,7 @@
 
 #ifndef CONFIG_USER_ONLY
 #include "cpu-common.h"
+#include "exec/memory.h"
 
 struct RAMBlock {
     struct rcu_head rcu;
@@ -30,7 +31,7 @@ struct RAMBlock {
     ram_addr_t offset;
     ram_addr_t used_length;
     ram_addr_t max_length;
-    void (*resized)(const char*, uint64_t length, void *host);
+    memory_region_resized_fn resized;
     uint32_t flags;
     /* Protected by iothread lock.  */
     char idstr[256];
diff --git a/memory.c b/memory.c
index aeaa8dcc9e..cb09a8ee59 100644
--- a/memory.c
+++ b/memory.c
@@ -1535,9 +1535,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
                                        const char *name,
                                        uint64_t size,
                                        uint64_t max_size,
-                                       void (*resized)(const char*,
-                                                       uint64_t length,
-                                                       void *host),
+                                       memory_region_resized_fn resized,
                                        Error **errp)
 {
     Error *err = NULL;
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 07/16] memory: Add memory_region_max_size() and memory_region_is_resizable()
  2020-02-12 13:35 [PATCH v2 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
                   ` (5 preceding siblings ...)
  2020-02-12 13:35 ` [PATCH v2 06/16] exec: Provide owner when resizing memory region David Hildenbrand
@ 2020-02-12 13:35 ` David Hildenbrand
  2020-02-12 13:35 ` [PATCH v2 08/16] memory: Disallow resizing to 0 David Hildenbrand
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

We want to pass resizable memory regions to devices that can deal
with them (and automatically resize them). Allow them to easily
identify whether a region can be resized and what its maximum size is.

Add both functions, adding qemu_ram_is_resizable() as a helper.
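
A minimal usage sketch (the helper below is illustrative and not part of
this patch): a device that is handed a memory region can dimension its
metadata for the largest size the region may ever reach:

    static uint64_t region_capacity(MemoryRegion *mr)
    {
        /* plan for the maximum size of resizable regions */
        if (memory_region_is_resizable(mr)) {
            return memory_region_max_size(mr);
        }
        /* for fixed-size regions, the current size is final */
        return memory_region_size(mr);
    }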

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/exec/memory.h | 17 +++++++++++++++++
 memory.c              | 18 ++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 9f02bb7830..dfedd88f13 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -395,6 +395,7 @@ struct MemoryRegion {
     void *opaque;
     MemoryRegion *container;
     Int128 size;
+    Int128 max_size;
     hwaddr addr;
     void (*destructor)(MemoryRegion *mr);
     uint64_t align;
@@ -1180,6 +1181,13 @@ struct Object *memory_region_owner(MemoryRegion *mr);
  */
 uint64_t memory_region_size(MemoryRegion *mr);
 
+/**
+ * memory_region_max_size: get a memory region's maximum size.
+ *
+ * @mr: the memory region being queried.
+ */
+uint64_t memory_region_max_size(MemoryRegion *mr);
+
 /**
  * memory_region_is_ram: check whether a memory region is random access
  *
@@ -1471,6 +1479,15 @@ MemoryRegion *memory_region_from_host(void *ptr, ram_addr_t *offset);
  */
 void *memory_region_get_ram_ptr(MemoryRegion *mr);
 
+/**
+ * memory_region_is_resizable: check whether a memory region is resizable
+ *
+ * Returns %true if a memory region is resizable.
+ *
+ * @mr: the memory region being queried
+ */
+bool memory_region_is_resizable(MemoryRegion *mr);
+
 /* memory_region_ram_resize: Resize a RAM region.
  *
  * Only legal before guest might have detected the memory size: e.g. on
diff --git a/memory.c b/memory.c
index cb09a8ee59..5c62702618 100644
--- a/memory.c
+++ b/memory.c
@@ -1130,6 +1130,7 @@ static void memory_region_do_init(MemoryRegion *mr,
     if (size == UINT64_MAX) {
         mr->size = int128_2_64();
     }
+    mr->max_size = mr->size;
     mr->name = g_strdup(name);
     mr->owner = owner;
     mr->ram_block = NULL;
@@ -1540,6 +1541,10 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
 {
     Error *err = NULL;
     memory_region_init(mr, owner, name, size);
+    mr->max_size = int128_make64(max_size);
+    if (max_size == UINT64_MAX) {
+        mr->max_size = int128_2_64();
+    }
     mr->ram = true;
     mr->terminates = true;
     mr->destructor = memory_region_destructor_ram;
@@ -1779,6 +1784,14 @@ uint64_t memory_region_size(MemoryRegion *mr)
     return int128_get64(mr->size);
 }
 
+uint64_t memory_region_max_size(MemoryRegion *mr)
+{
+    if (int128_eq(mr->max_size, int128_2_64())) {
+        return UINT64_MAX;
+    }
+    return int128_get64(mr->max_size);
+}
+
 const char *memory_region_name(const MemoryRegion *mr)
 {
     if (!mr->name) {
@@ -2198,6 +2211,11 @@ ram_addr_t memory_region_get_ram_addr(MemoryRegion *mr)
     return mr->ram_block ? mr->ram_block->offset : RAM_ADDR_INVALID;
 }
 
+bool memory_region_is_resizable(MemoryRegion *mr)
+{
+    return mr->ram_block && qemu_ram_is_resizable(mr->ram_block);
+}
+
 void memory_region_ram_resize(MemoryRegion *mr, ram_addr_t newsize, Error **errp)
 {
     assert(mr->ram_block);
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 08/16] memory: Disallow resizing to 0
  2020-02-12 13:35 [PATCH v2 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
                   ` (6 preceding siblings ...)
  2020-02-12 13:35 ` [PATCH v2 07/16] memory: Add memory_region_max_size() and memory_region_is_resizable() David Hildenbrand
@ 2020-02-12 13:35 ` David Hildenbrand
  2020-02-12 13:35 ` [PATCH v2 09/16] memory-device: properly deal with resizable memory regions David Hildenbrand
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

Memory regions / qemu ram blocks always have to have a size > 0;
otherwise, ramblock_ptr() will bail out with an assertion. Enforce this.
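
For illustration only (the snippet is a sketch, not taken from this series),
callers now get a graceful error when attempting to resize to 0 instead of
tripping over asserts later:

    Error *err = NULL;

    if (qemu_ram_resize(block, 0, &err) < 0) {
        error_report_err(err);  /* "Length cannot be 0: <idstr>: ..." */
    }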

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 exec.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/exec.c b/exec.c
index 5bc9b231c4..161e40e16e 100644
--- a/exec.c
+++ b/exec.c
@@ -2160,6 +2160,11 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
         return 0;
     }
 
+    if (!newsize) {
+        error_setg_errno(errp, EINVAL, "Length cannot be 0: %s", block->idstr);
+        return -EINVAL;
+    }
+
     if (!qemu_ram_is_resizable(block)) {
         error_setg_errno(errp, EINVAL,
                          "Length mismatch: %s: 0x" RAM_ADDR_FMT
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 09/16] memory-device: properly deal with resizable memory regions
  2020-02-12 13:35 [PATCH v2 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
                   ` (7 preceding siblings ...)
  2020-02-12 13:35 ` [PATCH v2 08/16] memory: Disallow resizing to 0 David Hildenbrand
@ 2020-02-12 13:35 ` David Hildenbrand
  2020-02-12 13:35 ` [PATCH v2 10/16] hostmem: Factor out applying settings David Hildenbrand
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

When dealing with resizable memory regions, we always have to assign
space in the physical address space that can fit the maximum region
size.
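
As a purely illustrative example with made-up numbers: a virtio-mem device
whose resizable backend currently exposes 1 GiB but may grow up to 16 GiB
needs a 16 GiB window in the device memory address space right away, e.g.:

    device memory base:      0x100000000
    assigned to the device:  0x100000000 - 0x4ffffffff  (16 GiB, max size)
    currently usable:        0x100000000 - 0x13fffffff  ( 1 GiB)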

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/mem/memory-device.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
index 4bc9cf0917..32d0c5d334 100644
--- a/hw/mem/memory-device.c
+++ b/hw/mem/memory-device.c
@@ -269,7 +269,7 @@ void memory_device_pre_plug(MemoryDeviceState *md, MachineState *ms,
     align = legacy_align ? *legacy_align : memory_region_get_alignment(mr);
     addr = mdc->get_addr(md);
     addr = memory_device_get_free_addr(ms, !addr ? NULL : &addr, align,
-                                       memory_region_size(mr), &local_err);
+                                       memory_region_max_size(mr), &local_err);
     if (local_err) {
         goto out;
     }
@@ -329,7 +329,7 @@ uint64_t memory_device_get_region_size(const MemoryDeviceState *md,
         return 0;
     }
 
-    return memory_region_size(mr);
+    return memory_region_max_size(mr);
 }
 
 static const TypeInfo memory_device_info = {
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 10/16] hostmem: Factor out applying settings
  2020-02-12 13:35 [PATCH v2 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
                   ` (8 preceding siblings ...)
  2020-02-12 13:35 ` [PATCH v2 09/16] memory-device: properly deal with resizable memory regions David Hildenbrand
@ 2020-02-12 13:35 ` David Hildenbrand
  2020-02-12 13:35 ` [PATCH v2 11/16] hostmem: Factor out common checks into host_memory_backend_validate() David Hildenbrand
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

We want to reuse this functionality when resizing resizable memory
regions.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 backends/hostmem.c | 137 +++++++++++++++++++++++++--------------------
 1 file changed, 76 insertions(+), 61 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index e773bdfa6e..2c8e4567e1 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -308,15 +308,85 @@ size_t host_memory_backend_pagesize(HostMemoryBackend *memdev)
 }
 #endif
 
+static void host_memory_backend_apply_settings(HostMemoryBackend *backend,
+                                               Error **errp)
+{
+    const uint64_t sz = memory_region_size(&backend->mr);
+    void *ptr = memory_region_get_ram_ptr(&backend->mr);
+    MachineState *ms = MACHINE(qdev_get_machine());
+    Error *local_err = NULL;
+
+    if (backend->merge) {
+        qemu_madvise(ptr, sz, QEMU_MADV_MERGEABLE);
+    }
+    if (!backend->dump) {
+        qemu_madvise(ptr, sz, QEMU_MADV_DONTDUMP);
+    }
+#ifdef CONFIG_NUMA
+   unsigned long lastbit = find_last_bit(backend->host_nodes, MAX_NODES);
+   /* lastbit == MAX_NODES means maxnode = 0 */
+   unsigned long maxnode = (lastbit + 1) % (MAX_NODES + 1);
+   /*
+    * Ensure policy won't be ignored in case memory is preallocated before
+    * mbind(). note: MPOL_MF_STRICT is ignored on hugepages so this doesn't
+    * catch hugepage case.
+    */
+   unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
+
+   /*
+    * Check for invalid host-nodes and policies and give more verbose error
+    * messages than mbind().
+    */
+   if (maxnode && backend->policy == MPOL_DEFAULT) {
+       error_setg(errp, "host-nodes must be empty for policy default,"
+                  " or you should explicitly specify a policy other"
+                  " than default");
+       return;
+   } else if (maxnode == 0 && backend->policy != MPOL_DEFAULT) {
+       error_setg(errp, "host-nodes must be set for policy %s",
+                  HostMemPolicy_str(backend->policy));
+       return;
+   }
+
+   /*
+    * We can have up to MAX_NODES nodes, but we need to pass maxnode+1 as
+    * argument to mbind() due to an old Linux bug (feature?) which cuts off the
+    * last specified node. This means backend->host_nodes must have MAX_NODES+1
+    * bits available.
+    */
+   assert(sizeof(backend->host_nodes) >=
+          BITS_TO_LONGS(MAX_NODES + 1) * sizeof(unsigned long));
+   assert(maxnode <= MAX_NODES);
+   if (mbind(ptr, sz, backend->policy,
+             maxnode ? backend->host_nodes : NULL, maxnode + 1, flags)) {
+       if (backend->policy != MPOL_DEFAULT || errno != ENOSYS) {
+           error_setg_errno(errp, errno,
+                            "cannot bind memory to host NUMA nodes");
+           return;
+       }
+   }
+#endif
+    /*
+     * Preallocate memory after the NUMA policy has been instantiated. This is
+     * necessary to guarantee memory is allocated with specified NUMA policy
+     * in place.
+     */
+    if (backend->prealloc) {
+        os_mem_prealloc(memory_region_get_fd(&backend->mr), ptr, sz,
+                        ms->smp.cpus, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+}
+
 static void
 host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
 {
     HostMemoryBackend *backend = MEMORY_BACKEND(uc);
     HostMemoryBackendClass *bc = MEMORY_BACKEND_GET_CLASS(uc);
-    MachineState *ms = MACHINE(qdev_get_machine());
     Error *local_err = NULL;
-    void *ptr;
-    uint64_t sz;
 
     if (bc->alloc) {
         bc->alloc(backend, &local_err);
@@ -324,64 +394,9 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
             goto out;
         }
 
-        ptr = memory_region_get_ram_ptr(&backend->mr);
-        sz = memory_region_size(&backend->mr);
-
-        if (backend->merge) {
-            qemu_madvise(ptr, sz, QEMU_MADV_MERGEABLE);
-        }
-        if (!backend->dump) {
-            qemu_madvise(ptr, sz, QEMU_MADV_DONTDUMP);
-        }
-#ifdef CONFIG_NUMA
-        unsigned long lastbit = find_last_bit(backend->host_nodes, MAX_NODES);
-        /* lastbit == MAX_NODES means maxnode = 0 */
-        unsigned long maxnode = (lastbit + 1) % (MAX_NODES + 1);
-        /* ensure policy won't be ignored in case memory is preallocated
-         * before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so
-         * this doesn't catch hugepage case. */
-        unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
-
-        /* check for invalid host-nodes and policies and give more verbose
-         * error messages than mbind(). */
-        if (maxnode && backend->policy == MPOL_DEFAULT) {
-            error_setg(errp, "host-nodes must be empty for policy default,"
-                       " or you should explicitly specify a policy other"
-                       " than default");
-            return;
-        } else if (maxnode == 0 && backend->policy != MPOL_DEFAULT) {
-            error_setg(errp, "host-nodes must be set for policy %s",
-                       HostMemPolicy_str(backend->policy));
-            return;
-        }
-
-        /* We can have up to MAX_NODES nodes, but we need to pass maxnode+1
-         * as argument to mbind() due to an old Linux bug (feature?) which
-         * cuts off the last specified node. This means backend->host_nodes
-         * must have MAX_NODES+1 bits available.
-         */
-        assert(sizeof(backend->host_nodes) >=
-               BITS_TO_LONGS(MAX_NODES + 1) * sizeof(unsigned long));
-        assert(maxnode <= MAX_NODES);
-        if (mbind(ptr, sz, backend->policy,
-                  maxnode ? backend->host_nodes : NULL, maxnode + 1, flags)) {
-            if (backend->policy != MPOL_DEFAULT || errno != ENOSYS) {
-                error_setg_errno(errp, errno,
-                                 "cannot bind memory to host NUMA nodes");
-                return;
-            }
-        }
-#endif
-        /* Preallocate memory after the NUMA policy has been instantiated.
-         * This is necessary to guarantee memory is allocated with
-         * specified NUMA policy in place.
-         */
-        if (backend->prealloc) {
-            os_mem_prealloc(memory_region_get_fd(&backend->mr), ptr, sz,
-                            ms->smp.cpus, &local_err);
-            if (local_err) {
-                goto out;
-            }
+        host_memory_backend_apply_settings(backend, &local_err);
+        if (local_err) {
+            goto out;
         }
     }
 out:
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 11/16] hostmem: Factor out common checks into host_memory_backend_validate()
  2020-02-12 13:35 [PATCH v2 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
                   ` (9 preceding siblings ...)
  2020-02-12 13:35 ` [PATCH v2 10/16] hostmem: Factor out applying settings David Hildenbrand
@ 2020-02-12 13:35 ` David Hildenbrand
  2020-02-12 13:35 ` [PATCH v2 12/16] hostmem: Introduce "managed-size" for memory-backend-ram David Hildenbrand
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

All users perform similar checks. Let's factor them out to prepare for
more checks.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 backends/hostmem.c       | 14 ++++++++++++++
 hw/mem/pc-dimm.c         | 12 +++++-------
 hw/misc/ivshmem.c        | 11 ++++-------
 hw/virtio/virtio-mem.c   | 15 +++++----------
 hw/virtio/virtio-pmem.c  | 13 ++++---------
 include/sysemu/hostmem.h |  2 ++
 6 files changed, 34 insertions(+), 33 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 2c8e4567e1..de37f1bf5d 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -291,6 +291,20 @@ bool host_memory_backend_is_mapped(HostMemoryBackend *backend)
     return backend->is_mapped;
 }
 
+void host_memory_backend_validate(HostMemoryBackend *backend,
+                                  const char *property, Error **errp)
+{
+    char *path = object_get_canonical_path_component(OBJECT(backend));
+
+    if (!backend) {
+        error_setg(errp, "'%s' property is not set", property);
+    } else if (host_memory_backend_is_mapped(backend)) {
+        error_setg(errp, "'%s' property specifies a busy memdev: %s",
+                   property, path);
+    }
+    g_free(path);
+}
+
 #ifdef __linux__
 size_t host_memory_backend_pagesize(HostMemoryBackend *memdev)
 {
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 8f50b8afea..9ee634ee89 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -174,16 +174,14 @@ static void pc_dimm_realize(DeviceState *dev, Error **errp)
     PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
     MachineState *ms = MACHINE(qdev_get_machine());
     int nb_numa_nodes = ms->numa_state->num_nodes;
+    Error *err = NULL;
 
-    if (!dimm->hostmem) {
-        error_setg(errp, "'" PC_DIMM_MEMDEV_PROP "' property is not set");
-        return;
-    } else if (host_memory_backend_is_mapped(dimm->hostmem)) {
-        char *path = object_get_canonical_path_component(OBJECT(dimm->hostmem));
-        error_setg(errp, "can't use already busy memdev: %s", path);
-        g_free(path);
+    host_memory_backend_validate(dimm->hostmem, PC_DIMM_MEMDEV_PROP, &err);
+    if (err) {
+        error_propagate(errp, err);
         return;
     }
+
     if (((nb_numa_nodes > 0) && (dimm->node >= nb_numa_nodes)) ||
         (!nb_numa_nodes && dimm->node)) {
         error_setg(errp, "'DIMM property " PC_DIMM_NODE_PROP " has value %"
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 1a0fad74e1..39bffceadf 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -1035,14 +1035,11 @@ static Property ivshmem_plain_properties[] = {
 static void ivshmem_plain_realize(PCIDevice *dev, Error **errp)
 {
     IVShmemState *s = IVSHMEM_COMMON(dev);
+    Error *err = NULL;
 
-    if (!s->hostmem) {
-        error_setg(errp, "You must specify a 'memdev'");
-        return;
-    } else if (host_memory_backend_is_mapped(s->hostmem)) {
-        char *path = object_get_canonical_path_component(OBJECT(s->hostmem));
-        error_setg(errp, "can't use already busy memdev: %s", path);
-        g_free(path);
+    host_memory_backend_validate(s->hostmem, "memdev", &err);
+    if (err) {
+        error_propagate(errp, err);
         return;
     }
 
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 2f759578fe..4b7b4cf950 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -414,16 +414,11 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
     uint64_t page_size;
 
     /* verify the memdev */
-    if (!vm->memdev) {
-        error_setg(&local_err, "'%s' property must be set",
-                   VIRTIO_MEM_MEMDEV_PROP);
-        goto out;
-    } else if (host_memory_backend_is_mapped(vm->memdev)) {
-        char *path = object_get_canonical_path_component(OBJECT(vm->memdev));
-
-        error_setg(&local_err, "can't use already busy memdev: %s", path);
-        g_free(path);
-        goto out;
+    host_memory_backend_validate(vm->memdev, VIRTIO_MEM_MEMDEV_PROP,
+                                 &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
     }
 
     /* verify the node */
diff --git a/hw/virtio/virtio-pmem.c b/hw/virtio/virtio-pmem.c
index 97287e923b..85cb337ed5 100644
--- a/hw/virtio/virtio-pmem.c
+++ b/hw/virtio/virtio-pmem.c
@@ -105,16 +105,11 @@ static void virtio_pmem_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VirtIOPMEM *pmem = VIRTIO_PMEM(dev);
+    Error *err = NULL;
 
-    if (!pmem->memdev) {
-        error_setg(errp, "virtio-pmem memdev not set");
-        return;
-    }
-
-    if (host_memory_backend_is_mapped(pmem->memdev)) {
-        char *path = object_get_canonical_path_component(OBJECT(pmem->memdev));
-        error_setg(errp, "can't use already busy memdev: %s", path);
-        g_free(path);
+    host_memory_backend_validate(pmem->memdev, "memdev", &err);
+    if (err) {
+        error_propagate(errp, err);
         return;
     }
 
diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
index 4dbdadd39e..d4dbf108ca 100644
--- a/include/sysemu/hostmem.h
+++ b/include/sysemu/hostmem.h
@@ -65,6 +65,8 @@ MemoryRegion *host_memory_backend_get_memory(HostMemoryBackend *backend);
 
 void host_memory_backend_set_mapped(HostMemoryBackend *backend, bool mapped);
 bool host_memory_backend_is_mapped(HostMemoryBackend *backend);
+void host_memory_backend_validate(HostMemoryBackend *backend,
+                                  const char *property, Error **errp);
 size_t host_memory_backend_pagesize(HostMemoryBackend *memdev);
 char *host_memory_backend_get_name(HostMemoryBackend *backend);
 
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 12/16] hostmem: Introduce "managed-size" for memory-backend-ram
  2020-02-12 13:35 [PATCH v2 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
                   ` (10 preceding siblings ...)
  2020-02-12 13:35 ` [PATCH v2 11/16] hostmem: Factor out common checks into host_memory_backend_validate() David Hildenbrand
@ 2020-02-12 13:35 ` David Hildenbrand
  2020-02-12 13:35 ` [PATCH v2 13/16] qmp/hmp: Expose "managed-size" for memory backends David Hildenbrand
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

virtio-mem wants to make use of resizable memory regions. Allow the
user to create them by specifying "managed-size".

Disallow setting "managed-size" with "prealloc" and "shared". The latter
might theoretically be possible, however has to be wired up internally
first.

Support for memory-backend-ram only for now. Support for other
backends (especially hugepages) can be added later (once, e.g.,
virtio-mem also supports hugepages).

When the memory region gets resized, apply the same settings just as when
allocating the memory.

Fence off all such memory backends in all existing users. We'll convert
virtio-mem soon.
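
A hypothetical command line (property spellings follow this series; the
virtio-mem device only accepts such a backend once it is converted later in
the series) could look like:

    -object memory-backend-ram,id=mem0,size=16G,managed-size=on \
    -device virtio-mem-pci,id=vmem0,memdev=mem0,requested-size=1G

"size" then only acts as an upper limit; the actual size of the backend's
memory region is managed by the owning device.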

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 backends/hostmem-ram.c   | 18 ++++++++--
 backends/hostmem.c       | 72 ++++++++++++++++++++++++++++++++++++++--
 hw/mem/pc-dimm.c         |  3 +-
 hw/misc/ivshmem.c        |  2 +-
 hw/virtio/virtio-mem.c   |  2 +-
 hw/virtio/virtio-pmem.c  |  2 +-
 include/sysemu/hostmem.h |  8 +++--
 7 files changed, 97 insertions(+), 10 deletions(-)

diff --git a/backends/hostmem-ram.c b/backends/hostmem-ram.c
index 6aab8d3a73..881276cf6b 100644
--- a/backends/hostmem-ram.c
+++ b/backends/hostmem-ram.c
@@ -29,8 +29,21 @@ ram_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     }
 
     name = host_memory_backend_get_name(backend);
-    memory_region_init_ram_shared_nomigrate(&backend->mr, OBJECT(backend), name,
-                           backend->size, backend->share, errp);
+    if (backend->managed_size) {
+        /*
+         * The size of a memory region must always be > 0 - start with 1. The
+         * managing object/device will resize accordingly.
+         */
+        g_assert(!backend->share);
+        memory_region_init_resizeable_ram(&backend->mr, OBJECT(backend), name,
+                                          1, backend->size,
+                                          host_memory_backend_resized,
+                                          errp);
+    } else {
+        memory_region_init_ram_shared_nomigrate(&backend->mr, OBJECT(backend),
+                                                name, backend->size,
+                                                backend->share, errp);
+    }
     g_free(name);
 }
 
@@ -40,6 +53,7 @@ ram_backend_class_init(ObjectClass *oc, void *data)
     HostMemoryBackendClass *bc = MEMORY_BACKEND_CLASS(oc);
 
     bc->alloc = ram_backend_memory_alloc;
+    bc->managed_size_supported = true;
 }
 
 static const TypeInfo ram_backend_info = {
diff --git a/backends/hostmem.c b/backends/hostmem.c
index de37f1bf5d..c3c453753a 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -238,7 +238,10 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value,
         return;
     }
 
-    if (value && !backend->prealloc) {
+    if (value && backend->managed_size) {
+        error_setg(errp, "'prealloc' is not compatible with 'managed-size'");
+        return;
+    } else if (value && !backend->prealloc) {
         int fd = memory_region_get_fd(&backend->mr);
         void *ptr = memory_region_get_ram_ptr(&backend->mr);
         uint64_t sz = memory_region_size(&backend->mr);
@@ -292,7 +295,8 @@ bool host_memory_backend_is_mapped(HostMemoryBackend *backend)
 }
 
 void host_memory_backend_validate(HostMemoryBackend *backend,
-                                  const char *property, Error **errp)
+                                  const char *property,
+                                  bool managed_size_support, Error **errp)
 {
     char *path = object_get_canonical_path_component(OBJECT(backend));
 
@@ -301,6 +305,10 @@ void host_memory_backend_validate(HostMemoryBackend *backend,
     } else if (host_memory_backend_is_mapped(backend)) {
         error_setg(errp, "'%s' property specifies a busy memdev: %s",
                    property, path);
+    } else if (backend->managed_size && !managed_size_support) {
+        error_setg(errp,
+                   "'%s' property does not support a memdev with a managed size: %s",
+                   property, path);
     }
     g_free(path);
 }
@@ -395,6 +403,24 @@ static void host_memory_backend_apply_settings(HostMemoryBackend *backend,
     }
 }
 
+void host_memory_backend_resized(Object *owner, const char *idstr,
+                                 uint64_t size, void *host)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(owner);
+    Error *local_err = NULL;
+
+    /*
+     * Just apply the settings for all (resized) memory again. Note that
+     * "shared" and "prealloc" is currently not compatible with resizable memory
+     * regions ("managed-size"). Warn only.
+     */
+    host_memory_backend_apply_settings(backend, &local_err);
+    if (local_err) {
+         warn_report_err(local_err);
+         local_err = NULL;
+    }
+}
+
 static void
 host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
 {
@@ -441,6 +467,9 @@ static void host_memory_backend_set_share(Object *o, bool value, Error **errp)
     if (host_memory_backend_mr_inited(backend)) {
         error_setg(errp, "cannot change property value");
         return;
+    } else if (value && backend->managed_size) {
+        error_setg(errp, "'prealloc' is not compatible with 'managed-size'");
+        return;
     }
     backend->share = value;
 }
@@ -462,6 +491,39 @@ host_memory_backend_set_use_canonical_path(Object *obj, bool value,
     backend->use_canonical_path = value;
 }
 
+static bool
+ram_backend_get_managed_size(Object *obj, Error **errp)
+{
+    return MEMORY_BACKEND(obj)->managed_size;
+}
+
+static void
+ram_backend_set_managed_size(Object *obj, bool value, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+    HostMemoryBackendClass *bc = MEMORY_BACKEND_GET_CLASS(obj);
+
+    if (host_memory_backend_mr_inited(backend)) {
+        error_setg(errp, "cannot change property @managed_size'");
+        return;
+    } else if (value && !bc->managed_size_supported) {
+        error_setg(errp,
+                   "'managed-size' is not supported yet for %s",
+                   object_get_typename(obj));
+        return;
+    } else if (value && (backend->force_prealloc || backend->prealloc)) {
+        error_setg(errp,
+                   "'managed-size' is not compatible with preallocated memory");
+        return;
+    } else if (value && backend->share) {
+        error_setg(errp,
+                   "'managed-size' is not compatible with shared memory");
+        return;
+    }
+
+    backend->managed_size = value;
+}
+
 static void
 host_memory_backend_class_init(ObjectClass *oc, void *data)
 {
@@ -511,6 +573,12 @@ host_memory_backend_class_init(ObjectClass *oc, void *data)
     object_class_property_add_bool(oc, "x-use-canonical-path-for-ramblock-id",
         host_memory_backend_get_use_canonical_path,
         host_memory_backend_set_use_canonical_path, &error_abort);
+    object_class_property_add_bool(oc, "managed-size",
+                                   ram_backend_get_managed_size,
+                                   ram_backend_set_managed_size, &error_abort);
+    object_class_property_set_description(oc, "managed-size",
+        "The owner manages the actual size, 'size' is an upper limit",
+                                          &error_abort);
 }
 
 static const TypeInfo host_memory_backend_info = {
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 9ee634ee89..5021cb347d 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -176,7 +176,8 @@ static void pc_dimm_realize(DeviceState *dev, Error **errp)
     int nb_numa_nodes = ms->numa_state->num_nodes;
     Error *err = NULL;
 
-    host_memory_backend_validate(dimm->hostmem, PC_DIMM_MEMDEV_PROP, &err);
+    host_memory_backend_validate(dimm->hostmem, PC_DIMM_MEMDEV_PROP, false,
+                                 &err);
     if (err) {
         error_propagate(errp, err);
         return;
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 39bffceadf..69d16c2dca 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -1037,7 +1037,7 @@ static void ivshmem_plain_realize(PCIDevice *dev, Error **errp)
     IVShmemState *s = IVSHMEM_COMMON(dev);
     Error *err = NULL;
 
-    host_memory_backend_validate(s->hostmem, "memdev", &err);
+    host_memory_backend_validate(s->hostmem, "memdev", false, &err);
     if (err) {
         error_propagate(errp, err);
         return;
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 4b7b4cf950..093b6eb0bb 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -415,7 +415,7 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
 
     /* verify the memdev */
     host_memory_backend_validate(vm->memdev, VIRTIO_MEM_MEMDEV_PROP,
-                                 &local_err);
+                                 false, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
diff --git a/hw/virtio/virtio-pmem.c b/hw/virtio/virtio-pmem.c
index 85cb337ed5..51f01e52fd 100644
--- a/hw/virtio/virtio-pmem.c
+++ b/hw/virtio/virtio-pmem.c
@@ -107,7 +107,7 @@ static void virtio_pmem_realize(DeviceState *dev, Error **errp)
     VirtIOPMEM *pmem = VIRTIO_PMEM(dev);
     Error *err = NULL;
 
-    host_memory_backend_validate(pmem->memdev, "memdev", &err);
+    host_memory_backend_validate(pmem->memdev, "memdev", false, &err);
     if (err) {
         error_propagate(errp, err);
         return;
diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
index d4dbf108ca..f5ef7016bc 100644
--- a/include/sysemu/hostmem.h
+++ b/include/sysemu/hostmem.h
@@ -37,6 +37,7 @@ struct HostMemoryBackendClass {
     ObjectClass parent_class;
 
     void (*alloc)(HostMemoryBackend *backend, Error **errp);
+    bool managed_size_supported;
 };
 
 /**
@@ -53,7 +54,7 @@ struct HostMemoryBackend {
     /* protected */
     uint64_t size;
     bool merge, dump, use_canonical_path;
-    bool prealloc, force_prealloc, is_mapped, share;
+    bool prealloc, force_prealloc, is_mapped, share, managed_size;
     DECLARE_BITMAP(host_nodes, MAX_NODES + 1);
     HostMemPolicy policy;
 
@@ -61,12 +62,15 @@ struct HostMemoryBackend {
 };
 
 bool host_memory_backend_mr_inited(HostMemoryBackend *backend);
+void host_memory_backend_resized(Object *owner, const char *idstr,
+                                 uint64_t size, void *host);
 MemoryRegion *host_memory_backend_get_memory(HostMemoryBackend *backend);
 
 void host_memory_backend_set_mapped(HostMemoryBackend *backend, bool mapped);
 bool host_memory_backend_is_mapped(HostMemoryBackend *backend);
 void host_memory_backend_validate(HostMemoryBackend *backend,
-                                  const char *property, Error **errp);
+                                  const char *property,
+                                  bool managed_size_support, Error **errp);
 size_t host_memory_backend_pagesize(HostMemoryBackend *memdev);
 char *host_memory_backend_get_name(HostMemoryBackend *backend);
 
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 13/16] qmp/hmp: Expose "managed-size" for memory backends
  2020-02-12 13:35 [PATCH v2 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
                   ` (11 preceding siblings ...)
  2020-02-12 13:35 ` [PATCH v2 12/16] hostmem: Introduce "managed-size" for memory-backend-ram David Hildenbrand
@ 2020-02-12 13:35 ` David Hildenbrand
  2020-02-12 14:17   ` Eric Blake
  2020-02-12 13:35 ` [PATCH v2 14/16] virtio-mem: Support for resizable memory regions David Hildenbrand
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 21+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

Expose it, and document what it means and when it was added.
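
With this change, "info memdev" in the HMP monitor prints one additional
line per backend, e.g. (excerpt only, output shortened):

    prealloc: false
    managed-size: true
    policy: default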

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/core/machine-hmp-cmds.c | 2 ++
 hw/core/machine-qmp-cmds.c | 3 +++
 qapi/machine.json          | 6 ++++++
 3 files changed, 11 insertions(+)

diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c
index b76f7223af..681216198d 100644
--- a/hw/core/machine-hmp-cmds.c
+++ b/hw/core/machine-hmp-cmds.c
@@ -122,6 +122,8 @@ void hmp_info_memdev(Monitor *mon, const QDict *qdict)
                        m->value->dump ? "true" : "false");
         monitor_printf(mon, "  prealloc: %s\n",
                        m->value->prealloc ? "true" : "false");
+        monitor_printf(mon, "  managed-size: %s\n",
+                       m->value->managed_size ? "true" : "false");
         monitor_printf(mon, "  policy: %s\n",
                        HostMemPolicy_str(m->value->policy));
         visit_complete(v, &str);
diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
index eed5aeb2f7..800b55af5d 100644
--- a/hw/core/machine-qmp-cmds.c
+++ b/hw/core/machine-qmp-cmds.c
@@ -321,6 +321,9 @@ static int query_memdev(Object *obj, void *opaque)
         m->value->prealloc = object_property_get_bool(obj,
                                                       "prealloc",
                                                       &error_abort);
+        m->value->managed_size = object_property_get_bool(obj,
+                                                          "managed-size",
+                                                          &error_abort);
         m->value->policy = object_property_get_enum(obj,
                                                     "policy",
                                                     "HostMemPolicy",
diff --git a/qapi/machine.json b/qapi/machine.json
index b3d30bc816..0c31818853 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -758,6 +758,9 @@
 #
 # @prealloc: enables or disables memory preallocation
 #
+# @managed-size: the owner manages the actual size, 'size' is an upper limit
+#                (since 5.1)
+#
 # @host-nodes: host nodes for its memory policy
 #
 # @policy: memory policy of memory backend
@@ -771,6 +774,7 @@
     'merge':      'bool',
     'dump':       'bool',
     'prealloc':   'bool',
+    'managed-size': 'bool',
     'host-nodes': ['uint16'],
     'policy':     'HostMemPolicy' }}
 
@@ -793,6 +797,7 @@
 #          "merge": false,
 #          "dump": true,
 #          "prealloc": false,
+#          "manmaged-size": false,
 #          "host-nodes": [0, 1],
 #          "policy": "bind"
 #        },
@@ -801,6 +806,7 @@
 #          "merge": false,
 #          "dump": true,
 #          "prealloc": true,
+#          "manmaged-size": false,
 #          "host-nodes": [2, 3],
 #          "policy": "preferred"
 #        }
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 14/16] virtio-mem: Support for resizable memory regions
  2020-02-12 13:35 [PATCH v2 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
                   ` (12 preceding siblings ...)
  2020-02-12 13:35 ` [PATCH v2 13/16] qmp/hmp: Expose "managed-size" for memory backends David Hildenbrand
@ 2020-02-12 13:35 ` David Hildenbrand
  2020-02-12 13:36 ` [PATCH v2 15/16] memory: Add region_resize() callback to memory notifier David Hildenbrand
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/virtio/virtio-mem.c | 168 ++++++++++++++++++++++++++---------------
 1 file changed, 109 insertions(+), 59 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 093b6eb0bb..d28b501778 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -237,30 +237,78 @@ static void virtio_mem_unplug_request(VirtIOMEM *vm, VirtQueueElement *elem,
     virtio_mem_send_response_simple(vm, elem, type);
 }
 
+/*
+ * Try to resize the usable region to hold at least the requested size.
+ */
+static void virtio_mem_resize_usable_region(VirtIOMEM *vm,
+                                            uint64_t requested_size,
+                                            Error **errp)
+{
+    /*
+     * If possible, we size the usable region a little bit bigger than the
+     * requested size, so the guest has more flexibility.
+     */
+    uint64_t newsize = MIN(memory_region_max_size(&vm->memdev->mr),
+                           requested_size + VIRTIO_MEM_USABLE_EXTENT);
+    Error *err = NULL;
+
+    /*
+     * Size it as small as possible (0 is not valid).
+     */
+    if (!requested_size) {
+        newsize = vm->block_size;
+    }
+
+    if (newsize == vm->usable_region_size) {
+        return;
+    }
+
+    /* resize the memory region, if supported */
+    if (memory_region_is_resizable(&vm->memdev->mr)) {
+        memory_region_ram_resize(&vm->memdev->mr, newsize, &err);
+    }
+    if (!err) {
+        vm->usable_region_size = newsize;
+        fprintf(stderr, "New usable_region_size: %" PRIx64 "\n",
+                vm->usable_region_size);
+    }
+    error_propagate(errp, err);
+}
+
 /*
  * Unplug all memory and shrink the usable region.
  */
-static void virtio_mem_unplug_all(VirtIOMEM *vm)
+static int virtio_mem_unplug_all(VirtIOMEM *vm)
 {
+    Error *err = NULL;
+
+    if (virtio_mem_busy()) {
+        return -EBUSY;
+    }
+
+    virtio_mem_resize_usable_region(vm, vm->requested_size, &err);
+    if (err) {
+        /* It's unlikely that shrinking fails. */
+        warn_report_err(err);
+        return -ENOMEM;
+    }
     if (vm->size) {
-        virtio_mem_set_block_state(vm, vm->addr,
-                                   memory_region_size(&vm->memdev->mr), false);
+        ram_block_discard_range(vm->memdev->mr.ram_block, 0,
+                                memory_region_size(&vm->memdev->mr));
+        bitmap_clear(vm->bitmap, 0, vm->bitmap_size);
         vm->size = 0;
     }
-    vm->usable_region_size = MIN(memory_region_size(&vm->memdev->mr),
-                                 vm->requested_size + VIRTIO_MEM_USABLE_EXTENT);
+    return 0;
 }
 
 static void virtio_mem_unplug_all_request(VirtIOMEM *vm, VirtQueueElement *elem)
 {
 
-    if (virtio_mem_busy()) {
+    if (virtio_mem_unplug_all(vm)) {
         virtio_mem_send_response_simple(vm, elem,  VIRTIO_MEM_RESP_BUSY);
-        return;
+    } else {
+        virtio_mem_send_response_simple(vm, elem,  VIRTIO_MEM_RESP_ACK);
     }
-
-    virtio_mem_unplug_all(vm);
-    virtio_mem_send_response_simple(vm, elem,  VIRTIO_MEM_RESP_ACK);
 }
 
 static void virtio_mem_state_request(VirtIOMEM *vm, VirtQueueElement *elem,
@@ -344,7 +392,7 @@ static void virtio_mem_get_config(VirtIODevice *vdev, uint8_t *config_data)
     config->requested_size = cpu_to_le64(vm->requested_size);
     config->plugged_size = cpu_to_le64(vm->size);
     config->addr = cpu_to_le64(vm->addr);
-    config->region_size = cpu_to_le64(memory_region_size(&vm->memdev->mr));
+    config->region_size = cpu_to_le64(memory_region_max_size(&vm->memdev->mr));
     config->usable_region_size = cpu_to_le64(vm->usable_region_size);
 }
 
@@ -370,10 +418,6 @@ static void virtio_mem_system_reset(void *opaque)
      * region size. This is, however, not possible in all scenarios. Then,
      * the guest has to deal with this manually (VIRTIO_MEM_REQ_UNPLUG_ALL).
      */
-    if (virtio_mem_busy()) {
-        return;
-    }
-
     virtio_mem_unplug_all(vm);
 }
 
@@ -410,32 +454,32 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
     int nb_numa_nodes = ms->numa_state ? ms->numa_state->num_nodes : 0;
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VirtIOMEM *vm = VIRTIO_MEM(dev);
-    Error *local_err = NULL;
+    Error *err = NULL;
     uint64_t page_size;
 
     /* verify the memdev */
     host_memory_backend_validate(vm->memdev, VIRTIO_MEM_MEMDEV_PROP,
-                                 false, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
+                                 true, &err);
+    if (err) {
+        error_propagate(errp, err);
         return;
     }
 
     /* verify the node */
     if ((nb_numa_nodes && vm->node >= nb_numa_nodes) ||
         (!nb_numa_nodes && vm->node)) {
-        error_setg(&local_err, "Property '%s' has value '%" PRIu32
+        error_setg(errp, "Property '%s' has value '%" PRIu32
                    "', which exceeds the number of numa nodes: %d",
                    VIRTIO_MEM_NODE_PROP, vm->node,
                    nb_numa_nodes ? nb_numa_nodes : 1);
-        goto out;
+        return;
     }
 
     /* mmap/madvise changes have to be reflected in guest physical memory */
     if (kvm_enabled() && !kvm_has_sync_mmu()) {
-        error_set(&local_err, ERROR_CLASS_KVM_MISSING_CAP,
+        error_set(errp, ERROR_CLASS_KVM_MISSING_CAP,
                   "Using KVM without synchronous MMU, virtio-mem unavailable");
-        goto out;
+        return;
     }
 
     /*
@@ -443,8 +487,14 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
      * to temporarily unlock and relock at the right places to make it work.
      */
     if (enable_mlock) {
-        error_setg(&local_err, "Memory is locked, virtio-mem unavailable");
-        goto out;
+        error_setg(errp, "Memory is locked, virtio-mem unavailable");
+        return;
+    }
+
+    if (virtio_mem_busy()) {
+        error_setg(errp, "virtio-mem devices cannot be created while migrating,"
+                   " while dumping, or when certain vfio devices are used.");
+        return;
     }
 
     g_assert(memory_region_is_ram(&vm->memdev->mr));
@@ -458,37 +508,37 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
      */
     page_size = qemu_ram_pagesize(vm->memdev->mr.ram_block);
     if (page_size != getpagesize()) {
-        error_setg(&local_err, "'%s' page size (0x%" PRIx64 ") not supported",
+        error_setg(errp, "'%s' page size (0x%" PRIx64 ") not supported",
                    VIRTIO_MEM_MEMDEV_PROP, page_size);
-        goto out;
+        return;
     }
 
     /* now that memdev and block_size is fixed, verify the properties */
     if (vm->block_size < page_size) {
-        error_setg(&local_err, "'%s' has to be at least the page size (0x%"
+        error_setg(errp, "'%s' has to be at least the page size (0x%"
                    PRIx64 ")", VIRTIO_MEM_BLOCK_SIZE_PROP, page_size);
-        goto out;
+        return;
     } else if (!QEMU_IS_ALIGNED(vm->requested_size, vm->block_size)) {
         error_setg(errp, "'%s' has to be multiples of '%s' (0x%" PRIx32
                    ")", VIRTIO_MEM_REQUESTED_SIZE_PROP,
                    VIRTIO_MEM_BLOCK_SIZE_PROP, vm->block_size);
-    } else if (!QEMU_IS_ALIGNED(memory_region_size(&vm->memdev->mr),
+        return;
+    } else if (!QEMU_IS_ALIGNED(memory_region_max_size(&vm->memdev->mr),
                                 vm->block_size)) {
-        error_setg(&local_err, "'%s' size has to be multiples of '%s' (0x%"
+        error_setg(errp, "'%s' size has to be multiples of '%s' (0x%"
                    PRIx32 ")", VIRTIO_MEM_MEMDEV_PROP,
                    VIRTIO_MEM_BLOCK_SIZE_PROP, vm->block_size);
-        goto out;
+        return;
     }
 
-    /*
-     * If possible, we size the usable region a little bit bigger than the
-     * requested size, so the guest has more flexibility.
-     */
-    vm->usable_region_size = MIN(memory_region_size(&vm->memdev->mr),
-                                 vm->requested_size + VIRTIO_MEM_USABLE_EXTENT);
+    virtio_mem_resize_usable_region(vm, vm->requested_size, &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
 
     /* allocate the bitmap for tracking the state of a block */
-    vm->bitmap_size = memory_region_size(&vm->memdev->mr) / vm->block_size;
+    vm->bitmap_size = memory_region_max_size(&vm->memdev->mr) / vm->block_size;
     vm->bitmap = bitmap_new(vm->bitmap_size);
 
     /* all memory is unplugged initially */
@@ -505,8 +555,6 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
     vm->postcopy_notifier.notify = virtio_mem_postcopy_notifier;
     postcopy_add_notifier(&vm->postcopy_notifier);
     qemu_register_reset(virtio_mem_system_reset, vm);
-out:
-    error_propagate(errp, local_err);
 }
 
 static void virtio_mem_device_unrealize(DeviceState *dev, Error **errp)
@@ -603,7 +651,7 @@ static void virtio_mem_fill_device_info(const VirtIOMEM *vmem,
     vi->node = vmem->node;
     vi->requested_size = vmem->requested_size;
     vi->size = vmem->size;
-    vi->max_size = memory_region_size(&vmem->memdev->mr);
+    vi->max_size = memory_region_max_size(&vmem->memdev->mr);
     vi->block_size = vmem->block_size;
     vi->memdev = object_get_canonical_path(OBJECT(vmem->memdev));
 }
@@ -651,14 +699,6 @@ static void virtio_mem_set_requested_size(Object *obj, Visitor *v,
         return;
     }
 
-    /* Growing the usable region might later not be possible, disallow it. */
-    if (virtio_mem_busy() && value > vm->requested_size) {
-        error_setg(errp, "'%s' cannot be increased while migrating,"
-                   " while dumping, or when certain vfio devices are used.",
-                   name);
-        return;
-    }
-
     /*
      * The block size and memory backend are not fixed until the device was
      * realized. realize() will verify these properties then.
@@ -669,22 +709,32 @@ static void virtio_mem_set_requested_size(Object *obj, Visitor *v,
                        ")", name, VIRTIO_MEM_BLOCK_SIZE_PROP,
                        vm->block_size);
             return;
-        } else if (value > memory_region_size(&vm->memdev->mr)) {
+        } else if (value > memory_region_max_size(&vm->memdev->mr)) {
             error_setg(errp, "'%s' cannot exceed the memory backend size"
                        "(0x%" PRIx64 ")", name,
-                       memory_region_size(&vm->memdev->mr));
+                       memory_region_max_size(&vm->memdev->mr));
             return;
         }
 
         if (value != vm->requested_size) {
-            uint64_t tmp_size;
-
+            if (virtio_mem_busy()) {
+                error_setg(errp, "'%s' cannot be changed while migrating,"
+                           " while dumping, or when certain vfio devices are used.",
+                           name);
+                return;
+            }
+
+            /* We are only allowed to grow the region */
+            if (value > vm->requested_size) {
+                Error *err = NULL;
+
+                virtio_mem_resize_usable_region(vm, value, &err);
+                if (err) {
+                    error_propagate(errp, err);
+                    return;
+                }
+            }
             vm->requested_size = value;
-
-            /* Grow the usable region if required */
-            tmp_size = MIN(memory_region_size(&vm->memdev->mr),
-                           vm->requested_size + VIRTIO_MEM_USABLE_EXTENT);
-            vm->usable_region_size = MAX(vm->usable_region_size, tmp_size);
         }
         /*
          * Trigger a config update so the guest gets notified. We trigger
-- 
2.24.1




* [PATCH v2 15/16] memory: Add region_resize() callback to memory notifier
  2020-02-12 13:35 [PATCH v2 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
                   ` (13 preceding siblings ...)
  2020-02-12 13:35 ` [PATCH v2 14/16] virtio-mem: Support for resizable memory regions David Hildenbrand
@ 2020-02-12 13:36 ` David Hildenbrand
  2020-02-12 13:36 ` [PATCH v2 16/16] kvm: Implement region_resize() for atomic memory section resizes David Hildenbrand
  2020-02-12 13:40 ` [PATCH v2 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
  16 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

Let's provide a way for memory notifiers to get notified about a resize.
If the region_resize() callback is not implemented by a notifier, we
mimic the old behavior by removing the old section and adding the
new, resized section.
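
As a rough illustration (hypothetical listener; none of the names below
are part of this patch), a backend that can handle a resize in place
simply wires up the new callback, while listeners that leave it NULL keep
seeing the del/add pair:

    /* Sketch only; assumes "exec/memory.h" within QEMU. */
    static void my_region_add(MemoryListener *l, MemoryRegionSection *s) { }
    static void my_region_del(MemoryListener *l, MemoryRegionSection *s) { }

    static void my_region_resize(MemoryListener *l,
                                 MemoryRegionSection *section, Int128 new)
    {
        /* 'section' still carries the old size; resize the mapping in place */
    }

    static MemoryListener my_listener = {
        .region_add    = my_region_add,
        .region_del    = my_region_del,
        .region_resize = my_region_resize, /* optional, taken on pure resizes */
    };
    /* registered as usual via memory_listener_register(&my_listener, as) */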

The existing code would remove all sections first and then add the new
ones. When resizing, we will now remove+re-add in a single shot. As we
grow in the adding phase and shrink in the removal phase, this should
not make a difference.

This callback is helpful when backends (like KVM) want to implement
atomic resizes of memory sections (e.g., resize while VCPUs are running and
using the section).

Note 1: A resize that also changes dirty logging is unlikely, but there
        is no reason to disallow it.
Note 2: Resizing MMIO regions is unlikely (coalesced I/O handling), but
        there is no reason to disallow it either.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/exec/memory.h | 19 ++++++++++
 memory.c              | 85 ++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 99 insertions(+), 5 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index dfedd88f13..1ec5432340 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -493,6 +493,25 @@ struct MemoryListener {
      */
     void (*region_nop)(MemoryListener *listener, MemoryRegionSection *section);
 
+    /**
+     * @region_resize:
+     *
+     * Called during an address space update transaction,
+     * for a section of the address space that is in the same place in the
+     * address space as in the last transaction, but whose size has changed.
+     * Dirty memory logging can change as well.
+     *
+     * Note: If this callback is not implemented, the resize is
+     *       communicated via a region_del(), followed by a region_add()
+     *       instead.
+     *
+     * @listener: The #MemoryListener.
+     * @section: The old #MemoryRegionSection.
+     * @new: The new size.
+     */
+    void (*region_resize)(MemoryListener *listener,
+                          MemoryRegionSection *section, Int128 new);
+
     /**
      * @log_start:
      *
diff --git a/memory.c b/memory.c
index 5c62702618..0d9fe189ad 100644
--- a/memory.c
+++ b/memory.c
@@ -246,6 +246,17 @@ static bool flatrange_equal(FlatRange *a, FlatRange *b)
         && a->nonvolatile == b->nonvolatile;
 }
 
+static bool flatrange_resized(FlatRange *a, FlatRange *b)
+{
+    return a->mr == b->mr
+        && int128_eq(a->addr.start, b->addr.start)
+        && int128_ne(a->addr.size, b->addr.size)
+        && a->offset_in_region == b->offset_in_region
+        && a->romd_mode == b->romd_mode
+        && a->readonly == b->readonly
+        && a->nonvolatile == b->nonvolatile;
+}
+
 static FlatView *flatview_new(MemoryRegion *mr_root)
 {
     FlatView *view;
@@ -875,6 +886,51 @@ static void flat_range_coalesced_io_add(FlatRange *fr, AddressSpace *as)
     }
 }
 
+static void memory_listener_resize_region(FlatRange *fr, AddressSpace *as,
+                                          enum ListenerDirection dir,
+                                          Int128 new)
+{
+    FlatView *as_view = address_space_to_flatview(as);
+    MemoryRegionSection old_mrs = section_from_flat_range(fr, as_view);
+    MemoryRegionSection new_mrs = old_mrs;
+    MemoryListener *listener;
+
+    new_mrs.size = new;
+
+    switch (dir) {
+    case Forward:
+        QTAILQ_FOREACH(listener, &as->listeners, link_as) {
+            if (listener->region_resize) {
+                listener->region_resize(listener, &old_mrs, new);
+                continue;
+            }
+            if (listener->region_del) {
+                listener->region_del(listener, &old_mrs);
+            }
+            if (listener->region_add) {
+                listener->region_add(listener, &new_mrs);
+            }
+        }
+        break;
+    case Reverse:
+        QTAILQ_FOREACH_REVERSE(listener, &as->listeners, link_as) {
+            if (listener->region_resize) {
+                listener->region_resize(listener, &old_mrs, new);
+                continue;
+            }
+            if (listener->region_del) {
+                listener->region_del(listener, &old_mrs);
+            }
+            if (listener->region_add) {
+                listener->region_add(listener, &new_mrs);
+            }
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static void address_space_update_topology_pass(AddressSpace *as,
                                                const FlatView *old_view,
                                                const FlatView *new_view,
@@ -899,11 +955,30 @@ static void address_space_update_topology_pass(AddressSpace *as,
             frnew = NULL;
         }
 
-        if (frold
-            && (!frnew
-                || int128_lt(frold->addr.start, frnew->addr.start)
-                || (int128_eq(frold->addr.start, frnew->addr.start)
-                    && !flatrange_equal(frold, frnew)))) {
+        if (frold && frnew && flatrange_resized(frold, frnew)) {
+            /* In both, and only the size (and possibly logging) changed. */
+
+            if (int128_lt(frold->addr.size, frnew->addr.size) && adding) {
+                /* Grow in the adding phase. */
+                memory_listener_resize_region(frold, as, Forward,
+                                              frnew->addr.size);
+                flat_range_coalesced_io_del(frold, as);
+                flat_range_coalesced_io_add(frnew, as);
+            } else if (int128_gt(frold->addr.size, frnew->addr.size) &&
+                       !adding) {
+                /* Shrink in the removal phase. */
+                memory_listener_resize_region(frold, as, Reverse,
+                                              frnew->addr.size);
+                flat_range_coalesced_io_del(frold, as);
+                flat_range_coalesced_io_add(frnew, as);
+            }
+
+            ++iold;
+            ++inew;
+        } else if (frold && (!frnew
+                             || int128_lt(frold->addr.start, frnew->addr.start)
+                             || (int128_eq(frold->addr.start, frnew->addr.start)
+                                 && !flatrange_equal(frold, frnew)))) {
             /* In old but not in new, or in both but attributes changed. */
 
             if (!adding) {
-- 
2.24.1




* [PATCH v2 16/16] kvm: Implement region_resize() for atomic memory section resizes
  2020-02-12 13:35 [PATCH v2 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
                   ` (14 preceding siblings ...)
  2020-02-12 13:36 ` [PATCH v2 15/16] memory: Add region_resize() callback to memory notifier David Hildenbrand
@ 2020-02-12 13:36 ` David Hildenbrand
  2020-02-12 13:40 ` [PATCH v2 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
  16 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Igor Mammedov, Paolo Bonzini,
	Richard Henderson

virtio-mem wants to resize (esp. grow) memory regions while the guest is
already aware of them and makes use of them. Resizing a KVM slot can
currently only be done by removing the slot and re-adding it. While the
slot is temporarily removed, VCPUs that try to access it will fault.

Let's inhibit KVM_RUN while performing the resize. Keep it lightweight by
remembering, using one bool per VCPU, whether the VCPU is executing in the
kernel.
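
The synchronization pattern, boiled down to a standalone sketch (plain
pthreads/C11 atomics instead of QEMU's primitives; all names below are
illustrative, the real implementation is in the diff):

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdbool.h>

    #define NR_VCPUS 4

    static pthread_mutex_t run_mutex = PTHREAD_MUTEX_INITIALIZER;
    /* VCPUs wait here while KVM_RUN is inhibited */
    static pthread_cond_t run_cond = PTHREAD_COND_INITIALIZER;
    /* the resizer waits here for VCPUs to leave the kernel */
    static pthread_cond_t inhibit_cond = PTHREAD_COND_INITIALIZER;
    static atomic_int run_inhibited;
    static atomic_bool in_kernel[NR_VCPUS];

    /* Resizer: block new KVM_RUN entries, wait until no VCPU is inside. */
    static void run_inhibit_begin(void)
    {
        atomic_fetch_add(&run_inhibited, 1);
        pthread_mutex_lock(&run_mutex);
        for (;;) {
            bool any = false;

            for (int i = 0; i < NR_VCPUS; i++) {
                /* the real code also kicks VCPUs still stuck in KVM_RUN */
                any |= atomic_load(&in_kernel[i]);
            }
            if (!any) {
                break;
            }
            pthread_cond_wait(&inhibit_cond, &run_mutex);
        }
        pthread_mutex_unlock(&run_mutex);
    }

    static void run_inhibit_end(void)
    {
        atomic_fetch_sub(&run_inhibited, 1);
        pthread_mutex_lock(&run_mutex);
        pthread_cond_broadcast(&run_cond); /* let VCPUs re-enter KVM_RUN */
        pthread_mutex_unlock(&run_mutex);
    }

    /* VCPU: called with true right before KVM_RUN, with false right after. */
    static void set_in_kernel(int cpu, bool value)
    {
        atomic_store(&in_kernel[cpu], value);
        if (value) {
            while (atomic_load(&run_inhibited)) {
                atomic_store(&in_kernel[cpu], false);
                pthread_mutex_lock(&run_mutex);
                pthread_cond_broadcast(&inhibit_cond);
                pthread_cond_wait(&run_cond, &run_mutex);
                pthread_mutex_unlock(&run_mutex);
                atomic_store(&in_kernel[cpu], true);
            }
        } else if (atomic_load(&run_inhibited)) {
            pthread_mutex_lock(&run_mutex);
            pthread_cond_broadcast(&inhibit_cond); /* wake a waiting resizer */
            pthread_mutex_unlock(&run_mutex);
        }
    }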

Note1: Instead of implementing region_resize(), we could also inhibit in
begin() and let the VCPUs continue to run in commit(). This would also
handle atomic splitting of memory regions (I remember a BUG report but
cannot dig up the mail). However, using the region_resize() callback we
can later wire up an ioctl that performs the resize atomically, and make
the inhibit conditional. Also, this way we inhibit KVM only when resizing
- not on every address space change. This will not affect existing RT
workloads (resizes currently only happen during reboot or at the start of
an incoming migration).

Note2: We cannot use pause_all_vcpus()/resume_all_vcpus(), as it would
temporarily drop the BQL, which is something most callers cannot deal
with when trying to resize a memory region.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 accel/kvm/kvm-all.c   | 87 +++++++++++++++++++++++++++++++++++++++++++
 include/hw/core/cpu.h |  3 ++
 2 files changed, 90 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index c111312dfd..e24805771c 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -148,6 +148,10 @@ bool kvm_ioeventfd_any_length_allowed;
 bool kvm_msi_use_devid;
 static bool kvm_immediate_exit;
 static hwaddr kvm_max_slot_size = ~0;
+static QemuMutex kvm_run_mutex;
+static QemuCond kvm_run_cond;
+static QemuCond kvm_run_inhibit_cond;
+static int kvm_run_inhibited;
 
 static const KVMCapabilityInfo kvm_required_capabilites[] = {
     KVM_CAP_INFO(USER_MEMORY),
@@ -1121,6 +1125,57 @@ static void kvm_region_del(MemoryListener *listener,
     memory_region_unref(section->mr);
 }
 
+/*
+ * Certain updates (e.g., resizing memory regions) require temporarily removing
+ * kvm memory slots. Prevent any VCPU from faulting by making sure all VCPUs
+ * have left KVM_RUN and won't enter it again until unblocked.
+ */
+static void kvm_run_inhibit_begin(void)
+{
+    CPUState *cpu;
+
+    atomic_inc(&kvm_run_inhibited);
+    while (true) {
+        bool any_in_kernel = false;
+
+        CPU_FOREACH(cpu) {
+            if (atomic_read(&cpu->in_kernel)) {
+                any_in_kernel = true;
+                qemu_cpu_kick(cpu);
+            }
+        }
+        if (!any_in_kernel) {
+            break;
+        }
+        qemu_mutex_lock(&kvm_run_mutex);
+        qemu_cond_wait(&kvm_run_inhibit_cond, &kvm_run_mutex);
+        qemu_mutex_unlock(&kvm_run_mutex);
+    }
+}
+
+static void kvm_run_inhibit_end(void)
+{
+    atomic_dec(&kvm_run_inhibited);
+    qemu_mutex_lock(&kvm_run_mutex);
+    qemu_cond_broadcast(&kvm_run_cond);
+    qemu_mutex_unlock(&kvm_run_mutex);
+}
+
+static void kvm_region_resize(MemoryListener *listener,
+                              MemoryRegionSection *section, Int128 new)
+{
+    KVMMemoryListener *kml = container_of(listener, KVMMemoryListener, listener);
+    MemoryRegionSection new_section = *section;
+
+    new_section.size = new;
+
+    /* Inhibit KVM while we temporarily remove slots. */
+    kvm_run_inhibit_begin();
+    kvm_set_phys_mem(kml, section, false);
+    kvm_set_phys_mem(kml, &new_section, true);
+    kvm_run_inhibit_end();
+}
+
 static void kvm_log_sync(MemoryListener *listener,
                          MemoryRegionSection *section)
 {
@@ -1239,6 +1294,7 @@ void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
 
     kml->listener.region_add = kvm_region_add;
     kml->listener.region_del = kvm_region_del;
+    kml->listener.region_resize = kvm_region_resize;
     kml->listener.log_start = kvm_log_start;
     kml->listener.log_stop = kvm_log_stop;
     kml->listener.log_sync = kvm_log_sync;
@@ -1884,6 +1940,9 @@ static int kvm_init(MachineState *ms)
     assert(TARGET_PAGE_SIZE <= qemu_real_host_page_size);
 
     s->sigmask_len = 8;
+    qemu_mutex_init(&kvm_run_mutex);
+    qemu_cond_init(&kvm_run_cond);
+    qemu_cond_init(&kvm_run_inhibit_cond);
 
 #ifdef KVM_CAP_SET_GUEST_DEBUG
     QTAILQ_INIT(&s->kvm_sw_breakpoints);
@@ -2294,6 +2353,29 @@ static void kvm_eat_signals(CPUState *cpu)
     } while (sigismember(&chkset, SIG_IPI));
 }
 
+static void kvm_set_cpu_in_kernel(CPUState *cpu, bool in_kernel)
+{
+    atomic_set(&cpu->in_kernel, in_kernel);
+    if (in_kernel) {
+        /* wait until KVM_RUN is no longer inhibited */
+        while (unlikely(atomic_read(&kvm_run_inhibited))) {
+            atomic_set(&cpu->in_kernel, false);
+            qemu_mutex_lock(&kvm_run_mutex);
+            qemu_cond_broadcast(&kvm_run_inhibit_cond);
+            qemu_cond_wait(&kvm_run_cond, &kvm_run_mutex);
+            qemu_mutex_unlock(&kvm_run_mutex);
+            atomic_set(&cpu->in_kernel, true);
+        }
+    } else {
+        /* wake up somebody wanting to inhibit KVM_RUN */
+        if (unlikely(atomic_read(&kvm_run_inhibited))) {
+            qemu_mutex_lock(&kvm_run_mutex);
+            qemu_cond_broadcast(&kvm_run_inhibit_cond);
+            qemu_mutex_unlock(&kvm_run_mutex);
+        }
+    }
+}
+
 int kvm_cpu_exec(CPUState *cpu)
 {
     struct kvm_run *run = cpu->kvm_run;
@@ -2318,6 +2400,9 @@ int kvm_cpu_exec(CPUState *cpu)
         }
 
         kvm_arch_pre_run(cpu, run);
+
+        kvm_set_cpu_in_kernel(cpu, true);
+
         if (atomic_read(&cpu->exit_request)) {
             DPRINTF("interrupt exit requested\n");
             /*
@@ -2335,6 +2420,8 @@ int kvm_cpu_exec(CPUState *cpu)
 
         run_ret = kvm_vcpu_ioctl(cpu, KVM_RUN, 0);
 
+        kvm_set_cpu_in_kernel(cpu, false);
+
         attrs = kvm_arch_post_run(cpu, run);
 
 #ifdef KVM_HAVE_MCE_INJECTION
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 73e9a869a4..83614e537b 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -431,6 +431,9 @@ struct CPUState {
     /* shared by kvm, hax and hvf */
     bool vcpu_dirty;
 
+    /* kvm only for now: VCPU is executing in the kernel (KVM_RUN) */
+    bool in_kernel;
+
     /* Used to keep track of an outstanding cpu throttle thread for migration
      * autoconverge
      */
-- 
2.24.1




* Re: [PATCH v2 00/16] Ram blocks with resizable anonymous allocations under POSIX
  2020-02-12 13:35 [PATCH v2 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
                   ` (15 preceding siblings ...)
  2020-02-12 13:36 ` [PATCH v2 16/16] kvm: Implement region_resize() for atomic memory section resizes David Hildenbrand
@ 2020-02-12 13:40 ` David Hildenbrand
  16 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2020-02-12 13:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Dr . David Alan Gilbert,
	Igor Mammedov, Paolo Bonzini, Richard Henderson

On 12.02.20 14:35, David Hildenbrand wrote:
> We already allow resizable ram blocks for anonymous memory, however, they
> are not actually resized. All memory is mmaped() R/W, including the memory
> exceeding the used_length, up to the max_length.
> 
> When resizing, effectively only the boundary is moved. Implement actually
> resizable anonymous allocations and make use of them in resizable ram
> blocks when possible. Memory exceeding the used_length will be
> inaccessible. Especially ram block notifiers require care.
> 
> Having actually resizable anonymous allocations (via mmap-hackery) allows
> to reserve a big region in virtual address space and grow the
> accessible/usable part on demand. Even if "/proc/sys/vm/overcommit_memory"
> is set to "never" under Linux, huge reservations will succeed. If there is
> not enough memory when resizing (to populate parts of the reserved region),
> trying to resize will fail. Only the actually used size is reserved in the
> OS.
> 
> E.g., virtio-mem [1] wants to reserve big resizable memory regions and
> grow the usable part on demand. I think this change is worth sending out
> individually. Accompanied by a bunch of minor fixes and cleanups.
> 
> Especially, memory notifiers already handle resizing by first removing
> the old region, and then re-adding the resized region. prealloc is
> currently not possible with resizable ram blocks. mlock() should continue
> to work as is. Resizing is currently rare and must only happen on the
> start of an incoming migration, or during resets. No code path (except
> HAX and SEV ram block notifiers) should access memory outside of the usable
> range - and if we ever find one, that one has to be fixed (I did not
> identify any).
> 
> v1 -> v2:
> - Add "util: vfio-helpers: Fix qemu_vfio_close()"
> - Add "util: vfio-helpers: Remove Error parameter from
>        qemu_vfio_undo_mapping()"
> - Add "util: vfio-helpers: Factor out removal from
>        qemu_vfio_undo_mapping()"
> - "util/mmap-alloc: ..."
>  -- Minor changes due to review feedback (e.g., assert alignment, return
>     bool when resizing)
> - "util: vfio-helpers: Implement ram_block_resized()"
>  -- Reserve max_size in the IOVA address space.
>  -- On resize, undo old mapping and do new mapping. We can later implement
>     a new ioctl to resize the mapping directly.
> - "numa: Teach ram block notifiers about resizable ram blocks"
>  -- Pass size/max_size to ram block notifiers, which makes things easier an
>     cleaner
> - "exec: Ram blocks with resizable anonymous allocations under POSIX"
>  -- Adapt to new ram block notifiers
>  -- Shrink after notifying. Always trigger ram block notifiers on resizes
>  -- Add a safety net that all ram block notifiers registered at runtime
>     support resizes.
> 
> [1] https://lore.kernel.org/kvm/20191212171137.13872-1-david@redhat.com/
> 
> David Hildenbrand (16):
>   util: vfio-helpers: Factor out and fix processing of existing ram
>     blocks
>   util: vfio-helpers: Fix qemu_vfio_close()
>   util: vfio-helpers: Remove Error parameter from
>     qemu_vfio_undo_mapping()
>   util: vfio-helpers: Factor out removal from qemu_vfio_undo_mapping()
>   exec: Factor out setting ram settings (madvise ...) into
>     qemu_ram_apply_settings()
>   exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap()
>   exec: Drop "shared" parameter from ram_block_add()
>   util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize()
>   util/mmap-alloc: Factor out reserving of a memory region to
>     mmap_reserve()
>   util/mmap-alloc: Factor out populating of memory to mmap_populate()
>   util/mmap-alloc: Prepare for resizable mmaps
>   util/mmap-alloc: Implement resizable mmaps
>   numa: Teach ram block notifiers about resizable ram blocks
>   util: vfio-helpers: Implement ram_block_resized()
>   util: oslib: Resizable anonymous allocations under POSIX
>   exec: Ram blocks with resizable anonymous allocations under POSIX

I should double check what I send out while doing last minute changes.
Please ignore this series, will send the proper one right away.


-- 
Thanks,

David / dhildenb




* Re: [PATCH v2 01/16] virtio-mem: Prototype
  2020-02-12 13:35 ` [PATCH v2 01/16] virtio-mem: Prototype David Hildenbrand
@ 2020-02-12 14:15   ` Eric Blake
  2020-02-12 14:20     ` David Hildenbrand
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Blake @ 2020-02-12 14:15 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Dr . David Alan Gilbert,
	Paolo Bonzini, Igor Mammedov, Richard Henderson

On 2/12/20 7:35 AM, David Hildenbrand wrote:
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---

It's at least worth mentioning VirtioMEMDeviceInfo in the commit 
message, to make it easier to find which commit introduces a given QAPI 
struct when searching the git log.

> +++ b/qapi/misc.json
> @@ -1557,19 +1557,56 @@
>             }
>   }
>   
> +##
> +# @VirtioMEMDeviceInfo:
> +#
> +# VirtioMEMDevice state information
> +#
> +# @id: device's ID
> +#
> +# @memaddr: physical address in memory, where device is mapped
> +#
> +# @requested-size: the user requested size of the device
> +#
> +# @size: the (current) size of memory that the device provides
> +#
> +# @max-size: the maximum size of memory that the device can provide
> +#
> +# @block-size: the block size of memory that the device provides
> +#
> +# @node: NUMA node number where device is assigned to
> +#
> +# @memdev: memory backend linked with the region
> +#
> +# Since: 4.1

5.0

> +##
> +{ 'struct': 'VirtioMEMDeviceInfo',
> +  'data': { '*id': 'str',

Does it make sense for id to be optional, or should it be mandatory?

> +            'memaddr': 'size',
> +            'requested-size': 'size',
> +            'size': 'size',
> +            'max-size': 'size',
> +            'block-size': 'size',
> +            'node': 'int',
> +            'memdev': 'str'
> +          }
> +}
> +
>   ##
>   # @MemoryDeviceInfo:
>   #
>   # Union containing information about a memory device
>   #
>   # nvdimm is included since 2.12. virtio-pmem is included since 4.1.
> +# virtio-mem is included since 4.2.

5.0

>   #
>   # Since: 2.1
>   ##
>   { 'union': 'MemoryDeviceInfo',
>     'data': { 'dimm': 'PCDIMMDeviceInfo',
>               'nvdimm': 'PCDIMMDeviceInfo',
> -            'virtio-pmem': 'VirtioPMEMDeviceInfo'
> +            'virtio-pmem': 'VirtioPMEMDeviceInfo',
> +            'virtio-mem': 'VirtioMEMDeviceInfo'
>             }
>   }
>   
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH v2 13/16] qmp/hmp: Expose "managed-size" for memory backends
  2020-02-12 13:35 ` [PATCH v2 13/16] qmp/hmp: Expose "managed-size" for memory backends David Hildenbrand
@ 2020-02-12 14:17   ` Eric Blake
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Blake @ 2020-02-12 14:17 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Dr . David Alan Gilbert,
	Paolo Bonzini, Igor Mammedov, Richard Henderson

On 2/12/20 7:35 AM, David Hildenbrand wrote:
> Expose it, and document what it means and when it was added.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>   hw/core/machine-hmp-cmds.c | 2 ++
>   hw/core/machine-qmp-cmds.c | 3 +++
>   qapi/machine.json          | 6 ++++++
>   3 files changed, 11 insertions(+)

> +++ b/qapi/machine.json
> @@ -758,6 +758,9 @@
>   #
>   # @prealloc: enables or disables memory preallocation
>   #
> +# @managed-size: the owner manages the actual size, 'size' is an upper limit
> +#                (since 5.1)
> +#

There's still time to get this in 5.0, if the series is accepted before 
soft freeze.

>   # @host-nodes: host nodes for its memory policy
>   #
>   # @policy: memory policy of memory backend
> @@ -771,6 +774,7 @@
>       'merge':      'bool',
>       'dump':       'bool',
>       'prealloc':   'bool',
> +    'managed-size': 'bool',
>       'host-nodes': ['uint16'],
>       'policy':     'HostMemPolicy' }}
>   
> @@ -793,6 +797,7 @@
>   #          "merge": false,
>   #          "dump": true,
>   #          "prealloc": false,
> +#          "manmaged-size": false,

typo, managed-size

>   #          "host-nodes": [0, 1],
>   #          "policy": "bind"
>   #        },
> @@ -801,6 +806,7 @@
>   #          "merge": false,
>   #          "dump": true,
>   #          "prealloc": true,
> +#          "manmaged-size": false,

and again

>   #          "host-nodes": [2, 3],
>   #          "policy": "preferred"
>   #        }
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH v2 01/16] virtio-mem: Prototype
  2020-02-12 14:15   ` Eric Blake
@ 2020-02-12 14:20     ` David Hildenbrand
  0 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2020-02-12 14:20 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Eduardo Habkost, Michael S . Tsirkin, Dr . David Alan Gilbert,
	Paolo Bonzini, Igor Mammedov, Richard Henderson

On 12.02.20 15:15, Eric Blake wrote:
> On 2/12/20 7:35 AM, David Hildenbrand wrote:
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
> 
> It's at least worth mentioning VirtioMEMDeviceInfo in the commit 
> message, to make it easier to find which commit introduces a given QAPI 
> struct when searching the git log.

Patches in this series were sent by mistake (they don't match the cover
letter), so they are not in a reviewable state. Thanks for the feedback
anyway :)

-- 
Thanks,

David / dhildenb




end of thread, other threads:[~2020-02-12 14:21 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
2020-02-12 13:35 [PATCH v2 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
2020-02-12 13:35 ` [PATCH v2 01/16] virtio-mem: Prototype David Hildenbrand
2020-02-12 14:15   ` Eric Blake
2020-02-12 14:20     ` David Hildenbrand
2020-02-12 13:35 ` [PATCH v2 02/16] virtio-pci: Proxy for virtio-mem David Hildenbrand
2020-02-12 13:35 ` [PATCH v2 03/16] hmp: Handle virtio-mem when printing memory device infos David Hildenbrand
2020-02-12 13:35 ` [PATCH v2 04/16] numa: Handle virtio-mem in NUMA stats David Hildenbrand
2020-02-12 13:35 ` [PATCH v2 05/16] pc: Support for virtio-mem-pci David Hildenbrand
2020-02-12 13:35 ` [PATCH v2 06/16] exec: Provide owner when resizing memory region David Hildenbrand
2020-02-12 13:35 ` [PATCH v2 07/16] memory: Add memory_region_max_size() and memory_region_is_resizable() David Hildenbrand
2020-02-12 13:35 ` [PATCH v2 08/16] memory: Disallow resizing to 0 David Hildenbrand
2020-02-12 13:35 ` [PATCH v2 09/16] memory-device: properly deal with resizable memory regions David Hildenbrand
2020-02-12 13:35 ` [PATCH v2 10/16] hostmem: Factor out applying settings David Hildenbrand
2020-02-12 13:35 ` [PATCH v2 11/16] hostmem: Factor out common checks into host_memory_backend_validate() David Hildenbrand
2020-02-12 13:35 ` [PATCH v2 12/16] hostmem: Introduce "managed-size" for memory-backend-ram David Hildenbrand
2020-02-12 13:35 ` [PATCH v2 13/16] qmp/hmp: Expose "managed-size" for memory backends David Hildenbrand
2020-02-12 14:17   ` Eric Blake
2020-02-12 13:35 ` [PATCH v2 14/16] virtio-mem: Support for resizable memory regions David Hildenbrand
2020-02-12 13:36 ` [PATCH v2 15/16] memory: Add region_resize() callback to memory notifier David Hildenbrand
2020-02-12 13:36 ` [PATCH v2 16/16] kvm: Implement region_resize() for atomic memory section resizes David Hildenbrand
2020-02-12 13:40 ` [PATCH v2 00/16] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
