* [PATCH v4 00/10] vhost-user: Lift Max Ram Slots Limitation
@ 2020-05-21  5:00 Raphael Norwitz
  2020-05-21  5:00 ` [PATCH v4 01/10] Add helper to populate vhost-user message regions Raphael Norwitz
                   ` (10 more replies)
  0 siblings, 11 replies; 20+ messages in thread
From: Raphael Norwitz @ 2020-05-21  5:00 UTC (permalink / raw)
  To: qemu-devel, mst, marcandre.lureau
  Cc: raphael.s.norwitz, marcandre.lureau, Raphael Norwitz

In QEMU today, a VM with a vhost-user device can hot add memory a
maximum of 8 times. See these threads, among others:

[1] https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01046.html
    https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01236.html

[2] https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg04656.html

This series introduces a new protocol feature
VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS which, when enabled, lifts the
restriction on the maximum number of RAM slots imposed by vhost-user.

Without vhost-user, a QEMU VM can support 256 RAM slots (for ACPI targets),
or potentially more (the KVM max is 512). With each region, a file descriptor
must be sent over the socket. If that many regions were sent in a single message,
upwards of 256 file descriptors could be opened in the backend process at once.
Opening that many fds could easily push the process past its open fd limit,
especially considering that one backend process can have multiple vhost threads,
exposing different devices to different QEMU instances. Therefore, to lift the
limit safely, region transmission should be split up over multiple messages.
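
As a rough, standalone illustration of that fd pressure (not code from this
series), a backend can inspect its descriptor limit with getrlimit(2):

    #include <stdio.h>
    #include <sys/resource.h>

    /*
     * Minimal sketch: report the process fd limits. With 256 regions per
     * device and several vhost threads in one backend process, the default
     * soft limit can be exhausted quickly.
     */
    int main(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
            perror("getrlimit");
            return 1;
        }
        printf("soft fd limit: %llu, hard fd limit: %llu\n",
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);
        return 0;
    }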

In addition, the VHOST_USER_SET_MEM_TABLE message was not reused because,
as the number of regions grows, the message becomes very large. In practice, such
large messages have caused problems (truncated messages), and in the past the
community has opted for smaller fixed-size messages where possible. VRINGs,
for example, are sent to the backend individually instead of in one massive
message.

The implementation details are explained in more detail in the commit
messages, but at a high level the new protocol feature works as follows:
- If the VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS feature is enabled,
  QEMU will send multiple VHOST_USER_ADD_MEM_REG and
  VHOST_USER_REM_MEM_REG messages to map and unmap individual memory
  regions instead of one large VHOST_USER_SET_MEM_TABLE message containing
  all memory regions.
- The vhost_user struct maintains a 'shadow state' of memory regions
  already sent to the backend. Each time vhost_user_set_mem_table is called,
  the shadow state is compared with the new device state. A
  VHOST_USER_REM_MEM_REG will be sent for each region in the shadow state
  but not in the device state. Then, a VHOST_USER_ADD_MEM_REG will be sent
  for each region in the device state but not the shadow state. After
  these messages have been sent, the shadow state will be updated to
  reflect the new device state. A sketch of this reconciliation step
  follows this list.
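
Below is a minimal, self-contained sketch of that reconciliation step with
simplified stand-in types (none of these names are the actual QEMU types or
helpers); the main() shows a hot-add where one region is added and none
removed:

    #include <inttypes.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Simplified stand-in for a vhost memory region descriptor. */
    typedef struct {
        uint64_t guest_phys_addr;
        uint64_t userspace_addr;
        uint64_t memory_size;
    } Region;

    static bool region_equal(const Region *a, const Region *b)
    {
        return a->guest_phys_addr == b->guest_phys_addr &&
               a->userspace_addr == b->userspace_addr &&
               a->memory_size == b->memory_size;
    }

    static void reconcile(Region *shadow, int *n_shadow,
                          const Region *dev, int n_dev)
    {
        /* 1. Regions only in the shadow state -> VHOST_USER_REM_MEM_REG. */
        for (int i = 0; i < *n_shadow; i++) {
            bool in_dev = false;
            for (int j = 0; j < n_dev; j++) {
                if (region_equal(&shadow[i], &dev[j])) {
                    in_dev = true;
                    break;
                }
            }
            if (!in_dev) {
                printf("REM_MEM_REG gpa=0x%" PRIx64 "\n",
                       shadow[i].guest_phys_addr);
            }
        }

        /* 2. Regions only in the device state -> VHOST_USER_ADD_MEM_REG. */
        for (int j = 0; j < n_dev; j++) {
            bool in_shadow = false;
            for (int i = 0; i < *n_shadow; i++) {
                if (region_equal(&dev[j], &shadow[i])) {
                    in_shadow = true;
                    break;
                }
            }
            if (!in_shadow) {
                printf("ADD_MEM_REG gpa=0x%" PRIx64 "\n",
                       dev[j].guest_phys_addr);
            }
        }

        /* 3. The shadow state now mirrors the device state. */
        for (int j = 0; j < n_dev; j++) {
            shadow[j] = dev[j];
        }
        *n_shadow = n_dev;
    }

    int main(void)
    {
        Region shadow[8] = { { 0x0, 0x7f0000000000, 0x40000000 } };
        int n_shadow = 1;
        Region dev[8] = { { 0x0, 0x7f0000000000, 0x40000000 },
                          { 0x100000000, 0x7f1000000000, 0x40000000 } };

        reconcile(shadow, &n_shadow, dev, 2);  /* hot-add: one ADD, no REM */
        return 0;
    }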

The series consists of 10 changes:
1. Add helper to populate vhost-user message regions:
    This change adds a helper to populate a VhostUserMemoryRegion from a
    struct vhost_memory_region, which needs to be done in multiple places
    in this series.

2. Add vhost-user helper to get MemoryRegion data
    This change adds a helper to get a pointer to a MemoryRegion struct, along
    with its offset and associated file descriptor. This helper is used to
    simplify other vhost-user code paths and will be needed elsewhere in this
    series.

3. Add VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
    This change adds the VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
    protocol feature. At this point, if negotiated, the feature only allows the
    backend to limit the maximum number of ram slots to a value no greater than
    VHOST_MEMORY_MAX_NREGIONS = 8.

4. Transmit vhost-user memory regions individually
    With this change, if the VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
    protocol feature is enabled, QEMU will send regions to the backend using
    individual VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG
    messages.
    The max number of ram slots supported is still limited to 8.

5. Lift max memory slots imposed by vhost-user
    With this change, if the VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
    protocol feature is enabled, the backend can support a configurable number of
    ram slots up to the maximum allowed by the target platform.

6. Refactor out libvhost-user fault generation logic
    This cleanup moves some logic from vu_set_mem_table_exec_postcopy() to a
    separate helper, which will be needed elsewhere.

7. Support ram slot configuration in libvhost-user
    This change adds support for processing VHOST_USER_GET_MAX_MEM_SLOTS
    messages in libvhost-user.
    The VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS protocol is not yet
    enabled in libvhost-user, so at this point this change is non-functional.

8. Support adding individual regions in libvhost-user
    This change adds libvhost-user support for mapping in new memory regions
    when receiving VHOST_USER_ADD_MEM_REG messages.
    The VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS protocol is not yet
    enabled in libvhost-user, so at this point this change is non-functional.

9. Support individual region unmap in libvhost-user
    This change adds libvhost-user support for unmapping removed memory regions
    when receiving VHOST_USER_REM_MEM_REG messages.
    The VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS protocol is not yet
    enabled in libvhost-user, so at this point this change is non-functional.

10. Lift max ram slots limit in libvhost-user
   This change makes libvhost-user try to negotiate the
   VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS feature, and allows
   backends built using libvhost-user to support hot-adding memory up to
   32 times.

The changes were tested with the vhost-user-bridge sample.

Changes since V3:
    * Fixed compiler warnings caused by using pointers to packed elements
       (flagged by patchew building with -Waddress-of-packed-member)

Changes since V2:
    * Add support for VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
       for backends built with libvhost-user
    * Add support for postcopy live-migration when the
       VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS protocol feature has
       been negotiated.
    * Add support for backends which want to support both
       VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS and
       VHOST_USER_PROTOCOL_F_REPLY_ACK
    * Change feature name from VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS
        to VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS, and any associated
        variable names.
    * Log a more descriptive message if the backend lowers the max ram slots
       limit on reconnect.

Changes since V1:
    * Kept the assert in vhost_user_set_mem_table_postcopy, but moved it
      to prevent corruption
    * Made QEMU send a single VHOST_USER_GET_MAX_MEM_SLOTS message at
      startup and cache the returned value so that QEMU does not need to
      query the backend every time vhost_backend_memslots_limit is called.

Best,
Raphael

Raphael Norwitz (10):
  Add helper to populate vhost-user message regions
  Add vhost-user helper to get MemoryRegion data
  Add VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
  Transmit vhost-user memory regions individually
  Lift max memory slots limit imposed by vhost-user
  Refactor out libvhost-user fault generation logic
  Support ram slot configuration in libvhost-user
  Support adding individual regions in libvhost-user
  Support individual region unmap in libvhost-user
  Lift max ram slots limit in libvhost-user

 contrib/libvhost-user/libvhost-user.c | 341 ++++++++++++++----
 contrib/libvhost-user/libvhost-user.h |  24 +-
 docs/interop/vhost-user.rst           |  44 +++
 hw/virtio/vhost-user.c                | 638 ++++++++++++++++++++++++++++------
 include/hw/virtio/vhost-user.h        |   1 +
 5 files changed, 873 insertions(+), 175 deletions(-)

-- 
1.8.3.1




* [PATCH v4 01/10] Add helper to populate vhost-user message regions
  2020-05-21  5:00 [PATCH v4 00/10] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz
@ 2020-05-21  5:00 ` Raphael Norwitz
  2020-06-04 14:40   ` Marc-André Lureau
  2020-05-21  5:00 ` [PATCH v4 02/10] Add vhost-user helper to get MemoryRegion data Raphael Norwitz
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Raphael Norwitz @ 2020-05-21  5:00 UTC (permalink / raw)
  To: qemu-devel, mst, marcandre.lureau
  Cc: raphael.s.norwitz, marcandre.lureau, Raphael Norwitz

When setting vhost-user memory tables, memory region descriptors must be
copied from the vhost_dev struct to the vhost-user message. To avoid
duplicating code in setting the memory tables, we should use a helper to
populate this field. This change adds this helper.

Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
---
 hw/virtio/vhost-user.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index ec21e8f..2e0552d 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -407,6 +407,15 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
     return 0;
 }
 
+static void vhost_user_fill_msg_region(VhostUserMemoryRegion *dst,
+                                       struct vhost_memory_region *src)
+{
+    assert(src != NULL && dst != NULL);
+    dst->userspace_addr = src->userspace_addr;
+    dst->memory_size = src->memory_size;
+    dst->guest_phys_addr = src->guest_phys_addr;
+}
+
 static int vhost_user_fill_set_mem_table_msg(struct vhost_user *u,
                                              struct vhost_dev *dev,
                                              VhostUserMsg *msg,
@@ -417,6 +426,7 @@ static int vhost_user_fill_set_mem_table_msg(struct vhost_user *u,
     ram_addr_t offset;
     MemoryRegion *mr;
     struct vhost_memory_region *reg;
+    VhostUserMemoryRegion region_buffer;
 
     msg->hdr.request = VHOST_USER_SET_MEM_TABLE;
 
@@ -441,12 +451,8 @@ static int vhost_user_fill_set_mem_table_msg(struct vhost_user *u,
                 error_report("Failed preparing vhost-user memory table msg");
                 return -1;
             }
-            msg->payload.memory.regions[*fd_num].userspace_addr =
-                reg->userspace_addr;
-            msg->payload.memory.regions[*fd_num].memory_size =
-                reg->memory_size;
-            msg->payload.memory.regions[*fd_num].guest_phys_addr =
-                reg->guest_phys_addr;
+            vhost_user_fill_msg_region(&region_buffer, reg);
+            msg->payload.memory.regions[*fd_num] = region_buffer;
             msg->payload.memory.regions[*fd_num].mmap_offset = offset;
             fds[(*fd_num)++] = fd;
         } else if (track_ramblocks) {
-- 
1.8.3.1




* [PATCH v4 02/10] Add vhost-user helper to get MemoryRegion data
  2020-05-21  5:00 [PATCH v4 00/10] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz
  2020-05-21  5:00 ` [PATCH v4 01/10] Add helper to populate vhost-user message regions Raphael Norwitz
@ 2020-05-21  5:00 ` Raphael Norwitz
  2020-06-04 14:41   ` Marc-André Lureau
  2020-05-21  5:00 ` [PATCH v4 03/10] Add VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS Raphael Norwitz
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Raphael Norwitz @ 2020-05-21  5:00 UTC (permalink / raw)
  To: qemu-devel, mst, marcandre.lureau
  Cc: raphael.s.norwitz, marcandre.lureau, Raphael Norwitz

When setting the memory tables, qemu uses a memory region's userspace
address to look up the region's MemoryRegion struct. Among other things,
the MemoryRegion contains the region's offset and associated file
descriptor, all of which need to be sent to the backend.

With VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS, this logic will be
needed in multiple places, so before feature support is added it
should be moved to a helper function.

This helper is also used to simplify the vhost_user_can_merge()
function.

Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
---
 hw/virtio/vhost-user.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 2e0552d..442b0d6 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -407,6 +407,18 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
     return 0;
 }
 
+static MemoryRegion *vhost_user_get_mr_data(uint64_t addr, ram_addr_t *offset,
+                                            int *fd)
+{
+    MemoryRegion *mr;
+
+    assert((uintptr_t)addr == addr);
+    mr = memory_region_from_host((void *)(uintptr_t)addr, offset);
+    *fd = memory_region_get_fd(mr);
+
+    return mr;
+}
+
 static void vhost_user_fill_msg_region(VhostUserMemoryRegion *dst,
                                        struct vhost_memory_region *src)
 {
@@ -433,10 +445,7 @@ static int vhost_user_fill_set_mem_table_msg(struct vhost_user *u,
     for (i = 0; i < dev->mem->nregions; ++i) {
         reg = dev->mem->regions + i;
 
-        assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
-        mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
-                                     &offset);
-        fd = memory_region_get_fd(mr);
+        mr = vhost_user_get_mr_data(reg->userspace_addr, &offset, &fd);
         if (fd > 0) {
             if (track_ramblocks) {
                 assert(*fd_num < VHOST_MEMORY_MAX_NREGIONS);
@@ -1551,13 +1560,9 @@ static bool vhost_user_can_merge(struct vhost_dev *dev,
 {
     ram_addr_t offset;
     int mfd, rfd;
-    MemoryRegion *mr;
-
-    mr = memory_region_from_host((void *)(uintptr_t)start1, &offset);
-    mfd = memory_region_get_fd(mr);
 
-    mr = memory_region_from_host((void *)(uintptr_t)start2, &offset);
-    rfd = memory_region_get_fd(mr);
+    (void)vhost_user_get_mr_data(start1, &offset, &mfd);
+    (void)vhost_user_get_mr_data(start2, &offset, &rfd);
 
     return mfd == rfd;
 }
-- 
1.8.3.1




* [PATCH v4 03/10] Add VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
  2020-05-21  5:00 [PATCH v4 00/10] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz
  2020-05-21  5:00 ` [PATCH v4 01/10] Add helper to populate vhost-user message regions Raphael Norwitz
  2020-05-21  5:00 ` [PATCH v4 02/10] Add vhost-user helper to get MemoryRegion data Raphael Norwitz
@ 2020-05-21  5:00 ` Raphael Norwitz
  2020-06-04 14:42   ` Marc-André Lureau
  2020-05-21  5:00 ` [PATCH v4 04/10] Transmit vhost-user memory regions individually Raphael Norwitz
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Raphael Norwitz @ 2020-05-21  5:00 UTC (permalink / raw)
  To: qemu-devel, mst, marcandre.lureau
  Cc: Peter Turschmid, raphael.s.norwitz, marcandre.lureau, Raphael Norwitz

This change introduces a new feature to the vhost-user protocol allowing
a backend device to specify the maximum number of ram slots it supports.

At this point, the value returned by the backend will be capped at the
maximum number of ram slots which can be supported by vhost-user, which
is currently set to 8 because of underlying protocol limitations.

The returned value will be stored inside the VhostUserState struct so
that on device reconnect we can verify that the ram slot limitation
has not decreased since the last time the device connected.
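
For illustration, the backend side of this query could look like the sketch
below (the message struct and names are simplified stand-ins, not the QEMU or
libvhost-user API):

    #include <stdbool.h>
    #include <stdint.h>

    #define VHOST_USER_GET_MAX_MEM_SLOTS 36
    /* Backend-specific policy value; purely illustrative. */
    #define BACKEND_MAX_RAM_SLOTS        32

    /* Simplified message representation: header fields plus a u64 payload. */
    struct msg {
        uint32_t request;
        uint32_t flags;
        uint32_t size;
        uint64_t payload_u64;
    };

    /*
     * Fill in a u64 reply advertising how many memory slots this backend
     * supports; the caller would then write the reply back on the socket.
     */
    static bool handle_get_max_mem_slots(struct msg *m)
    {
        if (m->request != VHOST_USER_GET_MAX_MEM_SLOTS) {
            return false;
        }
        m->payload_u64 = BACKEND_MAX_RAM_SLOTS;
        m->size = sizeof(m->payload_u64);
        return true;
    }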

Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Signed-off-by: Peter Turschmid <peter.turschm@nutanix.com>
---
 docs/interop/vhost-user.rst    | 16 ++++++++++++++
 hw/virtio/vhost-user.c         | 49 ++++++++++++++++++++++++++++++++++++++++--
 include/hw/virtio/vhost-user.h |  1 +
 3 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index 3b1b660..b3cf5c3 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -815,6 +815,7 @@ Protocol features
   #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD       12
   #define VHOST_USER_PROTOCOL_F_RESET_DEVICE         13
   #define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14
+  #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS  15
 
 Master message types
 --------------------
@@ -1263,6 +1264,21 @@ Master message types
 
   The state.num field is currently reserved and must be set to 0.
 
+``VHOST_USER_GET_MAX_MEM_SLOTS``
+  :id: 36
+  :equivalent ioctl: N/A
+  :slave payload: u64
+
+  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
+  feature has been successfully negotiated, this message is submitted
+  by master to the slave. The slave should return the message with a
+  u64 payload containing the maximum number of memory slots for
+  QEMU to expose to the guest. At this point, the value returned
+  by the backend will be capped at the maximum number of ram slots
+  which can be supported by vhost-user. Currently that limit is set
+  at VHOST_USER_MAX_RAM_SLOTS = 8 because of underlying protocol
+  limitations.
+
 Slave message types
 -------------------
 
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 442b0d6..0af593f 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -59,6 +59,8 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
     VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD = 12,
     VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
+    /* Feature 14 reserved for VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS. */
+    VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15,
     VHOST_USER_PROTOCOL_F_MAX
 };
 
@@ -100,6 +102,8 @@ typedef enum VhostUserRequest {
     VHOST_USER_SET_INFLIGHT_FD = 32,
     VHOST_USER_GPU_SET_SOCKET = 33,
     VHOST_USER_RESET_DEVICE = 34,
+    /* Message number 35 reserved for VHOST_USER_VRING_KICK. */
+    VHOST_USER_GET_MAX_MEM_SLOTS = 36,
     VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -895,6 +899,23 @@ static int vhost_user_set_owner(struct vhost_dev *dev)
     return 0;
 }
 
+static int vhost_user_get_max_memslots(struct vhost_dev *dev,
+                                       uint64_t *max_memslots)
+{
+    uint64_t backend_max_memslots;
+    int err;
+
+    err = vhost_user_get_u64(dev, VHOST_USER_GET_MAX_MEM_SLOTS,
+                             &backend_max_memslots);
+    if (err < 0) {
+        return err;
+    }
+
+    *max_memslots = backend_max_memslots;
+
+    return 0;
+}
+
 static int vhost_user_reset_device(struct vhost_dev *dev)
 {
     VhostUserMsg msg = {
@@ -1392,7 +1413,7 @@ static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
 
 static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque)
 {
-    uint64_t features, protocol_features;
+    uint64_t features, protocol_features, ram_slots;
     struct vhost_user *u;
     int err;
 
@@ -1454,6 +1475,27 @@ static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque)
                          "slave-req protocol features.");
             return -1;
         }
+
+        /* get max memory regions if backend supports configurable RAM slots */
+        if (!virtio_has_feature(dev->protocol_features,
+                                VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS)) {
+            u->user->memory_slots = VHOST_MEMORY_MAX_NREGIONS;
+        } else {
+            err = vhost_user_get_max_memslots(dev, &ram_slots);
+            if (err < 0) {
+                return err;
+            }
+
+            if (ram_slots < u->user->memory_slots) {
+                error_report("The backend specified a max ram slots limit "
+                             "of %lu, when the prior validated limit was %d. "
+                             "This limit should never decrease.", ram_slots,
+                             u->user->memory_slots);
+                return -1;
+            }
+
+            u->user->memory_slots = MIN(ram_slots, VHOST_MEMORY_MAX_NREGIONS);
+        }
     }
 
     if (dev->migration_blocker == NULL &&
@@ -1519,7 +1561,9 @@ static int vhost_user_get_vq_index(struct vhost_dev *dev, int idx)
 
 static int vhost_user_memslots_limit(struct vhost_dev *dev)
 {
-    return VHOST_MEMORY_MAX_NREGIONS;
+    struct vhost_user *u = dev->opaque;
+
+    return u->user->memory_slots;
 }
 
 static bool vhost_user_requires_shm_log(struct vhost_dev *dev)
@@ -1904,6 +1948,7 @@ bool vhost_user_init(VhostUserState *user, CharBackend *chr, Error **errp)
         return false;
     }
     user->chr = chr;
+    user->memory_slots = 0;
     return true;
 }
 
diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
index 811e325..a9abca3 100644
--- a/include/hw/virtio/vhost-user.h
+++ b/include/hw/virtio/vhost-user.h
@@ -20,6 +20,7 @@ typedef struct VhostUserHostNotifier {
 typedef struct VhostUserState {
     CharBackend *chr;
     VhostUserHostNotifier notifier[VIRTIO_QUEUE_MAX];
+    int memory_slots;
 } VhostUserState;
 
 bool vhost_user_init(VhostUserState *user, CharBackend *chr, Error **errp);
-- 
1.8.3.1




* [PATCH v4 04/10] Transmit vhost-user memory regions individually
  2020-05-21  5:00 [PATCH v4 00/10] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz
                   ` (2 preceding siblings ...)
  2020-05-21  5:00 ` [PATCH v4 03/10] Add VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS Raphael Norwitz
@ 2020-05-21  5:00 ` Raphael Norwitz
  2020-06-04 14:44   ` Marc-André Lureau
  2020-05-21  5:00 ` [PATCH v4 05/10] Lift max memory slots limit imposed by vhost-user Raphael Norwitz
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Raphael Norwitz @ 2020-05-21  5:00 UTC (permalink / raw)
  To: qemu-devel, mst, marcandre.lureau
  Cc: Peter Turschmid, raphael.s.norwitz, marcandre.lureau,
	Swapnil Ingle, Raphael Norwitz

With this change, when the VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
protocol feature has been negotiated, Qemu no longer sends the backend
all the memory regions in a single message. Rather, when the memory
tables are set or updated, a series of VHOST_USER_ADD_MEM_REG and
VHOST_USER_REM_MEM_REG messages are sent to transmit the regions to map
and/or unmap instead of sending all the regions in one fixed-size
VHOST_USER_SET_MEM_TABLE message.

The vhost_user struct maintains a shadow state of the VM’s memory
regions. When the memory tables are modified, the
vhost_user_set_mem_table() function compares the new device memory state
to the shadow state and only sends regions which need to be unmapped or
mapped in. The regions which must be unmapped are sent first, followed
by the new regions to be mapped in. After all the messages have been
sent, the shadow state is set to the current virtual device state.

Existing backends which do not support
VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS are unaffected.
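
For a rough sense of the payload sizes involved, the sketch below mirrors the
struct layouts used by this patch and computes sizes the same way
msg->hdr.size is computed (the common message header is ignored; this is
illustrative, not QEMU code):

    #include <stdint.h>
    #include <stdio.h>

    /* Simplified copy of the wire struct, for illustration only. */
    typedef struct {
        uint64_t guest_phys_addr;
        uint64_t memory_size;
        uint64_t userspace_addr;
        uint64_t mmap_offset;
    } VhostUserMemoryRegion;

    #define VHOST_MEMORY_MAX_NREGIONS 8

    int main(void)
    {
        /* VHOST_USER_SET_MEM_TABLE: nregions + padding + 8 region slots. */
        size_t set_mem_table = 2 * sizeof(uint32_t) +
            VHOST_MEMORY_MAX_NREGIONS * sizeof(VhostUserMemoryRegion);
        /* VHOST_USER_ADD_MEM_REG / REM_MEM_REG: padding + one region. */
        size_t add_rem = sizeof(uint32_t) + sizeof(VhostUserMemoryRegion);

        printf("SET_MEM_TABLE payload:    %zu bytes\n", set_mem_table); /* 264 */
        printf("ADD/REM_MEM_REG payload:  %zu bytes\n", add_rem);       /* 36  */
        return 0;
    }

The per-region messages stay small and fixed in size no matter how many RAM
slots are ultimately supported, which is what makes lifting the slot limit in
the next patch practical.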

Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Signed-off-by: Peter Turschmid <peter.turschm@nutanix.com>
Suggested-by: Mike Cui <cui@nutanix.com>
---
 docs/interop/vhost-user.rst |  33 ++-
 hw/virtio/vhost-user.c      | 510 +++++++++++++++++++++++++++++++++++++-------
 2 files changed, 469 insertions(+), 74 deletions(-)

diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index b3cf5c3..037eefa 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -1276,8 +1276,37 @@ Master message types
   QEMU to expose to the guest. At this point, the value returned
   by the backend will be capped at the maximum number of ram slots
   which can be supported by vhost-user. Currently that limit is set
-  at VHOST_USER_MAX_RAM_SLOTS = 8 because of underlying protocol
-  limitations.
+  at VHOST_USER_MAX_RAM_SLOTS = 8.
+
+``VHOST_USER_ADD_MEM_REG``
+  :id: 37
+  :equivalent ioctl: N/A
+  :slave payload: memory region
+
+  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
+  feature has been successfully negotiated, this message is submitted
+  by the master to the slave. The message payload contains a memory
+  region descriptor struct, describing a region of guest memory which
+  the slave device must map in. When the
+  ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol feature has
+  been successfully negotiated, along with the
+  ``VHOST_USER_REM_MEM_REG`` message, this message is used to set and
+  update the memory tables of the slave device.
+
+``VHOST_USER_REM_MEM_REG``
+  :id: 38
+  :equivalent ioctl: N/A
+  :slave payload: memory region
+
+  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
+  feature has been successfully negotiated, this message is submitted
+  by the master to the slave. The message payload contains a memory
+  region descriptor struct, describing a region of guest memory which
+  the slave device must unmap. When the
+  ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol feature has
+  been successfully negotiated, along with the
+  ``VHOST_USER_ADD_MEM_REG`` message, this message is used to set and
+  update the memory tables of the slave device.
 
 Slave message types
 -------------------
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 0af593f..9358406 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -104,6 +104,8 @@ typedef enum VhostUserRequest {
     VHOST_USER_RESET_DEVICE = 34,
     /* Message number 35 reserved for VHOST_USER_VRING_KICK. */
     VHOST_USER_GET_MAX_MEM_SLOTS = 36,
+    VHOST_USER_ADD_MEM_REG = 37,
+    VHOST_USER_REM_MEM_REG = 38,
     VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -128,6 +130,11 @@ typedef struct VhostUserMemory {
     VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
 } VhostUserMemory;
 
+typedef struct VhostUserMemRegMsg {
+    uint32_t padding;
+    VhostUserMemoryRegion region;
+} VhostUserMemRegMsg;
+
 typedef struct VhostUserLog {
     uint64_t mmap_size;
     uint64_t mmap_offset;
@@ -186,6 +193,7 @@ typedef union {
         struct vhost_vring_state state;
         struct vhost_vring_addr addr;
         VhostUserMemory memory;
+        VhostUserMemRegMsg mem_reg;
         VhostUserLog log;
         struct vhost_iotlb_msg iotlb;
         VhostUserConfig config;
@@ -226,6 +234,16 @@ struct vhost_user {
 
     /* True once we've entered postcopy_listen */
     bool               postcopy_listen;
+
+    /* Our current regions */
+    int num_shadow_regions;
+    struct vhost_memory_region shadow_regions[VHOST_MEMORY_MAX_NREGIONS];
+};
+
+struct scrub_regions {
+    struct vhost_memory_region *region;
+    int reg_idx;
+    int fd_idx;
 };
 
 static bool ioeventfd_enabled(void)
@@ -489,8 +507,332 @@ static int vhost_user_fill_set_mem_table_msg(struct vhost_user *u,
     return 1;
 }
 
+static inline bool reg_equal(struct vhost_memory_region *shadow_reg,
+                             struct vhost_memory_region *vdev_reg)
+{
+    return shadow_reg->guest_phys_addr == vdev_reg->guest_phys_addr &&
+        shadow_reg->userspace_addr == vdev_reg->userspace_addr &&
+        shadow_reg->memory_size == vdev_reg->memory_size;
+}
+
+static void scrub_shadow_regions(struct vhost_dev *dev,
+                                 struct scrub_regions *add_reg,
+                                 int *nr_add_reg,
+                                 struct scrub_regions *rem_reg,
+                                 int *nr_rem_reg, uint64_t *shadow_pcb,
+                                 bool track_ramblocks)
+{
+    struct vhost_user *u = dev->opaque;
+    bool found[VHOST_MEMORY_MAX_NREGIONS] = {};
+    struct vhost_memory_region *reg, *shadow_reg;
+    int i, j, fd, add_idx = 0, rm_idx = 0, fd_num = 0;
+    ram_addr_t offset;
+    MemoryRegion *mr;
+    bool matching;
+
+    /*
+     * Find memory regions present in our shadow state which are not in
+     * the device's current memory state.
+     *
+     * Mark regions in both the shadow and device state as "found".
+     */
+    for (i = 0; i < u->num_shadow_regions; i++) {
+        shadow_reg = &u->shadow_regions[i];
+        matching = false;
+
+        for (j = 0; j < dev->mem->nregions; j++) {
+            reg = &dev->mem->regions[j];
+
+            mr = vhost_user_get_mr_data(reg->userspace_addr, &offset, &fd);
+
+            if (reg_equal(shadow_reg, reg)) {
+                matching = true;
+                found[j] = true;
+                if (track_ramblocks) {
+                    /*
+                     * Reset postcopy client bases, region_rb, and
+                     * region_rb_offset in case regions are removed.
+                     */
+                    if (fd > 0) {
+                        u->region_rb_offset[j] = offset;
+                        u->region_rb[j] = mr->ram_block;
+                        shadow_pcb[j] = u->postcopy_client_bases[i];
+                    } else {
+                        u->region_rb_offset[j] = 0;
+                        u->region_rb[j] = NULL;
+                    }
+                }
+                break;
+            }
+        }
+
+        /*
+         * If the region was not found in the current device memory state
+         * create an entry for it in the removed list.
+         */
+        if (!matching) {
+            rem_reg[rm_idx].region = shadow_reg;
+            rem_reg[rm_idx++].reg_idx = i;
+        }
+    }
+
+    /*
+     * For regions not marked "found", create entries in the added list.
+     *
+     * Note their indexes in the device memory state and the indexes of their
+     * file descriptors.
+     */
+    for (i = 0; i < dev->mem->nregions; i++) {
+        reg = &dev->mem->regions[i];
+        mr = vhost_user_get_mr_data(reg->userspace_addr, &offset, &fd);
+        if (fd > 0) {
+            ++fd_num;
+        }
+
+        /*
+         * If the region was in both the shadow and device state we don't
+         * need to send a VHOST_USER_ADD_MEM_REG message for it.
+         */
+        if (found[i]) {
+            continue;
+        }
+
+        add_reg[add_idx].region = reg;
+        add_reg[add_idx].reg_idx = i;
+        add_reg[add_idx++].fd_idx = fd_num;
+    }
+    *nr_rem_reg = rm_idx;
+    *nr_add_reg = add_idx;
+
+    return;
+}
+
+static int send_remove_regions(struct vhost_dev *dev,
+                               struct scrub_regions *remove_reg,
+                               int nr_rem_reg, VhostUserMsg *msg,
+                               bool reply_supported)
+{
+    struct vhost_user *u = dev->opaque;
+    struct vhost_memory_region *shadow_reg;
+    int i, fd, shadow_reg_idx, ret;
+    ram_addr_t offset;
+    VhostUserMemoryRegion region_buffer;
+
+    /*
+     * The regions in remove_reg appear in the same order they do in the
+     * shadow table. Therefore we can minimize memory copies by iterating
+     * through remove_reg backwards.
+     */
+    for (i = nr_rem_reg - 1; i >= 0; i--) {
+        shadow_reg = remove_reg[i].region;
+        shadow_reg_idx = remove_reg[i].reg_idx;
+
+        vhost_user_get_mr_data(shadow_reg->userspace_addr, &offset, &fd);
+
+        if (fd > 0) {
+            msg->hdr.request = VHOST_USER_REM_MEM_REG;
+            vhost_user_fill_msg_region(&region_buffer, shadow_reg);
+            msg->payload.mem_reg.region = region_buffer;
+
+            if (vhost_user_write(dev, msg, &fd, 1) < 0) {
+                return -1;
+            }
+
+            if (reply_supported) {
+                ret = process_message_reply(dev, msg);
+                if (ret) {
+                    return ret;
+                }
+            }
+        }
+
+        /*
+         * At this point we know the backend has unmapped the region. It is now
+         * safe to remove it from the shadow table.
+         */
+        memmove(&u->shadow_regions[shadow_reg_idx],
+                &u->shadow_regions[shadow_reg_idx + 1],
+                sizeof(struct vhost_memory_region) *
+                (u->num_shadow_regions - shadow_reg_idx));
+        u->num_shadow_regions--;
+    }
+
+    return 0;
+}
+
+static int send_add_regions(struct vhost_dev *dev,
+                            struct scrub_regions *add_reg, int nr_add_reg,
+                            VhostUserMsg *msg, uint64_t *shadow_pcb,
+                            bool reply_supported, bool track_ramblocks)
+{
+    struct vhost_user *u = dev->opaque;
+    int i, fd, ret, reg_idx, reg_fd_idx;
+    struct vhost_memory_region *reg;
+    MemoryRegion *mr;
+    ram_addr_t offset;
+    VhostUserMsg msg_reply;
+    VhostUserMemoryRegion region_buffer;
+
+    for (i = 0; i < nr_add_reg; i++) {
+        reg = add_reg[i].region;
+        reg_idx = add_reg[i].reg_idx;
+        reg_fd_idx = add_reg[i].fd_idx;
+
+        mr = vhost_user_get_mr_data(reg->userspace_addr, &offset, &fd);
+
+        if (fd > 0) {
+            if (track_ramblocks) {
+                trace_vhost_user_set_mem_table_withfd(reg_fd_idx, mr->name,
+                                                      reg->memory_size,
+                                                      reg->guest_phys_addr,
+                                                      reg->userspace_addr,
+                                                      offset);
+                u->region_rb_offset[reg_idx] = offset;
+                u->region_rb[reg_idx] = mr->ram_block;
+            }
+            msg->hdr.request = VHOST_USER_ADD_MEM_REG;
+            vhost_user_fill_msg_region(&region_buffer, reg);
+            msg->payload.mem_reg.region = region_buffer;
+            msg->payload.mem_reg.region.mmap_offset = offset;
+
+            if (vhost_user_write(dev, msg, &fd, 1) < 0) {
+                return -1;
+            }
+
+            if (track_ramblocks) {
+                uint64_t reply_gpa;
+
+                if (vhost_user_read(dev, &msg_reply) < 0) {
+                    return -1;
+                }
+
+                reply_gpa = msg_reply.payload.mem_reg.region.guest_phys_addr;
+
+                if (msg_reply.hdr.request != VHOST_USER_ADD_MEM_REG) {
+                    error_report("%s: Received unexpected msg type."
+                                 "Expected %d received %d", __func__,
+                                 VHOST_USER_ADD_MEM_REG,
+                                 msg_reply.hdr.request);
+                    return -1;
+                }
+
+                /*
+                 * We're using the same structure, just reusing one of the
+                 * fields, so it should be the same size.
+                 */
+                if (msg_reply.hdr.size != msg->hdr.size) {
+                    error_report("%s: Unexpected size for postcopy reply "
+                                 "%d vs %d", __func__, msg_reply.hdr.size,
+                                 msg->hdr.size);
+                    return -1;
+                }
+
+                /* Get the postcopy client base from the backend's reply. */
+                if (reply_gpa == dev->mem->regions[reg_idx].guest_phys_addr) {
+                    shadow_pcb[reg_idx] =
+                        msg_reply.payload.mem_reg.region.userspace_addr;
+                    trace_vhost_user_set_mem_table_postcopy(
+                        msg_reply.payload.mem_reg.region.userspace_addr,
+                        msg->payload.mem_reg.region.userspace_addr,
+                        reg_fd_idx, reg_idx);
+                } else {
+                    error_report("%s: invalid postcopy reply for region. "
+                                 "Got guest physical address %lX, expected "
+                                 "%lX", __func__, reply_gpa,
+                                 dev->mem->regions[reg_idx].guest_phys_addr);
+                    return -1;
+                }
+            } else if (reply_supported) {
+                ret = process_message_reply(dev, msg);
+                if (ret) {
+                    return ret;
+                }
+            }
+        } else if (track_ramblocks) {
+            u->region_rb_offset[reg_idx] = 0;
+            u->region_rb[reg_idx] = NULL;
+        }
+
+        /*
+         * At this point, we know the backend has mapped in the new
+         * region, if the region has a valid file descriptor.
+         *
+         * The region should now be added to the shadow table.
+         */
+        u->shadow_regions[u->num_shadow_regions].guest_phys_addr =
+            reg->guest_phys_addr;
+        u->shadow_regions[u->num_shadow_regions].userspace_addr =
+            reg->userspace_addr;
+        u->shadow_regions[u->num_shadow_regions].memory_size =
+            reg->memory_size;
+        u->num_shadow_regions++;
+    }
+
+    return 0;
+}
+
+static int vhost_user_add_remove_regions(struct vhost_dev *dev,
+                                         VhostUserMsg *msg,
+                                         bool reply_supported,
+                                         bool track_ramblocks)
+{
+    struct vhost_user *u = dev->opaque;
+    struct scrub_regions add_reg[VHOST_MEMORY_MAX_NREGIONS];
+    struct scrub_regions rem_reg[VHOST_MEMORY_MAX_NREGIONS];
+    uint64_t shadow_pcb[VHOST_MEMORY_MAX_NREGIONS] = {};
+    int nr_add_reg, nr_rem_reg;
+
+    msg->hdr.size = sizeof(msg->payload.mem_reg.padding) +
+        sizeof(VhostUserMemoryRegion);
+
+    /* Find the regions which need to be removed or added. */
+    scrub_shadow_regions(dev, add_reg, &nr_add_reg, rem_reg, &nr_rem_reg,
+                         shadow_pcb, track_ramblocks);
+
+    if (nr_rem_reg && send_remove_regions(dev, rem_reg, nr_rem_reg, msg,
+                reply_supported) < 0)
+    {
+        goto err;
+    }
+
+    if (nr_add_reg && send_add_regions(dev, add_reg, nr_add_reg, msg,
+                shadow_pcb, reply_supported, track_ramblocks) < 0)
+    {
+        goto err;
+    }
+
+    if (track_ramblocks) {
+        memcpy(u->postcopy_client_bases, shadow_pcb,
+               sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS);
+        /*
+         * Now we've registered this with the postcopy code, we ack to the
+         * client, because now we're in the position to be able to deal with
+         * any faults it generates.
+         */
+        /* TODO: Use this for failure cases as well with a bad value. */
+        msg->hdr.size = sizeof(msg->payload.u64);
+        msg->payload.u64 = 0; /* OK */
+
+        if (vhost_user_write(dev, msg, NULL, 0) < 0) {
+            return -1;
+        }
+    }
+
+    return 0;
+
+err:
+    if (track_ramblocks) {
+        memcpy(u->postcopy_client_bases, shadow_pcb,
+               sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS);
+    }
+
+    return -1;
+}
+
 static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
-                                             struct vhost_memory *mem)
+                                             struct vhost_memory *mem,
+                                             bool reply_supported,
+                                             bool config_mem_slots)
 {
     struct vhost_user *u = dev->opaque;
     int fds[VHOST_MEMORY_MAX_NREGIONS];
@@ -513,71 +855,84 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
         u->region_rb_len = dev->mem->nregions;
     }
 
-    if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num,
+    if (config_mem_slots) {
+        if (vhost_user_add_remove_regions(dev, &msg, reply_supported,
                                           true) < 0) {
-        return -1;
-    }
+            return -1;
+        }
+    } else {
+        if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num,
+                                              true) < 0) {
+            return -1;
+        }
 
-    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
-        return -1;
-    }
+        if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
+            return -1;
+        }
 
-    if (vhost_user_read(dev, &msg_reply) < 0) {
-        return -1;
-    }
+        if (vhost_user_read(dev, &msg_reply) < 0) {
+            return -1;
+        }
 
-    if (msg_reply.hdr.request != VHOST_USER_SET_MEM_TABLE) {
-        error_report("%s: Received unexpected msg type."
-                     "Expected %d received %d", __func__,
-                     VHOST_USER_SET_MEM_TABLE, msg_reply.hdr.request);
-        return -1;
-    }
-    /* We're using the same structure, just reusing one of the
-     * fields, so it should be the same size.
-     */
-    if (msg_reply.hdr.size != msg.hdr.size) {
-        error_report("%s: Unexpected size for postcopy reply "
-                     "%d vs %d", __func__, msg_reply.hdr.size, msg.hdr.size);
-        return -1;
-    }
-
-    memset(u->postcopy_client_bases, 0,
-           sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS);
-
-    /* They're in the same order as the regions that were sent
-     * but some of the regions were skipped (above) if they
-     * didn't have fd's
-    */
-    for (msg_i = 0, region_i = 0;
-         region_i < dev->mem->nregions;
-        region_i++) {
-        if (msg_i < fd_num &&
-            msg_reply.payload.memory.regions[msg_i].guest_phys_addr ==
-            dev->mem->regions[region_i].guest_phys_addr) {
-            u->postcopy_client_bases[region_i] =
-                msg_reply.payload.memory.regions[msg_i].userspace_addr;
-            trace_vhost_user_set_mem_table_postcopy(
-                msg_reply.payload.memory.regions[msg_i].userspace_addr,
-                msg.payload.memory.regions[msg_i].userspace_addr,
-                msg_i, region_i);
-            msg_i++;
+        if (msg_reply.hdr.request != VHOST_USER_SET_MEM_TABLE) {
+            error_report("%s: Received unexpected msg type."
+                         "Expected %d received %d", __func__,
+                         VHOST_USER_SET_MEM_TABLE, msg_reply.hdr.request);
+            return -1;
+        }
+
+        /*
+         * We're using the same structure, just reusing one of the
+         * fields, so it should be the same size.
+         */
+        if (msg_reply.hdr.size != msg.hdr.size) {
+            error_report("%s: Unexpected size for postcopy reply "
+                         "%d vs %d", __func__, msg_reply.hdr.size,
+                         msg.hdr.size);
+            return -1;
+        }
+
+        memset(u->postcopy_client_bases, 0,
+               sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS);
+
+        /*
+         * They're in the same order as the regions that were sent
+         * but some of the regions were skipped (above) if they
+         * didn't have fd's
+         */
+        for (msg_i = 0, region_i = 0;
+             region_i < dev->mem->nregions;
+             region_i++) {
+            if (msg_i < fd_num &&
+                msg_reply.payload.memory.regions[msg_i].guest_phys_addr ==
+                dev->mem->regions[region_i].guest_phys_addr) {
+                u->postcopy_client_bases[region_i] =
+                    msg_reply.payload.memory.regions[msg_i].userspace_addr;
+                trace_vhost_user_set_mem_table_postcopy(
+                    msg_reply.payload.memory.regions[msg_i].userspace_addr,
+                    msg.payload.memory.regions[msg_i].userspace_addr,
+                    msg_i, region_i);
+                msg_i++;
+            }
+        }
+        if (msg_i != fd_num) {
+            error_report("%s: postcopy reply not fully consumed "
+                         "%d vs %zd",
+                         __func__, msg_i, fd_num);
+            return -1;
+        }
+
+        /*
+         * Now we've registered this with the postcopy code, we ack to the
+         * client, because now we're in the position to be able to deal
+         * with any faults it generates.
+         */
+        /* TODO: Use this for failure cases as well with a bad value. */
+        msg.hdr.size = sizeof(msg.payload.u64);
+        msg.payload.u64 = 0; /* OK */
+        if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
+            return -1;
         }
-    }
-    if (msg_i != fd_num) {
-        error_report("%s: postcopy reply not fully consumed "
-                     "%d vs %zd",
-                     __func__, msg_i, fd_num);
-        return -1;
-    }
-    /* Now we've registered this with the postcopy code, we ack to the client,
-     * because now we're in the position to be able to deal with any faults
-     * it generates.
-     */
-    /* TODO: Use this for failure cases as well with a bad value */
-    msg.hdr.size = sizeof(msg.payload.u64);
-    msg.payload.u64 = 0; /* OK */
-    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
-        return -1;
     }
 
     return 0;
@@ -592,12 +947,17 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
     bool do_postcopy = u->postcopy_listen && u->postcopy_fd.handler;
     bool reply_supported = virtio_has_feature(dev->protocol_features,
                                               VHOST_USER_PROTOCOL_F_REPLY_ACK);
+    bool config_mem_slots =
+        virtio_has_feature(dev->protocol_features,
+                           VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS);
 
     if (do_postcopy) {
-        /* Postcopy has enough differences that it's best done in it's own
+        /*
+         * Postcopy has enough differences that it's best done in it's own
          * version
          */
-        return vhost_user_set_mem_table_postcopy(dev, mem);
+        return vhost_user_set_mem_table_postcopy(dev, mem, reply_supported,
+                                                 config_mem_slots);
     }
 
     VhostUserMsg msg = {
@@ -608,17 +968,23 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
         msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
     }
 
-    if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num,
+    if (config_mem_slots) {
+        if (vhost_user_add_remove_regions(dev, &msg, reply_supported,
                                           false) < 0) {
-        return -1;
-    }
-
-    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
-        return -1;
-    }
+            return -1;
+        }
+    } else {
+        if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num,
+                                              false) < 0) {
+            return -1;
+        }
+        if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
+            return -1;
+        }
 
-    if (reply_supported) {
-        return process_message_reply(dev, &msg);
+        if (reply_supported) {
+            return process_message_reply(dev, &msg);
+        }
     }
 
     return 0;
-- 
1.8.3.1




* [PATCH v4 05/10] Lift max memory slots limit imposed by vhost-user
  2020-05-21  5:00 [PATCH v4 00/10] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz
                   ` (3 preceding siblings ...)
  2020-05-21  5:00 ` [PATCH v4 04/10] Transmit vhost-user memory regions individually Raphael Norwitz
@ 2020-05-21  5:00 ` Raphael Norwitz
  2020-06-04 14:45   ` Marc-André Lureau
  2020-05-21  5:00 ` [PATCH v4 06/10] Refactor out libvhost-user fault generation logic Raphael Norwitz
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Raphael Norwitz @ 2020-05-21  5:00 UTC (permalink / raw)
  To: qemu-devel, mst, marcandre.lureau
  Cc: Peter Turschmid, raphael.s.norwitz, marcandre.lureau, Raphael Norwitz

Historically, sending all memory regions to vhost-user backends in a
single message imposed a limitation on the number of times memory
could be hot-added to a VM with a vhost-user device. Now that backends
which support VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS are sent memory
regions individually, we no longer need to impose this limitation on
devices which support this feature.

With this change, VMs with a vhost-user device which supports the
VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS can support a configurable
number of memory slots, up to the maximum allowed by the target
platform.

Existing backends which do not support
VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS are unaffected.
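
A tiny worked example of the resulting capping, assuming an ACPI target where
the platform maximum is 256 slots (the constant value and backend numbers
below are illustrative):

    #include <stdint.h>
    #include <stdio.h>

    #define VHOST_USER_MAX_RAM_SLOTS 256   /* e.g. ACPI targets */
    #define MIN(a, b) ((a) < (b) ? (a) : (b))

    int main(void)
    {
        uint64_t advertised[] = { 8, 64, 512 };

        for (int i = 0; i < 3; i++) {
            printf("backend advertises %llu slots -> QEMU uses %llu\n",
                   (unsigned long long)advertised[i],
                   (unsigned long long)MIN(advertised[i],
                                           (uint64_t)VHOST_USER_MAX_RAM_SLOTS));
        }
        return 0;
    }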

Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Signed-off-by: Peter Turschmid <peter.turschm@nutanix.com>
Suggested-by: Mike Cui <cui@nutanix.com>
---
 docs/interop/vhost-user.rst |  7 +++---
 hw/virtio/vhost-user.c      | 56 ++++++++++++++++++++++++++++++---------------
 2 files changed, 40 insertions(+), 23 deletions(-)

diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index 037eefa..688b7c6 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -1273,10 +1273,9 @@ Master message types
   feature has been successfully negotiated, this message is submitted
   by master to the slave. The slave should return the message with a
   u64 payload containing the maximum number of memory slots for
-  QEMU to expose to the guest. At this point, the value returned
-  by the backend will be capped at the maximum number of ram slots
-  which can be supported by vhost-user. Currently that limit is set
-  at VHOST_USER_MAX_RAM_SLOTS = 8.
+  QEMU to expose to the guest. The value returned by the backend
+  will be capped at the maximum number of ram slots which can be
+  supported by the target platform.
 
 ``VHOST_USER_ADD_MEM_REG``
   :id: 37
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 9358406..48b8081 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -35,11 +35,29 @@
 #include <linux/userfaultfd.h>
 #endif
 
-#define VHOST_MEMORY_MAX_NREGIONS    8
+#define VHOST_MEMORY_BASELINE_NREGIONS    8
 #define VHOST_USER_F_PROTOCOL_FEATURES 30
 #define VHOST_USER_SLAVE_MAX_FDS     8
 
 /*
+ * Set maximum number of RAM slots supported to
+ * the maximum number supported by the target
+ * hardware plaform.
+ */
+#if defined(TARGET_X86) || defined(TARGET_X86_64) || \
+    defined(TARGET_ARM) || defined(TARGET_ARM_64)
+#include "hw/acpi/acpi.h"
+#define VHOST_USER_MAX_RAM_SLOTS ACPI_MAX_RAM_SLOTS
+
+#elif defined(TARGET_PPC) || defined(TARGET_PPC_64)
+#include "hw/ppc/spapr.h"
+#define VHOST_USER_MAX_RAM_SLOTS SPAPR_MAX_RAM_SLOTS
+
+#else
+#define VHOST_USER_MAX_RAM_SLOTS 512
+#endif
+
+/*
  * Maximum size of virtio device config space
  */
 #define VHOST_USER_MAX_CONFIG_SIZE 256
@@ -127,7 +145,7 @@ typedef struct VhostUserMemoryRegion {
 typedef struct VhostUserMemory {
     uint32_t nregions;
     uint32_t padding;
-    VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
+    VhostUserMemoryRegion regions[VHOST_MEMORY_BASELINE_NREGIONS];
 } VhostUserMemory;
 
 typedef struct VhostUserMemRegMsg {
@@ -222,7 +240,7 @@ struct vhost_user {
     int slave_fd;
     NotifierWithReturn postcopy_notifier;
     struct PostCopyFD  postcopy_fd;
-    uint64_t           postcopy_client_bases[VHOST_MEMORY_MAX_NREGIONS];
+    uint64_t           postcopy_client_bases[VHOST_USER_MAX_RAM_SLOTS];
     /* Length of the region_rb and region_rb_offset arrays */
     size_t             region_rb_len;
     /* RAMBlock associated with a given region */
@@ -237,7 +255,7 @@ struct vhost_user {
 
     /* Our current regions */
     int num_shadow_regions;
-    struct vhost_memory_region shadow_regions[VHOST_MEMORY_MAX_NREGIONS];
+    struct vhost_memory_region shadow_regions[VHOST_USER_MAX_RAM_SLOTS];
 };
 
 struct scrub_regions {
@@ -392,7 +410,7 @@ int vhost_user_gpu_set_socket(struct vhost_dev *dev, int fd)
 static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
                                    struct vhost_log *log)
 {
-    int fds[VHOST_MEMORY_MAX_NREGIONS];
+    int fds[VHOST_USER_MAX_RAM_SLOTS];
     size_t fd_num = 0;
     bool shmfd = virtio_has_feature(dev->protocol_features,
                                     VHOST_USER_PROTOCOL_F_LOG_SHMFD);
@@ -470,7 +488,7 @@ static int vhost_user_fill_set_mem_table_msg(struct vhost_user *u,
         mr = vhost_user_get_mr_data(reg->userspace_addr, &offset, &fd);
         if (fd > 0) {
             if (track_ramblocks) {
-                assert(*fd_num < VHOST_MEMORY_MAX_NREGIONS);
+                assert(*fd_num < VHOST_MEMORY_BASELINE_NREGIONS);
                 trace_vhost_user_set_mem_table_withfd(*fd_num, mr->name,
                                                       reg->memory_size,
                                                       reg->guest_phys_addr,
@@ -478,7 +496,7 @@ static int vhost_user_fill_set_mem_table_msg(struct vhost_user *u,
                                                       offset);
                 u->region_rb_offset[i] = offset;
                 u->region_rb[i] = mr->ram_block;
-            } else if (*fd_num == VHOST_MEMORY_MAX_NREGIONS) {
+            } else if (*fd_num == VHOST_MEMORY_BASELINE_NREGIONS) {
                 error_report("Failed preparing vhost-user memory table msg");
                 return -1;
             }
@@ -523,7 +541,7 @@ static void scrub_shadow_regions(struct vhost_dev *dev,
                                  bool track_ramblocks)
 {
     struct vhost_user *u = dev->opaque;
-    bool found[VHOST_MEMORY_MAX_NREGIONS] = {};
+    bool found[VHOST_USER_MAX_RAM_SLOTS] = {};
     struct vhost_memory_region *reg, *shadow_reg;
     int i, j, fd, add_idx = 0, rm_idx = 0, fd_num = 0;
     ram_addr_t offset;
@@ -777,9 +795,9 @@ static int vhost_user_add_remove_regions(struct vhost_dev *dev,
                                          bool track_ramblocks)
 {
     struct vhost_user *u = dev->opaque;
-    struct scrub_regions add_reg[VHOST_MEMORY_MAX_NREGIONS];
-    struct scrub_regions rem_reg[VHOST_MEMORY_MAX_NREGIONS];
-    uint64_t shadow_pcb[VHOST_MEMORY_MAX_NREGIONS] = {};
+    struct scrub_regions add_reg[VHOST_USER_MAX_RAM_SLOTS];
+    struct scrub_regions rem_reg[VHOST_USER_MAX_RAM_SLOTS];
+    uint64_t shadow_pcb[VHOST_USER_MAX_RAM_SLOTS] = {};
     int nr_add_reg, nr_rem_reg;
 
     msg->hdr.size = sizeof(msg->payload.mem_reg.padding) +
@@ -803,7 +821,7 @@ static int vhost_user_add_remove_regions(struct vhost_dev *dev,
 
     if (track_ramblocks) {
         memcpy(u->postcopy_client_bases, shadow_pcb,
-               sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS);
+               sizeof(uint64_t) * VHOST_USER_MAX_RAM_SLOTS);
         /*
          * Now we've registered this with the postcopy code, we ack to the
          * client, because now we're in the position to be able to deal with
@@ -823,7 +841,7 @@ static int vhost_user_add_remove_regions(struct vhost_dev *dev,
 err:
     if (track_ramblocks) {
         memcpy(u->postcopy_client_bases, shadow_pcb,
-               sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS);
+               sizeof(uint64_t) * VHOST_USER_MAX_RAM_SLOTS);
     }
 
     return -1;
@@ -835,7 +853,7 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
                                              bool config_mem_slots)
 {
     struct vhost_user *u = dev->opaque;
-    int fds[VHOST_MEMORY_MAX_NREGIONS];
+    int fds[VHOST_MEMORY_BASELINE_NREGIONS];
     size_t fd_num = 0;
     VhostUserMsg msg_reply;
     int region_i, msg_i;
@@ -893,7 +911,7 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
         }
 
         memset(u->postcopy_client_bases, 0,
-               sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS);
+               sizeof(uint64_t) * VHOST_USER_MAX_RAM_SLOTS);
 
         /*
          * They're in the same order as the regions that were sent
@@ -942,7 +960,7 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
                                     struct vhost_memory *mem)
 {
     struct vhost_user *u = dev->opaque;
-    int fds[VHOST_MEMORY_MAX_NREGIONS];
+    int fds[VHOST_MEMORY_BASELINE_NREGIONS];
     size_t fd_num = 0;
     bool do_postcopy = u->postcopy_listen && u->postcopy_fd.handler;
     bool reply_supported = virtio_has_feature(dev->protocol_features,
@@ -1149,7 +1167,7 @@ static int vhost_set_vring_file(struct vhost_dev *dev,
                                 VhostUserRequest request,
                                 struct vhost_vring_file *file)
 {
-    int fds[VHOST_MEMORY_MAX_NREGIONS];
+    int fds[VHOST_USER_MAX_RAM_SLOTS];
     size_t fd_num = 0;
     VhostUserMsg msg = {
         .hdr.request = request,
@@ -1845,7 +1863,7 @@ static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque)
         /* get max memory regions if backend supports configurable RAM slots */
         if (!virtio_has_feature(dev->protocol_features,
                                 VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS)) {
-            u->user->memory_slots = VHOST_MEMORY_MAX_NREGIONS;
+            u->user->memory_slots = VHOST_MEMORY_BASELINE_NREGIONS;
         } else {
             err = vhost_user_get_max_memslots(dev, &ram_slots);
             if (err < 0) {
@@ -1860,7 +1878,7 @@ static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque)
                 return -1;
             }
 
-            u->user->memory_slots = MIN(ram_slots, VHOST_MEMORY_MAX_NREGIONS);
+            u->user->memory_slots = MIN(ram_slots, VHOST_USER_MAX_RAM_SLOTS);
         }
     }
 
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 06/10] Refactor out libvhost-user fault generation logic
  2020-05-21  5:00 [PATCH v4 00/10] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz
                   ` (4 preceding siblings ...)
  2020-05-21  5:00 ` [PATCH v4 05/10] Lift max memory slots limit imposed by vhost-user Raphael Norwitz
@ 2020-05-21  5:00 ` Raphael Norwitz
  2020-06-04 14:48   ` Marc-André Lureau
  2020-05-21  5:00 ` [PATCH v4 07/10] Support ram slot configuration in libvhost-user Raphael Norwitz
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Raphael Norwitz @ 2020-05-21  5:00 UTC (permalink / raw)
  To: qemu-devel, mst, marcandre.lureau
  Cc: raphael.s.norwitz, marcandre.lureau, Raphael Norwitz

In libvhost-user, the incoming postcopy migration path for setting the
backend's memory tables has become convoluted. In particular, the logic
which starts generating faults once the final ACK has been received from
qemu can be moved to a separate function. This simplifies the code
substantially.

This logic will also be needed by the per-region (VHOST_USER_ADD_MEM_REG)
postcopy path once the VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS feature is
supported.
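
Once the per-region messages are wired up later in this series, both the
existing path and the new one reduce to the same helper call. A rough
sketch, based on the later patches rather than on this diff:

    /* vu_set_mem_table_exec_postcopy(): after QEMU has acked the client bases */
    (void)generate_faults(dev);

    /* vu_add_mem_reg(): a u64 payload of 0 means all client bases arrived */
    if (dev->postcopy_listening &&
        vmsg->size == sizeof(vmsg->payload.u64) && vmsg->payload.u64 == 0) {
        (void)generate_faults(dev);
        return false;
    }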

Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
---
 contrib/libvhost-user/libvhost-user.c | 147 ++++++++++++++++++----------------
 1 file changed, 79 insertions(+), 68 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index 3bca996..cccfa22 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -584,6 +584,84 @@ map_ring(VuDev *dev, VuVirtq *vq)
 }
 
 static bool
+generate_faults(VuDev *dev) {
+    int i;
+    for (i = 0; i < dev->nregions; i++) {
+        VuDevRegion *dev_region = &dev->regions[i];
+        int ret;
+#ifdef UFFDIO_REGISTER
+        /*
+         * We should already have an open ufd. Mark each memory
+         * range as ufd.
+         * Discard any mapping we have here; note I can't use MADV_REMOVE
+         * or fallocate to make the hole since I don't want to lose
+         * data that's already arrived in the shared process.
+         * TODO: How to do hugepage
+         */
+        ret = madvise((void *)(uintptr_t)dev_region->mmap_addr,
+                      dev_region->size + dev_region->mmap_offset,
+                      MADV_DONTNEED);
+        if (ret) {
+            fprintf(stderr,
+                    "%s: Failed to madvise(DONTNEED) region %d: %s\n",
+                    __func__, i, strerror(errno));
+        }
+        /*
+         * Turn off transparent hugepages so we don't lose wakeups
+         * in neighbouring pages.
+         * TODO: Turn this back on later.
+         */
+        ret = madvise((void *)(uintptr_t)dev_region->mmap_addr,
+                      dev_region->size + dev_region->mmap_offset,
+                      MADV_NOHUGEPAGE);
+        if (ret) {
+            /*
+             * Note: This can happen legally on kernels that are configured
+             * without madvise'able hugepages
+             */
+            fprintf(stderr,
+                    "%s: Failed to madvise(NOHUGEPAGE) region %d: %s\n",
+                    __func__, i, strerror(errno));
+        }
+        struct uffdio_register reg_struct;
+        reg_struct.range.start = (uintptr_t)dev_region->mmap_addr;
+        reg_struct.range.len = dev_region->size + dev_region->mmap_offset;
+        reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
+
+        if (ioctl(dev->postcopy_ufd, UFFDIO_REGISTER, &reg_struct)) {
+            vu_panic(dev, "%s: Failed to userfault region %d "
+                          "@%p + size:%zx offset: %zx: (ufd=%d)%s\n",
+                     __func__, i,
+                     dev_region->mmap_addr,
+                     dev_region->size, dev_region->mmap_offset,
+                     dev->postcopy_ufd, strerror(errno));
+            return false;
+        }
+        if (!(reg_struct.ioctls & ((__u64)1 << _UFFDIO_COPY))) {
+            vu_panic(dev, "%s Region (%d) doesn't support COPY",
+                     __func__, i);
+            return false;
+        }
+        DPRINT("%s: region %d: Registered userfault for %"
+               PRIx64 " + %" PRIx64 "\n", __func__, i,
+               (uint64_t)reg_struct.range.start,
+               (uint64_t)reg_struct.range.len);
+        /* Now it's registered we can let the client at it */
+        if (mprotect((void *)(uintptr_t)dev_region->mmap_addr,
+                     dev_region->size + dev_region->mmap_offset,
+                     PROT_READ | PROT_WRITE)) {
+            vu_panic(dev, "failed to mprotect region %d for postcopy (%s)",
+                     i, strerror(errno));
+            return false;
+        }
+        /* TODO: Stash 'zero' support flags somewhere */
+#endif
+    }
+
+    return true;
+}
+
+static bool
 vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
 {
     int i;
@@ -655,74 +733,7 @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
     }
 
     /* OK, now we can go and register the memory and generate faults */
-    for (i = 0; i < dev->nregions; i++) {
-        VuDevRegion *dev_region = &dev->regions[i];
-        int ret;
-#ifdef UFFDIO_REGISTER
-        /* We should already have an open ufd. Mark each memory
-         * range as ufd.
-         * Discard any mapping we have here; note I can't use MADV_REMOVE
-         * or fallocate to make the hole since I don't want to lose
-         * data that's already arrived in the shared process.
-         * TODO: How to do hugepage
-         */
-        ret = madvise((void *)(uintptr_t)dev_region->mmap_addr,
-                      dev_region->size + dev_region->mmap_offset,
-                      MADV_DONTNEED);
-        if (ret) {
-            fprintf(stderr,
-                    "%s: Failed to madvise(DONTNEED) region %d: %s\n",
-                    __func__, i, strerror(errno));
-        }
-        /* Turn off transparent hugepages so we dont get lose wakeups
-         * in neighbouring pages.
-         * TODO: Turn this backon later.
-         */
-        ret = madvise((void *)(uintptr_t)dev_region->mmap_addr,
-                      dev_region->size + dev_region->mmap_offset,
-                      MADV_NOHUGEPAGE);
-        if (ret) {
-            /* Note: This can happen legally on kernels that are configured
-             * without madvise'able hugepages
-             */
-            fprintf(stderr,
-                    "%s: Failed to madvise(NOHUGEPAGE) region %d: %s\n",
-                    __func__, i, strerror(errno));
-        }
-        struct uffdio_register reg_struct;
-        reg_struct.range.start = (uintptr_t)dev_region->mmap_addr;
-        reg_struct.range.len = dev_region->size + dev_region->mmap_offset;
-        reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
-
-        if (ioctl(dev->postcopy_ufd, UFFDIO_REGISTER, &reg_struct)) {
-            vu_panic(dev, "%s: Failed to userfault region %d "
-                          "@%p + size:%zx offset: %zx: (ufd=%d)%s\n",
-                     __func__, i,
-                     dev_region->mmap_addr,
-                     dev_region->size, dev_region->mmap_offset,
-                     dev->postcopy_ufd, strerror(errno));
-            return false;
-        }
-        if (!(reg_struct.ioctls & ((__u64)1 << _UFFDIO_COPY))) {
-            vu_panic(dev, "%s Region (%d) doesn't support COPY",
-                     __func__, i);
-            return false;
-        }
-        DPRINT("%s: region %d: Registered userfault for %"
-               PRIx64 " + %" PRIx64 "\n", __func__, i,
-               (uint64_t)reg_struct.range.start,
-               (uint64_t)reg_struct.range.len);
-        /* Now it's registered we can let the client at it */
-        if (mprotect((void *)(uintptr_t)dev_region->mmap_addr,
-                     dev_region->size + dev_region->mmap_offset,
-                     PROT_READ | PROT_WRITE)) {
-            vu_panic(dev, "failed to mprotect region %d for postcopy (%s)",
-                     i, strerror(errno));
-            return false;
-        }
-        /* TODO: Stash 'zero' support flags somewhere */
-#endif
-    }
+    (void)generate_faults(dev);
 
     return false;
 }
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 07/10] Support ram slot configuration in libvhost-user
  2020-05-21  5:00 [PATCH v4 00/10] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz
                   ` (5 preceding siblings ...)
  2020-05-21  5:00 ` [PATCH v4 06/10] Refactor out libvhost-user fault generation logic Raphael Norwitz
@ 2020-05-21  5:00 ` Raphael Norwitz
  2020-06-04 14:49   ` Marc-André Lureau
  2020-05-21  5:00 ` [PATCH v4 08/10] Support adding individual regions " Raphael Norwitz
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Raphael Norwitz @ 2020-05-21  5:00 UTC (permalink / raw)
  To: qemu-devel, mst, marcandre.lureau
  Cc: raphael.s.norwitz, marcandre.lureau, Raphael Norwitz

The VHOST_USER_GET_MAX_MEM_SLOTS message allows a vhost-user backend to
specify a maximum number of ram slots it is willing to support. This
change adds support for libvhost-user to process this message. For now
the backend will reply with 8 as the maximum number of regions
supported.

libvhost-user does not yet support the vhost-user protocol feature
VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS, so qemu should never
send the VHOST_USER_GET_MAX_MEM_SLOTS message. Therefore this new
functionality is not currently used.
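
For reference, the QEMU side of this exchange (added earlier in the series)
boils down to a single u64 query; roughly:

    /* master side, see vhost_user_get_max_memslots() in patch 3 */
    uint64_t backend_max_memslots;
    int err = vhost_user_get_u64(dev, VHOST_USER_GET_MAX_MEM_SLOTS,
                                 &backend_max_memslots);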

Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
---
 contrib/libvhost-user/libvhost-user.c | 19 +++++++++++++++++++
 contrib/libvhost-user/libvhost-user.h |  1 +
 2 files changed, 20 insertions(+)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index cccfa22..9f039b7 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -137,6 +137,7 @@ vu_request_to_string(unsigned int req)
         REQ(VHOST_USER_SET_INFLIGHT_FD),
         REQ(VHOST_USER_GPU_SET_SOCKET),
         REQ(VHOST_USER_VRING_KICK),
+        REQ(VHOST_USER_GET_MAX_MEM_SLOTS),
         REQ(VHOST_USER_MAX),
     };
 #undef REQ
@@ -1565,6 +1566,22 @@ vu_handle_vring_kick(VuDev *dev, VhostUserMsg *vmsg)
     return false;
 }
 
+static bool vu_handle_get_max_memslots(VuDev *dev, VhostUserMsg *vmsg)
+{
+    vmsg->flags = VHOST_USER_REPLY_MASK | VHOST_USER_VERSION;
+    vmsg->size  = sizeof(vmsg->payload.u64);
+    vmsg->payload.u64 = VHOST_MEMORY_MAX_NREGIONS;
+    vmsg->fd_num = 0;
+
+    if (!vu_message_write(dev, dev->sock, vmsg)) {
+        vu_panic(dev, "Failed to send max ram slots: %s\n", strerror(errno));
+    }
+
+    DPRINT("u64: 0x%016"PRIx64"\n", (uint64_t) VHOST_MEMORY_MAX_NREGIONS);
+
+    return false;
+}
+
 static bool
 vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
 {
@@ -1649,6 +1666,8 @@ vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
         return vu_set_inflight_fd(dev, vmsg);
     case VHOST_USER_VRING_KICK:
         return vu_handle_vring_kick(dev, vmsg);
+    case VHOST_USER_GET_MAX_MEM_SLOTS:
+        return vu_handle_get_max_memslots(dev, vmsg);
     default:
         vmsg_close_fds(vmsg);
         vu_panic(dev, "Unhandled request: %d", vmsg->request);
diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index f30394f..88ef40d 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -97,6 +97,7 @@ typedef enum VhostUserRequest {
     VHOST_USER_SET_INFLIGHT_FD = 32,
     VHOST_USER_GPU_SET_SOCKET = 33,
     VHOST_USER_VRING_KICK = 35,
+    VHOST_USER_GET_MAX_MEM_SLOTS = 36,
     VHOST_USER_MAX
 } VhostUserRequest;
 
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 08/10] Support adding individual regions in libvhost-user
  2020-05-21  5:00 [PATCH v4 00/10] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz
                   ` (6 preceding siblings ...)
  2020-05-21  5:00 ` [PATCH v4 07/10] Support ram slot configuration in libvhost-user Raphael Norwitz
@ 2020-05-21  5:00 ` Raphael Norwitz
  2020-05-21  5:00 ` [PATCH v4 09/10] Support individual region unmap " Raphael Norwitz
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Raphael Norwitz @ 2020-05-21  5:00 UTC (permalink / raw)
  To: qemu-devel, mst, marcandre.lureau
  Cc: raphael.s.norwitz, marcandre.lureau, Raphael Norwitz

When the VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS protocol feature is
enabled, qemu will transmit memory regions to a backend individually using
the new message VHOST_USER_ADD_MEM_REG. With this change, vhost-user
backends built with libvhost-user can now map in new memory regions when
VHOST_USER_ADD_MEM_REG messages are received.

Qemu only sends VHOST_USER_ADD_MEM_REG messages when the
VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS feature is negotiated, and
since it is not yet supported in libvhost-user, this new functionality
is not yet used.
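
Stripped of error handling, ring remapping and the postcopy branch, the new
handler amounts to the following (condensed sketch of the code added below):

    VhostUserMemoryRegion m = vmsg->payload.memreg.region, *r = &m;
    VuDevRegion *dev_region = &dev->regions[dev->nregions];

    dev_region->gpa = r->guest_phys_addr;
    dev_region->size = r->memory_size;
    dev_region->qva = r->userspace_addr;
    dev_region->mmap_offset = r->mmap_offset;
    dev_region->mmap_addr = (uint64_t)(uintptr_t)
        mmap(0, r->memory_size + r->mmap_offset,
             PROT_READ | PROT_WRITE, MAP_SHARED, vmsg->fds[0], 0);
    close(vmsg->fds[0]);

    dev->nregions++;
    vmsg_set_reply_u64(vmsg, 0);    /* queue a u64 = 0 reply for QEMU */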

Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
---
 contrib/libvhost-user/libvhost-user.c | 103 ++++++++++++++++++++++++++++++++++
 contrib/libvhost-user/libvhost-user.h |   7 +++
 2 files changed, 110 insertions(+)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index 9f039b7..d8ee7a2 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -138,6 +138,7 @@ vu_request_to_string(unsigned int req)
         REQ(VHOST_USER_GPU_SET_SOCKET),
         REQ(VHOST_USER_VRING_KICK),
         REQ(VHOST_USER_GET_MAX_MEM_SLOTS),
+        REQ(VHOST_USER_ADD_MEM_REG),
         REQ(VHOST_USER_MAX),
     };
 #undef REQ
@@ -663,6 +664,106 @@ generate_faults(VuDev *dev) {
 }
 
 static bool
+vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
+    int i;
+    bool track_ramblocks = dev->postcopy_listening;
+    VhostUserMemoryRegion m = vmsg->payload.memreg.region, *msg_region = &m;
+    VuDevRegion *dev_region = &dev->regions[dev->nregions];
+    void *mmap_addr;
+
+    /*
+     * If we are in postcopy mode and we receive a u64 payload with a 0 value
+     * we know all the postcopy client bases have been received, and we
+     * should start generating faults.
+     */
+    if (track_ramblocks &&
+        vmsg->size == sizeof(vmsg->payload.u64) &&
+        vmsg->payload.u64 == 0) {
+        (void)generate_faults(dev);
+        return false;
+    }
+
+    DPRINT("Adding region: %d\n", dev->nregions);
+    DPRINT("    guest_phys_addr: 0x%016"PRIx64"\n",
+           msg_region->guest_phys_addr);
+    DPRINT("    memory_size:     0x%016"PRIx64"\n",
+           msg_region->memory_size);
+    DPRINT("    userspace_addr   0x%016"PRIx64"\n",
+           msg_region->userspace_addr);
+    DPRINT("    mmap_offset      0x%016"PRIx64"\n",
+           msg_region->mmap_offset);
+
+    dev_region->gpa = msg_region->guest_phys_addr;
+    dev_region->size = msg_region->memory_size;
+    dev_region->qva = msg_region->userspace_addr;
+    dev_region->mmap_offset = msg_region->mmap_offset;
+
+    /*
+     * We don't use offset argument of mmap() since the
+     * mapped address has to be page aligned, and we use huge
+     * pages.
+     */
+    if (track_ramblocks) {
+        /*
+         * In postcopy we're using PROT_NONE here to catch anyone
+         * accessing it before we userfault.
+         */
+        mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
+                         PROT_NONE, MAP_SHARED,
+                         vmsg->fds[0], 0);
+    } else {
+        mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
+                         PROT_READ | PROT_WRITE, MAP_SHARED, vmsg->fds[0],
+                         0);
+    }
+
+    if (mmap_addr == MAP_FAILED) {
+        vu_panic(dev, "region mmap error: %s", strerror(errno));
+    } else {
+        dev_region->mmap_addr = (uint64_t)(uintptr_t)mmap_addr;
+        DPRINT("    mmap_addr:       0x%016"PRIx64"\n",
+               dev_region->mmap_addr);
+    }
+
+    close(vmsg->fds[0]);
+
+    if (track_ramblocks) {
+        /*
+         * Return the address to QEMU so that it can translate the ufd
+         * fault addresses back.
+         */
+        msg_region->userspace_addr = (uintptr_t)(mmap_addr +
+                                                 dev_region->mmap_offset);
+
+        /* Send the message back to qemu with the addresses filled in. */
+        vmsg->fd_num = 0;
+        if (!vu_send_reply(dev, dev->sock, vmsg)) {
+            vu_panic(dev, "failed to respond to add-mem-region for postcopy");
+            return false;
+        }
+
+        DPRINT("Successfully added new region in postcopy\n");
+        dev->nregions++;
+        return false;
+
+    } else {
+        for (i = 0; i < dev->max_queues; i++) {
+            if (dev->vq[i].vring.desc) {
+                if (map_ring(dev, &dev->vq[i])) {
+                    vu_panic(dev, "remapping queue %d for new memory region",
+                             i);
+                }
+            }
+        }
+
+        DPRINT("Successfully added new region\n");
+        dev->nregions++;
+        vmsg_set_reply_u64(vmsg, 0);
+        return true;
+    }
+}
+
+static bool
 vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
 {
     int i;
@@ -1668,6 +1769,8 @@ vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
         return vu_handle_vring_kick(dev, vmsg);
     case VHOST_USER_GET_MAX_MEM_SLOTS:
         return vu_handle_get_max_memslots(dev, vmsg);
+    case VHOST_USER_ADD_MEM_REG:
+        return vu_add_mem_reg(dev, vmsg);
     default:
         vmsg_close_fds(vmsg);
         vu_panic(dev, "Unhandled request: %d", vmsg->request);
diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index 88ef40d..60ef7fd 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -98,6 +98,7 @@ typedef enum VhostUserRequest {
     VHOST_USER_GPU_SET_SOCKET = 33,
     VHOST_USER_VRING_KICK = 35,
     VHOST_USER_GET_MAX_MEM_SLOTS = 36,
+    VHOST_USER_ADD_MEM_REG = 37,
     VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -124,6 +125,11 @@ typedef struct VhostUserMemory {
     VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
 } VhostUserMemory;
 
+typedef struct VhostUserMemRegMsg {
+    uint32_t padding;
+    VhostUserMemoryRegion region;
+} VhostUserMemRegMsg;
+
 typedef struct VhostUserLog {
     uint64_t mmap_size;
     uint64_t mmap_offset;
@@ -176,6 +182,7 @@ typedef struct VhostUserMsg {
         struct vhost_vring_state state;
         struct vhost_vring_addr addr;
         VhostUserMemory memory;
+        VhostUserMemRegMsg memreg;
         VhostUserLog log;
         VhostUserConfig config;
         VhostUserVringArea area;
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 09/10] Support individual region unmap in libvhost-user
  2020-05-21  5:00 [PATCH v4 00/10] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz
                   ` (7 preceding siblings ...)
  2020-05-21  5:00 ` [PATCH v4 08/10] Support adding individual regions " Raphael Norwitz
@ 2020-05-21  5:00 ` Raphael Norwitz
  2020-05-21  5:00 ` [PATCH v4 10/10] Lift max ram slots limit " Raphael Norwitz
  2020-06-04  4:11 ` [PATCH v4 00/10] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz
  10 siblings, 0 replies; 20+ messages in thread
From: Raphael Norwitz @ 2020-05-21  5:00 UTC (permalink / raw)
  To: qemu-devel, mst, marcandre.lureau
  Cc: raphael.s.norwitz, marcandre.lureau, Raphael Norwitz

When the VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS protocol feature is
enabled, on memory hot-unplug qemu will transmit the memory regions to
remove individually using the new VHOST_USER_REM_MEM_REG message. With
this change, vhost-user backends built with libvhost-user can now unmap
individual memory regions when receiving the VHOST_USER_REM_MEM_REG
message.

Since Qemu only sends VHOST_USER_REM_MEM_REG messages when the
VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS feature is negotiated, and
support for that feature has not yet been added in libvhost-user, this
new functionality is not yet used.
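
In short, the handler scans the device's region list for an entry matching
the message's guest physical address, userspace address and size, munmaps
that entry, and compacts the remaining regions. A condensed sketch of the
removal loop added below:

    for (i = 0, j = 0; i < dev->nregions; i++) {
        if (reg_equal(&dev->regions[i], msg_region)) {
            /* unmap the matched region */
            munmap((void *)(uintptr_t)dev->regions[i].mmap_addr,
                   dev->regions[i].size + dev->regions[i].mmap_offset);
            found = true;
        } else {
            shadow_regions[j++] = dev->regions[i];    /* keep this entry */
        }
    }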

Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
---
 contrib/libvhost-user/libvhost-user.c | 63 +++++++++++++++++++++++++++++++++++
 contrib/libvhost-user/libvhost-user.h |  1 +
 2 files changed, 64 insertions(+)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index d8ee7a2..386449b 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -139,6 +139,7 @@ vu_request_to_string(unsigned int req)
         REQ(VHOST_USER_VRING_KICK),
         REQ(VHOST_USER_GET_MAX_MEM_SLOTS),
         REQ(VHOST_USER_ADD_MEM_REG),
+        REQ(VHOST_USER_REM_MEM_REG),
         REQ(VHOST_USER_MAX),
     };
 #undef REQ
@@ -763,6 +764,66 @@ vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
     }
 }
 
+static inline bool reg_equal(VuDevRegion *vudev_reg,
+                             VhostUserMemoryRegion *msg_reg)
+{
+    if (vudev_reg->gpa == msg_reg->guest_phys_addr &&
+        vudev_reg->qva == msg_reg->userspace_addr &&
+        vudev_reg->size == msg_reg->memory_size) {
+        return true;
+    }
+
+    return false;
+}
+
+static bool
+vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
+    int i, j;
+    bool found = false;
+    VuDevRegion shadow_regions[VHOST_MEMORY_MAX_NREGIONS] = {};
+    VhostUserMemoryRegion m = vmsg->payload.memreg.region, *msg_region = &m;
+
+    DPRINT("Removing region:\n");
+    DPRINT("    guest_phys_addr: 0x%016"PRIx64"\n",
+           msg_region->guest_phys_addr);
+    DPRINT("    memory_size:     0x%016"PRIx64"\n",
+           msg_region->memory_size);
+    DPRINT("    userspace_addr   0x%016"PRIx64"\n",
+           msg_region->userspace_addr);
+    DPRINT("    mmap_offset      0x%016"PRIx64"\n",
+           msg_region->mmap_offset);
+
+    for (i = 0, j = 0; i < dev->nregions; i++) {
+        if (!reg_equal(&dev->regions[i], msg_region)) {
+            shadow_regions[j].gpa = dev->regions[i].gpa;
+            shadow_regions[j].size = dev->regions[i].size;
+            shadow_regions[j].qva = dev->regions[i].qva;
+            shadow_regions[j].mmap_offset = dev->regions[i].mmap_offset;
+            j++;
+        } else {
+            found = true;
+            VuDevRegion *r = &dev->regions[i];
+            void *m = (void *) (uintptr_t) r->mmap_addr;
+
+            if (m) {
+                munmap(m, r->size + r->mmap_offset);
+            }
+        }
+    }
+
+    if (found) {
+        memcpy(dev->regions, shadow_regions,
+               sizeof(VuDevRegion) * VHOST_MEMORY_MAX_NREGIONS);
+        DPRINT("Successfully removed a region\n");
+        dev->nregions--;
+        vmsg_set_reply_u64(vmsg, 0);
+    } else {
+        vu_panic(dev, "Specified region not found\n");
+    }
+
+    return true;
+}
+
 static bool
 vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
 {
@@ -1771,6 +1832,8 @@ vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
         return vu_handle_get_max_memslots(dev, vmsg);
     case VHOST_USER_ADD_MEM_REG:
         return vu_add_mem_reg(dev, vmsg);
+    case VHOST_USER_REM_MEM_REG:
+        return vu_rem_mem_reg(dev, vmsg);
     default:
         vmsg_close_fds(vmsg);
         vu_panic(dev, "Unhandled request: %d", vmsg->request);
diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index 60ef7fd..f843971 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -99,6 +99,7 @@ typedef enum VhostUserRequest {
     VHOST_USER_VRING_KICK = 35,
     VHOST_USER_GET_MAX_MEM_SLOTS = 36,
     VHOST_USER_ADD_MEM_REG = 37,
+    VHOST_USER_REM_MEM_REG = 38,
     VHOST_USER_MAX
 } VhostUserRequest;
 
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 10/10] Lift max ram slots limit in libvhost-user
  2020-05-21  5:00 [PATCH v4 00/10] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz
                   ` (8 preceding siblings ...)
  2020-05-21  5:00 ` [PATCH v4 09/10] Support individual region unmap " Raphael Norwitz
@ 2020-05-21  5:00 ` Raphael Norwitz
  2020-06-04  4:11 ` [PATCH v4 00/10] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz
  10 siblings, 0 replies; 20+ messages in thread
From: Raphael Norwitz @ 2020-05-21  5:00 UTC (permalink / raw)
  To: qemu-devel, mst, marcandre.lureau
  Cc: raphael.s.norwitz, marcandre.lureau, Raphael Norwitz

Historically, VMs with vhost-user devices could hot-add memory a maximum
of 8 times. Now that the VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
protocol feature has been added, VMs with vhost-user backends which
support this new feature can use a configurable number of ram slots, up
to the maximum allowed by the target platform.

This change adds VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS support for
backends built with libvhost-user, and increases the number of supported
ram slots from 8 to 32.
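
The old single constant is split in two because the limits it conflated now
differ: a single message still carries at most 8 file descriptors, while a
device may now track up to 32 regions in total. Roughly:

    #define VHOST_MEMORY_BASELINE_NREGIONS 8    /* fds carried per message */
    #define VHOST_USER_MAX_RAM_SLOTS       32   /* regions tracked per device */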

Memory hot-add, hot-remove and postcopy migration were tested with
the vhost-user-bridge sample.

Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
---
 contrib/libvhost-user/libvhost-user.c | 17 +++++++++--------
 contrib/libvhost-user/libvhost-user.h | 15 +++++++++++----
 2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index 386449b..b1e6072 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -269,7 +269,7 @@ have_userfault(void)
 static bool
 vu_message_read(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
 {
-    char control[CMSG_SPACE(VHOST_MEMORY_MAX_NREGIONS * sizeof(int))] = { };
+    char control[CMSG_SPACE(VHOST_MEMORY_BASELINE_NREGIONS * sizeof(int))] = {};
     struct iovec iov = {
         .iov_base = (char *)vmsg,
         .iov_len = VHOST_USER_HDR_SIZE,
@@ -340,7 +340,7 @@ vu_message_write(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
 {
     int rc;
     uint8_t *p = (uint8_t *)vmsg;
-    char control[CMSG_SPACE(VHOST_MEMORY_MAX_NREGIONS * sizeof(int))] = { };
+    char control[CMSG_SPACE(VHOST_MEMORY_BASELINE_NREGIONS * sizeof(int))] = {};
     struct iovec iov = {
         .iov_base = (char *)vmsg,
         .iov_len = VHOST_USER_HDR_SIZE,
@@ -353,7 +353,7 @@ vu_message_write(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
     struct cmsghdr *cmsg;
 
     memset(control, 0, sizeof(control));
-    assert(vmsg->fd_num <= VHOST_MEMORY_MAX_NREGIONS);
+    assert(vmsg->fd_num <= VHOST_MEMORY_BASELINE_NREGIONS);
     if (vmsg->fd_num > 0) {
         size_t fdsize = vmsg->fd_num * sizeof(int);
         msg.msg_controllen = CMSG_SPACE(fdsize);
@@ -780,7 +780,7 @@ static bool
 vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
     int i, j;
     bool found = false;
-    VuDevRegion shadow_regions[VHOST_MEMORY_MAX_NREGIONS] = {};
+    VuDevRegion shadow_regions[VHOST_USER_MAX_RAM_SLOTS] = {};
     VhostUserMemoryRegion m = vmsg->payload.memreg.region, *msg_region = &m;
 
     DPRINT("Removing region:\n");
@@ -813,7 +813,7 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
 
     if (found) {
         memcpy(dev->regions, shadow_regions,
-               sizeof(VuDevRegion) * VHOST_MEMORY_MAX_NREGIONS);
+               sizeof(VuDevRegion) * VHOST_USER_MAX_RAM_SLOTS);
         DPRINT("Successfully removed a region\n");
         dev->nregions--;
         vmsg_set_reply_u64(vmsg, 0);
@@ -1394,7 +1394,8 @@ vu_get_protocol_features_exec(VuDev *dev, VhostUserMsg *vmsg)
                         1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ |
                         1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER |
                         1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD |
-                        1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK;
+                        1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK |
+                        1ULL << VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS;
 
     if (have_userfault()) {
         features |= 1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT;
@@ -1732,14 +1733,14 @@ static bool vu_handle_get_max_memslots(VuDev *dev, VhostUserMsg *vmsg)
 {
     vmsg->flags = VHOST_USER_REPLY_MASK | VHOST_USER_VERSION;
     vmsg->size  = sizeof(vmsg->payload.u64);
-    vmsg->payload.u64 = VHOST_MEMORY_MAX_NREGIONS;
+    vmsg->payload.u64 = VHOST_USER_MAX_RAM_SLOTS;
     vmsg->fd_num = 0;
 
     if (!vu_message_write(dev, dev->sock, vmsg)) {
         vu_panic(dev, "Failed to send max ram slots: %s\n", strerror(errno));
     }
 
-    DPRINT("u64: 0x%016"PRIx64"\n", (uint64_t) VHOST_MEMORY_MAX_NREGIONS);
+    DPRINT("u64: 0x%016"PRIx64"\n", (uint64_t) VHOST_USER_MAX_RAM_SLOTS);
 
     return false;
 }
diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index f843971..844c37c 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -28,7 +28,13 @@
 
 #define VIRTQUEUE_MAX_SIZE 1024
 
-#define VHOST_MEMORY_MAX_NREGIONS 8
+#define VHOST_MEMORY_BASELINE_NREGIONS 8
+
+/*
+ * Set a reasonable maximum number of ram slots, which will be supported by
+ * any architecture.
+ */
+#define VHOST_USER_MAX_RAM_SLOTS 32
 
 typedef enum VhostSetConfigType {
     VHOST_SET_CONFIG_TYPE_MASTER = 0,
@@ -55,6 +61,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
     VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD = 12,
     VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS = 14,
+    VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15,
 
     VHOST_USER_PROTOCOL_F_MAX
 };
@@ -123,7 +130,7 @@ typedef struct VhostUserMemoryRegion {
 typedef struct VhostUserMemory {
     uint32_t nregions;
     uint32_t padding;
-    VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
+    VhostUserMemoryRegion regions[VHOST_MEMORY_BASELINE_NREGIONS];
 } VhostUserMemory;
 
 typedef struct VhostUserMemRegMsg {
@@ -190,7 +197,7 @@ typedef struct VhostUserMsg {
         VhostUserInflight inflight;
     } payload;
 
-    int fds[VHOST_MEMORY_MAX_NREGIONS];
+    int fds[VHOST_MEMORY_BASELINE_NREGIONS];
     int fd_num;
     uint8_t *data;
 } VU_PACKED VhostUserMsg;
@@ -368,7 +375,7 @@ typedef struct VuDevInflightInfo {
 struct VuDev {
     int sock;
     uint32_t nregions;
-    VuDevRegion regions[VHOST_MEMORY_MAX_NREGIONS];
+    VuDevRegion regions[VHOST_USER_MAX_RAM_SLOTS];
     VuVirtq *vq;
     VuDevInflightInfo inflight_info;
     int log_call_fd;
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 00/10] vhost-user: Lift Max Ram Slots Limitation
  2020-05-21  5:00 [PATCH v4 00/10] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz
                   ` (9 preceding siblings ...)
  2020-05-21  5:00 ` [PATCH v4 10/10] Lift max ram slots limit " Raphael Norwitz
@ 2020-06-04  4:11 ` Raphael Norwitz
  10 siblings, 0 replies; 20+ messages in thread
From: Raphael Norwitz @ 2020-06-04  4:11 UTC (permalink / raw)
  To: Raphael Norwitz
  Cc: Marc-André Lureau, Marc-André Lureau, QEMU, Michael S. Tsirkin

ping

On Thu, May 21, 2020 at 1:00 AM Raphael Norwitz
<raphael.norwitz@nutanix.com> wrote:
>
> In QEMU today, a VM with a vhost-user device can hot add memory a
> maximum of 8 times. See these threads, among others:
>
> [1] https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01046.html
>     https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01236.html
>
> [2] https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg04656.html
>
> This series introduces a new protocol feature
> VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS which, when enabled, lifts the
> restriction on the maximum number RAM slots imposed by vhost-user.
>
> Without vhost-user, a Qemu VM can support 256 ram slots (for ACPI targets),
> or potentially more (the KVM max is 512). With each region, a file descriptor
> must be sent over the socket. If that many regions are sent in a single message
> there could be upwards of 256 file descriptors being opened in the backend process
> at once. Opening that many fds could easily push the process past the open fd limit,
> especially considering one backend process could have multiple vhost threads,
> exposing different devices to different Qemu instances. Therefore to safely lift the
> limit, transmitting regions should be split up over multiple messages.
>
> In addition, the VHOST_USER_SET_MEM_TABLE message was not reused because
> as the number of regions grows, the message becomes very large. In practice, such
> large messages caused problems (truncated messages) and in the past it seems
> the community has opted for smaller fixed size messages where possible. VRINGs,
> for example, are sent to the backend individually instead of in one massive
> message.
>
> The implementation details are explained in more detail in the commit
> messages, but at a high level the new protocol feature works as follows:
> - If the VHOST_USER_PROTCOL_F_CONFIGURE_MEM_SLOTS feature is enabled,
>   QEMU will send multiple VHOST_USER_ADD_MEM_REG and
>   VHOST_USER_REM_MEM_REG messages to map and unmap individual memory
>  regions instead of one large VHOST_USER_SET_MEM_TABLE message containing
>   all memory regions.
> - The vhost-user struct maintains a ’shadow state’ of memory regions
>   already sent to the guest. Each time vhost_user_set_mem_table is called,
>   the shadow state is compared with the new device state. A
>   VHOST_USER_REM_MEM_REG will be sent for each region in the shadow state
>   not in the device state. Then, a VHOST_USER_ADD_MEM_REG will be sent
>   for each region in the device state but not the shadow state. After
>   these messages have been sent, the shadow state will be updated to
>   reflect the new device state.
>
> The series consists of 10 changes:
> 1. Add helper to populate vhost-user message regions:
>     This change adds a helper to populate a VhostUserMemoryRegion from a
>     struct vhost_memory_region, which needs to be done in multiple places in
>     in this series.
>
> 2. Add vhost-user helper to get MemoryRegion data
>     This changes adds a helper to get a pointer to a MemoryRegion struct, along
>     with it's offset address and associated file descriptor. This helper is used to
>     simplify other vhost-user code paths and will be needed elsewhere in this
>     series.
>
> 3. Add VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
>     This change adds the VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
>     protocol feature. At this point, if negotiated, the feature only allows the
>     backend to limit the number of max ram slots to a number less than
>     VHOST_MEMORY_MAX_NREGIONS = 8.
>
> 4. Transmit vhost-user memory regions individually
>     With this change, if the VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
>     protocol feature is enabled, Qemu will send regions to the backend using
>     individual VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG
>     messages.
>     The max number of ram slots supported is still limited to 8.
>
> 5. Lift max memory slots imposed by vhost-user
>     With this change, if the VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
>     protocol feature is enabled, the backend can support a configurable number of
>     ram slots up to the maximum allowed by the target platform.
>
> 6. Refactor out libvhost-user fault generation logic
>     This cleanup moves some logic from vu_set_mem_table_exec_postcopy() to a
>     separate helper, which will be needed elsewhere.
>
> 7. Support ram slot configuration in libvhost-user
>    This change adds support for processing VHOST_USER_GET_MAX_MEMSLOTS
>     messages in libvhost-user.
>     The VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS protocol is not yet
>     enabled in libvhost-user, so at this point this change is non-functional.
>
> 8. Support adding individual regions in libvhost-user
>     This change adds libvhost-user support for mapping in new memory regions
>     when receiving VHOST_USER_ADD_MEM_REG messages.
>     The VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS protocol is not yet
>     enabled in libvhost-user, so at this point this change is non-functional.
>
> 9. Support individual region unmap in libvhost-user
>     This change adds libvhost-user support for unmapping removed memory regions
>     when receiving VHOST_USER_REM_MEM_REG messages.
>     The VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS protocol is not yet
>     enabled in libvhost-user, so at this point this change is non-functional.
>
> 10. Lift max ram slots limit in libvhost-user
>    This change makes libvhost-user try to negotiate the
>    VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS, and adds support for
>    backends built using libvhost-user to support hot adding memory up to the
>    32 times.
>
> The changes were tested with the vhost-user-bridge sample.
>
> Changes since V3:
>     * Fixed compiler warnings caused by using pointers to packed elements
>        (flagged by patchew building with -Waddress-of-packed-member)
>
> Changes since V2:
>     * Add support for VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
>        for backends build with libvhost-user
>     * Add support for postcopy live-migration when the
>        VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS protocol feature has
>        been negotiated.
>     * Add support for backends which want to support both
>        VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS and
>        VHOST_USER_PROTOCOL_F_REPLY_ACK
>     * Change feature name from VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS
>         to VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS, and any associated
>         variable names.
>     *Log a more descriptive message if the backend lowers the max ram slots limit
>        on reconnect.
>
> Changes since V1:
>     * Kept the assert in vhost_user_set_mem_table_postcopy, but moved it
>       to prevent corruption
>     * Made QEMU send a single VHOST_USER_GET_MAX_MEMSLOTS message at
>       startup and cache the returned value so that QEMU does not need to
>       query the backend every time vhost_backend_memslots_limit is called.
>
> Best,
> Raphael
>
> Raphael Norwitz (10):
>   Add helper to populate vhost-user message regions
>   Add vhost-user helper to get MemoryRegion data
>   Add VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
>   Transmit vhost-user memory regions individually
>   Lift max memory slots limit imposed by vhost-user
>   Refactor out libvhost-user fault generation logic
>   Support ram slot configuration in libvhost-user
>   Support adding individual regions in libvhost-user
>   Support individual region unmap in libvhost-user
>   Lift max ram slots limit in libvhost-user
>
>  contrib/libvhost-user/libvhost-user.c | 341 ++++++++++++++----
>  contrib/libvhost-user/libvhost-user.h |  24 +-
>  docs/interop/vhost-user.rst           |  44 +++
>  hw/virtio/vhost-user.c                | 638 ++++++++++++++++++++++++++++------
>  include/hw/virtio/vhost-user.h        |   1 +
>  5 files changed, 873 insertions(+), 175 deletions(-)
>
> --
> 1.8.3.1
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 01/10] Add helper to populate vhost-user message regions
  2020-05-21  5:00 ` [PATCH v4 01/10] Add helper to populate vhost-user message regions Raphael Norwitz
@ 2020-06-04 14:40   ` Marc-André Lureau
  0 siblings, 0 replies; 20+ messages in thread
From: Marc-André Lureau @ 2020-06-04 14:40 UTC (permalink / raw)
  To: Raphael Norwitz; +Cc: Raphael Norwitz, QEMU, Michael S. Tsirkin

[-- Attachment #1: Type: text/plain, Size: 2661 bytes --]

On Thu, May 21, 2020 at 7:00 AM Raphael Norwitz <raphael.norwitz@nutanix.com>
wrote:

> When setting vhost-user memory tables, memory region descriptors must be
> copied from the vhost_dev struct to the vhost-user message. To avoid
> duplicating code in setting the memory tables, we should use a helper to
> populate this field. This change adds this helper.
>
> Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
>

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>

---
>  hw/virtio/vhost-user.c | 18 ++++++++++++------
>  1 file changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index ec21e8f..2e0552d 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -407,6 +407,15 @@ static int vhost_user_set_log_base(struct vhost_dev
> *dev, uint64_t base,
>      return 0;
>  }
>
> +static void vhost_user_fill_msg_region(VhostUserMemoryRegion *dst,
> +                                       struct vhost_memory_region *src)
> +{
> +    assert(src != NULL && dst != NULL);
> +    dst->userspace_addr = src->userspace_addr;
> +    dst->memory_size = src->memory_size;
> +    dst->guest_phys_addr = src->guest_phys_addr;
> +}
> +
>  static int vhost_user_fill_set_mem_table_msg(struct vhost_user *u,
>                                               struct vhost_dev *dev,
>                                               VhostUserMsg *msg,
> @@ -417,6 +426,7 @@ static int vhost_user_fill_set_mem_table_msg(struct
> vhost_user *u,
>      ram_addr_t offset;
>      MemoryRegion *mr;
>      struct vhost_memory_region *reg;
> +    VhostUserMemoryRegion region_buffer;
>
>      msg->hdr.request = VHOST_USER_SET_MEM_TABLE;
>
> @@ -441,12 +451,8 @@ static int vhost_user_fill_set_mem_table_msg(struct
> vhost_user *u,
>                  error_report("Failed preparing vhost-user memory table
> msg");
>                  return -1;
>              }
> -            msg->payload.memory.regions[*fd_num].userspace_addr =
> -                reg->userspace_addr;
> -            msg->payload.memory.regions[*fd_num].memory_size =
> -                reg->memory_size;
> -            msg->payload.memory.regions[*fd_num].guest_phys_addr =
> -                reg->guest_phys_addr;
> +            vhost_user_fill_msg_region(&region_buffer, reg);
> +            msg->payload.memory.regions[*fd_num] = region_buffer;
>              msg->payload.memory.regions[*fd_num].mmap_offset = offset;
>              fds[(*fd_num)++] = fd;
>          } else if (track_ramblocks) {
> --
> 1.8.3.1
>
>

-- 
Marc-André Lureau

[-- Attachment #2: Type: text/html, Size: 3698 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 02/10] Add vhost-user helper to get MemoryRegion data
  2020-05-21  5:00 ` [PATCH v4 02/10] Add vhost-user helper to get MemoryRegion data Raphael Norwitz
@ 2020-06-04 14:41   ` Marc-André Lureau
  0 siblings, 0 replies; 20+ messages in thread
From: Marc-André Lureau @ 2020-06-04 14:41 UTC (permalink / raw)
  To: Raphael Norwitz; +Cc: Raphael Norwitz, QEMU, Michael S. Tsirkin

[-- Attachment #1: Type: text/plain, Size: 2961 bytes --]

Hi

On Thu, May 21, 2020 at 7:00 AM Raphael Norwitz <raphael.norwitz@nutanix.com>
wrote:

> When setting the memory tables, qemu uses a memory region's userspace
> address to look up the region's MemoryRegion struct. Among other things,
> the MemoryRegion contains the region's offset and associated file
> descriptor, all of which need to be sent to the backend.
>
> With VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS, this logic will be
> needed in multiple places, so before feature support is added it
> should be moved to a helper function.
>
> This helper is also used to simplify the vhost_user_can_merge()
> function.
>
> Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
>

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>

---
>  hw/virtio/vhost-user.c | 25 +++++++++++++++----------
>  1 file changed, 15 insertions(+), 10 deletions(-)
>
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 2e0552d..442b0d6 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -407,6 +407,18 @@ static int vhost_user_set_log_base(struct vhost_dev
> *dev, uint64_t base,
>      return 0;
>  }
>
> +static MemoryRegion *vhost_user_get_mr_data(uint64_t addr, ram_addr_t
> *offset,
> +                                            int *fd)
> +{
> +    MemoryRegion *mr;
> +
> +    assert((uintptr_t)addr == addr);
> +    mr = memory_region_from_host((void *)(uintptr_t)addr, offset);
> +    *fd = memory_region_get_fd(mr);
> +
> +    return mr;
> +}
> +
>  static void vhost_user_fill_msg_region(VhostUserMemoryRegion *dst,
>                                         struct vhost_memory_region *src)
>  {
> @@ -433,10 +445,7 @@ static int vhost_user_fill_set_mem_table_msg(struct
> vhost_user *u,
>      for (i = 0; i < dev->mem->nregions; ++i) {
>          reg = dev->mem->regions + i;
>
> -        assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
> -        mr = memory_region_from_host((void
> *)(uintptr_t)reg->userspace_addr,
> -                                     &offset);
> -        fd = memory_region_get_fd(mr);
> +        mr = vhost_user_get_mr_data(reg->userspace_addr, &offset, &fd);
>          if (fd > 0) {
>              if (track_ramblocks) {
>                  assert(*fd_num < VHOST_MEMORY_MAX_NREGIONS);
> @@ -1551,13 +1560,9 @@ static bool vhost_user_can_merge(struct vhost_dev
> *dev,
>  {
>      ram_addr_t offset;
>      int mfd, rfd;
> -    MemoryRegion *mr;
> -
> -    mr = memory_region_from_host((void *)(uintptr_t)start1, &offset);
> -    mfd = memory_region_get_fd(mr);
>
> -    mr = memory_region_from_host((void *)(uintptr_t)start2, &offset);
> -    rfd = memory_region_get_fd(mr);
> +    (void)vhost_user_get_mr_data(start1, &offset, &mfd);
> +    (void)vhost_user_get_mr_data(start2, &offset, &rfd);
>
>      return mfd == rfd;
>  }
> --
> 1.8.3.1
>
>

-- 
Marc-André Lureau

[-- Attachment #2: Type: text/html, Size: 3994 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 03/10] Add VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
  2020-05-21  5:00 ` [PATCH v4 03/10] Add VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS Raphael Norwitz
@ 2020-06-04 14:42   ` Marc-André Lureau
  0 siblings, 0 replies; 20+ messages in thread
From: Marc-André Lureau @ 2020-06-04 14:42 UTC (permalink / raw)
  To: Raphael Norwitz
  Cc: Peter Turschmid, Raphael Norwitz, QEMU, Michael S. Tsirkin

[-- Attachment #1: Type: text/plain, Size: 6793 bytes --]

Hi

On Thu, May 21, 2020 at 7:00 AM Raphael Norwitz <raphael.norwitz@nutanix.com>
wrote:

> This change introduces a new feature to the vhost-user protocol allowing
> a backend device to specify the maximum number of ram slots it supports.
>
> At this point, the value returned by the backend will be capped at the
> maximum number of ram slots which can be supported by vhost-user, which
> is currently set to 8 because of underlying protocol limitations.
>
> The returned value will be stored inside the VhostUserState struct so
> that on device reconnect we can verify that the ram slot limitation
> has not decreased since the last time the device connected.
>
> Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
> Signed-off-by: Peter Turschmid <peter.turschm@nutanix.com>
>

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>

---
>  docs/interop/vhost-user.rst    | 16 ++++++++++++++
>  hw/virtio/vhost-user.c         | 49
> ++++++++++++++++++++++++++++++++++++++++--
>  include/hw/virtio/vhost-user.h |  1 +
>  3 files changed, 64 insertions(+), 2 deletions(-)
>
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index 3b1b660..b3cf5c3 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -815,6 +815,7 @@ Protocol features
>    #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD       12
>    #define VHOST_USER_PROTOCOL_F_RESET_DEVICE         13
>    #define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14
> +  #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS  15
>
>  Master message types
>  --------------------
> @@ -1263,6 +1264,21 @@ Master message types
>
>    The state.num field is currently reserved and must be set to 0.
>
> +``VHOST_USER_GET_MAX_MEM_SLOTS``
> +  :id: 36
> +  :equivalent ioctl: N/A
> +  :slave payload: u64
> +
> +  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
> +  feature has been successfully negotiated, this message is submitted
> +  by master to the slave. The slave should return the message with a
> +  u64 payload containing the maximum number of memory slots for
> +  QEMU to expose to the guest. At this point, the value returned
> +  by the backend will be capped at the maximum number of ram slots
> +  which can be supported by vhost-user. Currently that limit is set
> +  at VHOST_USER_MAX_RAM_SLOTS = 8 because of underlying protocol
> +  limitations.
> +
>  Slave message types
>  -------------------
>
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 442b0d6..0af593f 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -59,6 +59,8 @@ enum VhostUserProtocolFeature {
>      VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
>      VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD = 12,
>      VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
> +    /* Feature 14 reserved for
> VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS. */
> +    VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15,
>      VHOST_USER_PROTOCOL_F_MAX
>  };
>
> @@ -100,6 +102,8 @@ typedef enum VhostUserRequest {
>      VHOST_USER_SET_INFLIGHT_FD = 32,
>      VHOST_USER_GPU_SET_SOCKET = 33,
>      VHOST_USER_RESET_DEVICE = 34,
> +    /* Message number 35 reserved for VHOST_USER_VRING_KICK. */
> +    VHOST_USER_GET_MAX_MEM_SLOTS = 36,
>      VHOST_USER_MAX
>  } VhostUserRequest;
>
> @@ -895,6 +899,23 @@ static int vhost_user_set_owner(struct vhost_dev *dev)
>      return 0;
>  }
>
> +static int vhost_user_get_max_memslots(struct vhost_dev *dev,
> +                                       uint64_t *max_memslots)
> +{
> +    uint64_t backend_max_memslots;
> +    int err;
> +
> +    err = vhost_user_get_u64(dev, VHOST_USER_GET_MAX_MEM_SLOTS,
> +                             &backend_max_memslots);
> +    if (err < 0) {
> +        return err;
> +    }
> +
> +    *max_memslots = backend_max_memslots;
> +
> +    return 0;
> +}
> +
>  static int vhost_user_reset_device(struct vhost_dev *dev)
>  {
>      VhostUserMsg msg = {
> @@ -1392,7 +1413,7 @@ static int
> vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
>
>  static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque)
>  {
> -    uint64_t features, protocol_features;
> +    uint64_t features, protocol_features, ram_slots;
>      struct vhost_user *u;
>      int err;
>
> @@ -1454,6 +1475,27 @@ static int vhost_user_backend_init(struct vhost_dev
> *dev, void *opaque)
>                           "slave-req protocol features.");
>              return -1;
>          }
> +
> +        /* get max memory regions if backend supports configurable RAM
> slots */
> +        if (!virtio_has_feature(dev->protocol_features,
> +
> VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS)) {
> +            u->user->memory_slots = VHOST_MEMORY_MAX_NREGIONS;
> +        } else {
> +            err = vhost_user_get_max_memslots(dev, &ram_slots);
> +            if (err < 0) {
> +                return err;
> +            }
> +
> +            if (ram_slots < u->user->memory_slots) {
> +                error_report("The backend specified a max ram slots limit
> "
> +                             "of %lu, when the prior validated limit was
> %d. "
> +                             "This limit should never decrease.",
> ram_slots,
> +                             u->user->memory_slots);
> +                return -1;
> +            }
> +
> +            u->user->memory_slots = MIN(ram_slots,
> VHOST_MEMORY_MAX_NREGIONS);
> +        }
>      }
>
>      if (dev->migration_blocker == NULL &&
> @@ -1519,7 +1561,9 @@ static int vhost_user_get_vq_index(struct vhost_dev
> *dev, int idx)
>
>  static int vhost_user_memslots_limit(struct vhost_dev *dev)
>  {
> -    return VHOST_MEMORY_MAX_NREGIONS;
> +    struct vhost_user *u = dev->opaque;
> +
> +    return u->user->memory_slots;
>  }
>
>  static bool vhost_user_requires_shm_log(struct vhost_dev *dev)
> @@ -1904,6 +1948,7 @@ bool vhost_user_init(VhostUserState *user,
> CharBackend *chr, Error **errp)
>          return false;
>      }
>      user->chr = chr;
> +    user->memory_slots = 0;
>      return true;
>  }
>
> diff --git a/include/hw/virtio/vhost-user.h
> b/include/hw/virtio/vhost-user.h
> index 811e325..a9abca3 100644
> --- a/include/hw/virtio/vhost-user.h
> +++ b/include/hw/virtio/vhost-user.h
> @@ -20,6 +20,7 @@ typedef struct VhostUserHostNotifier {
>  typedef struct VhostUserState {
>      CharBackend *chr;
>      VhostUserHostNotifier notifier[VIRTIO_QUEUE_MAX];
> +    int memory_slots;
>  } VhostUserState;
>
>  bool vhost_user_init(VhostUserState *user, CharBackend *chr, Error
> **errp);
> --
> 1.8.3.1
>
>

-- 
Marc-André Lureau

[-- Attachment #2: Type: text/html, Size: 8344 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 04/10] Transmit vhost-user memory regions individually
  2020-05-21  5:00 ` [PATCH v4 04/10] Transmit vhost-user memory regions individually Raphael Norwitz
@ 2020-06-04 14:44   ` Marc-André Lureau
  2020-06-09 14:13     ` Raphael Norwitz
  0 siblings, 1 reply; 20+ messages in thread
From: Marc-André Lureau @ 2020-06-04 14:44 UTC (permalink / raw)
  To: Raphael Norwitz
  Cc: Peter Turschmid, Raphael Norwitz, QEMU, Swapnil Ingle,
	Michael S. Tsirkin

[-- Attachment #1: Type: text/plain, Size: 27023 bytes --]

Hi

On Thu, May 21, 2020 at 7:00 AM Raphael Norwitz <raphael.norwitz@nutanix.com>
wrote:

> With this change, when the VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
> protocol feature has been negotiated, Qemu no longer sends the backend
> all the memory regions in a single message. Rather, when the memory
> tables are set or updated, a series of VHOST_USER_ADD_MEM_REG and
> VHOST_USER_REM_MEM_REG messages are sent to transmit the regions to map
> and/or unmap instead of sending send all the regions in one fixed size
> VHOST_USER_SET_MEM_TABLE message.
>
> The vhost_user struct maintains a shadow state of the VM’s memory
> regions. When the memory tables are modified, the
> vhost_user_set_mem_table() function compares the new device memory state
> to the shadow state and only sends regions which need to be unmapped or
> mapped in. The regions which must be unmapped are sent first, followed
> by the new regions to be mapped in. After all the messages have been
> sent, the shadow state is set to the current virtual device state.
>
> Existing backends which do not support
> VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS are unaffected.
>
> Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
> Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
> Signed-off-by: Peter Turschmid <peter.turschm@nutanix.com>
> Suggested-by: Mike Cui <cui@nutanix.com>
>

The change is a bit tricky; why not add more pre-condition/post-condition
checks? This could really help in case we missed some OOB conditions.
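
For instance (only to illustrate the kind of thing I mean), something along
these lines in vhost_user_add_remove_regions():

    /* pre-conditions: the scrub arrays must be able to hold every region */
    assert(dev->mem->nregions <= VHOST_MEMORY_MAX_NREGIONS);
    assert(u->num_shadow_regions <= VHOST_MEMORY_MAX_NREGIONS);

    /* ... scrub_shadow_regions(...) ... */

    /* post-conditions: the computed counts stay within the arrays */
    assert(nr_add_reg <= VHOST_MEMORY_MAX_NREGIONS);
    assert(nr_rem_reg <= VHOST_MEMORY_MAX_NREGIONS);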

I would also use longer names: rem->remove, reg->registry, etc. I think
they improve readability.

Nonetheless, it looks ok and apparently 4 people already looked at it, so
for now:
Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>

---
>  docs/interop/vhost-user.rst |  33 ++-
>  hw/virtio/vhost-user.c      | 510
> +++++++++++++++++++++++++++++++++++++-------
>  2 files changed, 469 insertions(+), 74 deletions(-)
>
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index b3cf5c3..037eefa 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -1276,8 +1276,37 @@ Master message types
>    QEMU to expose to the guest. At this point, the value returned
>    by the backend will be capped at the maximum number of ram slots
>    which can be supported by vhost-user. Currently that limit is set
> -  at VHOST_USER_MAX_RAM_SLOTS = 8 because of underlying protocol
> -  limitations.
> +  at VHOST_USER_MAX_RAM_SLOTS = 8.
> +
> +``VHOST_USER_ADD_MEM_REG``
> +  :id: 37
> +  :equivalent ioctl: N/A
> +  :slave payload: memory region
> +
> +  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
> +  feature has been successfully negotiated, this message is submitted
> +  by the master to the slave. The message payload contains a memory
> +  region descriptor struct, describing a region of guest memory which
> +  the slave device must map in. When the
> +  ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol feature has
> +  been successfully negotiated, along with the
> +  ``VHOST_USER_REM_MEM_REG`` message, this message is used to set and
> +  update the memory tables of the slave device.
> +
> +``VHOST_USER_REM_MEM_REG``
> +  :id: 38
> +  :equivalent ioctl: N/A
> +  :slave payload: memory region
> +
> +  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
> +  feature has been successfully negotiated, this message is submitted
> +  by the master to the slave. The message payload contains a memory
> +  region descriptor struct, describing a region of guest memory which
> +  the slave device must unmap. When the
> +  ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol feature has
> +  been successfully negotiated, along with the
> +  ``VHOST_USER_ADD_MEM_REG`` message, this message is used to set and
> +  update the memory tables of the slave device.
>
>  Slave message types
>  -------------------
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 0af593f..9358406 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -104,6 +104,8 @@ typedef enum VhostUserRequest {
>      VHOST_USER_RESET_DEVICE = 34,
>      /* Message number 35 reserved for VHOST_USER_VRING_KICK. */
>      VHOST_USER_GET_MAX_MEM_SLOTS = 36,
> +    VHOST_USER_ADD_MEM_REG = 37,
> +    VHOST_USER_REM_MEM_REG = 38,
>      VHOST_USER_MAX
>  } VhostUserRequest;
>
> @@ -128,6 +130,11 @@ typedef struct VhostUserMemory {
>      VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
>  } VhostUserMemory;
>
> +typedef struct VhostUserMemRegMsg {
> +    uint32_t padding;
> +    VhostUserMemoryRegion region;
> +} VhostUserMemRegMsg;
> +
>  typedef struct VhostUserLog {
>      uint64_t mmap_size;
>      uint64_t mmap_offset;
> @@ -186,6 +193,7 @@ typedef union {
>          struct vhost_vring_state state;
>          struct vhost_vring_addr addr;
>          VhostUserMemory memory;
> +        VhostUserMemRegMsg mem_reg;
>          VhostUserLog log;
>          struct vhost_iotlb_msg iotlb;
>          VhostUserConfig config;
> @@ -226,6 +234,16 @@ struct vhost_user {
>
>      /* True once we've entered postcopy_listen */
>      bool               postcopy_listen;
> +
> +    /* Our current regions */
> +    int num_shadow_regions;
> +    struct vhost_memory_region shadow_regions[VHOST_MEMORY_MAX_NREGIONS];
> +};
> +
> +struct scrub_regions {
> +    struct vhost_memory_region *region;
> +    int reg_idx;
> +    int fd_idx;
>  };
>
>  static bool ioeventfd_enabled(void)
> @@ -489,8 +507,332 @@ static int vhost_user_fill_set_mem_table_msg(struct
> vhost_user *u,
>      return 1;
>  }
>
> +static inline bool reg_equal(struct vhost_memory_region *shadow_reg,
> +                             struct vhost_memory_region *vdev_reg)
> +{
> +    return shadow_reg->guest_phys_addr == vdev_reg->guest_phys_addr &&
> +        shadow_reg->userspace_addr == vdev_reg->userspace_addr &&
> +        shadow_reg->memory_size == vdev_reg->memory_size;
> +}
> +
> +static void scrub_shadow_regions(struct vhost_dev *dev,
> +                                 struct scrub_regions *add_reg,
> +                                 int *nr_add_reg,
> +                                 struct scrub_regions *rem_reg,
> +                                 int *nr_rem_reg, uint64_t *shadow_pcb,
> +                                 bool track_ramblocks)
> +{
> +    struct vhost_user *u = dev->opaque;
> +    bool found[VHOST_MEMORY_MAX_NREGIONS] = {};
> +    struct vhost_memory_region *reg, *shadow_reg;
> +    int i, j, fd, add_idx = 0, rm_idx = 0, fd_num = 0;
> +    ram_addr_t offset;
> +    MemoryRegion *mr;
> +    bool matching;
> +
> +    /*
> +     * Find memory regions present in our shadow state which are not in
> +     * the device's current memory state.
> +     *
> +     * Mark regions in both the shadow and device state as "found".
> +     */
> +    for (i = 0; i < u->num_shadow_regions; i++) {
> +        shadow_reg = &u->shadow_regions[i];
> +        matching = false;
> +
> +        for (j = 0; j < dev->mem->nregions; j++) {
> +            reg = &dev->mem->regions[j];
> +
> +            mr = vhost_user_get_mr_data(reg->userspace_addr, &offset,
> &fd);
> +
> +            if (reg_equal(shadow_reg, reg)) {
> +                matching = true;
> +                found[j] = true;
> +                if (track_ramblocks) {
> +                    /*
> +                     * Reset postcopy client bases, region_rb, and
> +                     * region_rb_offset in case regions are removed.
> +                     */
> +                    if (fd > 0) {
> +                        u->region_rb_offset[j] = offset;
> +                        u->region_rb[j] = mr->ram_block;
> +                        shadow_pcb[j] = u->postcopy_client_bases[i];
> +                    } else {
> +                        u->region_rb_offset[j] = 0;
> +                        u->region_rb[j] = NULL;
> +                    }
> +                }
> +                break;
> +            }
> +        }
> +
> +        /*
> +         * If the region was not found in the current device memory state
> +         * create an entry for it in the removed list.
> +         */
> +        if (!matching) {
> +            rem_reg[rm_idx].region = shadow_reg;
> +            rem_reg[rm_idx++].reg_idx = i;
> +        }
> +    }
> +
> +    /*
> +     * For regions not marked "found", create entries in the added list.
> +     *
> +     * Note their indexes in the device memory state and the indexes of
> their
> +     * file descriptors.
> +     */
> +    for (i = 0; i < dev->mem->nregions; i++) {
> +        reg = &dev->mem->regions[i];
> +        mr = vhost_user_get_mr_data(reg->userspace_addr, &offset, &fd);
> +        if (fd > 0) {
> +            ++fd_num;
> +        }
> +
> +        /*
> +         * If the region was in both the shadow and device state we don't
> +         * need to send a VHOST_USER_ADD_MEM_REG message for it.
> +         */
> +        if (found[i]) {
> +            continue;
> +        }
> +
> +        add_reg[add_idx].region = reg;
> +        add_reg[add_idx].reg_idx = i;
> +        add_reg[add_idx++].fd_idx = fd_num;
> +    }
> +    *nr_rem_reg = rm_idx;
> +    *nr_add_reg = add_idx;
> +
> +    return;
> +}
> +
> +static int send_remove_regions(struct vhost_dev *dev,
> +                               struct scrub_regions *remove_reg,
> +                               int nr_rem_reg, VhostUserMsg *msg,
> +                               bool reply_supported)
> +{
> +    struct vhost_user *u = dev->opaque;
> +    struct vhost_memory_region *shadow_reg;
> +    int i, fd, shadow_reg_idx, ret;
> +    ram_addr_t offset;
> +    VhostUserMemoryRegion region_buffer;
> +
> +    /*
> +     * The regions in remove_reg appear in the same order they do in the
> +     * shadow table. Therefore we can minimize memory copies by iterating
> +     * through remove_reg backwards.
> +     */
> +    for (i = nr_rem_reg - 1; i >= 0; i--) {
> +        shadow_reg = remove_reg[i].region;
> +        shadow_reg_idx = remove_reg[i].reg_idx;
> +
> +        vhost_user_get_mr_data(shadow_reg->userspace_addr, &offset, &fd);
> +
> +        if (fd > 0) {
> +            msg->hdr.request = VHOST_USER_REM_MEM_REG;
> +            vhost_user_fill_msg_region(&region_buffer, shadow_reg);
> +            msg->payload.mem_reg.region = region_buffer;
> +
> +            if (vhost_user_write(dev, msg, &fd, 1) < 0) {
> +                return -1;
> +            }
> +
> +            if (reply_supported) {
> +                ret = process_message_reply(dev, msg);
> +                if (ret) {
> +                    return ret;
> +                }
> +            }
> +        }
> +
> +        /*
> +         * At this point we know the backend has unmapped the region. It
> is now
> +         * safe to remove it from the shadow table.
> +         */
> +        memmove(&u->shadow_regions[shadow_reg_idx],
> +                &u->shadow_regions[shadow_reg_idx + 1],
> +                sizeof(struct vhost_memory_region) *
> +                (u->num_shadow_regions - shadow_reg_idx));
> +        u->num_shadow_regions--;
> +    }
> +
> +    return 0;
> +}
> +
> +static int send_add_regions(struct vhost_dev *dev,
> +                            struct scrub_regions *add_reg, int nr_add_reg,
> +                            VhostUserMsg *msg, uint64_t *shadow_pcb,
> +                            bool reply_supported, bool track_ramblocks)
> +{
> +    struct vhost_user *u = dev->opaque;
> +    int i, fd, ret, reg_idx, reg_fd_idx;
> +    struct vhost_memory_region *reg;
> +    MemoryRegion *mr;
> +    ram_addr_t offset;
> +    VhostUserMsg msg_reply;
> +    VhostUserMemoryRegion region_buffer;
> +
> +    for (i = 0; i < nr_add_reg; i++) {
> +        reg = add_reg[i].region;
> +        reg_idx = add_reg[i].reg_idx;
> +        reg_fd_idx = add_reg[i].fd_idx;
> +
> +        mr = vhost_user_get_mr_data(reg->userspace_addr, &offset, &fd);
> +
> +        if (fd > 0) {
> +            if (track_ramblocks) {
> +                trace_vhost_user_set_mem_table_withfd(reg_fd_idx,
> mr->name,
> +                                                      reg->memory_size,
> +
> reg->guest_phys_addr,
> +                                                      reg->userspace_addr,
> +                                                      offset);
> +                u->region_rb_offset[reg_idx] = offset;
> +                u->region_rb[reg_idx] = mr->ram_block;
> +            }
> +            msg->hdr.request = VHOST_USER_ADD_MEM_REG;
> +            vhost_user_fill_msg_region(&region_buffer, reg);
> +            msg->payload.mem_reg.region = region_buffer;
> +            msg->payload.mem_reg.region.mmap_offset = offset;
> +
> +            if (vhost_user_write(dev, msg, &fd, 1) < 0) {
> +                return -1;
> +            }
> +
> +            if (track_ramblocks) {
> +                uint64_t reply_gpa;
> +
> +                if (vhost_user_read(dev, &msg_reply) < 0) {
> +                    return -1;
> +                }
> +
> +                reply_gpa =
> msg_reply.payload.mem_reg.region.guest_phys_addr;
> +
> +                if (msg_reply.hdr.request != VHOST_USER_ADD_MEM_REG) {
> +                    error_report("%s: Received unexpected msg type."
> +                                 "Expected %d received %d", __func__,
> +                                 VHOST_USER_ADD_MEM_REG,
> +                                 msg_reply.hdr.request);
> +                    return -1;
> +                }
> +
> +                /*
> +                 * We're using the same structure, just reusing one of the
> +                 * fields, so it should be the same size.
> +                 */
> +                if (msg_reply.hdr.size != msg->hdr.size) {
> +                    error_report("%s: Unexpected size for postcopy reply "
> +                                 "%d vs %d", __func__, msg_reply.hdr.size,
> +                                 msg->hdr.size);
> +                    return -1;
> +                }
> +
> +                /* Get the postcopy client base from the backend's reply.
> */
> +                if (reply_gpa ==
> dev->mem->regions[reg_idx].guest_phys_addr) {
> +                    shadow_pcb[reg_idx] =
> +                        msg_reply.payload.mem_reg.region.userspace_addr;
> +                    trace_vhost_user_set_mem_table_postcopy(
> +                        msg_reply.payload.mem_reg.region.userspace_addr,
> +                        msg->payload.mem_reg.region.userspace_addr,
> +                        reg_fd_idx, reg_idx);
> +                } else {
> +                    error_report("%s: invalid postcopy reply for region. "
> +                                 "Got guest physical address %lX,
> expected "
> +                                 "%lX", __func__, reply_gpa,
> +
>  dev->mem->regions[reg_idx].guest_phys_addr);
> +                    return -1;
> +                }
> +            } else if (reply_supported) {
> +                ret = process_message_reply(dev, msg);
> +                if (ret) {
> +                    return ret;
> +                }
> +            }
> +        } else if (track_ramblocks) {
> +            u->region_rb_offset[reg_idx] = 0;
> +            u->region_rb[reg_idx] = NULL;
> +        }
> +
> +        /*
> +         * At this point, we know the backend has mapped in the new
> +         * region, if the region has a valid file descriptor.
> +         *
> +         * The region should now be added to the shadow table.
> +         */
> +        u->shadow_regions[u->num_shadow_regions].guest_phys_addr =
> +            reg->guest_phys_addr;
> +        u->shadow_regions[u->num_shadow_regions].userspace_addr =
> +            reg->userspace_addr;
> +        u->shadow_regions[u->num_shadow_regions].memory_size =
> +            reg->memory_size;
> +        u->num_shadow_regions++;
> +    }
> +
> +    return 0;
> +}
> +
> +static int vhost_user_add_remove_regions(struct vhost_dev *dev,
> +                                         VhostUserMsg *msg,
> +                                         bool reply_supported,
> +                                         bool track_ramblocks)
> +{
> +    struct vhost_user *u = dev->opaque;
> +    struct scrub_regions add_reg[VHOST_MEMORY_MAX_NREGIONS];
> +    struct scrub_regions rem_reg[VHOST_MEMORY_MAX_NREGIONS];
> +    uint64_t shadow_pcb[VHOST_MEMORY_MAX_NREGIONS] = {};
> +    int nr_add_reg, nr_rem_reg;
> +
> +    msg->hdr.size = sizeof(msg->payload.mem_reg.padding) +
> +        sizeof(VhostUserMemoryRegion);
> +
> +    /* Find the regions which need to be removed or added. */
> +    scrub_shadow_regions(dev, add_reg, &nr_add_reg, rem_reg, &nr_rem_reg,
> +                         shadow_pcb, track_ramblocks);
> +
> +    if (nr_rem_reg && send_remove_regions(dev, rem_reg, nr_rem_reg, msg,
> +                reply_supported) < 0)
> +    {
> +        goto err;
> +    }
> +
> +    if (nr_add_reg && send_add_regions(dev, add_reg, nr_add_reg, msg,
> +                shadow_pcb, reply_supported, track_ramblocks) < 0)
> +    {
> +        goto err;
> +    }
> +
> +    if (track_ramblocks) {
> +        memcpy(u->postcopy_client_bases, shadow_pcb,
> +               sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS);
> +        /*
> +         * Now we've registered this with the postcopy code, we ack to the
> +         * client, because now we're in the position to be able to deal
> with
> +         * any faults it generates.
> +         */
> +        /* TODO: Use this for failure cases as well with a bad value. */
> +        msg->hdr.size = sizeof(msg->payload.u64);
> +        msg->payload.u64 = 0; /* OK */
> +
> +        if (vhost_user_write(dev, msg, NULL, 0) < 0) {
> +            return -1;
> +        }
> +    }
> +
> +    return 0;
> +
> +err:
> +    if (track_ramblocks) {
> +        memcpy(u->postcopy_client_bases, shadow_pcb,
> +               sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS);
> +    }
> +
> +    return -1;
> +}
> +
>  static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
> -                                             struct vhost_memory *mem)
> +                                             struct vhost_memory *mem,
> +                                             bool reply_supported,
> +                                             bool config_mem_slots)
>  {
>      struct vhost_user *u = dev->opaque;
>      int fds[VHOST_MEMORY_MAX_NREGIONS];
> @@ -513,71 +855,84 @@ static int vhost_user_set_mem_table_postcopy(struct
> vhost_dev *dev,
>          u->region_rb_len = dev->mem->nregions;
>      }
>
> -    if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num,
> +    if (config_mem_slots) {
> +        if (vhost_user_add_remove_regions(dev, &msg, reply_supported,
>                                            true) < 0) {
> -        return -1;
> -    }
> +            return -1;
> +        }
> +    } else {
> +        if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num,
> +                                              true) < 0) {
> +            return -1;
> +        }
>
> -    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
> -        return -1;
> -    }
> +        if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
> +            return -1;
> +        }
>
> -    if (vhost_user_read(dev, &msg_reply) < 0) {
> -        return -1;
> -    }
> +        if (vhost_user_read(dev, &msg_reply) < 0) {
> +            return -1;
> +        }
>
> -    if (msg_reply.hdr.request != VHOST_USER_SET_MEM_TABLE) {
> -        error_report("%s: Received unexpected msg type."
> -                     "Expected %d received %d", __func__,
> -                     VHOST_USER_SET_MEM_TABLE, msg_reply.hdr.request);
> -        return -1;
> -    }
> -    /* We're using the same structure, just reusing one of the
> -     * fields, so it should be the same size.
> -     */
> -    if (msg_reply.hdr.size != msg.hdr.size) {
> -        error_report("%s: Unexpected size for postcopy reply "
> -                     "%d vs %d", __func__, msg_reply.hdr.size,
> msg.hdr.size);
> -        return -1;
> -    }
> -
> -    memset(u->postcopy_client_bases, 0,
> -           sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS);
> -
> -    /* They're in the same order as the regions that were sent
> -     * but some of the regions were skipped (above) if they
> -     * didn't have fd's
> -    */
> -    for (msg_i = 0, region_i = 0;
> -         region_i < dev->mem->nregions;
> -        region_i++) {
> -        if (msg_i < fd_num &&
> -            msg_reply.payload.memory.regions[msg_i].guest_phys_addr ==
> -            dev->mem->regions[region_i].guest_phys_addr) {
> -            u->postcopy_client_bases[region_i] =
> -                msg_reply.payload.memory.regions[msg_i].userspace_addr;
> -            trace_vhost_user_set_mem_table_postcopy(
> -                msg_reply.payload.memory.regions[msg_i].userspace_addr,
> -                msg.payload.memory.regions[msg_i].userspace_addr,
> -                msg_i, region_i);
> -            msg_i++;
> +        if (msg_reply.hdr.request != VHOST_USER_SET_MEM_TABLE) {
> +            error_report("%s: Received unexpected msg type."
> +                         "Expected %d received %d", __func__,
> +                         VHOST_USER_SET_MEM_TABLE, msg_reply.hdr.request);
> +            return -1;
> +        }
> +
> +        /*
> +         * We're using the same structure, just reusing one of the
> +         * fields, so it should be the same size.
> +         */
> +        if (msg_reply.hdr.size != msg.hdr.size) {
> +            error_report("%s: Unexpected size for postcopy reply "
> +                         "%d vs %d", __func__, msg_reply.hdr.size,
> +                         msg.hdr.size);
> +            return -1;
> +        }
> +
> +        memset(u->postcopy_client_bases, 0,
> +               sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS);
> +
> +        /*
> +         * They're in the same order as the regions that were sent
> +         * but some of the regions were skipped (above) if they
> +         * didn't have fd's
> +         */
> +        for (msg_i = 0, region_i = 0;
> +             region_i < dev->mem->nregions;
> +             region_i++) {
> +            if (msg_i < fd_num &&
> +                msg_reply.payload.memory.regions[msg_i].guest_phys_addr ==
> +                dev->mem->regions[region_i].guest_phys_addr) {
> +                u->postcopy_client_bases[region_i] =
> +
> msg_reply.payload.memory.regions[msg_i].userspace_addr;
> +                trace_vhost_user_set_mem_table_postcopy(
> +
> msg_reply.payload.memory.regions[msg_i].userspace_addr,
> +                    msg.payload.memory.regions[msg_i].userspace_addr,
> +                    msg_i, region_i);
> +                msg_i++;
> +            }
> +        }
> +        if (msg_i != fd_num) {
> +            error_report("%s: postcopy reply not fully consumed "
> +                         "%d vs %zd",
> +                         __func__, msg_i, fd_num);
> +            return -1;
> +        }
> +
> +        /*
> +         * Now we've registered this with the postcopy code, we ack to the
> +         * client, because now we're in the position to be able to deal
> +         * with any faults it generates.
> +         */
> +        /* TODO: Use this for failure cases as well with a bad value. */
> +        msg.hdr.size = sizeof(msg.payload.u64);
> +        msg.payload.u64 = 0; /* OK */
> +        if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> +            return -1;
>          }
> -    }
> -    if (msg_i != fd_num) {
> -        error_report("%s: postcopy reply not fully consumed "
> -                     "%d vs %zd",
> -                     __func__, msg_i, fd_num);
> -        return -1;
> -    }
> -    /* Now we've registered this with the postcopy code, we ack to the
> client,
> -     * because now we're in the position to be able to deal with any
> faults
> -     * it generates.
> -     */
> -    /* TODO: Use this for failure cases as well with a bad value */
> -    msg.hdr.size = sizeof(msg.payload.u64);
> -    msg.payload.u64 = 0; /* OK */
> -    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> -        return -1;
>      }
>
>      return 0;
> @@ -592,12 +947,17 @@ static int vhost_user_set_mem_table(struct vhost_dev
> *dev,
>      bool do_postcopy = u->postcopy_listen && u->postcopy_fd.handler;
>      bool reply_supported = virtio_has_feature(dev->protocol_features,
>
>  VHOST_USER_PROTOCOL_F_REPLY_ACK);
> +    bool config_mem_slots =
> +        virtio_has_feature(dev->protocol_features,
> +                           VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS);
>
>      if (do_postcopy) {
> -        /* Postcopy has enough differences that it's best done in it's own
> +        /*
> +         * Postcopy has enough differences that it's best done in it's own
>           * version
>           */
> -        return vhost_user_set_mem_table_postcopy(dev, mem);
> +        return vhost_user_set_mem_table_postcopy(dev, mem,
> reply_supported,
> +                                                 config_mem_slots);
>      }
>
>      VhostUserMsg msg = {
> @@ -608,17 +968,23 @@ static int vhost_user_set_mem_table(struct vhost_dev
> *dev,
>          msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
>      }
>
> -    if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num,
> +    if (config_mem_slots) {
> +        if (vhost_user_add_remove_regions(dev, &msg, reply_supported,
>                                            false) < 0) {
> -        return -1;
> -    }
> -
> -    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
> -        return -1;
> -    }
> +            return -1;
> +        }
> +    } else {
> +        if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num,
> +                                              false) < 0) {
> +            return -1;
> +        }
> +        if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
> +            return -1;
> +        }
>
> -    if (reply_supported) {
> -        return process_message_reply(dev, &msg);
> +        if (reply_supported) {
> +            return process_message_reply(dev, &msg);
> +        }
>      }
>
>      return 0;
> --
> 1.8.3.1
>
>

-- 
Marc-André Lureau


* Re: [PATCH v4 05/10] Lift max memory slots limit imposed by vhost-user
  2020-05-21  5:00 ` [PATCH v4 05/10] Lift max memory slots limit imposed by vhost-user Raphael Norwitz
@ 2020-06-04 14:45   ` Marc-André Lureau
  0 siblings, 0 replies; 20+ messages in thread
From: Marc-André Lureau @ 2020-06-04 14:45 UTC (permalink / raw)
  To: Raphael Norwitz
  Cc: Peter Turschmid, Raphael Norwitz, QEMU, Michael S. Tsirkin

On Thu, May 21, 2020 at 7:00 AM Raphael Norwitz <raphael.norwitz@nutanix.com>
wrote:

> Historically, sending all memory regions to vhost-user backends in a
> single message imposed a limitation on the number of times memory
> could be hot-added to a VM with a vhost-user device. Now that backends
> which support the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS send memory
> regions individually, we no longer need to impose this limitation on
> devices which support this feature.
>
> With this change, VMs with a vhost-user device which supports the
> VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS can support a configurable
> number of memory slots, up to the maximum allowed by the target
> platform.
>
> Existing backends which do not support
> VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS are unaffected.
>
> Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
> Signed-off-by: Peter Turschmid <peter.turschm@nutanix.com>
> Suggested-by: Mike Cui <cui@nutanix.com>
>

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>

---
>  docs/interop/vhost-user.rst |  7 +++---
>  hw/virtio/vhost-user.c      | 56
> ++++++++++++++++++++++++++++++---------------
>  2 files changed, 40 insertions(+), 23 deletions(-)
>
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index 037eefa..688b7c6 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -1273,10 +1273,9 @@ Master message types
>    feature has been successfully negotiated, this message is submitted
>    by master to the slave. The slave should return the message with a
>    u64 payload containing the maximum number of memory slots for
> -  QEMU to expose to the guest. At this point, the value returned
> -  by the backend will be capped at the maximum number of ram slots
> -  which can be supported by vhost-user. Currently that limit is set
> -  at VHOST_USER_MAX_RAM_SLOTS = 8.
> +  QEMU to expose to the guest. The value returned by the backend
> +  will be capped at the maximum number of ram slots which can be
> +  supported by the target platform.
>
>  ``VHOST_USER_ADD_MEM_REG``
>    :id: 37
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 9358406..48b8081 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -35,11 +35,29 @@
>  #include <linux/userfaultfd.h>
>  #endif
>
> -#define VHOST_MEMORY_MAX_NREGIONS    8
> +#define VHOST_MEMORY_BASELINE_NREGIONS    8
>  #define VHOST_USER_F_PROTOCOL_FEATURES 30
>  #define VHOST_USER_SLAVE_MAX_FDS     8
>
>  /*
> + * Set maximum number of RAM slots supported to
> + * the maximum number supported by the target
> + * hardware platform.
> + */
> +#if defined(TARGET_X86) || defined(TARGET_X86_64) || \
> +    defined(TARGET_ARM) || defined(TARGET_ARM_64)
> +#include "hw/acpi/acpi.h"
> +#define VHOST_USER_MAX_RAM_SLOTS ACPI_MAX_RAM_SLOTS
> +
> +#elif defined(TARGET_PPC) || defined(TARGET_PPC_64)
> +#include "hw/ppc/spapr.h"
> +#define VHOST_USER_MAX_RAM_SLOTS SPAPR_MAX_RAM_SLOTS
> +
> +#else
> +#define VHOST_USER_MAX_RAM_SLOTS 512
> +#endif
> +
> +/*
>   * Maximum size of virtio device config space
>   */
>  #define VHOST_USER_MAX_CONFIG_SIZE 256
> @@ -127,7 +145,7 @@ typedef struct VhostUserMemoryRegion {
>  typedef struct VhostUserMemory {
>      uint32_t nregions;
>      uint32_t padding;
> -    VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
> +    VhostUserMemoryRegion regions[VHOST_MEMORY_BASELINE_NREGIONS];
>  } VhostUserMemory;
>
>  typedef struct VhostUserMemRegMsg {
> @@ -222,7 +240,7 @@ struct vhost_user {
>      int slave_fd;
>      NotifierWithReturn postcopy_notifier;
>      struct PostCopyFD  postcopy_fd;
> -    uint64_t           postcopy_client_bases[VHOST_MEMORY_MAX_NREGIONS];
> +    uint64_t           postcopy_client_bases[VHOST_USER_MAX_RAM_SLOTS];
>      /* Length of the region_rb and region_rb_offset arrays */
>      size_t             region_rb_len;
>      /* RAMBlock associated with a given region */
> @@ -237,7 +255,7 @@ struct vhost_user {
>
>      /* Our current regions */
>      int num_shadow_regions;
> -    struct vhost_memory_region shadow_regions[VHOST_MEMORY_MAX_NREGIONS];
> +    struct vhost_memory_region shadow_regions[VHOST_USER_MAX_RAM_SLOTS];
>  };
>
>  struct scrub_regions {
> @@ -392,7 +410,7 @@ int vhost_user_gpu_set_socket(struct vhost_dev *dev,
> int fd)
>  static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
>                                     struct vhost_log *log)
>  {
> -    int fds[VHOST_MEMORY_MAX_NREGIONS];
> +    int fds[VHOST_USER_MAX_RAM_SLOTS];
>      size_t fd_num = 0;
>      bool shmfd = virtio_has_feature(dev->protocol_features,
>                                      VHOST_USER_PROTOCOL_F_LOG_SHMFD);
> @@ -470,7 +488,7 @@ static int vhost_user_fill_set_mem_table_msg(struct
> vhost_user *u,
>          mr = vhost_user_get_mr_data(reg->userspace_addr, &offset, &fd);
>          if (fd > 0) {
>              if (track_ramblocks) {
> -                assert(*fd_num < VHOST_MEMORY_MAX_NREGIONS);
> +                assert(*fd_num < VHOST_MEMORY_BASELINE_NREGIONS);
>                  trace_vhost_user_set_mem_table_withfd(*fd_num, mr->name,
>                                                        reg->memory_size,
>
>  reg->guest_phys_addr,
> @@ -478,7 +496,7 @@ static int vhost_user_fill_set_mem_table_msg(struct
> vhost_user *u,
>                                                        offset);
>                  u->region_rb_offset[i] = offset;
>                  u->region_rb[i] = mr->ram_block;
> -            } else if (*fd_num == VHOST_MEMORY_MAX_NREGIONS) {
> +            } else if (*fd_num == VHOST_MEMORY_BASELINE_NREGIONS) {
>                  error_report("Failed preparing vhost-user memory table
> msg");
>                  return -1;
>              }
> @@ -523,7 +541,7 @@ static void scrub_shadow_regions(struct vhost_dev *dev,
>                                   bool track_ramblocks)
>  {
>      struct vhost_user *u = dev->opaque;
> -    bool found[VHOST_MEMORY_MAX_NREGIONS] = {};
> +    bool found[VHOST_USER_MAX_RAM_SLOTS] = {};
>      struct vhost_memory_region *reg, *shadow_reg;
>      int i, j, fd, add_idx = 0, rm_idx = 0, fd_num = 0;
>      ram_addr_t offset;
> @@ -777,9 +795,9 @@ static int vhost_user_add_remove_regions(struct
> vhost_dev *dev,
>                                           bool track_ramblocks)
>  {
>      struct vhost_user *u = dev->opaque;
> -    struct scrub_regions add_reg[VHOST_MEMORY_MAX_NREGIONS];
> -    struct scrub_regions rem_reg[VHOST_MEMORY_MAX_NREGIONS];
> -    uint64_t shadow_pcb[VHOST_MEMORY_MAX_NREGIONS] = {};
> +    struct scrub_regions add_reg[VHOST_USER_MAX_RAM_SLOTS];
> +    struct scrub_regions rem_reg[VHOST_USER_MAX_RAM_SLOTS];
> +    uint64_t shadow_pcb[VHOST_USER_MAX_RAM_SLOTS] = {};
>      int nr_add_reg, nr_rem_reg;
>
>      msg->hdr.size = sizeof(msg->payload.mem_reg.padding) +
> @@ -803,7 +821,7 @@ static int vhost_user_add_remove_regions(struct
> vhost_dev *dev,
>
>      if (track_ramblocks) {
>          memcpy(u->postcopy_client_bases, shadow_pcb,
> -               sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS);
> +               sizeof(uint64_t) * VHOST_USER_MAX_RAM_SLOTS);
>          /*
>           * Now we've registered this with the postcopy code, we ack to the
>           * client, because now we're in the position to be able to deal
> with
> @@ -823,7 +841,7 @@ static int vhost_user_add_remove_regions(struct
> vhost_dev *dev,
>  err:
>      if (track_ramblocks) {
>          memcpy(u->postcopy_client_bases, shadow_pcb,
> -               sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS);
> +               sizeof(uint64_t) * VHOST_USER_MAX_RAM_SLOTS);
>      }
>
>      return -1;
> @@ -835,7 +853,7 @@ static int vhost_user_set_mem_table_postcopy(struct
> vhost_dev *dev,
>                                               bool config_mem_slots)
>  {
>      struct vhost_user *u = dev->opaque;
> -    int fds[VHOST_MEMORY_MAX_NREGIONS];
> +    int fds[VHOST_MEMORY_BASELINE_NREGIONS];
>      size_t fd_num = 0;
>      VhostUserMsg msg_reply;
>      int region_i, msg_i;
> @@ -893,7 +911,7 @@ static int vhost_user_set_mem_table_postcopy(struct
> vhost_dev *dev,
>          }
>
>          memset(u->postcopy_client_bases, 0,
> -               sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS);
> +               sizeof(uint64_t) * VHOST_USER_MAX_RAM_SLOTS);
>
>          /*
>           * They're in the same order as the regions that were sent
> @@ -942,7 +960,7 @@ static int vhost_user_set_mem_table(struct vhost_dev
> *dev,
>                                      struct vhost_memory *mem)
>  {
>      struct vhost_user *u = dev->opaque;
> -    int fds[VHOST_MEMORY_MAX_NREGIONS];
> +    int fds[VHOST_MEMORY_BASELINE_NREGIONS];
>      size_t fd_num = 0;
>      bool do_postcopy = u->postcopy_listen && u->postcopy_fd.handler;
>      bool reply_supported = virtio_has_feature(dev->protocol_features,
> @@ -1149,7 +1167,7 @@ static int vhost_set_vring_file(struct vhost_dev
> *dev,
>                                  VhostUserRequest request,
>                                  struct vhost_vring_file *file)
>  {
> -    int fds[VHOST_MEMORY_MAX_NREGIONS];
> +    int fds[VHOST_USER_MAX_RAM_SLOTS];
>      size_t fd_num = 0;
>      VhostUserMsg msg = {
>          .hdr.request = request,
> @@ -1845,7 +1863,7 @@ static int vhost_user_backend_init(struct vhost_dev
> *dev, void *opaque)
>          /* get max memory regions if backend supports configurable RAM
> slots */
>          if (!virtio_has_feature(dev->protocol_features,
>
>  VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS)) {
> -            u->user->memory_slots = VHOST_MEMORY_MAX_NREGIONS;
> +            u->user->memory_slots = VHOST_MEMORY_BASELINE_NREGIONS;
>          } else {
>              err = vhost_user_get_max_memslots(dev, &ram_slots);
>              if (err < 0) {
> @@ -1860,7 +1878,7 @@ static int vhost_user_backend_init(struct vhost_dev
> *dev, void *opaque)
>                  return -1;
>              }
>
> -            u->user->memory_slots = MIN(ram_slots,
> VHOST_MEMORY_MAX_NREGIONS);
> +            u->user->memory_slots = MIN(ram_slots,
> VHOST_USER_MAX_RAM_SLOTS);
>          }
>      }
>
> --
> 1.8.3.1
>
>

-- 
Marc-André Lureau


* Re: [PATCH v4 06/10] Refactor out libvhost-user fault generation logic
  2020-05-21  5:00 ` [PATCH v4 06/10] Refactor out libvhost-user fault generation logic Raphael Norwitz
@ 2020-06-04 14:48   ` Marc-André Lureau
  0 siblings, 0 replies; 20+ messages in thread
From: Marc-André Lureau @ 2020-06-04 14:48 UTC (permalink / raw)
  To: Raphael Norwitz; +Cc: Raphael Norwitz, QEMU, Michael S. Tsirkin

Hi

On Thu, May 21, 2020 at 7:00 AM Raphael Norwitz <raphael.norwitz@nutanix.com>
wrote:

> In libvhost-user, the incoming postcopy migration path for setting the
> backend's memory tables has become convoluted. In particular, the logic
> which starts generating faults once the final ACK from qemu has been
> received can be moved to a separate function. This simplifies the code
> substantially.
>
> This logic will also be needed by the postcopy path once the
> VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS feature is supported.
>
> Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
>

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>

---
>  contrib/libvhost-user/libvhost-user.c | 147
> ++++++++++++++++++----------------
>  1 file changed, 79 insertions(+), 68 deletions(-)
>
> diff --git a/contrib/libvhost-user/libvhost-user.c
> b/contrib/libvhost-user/libvhost-user.c
> index 3bca996..cccfa22 100644
> --- a/contrib/libvhost-user/libvhost-user.c
> +++ b/contrib/libvhost-user/libvhost-user.c
> @@ -584,6 +584,84 @@ map_ring(VuDev *dev, VuVirtq *vq)
>  }
>
>  static bool
> +generate_faults(VuDev *dev) {
> +    int i;
> +    for (i = 0; i < dev->nregions; i++) {
> +        VuDevRegion *dev_region = &dev->regions[i];
> +        int ret;
> +#ifdef UFFDIO_REGISTER
> +        /*
> +         * We should already have an open ufd. Mark each memory
> +         * range as ufd.
> +         * Discard any mapping we have here; note I can't use MADV_REMOVE
> +         * or fallocate to make the hole since I don't want to lose
> +         * data that's already arrived in the shared process.
> +         * TODO: How to do hugepage
> +         */
> +        ret = madvise((void *)(uintptr_t)dev_region->mmap_addr,
> +                      dev_region->size + dev_region->mmap_offset,
> +                      MADV_DONTNEED);
> +        if (ret) {
> +            fprintf(stderr,
> +                    "%s: Failed to madvise(DONTNEED) region %d: %s\n",
> +                    __func__, i, strerror(errno));
> +        }
> +        /*
> +         * Turn off transparent hugepages so we dont get lose wakeups
> +         * in neighbouring pages.
> +         * TODO: Turn this backon later.
> +         */
> +        ret = madvise((void *)(uintptr_t)dev_region->mmap_addr,
> +                      dev_region->size + dev_region->mmap_offset,
> +                      MADV_NOHUGEPAGE);
> +        if (ret) {
> +            /*
> +             * Note: This can happen legally on kernels that are
> configured
> +             * without madvise'able hugepages
> +             */
> +            fprintf(stderr,
> +                    "%s: Failed to madvise(NOHUGEPAGE) region %d: %s\n",
> +                    __func__, i, strerror(errno));
> +        }
> +        struct uffdio_register reg_struct;
> +        reg_struct.range.start = (uintptr_t)dev_region->mmap_addr;
> +        reg_struct.range.len = dev_region->size + dev_region->mmap_offset;
> +        reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
> +
> +        if (ioctl(dev->postcopy_ufd, UFFDIO_REGISTER, &reg_struct)) {
> +            vu_panic(dev, "%s: Failed to userfault region %d "
> +                          "@%p + size:%zx offset: %zx: (ufd=%d)%s\n",
> +                     __func__, i,
> +                     dev_region->mmap_addr,
> +                     dev_region->size, dev_region->mmap_offset,
> +                     dev->postcopy_ufd, strerror(errno));
> +            return false;
> +        }
> +        if (!(reg_struct.ioctls & ((__u64)1 << _UFFDIO_COPY))) {
> +            vu_panic(dev, "%s Region (%d) doesn't support COPY",
> +                     __func__, i);
> +            return false;
> +        }
> +        DPRINT("%s: region %d: Registered userfault for %"
> +               PRIx64 " + %" PRIx64 "\n", __func__, i,
> +               (uint64_t)reg_struct.range.start,
> +               (uint64_t)reg_struct.range.len);
> +        /* Now it's registered we can let the client at it */
> +        if (mprotect((void *)(uintptr_t)dev_region->mmap_addr,
> +                     dev_region->size + dev_region->mmap_offset,
> +                     PROT_READ | PROT_WRITE)) {
> +            vu_panic(dev, "failed to mprotect region %d for postcopy
> (%s)",
> +                     i, strerror(errno));
> +            return false;
> +        }
> +        /* TODO: Stash 'zero' support flags somewhere */
> +#endif
> +    }
> +
> +    return true;
> +}
> +
> +static bool
>  vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
>  {
>      int i;
> @@ -655,74 +733,7 @@ vu_set_mem_table_exec_postcopy(VuDev *dev,
> VhostUserMsg *vmsg)
>      }
>
>      /* OK, now we can go and register the memory and generate faults */
> -    for (i = 0; i < dev->nregions; i++) {
> -        VuDevRegion *dev_region = &dev->regions[i];
> -        int ret;
> -#ifdef UFFDIO_REGISTER
> -        /* We should already have an open ufd. Mark each memory
> -         * range as ufd.
> -         * Discard any mapping we have here; note I can't use MADV_REMOVE
> -         * or fallocate to make the hole since I don't want to lose
> -         * data that's already arrived in the shared process.
> -         * TODO: How to do hugepage
> -         */
> -        ret = madvise((void *)(uintptr_t)dev_region->mmap_addr,
> -                      dev_region->size + dev_region->mmap_offset,
> -                      MADV_DONTNEED);
> -        if (ret) {
> -            fprintf(stderr,
> -                    "%s: Failed to madvise(DONTNEED) region %d: %s\n",
> -                    __func__, i, strerror(errno));
> -        }
> -        /* Turn off transparent hugepages so we dont get lose wakeups
> -         * in neighbouring pages.
> -         * TODO: Turn this backon later.
> -         */
> -        ret = madvise((void *)(uintptr_t)dev_region->mmap_addr,
> -                      dev_region->size + dev_region->mmap_offset,
> -                      MADV_NOHUGEPAGE);
> -        if (ret) {
> -            /* Note: This can happen legally on kernels that are
> configured
> -             * without madvise'able hugepages
> -             */
> -            fprintf(stderr,
> -                    "%s: Failed to madvise(NOHUGEPAGE) region %d: %s\n",
> -                    __func__, i, strerror(errno));
> -        }
> -        struct uffdio_register reg_struct;
> -        reg_struct.range.start = (uintptr_t)dev_region->mmap_addr;
> -        reg_struct.range.len = dev_region->size + dev_region->mmap_offset;
> -        reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
> -
> -        if (ioctl(dev->postcopy_ufd, UFFDIO_REGISTER, &reg_struct)) {
> -            vu_panic(dev, "%s: Failed to userfault region %d "
> -                          "@%p + size:%zx offset: %zx: (ufd=%d)%s\n",
> -                     __func__, i,
> -                     dev_region->mmap_addr,
> -                     dev_region->size, dev_region->mmap_offset,
> -                     dev->postcopy_ufd, strerror(errno));
> -            return false;
> -        }
> -        if (!(reg_struct.ioctls & ((__u64)1 << _UFFDIO_COPY))) {
> -            vu_panic(dev, "%s Region (%d) doesn't support COPY",
> -                     __func__, i);
> -            return false;
> -        }
> -        DPRINT("%s: region %d: Registered userfault for %"
> -               PRIx64 " + %" PRIx64 "\n", __func__, i,
> -               (uint64_t)reg_struct.range.start,
> -               (uint64_t)reg_struct.range.len);
> -        /* Now it's registered we can let the client at it */
> -        if (mprotect((void *)(uintptr_t)dev_region->mmap_addr,
> -                     dev_region->size + dev_region->mmap_offset,
> -                     PROT_READ | PROT_WRITE)) {
> -            vu_panic(dev, "failed to mprotect region %d for postcopy
> (%s)",
> -                     i, strerror(errno));
> -            return false;
> -        }
> -        /* TODO: Stash 'zero' support flags somewhere */
> -#endif
> -    }
> +    (void)generate_faults(dev);
>

>      return false;
>  }
> --
> 1.8.3.1
>
>

-- 
Marc-André Lureau


* Re: [PATCH v4 07/10] Support ram slot configuration in libvhost-user
  2020-05-21  5:00 ` [PATCH v4 07/10] Support ram slot configuration in libvhost-user Raphael Norwitz
@ 2020-06-04 14:49   ` Marc-André Lureau
  0 siblings, 0 replies; 20+ messages in thread
From: Marc-André Lureau @ 2020-06-04 14:49 UTC (permalink / raw)
  To: Raphael Norwitz; +Cc: Raphael Norwitz, QEMU, Michael S. Tsirkin

On Thu, May 21, 2020 at 7:00 AM Raphael Norwitz <raphael.norwitz@nutanix.com>
wrote:

> The VHOST_USER_GET_MAX_MEM_SLOTS message allows a vhost-user backend to
> specify a maximum number of ram slots it is willing to support. This
> change adds support for libvhost-user to process this message. For now
> the backend will reply with 8 as the maximum number of regions
> supported.
>
> libvhost-user does not yet support the vhost-user protocol feature
> VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS, so qemu should never
> send the VHOST_USER_GET_MAX_MEM_SLOTS message. Therefore this new
> functionality is not currently used.
>
> Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
>

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>

---
>  contrib/libvhost-user/libvhost-user.c | 19 +++++++++++++++++++
>  contrib/libvhost-user/libvhost-user.h |  1 +
>  2 files changed, 20 insertions(+)
>
> diff --git a/contrib/libvhost-user/libvhost-user.c
> b/contrib/libvhost-user/libvhost-user.c
> index cccfa22..9f039b7 100644
> --- a/contrib/libvhost-user/libvhost-user.c
> +++ b/contrib/libvhost-user/libvhost-user.c
> @@ -137,6 +137,7 @@ vu_request_to_string(unsigned int req)
>          REQ(VHOST_USER_SET_INFLIGHT_FD),
>          REQ(VHOST_USER_GPU_SET_SOCKET),
>          REQ(VHOST_USER_VRING_KICK),
> +        REQ(VHOST_USER_GET_MAX_MEM_SLOTS),
>          REQ(VHOST_USER_MAX),
>      };
>  #undef REQ
> @@ -1565,6 +1566,22 @@ vu_handle_vring_kick(VuDev *dev, VhostUserMsg *vmsg)
>      return false;
>  }
>
> +static bool vu_handle_get_max_memslots(VuDev *dev, VhostUserMsg *vmsg)
> +{
> +    vmsg->flags = VHOST_USER_REPLY_MASK | VHOST_USER_VERSION;
> +    vmsg->size  = sizeof(vmsg->payload.u64);
> +    vmsg->payload.u64 = VHOST_MEMORY_MAX_NREGIONS;
> +    vmsg->fd_num = 0;
> +
> +    if (!vu_message_write(dev, dev->sock, vmsg)) {
> +        vu_panic(dev, "Failed to send max ram slots: %s\n",
> strerror(errno));
> +    }
> +
> +    DPRINT("u64: 0x%016"PRIx64"\n", (uint64_t) VHOST_MEMORY_MAX_NREGIONS);
> +
> +    return false;
> +}
> +
>  static bool
>  vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
>  {
> @@ -1649,6 +1666,8 @@ vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
>          return vu_set_inflight_fd(dev, vmsg);
>      case VHOST_USER_VRING_KICK:
>          return vu_handle_vring_kick(dev, vmsg);
> +    case VHOST_USER_GET_MAX_MEM_SLOTS:
> +        return vu_handle_get_max_memslots(dev, vmsg);
>      default:
>          vmsg_close_fds(vmsg);
>          vu_panic(dev, "Unhandled request: %d", vmsg->request);
> diff --git a/contrib/libvhost-user/libvhost-user.h
> b/contrib/libvhost-user/libvhost-user.h
> index f30394f..88ef40d 100644
> --- a/contrib/libvhost-user/libvhost-user.h
> +++ b/contrib/libvhost-user/libvhost-user.h
> @@ -97,6 +97,7 @@ typedef enum VhostUserRequest {
>      VHOST_USER_SET_INFLIGHT_FD = 32,
>      VHOST_USER_GPU_SET_SOCKET = 33,
>      VHOST_USER_VRING_KICK = 35,
> +    VHOST_USER_GET_MAX_MEM_SLOTS = 36,
>      VHOST_USER_MAX
>  } VhostUserRequest;
>
> --
> 1.8.3.1
>
>

-- 
Marc-André Lureau


* Re: [PATCH v4 04/10] Transmit vhost-user memory regions individually
  2020-06-04 14:44   ` Marc-André Lureau
@ 2020-06-09 14:13     ` Raphael Norwitz
  0 siblings, 0 replies; 20+ messages in thread
From: Raphael Norwitz @ 2020-06-09 14:13 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Peter Turschmid, Michael S. Tsirkin, QEMU, Swapnil Ingle,
	Raphael Norwitz

On Thu, Jun 4, 2020 at 10:45 AM Marc-André Lureau
<marcandre.lureau@gmail.com> wrote:
>
> Hi
>
> On Thu, May 21, 2020 at 7:00 AM Raphael Norwitz <raphael.norwitz@nutanix.com> wrote:
>>
>> With this change, when the VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
>> protocol feature has been negotiated, Qemu no longer sends the backend
>> all the memory regions in a single message. Rather, when the memory
>> tables are set or updated, a series of VHOST_USER_ADD_MEM_REG and
>> VHOST_USER_REM_MEM_REG messages are sent to transmit the regions to map
>> and/or unmap instead of sending all the regions in one fixed size
>> VHOST_USER_SET_MEM_TABLE message.
>>
>> The vhost_user struct maintains a shadow state of the VM’s memory
>> regions. When the memory tables are modified, the
>> vhost_user_set_mem_table() function compares the new device memory state
>> to the shadow state and only sends regions which need to be unmapped or
>> mapped in. The regions which must be unmapped are sent first, followed
>> by the new regions to be mapped in. After all the messages have been
>> sent, the shadow state is set to the current virtual device state.
>>
>> Existing backends which do not support
>> VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS are unaffected.
>>
>> Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
>> Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
>> Signed-off-by: Peter Turschmid <peter.turschm@nutanix.com>
>> Suggested-by: Mike Cui <cui@nutanix.com>
> >
> >
> > The change is a bit tricky, why not add more pre-condition/post-condition checks? This could really help in case we missed some OOB conditions.

Here are a few conditions I could think of:

1. I can ensure that new regions won't overrun the shadow table - just
putting an assert(u->num_shadow_regions + add_idx - rm_idx <
VHOST_MEMORY_MAX_NREGIONS); at the end of scrub_shadow_regions()

2. I could add a precondition asserting that none of the regions in
the shadow table are overlapping.

3. I could add a post-condition check that all remove regions are in
the shadow table but not the device memory state and that all add
regions are in the device memory state but not in the shadow table.

Can you think of any others?

(1) is simple but (2) and (3) are more involved and will introduce
overhead. I'm not sure how valuable they are, but if you want I can
tack them on to the end of the series.
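
For illustration, a rough sketch of what (1) and (2) could look like. The
names come from this patch; the exact bound and placement are assumptions
on my side, so treat it as a sketch rather than the final code:

    /*
     * (1) post-condition at the end of scrub_shadow_regions(): the
     * updated shadow table must still fit in the fixed-size array.
     */
    assert(u->num_shadow_regions + add_idx - rm_idx <=
           VHOST_MEMORY_MAX_NREGIONS);
    *nr_rem_reg = rm_idx;
    *nr_add_reg = add_idx;

    /*
     * (2) hypothetical helper for a pre-condition: no two regions in the
     * shadow table may overlap. Could be asserted on entry to
     * scrub_shadow_regions().
     */
    static bool shadow_regions_disjoint(struct vhost_user *u)
    {
        int i, j;

        for (i = 0; i < u->num_shadow_regions; i++) {
            for (j = i + 1; j < u->num_shadow_regions; j++) {
                struct vhost_memory_region *a = &u->shadow_regions[i];
                struct vhost_memory_region *b = &u->shadow_regions[j];

                if (a->guest_phys_addr < b->guest_phys_addr + b->memory_size &&
                    b->guest_phys_addr < a->guest_phys_addr + a->memory_size) {
                    return false;
                }
            }
        }
        return true;
    }

With the current slot limits both checks stay cheap, though (2) is still
quadratic in the number of regions.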

>
> > I would also use longer names: rem->remove, reg->registry, etc. I think they improve readability.

Sure - happy to change the variable names.
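
For instance (illustrative only, reusing the declarations from this patch,
with the array sizes as they end up after patch 05), the locals in
vhost_user_add_remove_regions() could become:

    struct scrub_regions added_regions[VHOST_USER_MAX_RAM_SLOTS];
    struct scrub_regions removed_regions[VHOST_USER_MAX_RAM_SLOTS];
    int nr_added_regions, nr_removed_regions;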

>
> > Nonetheless, it looks ok and apparently 4 people already looked at it, so for now:
> > Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>

Also did you have any comments on the rest of the libvhost-user
patches before I ship a V5?


>
>> ---
>>  docs/interop/vhost-user.rst |  33 ++-
>>  hw/virtio/vhost-user.c      | 510 +++++++++++++++++++++++++++++++++++++-------
>>  2 files changed, 469 insertions(+), 74 deletions(-)



end of thread

Thread overview: 20+ messages
2020-05-21  5:00 [PATCH v4 00/10] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz
2020-05-21  5:00 ` [PATCH v4 01/10] Add helper to populate vhost-user message regions Raphael Norwitz
2020-06-04 14:40   ` Marc-André Lureau
2020-05-21  5:00 ` [PATCH v4 02/10] Add vhost-user helper to get MemoryRegion data Raphael Norwitz
2020-06-04 14:41   ` Marc-André Lureau
2020-05-21  5:00 ` [PATCH v4 03/10] Add VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS Raphael Norwitz
2020-06-04 14:42   ` Marc-André Lureau
2020-05-21  5:00 ` [PATCH v4 04/10] Transmit vhost-user memory regions individually Raphael Norwitz
2020-06-04 14:44   ` Marc-André Lureau
2020-06-09 14:13     ` Raphael Norwitz
2020-05-21  5:00 ` [PATCH v4 05/10] Lift max memory slots limit imposed by vhost-user Raphael Norwitz
2020-06-04 14:45   ` Marc-André Lureau
2020-05-21  5:00 ` [PATCH v4 06/10] Refactor out libvhost-user fault generation logic Raphael Norwitz
2020-06-04 14:48   ` Marc-André Lureau
2020-05-21  5:00 ` [PATCH v4 07/10] Support ram slot configuration in libvhost-user Raphael Norwitz
2020-06-04 14:49   ` Marc-André Lureau
2020-05-21  5:00 ` [PATCH v4 08/10] Support adding individual regions " Raphael Norwitz
2020-05-21  5:00 ` [PATCH v4 09/10] Support individual region unmap " Raphael Norwitz
2020-05-21  5:00 ` [PATCH v4 10/10] Lift max ram slots limit " Raphael Norwitz
2020-06-04  4:11 ` [PATCH v4 00/10] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz
