* [PATCH v2 0/3] vhost-user: Lift Max Ram Slots Limitation
@ 2020-01-16  2:57 Raphael Norwitz
  2020-01-16  2:57 ` [PATCH v2 1/3] Fixed assert in vhost_user_set_mem_table_postcopy Raphael Norwitz
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Raphael Norwitz @ 2020-01-16  2:57 UTC (permalink / raw)
  To: qemu-devel, mst; +Cc: raphael.s.norwitz, Raphael Norwitz

In QEMU today, a VM with a vhost-user device can hot add memory a
maximum of 8 times. See these threads, among others:

[1] https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01046.html
    https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01236.html

[2] https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg04656.html

This series introduces a new protocol feature
VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS which, when enabled, lifts the
restriction on the maximum number of RAM slots imposed by vhost-user.

The patch consists of 3 changes:
1. Fixed assert in vhost_user_set_mem_table_postcopy:
   This is a bug fix in the postcopy migration path
2. Refactor vhost_user_set_mem_table functions:
   This is a non-functional change refactoring the
   vhost_user_set_mem_table and vhost_user_set_mem_table_postcopy
   functions such that the feature can be more cleanly added.
3. Lift max memory slots limit imposed by vhost-user:
   This change introduces the new protocol feature
   VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS.

The implementation details are explained in more detail in the commit
messages, but at a high level the new protocol feature works as follows:
- If the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS feature is enabled, QEMU will
  send multiple VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG
  messages to map and unmap individual memory regions instead of one large
  VHOST_USER_SET_MEM_TABLE message containing all memory regions.
- The vhost-user struct maintains a 'shadow state' of memory regions
  already sent to the guest. Each time vhost_user_set_mem_table is called,
  the shadow state is compared with the new device state. A
  VHOST_USER_REM_MEM_REG will be sent for each region in the shadow state
  not in the device state. Then, a VHOST_USER_ADD_MEM_REG will be sent
  for each region in the device state but not the shadow state. After
  these messages have been sent, the shadow state will be updated to
  reflect the new device state.
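
To make the diffing step above concrete, here is a rough sketch (the
function name is hypothetical; the real implementation is
vhost_user_send_add_remove_regions() in patch 3, which additionally
handles fd lookup, message construction and error paths):

    static void vhost_user_sync_shadow_state(struct vhost_dev *dev,
                                              struct vhost_user *u)
    {
        bool found[VHOST_USER_MAX_RAM_SLOTS] = {};
        int i, j;

        /* 1. Shadow regions missing from the device state: unmap them. */
        for (i = 0; i < u->num_shadow_regions; i++) {
            bool matching = false;
            for (j = 0; j < dev->mem->nregions; j++) {
                if (reg_equal(&u->shadow_regions[i], &dev->mem->regions[j])) {
                    matching = true;
                    found[j] = true;
                    break;
                }
            }
            if (!matching) {
                /* send VHOST_USER_REM_MEM_REG for u->shadow_regions[i] */
            }
        }

        /* 2. Device regions missing from the shadow state: map them in. */
        for (i = 0; i < dev->mem->nregions; i++) {
            if (!found[i]) {
                /* send VHOST_USER_ADD_MEM_REG for dev->mem->regions[i] */
            }
        }

        /* 3. Finally, copy the device state into the shadow state. */
    }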

The VHOST_USER_SET_MEM_TABLE message was not reused because, as the number of
regions grows, the message becomes very large. In practice, such large
messages caused problems (truncated messages), and in the past it seems the
community has opted for smaller fixed-size messages where possible. VRINGs,
for example, are sent to the backend individually instead of in one massive
message.
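
For a rough sense of scale (assuming the standard 32-byte
VhostUserMemoryRegion layout of four u64 fields): with 8 regions the
SET_MEM_TABLE payload is 4 + 4 + 8 * 32 = 264 bytes, whereas 512 regions
would need roughly 16 KiB of payload plus 512 file descriptors in a single
SCM_RIGHTS control message, well beyond the 253-fd per-message limit
(SCM_MAX_FD) on Linux.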

Current Limitations:
- postcopy migration is not supported when the
  VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS feature has been negotiated.
- VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS cannot be negotiated when
  VHOST_USER_PROTOCOL_F_REPLY_ACK has also been negotiated.

Both of these limitations are due to resource constraints. They are not
imposed for technical reasons.

Changes since V1:
    * Kept the assert in vhost_user_set_mem_table_postcopy, but moved it
      to prevent corruption
    * Made QEMU send a single VHOST_USER_GET_MAX_MEM_SLOTS message at
      startup and cache the returned value so that QEMU does not need to
      query the backend every time vhost_backend_memslots_limit is called.

Best,
Raphael

Raphael Norwitz (3):
  Fixed assert in vhost_user_set_mem_table_postcopy
  Refactor vhost_user_set_mem_table functions
  Lift max memory slots limit imposed by vhost-user

 docs/interop/vhost-user.rst |  43 +++++
 hw/virtio/vhost-user.c      | 385 +++++++++++++++++++++++++++++++++-----------
 2 files changed, 336 insertions(+), 92 deletions(-)

-- 
1.8.3.1




* [PATCH v2 1/3] Fixed assert in vhost_user_set_mem_table_postcopy
  2020-01-16  2:57 [PATCH v2 0/3] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz
@ 2020-01-16  2:57 ` Raphael Norwitz
  2020-02-06  8:17   ` Michael S. Tsirkin
  2020-01-16  2:57 ` [PATCH v2 2/3] Refactor vhost_user_set_mem_table functions Raphael Norwitz
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Raphael Norwitz @ 2020-01-16  2:57 UTC (permalink / raw)
  To: qemu-devel, mst; +Cc: Peter Turschmid, raphael.s.norwitz, Raphael Norwitz

The current vhost_user_set_mem_table_postcopy() implementation
populates each region of the VHOST_USER_SET_MEM_TABLE message without
first checking if there are more than VHOST_MEMORY_MAX_NREGIONS already
populated. This can cause memory corruption if too many regions are
added to the message during the postcopy step.

This change moves an existing assert up such that attempting to
construct a VHOST_USER_SET_MEM_TABLE message with too many memory
regions will gracefully bring down qemu instead of corrupting memory.

Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Signed-off-by: Peter Turschmid <peter.turschm@nutanix.com>
---
 hw/virtio/vhost-user.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 2e81f55..cce851a 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -443,6 +443,7 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
                                      &offset);
         fd = memory_region_get_fd(mr);
         if (fd > 0) {
+            assert(fd_num < VHOST_MEMORY_MAX_NREGIONS);
             trace_vhost_user_set_mem_table_withfd(fd_num, mr->name,
                                                   reg->memory_size,
                                                   reg->guest_phys_addr,
@@ -455,7 +456,6 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
             msg.payload.memory.regions[fd_num].guest_phys_addr =
                 reg->guest_phys_addr;
             msg.payload.memory.regions[fd_num].mmap_offset = offset;
-            assert(fd_num < VHOST_MEMORY_MAX_NREGIONS);
             fds[fd_num++] = fd;
         } else {
             u->region_rb_offset[i] = 0;
-- 
1.8.3.1




* [PATCH v2 2/3] Refactor vhost_user_set_mem_table functions
  2020-01-16  2:57 [PATCH v2 0/3] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz
  2020-01-16  2:57 ` [PATCH v2 1/3] Fixed assert in vhost_user_set_mem_table_postcopy Raphael Norwitz
@ 2020-01-16  2:57 ` Raphael Norwitz
  2020-02-06  8:21   ` Michael S. Tsirkin
  2020-01-16  2:57 ` [PATCH v2 3/3] Lift max memory slots limit imposed by vhost-user Raphael Norwitz
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Raphael Norwitz @ 2020-01-16  2:57 UTC (permalink / raw)
  To: qemu-devel, mst; +Cc: Peter Turschmid, raphael.s.norwitz, Raphael Norwitz

vhost_user_set_mem_table() and vhost_user_set_mem_table_postcopy() have
gotten convoluted, and have some identical code.

This change moves the logic populating the VhostUserMemory struct and
fds array from vhost_user_set_mem_table() and
vhost_user_set_mem_table_postcopy() to a new function,
vhost_user_fill_set_mem_table_msg().

No functionality is impacted.

Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Signed-off-by: Peter Turschmid <peter.turschm@nutanix.com>
---
 hw/virtio/vhost-user.c | 143 +++++++++++++++++++++++--------------------------
 1 file changed, 67 insertions(+), 76 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index cce851a..af83fdd 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -407,18 +407,79 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
     return 0;
 }
 
+static int vhost_user_fill_set_mem_table_msg(struct vhost_user *u,
+                                             struct vhost_dev *dev,
+                                             VhostUserMsg *msg,
+                                             int *fds, size_t *fd_num,
+                                             bool postcopy)
+{
+    int i, fd;
+    ram_addr_t offset;
+    MemoryRegion *mr;
+    struct vhost_memory_region *reg;
+
+    msg->hdr.request = VHOST_USER_SET_MEM_TABLE;
+
+    for (i = 0; i < dev->mem->nregions; ++i) {
+        reg = dev->mem->regions + i;
+
+        assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
+        mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
+                                     &offset);
+        fd = memory_region_get_fd(mr);
+        if (fd > 0) {
+            if (postcopy) {
+                assert(*fd_num < VHOST_MEMORY_MAX_NREGIONS);
+                trace_vhost_user_set_mem_table_withfd(*fd_num, mr->name,
+                                                      reg->memory_size,
+                                                      reg->guest_phys_addr,
+                                                      reg->userspace_addr,
+                                                      offset);
+                u->region_rb_offset[i] = offset;
+                u->region_rb[i] = mr->ram_block;
+            } else if (*fd_num == VHOST_MEMORY_MAX_NREGIONS) {
+                error_report("Failed preparing vhost-user memory table msg");
+                return -1;
+            }
+            msg->payload.memory.regions[*fd_num].userspace_addr =
+                reg->userspace_addr;
+            msg->payload.memory.regions[*fd_num].memory_size =
+                reg->memory_size;
+            msg->payload.memory.regions[*fd_num].guest_phys_addr =
+                reg->guest_phys_addr;
+            msg->payload.memory.regions[*fd_num].mmap_offset = offset;
+            fds[(*fd_num)++] = fd;
+        } else if (postcopy) {
+            u->region_rb_offset[i] = 0;
+            u->region_rb[i] = NULL;
+        }
+    }
+
+    msg->payload.memory.nregions = *fd_num;
+
+    if (!*fd_num) {
+        error_report("Failed initializing vhost-user memory map, "
+                     "consider using -object memory-backend-file share=on");
+        return -1;
+    }
+
+    msg->hdr.size = sizeof(msg->payload.memory.nregions);
+    msg->hdr.size += sizeof(msg->payload.memory.padding);
+    msg->hdr.size += *fd_num * sizeof(VhostUserMemoryRegion);
+
+    return 1;
+}
+
 static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
                                              struct vhost_memory *mem)
 {
     struct vhost_user *u = dev->opaque;
     int fds[VHOST_MEMORY_MAX_NREGIONS];
-    int i, fd;
     size_t fd_num = 0;
     VhostUserMsg msg_reply;
     int region_i, msg_i;
 
     VhostUserMsg msg = {
-        .hdr.request = VHOST_USER_SET_MEM_TABLE,
         .hdr.flags = VHOST_USER_VERSION,
     };
 
@@ -433,48 +494,11 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
         u->region_rb_len = dev->mem->nregions;
     }
 
-    for (i = 0; i < dev->mem->nregions; ++i) {
-        struct vhost_memory_region *reg = dev->mem->regions + i;
-        ram_addr_t offset;
-        MemoryRegion *mr;
-
-        assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
-        mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
-                                     &offset);
-        fd = memory_region_get_fd(mr);
-        if (fd > 0) {
-            assert(fd_num < VHOST_MEMORY_MAX_NREGIONS);
-            trace_vhost_user_set_mem_table_withfd(fd_num, mr->name,
-                                                  reg->memory_size,
-                                                  reg->guest_phys_addr,
-                                                  reg->userspace_addr, offset);
-            u->region_rb_offset[i] = offset;
-            u->region_rb[i] = mr->ram_block;
-            msg.payload.memory.regions[fd_num].userspace_addr =
-                reg->userspace_addr;
-            msg.payload.memory.regions[fd_num].memory_size  = reg->memory_size;
-            msg.payload.memory.regions[fd_num].guest_phys_addr =
-                reg->guest_phys_addr;
-            msg.payload.memory.regions[fd_num].mmap_offset = offset;
-            fds[fd_num++] = fd;
-        } else {
-            u->region_rb_offset[i] = 0;
-            u->region_rb[i] = NULL;
-        }
-    }
-
-    msg.payload.memory.nregions = fd_num;
-
-    if (!fd_num) {
-        error_report("Failed initializing vhost-user memory map, "
-                     "consider using -object memory-backend-file share=on");
+    if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num,
+                                          true) < 0) {
         return -1;
     }
 
-    msg.hdr.size = sizeof(msg.payload.memory.nregions);
-    msg.hdr.size += sizeof(msg.payload.memory.padding);
-    msg.hdr.size += fd_num * sizeof(VhostUserMemoryRegion);
-
     if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
         return -1;
     }
@@ -545,7 +569,6 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
 {
     struct vhost_user *u = dev->opaque;
     int fds[VHOST_MEMORY_MAX_NREGIONS];
-    int i, fd;
     size_t fd_num = 0;
     bool do_postcopy = u->postcopy_listen && u->postcopy_fd.handler;
     bool reply_supported = virtio_has_feature(dev->protocol_features,
@@ -559,7 +582,6 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
     }
 
     VhostUserMsg msg = {
-        .hdr.request = VHOST_USER_SET_MEM_TABLE,
         .hdr.flags = VHOST_USER_VERSION,
     };
 
@@ -567,42 +589,11 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
         msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
     }
 
-    for (i = 0; i < dev->mem->nregions; ++i) {
-        struct vhost_memory_region *reg = dev->mem->regions + i;
-        ram_addr_t offset;
-        MemoryRegion *mr;
-
-        assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
-        mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
-                                     &offset);
-        fd = memory_region_get_fd(mr);
-        if (fd > 0) {
-            if (fd_num == VHOST_MEMORY_MAX_NREGIONS) {
-                error_report("Failed preparing vhost-user memory table msg");
-                return -1;
-            }
-            msg.payload.memory.regions[fd_num].userspace_addr =
-                reg->userspace_addr;
-            msg.payload.memory.regions[fd_num].memory_size  = reg->memory_size;
-            msg.payload.memory.regions[fd_num].guest_phys_addr =
-                reg->guest_phys_addr;
-            msg.payload.memory.regions[fd_num].mmap_offset = offset;
-            fds[fd_num++] = fd;
-        }
-    }
-
-    msg.payload.memory.nregions = fd_num;
-
-    if (!fd_num) {
-        error_report("Failed initializing vhost-user memory map, "
-                     "consider using -object memory-backend-file share=on");
+    if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num,
+                                          false) < 0) {
         return -1;
     }
 
-    msg.hdr.size = sizeof(msg.payload.memory.nregions);
-    msg.hdr.size += sizeof(msg.payload.memory.padding);
-    msg.hdr.size += fd_num * sizeof(VhostUserMemoryRegion);
-
     if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
         return -1;
     }
-- 
1.8.3.1




* [PATCH v2 3/3] Lift max memory slots limit imposed by vhost-user
  2020-01-16  2:57 [PATCH v2 0/3] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz
  2020-01-16  2:57 ` [PATCH v2 1/3] Fixed assert in vhost_user_set_mem_table_postcopy Raphael Norwitz
  2020-01-16  2:57 ` [PATCH v2 2/3] Refactor vhost_user_set_mem_table functions Raphael Norwitz
@ 2020-01-16  2:57 ` Raphael Norwitz
  2020-02-06  8:32   ` Michael S. Tsirkin
  2020-01-31 21:21 ` [PATCH v2 0/3] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz
  2020-02-06  8:33 ` Michael S. Tsirkin
  4 siblings, 1 reply; 19+ messages in thread
From: Raphael Norwitz @ 2020-01-16  2:57 UTC (permalink / raw)
  To: qemu-devel, mst; +Cc: Peter Turschmid, raphael.s.norwitz, Raphael Norwitz

The current vhost-user implementation in QEMU imposes a limit on the
maximum number of memory slots exposed to a VM using a vhost-user
device. This change provides a new protocol feature
VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS which, when enabled, lifts this
limit and allows a VM with a vhost-user device to expose a configurable
number of memory slots, up to the ACPI-defined maximum. Existing
backends which do not support this protocol feature are unaffected.

This feature works by using three new messages,
VHOST_USER_GET_MAX_MEM_SLOTS, VHOST_USER_ADD_MEM_REG and
VHOST_USER_REM_MEM_REG. VHOST_USER_GET_MAX_MEM_SLOTS gets the
number of memory slots the backend is willing to accept when the
backend is initialized. Then, when the memory tables are set or updated,
a series of VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG messages
are sent to transmit the regions to map and/or unmap instead of trying
to send all the regions in one fixed size VHOST_USER_SET_MEM_TABLE
message.
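
As an illustration, a hypothetical message flow with the feature
negotiated (the initial query happens once, when the backend is
initialized) could look like:

    master -> slave:  VHOST_USER_GET_MAX_MEM_SLOTS
    slave  -> master: u64 max_slots
    ... guest hot-adds a DIMM ...
    master -> slave:  VHOST_USER_ADD_MEM_REG { region descriptor + backing fd }
    ... guest hot-removes that DIMM ...
    master -> slave:  VHOST_USER_REM_MEM_REG { region descriptor }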

The vhost_user struct maintains a shadow state of the VM’s memory
regions. When the memory tables are modified, the
vhost_user_set_mem_table() function compares the new device memory state
to the shadow state and only sends regions which need to be unmapped or
mapped in. The regions which must be unmapped are sent first, followed
by the new regions to be mapped in. After all the messages have been
sent, the shadow state is set to the current virtual device state.

The current feature implementation does not work with postcopy migration
and cannot be enabled if the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has
also been negotiated.

Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Signed-off-by: Peter Turschmid <peter.turschm@nutanix.com>
Suggested-by: Mike Cui <cui@nutanix.com>
---
 docs/interop/vhost-user.rst |  43 ++++++++
 hw/virtio/vhost-user.c      | 254 ++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 275 insertions(+), 22 deletions(-)

diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index 5f8b3a4..ae9acf2 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -786,6 +786,7 @@ Protocol features
   #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  11
   #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12
   #define VHOST_USER_PROTOCOL_F_RESET_DEVICE   13
+  #define VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS   14
 
 Master message types
 --------------------
@@ -1205,6 +1206,48 @@ Master message types
   Only valid if the ``VHOST_USER_PROTOCOL_F_RESET_DEVICE`` protocol
   feature is set by the backend.
 
+``VHOST_USER_GET_MAX_MEM_SLOTS``
+  :id: 35
+  :equivalent ioctl: N/A
+  :slave payload: u64
+
+  When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been
+  successfully negotiated, this message is submitted by master to the
+  slave. The slave should return the message with a u64 payload
+  containing the maximum number of memory slots for QEMU to expose to
+  the guest. This message is not supported with postcopy migration or if
+  the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated.
+
+``VHOST_USER_ADD_MEM_REG``
+  :id: 36
+  :equivalent ioctl: N/A
+  :slave payload: memory region
+
+  When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been
+  successfully negotiated, this message is submitted by master to the slave.
+  The message payload contains a memory region descriptor struct, describing
+  a region of guest memory which the slave device must map in. When the
+  VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been successfully
+  negotiated, along with the VHOST_USER_REM_MEM_REG message, this message is
+  used to set and update the memory tables of the slave device. This message
+  is not supported with postcopy migration or if the
+  VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated.
+
+``VHOST_USER_REM_MEM_REG``
+  :id: 37
+  :equivalent ioctl: N/A
+  :slave payload: memory region
+
+  When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been
+  successfully negotiated, this message is submitted by master to the slave.
+  The message payload contains a memory region descriptor struct, describing
+  a region of guest memory which the slave device must unmap. When the
+  VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been successfully
+  negotiated, along with the VHOST_USER_ADD_MEM_REG message, this message is
+  used to set and update the memory tables of the slave device. This message
+  is not supported with postcopy migration or if the
+  VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated.
+
 Slave message types
 -------------------
 
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index af83fdd..fed6d02 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -35,11 +35,29 @@
 #include <linux/userfaultfd.h>
 #endif
 
-#define VHOST_MEMORY_MAX_NREGIONS    8
+#define VHOST_MEMORY_LEGACY_NREGIONS    8
 #define VHOST_USER_F_PROTOCOL_FEATURES 30
 #define VHOST_USER_SLAVE_MAX_FDS     8
 
 /*
+ * Set maximum number of RAM slots supported to
+ * the maximum number supported by the target
+ * hardware platform.
+ */
+#if defined(TARGET_X86) || defined(TARGET_X86_64) || \
+    defined(TARGET_ARM) || defined(TARGET_ARM_64)
+#include "hw/acpi/acpi.h"
+#define VHOST_USER_MAX_RAM_SLOTS ACPI_MAX_RAM_SLOTS
+
+#elif defined(TARGET_PPC) || defined(TARGET_PPC_64)
+#include "hw/ppc/spapr.h"
+#define VHOST_USER_MAX_RAM_SLOTS SPAPR_MAX_RAM_SLOTS
+
+#else
+#define VHOST_USER_MAX_RAM_SLOTS 512
+#endif
+
+/*
  * Maximum size of virtio device config space
  */
 #define VHOST_USER_MAX_CONFIG_SIZE 256
@@ -59,6 +77,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
     VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD = 12,
     VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
+    VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS = 14,
     VHOST_USER_PROTOCOL_F_MAX
 };
 
@@ -100,6 +119,9 @@ typedef enum VhostUserRequest {
     VHOST_USER_SET_INFLIGHT_FD = 32,
     VHOST_USER_GPU_SET_SOCKET = 33,
     VHOST_USER_RESET_DEVICE = 34,
+    VHOST_USER_GET_MAX_MEM_SLOTS = 35,
+    VHOST_USER_ADD_MEM_REG = 36,
+    VHOST_USER_REM_MEM_REG = 37,
     VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -121,9 +143,14 @@ typedef struct VhostUserMemoryRegion {
 typedef struct VhostUserMemory {
     uint32_t nregions;
     uint32_t padding;
-    VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
+    VhostUserMemoryRegion regions[VHOST_MEMORY_LEGACY_NREGIONS];
 } VhostUserMemory;
 
+typedef struct VhostUserMemRegMsg {
+    uint32_t padding;
+    VhostUserMemoryRegion region;
+} VhostUserMemRegMsg;
+
 typedef struct VhostUserLog {
     uint64_t mmap_size;
     uint64_t mmap_offset;
@@ -182,6 +209,7 @@ typedef union {
         struct vhost_vring_state state;
         struct vhost_vring_addr addr;
         VhostUserMemory memory;
+        VhostUserMemRegMsg mem_reg;
         VhostUserLog log;
         struct vhost_iotlb_msg iotlb;
         VhostUserConfig config;
@@ -210,7 +238,7 @@ struct vhost_user {
     int slave_fd;
     NotifierWithReturn postcopy_notifier;
     struct PostCopyFD  postcopy_fd;
-    uint64_t           postcopy_client_bases[VHOST_MEMORY_MAX_NREGIONS];
+    uint64_t           postcopy_client_bases[VHOST_USER_MAX_RAM_SLOTS];
     /* Length of the region_rb and region_rb_offset arrays */
     size_t             region_rb_len;
     /* RAMBlock associated with a given region */
@@ -222,6 +250,13 @@ struct vhost_user {
 
     /* True once we've entered postcopy_listen */
     bool               postcopy_listen;
+
+    /* Maximum number of RAM slots supported by the backend */
+    uint64_t memory_slots;
+
+    /* Our current regions */
+    int num_shadow_regions;
+    VhostUserMemoryRegion shadow_regions[VHOST_USER_MAX_RAM_SLOTS];
 };
 
 static bool ioeventfd_enabled(void)
@@ -370,7 +405,7 @@ int vhost_user_gpu_set_socket(struct vhost_dev *dev, int fd)
 static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
                                    struct vhost_log *log)
 {
-    int fds[VHOST_MEMORY_MAX_NREGIONS];
+    int fds[VHOST_USER_MAX_RAM_SLOTS];
     size_t fd_num = 0;
     bool shmfd = virtio_has_feature(dev->protocol_features,
                                     VHOST_USER_PROTOCOL_F_LOG_SHMFD);
@@ -429,7 +464,7 @@ static int vhost_user_fill_set_mem_table_msg(struct vhost_user *u,
         fd = memory_region_get_fd(mr);
         if (fd > 0) {
             if (postcopy) {
-                assert(*fd_num < VHOST_MEMORY_MAX_NREGIONS);
+                assert(*fd_num < VHOST_MEMORY_LEGACY_NREGIONS);
                 trace_vhost_user_set_mem_table_withfd(*fd_num, mr->name,
                                                       reg->memory_size,
                                                       reg->guest_phys_addr,
@@ -437,7 +472,7 @@ static int vhost_user_fill_set_mem_table_msg(struct vhost_user *u,
                                                       offset);
                 u->region_rb_offset[i] = offset;
                 u->region_rb[i] = mr->ram_block;
-            } else if (*fd_num == VHOST_MEMORY_MAX_NREGIONS) {
+            } else if (*fd_num == VHOST_MEMORY_LEGACY_NREGIONS) {
                 error_report("Failed preparing vhost-user memory table msg");
                 return -1;
             }
@@ -474,7 +509,7 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
                                              struct vhost_memory *mem)
 {
     struct vhost_user *u = dev->opaque;
-    int fds[VHOST_MEMORY_MAX_NREGIONS];
+    int fds[VHOST_MEMORY_LEGACY_NREGIONS];
     size_t fd_num = 0;
     VhostUserMsg msg_reply;
     int region_i, msg_i;
@@ -523,7 +558,7 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
     }
 
     memset(u->postcopy_client_bases, 0,
-           sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS);
+           sizeof(uint64_t) * VHOST_USER_MAX_RAM_SLOTS);
 
     /* They're in the same order as the regions that were sent
      * but some of the regions were skipped (above) if they
@@ -564,18 +599,151 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
     return 0;
 }
 
+static inline bool reg_equal(VhostUserMemoryRegion *shadow_reg,
+                             struct vhost_memory_region *vdev_reg)
+{
+    if (shadow_reg->guest_phys_addr == vdev_reg->guest_phys_addr &&
+        shadow_reg->userspace_addr == vdev_reg->userspace_addr &&
+        shadow_reg->memory_size == vdev_reg->memory_size) {
+        return true;
+    } else {
+        return false;
+    }
+}
+
+static int vhost_user_send_add_remove_regions(struct vhost_dev *dev,
+                                              struct vhost_memory *mem,
+                                              VhostUserMsg *msg)
+{
+    struct vhost_user *u = dev->opaque;
+    int i, j, fd;
+    bool found[VHOST_USER_MAX_RAM_SLOTS] = {};
+    bool matching = false;
+    struct vhost_memory_region *reg;
+    ram_addr_t offset;
+    MemoryRegion *mr;
+
+    /*
+     * Ensure the VHOST_USER_PROTOCOL_F_REPLY_ACK has not been
+     * negotiated and no postcopy migration is in progress.
+     */
+    assert(!virtio_has_feature(dev->protocol_features,
+                               VHOST_USER_PROTOCOL_F_REPLY_ACK));
+    if (u->postcopy_listen && u->postcopy_fd.handler) {
+        error_report("Postcopy migration is not supported when the "
+                     "VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS feature "
+                     "has been negotiated");
+        return -1;
+    }
+
+    msg->hdr.size = sizeof(msg->payload.mem_reg.padding);
+    msg->hdr.size += sizeof(VhostUserMemoryRegion);
+
+    /*
+     * Send VHOST_USER_REM_MEM_REG for memory regions in our shadow state
+     * which are not found in the device's memory state.
+     */
+    for (i = 0; i < u->num_shadow_regions; ++i) {
+        reg = dev->mem->regions;
+
+        for (j = 0; j < dev->mem->nregions; j++) {
+            reg = dev->mem->regions + j;
+
+            assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
+            mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
+                                         &offset);
+            fd = memory_region_get_fd(mr);
+
+            if (reg_equal(&u->shadow_regions[i], reg)) {
+                matching = true;
+                found[j] = true;
+                break;
+            }
+        }
+
+        if (fd > 0 && !matching) {
+            msg->hdr.request = VHOST_USER_REM_MEM_REG;
+            msg->payload.mem_reg.region.userspace_addr = reg->userspace_addr;
+            msg->payload.mem_reg.region.memory_size = reg->memory_size;
+            msg->payload.mem_reg.region.guest_phys_addr =
+                reg->guest_phys_addr;
+            msg->payload.mem_reg.region.mmap_offset = offset;
+
+            if (vhost_user_write(dev, msg, &fd, 1) < 0) {
+                return -1;
+            }
+        }
+    }
+
+    /*
+     * Send messages to add regions present in the device which are not
+     * in our shadow state.
+     */
+    for (i = 0; i < dev->mem->nregions; ++i) {
+        reg = dev->mem->regions + i;
+
+        /*
+         * If the region was in both the shadow and vdev state we don't
+         * need to send a VHOST_USER_ADD_MEM_REG message for it.
+         */
+        if (found[i]) {
+            continue;
+        }
+
+        assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
+        mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
+                                     &offset);
+        fd = memory_region_get_fd(mr);
+
+        if (fd > 0) {
+            msg->hdr.request = VHOST_USER_ADD_MEM_REG;
+            msg->payload.mem_reg.region.userspace_addr = reg->userspace_addr;
+            msg->payload.mem_reg.region.memory_size = reg->memory_size;
+            msg->payload.mem_reg.region.guest_phys_addr =
+                reg->guest_phys_addr;
+            msg->payload.mem_reg.region.mmap_offset = offset;
+
+            if (vhost_user_write(dev, msg, &fd, 1) < 0) {
+                return -1;
+            }
+        }
+    }
+
+    /* Make our shadow state match the updated device state. */
+    u->num_shadow_regions = dev->mem->nregions;
+    for (i = 0; i < dev->mem->nregions; ++i) {
+        reg = dev->mem->regions + i;
+        u->shadow_regions[i].guest_phys_addr = reg->guest_phys_addr;
+        u->shadow_regions[i].userspace_addr = reg->userspace_addr;
+        u->shadow_regions[i].memory_size = reg->memory_size;
+    }
+
+    return 0;
+}
+
 static int vhost_user_set_mem_table(struct vhost_dev *dev,
                                     struct vhost_memory *mem)
 {
     struct vhost_user *u = dev->opaque;
-    int fds[VHOST_MEMORY_MAX_NREGIONS];
+    int fds[VHOST_MEMORY_LEGACY_NREGIONS];
     size_t fd_num = 0;
     bool do_postcopy = u->postcopy_listen && u->postcopy_fd.handler;
     bool reply_supported = virtio_has_feature(dev->protocol_features,
                                               VHOST_USER_PROTOCOL_F_REPLY_ACK);
+    bool config_slots =
+        virtio_has_feature(dev->protocol_features,
+                           VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS);
 
     if (do_postcopy) {
-        /* Postcopy has enough differences that it's best done in it's own
+        if (config_slots) {
+            error_report("Postcopy migration not supported with "
+                         "VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS feature "
+                         "enabled.");
+            return -1;
+        }
+
+        /*
+         * Postcopy has enough differences that it's best done in it's own
          * version
          */
         return vhost_user_set_mem_table_postcopy(dev, mem);
@@ -589,17 +757,22 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
         msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
     }
 
-    if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num,
-                                          false) < 0) {
-        return -1;
-    }
-
-    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
-        return -1;
-    }
+    if (config_slots && !reply_supported) {
+        if (vhost_user_send_add_remove_regions(dev, mem, &msg) < 0) {
+            return -1;
+        }
+    } else {
+        if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num,
+                                              false) < 0) {
+            return -1;
+        }
+        if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
+            return -1;
+        }
 
-    if (reply_supported) {
-        return process_message_reply(dev, &msg);
+        if (reply_supported) {
+            return process_message_reply(dev, &msg);
+        }
     }
 
     return 0;
@@ -764,7 +937,7 @@ static int vhost_set_vring_file(struct vhost_dev *dev,
                                 VhostUserRequest request,
                                 struct vhost_vring_file *file)
 {
-    int fds[VHOST_MEMORY_MAX_NREGIONS];
+    int fds[VHOST_USER_MAX_RAM_SLOTS];
     size_t fd_num = 0;
     VhostUserMsg msg = {
         .hdr.request = request,
@@ -866,6 +1039,23 @@ static int vhost_user_get_features(struct vhost_dev *dev, uint64_t *features)
     return vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features);
 }
 
+static int vhost_user_get_max_memslots(struct vhost_dev *dev,
+                                       uint64_t *max_memslots)
+{
+    uint64_t backend_max_memslots;
+    int err;
+
+    err = vhost_user_get_u64(dev, VHOST_USER_GET_MAX_MEM_SLOTS,
+                             &backend_max_memslots);
+    if (err < 0) {
+        return err;
+    }
+
+    *max_memslots = MIN(backend_max_memslots, VHOST_USER_MAX_RAM_SLOTS);
+
+    return *max_memslots;
+}
+
 static int vhost_user_set_owner(struct vhost_dev *dev)
 {
     VhostUserMsg msg = {
@@ -1439,6 +1629,24 @@ static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque)
                          "slave-req protocol features.");
             return -1;
         }
+
+        /* get max memory regions if backend supports configurable RAM slots */
+        if (!virtio_has_feature(dev->protocol_features,
+                                VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS)) {
+            u->memory_slots = VHOST_MEMORY_LEGACY_NREGIONS;
+        } else if (virtio_has_feature(dev->protocol_features,
+                           VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
+            error_report("The VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol "
+                         "feature is not supported when the "
+                         "VHOST_USER_PROTOCOL_F_REPLY_ACK features has also "
+                         "been negotiated");
+            return -1;
+        } else {
+            err = vhost_user_get_max_memslots(dev, &u->memory_slots);
+            if (err < 0) {
+                return err;
+            }
+        }
     }
 
     if (dev->migration_blocker == NULL &&
@@ -1502,7 +1710,9 @@ static int vhost_user_get_vq_index(struct vhost_dev *dev, int idx)
 
 static int vhost_user_memslots_limit(struct vhost_dev *dev)
 {
-    return VHOST_MEMORY_MAX_NREGIONS;
+    struct vhost_user *u = dev->opaque;
+
+    return u->memory_slots;
 }
 
 static bool vhost_user_requires_shm_log(struct vhost_dev *dev)
-- 
1.8.3.1




* Re: [PATCH v2 0/3] vhost-user: Lift Max Ram Slots Limitation
  2020-01-16  2:57 [PATCH v2 0/3] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz
                   ` (2 preceding siblings ...)
  2020-01-16  2:57 ` [PATCH v2 3/3] Lift max memory slots limit imposed by vhost-user Raphael Norwitz
@ 2020-01-31 21:21 ` Raphael Norwitz
  2020-02-06  8:33 ` Michael S. Tsirkin
  4 siblings, 0 replies; 19+ messages in thread
From: Raphael Norwitz @ 2020-01-31 21:21 UTC (permalink / raw)
  To: qemu-devel, mst; +Cc: raphael.s.norwitz

Ping

On Wed, Jan 15, 2020 at 09:57:03PM -0500, Raphael Norwitz wrote:
> 
> In QEMU today, a VM with a vhost-user device can hot add memory a
> maximum of 8 times. See these threads, among others:
> 
> [1] https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01046.html  
>     https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01236.html 
> 
> [2] https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg04656.html 
> 
> This series introduces a new protocol feature
> VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS which, when enabled, lifts the
> restriction on the maximum number of RAM slots imposed by vhost-user.
> 
> The patch consists of 3 changes:
> 1. Fixed assert in vhost_user_set_mem_table_postcopy:
>    This is a bug fix in the postcopy migration path
> 2. Refactor vhost_user_set_mem_table functions:
>    This is a non-functional change refactoring the
>    vhost_user_set_mem_table and vhost_user_set_mem_table_postcopy
>    functions such that the feature can be more cleanly added.
> 3. Lift max memory slots limit imposed by vhost-user:
>    This change introduces the new protocol feature
>    VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS.
> 
> The implementation details are explained in more detail in the commit
> messages, but at a high level the new protocol feature works as follows:
> - If the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS feature is enabled, QEMU will
>   send multiple VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG
>   messages to map and unmap individual memory regions instead of one large
>   VHOST_USER_SET_MEM_TABLE message containing all memory regions.
> - The vhost-user struct maintains a 'shadow state' of memory regions
>   already sent to the guest. Each time vhost_user_set_mem_table is called,
>   the shadow state is compared with the new device state. A
>   VHOST_USER_REM_MEM_REG will be sent for each region in the shadow state
>   not in the device state. Then, a VHOST_USER_ADD_MEM_REG will be sent
>   for each region in the device state but not the shadow state. After
>   these messages have been sent, the shadow state will be updated to
>   reflect the new device state.
> 
> The VHOST_USER_SET_MEM_TABLE message was not reused because, as the number of
> regions grows, the message becomes very large. In practice, such large
> messages caused problems (truncated messages), and in the past it seems the
> community has opted for smaller fixed-size messages where possible. VRINGs,
> for example, are sent to the backend individually instead of in one massive
> message.
> 
> Current Limitations:
> - postcopy migration is not supported when the
>   VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS feature has been negotiated.
> - VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS cannot be negotiated when
>   VHOST_USER_PROTOCOL_F_REPLY_ACK has also been negotiated.
> 
> Both of these limitations are due to resource constraints. They are not
> imposed for technical reasons.
> 
> Changes since V1:
>     * Kept the assert in vhost_user_set_mem_table_postcopy, but moved it
>       to prevent corruption
>     * Made QEMU send a single VHOST_USER_GET_MAX_MEM_SLOTS message at
>       startup and cache the returned value so that QEMU does not need to
>       query the backend every time vhost_backend_memslots_limit is called.
> 
> Best,
> Raphael
> 
> Raphael Norwitz (3):
>   Fixed assert in vhost_user_set_mem_table_postcopy
>   Refactor vhost_user_set_mem_table functions
>   Lift max memory slots limit imposed by vhost-user
> 
>  docs/interop/vhost-user.rst |  43 +++++
>  hw/virtio/vhost-user.c      | 385 +++++++++++++++++++++++++++++++++-----------
>  2 files changed, 336 insertions(+), 92 deletions(-)
> 
> -- 
> 1.8.3.1
> 
> 



* Re: [PATCH v2 1/3] Fixed assert in vhost_user_set_mem_table_postcopy
  2020-01-16  2:57 ` [PATCH v2 1/3] Fixed assert in vhost_user_set_mem_table_postcopy Raphael Norwitz
@ 2020-02-06  8:17   ` Michael S. Tsirkin
  2020-02-06  8:20     ` Michael S. Tsirkin
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2020-02-06  8:17 UTC (permalink / raw)
  To: Raphael Norwitz; +Cc: Peter Turschmid, raphael.s.norwitz, qemu-devel

On Wed, Jan 15, 2020 at 09:57:04PM -0500, Raphael Norwitz wrote:
> The current vhost_user_set_mem_table_postcopy() implementation
> populates each region of the VHOST_USER_SET_MEM_TABLE message without
> first checking if there are more than VHOST_MEMORY_MAX_NREGIONS already
> populated. This can cause memory corruption if too many regions are
> added to the message during the postcopy step.
> 
> This change moves an existing assert up such that attempting to
> construct a VHOST_USER_SET_MEM_TABLE message with too many memory
> regions will gracefully bring down qemu instead of corrupting memory.
> 
> Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
> Signed-off-by: Peter Turschmid <peter.turschm@nutanix.com>


Could you pls add Fixes: and stable tags?

> ---
>  hw/virtio/vhost-user.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 2e81f55..cce851a 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -443,6 +443,7 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
>                                       &offset);
>          fd = memory_region_get_fd(mr);
>          if (fd > 0) {
> +            assert(fd_num < VHOST_MEMORY_MAX_NREGIONS);
>              trace_vhost_user_set_mem_table_withfd(fd_num, mr->name,
>                                                    reg->memory_size,
>                                                    reg->guest_phys_addr,
> @@ -455,7 +456,6 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
>              msg.payload.memory.regions[fd_num].guest_phys_addr =
>                  reg->guest_phys_addr;
>              msg.payload.memory.regions[fd_num].mmap_offset = offset;
> -            assert(fd_num < VHOST_MEMORY_MAX_NREGIONS);
>              fds[fd_num++] = fd;
>          } else {
>              u->region_rb_offset[i] = 0;
> -- 
> 1.8.3.1




* Re: [PATCH v2 1/3] Fixed assert in vhost_user_set_mem_table_postcopy
  2020-02-06  8:17   ` Michael S. Tsirkin
@ 2020-02-06  8:20     ` Michael S. Tsirkin
  2020-02-09 17:17       ` Raphael Norwitz
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2020-02-06  8:20 UTC (permalink / raw)
  To: Raphael Norwitz; +Cc: Peter Turschmid, raphael.s.norwitz, qemu-devel

On Thu, Feb 06, 2020 at 03:17:04AM -0500, Michael S. Tsirkin wrote:
> On Wed, Jan 15, 2020 at 09:57:04PM -0500, Raphael Norwitz wrote:
> > The current vhost_user_set_mem_table_postcopy() implementation
> > populates each region of the VHOST_USER_SET_MEM_TABLE message without
> > first checking if there are more than VHOST_MEMORY_MAX_NREGIONS already
> > populated. This can cause memory corruption if too many regions are
> > added to the message during the postcopy step.
> > 
> > This change moves an existing assert up such that attempting to
> > construct a VHOST_USER_SET_MEM_TABLE message with too many memory
> > regions will gracefully bring down qemu instead of corrupting memory.
> > 
> > Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
> > Signed-off-by: Peter Turschmid <peter.turschm@nutanix.com>
> 
> 
> Could you pls add Fixes: and stable tags?

oh wait no, this is just a theoretical thing, right?
it doesn't actually trigger, it's just a cleanup.

no fixes/stable needed then, sorry

> > ---
> >  hw/virtio/vhost-user.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index 2e81f55..cce851a 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -443,6 +443,7 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
> >                                       &offset);
> >          fd = memory_region_get_fd(mr);
> >          if (fd > 0) {
> > +            assert(fd_num < VHOST_MEMORY_MAX_NREGIONS);
> >              trace_vhost_user_set_mem_table_withfd(fd_num, mr->name,
> >                                                    reg->memory_size,
> >                                                    reg->guest_phys_addr,
> > @@ -455,7 +456,6 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
> >              msg.payload.memory.regions[fd_num].guest_phys_addr =
> >                  reg->guest_phys_addr;
> >              msg.payload.memory.regions[fd_num].mmap_offset = offset;
> > -            assert(fd_num < VHOST_MEMORY_MAX_NREGIONS);
> >              fds[fd_num++] = fd;
> >          } else {
> >              u->region_rb_offset[i] = 0;
> > -- 
> > 1.8.3.1




* Re: [PATCH v2 2/3] Refactor vhost_user_set_mem_table functions
  2020-01-16  2:57 ` [PATCH v2 2/3] Refactor vhost_user_set_mem_table functions Raphael Norwitz
@ 2020-02-06  8:21   ` Michael S. Tsirkin
  2020-02-09 17:21     ` Raphael Norwitz
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2020-02-06  8:21 UTC (permalink / raw)
  To: Raphael Norwitz; +Cc: Peter Turschmid, raphael.s.norwitz, qemu-devel

On Wed, Jan 15, 2020 at 09:57:05PM -0500, Raphael Norwitz wrote:
> vhost_user_set_mem_table() and vhost_user_set_mem_table_postcopy() have
> gotten convoluted, and have some identical code.
> 
> This change moves the logic populating the VhostUserMemory struct and
> fds array from vhost_user_set_mem_table() and
> vhost_user_set_mem_table_postcopy() to a new function,
> vhost_user_fill_set_mem_table_msg().
> 
> No functionality is impacted.
> 
> Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
> Signed-off-by: Peter Turschmid <peter.turschm@nutanix.com>


Looks ok, but just cosmetics: let's have the flag say what
it does, not who uses it.

So s/postcopy/track_ramblocks/ ?


> ---
>  hw/virtio/vhost-user.c | 143 +++++++++++++++++++++++--------------------------
>  1 file changed, 67 insertions(+), 76 deletions(-)
> 
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index cce851a..af83fdd 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -407,18 +407,79 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
>      return 0;
>  }
>  
> +static int vhost_user_fill_set_mem_table_msg(struct vhost_user *u,
> +                                             struct vhost_dev *dev,
> +                                             VhostUserMsg *msg,
> +                                             int *fds, size_t *fd_num,
> +                                             bool postcopy)
> +{
> +    int i, fd;
> +    ram_addr_t offset;
> +    MemoryRegion *mr;
> +    struct vhost_memory_region *reg;
> +
> +    msg->hdr.request = VHOST_USER_SET_MEM_TABLE;
> +
> +    for (i = 0; i < dev->mem->nregions; ++i) {
> +        reg = dev->mem->regions + i;
> +
> +        assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
> +        mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
> +                                     &offset);
> +        fd = memory_region_get_fd(mr);
> +        if (fd > 0) {
> +            if (postcopy) {
> +                assert(*fd_num < VHOST_MEMORY_MAX_NREGIONS);
> +                trace_vhost_user_set_mem_table_withfd(*fd_num, mr->name,
> +                                                      reg->memory_size,
> +                                                      reg->guest_phys_addr,
> +                                                      reg->userspace_addr,
> +                                                      offset);
> +                u->region_rb_offset[i] = offset;
> +                u->region_rb[i] = mr->ram_block;
> +            } else if (*fd_num == VHOST_MEMORY_MAX_NREGIONS) {
> +                error_report("Failed preparing vhost-user memory table msg");
> +                return -1;
> +            }
> +            msg->payload.memory.regions[*fd_num].userspace_addr =
> +                reg->userspace_addr;
> +            msg->payload.memory.regions[*fd_num].memory_size =
> +                reg->memory_size;
> +            msg->payload.memory.regions[*fd_num].guest_phys_addr =
> +                reg->guest_phys_addr;
> +            msg->payload.memory.regions[*fd_num].mmap_offset = offset;
> +            fds[(*fd_num)++] = fd;
> +        } else if (postcopy) {
> +            u->region_rb_offset[i] = 0;
> +            u->region_rb[i] = NULL;
> +        }
> +    }
> +
> +    msg->payload.memory.nregions = *fd_num;
> +
> +    if (!*fd_num) {
> +        error_report("Failed initializing vhost-user memory map, "
> +                     "consider using -object memory-backend-file share=on");
> +        return -1;
> +    }
> +
> +    msg->hdr.size = sizeof(msg->payload.memory.nregions);
> +    msg->hdr.size += sizeof(msg->payload.memory.padding);
> +    msg->hdr.size += *fd_num * sizeof(VhostUserMemoryRegion);
> +
> +    return 1;
> +}
> +
>  static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
>                                               struct vhost_memory *mem)
>  {
>      struct vhost_user *u = dev->opaque;
>      int fds[VHOST_MEMORY_MAX_NREGIONS];
> -    int i, fd;
>      size_t fd_num = 0;
>      VhostUserMsg msg_reply;
>      int region_i, msg_i;
>  
>      VhostUserMsg msg = {
> -        .hdr.request = VHOST_USER_SET_MEM_TABLE,
>          .hdr.flags = VHOST_USER_VERSION,
>      };
>  
> @@ -433,48 +494,11 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
>          u->region_rb_len = dev->mem->nregions;
>      }
>  
> -    for (i = 0; i < dev->mem->nregions; ++i) {
> -        struct vhost_memory_region *reg = dev->mem->regions + i;
> -        ram_addr_t offset;
> -        MemoryRegion *mr;
> -
> -        assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
> -        mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
> -                                     &offset);
> -        fd = memory_region_get_fd(mr);
> -        if (fd > 0) {
> -            assert(fd_num < VHOST_MEMORY_MAX_NREGIONS);
> -            trace_vhost_user_set_mem_table_withfd(fd_num, mr->name,
> -                                                  reg->memory_size,
> -                                                  reg->guest_phys_addr,
> -                                                  reg->userspace_addr, offset);
> -            u->region_rb_offset[i] = offset;
> -            u->region_rb[i] = mr->ram_block;
> -            msg.payload.memory.regions[fd_num].userspace_addr =
> -                reg->userspace_addr;
> -            msg.payload.memory.regions[fd_num].memory_size  = reg->memory_size;
> -            msg.payload.memory.regions[fd_num].guest_phys_addr =
> -                reg->guest_phys_addr;
> -            msg.payload.memory.regions[fd_num].mmap_offset = offset;
> -            fds[fd_num++] = fd;
> -        } else {
> -            u->region_rb_offset[i] = 0;
> -            u->region_rb[i] = NULL;
> -        }
> -    }
> -
> -    msg.payload.memory.nregions = fd_num;
> -
> -    if (!fd_num) {
> -        error_report("Failed initializing vhost-user memory map, "
> -                     "consider using -object memory-backend-file share=on");
> +    if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num,
> +                                          true) < 0) {
>          return -1;
>      }
>  
> -    msg.hdr.size = sizeof(msg.payload.memory.nregions);
> -    msg.hdr.size += sizeof(msg.payload.memory.padding);
> -    msg.hdr.size += fd_num * sizeof(VhostUserMemoryRegion);
> -
>      if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
>          return -1;
>      }
> @@ -545,7 +569,6 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
>  {
>      struct vhost_user *u = dev->opaque;
>      int fds[VHOST_MEMORY_MAX_NREGIONS];
> -    int i, fd;
>      size_t fd_num = 0;
>      bool do_postcopy = u->postcopy_listen && u->postcopy_fd.handler;
>      bool reply_supported = virtio_has_feature(dev->protocol_features,
> @@ -559,7 +582,6 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
>      }
>  
>      VhostUserMsg msg = {
> -        .hdr.request = VHOST_USER_SET_MEM_TABLE,
>          .hdr.flags = VHOST_USER_VERSION,
>      };
>  
> @@ -567,42 +589,11 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
>          msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
>      }
>  
> -    for (i = 0; i < dev->mem->nregions; ++i) {
> -        struct vhost_memory_region *reg = dev->mem->regions + i;
> -        ram_addr_t offset;
> -        MemoryRegion *mr;
> -
> -        assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
> -        mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
> -                                     &offset);
> -        fd = memory_region_get_fd(mr);
> -        if (fd > 0) {
> -            if (fd_num == VHOST_MEMORY_MAX_NREGIONS) {
> -                error_report("Failed preparing vhost-user memory table msg");
> -                return -1;
> -            }
> -            msg.payload.memory.regions[fd_num].userspace_addr =
> -                reg->userspace_addr;
> -            msg.payload.memory.regions[fd_num].memory_size  = reg->memory_size;
> -            msg.payload.memory.regions[fd_num].guest_phys_addr =
> -                reg->guest_phys_addr;
> -            msg.payload.memory.regions[fd_num].mmap_offset = offset;
> -            fds[fd_num++] = fd;
> -        }
> -    }
> -
> -    msg.payload.memory.nregions = fd_num;
> -
> -    if (!fd_num) {
> -        error_report("Failed initializing vhost-user memory map, "
> -                     "consider using -object memory-backend-file share=on");
> +    if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num,
> +                                          false) < 0) {
>          return -1;
>      }
>  
> -    msg.hdr.size = sizeof(msg.payload.memory.nregions);
> -    msg.hdr.size += sizeof(msg.payload.memory.padding);
> -    msg.hdr.size += fd_num * sizeof(VhostUserMemoryRegion);
> -
>      if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
>          return -1;
>      }
> -- 
> 1.8.3.1




* Re: [PATCH v2 3/3] Lift max memory slots limit imposed by vhost-user
  2020-01-16  2:57 ` [PATCH v2 3/3] Lift max memory slots limit imposed by vhost-user Raphael Norwitz
@ 2020-02-06  8:32   ` Michael S. Tsirkin
  2020-02-09 17:43     ` Raphael Norwitz
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2020-02-06  8:32 UTC (permalink / raw)
  To: Raphael Norwitz; +Cc: Peter Turschmid, raphael.s.norwitz, qemu-devel

On Wed, Jan 15, 2020 at 09:57:06PM -0500, Raphael Norwitz wrote:
> The current vhost-user implementation in Qemu imposes a limit on the
> maximum number of memory slots exposed to a VM using a vhost-user
> device. This change provides a new protocol feature
> VHOST_USER_F_CONFIGURE_SLOTS which, when enabled, lifts this limit and
> allows a VM with a vhost-user device to expose a configurable number of
> memory slots, up to the ACPI defined maximum. Existing backends which
> do not support this protocol feature are unaffected.

Hmm ACPI maximum seems to be up to 512 - is this too much to fit in a
single message?  So can't we just increase the number (after negotiating
with remote) and be done with it, instead of add/remove?  Or is there
another reason to prefer add/remove?

> 
> This feature works by using three new messages,
> VHOST_USER_GET_MAX_MEM_SLOTS, VHOST_USER_ADD_MEM_REG and
> VHOST_USER_REM_MEM_REG. VHOST_USER_GET_MAX_MEM_SLOTS gets the
> number of memory slots the backend is willing to accept when the
> backend is initialized. Then, when the memory tables are set or updated,
> a series of VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG messages
> are sent to transmit the regions to map and/or unmap instead of trying
> to send all the regions in one fixed size VHOST_USER_SET_MEM_TABLE
> message.
> 
> The vhost_user struct maintains a shadow state of the VM’s memory
> regions. When the memory tables are modified, the
> vhost_user_set_mem_table() function compares the new device memory state
> to the shadow state and only sends regions which need to be unmapped or
> mapped in. The regions which must be unmapped are sent first, followed
> by the new regions to be mapped in. After all the messages have been
> sent, the shadow state is set to the current virtual device state.
> 
> The current feature implementation does not work with postcopy migration
> and cannot be enabled if the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has
> also been negotiated.

Hmm what would it take to lift the restrictions?
Conflicting features like this make it very hard for users to make
an informed choice about what to support.


> Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
> Signed-off-by: Peter Turschmid <peter.turschm@nutanix.com>
> Suggested-by: Mike Cui <cui@nutanix.com>
> ---
>  docs/interop/vhost-user.rst |  43 ++++++++
>  hw/virtio/vhost-user.c      | 254 ++++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 275 insertions(+), 22 deletions(-)
> 
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index 5f8b3a4..ae9acf2 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -786,6 +786,7 @@ Protocol features
>    #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  11
>    #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12
>    #define VHOST_USER_PROTOCOL_F_RESET_DEVICE   13
> +  #define VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS   14
>  
>  Master message types
>  --------------------
> @@ -1205,6 +1206,48 @@ Master message types
>    Only valid if the ``VHOST_USER_PROTOCOL_F_RESET_DEVICE`` protocol
>    feature is set by the backend.
>  
> +``VHOST_USER_GET_MAX_MEM_SLOTS``
> +  :id: 35
> +  :equivalent ioctl: N/A
> +  :slave payload: u64
> +
> +  When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been
> +  successfully negotiated, this message is submitted by master to the
> +  slave. The slave should return the message with a u64 payload
> +  containing the maximum number of memory slots for QEMU to expose to
> +  the guest. This message is not supported with postcopy migration or if
> +  the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated.
> +
> +``VHOST_USER_ADD_MEM_REG``
> +  :id: 36
> +  :equivalent ioctl: N/A
> +  :slave payload: memory region
> +
> +  When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been
> +  successfully negotiated, this message is submitted by master to the slave.
> +  The message payload contains a memory region descriptor struct, describing
> +  a region of guest memory which the slave device must map in. When the
> +  VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been successfully
> +  negotiated, along with the VHOST_USER_REM_MEM_REG message, this message is
> +  used to set and update the memory tables of the slave device. This message
> +  is not supported with postcopy migration or if the
> +  VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated.
> +
> +``VHOST_USER_REM_MEM_REG``
> +  :id: 37
> +  :equivalent ioctl: N/A
> +  :slave payload: memory region
> +
> +  When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been
> +  successfully negotiated, this message is submitted by master to the slave.
> +  The message payload contains a memory region descriptor struct, describing
> +  a region of guest memory which the slave device must unmap. When the
> +  VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been successfully
> +  negotiated, along with the VHOST_USER_ADD_MEM_REG message, this message is
> +  used to set and update the memory tables of the slave device. This message
> +  is not supported with postcopy migration or if the
> +  VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated.
> +
>  Slave message types
>  -------------------
>  
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index af83fdd..fed6d02 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -35,11 +35,29 @@
>  #include <linux/userfaultfd.h>
>  #endif
>  
> -#define VHOST_MEMORY_MAX_NREGIONS    8
> +#define VHOST_MEMORY_LEGACY_NREGIONS    8

Hardly legacy when this is intended to always be used e.g. with
postcopy, right?

>  #define VHOST_USER_F_PROTOCOL_FEATURES 30
>  #define VHOST_USER_SLAVE_MAX_FDS     8
>  
>  /*
> + * Set maximum number of RAM slots supported to
> + * the maximum number supported by the target
> + * hardware platform.
> + */
> +#if defined(TARGET_X86) || defined(TARGET_X86_64) || \
> +    defined(TARGET_ARM) || defined(TARGET_ARM_64)
> +#include "hw/acpi/acpi.h"
> +#define VHOST_USER_MAX_RAM_SLOTS ACPI_MAX_RAM_SLOTS
> +
> +#elif defined(TARGET_PPC) || defined(TARGET_PPC_64)
> +#include "hw/ppc/spapr.h"
> +#define VHOST_USER_MAX_RAM_SLOTS SPAPR_MAX_RAM_SLOTS
> +
> +#else
> +#define VHOST_USER_MAX_RAM_SLOTS 512
> +#endif
> +
> +/*
>   * Maximum size of virtio device config space
>   */
>  #define VHOST_USER_MAX_CONFIG_SIZE 256
> @@ -59,6 +77,7 @@ enum VhostUserProtocolFeature {
>      VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
>      VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD = 12,
>      VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
> +    VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS = 14,
>      VHOST_USER_PROTOCOL_F_MAX
>  };
>  
> @@ -100,6 +119,9 @@ typedef enum VhostUserRequest {
>      VHOST_USER_SET_INFLIGHT_FD = 32,
>      VHOST_USER_GPU_SET_SOCKET = 33,
>      VHOST_USER_RESET_DEVICE = 34,
> +    VHOST_USER_GET_MAX_MEM_SLOTS = 35,
> +    VHOST_USER_ADD_MEM_REG = 36,
> +    VHOST_USER_REM_MEM_REG = 37,
>      VHOST_USER_MAX
>  } VhostUserRequest;
>  
> @@ -121,9 +143,14 @@ typedef struct VhostUserMemoryRegion {
>  typedef struct VhostUserMemory {
>      uint32_t nregions;
>      uint32_t padding;
> -    VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
> +    VhostUserMemoryRegion regions[VHOST_MEMORY_LEGACY_NREGIONS];
>  } VhostUserMemory;
>  
> +typedef struct VhostUserMemRegMsg {
> +    uint32_t padding;
> +    VhostUserMemoryRegion region;
> +} VhostUserMemRegMsg;
> +
>  typedef struct VhostUserLog {
>      uint64_t mmap_size;
>      uint64_t mmap_offset;
> @@ -182,6 +209,7 @@ typedef union {
>          struct vhost_vring_state state;
>          struct vhost_vring_addr addr;
>          VhostUserMemory memory;
> +        VhostUserMemRegMsg mem_reg;
>          VhostUserLog log;
>          struct vhost_iotlb_msg iotlb;
>          VhostUserConfig config;
> @@ -210,7 +238,7 @@ struct vhost_user {
>      int slave_fd;
>      NotifierWithReturn postcopy_notifier;
>      struct PostCopyFD  postcopy_fd;
> -    uint64_t           postcopy_client_bases[VHOST_MEMORY_MAX_NREGIONS];
> +    uint64_t           postcopy_client_bases[VHOST_USER_MAX_RAM_SLOTS];
>      /* Length of the region_rb and region_rb_offset arrays */
>      size_t             region_rb_len;
>      /* RAMBlock associated with a given region */
> @@ -222,6 +250,13 @@ struct vhost_user {
>  
>      /* True once we've entered postcopy_listen */
>      bool               postcopy_listen;
> +
> +    /* Maximum number of RAM slots supported by the backend */
> +    uint64_t memory_slots;
> +
> +    /* Our current regions */
> +    int num_shadow_regions;
> +    VhostUserMemoryRegion shadow_regions[VHOST_USER_MAX_RAM_SLOTS];
>  };
>  
>  static bool ioeventfd_enabled(void)
> @@ -370,7 +405,7 @@ int vhost_user_gpu_set_socket(struct vhost_dev *dev, int fd)
>  static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
>                                     struct vhost_log *log)
>  {
> -    int fds[VHOST_MEMORY_MAX_NREGIONS];
> +    int fds[VHOST_USER_MAX_RAM_SLOTS];
>      size_t fd_num = 0;
>      bool shmfd = virtio_has_feature(dev->protocol_features,
>                                      VHOST_USER_PROTOCOL_F_LOG_SHMFD);
> @@ -429,7 +464,7 @@ static int vhost_user_fill_set_mem_table_msg(struct vhost_user *u,
>          fd = memory_region_get_fd(mr);
>          if (fd > 0) {
>              if (postcopy) {
> -                assert(*fd_num < VHOST_MEMORY_MAX_NREGIONS);
> +                assert(*fd_num < VHOST_MEMORY_LEGACY_NREGIONS);
>                  trace_vhost_user_set_mem_table_withfd(*fd_num, mr->name,
>                                                        reg->memory_size,
>                                                        reg->guest_phys_addr,
> @@ -437,7 +472,7 @@ static int vhost_user_fill_set_mem_table_msg(struct vhost_user *u,
>                                                        offset);
>                  u->region_rb_offset[i] = offset;
>                  u->region_rb[i] = mr->ram_block;
> -            } else if (*fd_num == VHOST_MEMORY_MAX_NREGIONS) {
> +            } else if (*fd_num == VHOST_MEMORY_LEGACY_NREGIONS) {
>                  error_report("Failed preparing vhost-user memory table msg");
>                  return -1;
>              }
> @@ -474,7 +509,7 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
>                                               struct vhost_memory *mem)
>  {
>      struct vhost_user *u = dev->opaque;
> -    int fds[VHOST_MEMORY_MAX_NREGIONS];
> +    int fds[VHOST_MEMORY_LEGACY_NREGIONS];
>      size_t fd_num = 0;
>      VhostUserMsg msg_reply;
>      int region_i, msg_i;
> @@ -523,7 +558,7 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
>      }
>  
>      memset(u->postcopy_client_bases, 0,
> -           sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS);
> +           sizeof(uint64_t) * VHOST_USER_MAX_RAM_SLOTS);
>  
>      /* They're in the same order as the regions that were sent
>       * but some of the regions were skipped (above) if they
> @@ -564,18 +599,151 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
>      return 0;
>  }
>  
> +static inline bool reg_equal(VhostUserMemoryRegion *shadow_reg,
> +                             struct vhost_memory_region *vdev_reg)
> +{
> +    if (shadow_reg->guest_phys_addr == vdev_reg->guest_phys_addr &&
> +        shadow_reg->userspace_addr == vdev_reg->userspace_addr &&
> +        shadow_reg->memory_size == vdev_reg->memory_size) {
> +        return true;
> +    } else {
> +        return false;
> +    }
> +}
> +
> +static int vhost_user_send_add_remove_regions(struct vhost_dev *dev,
> +                                              struct vhost_memory *mem,
> +                                              VhostUserMsg *msg)
> +{
> +    struct vhost_user *u = dev->opaque;
> +    int i, j, fd;
> +    bool found[VHOST_USER_MAX_RAM_SLOTS] = {};
> +    bool matching = false;
> +    struct vhost_memory_region *reg;
> +    ram_addr_t offset;
> +    MemoryRegion *mr;
> +
> +    /*
> +     * Ensure the VHOST_USER_PROTOCOL_F_REPLY_ACK has not been
> +     * negotiated and no postcopy migration is in progress.
> +     */
> +    assert(!virtio_has_feature(dev->protocol_features,
> +                               VHOST_USER_PROTOCOL_F_REPLY_ACK));
> +    if (u->postcopy_listen && u->postcopy_fd.handler) {
> +        error_report("Postcopy migration is not supported when the "
> +                     "VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS feature "
> +                     "has been negotiated");
> +        return -1;
> +    }
> +
> +    msg->hdr.size = sizeof(msg->payload.mem_reg.padding);
> +    msg->hdr.size += sizeof(VhostUserMemoryRegion);
> +
> +    /*
> +     * Send VHOST_USER_REM_MEM_REG for memory regions in our shadow state
> +     * which are not found not in the device's memory state.

double negation - could not parse this.

> +     */
> +    for (i = 0; i < u->num_shadow_regions; ++i) {
> +        reg = dev->mem->regions;
> +
> +        for (j = 0; j < dev->mem->nregions; j++) {
> +            reg = dev->mem->regions + j;
> +
> +            assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
> +            mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
> +                                         &offset);
> +            fd = memory_region_get_fd(mr);
> +
> +            if (reg_equal(&u->shadow_regions[i], reg)) {
> +                matching = true;
> +                found[j] = true;
> +                break;
> +            }
> +        }
> +
> +        if (fd > 0 && !matching) {
> +            msg->hdr.request = VHOST_USER_REM_MEM_REG;
> +            msg->payload.mem_reg.region.userspace_addr = reg->userspace_addr;
> +            msg->payload.mem_reg.region.memory_size = reg->memory_size;
> +            msg->payload.mem_reg.region.guest_phys_addr =
> +                reg->guest_phys_addr;
> +            msg->payload.mem_reg.region.mmap_offset = offset;
> +
> +            if (vhost_user_write(dev, msg, &fd, 1) < 0) {
> +                return -1;
> +            }
> +        }
> +    }
> +
> +    /*
> +     * Send messages to add regions present in the device which are not
> +     * in our shadow state.
> +     */
> +    for (i = 0; i < dev->mem->nregions; ++i) {
> +        reg = dev->mem->regions + i;
> +
> +        /*
> +         * If the region was in both the shadow and vdev state we don't
> +         * need to send a VHOST_USER_ADD_MEM_REG message for it.
> +         */
> +        if (found[i]) {
> +            continue;
> +        }
> +
> +        assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
> +        mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
> +                                     &offset);
> +        fd = memory_region_get_fd(mr);
> +
> +        if (fd > 0) {
> +            msg->hdr.request = VHOST_USER_ADD_MEM_REG;
> +            msg->payload.mem_reg.region.userspace_addr = reg->userspace_addr;
> +            msg->payload.mem_reg.region.memory_size = reg->memory_size;
> +            msg->payload.mem_reg.region.guest_phys_addr =
> +                reg->guest_phys_addr;
> +            msg->payload.mem_reg.region.mmap_offset = offset;
> +
> +            if (vhost_user_write(dev, msg, &fd, 1) < 0) {
> +                return -1;
> +            }
> +        }
> +    }
> +
> +    /* Make our shadow state match the updated device state. */
> +    u->num_shadow_regions = dev->mem->nregions;
> +    for (i = 0; i < dev->mem->nregions; ++i) {
> +        reg = dev->mem->regions + i;
> +        u->shadow_regions[i].guest_phys_addr = reg->guest_phys_addr;
> +        u->shadow_regions[i].userspace_addr = reg->userspace_addr;
> +        u->shadow_regions[i].memory_size = reg->memory_size;
> +    }
> +
> +    return 0;
> +}
> +
>  static int vhost_user_set_mem_table(struct vhost_dev *dev,
>                                      struct vhost_memory *mem)
>  {
>      struct vhost_user *u = dev->opaque;
> -    int fds[VHOST_MEMORY_MAX_NREGIONS];
> +    int fds[VHOST_MEMORY_LEGACY_NREGIONS];
>      size_t fd_num = 0;
>      bool do_postcopy = u->postcopy_listen && u->postcopy_fd.handler;
>      bool reply_supported = virtio_has_feature(dev->protocol_features,
>                                                VHOST_USER_PROTOCOL_F_REPLY_ACK);
> +    bool config_slots =
> +        virtio_has_feature(dev->protocol_features,
> +                           VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS);
>  
>      if (do_postcopy) {
> -        /* Postcopy has enough differences that it's best done in it's own
> +        if (config_slots) {
> +            error_report("Postcopy migration not supported with "
> +                         "VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS feature "
> +                         "enabled.");
> +            return -1;
> +        }
> +
> +        /*
> +         * Postcopy has enough differences that it's best done in it's own
>           * version
>           */
>          return vhost_user_set_mem_table_postcopy(dev, mem);
> @@ -589,17 +757,22 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
>          msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
>      }
>  
> -    if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num,
> -                                          false) < 0) {
> -        return -1;
> -    }
> -
> -    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
> -        return -1;
> -    }
> +    if (config_slots && !reply_supported) {
> +        if (vhost_user_send_add_remove_regions(dev, mem, &msg) < 0) {
> +            return -1;
> +        }
> +    } else {
> +        if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num,
> +                                              false) < 0) {
> +            return -1;
> +        }
> +        if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
> +            return -1;
> +        }
>  
> -    if (reply_supported) {
> -        return process_message_reply(dev, &msg);
> +        if (reply_supported) {
> +            return process_message_reply(dev, &msg);
> +        }
>      }
>  
>      return 0;
> @@ -764,7 +937,7 @@ static int vhost_set_vring_file(struct vhost_dev *dev,
>                                  VhostUserRequest request,
>                                  struct vhost_vring_file *file)
>  {
> -    int fds[VHOST_MEMORY_MAX_NREGIONS];
> +    int fds[VHOST_USER_MAX_RAM_SLOTS];
>      size_t fd_num = 0;
>      VhostUserMsg msg = {
>          .hdr.request = request,
> @@ -866,6 +1039,23 @@ static int vhost_user_get_features(struct vhost_dev *dev, uint64_t *features)
>      return vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features);
>  }
>  
> +static int vhost_user_get_max_memslots(struct vhost_dev *dev,
> +                                       uint64_t *max_memslots)
> +{
> +    uint64_t backend_max_memslots;
> +    int err;
> +
> +    err = vhost_user_get_u64(dev, VHOST_USER_GET_MAX_MEM_SLOTS,
> +                             &backend_max_memslots);
> +    if (err < 0) {
> +        return err;
> +    }
> +
> +    *max_memslots = MIN(backend_max_memslots, VHOST_USER_MAX_RAM_SLOTS);
> +
> +    return *max_memslots;
> +}
> +
>  static int vhost_user_set_owner(struct vhost_dev *dev)
>  {
>      VhostUserMsg msg = {
> @@ -1439,6 +1629,24 @@ static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque)
>                           "slave-req protocol features.");
>              return -1;
>          }
> +
> +        /* get max memory regions if backend supports configurable RAM slots */
> +        if (!virtio_has_feature(dev->protocol_features,
> +                                VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS)) {
> +            u->memory_slots = VHOST_MEMORY_LEGACY_NREGIONS;
> +        } else if (virtio_has_feature(dev->protocol_features,
> +                           VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
> +            error_report("The VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol "
> +                         "feature is not supported when the "
> +                         "VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also "
> +                         "been negotiated");
> +            return -1;
> +        } else {
> +            err = vhost_user_get_max_memslots(dev, &u->memory_slots);
> +            if (err < 0) {
> +                return err;
> +            }
> +        }
>      }
>  
>      if (dev->migration_blocker == NULL &&
> @@ -1502,7 +1710,9 @@ static int vhost_user_get_vq_index(struct vhost_dev *dev, int idx)
>  
>  static int vhost_user_memslots_limit(struct vhost_dev *dev)
>  {
> -    return VHOST_MEMORY_MAX_NREGIONS;
> +    struct vhost_user *u = dev->opaque;
> +
> +    return u->memory_slots;
>  }
>  
>  static bool vhost_user_requires_shm_log(struct vhost_dev *dev)
> -- 
> 1.8.3.1




* Re: [PATCH v2 0/3] vhost-user: Lift Max Ram Slots Limitation
  2020-01-16  2:57 [PATCH v2 0/3] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz
                   ` (3 preceding siblings ...)
  2020-01-31 21:21 ` [PATCH v2 0/3] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz
@ 2020-02-06  8:33 ` Michael S. Tsirkin
  2020-02-09 17:14   ` Raphael Norwitz
  4 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2020-02-06  8:33 UTC (permalink / raw)
  To: Raphael Norwitz; +Cc: raphael.s.norwitz, qemu-devel

On Wed, Jan 15, 2020 at 09:57:03PM -0500, Raphael Norwitz wrote:
> In QEMU today, a VM with a vhost-user device can hot add memory a
> maximum of 8 times. See these threads, among others:
> 
> [1] https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01046.html
>     https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01236.html
> 
> [2] https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg04656.html
> 
> This series introduces a new protocol feature
> VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS which, when enabled, lifts the
> restriction on the maximum number RAM slots imposed by vhost-user.
> 
> The patch consists of 3 changes:
> 1. Fixed assert in vhost_user_set_mem_table_postcopy:
>    This is a bug fix in the postcopy migration path
> 2. Refactor vhost_user_set_mem_table functions:
>    This is a non-functional change refractoring the
>    vhost_user_set_mem_table and vhost_user_set_mem_table_postcopy
>    functions such that the feature can be more cleanly added.
> 3. Lift max memory slots limit imposed by vhost-user:
>    This change introduces the new protocol feature
>    VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS.
> 
> The implementation details are explained in more detail in the commit
> messages, but at a high level the new protocol feature works as follows:
> - If the VHOST_USER_PROTCOL_F_CONFIGURE_SLOTS feature is enabled, QEMU will
>   send multiple VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG
>   messages to map and unmap individual memory regions instead of one large
>   VHOST_USER_SET_MEM_TABLE message containing all memory regions.
> - The vhost-user struct maintains a ’shadow state’ of memory regions
>   already sent to the guest. Each time vhost_user_set_mem_table is called,
>   the shadow state is compared with the new device state. A
>   VHOST_USER_REM_MEM_REG will be sent for each region in the shadow state
>   not in the device state. Then, a VHOST_USER_ADD_MEM_REG will be sent
>   for each region in the device state but not the shadow state. After
>   these messages have been sent, the shadow state will be updated to
>   reflect the new device state.
> 
> The VHOST_USER_SET_MEM_TABLE message was not reused because as the number of
> regions grows, the message becomes very large. In practice, such large
> messages caused problems (truncated messages) and in the past it seems the
> community has opted for smaller fixed size messages where possible. VRINGs,
> for example, are sent to the backend individually instead of in one massive
> message.
> 
> Current Limitations:
> - postcopy migration is not supported when the
>   VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS has been negotiated. 
> - VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS cannot be negotiated when
>   VHOST_USER_PROTOCOL_F_REPLY_ACK has also been negotiated.
> 
> Both of these limitations are due to resource contraints. They are not
> imposed for technical reasons.
> 
> Changes since V1:
>     * Kept the assert in vhost_user_set_mem_table_postcopy, but moved it
>       to prevent corruption
>     * Made QEMU send a single VHOST_USER_GET_MAX_MEMSLOTS message at
>       startup and cache the returned value so that QEMU does not need to
>       query the backend every time vhost_backend_memslots_limit is called.

I'm a bit confused about what happens on reconnect.
Can you clarify pls?


> Best,
> Raphael
> 
> Raphael Norwitz (3):
>   Fixed assert in vhost_user_set_mem_table_postcopy
>   Refactor vhost_user_set_mem_table functions
>   Lift max memory slots limit imposed by vhost-user
> 
>  docs/interop/vhost-user.rst |  43 +++++
>  hw/virtio/vhost-user.c      | 385 +++++++++++++++++++++++++++++++++-----------
>  2 files changed, 336 insertions(+), 92 deletions(-)
> 
> -- 
> 1.8.3.1




* Re: [PATCH v2 0/3] vhost-user: Lift Max Ram Slots Limitation
  2020-02-06  8:33 ` Michael S. Tsirkin
@ 2020-02-09 17:14   ` Raphael Norwitz
  2020-02-10 16:04     ` Michael S. Tsirkin
  0 siblings, 1 reply; 19+ messages in thread
From: Raphael Norwitz @ 2020-02-09 17:14 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On Thu, Feb 06, 2020 at 03:33:13AM -0500, Michael S. Tsirkin wrote:
> 
> On Wed, Jan 15, 2020 at 09:57:03PM -0500, Raphael Norwitz wrote:
> > 
> > Changes since V1:
> >     * Kept the assert in vhost_user_set_mem_table_postcopy, but moved it
> >       to prevent corruption
> >     * Made QEMU send a single VHOST_USER_GET_MAX_MEMSLOTS message at
> >       startup and cache the returned value so that QEMU does not need to
> >       query the backend every time vhost_backend_memslots_limit is called.
> 
> I'm a bit confused about what happens on reconnect.
> Can you clarify pls?
> 
From what I can see, backends which support reconnect call vhost_dev_init,
which then calls vhost_user_backend_init(), as vhost-user-blk does here:
https://github.com/qemu/qemu/blob/master/hw/block/vhost-user-blk.c#L315. The
ram slots limit is fetched in vhost_user_backend_init() so every time the
device reconnects the limit should be refetched. 



* Re: [PATCH v2 1/3] Fixed assert in vhost_user_set_mem_table_postcopy
  2020-02-06  8:20     ` Michael S. Tsirkin
@ 2020-02-09 17:17       ` Raphael Norwitz
  0 siblings, 0 replies; 19+ messages in thread
From: Raphael Norwitz @ 2020-02-09 17:17 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

Yes - it's just a cleanup.

On Thu, Feb 06, 2020 at 03:20:01AM -0500, Michael S. Tsirkin wrote:
> 
> On Thu, Feb 06, 2020 at 03:17:04AM -0500, Michael S. Tsirkin wrote:
> > On Wed, Jan 15, 2020 at 09:57:04PM -0500, Raphael Norwitz wrote:
> > > The current vhost_user_set_mem_table_postcopy() implementation
> > > populates each region of the VHOST_USER_SET_MEM_TABLE message without
> > > first checking if there are more than VHOST_MEMORY_MAX_NREGIONS already
> > > populated. This can cause memory corruption if too many regions are
> > > added to the message during the postcopy step.
> > > 
> > > This change moves an existing assert up such that attempting to
> > > construct a VHOST_USER_SET_MEM_TABLE message with too many memory
> > > regions will gracefully bring down qemu instead of corrupting memory.
> > > 
> > > Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
> > > Signed-off-by: Peter Turschmid <peter.turschm@nutanix.com>
> > 
> > 
> > Could you pls add Fixes: and stable tags?
> 
> oh wait no, this is just a theoretical thing, right?
> it doesn't actually trigger, it's just a cleanup.
> 
> no fixes/stable needed then, sorry
> 



* Re: [PATCH v2 2/3] Refactor vhost_user_set_mem_table functions
  2020-02-06  8:21   ` Michael S. Tsirkin
@ 2020-02-09 17:21     ` Raphael Norwitz
  0 siblings, 0 replies; 19+ messages in thread
From: Raphael Norwitz @ 2020-02-09 17:21 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

Sounds good

On Thu, Feb 06, 2020 at 03:21:42AM -0500, Michael S. Tsirkin wrote:
> 
> On Wed, Jan 15, 2020 at 09:57:05PM -0500, Raphael Norwitz wrote:
> > vhost_user_set_mem_table() and vhost_user_set_mem_table_postcopy() have
> > gotten convoluted, and have some identical code.
> > 
> > This change moves the logic populating the VhostUserMemory struct and
> > fds array from vhost_user_set_mem_table() and
> > vhost_user_set_mem_table_postcopy() to a new function,
> > vhost_user_fill_set_mem_table_msg().
> > 
> > No functionality is impacted.
> > 
> > Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
> > Signed-off-by: Peter Turschmid <peter.turschm@nutanix.com>
> 
> 
> Looks ok, but just cosmetics: let's have the flag say what
> it does, not who uses it.
> 
> So s/postcopy/track_ramblocks/ ?
> 




* Re: [PATCH v2 3/3] Lift max memory slots limit imposed by vhost-user
  2020-02-06  8:32   ` Michael S. Tsirkin
@ 2020-02-09 17:43     ` Raphael Norwitz
  2020-02-20  7:03       ` Raphael Norwitz
  2020-02-25 12:07       ` Michael S. Tsirkin
  0 siblings, 2 replies; 19+ messages in thread
From: Raphael Norwitz @ 2020-02-09 17:43 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On Thu, Feb 06, 2020 at 03:32:38AM -0500, Michael S. Tsirkin wrote:
> 
> On Wed, Jan 15, 2020 at 09:57:06PM -0500, Raphael Norwitz wrote:
> > The current vhost-user implementation in Qemu imposes a limit on the
> > maximum number of memory slots exposed to a VM using a vhost-user
> > device. This change provides a new protocol feature
> > VHOST_USER_F_CONFIGURE_SLOTS which, when enabled, lifts this limit and
> > allows a VM with a vhost-user device to expose a configurable number of
> > memory slots, up to the ACPI defined maximum. Existing backends which
> > do not support this protocol feature are unaffected.
> 
> Hmm ACPI maximum seems to be up to 512 - is this too much to fit in a
> single message?  So can't we just increase the number (after negotiating
> with remote) and be done with it, instead of add/remove?  Or is there
> another reason to prefer add/remove?
>

As mentioned in my cover letter, we experimented with simply increasing the
message size and it didn’t work on our setup. We debugged down to the socket
layer and found that on the receiving end the messages were truncated at
around 512 bytes, or around 16 memory regions. To support 512 memory regions
we would need a message size of around  32 <bytes per region> * 512 <regions>
+ 8 <bytes for padding and region count> ~= 16k packet size. That would be 64
times larger than the next largest message size. We thought it would be cleaner
and more in line with the rest of the protocol to keep the message sizes
smaller. In particular, we thought memory regions should be treated like the
rings, which are sent over one message at a time instead of in one large message.
Whether or not such a large message size can be made to work in our case,
separate messages will always work on Linux, and most likely all other UNIX
platforms QEMU is used on.
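
For concreteness, a small standalone sketch of that size arithmetic, assuming
the 32-byte VhostUserMemoryRegion layout from the patch (four u64 fields); this
is an illustration only, not code from the series.

#include <stdint.h>
#include <stdio.h>

/* Mirrors the 4 x u64 layout of VhostUserMemoryRegion (32 bytes). */
typedef struct ExampleMemoryRegion {
    uint64_t guest_phys_addr;
    uint64_t memory_size;
    uint64_t userspace_addr;
    uint64_t mmap_offset;
} ExampleMemoryRegion;

int main(void)
{
    const size_t max_slots = 512;
    /* nregions + padding take 8 bytes, as in VhostUserMemory. */
    size_t payload = 8 + max_slots * sizeof(ExampleMemoryRegion);

    /* Prints: payload for 512 regions: 16392 bytes (~16 KiB) */
    printf("payload for %zu regions: %zu bytes (~%zu KiB)\n",
           max_slots, payload, payload / 1024);
    return 0;
}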

> > 
> > This feature works by using three new messages,
> > VHOST_USER_GET_MAX_MEM_SLOTS, VHOST_USER_ADD_MEM_REG and
> > VHOST_USER_REM_MEM_REG. VHOST_USER_GET_MAX_MEM_SLOTS gets the
> > number of memory slots the backend is willing to accept when the
> > backend is initialized. Then, when the memory tables are set or updated,
> > a series of VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG messages
> > are sent to transmit the regions to map and/or unmap instead of trying
> > to send all the regions in one fixed size VHOST_USER_SET_MEM_TABLE
> > message.
> > 
> > The vhost_user struct maintains a shadow state of the VM’s memory
> > regions. When the memory tables are modified, the
> > vhost_user_set_mem_table() function compares the new device memory state
> > to the shadow state and only sends regions which need to be unmapped or
> > mapped in. The regions which must be unmapped are sent first, followed
> > by the new regions to be mapped in. After all the messages have been
> > sent, the shadow state is set to the current virtual device state.
> > 
> > The current feature implementation does not work with postcopy migration
> > and cannot be enabled if the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has
> > also been negotiated.
> 
> Hmm what would it take to lift the restrictions?
> Conflicting features like this make it very hard for users to make
> an informed choice about what to support.
>

We would need a setup with a backend which supports these features (REPLY_ACK
and postcopy migration). At first glance it looks like DPDK could work but
I'm not sure how easy it will be to test postcopy migration with the resources
we have.
 
> > Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
> > Signed-off-by: Peter Turschmid <peter.turschm@nutanix.com>
> > Suggested-by: Mike Cui <cui@nutanix.com>
> > ---
> >  docs/interop/vhost-user.rst |  43 ++++++++
> >  hw/virtio/vhost-user.c      | 254 ++++++++++++++++++++++++++++++++++++++++----
> >  2 files changed, 275 insertions(+), 22 deletions(-)
> > 
> > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> > index 5f8b3a4..ae9acf2 100644
> > --- a/docs/interop/vhost-user.rst
> > +++ b/docs/interop/vhost-user.rst
> > @@ -786,6 +786,7 @@ Protocol features
> >    #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  11
> >    #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12
> >    #define VHOST_USER_PROTOCOL_F_RESET_DEVICE   13
> > +  #define VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS   14
> >  
> >  Master message types
> >  --------------------
> > @@ -1205,6 +1206,48 @@ Master message types
> >    Only valid if the ``VHOST_USER_PROTOCOL_F_RESET_DEVICE`` protocol
> >    feature is set by the backend.
> >  
> > +``VHOST_USER_GET_MAX_MEM_SLOTS``
> > +  :id: 35
> > +  :equivalent ioctl: N/A
> > +  :slave payload: u64
> > +
> > +  When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been
> > +  successfully negotiated, this message is submitted by master to the
> > +  slave. The slave should return the message with a u64 payload
> > +  containing the maximum number of memory slots for QEMU to expose to
> > +  the guest. This message is not supported with postcopy migration or if
> > +  the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated.
> > +
> > +``VHOST_USER_ADD_MEM_REG``
> > +  :id: 36
> > +  :equivalent ioctl: N/A
> > +  :slave payload: memory region
> > +
> > +  When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been
> > +  successfully negotiated, this message is submitted by master to the slave.
> > +  The message payload contains a memory region descriptor struct, describing
> > +  a region of guest memory which the slave device must map in. When the
> > +  VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been successfully
> > +  negotiated, along with the VHOST_USER_REM_MEM_REG message, this message is
> > +  used to set and update the memory tables of the slave device. This message
> > +  is not supported with postcopy migration or if the
> > +  VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated.
> > +
> > +``VHOST_USER_REM_MEM_REG``
> > +  :id: 37
> > +  :equivalent ioctl: N/A
> > +  :slave payload: memory region
> > +
> > +  When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been
> > +  successfully negotiated, this message is submitted by master to the slave.
> > +  The message payload contains a memory region descriptor struct, describing
> > +  a region of guest memory which the slave device must unmap. When the
> > +  VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been successfully
> > +  negotiated, along with the VHOST_USER_ADD_MEM_REG message, this message is
> > +  used to set and update the memory tables of the slave device. This message
> > +  is not supported with postcopy migration or if the
> > +  VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated.
> > +
> >  Slave message types
> >  -------------------
> >  
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index af83fdd..fed6d02 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -35,11 +35,29 @@
> >  #include <linux/userfaultfd.h>
> >  #endif
> >  
> > -#define VHOST_MEMORY_MAX_NREGIONS    8
> > +#define VHOST_MEMORY_LEGACY_NREGIONS    8
> 
> Hardly legacy when this is intended to always be used e.g. with
> postcopy, right?
>

How about 'BASELINE'?
 
> >  #define VHOST_USER_F_PROTOCOL_FEATURES 30
> >  #define VHOST_USER_SLAVE_MAX_FDS     8
> >  
> >  /*
> > + * Set maximum number of RAM slots supported to
> > + * the maximum number supported by the target
> > + * hardware platform.
> > + */
> > +#if defined(TARGET_X86) || defined(TARGET_X86_64) || \
> > +    defined(TARGET_ARM) || defined(TARGET_ARM_64)
> > +#include "hw/acpi/acpi.h"
> > +#define VHOST_USER_MAX_RAM_SLOTS ACPI_MAX_RAM_SLOTS
> > +
> > +#elif defined(TARGET_PPC) || defined(TARGET_PPC_64)
> > +#include "hw/ppc/spapr.h"
> > +#define VHOST_USER_MAX_RAM_SLOTS SPAPR_MAX_RAM_SLOTS
> > +
> > +#else
> > +#define VHOST_USER_MAX_RAM_SLOTS 512
> > +#endif
> > +
> > +/*
> >   * Maximum size of virtio device config space
> >   */
> >  #define VHOST_USER_MAX_CONFIG_SIZE 256
> > @@ -59,6 +77,7 @@ enum VhostUserProtocolFeature {
> >      VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
> >      VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD = 12,
> >      VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
> > +    VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS = 14,
> >      VHOST_USER_PROTOCOL_F_MAX
> >  };
> >  
> > @@ -100,6 +119,9 @@ typedef enum VhostUserRequest {
> >      VHOST_USER_SET_INFLIGHT_FD = 32,
> >      VHOST_USER_GPU_SET_SOCKET = 33,
> >      VHOST_USER_RESET_DEVICE = 34,
> > +    VHOST_USER_GET_MAX_MEM_SLOTS = 35,
> > +    VHOST_USER_ADD_MEM_REG = 36,
> > +    VHOST_USER_REM_MEM_REG = 37,
> >      VHOST_USER_MAX
> >  } VhostUserRequest;
> >  
> > @@ -121,9 +143,14 @@ typedef struct VhostUserMemoryRegion {
> >  typedef struct VhostUserMemory {
> >      uint32_t nregions;
> >      uint32_t padding;
> > -    VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
> > +    VhostUserMemoryRegion regions[VHOST_MEMORY_LEGACY_NREGIONS];
> >  } VhostUserMemory;
> >  
> > +typedef struct VhostUserMemRegMsg {
> > +    uint32_t padding;
> > +    VhostUserMemoryRegion region;
> > +} VhostUserMemRegMsg;
> > +
> >  typedef struct VhostUserLog {
> >      uint64_t mmap_size;
> >      uint64_t mmap_offset;
> > @@ -182,6 +209,7 @@ typedef union {
> >          struct vhost_vring_state state;
> >          struct vhost_vring_addr addr;
> >          VhostUserMemory memory;
> > +        VhostUserMemRegMsg mem_reg;
> >          VhostUserLog log;
> >          struct vhost_iotlb_msg iotlb;
> >          VhostUserConfig config;
> > @@ -210,7 +238,7 @@ struct vhost_user {
> >      int slave_fd;
> >      NotifierWithReturn postcopy_notifier;
> >      struct PostCopyFD  postcopy_fd;
> > -    uint64_t           postcopy_client_bases[VHOST_MEMORY_MAX_NREGIONS];
> > +    uint64_t           postcopy_client_bases[VHOST_USER_MAX_RAM_SLOTS];
> >      /* Length of the region_rb and region_rb_offset arrays */
> >      size_t             region_rb_len;
> >      /* RAMBlock associated with a given region */
> > @@ -222,6 +250,13 @@ struct vhost_user {
> >  
> >      /* True once we've entered postcopy_listen */
> >      bool               postcopy_listen;
> > +
> > +    /* Maximum number of RAM slots supported by the backend */
> > +    uint64_t memory_slots;
> > +
> > +    /* Our current regions */
> > +    int num_shadow_regions;
> > +    VhostUserMemoryRegion shadow_regions[VHOST_USER_MAX_RAM_SLOTS];
> >  };
> >  
> >  static bool ioeventfd_enabled(void)
> > @@ -370,7 +405,7 @@ int vhost_user_gpu_set_socket(struct vhost_dev *dev, int fd)
> >  static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
> >                                     struct vhost_log *log)
> >  {
> > -    int fds[VHOST_MEMORY_MAX_NREGIONS];
> > +    int fds[VHOST_USER_MAX_RAM_SLOTS];
> >      size_t fd_num = 0;
> >      bool shmfd = virtio_has_feature(dev->protocol_features,
> >                                      VHOST_USER_PROTOCOL_F_LOG_SHMFD);
> > @@ -429,7 +464,7 @@ static int vhost_user_fill_set_mem_table_msg(struct vhost_user *u,
> >          fd = memory_region_get_fd(mr);
> >          if (fd > 0) {
> >              if (postcopy) {
> > -                assert(*fd_num < VHOST_MEMORY_MAX_NREGIONS);
> > +                assert(*fd_num < VHOST_MEMORY_LEGACY_NREGIONS);
> >                  trace_vhost_user_set_mem_table_withfd(*fd_num, mr->name,
> >                                                        reg->memory_size,
> >                                                        reg->guest_phys_addr,
> > @@ -437,7 +472,7 @@ static int vhost_user_fill_set_mem_table_msg(struct vhost_user *u,
> >                                                        offset);
> >                  u->region_rb_offset[i] = offset;
> >                  u->region_rb[i] = mr->ram_block;
> > -            } else if (*fd_num == VHOST_MEMORY_MAX_NREGIONS) {
> > +            } else if (*fd_num == VHOST_MEMORY_LEGACY_NREGIONS) {
> >                  error_report("Failed preparing vhost-user memory table msg");
> >                  return -1;
> >              }
> > @@ -474,7 +509,7 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
> >                                               struct vhost_memory *mem)
> >  {
> >      struct vhost_user *u = dev->opaque;
> > -    int fds[VHOST_MEMORY_MAX_NREGIONS];
> > +    int fds[VHOST_MEMORY_LEGACY_NREGIONS];
> >      size_t fd_num = 0;
> >      VhostUserMsg msg_reply;
> >      int region_i, msg_i;
> > @@ -523,7 +558,7 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
> >      }
> >  
> >      memset(u->postcopy_client_bases, 0,
> > -           sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS);
> > +           sizeof(uint64_t) * VHOST_USER_MAX_RAM_SLOTS);
> >  
> >      /* They're in the same order as the regions that were sent
> >       * but some of the regions were skipped (above) if they
> > @@ -564,18 +599,151 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
> >      return 0;
> >  }
> >  
> > +static inline bool reg_equal(VhostUserMemoryRegion *shadow_reg,
> > +                             struct vhost_memory_region *vdev_reg)
> > +{
> > +    if (shadow_reg->guest_phys_addr == vdev_reg->guest_phys_addr &&
> > +        shadow_reg->userspace_addr == vdev_reg->userspace_addr &&
> > +        shadow_reg->memory_size == vdev_reg->memory_size) {
> > +        return true;
> > +    } else {
> > +        return false;
> > +    }
> > +}
> > +
> > +static int vhost_user_send_add_remove_regions(struct vhost_dev *dev,
> > +                                              struct vhost_memory *mem,
> > +                                              VhostUserMsg *msg)
> > +{
> > +    struct vhost_user *u = dev->opaque;
> > +    int i, j, fd;
> > +    bool found[VHOST_USER_MAX_RAM_SLOTS] = {};
> > +    bool matching = false;
> > +    struct vhost_memory_region *reg;
> > +    ram_addr_t offset;
> > +    MemoryRegion *mr;
> > +
> > +    /*
> > +     * Ensure the VHOST_USER_PROTOCOL_F_REPLY_ACK has not been
> > +     * negotiated and no postcopy migration is in progress.
> > +     */
> > +    assert(!virtio_has_feature(dev->protocol_features,
> > +                               VHOST_USER_PROTOCOL_F_REPLY_ACK));
> > +    if (u->postcopy_listen && u->postcopy_fd.handler) {
> > +        error_report("Postcopy migration is not supported when the "
> > +                     "VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS feature "
> > +                     "has been negotiated");
> > +        return -1;
> > +    }
> > +
> > +    msg->hdr.size = sizeof(msg->payload.mem_reg.padding);
> > +    msg->hdr.size += sizeof(VhostUserMemoryRegion);
> > +
> > +    /*
> > +     * Send VHOST_USER_REM_MEM_REG for memory regions in our shadow state
> > +     * which are not found not in the device's memory state.
> 
> double negation - could not parse this.
>

Apologies - typo here. It should say “Send VHOST_USER_REM_MEM_REG for memory
regions in our shadow state which are not found in the device's memory state.” 
i.e. send messages to remove regions in the shadow state but not in the updated
device state. 
 
> > +     */
> > +    for (i = 0; i < u->num_shadow_regions; ++i) {
> > +        reg = dev->mem->regions;
> > +
> > +        for (j = 0; j < dev->mem->nregions; j++) {
> > +            reg = dev->mem->regions + j;
> > +
> > +            assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
> > +            mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
> > +                                         &offset);
> > +            fd = memory_region_get_fd(mr);
> > +
> > +            if (reg_equal(&u->shadow_regions[i], reg)) {
> > +                matching = true;
> > +                found[j] = true;
> > +                break;
> > +            }
> > +        }
> > +
> > +        if (fd > 0 && !matching) {
> > +            msg->hdr.request = VHOST_USER_REM_MEM_REG;
> > +            msg->payload.mem_reg.region.userspace_addr = reg->userspace_addr;
> > +            msg->payload.mem_reg.region.memory_size = reg->memory_size;
> > +            msg->payload.mem_reg.region.guest_phys_addr =
> > +                reg->guest_phys_addr;
> > +            msg->payload.mem_reg.region.mmap_offset = offset;
> > +
> > +            if (vhost_user_write(dev, msg, &fd, 1) < 0) {
> > +                return -1;
> > +            }
> > +        }
> > +    }
> > +
> > +    /*
> > +     * Send messages to add regions present in the device which are not
> > +     * in our shadow state.
> > +     */
> > +    for (i = 0; i < dev->mem->nregions; ++i) {
> > +        reg = dev->mem->regions + i;
> > +
> > +        /*
> > +         * If the region was in both the shadow and vdev state we don't
> > +         * need to send a VHOST_USER_ADD_MEM_REG message for it.
> > +         */
> > +        if (found[i]) {
> > +            continue;
> > +        }
> > +
> > +        assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
> > +        mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
> > +                                     &offset);
> > +        fd = memory_region_get_fd(mr);
> > +
> > +        if (fd > 0) {
> > +            msg->hdr.request = VHOST_USER_ADD_MEM_REG;
> > +            msg->payload.mem_reg.region.userspace_addr = reg->userspace_addr;
> > +            msg->payload.mem_reg.region.memory_size = reg->memory_size;
> > +            msg->payload.mem_reg.region.guest_phys_addr =
> > +                reg->guest_phys_addr;
> > +            msg->payload.mem_reg.region.mmap_offset = offset;
> > +
> > +            if (vhost_user_write(dev, msg, &fd, 1) < 0) {
> > +                return -1;
> > +            }
> > +        }
> > +    }
> > +
> > +    /* Make our shadow state match the updated device state. */
> > +    u->num_shadow_regions = dev->mem->nregions;
> > +    for (i = 0; i < dev->mem->nregions; ++i) {
> > +        reg = dev->mem->regions + i;
> > +        u->shadow_regions[i].guest_phys_addr = reg->guest_phys_addr;
> > +        u->shadow_regions[i].userspace_addr = reg->userspace_addr;
> > +        u->shadow_regions[i].memory_size = reg->memory_size;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> >  static int vhost_user_set_mem_table(struct vhost_dev *dev,
> >                                      struct vhost_memory *mem)
> >  {
> >      struct vhost_user *u = dev->opaque;
> > -    int fds[VHOST_MEMORY_MAX_NREGIONS];
> > +    int fds[VHOST_MEMORY_LEGACY_NREGIONS];
> >      size_t fd_num = 0;
> >      bool do_postcopy = u->postcopy_listen && u->postcopy_fd.handler;
> >      bool reply_supported = virtio_has_feature(dev->protocol_features,
> >                                                VHOST_USER_PROTOCOL_F_REPLY_ACK);
> > +    bool config_slots =
> > +        virtio_has_feature(dev->protocol_features,
> > +                           VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS);
> >  
> >      if (do_postcopy) {
> > -        /* Postcopy has enough differences that it's best done in it's own
> > +        if (config_slots) {
> > +            error_report("Postcopy migration not supported with "
> > +                         "VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS feature "
> > +                         "enabled.");
> > +            return -1;
> > +        }
> > +
> > +        /*
> > +         * Postcopy has enough differences that it's best done in it's own
> >           * version
> >           */
> >          return vhost_user_set_mem_table_postcopy(dev, mem);
> > @@ -589,17 +757,22 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
> >          msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
> >      }
> >  
> > -    if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num,
> > -                                          false) < 0) {
> > -        return -1;
> > -    }
> > -
> > -    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
> > -        return -1;
> > -    }
> > +    if (config_slots && !reply_supported) {
> > +        if (vhost_user_send_add_remove_regions(dev, mem, &msg) < 0) {
> > +            return -1;
> > +        }
> > +    } else {
> > +        if (vhost_user_fill_set_mem_table_msg(u, dev, &msg, fds, &fd_num,
> > +                                              false) < 0) {
> > +            return -1;
> > +        }
> > +        if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
> > +            return -1;
> > +        }
> >  
> > -    if (reply_supported) {
> > -        return process_message_reply(dev, &msg);
> > +        if (reply_supported) {
> > +            return process_message_reply(dev, &msg);
> > +        }
> >      }
> >  
> >      return 0;
> > @@ -764,7 +937,7 @@ static int vhost_set_vring_file(struct vhost_dev *dev,
> >                                  VhostUserRequest request,
> >                                  struct vhost_vring_file *file)
> >  {
> > -    int fds[VHOST_MEMORY_MAX_NREGIONS];
> > +    int fds[VHOST_USER_MAX_RAM_SLOTS];
> >      size_t fd_num = 0;
> >      VhostUserMsg msg = {
> >          .hdr.request = request,
> > @@ -866,6 +1039,23 @@ static int vhost_user_get_features(struct vhost_dev *dev, uint64_t *features)
> >      return vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features);
> >  }
> >  
> > +static int vhost_user_get_max_memslots(struct vhost_dev *dev,
> > +                                       uint64_t *max_memslots)
> > +{
> > +    uint64_t backend_max_memslots;
> > +    int err;
> > +
> > +    err = vhost_user_get_u64(dev, VHOST_USER_GET_MAX_MEM_SLOTS,
> > +                             &backend_max_memslots);
> > +    if (err < 0) {
> > +        return err;
> > +    }
> > +
> > +    *max_memslots = MIN(backend_max_memslots, VHOST_USER_MAX_RAM_SLOTS);
> > +
> > +    return *max_memslots;
> > +}
> > +
> >  static int vhost_user_set_owner(struct vhost_dev *dev)
> >  {
> >      VhostUserMsg msg = {
> > @@ -1439,6 +1629,24 @@ static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque)
> >                           "slave-req protocol features.");
> >              return -1;
> >          }
> > +
> > +        /* get max memory regions if backend supports configurable RAM slots */
> > +        if (!virtio_has_feature(dev->protocol_features,
> > +                                VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS)) {
> > +            u->memory_slots = VHOST_MEMORY_LEGACY_NREGIONS;
> > +        } else if (virtio_has_feature(dev->protocol_features,
> > +                           VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
> > +            error_report("The VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol "
> > +                         "feature is not supported when the "
> > +                         "VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also "
> > +                         "been negotiated");
> > +            return -1;
> > +        } else {
> > +            err = vhost_user_get_max_memslots(dev, &u->memory_slots);
> > +            if (err < 0) {
> > +                return err;
> > +            }
> > +        }
> >      }
> >  
> >      if (dev->migration_blocker == NULL &&
> > @@ -1502,7 +1710,9 @@ static int vhost_user_get_vq_index(struct vhost_dev *dev, int idx)
> >  
> >  static int vhost_user_memslots_limit(struct vhost_dev *dev)
> >  {
> > -    return VHOST_MEMORY_MAX_NREGIONS;
> > +    struct vhost_user *u = dev->opaque;
> > +
> > +    return u->memory_slots;
> >  }
> >  
> >  static bool vhost_user_requires_shm_log(struct vhost_dev *dev)
> > -- 
> > 1.8.3.1
> 
> 



* Re: [PATCH v2 0/3] vhost-user: Lift Max Ram Slots Limitation
  2020-02-09 17:14   ` Raphael Norwitz
@ 2020-02-10 16:04     ` Michael S. Tsirkin
  2020-02-19  5:33       ` Raphael Norwitz
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2020-02-10 16:04 UTC (permalink / raw)
  To: Raphael Norwitz; +Cc: qemu-devel

On Sun, Feb 09, 2020 at 12:14:42PM -0500, Raphael Norwitz wrote:
> On Thu, Feb 06, 2020 at 03:33:13AM -0500, Michael S. Tsirkin wrote:
> > 
> > On Wed, Jan 15, 2020 at 09:57:03PM -0500, Raphael Norwitz wrote:
> > > 
> > > Changes since V1:
> > >     * Kept the assert in vhost_user_set_mem_table_postcopy, but moved it
> > >       to prevent corruption
> > >     * Made QEMU send a single VHOST_USER_GET_MAX_MEMSLOTS message at
> > >       startup and cache the returned value so that QEMU does not need to
> > >       query the backend every time vhost_backend_memslots_limit is called.
> > 
> > I'm a bit confused about what happens on reconnect.
> > Can you clarify pls?
> > 
> From what I can see, backends which support reconnect call vhost_dev_init,
> which then calls vhost_user_backend_init(), as vhost-user-blk does here:
> https://github.com/qemu/qemu/blob/master/hw/block/vhost-user-blk.c#L315. The
> ram slots limit is fetched in vhost_user_backend_init() so every time the
> device reconnects the limit should be refetched. 

Right. Point is, we might have validated using an old limit.
Reconnect needs to verify limit did not change or at least
did not decrease.

-- 
MST




* Re: [PATCH v2 0/3] vhost-user: Lift Max Ram Slots Limitation
  2020-02-10 16:04     ` Michael S. Tsirkin
@ 2020-02-19  5:33       ` Raphael Norwitz
  2020-02-19 10:08         ` Michael S. Tsirkin
  0 siblings, 1 reply; 19+ messages in thread
From: Raphael Norwitz @ 2020-02-19  5:33 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On Mon, Feb 10, 2020 at 11:04:28AM -0500, Michael S. Tsirkin wrote:
> 
> On Sun, Feb 09, 2020 at 12:14:42PM -0500, Raphael Norwitz wrote:
> > On Thu, Feb 06, 2020 at 03:33:13AM -0500, Michael S. Tsirkin wrote:
> > > 
> > > On Wed, Jan 15, 2020 at 09:57:03PM -0500, Raphael Norwitz wrote:
> > > > 
> > > > Changes since V1:
> > > >     * Kept the assert in vhost_user_set_mem_table_postcopy, but moved it
> > > >       to prevent corruption
> > > >     * Made QEMU send a single VHOST_USER_GET_MAX_MEMSLOTS message at
> > > >       startup and cache the returned value so that QEMU does not need to
> > > >       query the backend every time vhost_backend_memslots_limit is called.
> > > 
> > > I'm a bit confused about what happens on reconnect.
> > > Can you clarify pls?
> > > 
> > From what I can see, backends which support reconnect call vhost_dev_init,
> > which then calls vhost_user_backend_init(), as vhost-user-blk does here:
> > https://github.com/qemu/qemu/blob/master/hw/block/vhost-user-blk.c#L315. The
> > ram slots limit is fetched in vhost_user_backend_init() so every time the
> > device reconnects the limit should be refetched. 
> 
> Right. Point is, we might have validated using an old limit.
> Reconnect needs to verify limit did not change or at least
> did not decrease.
> 
> -- 
> MST
Good point - I did not consider this case. Could we keep the slots limit in
the VhostUserState instead?

Say vhost_user_init() initializes the limit inside the VhostUserState to 0. Then,
vhost_user_backend_init() checks if this limit is 0. If so, this is the initial
connection and qemu fetches the limit from the backend, ensures the returned
value is nonzero, and sets the limit in the VhostUserState. If not, qemu knows this
is a reconnect and queries the backend slots limit. If the returned value does
not equal the limit in the VhostUserState, vhost_user_backend_init() returns an
error.

Thoughts?
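
To make the proposal concrete, a minimal standalone sketch of that check
(field and function names here are illustrative, not the actual QEMU code):

#include <stdint.h>
#include <stdio.h>

typedef struct ExampleVhostUserState {
    uint64_t memory_slots;              /* 0 means "not fetched yet" */
} ExampleVhostUserState;

/* Stand-in for querying VHOST_USER_GET_MAX_MEM_SLOTS from the backend. */
static uint64_t example_query_backend_max_slots(void)
{
    return 32;
}

static int example_backend_init(ExampleVhostUserState *s)
{
    uint64_t backend_limit = example_query_backend_max_slots();

    if (backend_limit == 0) {
        return -1;                      /* backend must advertise a limit */
    }
    if (s->memory_slots == 0) {
        s->memory_slots = backend_limit;    /* first connection: cache it */
    } else if (backend_limit != s->memory_slots) {
        return -1;                      /* limit changed across reconnect */
    }
    return 0;
}

int main(void)
{
    ExampleVhostUserState s = { .memory_slots = 0 };

    printf("first init: %d, reconnect: %d\n",
           example_backend_init(&s), example_backend_init(&s));
    return 0;
}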



* Re: [PATCH v2 0/3] vhost-user: Lift Max Ram Slots Limitation
  2020-02-19  5:33       ` Raphael Norwitz
@ 2020-02-19 10:08         ` Michael S. Tsirkin
  0 siblings, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2020-02-19 10:08 UTC (permalink / raw)
  To: Raphael Norwitz; +Cc: qemu-devel

On Wed, Feb 19, 2020 at 12:33:24AM -0500, Raphael Norwitz wrote:
> On Mon, Feb 10, 2020 at 11:04:28AM -0500, Michael S. Tsirkin wrote:
> > 
> > On Sun, Feb 09, 2020 at 12:14:42PM -0500, Raphael Norwitz wrote:
> > > On Thu, Feb 06, 2020 at 03:33:13AM -0500, Michael S. Tsirkin wrote:
> > > > 
> > > > On Wed, Jan 15, 2020 at 09:57:03PM -0500, Raphael Norwitz wrote:
> > > > > 
> > > > > Changes since V1:
> > > > >     * Kept the assert in vhost_user_set_mem_table_postcopy, but moved it
> > > > >       to prevent corruption
> > > > >     * Made QEMU send a single VHOST_USER_GET_MAX_MEMSLOTS message at
> > > > >       startup and cache the returned value so that QEMU does not need to
> > > > >       query the backend every time vhost_backend_memslots_limit is called.
> > > > 
> > > > I'm a bit confused about what happens on reconnect.
> > > > Can you clarify pls?
> > > > 
> > > From what I can see, backends which support reconnect call vhost_dev_init,
> > > which then calls vhost_user_backend_init(), as vhost-user-blk does here:
> > > https://github.com/qemu/qemu/blob/master/hw/block/vhost-user-blk.c#L315. The
> > > ram slots limit is fetched in vhost_user_backend_init() so every time the
> > > device reconnects the limit should be refetched. 
> > 
> > Right. Point is, we might have validated using an old limit.
> > Reconnect needs to verify limit did not change or at least
> > did not decrease.
> > 
> > -- 
> > MST
> Good point - I did not consider this case. Could we keep the slots limit in
> the VhostUserState instead?
> 
> Say vhost_user_init() initializes the limit inside the VhostUserState to 0. Then,
> vhost_user_backend_init() checks whether this limit is 0. If so, this is the initial
> connection, and QEMU fetches the limit from the backend, ensures the returned
> value is nonzero, and sets the limit in the VhostUserState. If not, QEMU knows this
> is a reconnect and queries the backend's slots limit again. If the returned value does
> not equal the limit already stored in the VhostUserState, vhost_user_backend_init()
> returns an error.
> 
> Thoughts?

Right.
Or, if we really want to, check that the backend value is >= the saved one.
Basically the same thing we do with features.
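
For illustration, the relaxed variant would only reject a shrinking limit,
mirroring how feature bits are handled across reconnects (same hypothetical
VhostUserState sketch as above, not actual QEMU code):

static int vhost_user_check_memslots_relaxed(VhostUserState *user,
                                             uint64_t fetched)
{
    if (fetched == 0 || fetched < user->memslots_limit) {
        return -1;                       /* unusable, or the limit shrank */
    }
    if (user->memslots_limit == 0) {
        user->memslots_limit = fetched;  /* initial connection: cache the value */
    }
    return 0;
}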

-- 
MST



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 3/3] Lift max memory slots limit imposed by vhost-user
  2020-02-09 17:43     ` Raphael Norwitz
@ 2020-02-20  7:03       ` Raphael Norwitz
  2020-02-25 12:07       ` Michael S. Tsirkin
  1 sibling, 0 replies; 19+ messages in thread
From: Raphael Norwitz @ 2020-02-20  7:03 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

Ping

On Sun, Feb 09, 2020 at 12:43:35PM -0500, Raphael Norwitz wrote:
> 
> On Thu, Feb 06, 2020 at 03:32:38AM -0500, Michael S. Tsirkin wrote:
> > 
> > On Wed, Jan 15, 2020 at 09:57:06PM -0500, Raphael Norwitz wrote:
> > > The current vhost-user implementation in QEMU imposes a limit on the
> > > maximum number of memory slots exposed to a VM using a vhost-user
> > > device. This change provides a new protocol feature
> > > VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS which, when enabled, lifts this limit and
> > > allows a VM with a vhost-user device to expose a configurable number of
> > > memory slots, up to the ACPI defined maximum. Existing backends which
> > > do not support this protocol feature are unaffected.
> > 
> > Hmm ACPI maximum seems to be up to 512 - is this too much to fit in a
> > single message?  So can't we just increase the number (after negotiating
> > with remote) and be done with it, instead of add/remove?  Or is there
> > another reason to prefer add/remove?
> >
> 
> As mentioned in my cover letter, we experimented with simply increasing the
> message size and it didn’t work on our setup. We debugged down to the socket
> layer and found that on the receiving end the messages were truncated at
> around 512 bytes, or around 16 memory regions. To support 512 memory regions
> we would need a message size of around 32 (bytes per region) * 512 (regions)
> + 8 (bytes for padding and region count) ~= 16 KiB. That would be roughly 64
> times larger than the next largest message size. We thought it would be cleaner
> and more in line with the rest of the protocol to keep the message sizes
> smaller. In particular, we thought memory regions should be treated like the
> rings, which are sent over one message at a time instead of in one large message.
> Whether or not such a large message size can be made to work in our case,
> separate messages will always work on Linux, and most likely on all other UNIX
> platforms QEMU is used on.
> 
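
For reference, the arithmetic quoted above works out as follows, assuming the
32-byte per-region wire format (four 64-bit fields) and the 8-byte region-count
plus padding prefix mentioned in the message; a quick standalone check, not
QEMU code:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Assumed wire sizes: four uint64_t fields per region, plus a
     * uint32_t region count and uint32_t padding at the front. */
    const uint64_t bytes_per_region = 4 * sizeof(uint64_t);  /* 32 */
    const uint64_t prefix           = 2 * sizeof(uint32_t);  /*  8 */

    printf("8 regions (current limit):  %" PRIu64 " bytes\n",
           prefix + 8 * bytes_per_region);      /* 264 bytes */
    printf("512 regions (ACPI maximum): %" PRIu64 " bytes\n",
           prefix + 512 * bytes_per_region);    /* 16392 bytes, ~16 KiB */
    return 0;
}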

> > > 
> > > This feature works by using three new messages,
> > > VHOST_USER_GET_MAX_MEM_SLOTS, VHOST_USER_ADD_MEM_REG and
> > > VHOST_USER_REM_MEM_REG. VHOST_USER_GET_MAX_MEM_SLOTS gets the
> > > number of memory slots the backend is willing to accept when the
> > > backend is initialized. Then, when the memory tables are set or updated,
> > > a series of VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG messages
> > > are sent to transmit the regions to map and/or unmap instead of trying
> > > to send all the regions in one fixed size VHOST_USER_SET_MEM_TABLE
> > > message.
> > > 
> > > The vhost_user struct maintains a shadow state of the VM’s memory
> > > regions. When the memory tables are modified, the
> > > vhost_user_set_mem_table() function compares the new device memory state
> > > to the shadow state and only sends regions which need to be unmapped or
> > > mapped in. The regions which must be unmapped are sent first, followed
> > > by the new regions to be mapped in. After all the messages have been
> > > sent, the shadow state is set to the current virtual device state.
> > > 
> > > The current feature implementation does not work with postcopy migration
> > > and cannot be enabled if the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has
> > > also been negotiated.
> > 
> > Hmm what would it take to lift the restrictions?
> > Conflicting features like this make it very hard for users to make
> > an informed choice about what to support.
> >
> 
> We would need a setup with a backend which supports these features (REPLY_ACK
> and postcopy migration). At first glance it looks like DPDK could work but
> I'm not sure how easy it will be to test postcopy migration with the resources
> we have.
>  

> > > Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
> > > Signed-off-by: Peter Turschmid <peter.turschm@nutanix.com>
> > > Suggested-by: Mike Cui <cui@nutanix.com>
> > > ---
> > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > index af83fdd..fed6d02 100644
> > > --- a/hw/virtio/vhost-user.c
> > > +++ b/hw/virtio/vhost-user.c
> > > @@ -35,11 +35,29 @@
> > >  #include <linux/userfaultfd.h>
> > >  #endif
> > >  
> > > -#define VHOST_MEMORY_MAX_NREGIONS    8
> > > +#define VHOST_MEMORY_LEGACY_NREGIONS    8
> > 
> > Hardly legacy when this is intended to always be used e.g. with
> > postcopy, right?
> >
> 
> How about 'BASELINE'?

> > > +    msg->hdr.size = sizeof(msg->payload.mem_reg.padding);
> > > +    msg->hdr.size += sizeof(VhostUserMemoryRegion);
> > > +
> > > +    /*
> > > +     * Send VHOST_USER_REM_MEM_REG for memory regions in our shadow state
> > > +     * which are not found not in the device's memory state.
> > 
> > double negation - could not parse this.
> >
> 
> Apologies - typo here. It should say “Send VHOST_USER_REM_MEM_REG for memory
> regions in our shadow state which are not found in the device's memory state.” 
> i.e. send messages to remove regions in the shadow state but not in the updated
> device state. 
>  
> > > +     */
> > > +    for (i = 0; i < u->num_shadow_regions; ++i) {
> > > +        reg = dev->mem->regions;
> > > +
> > > +        for (j = 0; j < dev->mem->nregions; j++) {
> > > +            reg = dev->mem->regions + j;
> > > +
> > > +            assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
> > > +            mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
> > > +                                         &offset);
> > > +            fd = memory_region_get_fd(mr);
> > > +
> > > +            if (reg_equal(&u->shadow_regions[i], reg)) {
> > > +                matching = true;
> > > +                found[j] = true;
> > > +                break;
> > > +            }
> > > +        }
> > > +
> > > +        if (fd > 0 && !matching) {
> > > +            msg->hdr.request = VHOST_USER_REM_MEM_REG;
> > > +            msg->payload.mem_reg.region.userspace_addr = reg->userspace_addr;
> > > +            msg->payload.mem_reg.region.memory_size = reg->memory_size;
> > > +            msg->payload.mem_reg.region.guest_phys_addr =
> > > +                reg->guest_phys_addr;
> > > +            msg->payload.mem_reg.region.mmap_offset = offset;
> > > +
> > > +            if (vhost_user_write(dev, msg, &fd, 1) < 0) {
> > > +                return -1;
> > > +            }
> > > +        }
> > > +    }
> > > +
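
The hunk quoted above shows only the removal half of the flow. For orientation,
here is a compressed standalone sketch of the full shadow-state comparison the
commit message describes; the types, reg_equal() helper, and send_*() stubs are
simplified stand-ins, not the actual patch code:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_REGIONS 512

typedef struct Region {
    uint64_t guest_phys_addr, memory_size, userspace_addr;
} Region;

static bool reg_equal(const Region *a, const Region *b)
{
    return a->guest_phys_addr == b->guest_phys_addr &&
           a->memory_size == b->memory_size &&
           a->userspace_addr == b->userspace_addr;
}

/* Stand-ins for sending VHOST_USER_REM_MEM_REG / VHOST_USER_ADD_MEM_REG. */
static void send_rem_mem_reg(const Region *r) { (void)r; }
static void send_add_mem_reg(const Region *r) { (void)r; }

static void sync_mem_table(Region *shadow, unsigned *nshadow,
                           const Region *dev, unsigned ndev)
{
    bool found[MAX_REGIONS] = { false };

    /* 1. Remove shadow regions no longer present in the device state. */
    for (unsigned i = 0; i < *nshadow; i++) {
        bool matching = false;
        for (unsigned j = 0; j < ndev; j++) {
            if (reg_equal(&shadow[i], &dev[j])) {
                matching = true;
                found[j] = true;
                break;
            }
        }
        if (!matching) {
            send_rem_mem_reg(&shadow[i]);
        }
    }

    /* 2. Add device regions not already present in the shadow state. */
    for (unsigned j = 0; j < ndev; j++) {
        if (!found[j]) {
            send_add_mem_reg(&dev[j]);
        }
    }

    /* 3. Update the shadow state to mirror the new device state. */
    memcpy(shadow, dev, ndev * sizeof(*dev));
    *nshadow = ndev;
}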


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 3/3] Lift max memory slots limit imposed by vhost-user
  2020-02-09 17:43     ` Raphael Norwitz
  2020-02-20  7:03       ` Raphael Norwitz
@ 2020-02-25 12:07       ` Michael S. Tsirkin
  1 sibling, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2020-02-25 12:07 UTC (permalink / raw)
  To: Raphael Norwitz; +Cc: qemu-devel, Prerna Saxena

On Sun, Feb 09, 2020 at 12:43:35PM -0500, Raphael Norwitz wrote:
> > > The current feature implementation does not work with postcopy migration
> > > and cannot be enabled if the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has
> > > also been negotiated.
> > 
> > Hmm what would it take to lift the restrictions?
> > Conflicting features like this make it very hard for users to make
> > an informed choice about what to support.
> >
> 
> We would need a setup with a backend which supports these features (REPLY_ACK
> and postcopy migration). At first glance it looks like DPDK could work but
> I'm not sure how easy it will be to test postcopy migration with the resources
> we have.

Yes, DPDK works with postcopy. I understand it's of no
immediate interest to you, but I'm afraid it just becomes too messy
if everyone keeps breaking it.
VHOST_USER_PROTOCOL_F_REPLY_ACK was added by a contributor from
Nutanix, so surely you can ping them internally for a test case :).
 

-- 
MST



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-02-25 12:28 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-16  2:57 [PATCH v2 0/3] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz
2020-01-16  2:57 ` [PATCH v2 1/3] Fixed assert in vhost_user_set_mem_table_postcopy Raphael Norwitz
2020-02-06  8:17   ` Michael S. Tsirkin
2020-02-06  8:20     ` Michael S. Tsirkin
2020-02-09 17:17       ` Raphael Norwitz
2020-01-16  2:57 ` [PATCH v2 2/3] Refactor vhost_user_set_mem_table functions Raphael Norwitz
2020-02-06  8:21   ` Michael S. Tsirkin
2020-02-09 17:21     ` Raphael Norwitz
2020-01-16  2:57 ` [PATCH v2 3/3] Lift max memory slots limit imposed by vhost-user Raphael Norwitz
2020-02-06  8:32   ` Michael S. Tsirkin
2020-02-09 17:43     ` Raphael Norwitz
2020-02-20  7:03       ` Raphael Norwitz
2020-02-25 12:07       ` Michael S. Tsirkin
2020-01-31 21:21 ` [PATCH v2 0/3] vhost-user: Lift Max Ram Slots Limitation Raphael Norwitz
2020-02-06  8:33 ` Michael S. Tsirkin
2020-02-09 17:14   ` Raphael Norwitz
2020-02-10 16:04     ` Michael S. Tsirkin
2020-02-19  5:33       ` Raphael Norwitz
2020-02-19 10:08         ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).