All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH for-2.7 v7 0/2]vhost-user: Extend protocol to receive replies on any command.
@ 2016-08-05 10:53 Prerna Saxena
  2016-08-05 10:53 ` [Qemu-devel] [PATCH for-2.7 v7 1/2] vhost-user: Introduce a new protocol feature REPLY_ACK Prerna Saxena
  2016-08-05 10:53 ` [Qemu-devel] [PATCH for-2.7 v7 2/2] vhost-user: Attempt to fix a race with set_mem_table Prerna Saxena
  0 siblings, 2 replies; 3+ messages in thread
From: Prerna Saxena @ 2016-08-05 10:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, marcandre.lureau, felipe, anilkumar.boggarapu, eblake,
	Prerna Saxena

From: Prerna Saxena <prerna.saxena@nutanix.com>

[ This series incorporates all suggestions around documentation that were suggested.]
vhost-user: Extend protocol to receive replies on any command.

The current vhost-user protocol requires the client to send reply to only a
few commands. For the remaining commands, it is impossible for QEMU to know the
status of the requested operation -- ie, did it succeed? If so, by what time?

This is inconvenient, and can also lead to races. As an example:

(1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net application).Note that SET_MEM_TABLE does not require a reply according to the spec.
(2) Qemu commits the memory to the guest.
(3) Guest issues an I/O operation over a new memory region which was configured on (1).
(4) The application hasn't yet remapped the memory, but it sees the I/O request.
(5) The application cannot satisfy the request because it does not know about those GPAs.

Note that the kernel implementation does not suffer from this limitation since messages are sent via an ioctl(). The ioctl() blocks until the backend (eg. vhost-net) completes the command and returns (with an error code).

Changing the behaviour of current vhost-user commands would break existing applications.
Patch 1 introduces a protocol extension, VHOST_USER_PROTOCOL_F_REPLY_ACK. This
feature, if negotiated, allows QEMU to request a reply to any message by setting
the newly introduced "need_reply" flag. The application must then respond to qemu
by providing a status about the requested operation.

Patch 2 adds a workaround for the race described above for clients that do not support REPLY_ACK
feature. It introduces  a get_features command to be sent before returning from set_mem_table. While this is not a complete fix, it will help client applications that strictly process messagesin order.

Changelog:
----------
Changes v6 -> v7:
1) Patch 1: In docs/specs/vhost-user.txt
*   s/behaviourial/behavioural/
*   "This indicates to QEMU that the requested operation has deterministically been met or not" -> " The response payload gives QEMU a deterministic indication of the result of the command."
2) Patch 2 : Unchanged.

Changes v5.1 -> v6:
1) Patch 1 : fixed some minor indentation issues and a really tiny documentation chang
2) Patch 2 : unchanged.

Changes v5->v5.1 :
1) Patch 1 : no change
2) Patch 2 : fixes a tiny typo I'd accidentally introduced while creating v5 from v4. The code itself is unchanged from v4.

Changes v4->v5:
1) Patch 1 :
* Reword 'response' to 'reply' on public demand.
* Documentation is more concise.
Patch 2 : unchanged

Changes v3->v4:
1) Rearranged code in PATCH 1 to offset compiler warnings about missing declaration of vhost_user_read(). Fixed by moving process_message_reply() after definition of vhost_user_read()
2) Fixed minor suggestions in writeup for this protocol extension.

Changes v2->v3:
1) Swapped the patch numbers 1 & 2 from the previous series.
2) Patch 1 (previously patch 2 in v2): addresses MarcAndre's review comments and renames function 'process_message_response' to 'process_message_reply'
3) Patch 2 (ie patch 1 in v2) : Unchanged from v2.

Changes v1->v2:
1) Patch 1 : Ask for get_features before returning from set_mem_table(new).
2) Patch 2 : * Improve documentation.
          * Abstract out commonly used operations in the form of a function, process_message_response(). Also implement this only for SET_MEM_TABLE.

References:
v1 : https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg07152.html
v2 : https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg00048.html
v3 : https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg01598.html
v4 : https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg06173.html
v5 : https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg06338.html
v5.1:https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg06359.html 
v6 : https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg06772.html

Prerna Saxena (2):
  vhost-user: Introduce a new protocol feature REPLY_ACK.
  vhost-user: Attempt to fix a race with set_mem_table.

 docs/specs/vhost-user.txt |  26 +++++++++
 hw/virtio/vhost-user.c    | 137 +++++++++++++++++++++++++++++-----------------
 2 files changed, 114 insertions(+), 49 deletions(-)

-- 
1.8.1.2

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [Qemu-devel] [PATCH for-2.7 v7 1/2] vhost-user: Introduce a new protocol feature REPLY_ACK.
  2016-08-05 10:53 [Qemu-devel] [PATCH for-2.7 v7 0/2]vhost-user: Extend protocol to receive replies on any command Prerna Saxena
@ 2016-08-05 10:53 ` Prerna Saxena
  2016-08-05 10:53 ` [Qemu-devel] [PATCH for-2.7 v7 2/2] vhost-user: Attempt to fix a race with set_mem_table Prerna Saxena
  1 sibling, 0 replies; 3+ messages in thread
From: Prerna Saxena @ 2016-08-05 10:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, marcandre.lureau, felipe, anilkumar.boggarapu, eblake,
	Prerna Saxena

From: Prerna Saxena <prerna.saxena@nutanix.com>

This introduces the VHOST_USER_PROTOCOL_F_REPLY_ACK.

If negotiated, client applications should send a u64 payload in
response to any message that contains the "need_reply" bit set
on the message flags. Setting the payload to "zero" indicates the
command finished successfully. Likewise, setting it to "non-zero"
indicates an error.

Currently implemented only for SET_MEM_TABLE.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Prerna Saxena <prerna.saxena@nutanix.com>
---
 docs/specs/vhost-user.txt | 26 ++++++++++++++++++++++++++
 hw/virtio/vhost-user.c    | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index 777c49c..7890d71 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -37,6 +37,8 @@ consists of 3 header fields and a payload:
  * Flags: 32-bit bit field:
    - Lower 2 bits are the version (currently 0x01)
    - Bit 2 is the reply flag - needs to be sent on each reply from the slave
+   - Bit 3 is the need_reply flag - see VHOST_USER_PROTOCOL_F_REPLY_ACK for
+     details.
  * Size - 32-bit size of the payload
 
 
@@ -126,6 +128,8 @@ the ones that do:
  * VHOST_GET_VRING_BASE
  * VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
 
+[ Also see the section on REPLY_ACK protocol extension. ]
+
 There are several messages that the master sends with file descriptors passed
 in the ancillary data:
 
@@ -254,6 +258,7 @@ Protocol features
 #define VHOST_USER_PROTOCOL_F_MQ             0
 #define VHOST_USER_PROTOCOL_F_LOG_SHMFD      1
 #define VHOST_USER_PROTOCOL_F_RARP           2
+#define VHOST_USER_PROTOCOL_F_REPLY_ACK      3
 
 Message types
 -------------
@@ -464,3 +469,24 @@ Message types
       is present in VHOST_USER_GET_PROTOCOL_FEATURES.
       The first 6 bytes of the payload contain the mac address of the guest to
       allow the vhost user backend to construct and broadcast the fake RARP.
+
+VHOST_USER_PROTOCOL_F_REPLY_ACK:
+-------------------------------
+The original vhost-user specification only demands replies for certain
+commands. This differs from the vhost protocol implementation where commands
+are sent over an ioctl() call and block until the client has completed.
+
+With this protocol extension negotiated, the sender (QEMU) can set the
+"need_reply" [Bit 3] flag to any command. This indicates that
+the client MUST respond with a Payload VhostUserMsg indicating success or
+failure. The payload should be set to zero on success or non-zero on failure,
+unless the message already has an explicit reply body.
+
+The response payload gives QEMU a deterministic indication of the result
+of the command. Today, QEMU is expected to terminate the main vhost-user
+loop upon receiving such errors. In future, qemu could be taught to be more
+resilient for selective requests.
+
+For the message types that already solicit a reply from the client, the
+presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings
+no behavioural change. (See the 'Communication' section for details.)
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 1995fd2..b57454a 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -31,6 +31,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_MQ = 0,
     VHOST_USER_PROTOCOL_F_LOG_SHMFD = 1,
     VHOST_USER_PROTOCOL_F_RARP = 2,
+    VHOST_USER_PROTOCOL_F_REPLY_ACK = 3,
 
     VHOST_USER_PROTOCOL_F_MAX
 };
@@ -84,6 +85,7 @@ typedef struct VhostUserMsg {
 
 #define VHOST_USER_VERSION_MASK     (0x3)
 #define VHOST_USER_REPLY_MASK       (0x1<<2)
+#define VHOST_USER_NEED_REPLY_MASK  (0x1 << 3)
     uint32_t flags;
     uint32_t size; /* the following payload size */
     union {
@@ -158,6 +160,25 @@ fail:
     return -1;
 }
 
+static int process_message_reply(struct vhost_dev *dev,
+                                 VhostUserRequest request)
+{
+    VhostUserMsg msg;
+
+    if (vhost_user_read(dev, &msg) < 0) {
+        return -1;
+    }
+
+    if (msg.request != request) {
+        error_report("Received unexpected msg type."
+                     "Expected %d received %d",
+                     request, msg.request);
+        return -1;
+    }
+
+    return msg.payload.u64 ? -1 : 0;
+}
+
 static bool vhost_user_one_time_request(VhostUserRequest request)
 {
     switch (request) {
@@ -248,11 +269,18 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
     int fds[VHOST_MEMORY_MAX_NREGIONS];
     int i, fd;
     size_t fd_num = 0;
+    bool reply_supported = virtio_has_feature(dev->protocol_features,
+                                              VHOST_USER_PROTOCOL_F_REPLY_ACK);
+
     VhostUserMsg msg = {
         .request = VHOST_USER_SET_MEM_TABLE,
         .flags = VHOST_USER_VERSION,
     };
 
+    if (reply_supported) {
+        msg.flags |= VHOST_USER_NEED_REPLY_MASK;
+    }
+
     for (i = 0; i < dev->mem->nregions; ++i) {
         struct vhost_memory_region *reg = dev->mem->regions + i;
         ram_addr_t offset;
@@ -288,6 +316,10 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
         return -1;
     }
 
+    if (reply_supported) {
+        return process_message_reply(dev, msg.request);
+    }
+
     return 0;
 }
 
-- 
1.8.1.2

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [Qemu-devel] [PATCH for-2.7 v7 2/2] vhost-user: Attempt to fix a race with set_mem_table.
  2016-08-05 10:53 [Qemu-devel] [PATCH for-2.7 v7 0/2]vhost-user: Extend protocol to receive replies on any command Prerna Saxena
  2016-08-05 10:53 ` [Qemu-devel] [PATCH for-2.7 v7 1/2] vhost-user: Introduce a new protocol feature REPLY_ACK Prerna Saxena
@ 2016-08-05 10:53 ` Prerna Saxena
  1 sibling, 0 replies; 3+ messages in thread
From: Prerna Saxena @ 2016-08-05 10:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, marcandre.lureau, felipe, anilkumar.boggarapu, eblake,
	Prerna Saxena

From: Prerna Saxena <prerna.saxena@nutanix.com>

The set_mem_table command currently does not seek a reply. Hence, there is
no easy way for a remote application to notify to QEMU when it finished
setting up memory, or if there were errors doing so.

As an example:
(1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net
application). SET_MEM_TABLE does not require a reply according to the spec.
(2) Qemu commits the memory to the guest.
(3) Guest issues an I/O operation over a new memory region which was configured on (1).
(4) The application has not yet remapped the memory, but it sees the I/O request.
(5) The application cannot satisfy the request because it does not know about those GPAs.

While a guaranteed fix would require a protocol extension (committed separately),
a best-effort workaround for existing applications is to send a GET_FEATURES
message before completing the vhost_user_set_mem_table() call.
Since GET_FEATURES requires a reply, an application that processes vhost-user
messages synchronously would probably have completed the SET_MEM_TABLE before replying.

Signed-off-by: Prerna Saxena <prerna.saxena@nutanix.com>
---
 hw/virtio/vhost-user.c | 127 ++++++++++++++++++++++++++-----------------------
 1 file changed, 67 insertions(+), 60 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index b57454a..1a7d53c 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -263,66 +263,6 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
     return 0;
 }
 
-static int vhost_user_set_mem_table(struct vhost_dev *dev,
-                                    struct vhost_memory *mem)
-{
-    int fds[VHOST_MEMORY_MAX_NREGIONS];
-    int i, fd;
-    size_t fd_num = 0;
-    bool reply_supported = virtio_has_feature(dev->protocol_features,
-                                              VHOST_USER_PROTOCOL_F_REPLY_ACK);
-
-    VhostUserMsg msg = {
-        .request = VHOST_USER_SET_MEM_TABLE,
-        .flags = VHOST_USER_VERSION,
-    };
-
-    if (reply_supported) {
-        msg.flags |= VHOST_USER_NEED_REPLY_MASK;
-    }
-
-    for (i = 0; i < dev->mem->nregions; ++i) {
-        struct vhost_memory_region *reg = dev->mem->regions + i;
-        ram_addr_t offset;
-        MemoryRegion *mr;
-
-        assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
-        mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
-                                     &offset);
-        fd = memory_region_get_fd(mr);
-        if (fd > 0) {
-            msg.payload.memory.regions[fd_num].userspace_addr = reg->userspace_addr;
-            msg.payload.memory.regions[fd_num].memory_size  = reg->memory_size;
-            msg.payload.memory.regions[fd_num].guest_phys_addr = reg->guest_phys_addr;
-            msg.payload.memory.regions[fd_num].mmap_offset = offset;
-            assert(fd_num < VHOST_MEMORY_MAX_NREGIONS);
-            fds[fd_num++] = fd;
-        }
-    }
-
-    msg.payload.memory.nregions = fd_num;
-
-    if (!fd_num) {
-        error_report("Failed initializing vhost-user memory map, "
-                     "consider using -object memory-backend-file share=on");
-        return -1;
-    }
-
-    msg.size = sizeof(msg.payload.memory.nregions);
-    msg.size += sizeof(msg.payload.memory.padding);
-    msg.size += fd_num * sizeof(VhostUserMemoryRegion);
-
-    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
-        return -1;
-    }
-
-    if (reply_supported) {
-        return process_message_reply(dev, msg.request);
-    }
-
-    return 0;
-}
-
 static int vhost_user_set_vring_addr(struct vhost_dev *dev,
                                      struct vhost_vring_addr *addr)
 {
@@ -537,6 +477,73 @@ static int vhost_user_get_features(struct vhost_dev *dev, uint64_t *features)
     return vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features);
 }
 
+static int vhost_user_set_mem_table(struct vhost_dev *dev,
+                                    struct vhost_memory *mem)
+{
+    int fds[VHOST_MEMORY_MAX_NREGIONS];
+    int i, fd;
+    size_t fd_num = 0;
+    uint64_t features;
+    bool reply_supported = virtio_has_feature(dev->protocol_features,
+                                              VHOST_USER_PROTOCOL_F_REPLY_ACK);
+
+    VhostUserMsg msg = {
+        .request = VHOST_USER_SET_MEM_TABLE,
+        .flags = VHOST_USER_VERSION,
+    };
+
+    if (reply_supported) {
+        msg.flags |= VHOST_USER_NEED_REPLY_MASK;
+    }
+
+    for (i = 0; i < dev->mem->nregions; ++i) {
+        struct vhost_memory_region *reg = dev->mem->regions + i;
+        ram_addr_t offset;
+        MemoryRegion *mr;
+
+        assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
+        mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
+                                     &offset);
+        fd = memory_region_get_fd(mr);
+        if (fd > 0) {
+            msg.payload.memory.regions[fd_num].userspace_addr
+                                             = reg->userspace_addr;
+            msg.payload.memory.regions[fd_num].memory_size  = reg->memory_size;
+            msg.payload.memory.regions[fd_num].guest_phys_addr
+                                             = reg->guest_phys_addr;
+            msg.payload.memory.regions[fd_num].mmap_offset = offset;
+            assert(fd_num < VHOST_MEMORY_MAX_NREGIONS);
+            fds[fd_num++] = fd;
+        }
+    }
+
+    msg.payload.memory.nregions = fd_num;
+
+    if (!fd_num) {
+        error_report("Failed initializing vhost-user memory map, "
+                     "consider using -object memory-backend-file share=on");
+        return -1;
+    }
+
+    msg.size = sizeof(msg.payload.memory.nregions);
+    msg.size += sizeof(msg.payload.memory.padding);
+    msg.size += fd_num * sizeof(VhostUserMemoryRegion);
+
+    vhost_user_write(dev, &msg, fds, fd_num);
+
+    if (reply_supported) {
+        return process_message_reply(dev, msg.request);
+    } else {
+        /* Note: It is (yet) unknown when the client application has finished
+         * remapping the GPA.
+         * Attempt to prevent a race by sending a command that requires a reply.
+         */
+        vhost_user_get_features(dev, &features);
+    }
+
+    return 0;
+}
+
 static int vhost_user_set_owner(struct vhost_dev *dev)
 {
     VhostUserMsg msg = {
-- 
1.8.1.2

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2016-08-05 10:54 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-05 10:53 [Qemu-devel] [PATCH for-2.7 v7 0/2]vhost-user: Extend protocol to receive replies on any command Prerna Saxena
2016-08-05 10:53 ` [Qemu-devel] [PATCH for-2.7 v7 1/2] vhost-user: Introduce a new protocol feature REPLY_ACK Prerna Saxena
2016-08-05 10:53 ` [Qemu-devel] [PATCH for-2.7 v7 2/2] vhost-user: Attempt to fix a race with set_mem_table Prerna Saxena

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.