* [Qemu-devel] [PULL 0/3] virtio/vhost: fixes
@ 2016-08-10 15:30 Michael S. Tsirkin
  2016-08-10 15:30 ` [Qemu-devel] [PULL 1/3] vhost: check for vhost_ops before using Michael S. Tsirkin
                   ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2016-08-10 15:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell

The following changes since commit 53279c76cf071fed07a336948d37c72e3613e0b7:

  Update version for v2.7.0-rc2 release (2016-08-08 17:26:11 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

for you to fetch changes up to 28ed5ef16384f12500abd3647973ee21b03cbe23:

  vhost-user: Attempt to fix a race with set_mem_table. (2016-08-10 17:47:29 +0300)

----------------------------------------------------------------
virtio/vhost: fixes

some bugfixes for virtio/vhost

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

----------------------------------------------------------------
Ilya Maximets (1):
      vhost: check for vhost_ops before using.

Prerna Saxena (2):
      vhost-user: Introduce a new protocol feature REPLY_ACK.
      vhost-user: Attempt to fix a race with set_mem_table.

 hw/net/vhost_net.c        |   2 +-
 hw/virtio/vhost-user.c    | 137 +++++++++++++++++++++++++++++-----------------
 docs/specs/vhost-user.txt |  26 +++++++++
 3 files changed, 115 insertions(+), 50 deletions(-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PULL 1/3] vhost: check for vhost_ops before using.
  2016-08-10 15:30 [Qemu-devel] [PULL 0/3] virtio/vhost: fixes Michael S. Tsirkin
@ 2016-08-10 15:30 ` Michael S. Tsirkin
  2016-08-10 15:30 ` [Qemu-devel] [PULL 2/3] vhost-user: Introduce a new protocol feature REPLY_ACK Michael S. Tsirkin
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2016-08-10 15:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Ilya Maximets, Jason Wang

From: Ilya Maximets <i.maximets@samsung.com>

'vhost_set_vring_enable()' tries to call a function through the
'vhost_ops' pointer, which may already have been zeroed by
'vhost_dev_cleanup()' during vhost disconnection.

Fix this by checking 'vhost_ops' before use. This fixes a QEMU crash
on calling 'ethtool -L eth0 combined 2' while vhost is disconnected.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/net/vhost_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index dc61dc1..f2d49ad 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -428,7 +428,7 @@ int vhost_set_vring_enable(NetClientState *nc, int enable)
 
     nc->vring_enable = enable;
 
-    if (vhost_ops->vhost_set_vring_enable) {
+    if (vhost_ops && vhost_ops->vhost_set_vring_enable) {
         return vhost_ops->vhost_set_vring_enable(&net->dev, enable);
     }
 
-- 
MST

* [Qemu-devel] [PULL 2/3] vhost-user: Introduce a new protocol feature REPLY_ACK.
  2016-08-10 15:30 [Qemu-devel] [PULL 0/3] virtio/vhost: fixes Michael S. Tsirkin
  2016-08-10 15:30 ` [Qemu-devel] [PULL 1/3] vhost: check for vhost_ops before using Michael S. Tsirkin
@ 2016-08-10 15:30 ` Michael S. Tsirkin
  2016-08-10 15:30 ` [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table Michael S. Tsirkin
  2016-08-10 17:32 ` [Qemu-devel] [PULL 0/3] virtio/vhost: fixes Peter Maydell
  3 siblings, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2016-08-10 15:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Prerna Saxena, Marc-André Lureau

From: Prerna Saxena <prerna.saxena@nutanix.com>

This introduces the VHOST_USER_PROTOCOL_F_REPLY_ACK protocol feature.

If negotiated, client applications must send a u64 payload in
response to any message that has the "need_reply" bit set in its
message flags. A payload of zero indicates that the command finished
successfully; a non-zero payload indicates an error.

Currently this is implemented only for SET_MEM_TABLE.
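For illustration, here is a minimal sketch of the acknowledgement a backend could construct when the need_reply bit is set. The struct and helper below are simplified stand-ins invented for this example, not QEMU's actual definitions:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified message layout for illustration only; the real
 * VhostUserMsg lives in hw/virtio/vhost-user.c. */
#define VHOST_USER_VERSION          0x1
#define VHOST_USER_REPLY_MASK       (0x1 << 2)
#define VHOST_USER_NEED_REPLY_MASK  (0x1 << 3)

typedef struct {
    uint32_t request;
    uint32_t flags;
    uint32_t size;
    uint64_t u64;   /* payload (the union is collapsed to u64 here) */
} MiniMsg;

/* Build the ack a backend should send when need_reply is set:
 * echo the request type, set the reply flag, and carry 0 on
 * success or non-zero on failure in the u64 payload. */
static int make_reply_ack(const MiniMsg *req, MiniMsg *ack, uint64_t status)
{
    if (!(req->flags & VHOST_USER_NEED_REPLY_MASK)) {
        return 0;               /* no ack was requested */
    }
    memset(ack, 0, sizeof(*ack));
    ack->request = req->request;
    ack->flags = VHOST_USER_VERSION | VHOST_USER_REPLY_MASK;
    ack->size = sizeof(ack->u64);
    ack->u64 = status;          /* 0 = success, non-zero = error */
    return 1;                   /* ack must be sent */
}
```

A message whose flags lack the need_reply bit produces no ack, matching the pre-existing protocol behaviour.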

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Prerna Saxena <prerna.saxena@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/vhost-user.c    | 32 ++++++++++++++++++++++++++++++++
 docs/specs/vhost-user.txt | 26 ++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 1995fd2..b57454a 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -31,6 +31,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_MQ = 0,
     VHOST_USER_PROTOCOL_F_LOG_SHMFD = 1,
     VHOST_USER_PROTOCOL_F_RARP = 2,
+    VHOST_USER_PROTOCOL_F_REPLY_ACK = 3,
 
     VHOST_USER_PROTOCOL_F_MAX
 };
@@ -84,6 +85,7 @@ typedef struct VhostUserMsg {
 
 #define VHOST_USER_VERSION_MASK     (0x3)
 #define VHOST_USER_REPLY_MASK       (0x1<<2)
+#define VHOST_USER_NEED_REPLY_MASK  (0x1 << 3)
     uint32_t flags;
     uint32_t size; /* the following payload size */
     union {
@@ -158,6 +160,25 @@ fail:
     return -1;
 }
 
+static int process_message_reply(struct vhost_dev *dev,
+                                 VhostUserRequest request)
+{
+    VhostUserMsg msg;
+
+    if (vhost_user_read(dev, &msg) < 0) {
+        return -1;
+    }
+
+    if (msg.request != request) {
+        error_report("Received unexpected msg type. "
+                     "Expected %d, received %d",
+                     request, msg.request);
+        return -1;
+    }
+
+    return msg.payload.u64 ? -1 : 0;
+}
+
 static bool vhost_user_one_time_request(VhostUserRequest request)
 {
     switch (request) {
@@ -248,11 +269,18 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
     int fds[VHOST_MEMORY_MAX_NREGIONS];
     int i, fd;
     size_t fd_num = 0;
+    bool reply_supported = virtio_has_feature(dev->protocol_features,
+                                              VHOST_USER_PROTOCOL_F_REPLY_ACK);
+
     VhostUserMsg msg = {
         .request = VHOST_USER_SET_MEM_TABLE,
         .flags = VHOST_USER_VERSION,
     };
 
+    if (reply_supported) {
+        msg.flags |= VHOST_USER_NEED_REPLY_MASK;
+    }
+
     for (i = 0; i < dev->mem->nregions; ++i) {
         struct vhost_memory_region *reg = dev->mem->regions + i;
         ram_addr_t offset;
@@ -288,6 +316,10 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
         return -1;
     }
 
+    if (reply_supported) {
+        return process_message_reply(dev, msg.request);
+    }
+
     return 0;
 }
 
diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index 777c49c..7890d71 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -37,6 +37,8 @@ consists of 3 header fields and a payload:
  * Flags: 32-bit bit field:
    - Lower 2 bits are the version (currently 0x01)
    - Bit 2 is the reply flag - needs to be sent on each reply from the slave
+   - Bit 3 is the need_reply flag - see VHOST_USER_PROTOCOL_F_REPLY_ACK for
+     details.
  * Size - 32-bit size of the payload
 
 
@@ -126,6 +128,8 @@ the ones that do:
  * VHOST_GET_VRING_BASE
  * VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
 
+[ Also see the section on REPLY_ACK protocol extension. ]
+
 There are several messages that the master sends with file descriptors passed
 in the ancillary data:
 
@@ -254,6 +258,7 @@ Protocol features
 #define VHOST_USER_PROTOCOL_F_MQ             0
 #define VHOST_USER_PROTOCOL_F_LOG_SHMFD      1
 #define VHOST_USER_PROTOCOL_F_RARP           2
+#define VHOST_USER_PROTOCOL_F_REPLY_ACK      3
 
 Message types
 -------------
@@ -464,3 +469,24 @@ Message types
       is present in VHOST_USER_GET_PROTOCOL_FEATURES.
       The first 6 bytes of the payload contain the mac address of the guest to
       allow the vhost user backend to construct and broadcast the fake RARP.
+
+VHOST_USER_PROTOCOL_F_REPLY_ACK:
+-------------------------------
+The original vhost-user specification only demands replies for certain
+commands. This differs from the vhost protocol implementation where commands
+are sent over an ioctl() call and block until the client has completed.
+
+With this protocol extension negotiated, the sender (QEMU) can set the
+"need_reply" [Bit 3] flag on any command. This indicates that
+the client MUST respond with a Payload VhostUserMsg indicating success or
+failure. The payload should be set to zero on success or non-zero on failure,
+unless the message already has an explicit reply body.
+
+The response payload gives QEMU a deterministic indication of the result
+of the command. Today, QEMU is expected to terminate the main vhost-user
+loop upon receiving such errors. In the future, QEMU could be taught to be
+more resilient for selected requests.
+
+For the message types that already solicit a reply from the client, the
+presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings
+no behavioural change. (See the 'Communication' section for details.)
-- 
MST

* [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table.
  2016-08-10 15:30 [Qemu-devel] [PULL 0/3] virtio/vhost: fixes Michael S. Tsirkin
  2016-08-10 15:30 ` [Qemu-devel] [PULL 1/3] vhost: check for vhost_ops before using Michael S. Tsirkin
  2016-08-10 15:30 ` [Qemu-devel] [PULL 2/3] vhost-user: Introduce a new protocol feature REPLY_ACK Michael S. Tsirkin
@ 2016-08-10 15:30 ` Michael S. Tsirkin
  2016-08-12  6:38   ` Fam Zheng
  2016-08-10 17:32 ` [Qemu-devel] [PULL 0/3] virtio/vhost: fixes Peter Maydell
  3 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2016-08-10 15:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Prerna Saxena

From: Prerna Saxena <prerna.saxena@nutanix.com>

The set_mem_table command currently does not solicit a reply. Hence, there is
no easy way for a remote application to notify QEMU when it has finished
setting up memory, or whether errors occurred while doing so.

As an example:
(1) QEMU sends a SET_MEM_TABLE to the backend (e.g., a vhost-user net
application). SET_MEM_TABLE does not require a reply according to the spec.
(2) QEMU commits the memory to the guest.
(3) The guest issues an I/O operation over a new memory region which was
configured in (1).
(4) The application has not yet remapped the memory, but it sees the I/O
request.
(5) The application cannot satisfy the request because it does not know about
those GPAs.

While a guaranteed fix would require a protocol extension (committed
separately), a best-effort workaround for existing applications is to send a
GET_FEATURES message before completing the vhost_user_set_mem_table() call.
Since GET_FEATURES requires a reply, an application that processes vhost-user
messages synchronously would probably have completed the SET_MEM_TABLE before
replying.
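The ordering argument above can be sketched with a toy model (all names are invented for illustration; this is not QEMU code): a backend that handles messages strictly in order necessarily finishes SET_MEM_TABLE before it can answer the subsequent GET_FEATURES, so receiving the features reply acts as a barrier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a backend that processes messages synchronously,
 * one at a time, in the order they were sent. */
enum { MSG_SET_MEM_TABLE, MSG_GET_FEATURES };

typedef struct {
    bool mem_table_applied;
} ToyBackend;

/* "Process" one message; only GET_FEATURES produces a reply. */
static bool toy_backend_handle(ToyBackend *b, int msg, uint64_t *reply)
{
    switch (msg) {
    case MSG_SET_MEM_TABLE:
        b->mem_table_applied = true;   /* remap guest memory */
        return false;                  /* no reply in the base protocol */
    case MSG_GET_FEATURES:
        *reply = 0;                    /* feature bits (dummy value) */
        return true;                   /* a reply is mandatory */
    }
    return false;
}

/* The workaround: because the backend is synchronous and in-order,
 * getting the GET_FEATURES reply implies SET_MEM_TABLE completed. */
static bool set_mem_table_with_barrier(ToyBackend *b)
{
    uint64_t features;
    toy_backend_handle(b, MSG_SET_MEM_TABLE, NULL);
    toy_backend_handle(b, MSG_GET_FEATURES, &features);
    return b->mem_table_applied;       /* true by the ordering argument */
}
```

The guarantee only holds for backends that really are synchronous; an asynchronous backend could reply to GET_FEATURES before the remap finishes, which is why this is a best-effort workaround rather than a fix.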

Signed-off-by: Prerna Saxena <prerna.saxena@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/vhost-user.c | 127 ++++++++++++++++++++++++++-----------------------
 1 file changed, 67 insertions(+), 60 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index b57454a..1a7d53c 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -263,66 +263,6 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
     return 0;
 }
 
-static int vhost_user_set_mem_table(struct vhost_dev *dev,
-                                    struct vhost_memory *mem)
-{
-    int fds[VHOST_MEMORY_MAX_NREGIONS];
-    int i, fd;
-    size_t fd_num = 0;
-    bool reply_supported = virtio_has_feature(dev->protocol_features,
-                                              VHOST_USER_PROTOCOL_F_REPLY_ACK);
-
-    VhostUserMsg msg = {
-        .request = VHOST_USER_SET_MEM_TABLE,
-        .flags = VHOST_USER_VERSION,
-    };
-
-    if (reply_supported) {
-        msg.flags |= VHOST_USER_NEED_REPLY_MASK;
-    }
-
-    for (i = 0; i < dev->mem->nregions; ++i) {
-        struct vhost_memory_region *reg = dev->mem->regions + i;
-        ram_addr_t offset;
-        MemoryRegion *mr;
-
-        assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
-        mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
-                                     &offset);
-        fd = memory_region_get_fd(mr);
-        if (fd > 0) {
-            msg.payload.memory.regions[fd_num].userspace_addr = reg->userspace_addr;
-            msg.payload.memory.regions[fd_num].memory_size  = reg->memory_size;
-            msg.payload.memory.regions[fd_num].guest_phys_addr = reg->guest_phys_addr;
-            msg.payload.memory.regions[fd_num].mmap_offset = offset;
-            assert(fd_num < VHOST_MEMORY_MAX_NREGIONS);
-            fds[fd_num++] = fd;
-        }
-    }
-
-    msg.payload.memory.nregions = fd_num;
-
-    if (!fd_num) {
-        error_report("Failed initializing vhost-user memory map, "
-                     "consider using -object memory-backend-file share=on");
-        return -1;
-    }
-
-    msg.size = sizeof(msg.payload.memory.nregions);
-    msg.size += sizeof(msg.payload.memory.padding);
-    msg.size += fd_num * sizeof(VhostUserMemoryRegion);
-
-    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
-        return -1;
-    }
-
-    if (reply_supported) {
-        return process_message_reply(dev, msg.request);
-    }
-
-    return 0;
-}
-
 static int vhost_user_set_vring_addr(struct vhost_dev *dev,
                                      struct vhost_vring_addr *addr)
 {
@@ -537,6 +477,73 @@ static int vhost_user_get_features(struct vhost_dev *dev, uint64_t *features)
     return vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features);
 }
 
+static int vhost_user_set_mem_table(struct vhost_dev *dev,
+                                    struct vhost_memory *mem)
+{
+    int fds[VHOST_MEMORY_MAX_NREGIONS];
+    int i, fd;
+    size_t fd_num = 0;
+    uint64_t features;
+    bool reply_supported = virtio_has_feature(dev->protocol_features,
+                                              VHOST_USER_PROTOCOL_F_REPLY_ACK);
+
+    VhostUserMsg msg = {
+        .request = VHOST_USER_SET_MEM_TABLE,
+        .flags = VHOST_USER_VERSION,
+    };
+
+    if (reply_supported) {
+        msg.flags |= VHOST_USER_NEED_REPLY_MASK;
+    }
+
+    for (i = 0; i < dev->mem->nregions; ++i) {
+        struct vhost_memory_region *reg = dev->mem->regions + i;
+        ram_addr_t offset;
+        MemoryRegion *mr;
+
+        assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
+        mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
+                                     &offset);
+        fd = memory_region_get_fd(mr);
+        if (fd > 0) {
+            msg.payload.memory.regions[fd_num].userspace_addr
+                                             = reg->userspace_addr;
+            msg.payload.memory.regions[fd_num].memory_size  = reg->memory_size;
+            msg.payload.memory.regions[fd_num].guest_phys_addr
+                                             = reg->guest_phys_addr;
+            msg.payload.memory.regions[fd_num].mmap_offset = offset;
+            assert(fd_num < VHOST_MEMORY_MAX_NREGIONS);
+            fds[fd_num++] = fd;
+        }
+    }
+
+    msg.payload.memory.nregions = fd_num;
+
+    if (!fd_num) {
+        error_report("Failed initializing vhost-user memory map, "
+                     "consider using -object memory-backend-file share=on");
+        return -1;
+    }
+
+    msg.size = sizeof(msg.payload.memory.nregions);
+    msg.size += sizeof(msg.payload.memory.padding);
+    msg.size += fd_num * sizeof(VhostUserMemoryRegion);
+
+    vhost_user_write(dev, &msg, fds, fd_num);
+
+    if (reply_supported) {
+        return process_message_reply(dev, msg.request);
+    } else {
+        /* Note: It is (yet) unknown when the client application has finished
+         * remapping the GPA.
+         * Attempt to prevent a race by sending a command that requires a reply.
+         */
+        vhost_user_get_features(dev, &features);
+    }
+
+    return 0;
+}
+
 static int vhost_user_set_owner(struct vhost_dev *dev)
 {
     VhostUserMsg msg = {
-- 
MST

* Re: [Qemu-devel] [PULL 0/3] virtio/vhost: fixes
  2016-08-10 15:30 [Qemu-devel] [PULL 0/3] virtio/vhost: fixes Michael S. Tsirkin
                   ` (2 preceding siblings ...)
  2016-08-10 15:30 ` [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table Michael S. Tsirkin
@ 2016-08-10 17:32 ` Peter Maydell
  3 siblings, 0 replies; 26+ messages in thread
From: Peter Maydell @ 2016-08-10 17:32 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: QEMU Developers

On 10 August 2016 at 16:30, Michael S. Tsirkin <mst@redhat.com> wrote:
> The following changes since commit 53279c76cf071fed07a336948d37c72e3613e0b7:
>
>   Update version for v2.7.0-rc2 release (2016-08-08 17:26:11 +0100)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
>
> for you to fetch changes up to 28ed5ef16384f12500abd3647973ee21b03cbe23:
>
>   vhost-user: Attempt to fix a race with set_mem_table. (2016-08-10 17:47:29 +0300)
>
> ----------------------------------------------------------------
> virtio/vhost: fixes
>
> some bugfixes for virtio/vhost
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Applied, thanks.

-- PMM

* Re: [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table.
  2016-08-10 15:30 ` [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table Michael S. Tsirkin
@ 2016-08-12  6:38   ` Fam Zheng
  2016-08-12  7:16     ` Prerna Saxena
  0 siblings, 1 reply; 26+ messages in thread
From: Fam Zheng @ 2016-08-12  6:38 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel, Peter Maydell, Prerna Saxena

On Wed, 08/10 18:30, Michael S. Tsirkin wrote:
> From: Prerna Saxena <prerna.saxena@nutanix.com>
> 
> The set_mem_table command currently does not seek a reply. Hence, there is
> no easy way for a remote application to notify to QEMU when it finished
> setting up memory, or if there were errors doing so.
> 
> As an example:
> (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net
> application). SET_MEM_TABLE does not require a reply according to the spec.
> (2) Qemu commits the memory to the guest.
> (3) Guest issues an I/O operation over a new memory region which was configured on (1).
> (4) The application has not yet remapped the memory, but it sees the I/O request.
> (5) The application cannot satisfy the request because it does not know about those GPAs.
> 
> While a guaranteed fix would require a protocol extension (committed separately),
> a best-effort workaround for existing applications is to send a GET_FEATURES
> message before completing the vhost_user_set_mem_table() call.
> Since GET_FEATURES requires a reply, an application that processes vhost-user
> messages synchronously would probably have completed the SET_MEM_TABLE before replying.
> 
> Signed-off-by: Prerna Saxena <prerna.saxena@nutanix.com>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Sporadic hangs are seen with test-vhost-user after this patch:

https://travis-ci.org/qemu/qemu/builds

Reverting seems to fix it for me.

Is this a known problem?

Fam

* Re: [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table.
  2016-08-12  6:38   ` Fam Zheng
@ 2016-08-12  7:16     ` Prerna Saxena
  2016-08-12  7:20       ` Marc-André Lureau
  2016-08-14  2:51       ` Michael S. Tsirkin
  0 siblings, 2 replies; 26+ messages in thread
From: Prerna Saxena @ 2016-08-12  7:16 UTC (permalink / raw)
  To: Fam Zheng, Michael S. Tsirkin; +Cc: qemu-devel, Peter Maydell, marcandre.lureau


On 12/08/16 12:08 pm, "Fam Zheng" <famz@redhat.com> wrote:

>On Wed, 08/10 18:30, Michael S. Tsirkin wrote:
>> From: Prerna Saxena <prerna.saxena@nutanix.com>
>> 
>> The set_mem_table command currently does not seek a reply. Hence, there is
>> no easy way for a remote application to notify to QEMU when it finished
>> setting up memory, or if there were errors doing so.
>> 
>> As an example:
>> (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net
>> application). SET_MEM_TABLE does not require a reply according to the spec.
>> (2) Qemu commits the memory to the guest.
>> (3) Guest issues an I/O operation over a new memory region which was configured on (1).
>> (4) The application has not yet remapped the memory, but it sees the I/O request.
>> (5) The application cannot satisfy the request because it does not know about those GPAs.
>> 
>> While a guaranteed fix would require a protocol extension (committed separately),
>> a best-effort workaround for existing applications is to send a GET_FEATURES
>> message before completing the vhost_user_set_mem_table() call.
>> Since GET_FEATURES requires a reply, an application that processes vhost-user
>> messages synchronously would probably have completed the SET_MEM_TABLE before replying.
>> 
>> Signed-off-by: Prerna Saxena <prerna.saxena@nutanix.com>
>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>
>Sporadic hangs are seen with test-vhost-user after this patch:
>
>https://travis-ci.org/qemu/qemu/builds
>
>Reverting seems to fix it for me.
>
>Is this a known problem?
>
>Fam

Hi Fam,
Thanks for reporting the sporadic hangs. I had seen ‘make check’ pass in my CentOS 6 environment, so I missed this.
I am setting up the docker test env to reproduce this, but I think I can guess the problem:

In tests/vhost-user-test.c: 

static void chr_read(void *opaque, const uint8_t *buf, int size)
{
..[snip]..

    case VHOST_USER_SET_MEM_TABLE:
       /* received the mem table */
       memcpy(&s->memory, &msg.payload.memory, sizeof(msg.payload.memory));
       s->fds_num = qemu_chr_fe_get_msgfds(chr, s->fds, G_N_ELEMENTS(s->fds));


       /* signal the test that it can continue */
       g_cond_signal(&s->data_cond);
       break;
..[snip]..
}


The test seems to be marked complete as soon as mem_table is copied.
However, patch 3/3 changes the behaviour of the SET_MEM_TABLE command implementation within QEMU. SET_MEM_TABLE now sends out a new message, GET_FEATURES, and the call only completes once it receives the features from the remote application (or the test framework, as is the case here).
While the test itself can be modified (do not signal completion until we’ve sent a follow-up response to GET_FEATURES), I am now wondering if this patch may break existing vhost applications too. If so, reverting it is probably better.
What confuses me is why it doesn’t fail all the time, but only about 20% to 30% of the time, as Fam reports.
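As a toy model of that test-side change (all names here are illustrative; this is not the actual vhost-user-test.c code), the idea is to defer the completion signal until the follow-up GET_FEATURES has been answered:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the test server: only signal completion after the
 * follow-up GET_FEATURES has been answered, mirroring the order the
 * patched QEMU now expects. */
typedef struct {
    bool mem_table_seen;
    bool features_replied;
    bool test_signalled;
} ToyTestServer;

static void toy_chr_read(ToyTestServer *s, int msg)
{
    switch (msg) {
    case 0: /* SET_MEM_TABLE */
        s->mem_table_seen = true;
        /* Do NOT signal here: QEMU still waits for GET_FEATURES. */
        break;
    case 1: /* GET_FEATURES */
        s->features_replied = true;   /* send the (dummy) reply */
        if (s->mem_table_seen && !s->test_signalled) {
            s->test_signalled = true; /* g_cond_signal() equivalent */
        }
        break;
    }
}
```

In the real test the signalling would still go through g_cond_signal() on data_cond; the only change modelled here is the point at which it fires.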

Thoughts : Michael, Fam, MarcAndre ?

Regards,
Prerna 

* Re: [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table.
  2016-08-12  7:16     ` Prerna Saxena
@ 2016-08-12  7:20       ` Marc-André Lureau
  2016-08-12 12:01         ` Peter Maydell
  2016-08-12 15:47         ` Michael S. Tsirkin
  2016-08-14  2:51       ` Michael S. Tsirkin
  1 sibling, 2 replies; 26+ messages in thread
From: Marc-André Lureau @ 2016-08-12  7:20 UTC (permalink / raw)
  To: Prerna Saxena
  Cc: Fam Zheng, Michael S. Tsirkin, qemu-devel, Peter Maydell,
	marcandre lureau

Hi

----- Original Message -----
> sent a follow-up response to GET_FEATURES), I am now wondering if this patch
> may break existing vhost applications too ? If so, reverting it possibly
> better.
> What confuses me is why it doesn’t fail all the time, but only about 20% to
> 30% time as Fam reports.
> 
> Thoughts : Michael, Fam, MarcAndre ?

Indeed, I didn't ack that patch in the first place for that kind of reason, so I would revert it.

thanks

* Re: [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table.
  2016-08-12  7:20       ` Marc-André Lureau
@ 2016-08-12 12:01         ` Peter Maydell
  2016-08-12 15:49           ` Michael S. Tsirkin
  2016-08-12 15:47         ` Michael S. Tsirkin
  1 sibling, 1 reply; 26+ messages in thread
From: Peter Maydell @ 2016-08-12 12:01 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Prerna Saxena, Fam Zheng, Michael S. Tsirkin, QEMU Developers,
	marcandre lureau

On 12 August 2016 at 08:20, Marc-André Lureau <mlureau@redhat.com> wrote:
> Hi
>
> ----- Original Message -----
>> sent a follow-up response to GET_FEATURES), I am now wondering if this patch
>> may break existing vhost applications too ? If so, reverting it possibly
>> better.
>> What confuses me is why it doesn’t fail all the time, but only about 20% to
>> 30% time as Fam reports.
>>
>> Thoughts : Michael, Fam, MarcAndre ?
>
> Indeed, I didn't ack that patch in the first place for that kind of reasons, so I would revert it.


If somebody would like to send a revert patch to the list, I'll apply
it to master (please cc me, as I suspect the mailing list server is
down at the moment). I've been seeing these failures in the
build tests I run for merges.

thanks
-- PMM

* Re: [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table.
  2016-08-12  7:20       ` Marc-André Lureau
  2016-08-12 12:01         ` Peter Maydell
@ 2016-08-12 15:47         ` Michael S. Tsirkin
  2016-08-12 15:54           ` Marc-André Lureau
  1 sibling, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2016-08-12 15:47 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Prerna Saxena, Fam Zheng, qemu-devel, Peter Maydell, marcandre lureau

On Fri, Aug 12, 2016 at 03:20:56AM -0400, Marc-André Lureau wrote:
> Hi
> 
> ----- Original Message -----
> > sent a follow-up response to GET_FEATURES), I am now wondering if this patch
> > may break existing vhost applications too ? If so, reverting it possibly
> > better.
> > What confuses me is why it doesn’t fail all the time, but only about 20% to
> > 30% time as Fam reports.
> > 
> > Thoughts : Michael, Fam, MarcAndre ?
> 
> Indeed, I didn't ack that patch in the first place for that kind of reasons, so I would revert it.
> 
> thanks

I guess that's the safest thing to do for 2.7.
At least that's not any worse than 2.6.
I still think it's a good idea long term and test should be fixed,
but let's revert for now.

-- 
MST

* Re: [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table.
  2016-08-12 12:01         ` Peter Maydell
@ 2016-08-12 15:49           ` Michael S. Tsirkin
  2016-08-15 10:20             ` Peter Maydell
  0 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2016-08-12 15:49 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Marc-André Lureau, Prerna Saxena, Fam Zheng,
	QEMU Developers, marcandre lureau

On Fri, Aug 12, 2016 at 01:01:16PM +0100, Peter Maydell wrote:
> On 12 August 2016 at 08:20, Marc-André Lureau <mlureau@redhat.com> wrote:
> > Hi
> >
> > ----- Original Message -----
> >> sent a follow-up response to GET_FEATURES), I am now wondering if this patch
> >> may break existing vhost applications too ? If so, reverting it possibly
> >> better.
> >> What confuses me is why it doesn’t fail all the time, but only about 20% to
> >> 30% time as Fam reports.
> >>
> >> Thoughts : Michael, Fam, MarcAndre ?
> >
> > Indeed, I didn't ack that patch in the first place for that kind of reasons, so I would revert it.
> 
> 
> If somebody would like to send a revert-patch to the list I'll apply
> it to master (please cc me as I suspect the mailing list server is
> down at the moment...) I've been seeing these failures in the
> build tests I run for merges.
> 
> thanks
> -- PMM

Will do right now.

-- 
MST

* Re: [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table.
  2016-08-12 15:47         ` Michael S. Tsirkin
@ 2016-08-12 15:54           ` Marc-André Lureau
  2016-08-12 21:12             ` Michael S. Tsirkin
  0 siblings, 1 reply; 26+ messages in thread
From: Marc-André Lureau @ 2016-08-12 15:54 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Prerna Saxena, Fam Zheng, qemu-devel, Peter Maydell, marcandre lureau

Hi

----- Original Message -----
> On Fri, Aug 12, 2016 at 03:20:56AM -0400, Marc-André Lureau wrote:
> > Hi
> > 
> > ----- Original Message -----
> > > sent a follow-up response to GET_FEATURES), I am now wondering if this
> > > patch
> > > may break existing vhost applications too ? If so, reverting it possibly
> > > better.
> > > What confuses me is why it doesn’t fail all the time, but only about 20%
> > > to
> > > 30% time as Fam reports.
> > > 
> > > Thoughts : Michael, Fam, MarcAndre ?
> > 
> > Indeed, I didn't ack that patch in the first place for that kind of
> > reasons, so I would revert it.
> > 
> > thanks
> 
> I guess that's the safest thing to do for 2.7.
> At least that's not any worse than 2.6.
> I still think it's a good idea long term and test should be fixed,
> but let's revert for now.
> 

What about other backends that may have similar expectations from the protocol?

This patch is a hack; there is no reason to have it upstream. The solution is provided by the REPLY_ACK patch.

* Re: [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table.
  2016-08-12 15:54           ` Marc-André Lureau
@ 2016-08-12 21:12             ` Michael S. Tsirkin
  2016-08-13  6:13               ` Marc-André Lureau
  0 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2016-08-12 21:12 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Prerna Saxena, Fam Zheng, qemu-devel, Peter Maydell, marcandre lureau

On Fri, Aug 12, 2016 at 11:54:54AM -0400, Marc-André Lureau wrote:
> Hi
> 
> ----- Original Message -----
> > On Fri, Aug 12, 2016 at 03:20:56AM -0400, Marc-André Lureau wrote:
> > > Hi
> > > 
> > > ----- Original Message -----
> > > > sent a follow-up response to GET_FEATURES), I am now wondering if this
> > > > patch
> > > > may break existing vhost applications too ? If so, reverting it possibly
> > > > better.
> > > > What confuses me is why it doesn’t fail all the time, but only about 20%
> > > > to
> > > > 30% time as Fam reports.
> > > > 
> > > > Thoughts : Michael, Fam, MarcAndre ?
> > > 
> > > Indeed, I didn't ack that patch in the first place for that kind of
> > > reasons, so I would revert it.
> > > 
> > > thanks
> > 
> > I guess that's the safest thing to do for 2.7.
> > At least that's not any worse than 2.6.
> > I still think it's a good idea long term and test should be fixed,
> > but let's revert for now.
> > 
> 
> What about other backends that may have similar expectations from the protocol.
> 
> This patch is a hack, there is no reason to have it upstream.

The reason is to avoid crashes with existing backends.

> The solution is provided with the REPLY_ACK patch.

It needs a backend update though.

But the issue is old; it's not a regression. I think we lose nothing
by pushing the work-around out until after 2.7.

-- 
MST

* Re: [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table.
  2016-08-12 21:12             ` Michael S. Tsirkin
@ 2016-08-13  6:13               ` Marc-André Lureau
  2016-08-14  2:30                 ` Michael S. Tsirkin
  2016-08-14  2:44                 ` Michael S. Tsirkin
  0 siblings, 2 replies; 26+ messages in thread
From: Marc-André Lureau @ 2016-08-13  6:13 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Prerna Saxena, Fam Zheng, qemu-devel, Peter Maydell, marcandre lureau

Hi

----- Original Message -----
> On Fri, Aug 12, 2016 at 11:54:54AM -0400, Marc-André Lureau wrote:
> > Hi
> > 
> > ----- Original Message -----
> > > On Fri, Aug 12, 2016 at 03:20:56AM -0400, Marc-André Lureau wrote:
> > > > Hi
> > > > 
> > > > ----- Original Message -----
> > > > > sent a follow-up response to GET_FEATURES), I am now wondering if
> > > > > this
> > > > > patch
> > > > > may break existing vhost applications too ? If so, reverting it
> > > > > possibly
> > > > > better.
> > > > > What confuses me is why it doesn’t fail all the time, but only about
> > > > > 20%
> > > > > to
> > > > > 30% time as Fam reports.
> > > > > 
> > > > > Thoughts : Michael, Fam, MarcAndre ?
> > > > 
> > > > Indeed, I didn't ack that patch in the first place for that kind of
> > > > reasons, so I would revert it.
> > > > 
> > > > thanks
> > > 
> > > I guess that's the safest thing to do for 2.7.
> > > At least that's not any worse than 2.6.
> > > I still think it's a good idea long term and test should be fixed,
> > > but let's revert for now.
> > > 
> > 
> > What about other backends that may have similar expectations from the
> > protocol.
> > 
> > This patch is a hack, there is no reason to have it upstream.
> 
> The reason is to avoid crashes with existing backends.

Which backend? I had a similar issue, but it wasn't about crashes, and Prerna didn't mention crashes either. In any case, there is no guarantee that adding a GET_FEATURES message will solve it...


> > The solution is provided with the REPLY_ACK patch.
> 
> It needs a backend update though.
> 
> But the issue is old, it's not a regression. I think we lose nothing
> by pushing the work-around out until after 2.7.
> 
> --
> MST
> 


* Re: [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table.
  2016-08-13  6:13               ` Marc-André Lureau
@ 2016-08-14  2:30                 ` Michael S. Tsirkin
  2016-08-14  2:44                 ` Michael S. Tsirkin
  1 sibling, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2016-08-14  2:30 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Prerna Saxena, Fam Zheng, qemu-devel, Peter Maydell, marcandre lureau

On Sat, Aug 13, 2016 at 02:13:46AM -0400, Marc-André Lureau wrote:
> Hi
> 
> ----- Original Message -----
> > On Fri, Aug 12, 2016 at 11:54:54AM -0400, Marc-André Lureau wrote:
> > > Hi
> > > 
> > > ----- Original Message -----
> > > > On Fri, Aug 12, 2016 at 03:20:56AM -0400, Marc-André Lureau wrote:
> > > > > Hi
> > > > > 
> > > > > ----- Original Message -----
> > > > > > sent a follow-up response to GET_FEATURES), I am now wondering if
> > > > > > this
> > > > > > patch
> > > > > > may break existing vhost applications too ? If so, reverting it
> > > > > > possibly
> > > > > > better.
> > > > > > What confuses me is why it doesn’t fail all the time, but only about
> > > > > > 20%
> > > > > > to
> > > > > > 30% time as Fam reports.
> > > > > > 
> > > > > > Thoughts : Michael, Fam, MarcAndre ?
> > > > > 
> > > > > Indeed, I didn't ack that patch in the first place for that kind of
> > > > > reasons, so I would revert it.
> > > > > 
> > > > > thanks
> > > > 
> > > > I guess that's the safest thing to do for 2.7.
> > > > At least that's not any worse than 2.6.
> > > > I still think it's a good idea long term and test should be fixed,
> > > > but let's revert for now.
> > > > 
> > > 
> > > What about other backends that may have similar expectations from the
> > > protocol.
> > > 
> > > This patch is a hack, there is no reason to have it upstream.
> > 
> > The reason is to avoid crashes with existing backends.
> 
> Which backend? I had a similar issue, it wasn't about crashes, and Prerna didn't mention crashes either, but anyway there is not guarantee that adding a GET_FEATURES message will solve it...
> 

Not guaranteed, but it helps with all those I'm aware of.

> > > The solution is provided with the REPLY_ACK patch.
> > 
> > It needs a backend update though.
> > 
> > But the issue is old, it's not a regression. I think we lose nothing
> > by pushing the work-around out until after 2.7.
> > 
> > --
> > MST
> > 


* Re: [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table.
  2016-08-13  6:13               ` Marc-André Lureau
  2016-08-14  2:30                 ` Michael S. Tsirkin
@ 2016-08-14  2:44                 ` Michael S. Tsirkin
  1 sibling, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2016-08-14  2:44 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Prerna Saxena, Fam Zheng, qemu-devel, Peter Maydell, marcandre lureau

On Sat, Aug 13, 2016 at 02:13:46AM -0400, Marc-André Lureau wrote:
> Hi
> 
> ----- Original Message -----
> > On Fri, Aug 12, 2016 at 11:54:54AM -0400, Marc-André Lureau wrote:
> > > Hi
> > > 
> > > ----- Original Message -----
> > > > On Fri, Aug 12, 2016 at 03:20:56AM -0400, Marc-André Lureau wrote:
> > > > > Hi
> > > > > 
> > > > > ----- Original Message -----
> > > > > > sent a follow-up response to GET_FEATURES), I am now wondering if
> > > > > > this
> > > > > > patch
> > > > > > may break existing vhost applications too ? If so, reverting it
> > > > > > possibly
> > > > > > better.
> > > > > > What confuses me is why it doesn’t fail all the time, but only about
> > > > > > 20%
> > > > > > to
> > > > > > 30% time as Fam reports.
> > > > > > 
> > > > > > Thoughts : Michael, Fam, MarcAndre ?
> > > > > 
> > > > > Indeed, I didn't ack that patch in the first place for that kind of
> > > > > reasons, so I would revert it.
> > > > > 
> > > > > thanks
> > > > 
> > > > I guess that's the safest thing to do for 2.7.
> > > > At least that's not any worse than 2.6.
> > > > I still think it's a good idea long term and test should be fixed,
> > > > but let's revert for now.
> > > > 
> > > 
> > > What about other backends that may have similar expectations from the
> > > protocol.
> > > 
> > > This patch is a hack, there is no reason to have it upstream.
> > 
> > The reason is to avoid crashes with existing backends.
> 
> Which backend? I had a similar issue, it wasn't about crashes, and Prerna didn't mention crashes either, but anyway there is not guarantee that adding a GET_FEATURES message will solve it...
> 


So what bothers me the most about dropping this is that
there are no backends with the new feature implemented
at the moment.

The GET_FEATURES hack at least makes it easy to test
with existing backends ...




> > > The solution is provided with the REPLY_ACK patch.
> > 
> > It needs a backend update though.
> > 
> > But the issue is old, it's not a regression. I think we lose nothing
> > by pushing the work-around out until after 2.7.
> > 
> > --
> > MST
> > 


* Re: [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table.
  2016-08-12  7:16     ` Prerna Saxena
  2016-08-12  7:20       ` Marc-André Lureau
@ 2016-08-14  2:51       ` Michael S. Tsirkin
  2016-08-14  9:42         ` Prerna Saxena
  1 sibling, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2016-08-14  2:51 UTC (permalink / raw)
  To: Prerna Saxena; +Cc: Fam Zheng, qemu-devel, Peter Maydell, marcandre.lureau

On Fri, Aug 12, 2016 at 07:16:34AM +0000, Prerna Saxena wrote:
> 
> On 12/08/16 12:08 pm, "Fam Zheng" <famz@redhat.com> wrote:
> 
> 
> 
> 
> 
> >On Wed, 08/10 18:30, Michael S. Tsirkin wrote:
> >> From: Prerna Saxena <prerna.saxena@nutanix.com>
> >> 
> >> The set_mem_table command currently does not seek a reply. Hence, there is
> >> no easy way for a remote application to notify to QEMU when it finished
> >> setting up memory, or if there were errors doing so.
> >> 
> >> As an example:
> >> (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net
> >> application). SET_MEM_TABLE does not require a reply according to the spec.
> >> (2) Qemu commits the memory to the guest.
> >> (3) Guest issues an I/O operation over a new memory region which was configured on (1).
> >> (4) The application has not yet remapped the memory, but it sees the I/O request.
> >> (5) The application cannot satisfy the request because it does not know about those GPAs.
> >> 
> >> While a guaranteed fix would require a protocol extension (committed separately),
> >> a best-effort workaround for existing applications is to send a GET_FEATURES
> >> message before completing the vhost_user_set_mem_table() call.
> >> Since GET_FEATURES requires a reply, an application that processes vhost-user
> >> messages synchronously would probably have completed the SET_MEM_TABLE before replying.
> >> 
> >> Signed-off-by: Prerna Saxena <prerna.saxena@nutanix.com>
> >> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> >> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >
> >Sporadic hangs are seen with test-vhost-user after this patch:
> >
> >https://travis-ci.org/qemu/qemu/builds
> >
> >Reverting seems to fix it for me.
> >
> >Is this a known problem?
> >
> >Fam
> 
> Hi Fam,
> Thanks for reporting the sporadic hangs. I had seen ‘make check’ pass on my Centos 6 environment, so missed this.
> I am setting up the docker test env to repro this, but I think I can guess the problem :
> 
> In tests/vhost-user-test.c: 
> 
> static void chr_read(void *opaque, const uint8_t *buf, int size)
> {
> ..[snip]..
> 
>     case VHOST_USER_SET_MEM_TABLE:
>        /* received the mem table */
>        memcpy(&s->memory, &msg.payload.memory, sizeof(msg.payload.memory));
>        s->fds_num = qemu_chr_fe_get_msgfds(chr, s->fds, G_N_ELEMENTS(s->fds));
> 
> 
>        /* signal the test that it can continue */
>        g_cond_signal(&s->data_cond);
>        break;
> ..[snip]..
> }
> 
> 
> The test seems to be marked complete as soon as mem_table is copied. 
> However, this patch 3/3 changes the behaviour of the SET_MEM_TABLE vhost command implementation with qemu. SET_MEM_TABLE now sends out a new message GET_FEATURES, and the call is only completed once it receives features from the remote application. (or the test framework, as is the case here.)

Hmm but why does it matter that data_cond is woken up?


> While the test itself can be modified (Do not signal completion until we’ve sent a follow-up response to GET_FEATURES), I am now wondering if this patch may break existing vhost applications too ? If so, reverting it possibly better.

What bothers me is that the new feature might cause the same
issue once we enable it in the test.

How about a patch to tests/vhost-user-test.c adding the new
protocol feature? I would be quite interested to see what
is going on with it.


> What confuses me is why it doesn’t fail all the time, but only about 20% to 30% time as Fam reports. 

And succeeds every time on my systems :(

> 
> Thoughts : Michael, Fam, MarcAndre ?
> 
> Regards,
> Prerna 


* Re: [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table.
  2016-08-14  2:51       ` Michael S. Tsirkin
@ 2016-08-14  9:42         ` Prerna Saxena
  2016-08-14 21:39           ` Michael S. Tsirkin
  2016-08-31 11:19           ` Maxime Coquelin
  0 siblings, 2 replies; 26+ messages in thread
From: Prerna Saxena @ 2016-08-14  9:42 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Fam Zheng, qemu-devel, Peter Maydell, marcandre.lureau

On 14/08/16 8:21 am, "Michael S. Tsirkin" <mst@redhat.com> wrote:


>On Fri, Aug 12, 2016 at 07:16:34AM +0000, Prerna Saxena wrote:
>> 
>> On 12/08/16 12:08 pm, "Fam Zheng" <famz@redhat.com> wrote:
>> 
>> 
>> 
>> 
>> 
>> >On Wed, 08/10 18:30, Michael S. Tsirkin wrote:
>> >> From: Prerna Saxena <prerna.saxena@nutanix.com>
>> >> 
>> >> The set_mem_table command currently does not seek a reply. Hence, there is
>> >> no easy way for a remote application to notify to QEMU when it finished
>> >> setting up memory, or if there were errors doing so.
>> >> 
>> >> As an example:
>> >> (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net
>> >> application). SET_MEM_TABLE does not require a reply according to the spec.
>> >> (2) Qemu commits the memory to the guest.
>> >> (3) Guest issues an I/O operation over a new memory region which was configured on (1).
>> >> (4) The application has not yet remapped the memory, but it sees the I/O request.
>> >> (5) The application cannot satisfy the request because it does not know about those GPAs.
>> >> 
>> >> While a guaranteed fix would require a protocol extension (committed separately),
>> >> a best-effort workaround for existing applications is to send a GET_FEATURES
>> >> message before completing the vhost_user_set_mem_table() call.
>> >> Since GET_FEATURES requires a reply, an application that processes vhost-user
>> >> messages synchronously would probably have completed the SET_MEM_TABLE before replying.
>> >> 
>> >> Signed-off-by: Prerna Saxena <prerna.saxena@nutanix.com>
>> >> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>> >> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>> >
>> >Sporadic hangs are seen with test-vhost-user after this patch:
>> >
>> >https://travis-ci.org/qemu/qemu/builds
>> >
>> >Reverting seems to fix it for me.
>> >
>> >Is this a known problem?
>> >
>> >Fam
>> 
>> Hi Fam,
>> Thanks for reporting the sporadic hangs. I had seen ‘make check’ pass on my Centos 6 environment, so missed this.
>> I am setting up the docker test env to repro this, but I think I can guess the problem :
>> 
>> In tests/vhost-user-test.c: 
>> 
>> static void chr_read(void *opaque, const uint8_t *buf, int size)
>> {
>> ..[snip]..
>> 
>>     case VHOST_USER_SET_MEM_TABLE:
>>        /* received the mem table */
>>        memcpy(&s->memory, &msg.payload.memory, sizeof(msg.payload.memory));
>>        s->fds_num = qemu_chr_fe_get_msgfds(chr, s->fds, G_N_ELEMENTS(s->fds));
>> 
>> 
>>        /* signal the test that it can continue */
>>        g_cond_signal(&s->data_cond);
>>        break;
>> ..[snip]..
>> }
>> 
>> 
>> The test seems to be marked complete as soon as mem_table is copied. 
>> However, this patch 3/3 changes the behaviour of the SET_MEM_TABLE vhost command implementation with qemu. SET_MEM_TABLE now sends out a new message GET_FEATURES, and the call is only completed once it receives features from the remote application. (or the test framework, as is the case here.)
>
>Hmm but why does it matter that data_cond is woken up?

Michael, sorry, I didn’t quite understand that. Could you please explain?

>
>
>> While the test itself can be modified (Do not signal completion until we’ve sent a follow-up response to GET_FEATURES), I am now wondering if this patch may break existing vhost applications too ? If so, reverting it possibly better.
>
>What bothers me is that the new feature might cause the same
>issue once we enable it in the test.

No, it won't. The new feature is a protocol extension and only takes effect if it has been negotiated. If not negotiated, that part of the code is never executed.

>
>How about a patch to tests/vhost-user-test.c adding the new
>protocol feature? I would be quite interested to see what
>is going on with it.

Yes that can be done. But you can see that the protocol extension patch will not change the behaviour of the _existing_ test.

>
>
>> What confuses me is why it doesn’t fail all the time, but only about 20% to 30% time as Fam reports. 
>
>And succeeds every time on my systems :(

+1 to that :( I have had no luck repro’ing it.

>
>> 
>> Thoughts : Michael, Fam, MarcAndre ?
>> 
>> Regards,
>>

Prerna


* Re: [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table.
  2016-08-14  9:42         ` Prerna Saxena
@ 2016-08-14 21:39           ` Michael S. Tsirkin
  2016-08-31 11:19           ` Maxime Coquelin
  1 sibling, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2016-08-14 21:39 UTC (permalink / raw)
  To: Prerna Saxena; +Cc: Fam Zheng, qemu-devel, Peter Maydell, marcandre.lureau

On Sun, Aug 14, 2016 at 09:42:11AM +0000, Prerna Saxena wrote:
> On 14/08/16 8:21 am, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> 
> >On Fri, Aug 12, 2016 at 07:16:34AM +0000, Prerna Saxena wrote:
> >> 
> >> On 12/08/16 12:08 pm, "Fam Zheng" <famz@redhat.com> wrote:
> >> 
> >> 
> >> 
> >> 
> >> 
> >> >On Wed, 08/10 18:30, Michael S. Tsirkin wrote:
> >> >> From: Prerna Saxena <prerna.saxena@nutanix.com>
> >> >> 
> >> >> The set_mem_table command currently does not seek a reply. Hence, there is
> >> >> no easy way for a remote application to notify to QEMU when it finished
> >> >> setting up memory, or if there were errors doing so.
> >> >> 
> >> >> As an example:
> >> >> (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net
> >> >> application). SET_MEM_TABLE does not require a reply according to the spec.
> >> >> (2) Qemu commits the memory to the guest.
> >> >> (3) Guest issues an I/O operation over a new memory region which was configured on (1).
> >> >> (4) The application has not yet remapped the memory, but it sees the I/O request.
> >> >> (5) The application cannot satisfy the request because it does not know about those GPAs.
> >> >> 
> >> >> While a guaranteed fix would require a protocol extension (committed separately),
> >> >> a best-effort workaround for existing applications is to send a GET_FEATURES
> >> >> message before completing the vhost_user_set_mem_table() call.
> >> >> Since GET_FEATURES requires a reply, an application that processes vhost-user
> >> >> messages synchronously would probably have completed the SET_MEM_TABLE before replying.
> >> >> 
> >> >> Signed-off-by: Prerna Saxena <prerna.saxena@nutanix.com>
> >> >> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> >> >> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >> >
> >> >Sporadic hangs are seen with test-vhost-user after this patch:
> >> >
> >> >https://travis-ci.org/qemu/qemu/builds
> >> >
> >> >Reverting seems to fix it for me.
> >> >
> >> >Is this a known problem?
> >> >
> >> >Fam
> >> 
> >> Hi Fam,
> >> Thanks for reporting the sporadic hangs. I had seen ‘make check’ pass on my Centos 6 environment, so missed this.
> >> I am setting up the docker test env to repro this, but I think I can guess the problem :
> >> 
> >> In tests/vhost-user-test.c: 
> >> 
> >> static void chr_read(void *opaque, const uint8_t *buf, int size)
> >> {
> >> ..[snip]..
> >> 
> >>     case VHOST_USER_SET_MEM_TABLE:
> >>        /* received the mem table */
> >>        memcpy(&s->memory, &msg.payload.memory, sizeof(msg.payload.memory));
> >>        s->fds_num = qemu_chr_fe_get_msgfds(chr, s->fds, G_N_ELEMENTS(s->fds));
> >> 
> >> 
> >>        /* signal the test that it can continue */
> >>        g_cond_signal(&s->data_cond);
> >>        break;
> >> ..[snip]..
> >> }
> >> 
> >> 
> >> The test seems to be marked complete as soon as mem_table is copied. 
> >> However, this patch 3/3 changes the behaviour of the SET_MEM_TABLE vhost command implementation with qemu. SET_MEM_TABLE now sends out a new message GET_FEATURES, and the call is only completed once it receives features from the remote application. (or the test framework, as is the case here.)
> >
> >Hmm but why does it matter that data_cond is woken up?
> 
> Michael, sorry, I didn’t quite understand that. Could you pls explain ?


We do g_cond_signal, but I don't see how it prevents the test
from responding to GET_FEATURES. Unless the test exited
signaling success - but then why do we care?


> >
> >
> >> While the test itself can be modified (Do not signal completion until we’ve sent a follow-up response to GET_FEATURES), I am now wondering if this patch may break existing vhost applications too ? If so, reverting it possibly better.
> >
> >What bothers me is that the new feature might cause the same
> >issue once we enable it in the test.
> 
> No it wont. The new feature is a protocol extension, and only works if it has been negotiated with. If not negotiated, that part of code is never executed.

Absolutely - this reduces the risk - but how do we know that whatever
problem causes the test failures is not there with the new feature?


> >
> >How about a patch to tests/vhost-user-test.c adding the new
> >protocol feature? I would be quite interested to see what
> >is going on with it.
> 
> Yes that can be done. But you can see that the protocol extension patch will not change the behaviour of the _existing_ test.

If that passes on Travis, then we'll be more confident -
after all, it is the Travis failures that caused us to
reconsider the work-around.

> >
> >
> >> What confuses me is why it doesn’t fail all the time, but only about 20% to 30% time as Fam reports. 
> >
> >And succeeds every time on my systems :(
> 
> +1 to that :( I have had no luck repro’ing it.
> 
> >
> >> 
> >> Thoughts : Michael, Fam, MarcAndre ?
> >> 
> >> Regards,
> >>
> 
> Prerna


* Re: [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table.
  2016-08-12 15:49           ` Michael S. Tsirkin
@ 2016-08-15 10:20             ` Peter Maydell
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Maydell @ 2016-08-15 10:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Marc-André Lureau, Prerna Saxena, Fam Zheng,
	QEMU Developers, marcandre lureau

On 12 August 2016 at 16:49, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Fri, Aug 12, 2016 at 01:01:16PM +0100, Peter Maydell wrote:
>> If somebody would like to send a revert-patch to the list I'll apply
>> it to master (please cc me as I suspect the mailing list server is
>> down at the moment...) I've been seeing these failures in the
>> build tests I run for merges.

> Will do right now.

Ping. This is blocking everything else for rc3 right now :-(
If I don't see a patch in the next few hours I'll just revert it
directly...

thanks
-- PMM


* Re: [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table.
  2016-08-14  9:42         ` Prerna Saxena
  2016-08-14 21:39           ` Michael S. Tsirkin
@ 2016-08-31 11:19           ` Maxime Coquelin
  2016-09-01 13:46             ` Michael S. Tsirkin
  1 sibling, 1 reply; 26+ messages in thread
From: Maxime Coquelin @ 2016-08-31 11:19 UTC (permalink / raw)
  To: Prerna Saxena, Michael S. Tsirkin, marcandre.lureau
  Cc: Peter Maydell, Fam Zheng, qemu-devel



On 08/14/2016 11:42 AM, Prerna Saxena wrote:
> On 14/08/16 8:21 am, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
>
>> On Fri, Aug 12, 2016 at 07:16:34AM +0000, Prerna Saxena wrote:
>>>
>>> On 12/08/16 12:08 pm, "Fam Zheng" <famz@redhat.com> wrote:
>>>
>>>
>>>
>>>
>>>
>>>> On Wed, 08/10 18:30, Michael S. Tsirkin wrote:
>>>>> From: Prerna Saxena <prerna.saxena@nutanix.com>
>>>>>
>>>>> The set_mem_table command currently does not seek a reply. Hence, there is
>>>>> no easy way for a remote application to notify to QEMU when it finished
>>>>> setting up memory, or if there were errors doing so.
>>>>>
>>>>> As an example:
>>>>> (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net
>>>>> application). SET_MEM_TABLE does not require a reply according to the spec.
>>>>> (2) Qemu commits the memory to the guest.
>>>>> (3) Guest issues an I/O operation over a new memory region which was configured on (1).
>>>>> (4) The application has not yet remapped the memory, but it sees the I/O request.
>>>>> (5) The application cannot satisfy the request because it does not know about those GPAs.
>>>>>
>>>>> While a guaranteed fix would require a protocol extension (committed separately),
>>>>> a best-effort workaround for existing applications is to send a GET_FEATURES
>>>>> message before completing the vhost_user_set_mem_table() call.
>>>>> Since GET_FEATURES requires a reply, an application that processes vhost-user
>>>>> messages synchronously would probably have completed the SET_MEM_TABLE before replying.
>>>>>
>>>>> Signed-off-by: Prerna Saxena <prerna.saxena@nutanix.com>
>>>>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>>>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>>>
>>>> Sporadic hangs are seen with test-vhost-user after this patch:
>>>>
>>>> https://travis-ci.org/qemu/qemu/builds
>>>>
>>>> Reverting seems to fix it for me.
>>>>
>>>> Is this a known problem?
>>>>
>>>> Fam
>>>
>>> Hi Fam,
>>> Thanks for reporting the sporadic hangs. I had seen ‘make check’ pass on my Centos 6 environment, so missed this.
>>> I am setting up the docker test env to repro this, but I think I can guess the problem :
>>>
>>> In tests/vhost-user-test.c:
>>>
>>> static void chr_read(void *opaque, const uint8_t *buf, int size)
>>> {
>>> ..[snip]..
>>>
>>>     case VHOST_USER_SET_MEM_TABLE:
>>>        /* received the mem table */
>>>        memcpy(&s->memory, &msg.payload.memory, sizeof(msg.payload.memory));
>>>        s->fds_num = qemu_chr_fe_get_msgfds(chr, s->fds, G_N_ELEMENTS(s->fds));
>>>
>>>
>>>        /* signal the test that it can continue */
>>>        g_cond_signal(&s->data_cond);
>>>        break;
>>> ..[snip]..
>>> }
>>>
>>>
>>> The test seems to be marked complete as soon as mem_table is copied.
>>> However, this patch 3/3 changes the behaviour of the SET_MEM_TABLE vhost command implementation with qemu. SET_MEM_TABLE now sends out a new message GET_FEATURES, and the call is only completed once it receives features from the remote application. (or the test framework, as is the case here.)
>>
>> Hmm but why does it matter that data_cond is woken up?
>
> Michael, sorry, I didn’t quite understand that. Could you pls explain ?
>
>>
>>
>>> While the test itself can be modified (Do not signal completion until we’ve sent a follow-up response to GET_FEATURES), I am now wondering if this patch may break existing vhost applications too ? If so, reverting it possibly better.
>>
>> What bothers me is that the new feature might cause the same
>> issue once we enable it in the test.
>
> No it wont. The new feature is a protocol extension, and only works if it has been negotiated with. If not negotiated, that part of code is never executed.
>
>>
>> How about a patch to tests/vhost-user-test.c adding the new
>> protocol feature? I would be quite interested to see what
>> is going on with it.
>
> Yes that can be done. But you can see that the protocol extension patch will not change the behaviour of the _existing_ test.
>
>>
>>
>>> What confuses me is why it doesn’t fail all the time, but only about 20% to 30% time as Fam reports.
>>
>> And succeeds every time on my systems :(
>
> +1 to that :( I have had no luck repro’ing it.
>
>>
>>>
>>> Thoughts : Michael, Fam, MarcAndre ?

I have managed to reproduce the hang by adding some debug prints into
vhost_user_get_features().

With these prints in place, the issue is reproducible quite easily.
Another way to reproduce it in one shot is to run the vhost-user-test
execution under strace with the follow-forks option (-f).

So, with debug prints at vhost_user_get_features() entry and exit,
we can see that we never return from this function when the hang happens.
An strace of the QEMU instance shows that its thread keeps retrying to
receive the GET_FEATURES reply:

write(1, "vhost_user_get_features IN: \n", 29) = 29
sendmsg(11, {msg_name=NULL, msg_namelen=0,
         msg_iov=[{iov_base="\1\0\0\0\1\0\0\0\0\0\0\0", iov_len=12}],
         msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 12
recvmsg(11, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN
nanosleep({0, 100000}, 0x7fff29f8dd70)  = 0
...
recvmsg(11, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN
nanosleep({0, 100000}, 0x7fff29f8dd70)  = 0

The reason is that vhost-user-test never replies to QEMU:
its thread handling the GET_FEATURES command is waiting for
the s->data_mutex lock.
That lock is held by the other vhost-user-test thread, which is
executing read_guest_mem().

That thread never releases the lock because it is blocked in the read
syscall while read_guest_mem() is doing the readl().

This is because, on the QEMU side, the thread polling the qtest socket
is waiting for the qemu_global_mutex (in os_host_main_loop_wait()), but
that mutex is held by the thread trying to get the GET_FEATURES reply
(the TCG one).

So here is the deadlock.

That said, I don't see a clean way to solve this.
Any thoughts?

Regards,
Maxime


* Re: [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table.
  2016-08-31 11:19           ` Maxime Coquelin
@ 2016-09-01 13:46             ` Michael S. Tsirkin
  2016-09-02  8:57               ` Maxime Coquelin
  0 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2016-09-01 13:46 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: Prerna Saxena, marcandre.lureau, Peter Maydell, Fam Zheng, qemu-devel

On Wed, Aug 31, 2016 at 01:19:47PM +0200, Maxime Coquelin wrote:
> 
> 
> On 08/14/2016 11:42 AM, Prerna Saxena wrote:
> > On 14/08/16 8:21 am, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > 
> > 
> > > On Fri, Aug 12, 2016 at 07:16:34AM +0000, Prerna Saxena wrote:
> > > > 
> > > > On 12/08/16 12:08 pm, "Fam Zheng" <famz@redhat.com> wrote:
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > > On Wed, 08/10 18:30, Michael S. Tsirkin wrote:
> > > > > > From: Prerna Saxena <prerna.saxena@nutanix.com>
> > > > > > 
> > > > > > The set_mem_table command currently does not seek a reply. Hence, there is
> > > > > > no easy way for a remote application to notify to QEMU when it finished
> > > > > > setting up memory, or if there were errors doing so.
> > > > > > 
> > > > > > As an example:
> > > > > > (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net
> > > > > > application). SET_MEM_TABLE does not require a reply according to the spec.
> > > > > > (2) Qemu commits the memory to the guest.
> > > > > > (3) Guest issues an I/O operation over a new memory region which was configured on (1).
> > > > > > (4) The application has not yet remapped the memory, but it sees the I/O request.
> > > > > > (5) The application cannot satisfy the request because it does not know about those GPAs.
> > > > > > 
> > > > > > While a guaranteed fix would require a protocol extension (committed separately),
> > > > > > a best-effort workaround for existing applications is to send a GET_FEATURES
> > > > > > message before completing the vhost_user_set_mem_table() call.
> > > > > > Since GET_FEATURES requires a reply, an application that processes vhost-user
> > > > > > messages synchronously would probably have completed the SET_MEM_TABLE before replying.
> > > > > > 
> > > > > > Signed-off-by: Prerna Saxena <prerna.saxena@nutanix.com>
> > > > > > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > 
> > > > > Sporadic hangs are seen with test-vhost-user after this patch:
> > > > > 
> > > > > https://travis-ci.org/qemu/qemu/builds
> > > > > 
> > > > > Reverting seems to fix it for me.
> > > > > 
> > > > > Is this a known problem?
> > > > > 
> > > > > Fam
> > > > 
> > > > Hi Fam,
> > > > Thanks for reporting the sporadic hangs. I had seen ‘make check’ pass on my Centos 6 environment, so missed this.
> > > > I am setting up the docker test env to repro this, but I think I can guess the problem :
> > > > 
> > > > In tests/vhost-user-test.c:
> > > > 
> > > > static void chr_read(void *opaque, const uint8_t *buf, int size)
> > > > {
> > > > ..[snip]..
> > > > 
> > > >     case VHOST_USER_SET_MEM_TABLE:
> > > >        /* received the mem table */
> > > >        memcpy(&s->memory, &msg.payload.memory, sizeof(msg.payload.memory));
> > > >        s->fds_num = qemu_chr_fe_get_msgfds(chr, s->fds, G_N_ELEMENTS(s->fds));
> > > > 
> > > > 
> > > >        /* signal the test that it can continue */
> > > >        g_cond_signal(&s->data_cond);
> > > >        break;
> > > > ..[snip]..
> > > > }
> > > > 
> > > > 
> > > > The test seems to be marked complete as soon as mem_table is copied.
> > > > However, this patch 3/3 changes the behaviour of the SET_MEM_TABLE vhost command implementation with qemu. SET_MEM_TABLE now sends out a new message GET_FEATURES, and the call is only completed once it receives features from the remote application. (or the test framework, as is the case here.)
> > > 
> > > Hmm but why does it matter that data_cond is woken up?
> > 
> > Michael, sorry, I didn’t quite understand that. Could you pls explain ?
> > 
> > > 
> > > 
> > > > While the test itself can be modified (Do not signal completion until we’ve sent a follow-up response to GET_FEATURES), I am now wondering if this patch may break existing vhost applications too ? If so, reverting it possibly better.
> > > 
> > > What bothers me is that the new feature might cause the same
> > > issue once we enable it in the test.
> > 
> > No it wont. The new feature is a protocol extension, and only works if it has been negotiated with. If not negotiated, that part of code is never executed.
> > 
> > > 
> > > How about a patch to tests/vhost-user-test.c adding the new
> > > protocol feature? I would be quite interested to see what
> > > is going on with it.
> > 
> > Yes that can be done. But you can see that the protocol extension patch will not change the behaviour of the _existing_ test.
> > 
> > > 
> > > 
> > > > What confuses me is why it doesn’t fail all the time, but only about 20% to 30% time as Fam reports.
> > > 
> > > And succeeds every time on my systems :(
> > 
> > +1 to that :( I have had no luck repro’ing it.
> > 
> > > 
> > > > 
> > > > Thoughts : Michael, Fam, MarcAndre ?
> 
> I have managed to reproduce the hang by adding some debug prints into
> vhost_user_get_features().
> 
> Doing this the issue is reproducible quite easily.
> Another way to reproduce it in one shot is to strace (with following
> forks option) vhost-user-test execution.
> 
> So, by adding debug prints at vhost_user_get_features() entry and exit,
> we can see we never return from this function when hang happens.
> Strace of Qemu instance shows that its thread keeps retrying to receive
> GET_FEATURE reply:
> 
> write(1, "vhost_user_get_features IN: \n", 29) = 29
> sendmsg(11, {msg_name=NULL, msg_namelen=0,
>         msg_iov=[{iov_base="\1\0\0\0\1\0\0\0\0\0\0\0", iov_len=12}],
>         msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 12
> recvmsg(11, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN
> nanosleep({0, 100000}, 0x7fff29f8dd70)  = 0
> ...
> recvmsg(11, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN
> nanosleep({0, 100000}, 0x7fff29f8dd70)  = 0
> 
> The reason is that vhost-user-test never replies to Qemu,
> because its thread handling the GET_FEATURES command is waiting for
> the s->data_mutex lock.
> This lock is held by the other vhost-user-test thread, executing
> read_guest_mem().
> 
> The lock is never released because the thread is blocked in read
> syscall, when read_guest_mem() is doing the readl().
> 
> This is because on Qemu side, the thread polling the qtest socket is
> waiting for the qemu_global_mutex (in os_host_main_loop_wait()), but
> the mutex is held by the thread trying to get the GET_FEATURE reply
> (the TCG one).
> 
> So here is the deadlock.
> 
> That said, I don't see a clean way to solve this.
> Any thoughts?
> 
> Regards,
> Maxime

My thought is that we really need to do what I said:
avoid doing GET_FEATURES (and setting reply_ack)
on the first set_mem, and I quote:

	OK this all looks very reasonable (and I do like patch 1 too)
	but there's one source of waste here: we do not need to
	synchronize when we set up device the first time
	when hdev->memory_changed is false.

	I think we should test that and skip synch in both patches
	unless  hdev->memory_changed is set.

with that change the test will start passing.


-- 
MST

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table.
  2016-09-01 13:46             ` Michael S. Tsirkin
@ 2016-09-02  8:57               ` Maxime Coquelin
  2016-09-02 17:29                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 26+ messages in thread
From: Maxime Coquelin @ 2016-09-02  8:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Prerna Saxena, marcandre.lureau, Peter Maydell, Fam Zheng, qemu-devel



On 09/01/2016 03:46 PM, Michael S. Tsirkin wrote:
> On Wed, Aug 31, 2016 at 01:19:47PM +0200, Maxime Coquelin wrote:
>>
>>
>> On 08/14/2016 11:42 AM, Prerna Saxena wrote:
>>> On 14/08/16 8:21 am, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>>
>>>
>>>> On Fri, Aug 12, 2016 at 07:16:34AM +0000, Prerna Saxena wrote:
>>>>>
>>>>> On 12/08/16 12:08 pm, "Fam Zheng" <famz@redhat.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> On Wed, 08/10 18:30, Michael S. Tsirkin wrote:
>>>>>>> From: Prerna Saxena <prerna.saxena@nutanix.com>
>>>>>>>
>>>>>>> The set_mem_table command currently does not seek a reply. Hence, there is
>>>>>>> no easy way for a remote application to notify to QEMU when it finished
>>>>>>> setting up memory, or if there were errors doing so.
>>>>>>>
>>>>>>> As an example:
>>>>>>> (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net
>>>>>>> application). SET_MEM_TABLE does not require a reply according to the spec.
>>>>>>> (2) Qemu commits the memory to the guest.
>>>>>>> (3) Guest issues an I/O operation over a new memory region which was configured on (1).
>>>>>>> (4) The application has not yet remapped the memory, but it sees the I/O request.
>>>>>>> (5) The application cannot satisfy the request because it does not know about those GPAs.
>>>>>>>
>>>>>>> While a guaranteed fix would require a protocol extension (committed separately),
>>>>>>> a best-effort workaround for existing applications is to send a GET_FEATURES
>>>>>>> message before completing the vhost_user_set_mem_table() call.
>>>>>>> Since GET_FEATURES requires a reply, an application that processes vhost-user
>>>>>>> messages synchronously would probably have completed the SET_MEM_TABLE before replying.
>>>>>>>
>>>>>>> Signed-off-by: Prerna Saxena <prerna.saxena@nutanix.com>
>>>>>>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>>>>>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>>>>>
>>>>>> Sporadic hangs are seen with test-vhost-user after this patch:
>>>>>>
>>>>>> https://travis-ci.org/qemu/qemu/builds
>>>>>>
>>>>>> Reverting seems to fix it for me.
>>>>>>
>>>>>> Is this a known problem?
>>>>>>
>>>>>> Fam
>>>>>
>>>>> Hi Fam,
>>>>> Thanks for reporting the sporadic hangs. I had seen ‘make check’ pass on my Centos 6 environment, so missed this.
>>>>> I am setting up the docker test env to repro this, but I think I can guess the problem :
>>>>>
>>>>> In tests/vhost-user-test.c:
>>>>>
>>>>> static void chr_read(void *opaque, const uint8_t *buf, int size)
>>>>> {
>>>>> ..[snip]..
>>>>>
>>>>>     case VHOST_USER_SET_MEM_TABLE:
>>>>>        /* received the mem table */
>>>>>        memcpy(&s->memory, &msg.payload.memory, sizeof(msg.payload.memory));
>>>>>        s->fds_num = qemu_chr_fe_get_msgfds(chr, s->fds, G_N_ELEMENTS(s->fds));
>>>>>
>>>>>
>>>>>        /* signal the test that it can continue */
>>>>>        g_cond_signal(&s->data_cond);
>>>>>        break;
>>>>> ..[snip]..
>>>>> }
>>>>>
>>>>>
>>>>> The test seems to be marked complete as soon as mem_table is copied.
>>>>> However, this patch 3/3 changes the behaviour of the SET_MEM_TABLE vhost command implementation with qemu. SET_MEM_TABLE now sends out a new message GET_FEATURES, and the call is only completed once it receives features from the remote application. (or the test framework, as is the case here.)
>>>>
>>>> Hmm but why does it matter that data_cond is woken up?
>>>
>>> Michael, sorry, I didn’t quite understand that. Could you pls explain ?
>>>
>>>>
>>>>
>>>>> While the test itself can be modified (Do not signal completion until we’ve sent a follow-up response to GET_FEATURES), I am now wondering if this patch may break existing vhost applications too ? If so, reverting it possibly better.
>>>>
>>>> What bothers me is that the new feature might cause the same
>>>> issue once we enable it in the test.
>>>
>>> No it wont. The new feature is a protocol extension, and only works if it has been negotiated with. If not negotiated, that part of code is never executed.
>>>
>>>>
>>>> How about a patch to tests/vhost-user-test.c adding the new
>>>> protocol feature? I would be quite interested to see what
>>>> is going on with it.
>>>
>>> Yes that can be done. But you can see that the protocol extension patch will not change the behaviour of the _existing_ test.
>>>
>>>>
>>>>
>>>>> What confuses me is why it doesn’t fail all the time, but only about 20% to 30% time as Fam reports.
>>>>
>>>> And succeeds every time on my systems :(
>>>
>>> +1 to that :( I have had no luck repro’ing it.
>>>
>>>>
>>>>>
>>>>> Thoughts : Michael, Fam, MarcAndre ?
>>
>> I have managed to reproduce the hang by adding some debug prints into
>> vhost_user_get_features().
>>
>> Doing this the issue is reproducible quite easily.
>> Another way to reproduce it in one shot is to strace (with following
>> forks option) vhost-user-test execution.
>>
>> So, by adding debug prints at vhost_user_get_features() entry and exit,
>> we can see we never return from this function when hang happens.
>> Strace of Qemu instance shows that its thread keeps retrying to receive
>> GET_FEATURE reply:
>>
>> write(1, "vhost_user_get_features IN: \n", 29) = 29
>> sendmsg(11, {msg_name=NULL, msg_namelen=0,
>>         msg_iov=[{iov_base="\1\0\0\0\1\0\0\0\0\0\0\0", iov_len=12}],
>>         msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 12
>> recvmsg(11, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN
>> nanosleep({0, 100000}, 0x7fff29f8dd70)  = 0
>> ...
>> recvmsg(11, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN
>> nanosleep({0, 100000}, 0x7fff29f8dd70)  = 0
>>
>> The reason is that vhost-user-test never replies to Qemu,
>> because its thread handling the GET_FEATURES command is waiting for
>> the s->data_mutex lock.
>> This lock is held by the other vhost-user-test thread, executing
>> read_guest_mem().
>>
>> The lock is never released because the thread is blocked in read
>> syscall, when read_guest_mem() is doing the readl().
>>
>> This is because on Qemu side, the thread polling the qtest socket is
>> waiting for the qemu_global_mutex (in os_host_main_loop_wait()), but
>> the mutex is held by the thread trying to get the GET_FEATURE reply
>> (the TCG one).
>>
>> So here is the deadlock.
>>
>> That said, I don't see a clean way to solve this.
>> Any thoughts?
>>
>> Regards,
>> Maxime
>
> My thought is that we really need to do what I said:
> avoid doing GET_FEATURES (and setting reply_ack)
> on the first set_mem, and I quote:
>
> 	OK this all looks very reasonable (and I do like patch 1 too)
> 	but there's one source of waste here: we do not need to
> 	synchronize when we set up device the first time
> 	when hdev->memory_changed is false.
>
> 	I think we should test that and skip synch in both patches
> 	unless  hdev->memory_changed is set.
>
> with that change the test will start passing.

Actually, it looks like memory_changed is true even at the first
SET_MEM_TABLE request.

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table.
  2016-09-02  8:57               ` Maxime Coquelin
@ 2016-09-02 17:29                 ` Michael S. Tsirkin
  2016-09-05 13:06                   ` Maxime Coquelin
  0 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2016-09-02 17:29 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: Prerna Saxena, marcandre.lureau, Peter Maydell, Fam Zheng, qemu-devel

On Fri, Sep 02, 2016 at 10:57:17AM +0200, Maxime Coquelin wrote:
> 
> 
> On 09/01/2016 03:46 PM, Michael S. Tsirkin wrote:
> > On Wed, Aug 31, 2016 at 01:19:47PM +0200, Maxime Coquelin wrote:
> > > 
> > > 
> > > On 08/14/2016 11:42 AM, Prerna Saxena wrote:
> > > > On 14/08/16 8:21 am, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > 
> > > > 
> > > > > On Fri, Aug 12, 2016 at 07:16:34AM +0000, Prerna Saxena wrote:
> > > > > > 
> > > > > > On 12/08/16 12:08 pm, "Fam Zheng" <famz@redhat.com> wrote:
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > On Wed, 08/10 18:30, Michael S. Tsirkin wrote:
> > > > > > > > From: Prerna Saxena <prerna.saxena@nutanix.com>
> > > > > > > > 
> > > > > > > > The set_mem_table command currently does not seek a reply. Hence, there is
> > > > > > > > no easy way for a remote application to notify to QEMU when it finished
> > > > > > > > setting up memory, or if there were errors doing so.
> > > > > > > > 
> > > > > > > > As an example:
> > > > > > > > (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net
> > > > > > > > application). SET_MEM_TABLE does not require a reply according to the spec.
> > > > > > > > (2) Qemu commits the memory to the guest.
> > > > > > > > (3) Guest issues an I/O operation over a new memory region which was configured on (1).
> > > > > > > > (4) The application has not yet remapped the memory, but it sees the I/O request.
> > > > > > > > (5) The application cannot satisfy the request because it does not know about those GPAs.
> > > > > > > > 
> > > > > > > > While a guaranteed fix would require a protocol extension (committed separately),
> > > > > > > > a best-effort workaround for existing applications is to send a GET_FEATURES
> > > > > > > > message before completing the vhost_user_set_mem_table() call.
> > > > > > > > Since GET_FEATURES requires a reply, an application that processes vhost-user
> > > > > > > > messages synchronously would probably have completed the SET_MEM_TABLE before replying.
> > > > > > > > 
> > > > > > > > Signed-off-by: Prerna Saxena <prerna.saxena@nutanix.com>
> > > > > > > > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > 
> > > > > > > Sporadic hangs are seen with test-vhost-user after this patch:
> > > > > > > 
> > > > > > > https://travis-ci.org/qemu/qemu/builds
> > > > > > > 
> > > > > > > Reverting seems to fix it for me.
> > > > > > > 
> > > > > > > Is this a known problem?
> > > > > > > 
> > > > > > > Fam
> > > > > > 
> > > > > > Hi Fam,
> > > > > > Thanks for reporting the sporadic hangs. I had seen ‘make check’ pass on my Centos 6 environment, so missed this.
> > > > > > I am setting up the docker test env to repro this, but I think I can guess the problem :
> > > > > > 
> > > > > > In tests/vhost-user-test.c:
> > > > > > 
> > > > > > static void chr_read(void *opaque, const uint8_t *buf, int size)
> > > > > > {
> > > > > > ..[snip]..
> > > > > > 
> > > > > >     case VHOST_USER_SET_MEM_TABLE:
> > > > > >        /* received the mem table */
> > > > > >        memcpy(&s->memory, &msg.payload.memory, sizeof(msg.payload.memory));
> > > > > >        s->fds_num = qemu_chr_fe_get_msgfds(chr, s->fds, G_N_ELEMENTS(s->fds));
> > > > > > 
> > > > > > 
> > > > > >        /* signal the test that it can continue */
> > > > > >        g_cond_signal(&s->data_cond);
> > > > > >        break;
> > > > > > ..[snip]..
> > > > > > }
> > > > > > 
> > > > > > 
> > > > > > The test seems to be marked complete as soon as mem_table is copied.
> > > > > > However, this patch 3/3 changes the behaviour of the SET_MEM_TABLE vhost command implementation with qemu. SET_MEM_TABLE now sends out a new message GET_FEATURES, and the call is only completed once it receives features from the remote application. (or the test framework, as is the case here.)
> > > > > 
> > > > > Hmm but why does it matter that data_cond is woken up?
> > > > 
> > > > Michael, sorry, I didn’t quite understand that. Could you pls explain ?
> > > > 
> > > > > 
> > > > > 
> > > > > > While the test itself can be modified (Do not signal completion until we’ve sent a follow-up response to GET_FEATURES), I am now wondering if this patch may break existing vhost applications too ? If so, reverting it possibly better.
> > > > > 
> > > > > What bothers me is that the new feature might cause the same
> > > > > issue once we enable it in the test.
> > > > 
> > > > No it wont. The new feature is a protocol extension, and only works if it has been negotiated with. If not negotiated, that part of code is never executed.
> > > > 
> > > > > 
> > > > > How about a patch to tests/vhost-user-test.c adding the new
> > > > > protocol feature? I would be quite interested to see what
> > > > > is going on with it.
> > > > 
> > > > Yes that can be done. But you can see that the protocol extension patch will not change the behaviour of the _existing_ test.
> > > > 
> > > > > 
> > > > > 
> > > > > > What confuses me is why it doesn’t fail all the time, but only about 20% to 30% time as Fam reports.
> > > > > 
> > > > > And succeeds every time on my systems :(
> > > > 
> > > > +1 to that :( I have had no luck repro’ing it.
> > > > 
> > > > > 
> > > > > > 
> > > > > > Thoughts : Michael, Fam, MarcAndre ?
> > > 
> > > I have managed to reproduce the hang by adding some debug prints into
> > > vhost_user_get_features().
> > > 
> > > Doing this the issue is reproducible quite easily.
> > > Another way to reproduce it in one shot is to strace (with following
> > > forks option) vhost-user-test execution.
> > > 
> > > So, by adding debug prints at vhost_user_get_features() entry and exit,
> > > we can see we never return from this function when hang happens.
> > > Strace of Qemu instance shows that its thread keeps retrying to receive
> > > GET_FEATURE reply:
> > > 
> > > write(1, "vhost_user_get_features IN: \n", 29) = 29
> > > sendmsg(11, {msg_name=NULL, msg_namelen=0,
> > >         msg_iov=[{iov_base="\1\0\0\0\1\0\0\0\0\0\0\0", iov_len=12}],
> > >         msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 12
> > > recvmsg(11, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN
> > > nanosleep({0, 100000}, 0x7fff29f8dd70)  = 0
> > > ...
> > > recvmsg(11, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN
> > > nanosleep({0, 100000}, 0x7fff29f8dd70)  = 0
> > > 
> > > The reason is that vhost-user-test never replies to Qemu,
> > > because its thread handling the GET_FEATURES command is waiting for
> > > the s->data_mutex lock.
> > > This lock is held by the other vhost-user-test thread, executing
> > > read_guest_mem().
> > > 
> > > The lock is never released because the thread is blocked in read
> > > syscall, when read_guest_mem() is doing the readl().
> > > 
> > > This is because on Qemu side, the thread polling the qtest socket is
> > > waiting for the qemu_global_mutex (in os_host_main_loop_wait()), but
> > > the mutex is held by the thread trying to get the GET_FEATURE reply
> > > (the TCG one).
> > > 
> > > So here is the deadlock.
> > > 
> > > That said, I don't see a clean way to solve this.
> > > Any thoughts?
> > > 
> > > Regards,
> > > Maxime
> > 
> > My thought is that we really need to do what I said:
> > avoid doing GET_FEATURES (and setting reply_ack)
> > on the first set_mem, and I quote:
> > 
> > 	OK this all looks very reasonable (and I do like patch 1 too)
> > 	but there's one source of waste here: we do not need to
> > 	synchronize when we set up device the first time
> > 	when hdev->memory_changed is false.
> > 
> > 	I think we should test that and skip synch in both patches
> > 	unless  hdev->memory_changed is set.
> > 
> > with that change the test will start passing.
> 
> Actually, it looks like memory_changed is true even at the first
> SET_MEM_TABLE request.
> 
> Thanks,
> Maxime

Let's add another flag then? What we care about is that it's not
the first time we specify translations for a given address.

-- 
MST

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table.
  2016-09-02 17:29                 ` Michael S. Tsirkin
@ 2016-09-05 13:06                   ` Maxime Coquelin
  2016-09-06  2:22                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 26+ messages in thread
From: Maxime Coquelin @ 2016-09-05 13:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Prerna Saxena, marcandre.lureau, Peter Maydell, Fam Zheng, qemu-devel



On 09/02/2016 07:29 PM, Michael S. Tsirkin wrote:
> On Fri, Sep 02, 2016 at 10:57:17AM +0200, Maxime Coquelin wrote:
>>
>>
>> On 09/01/2016 03:46 PM, Michael S. Tsirkin wrote:
>>> On Wed, Aug 31, 2016 at 01:19:47PM +0200, Maxime Coquelin wrote:
>>>>
>>>>
>>>> On 08/14/2016 11:42 AM, Prerna Saxena wrote:
>>>>> On 14/08/16 8:21 am, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>>>>
>>>>>
>>>>>> On Fri, Aug 12, 2016 at 07:16:34AM +0000, Prerna Saxena wrote:
>>>>>>>
>>>>>>> On 12/08/16 12:08 pm, "Fam Zheng" <famz@redhat.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On Wed, 08/10 18:30, Michael S. Tsirkin wrote:
>>>>>>>>> From: Prerna Saxena <prerna.saxena@nutanix.com>
>>>>>>>>>
>>>>>>>>> The set_mem_table command currently does not seek a reply. Hence, there is
>>>>>>>>> no easy way for a remote application to notify to QEMU when it finished
>>>>>>>>> setting up memory, or if there were errors doing so.
>>>>>>>>>
>>>>>>>>> As an example:
>>>>>>>>> (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net
>>>>>>>>> application). SET_MEM_TABLE does not require a reply according to the spec.
>>>>>>>>> (2) Qemu commits the memory to the guest.
>>>>>>>>> (3) Guest issues an I/O operation over a new memory region which was configured on (1).
>>>>>>>>> (4) The application has not yet remapped the memory, but it sees the I/O request.
>>>>>>>>> (5) The application cannot satisfy the request because it does not know about those GPAs.
>>>>>>>>>
>>>>>>>>> While a guaranteed fix would require a protocol extension (committed separately),
>>>>>>>>> a best-effort workaround for existing applications is to send a GET_FEATURES
>>>>>>>>> message before completing the vhost_user_set_mem_table() call.
>>>>>>>>> Since GET_FEATURES requires a reply, an application that processes vhost-user
>>>>>>>>> messages synchronously would probably have completed the SET_MEM_TABLE before replying.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Prerna Saxena <prerna.saxena@nutanix.com>
>>>>>>>>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>>>>>>>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>>>>>>>
>>>>>>>> Sporadic hangs are seen with test-vhost-user after this patch:
>>>>>>>>
>>>>>>>> https://travis-ci.org/qemu/qemu/builds
>>>>>>>>
>>>>>>>> Reverting seems to fix it for me.
>>>>>>>>
>>>>>>>> Is this a known problem?
>>>>>>>>
>>>>>>>> Fam
>>>>>>>
>>>>>>> Hi Fam,
>>>>>>> Thanks for reporting the sporadic hangs. I had seen ‘make check’ pass on my Centos 6 environment, so missed this.
>>>>>>> I am setting up the docker test env to repro this, but I think I can guess the problem :
>>>>>>>
>>>>>>> In tests/vhost-user-test.c:
>>>>>>>
>>>>>>> static void chr_read(void *opaque, const uint8_t *buf, int size)
>>>>>>> {
>>>>>>> ..[snip]..
>>>>>>>
>>>>>>>     case VHOST_USER_SET_MEM_TABLE:
>>>>>>>        /* received the mem table */
>>>>>>>        memcpy(&s->memory, &msg.payload.memory, sizeof(msg.payload.memory));
>>>>>>>        s->fds_num = qemu_chr_fe_get_msgfds(chr, s->fds, G_N_ELEMENTS(s->fds));
>>>>>>>
>>>>>>>
>>>>>>>        /* signal the test that it can continue */
>>>>>>>        g_cond_signal(&s->data_cond);
>>>>>>>        break;
>>>>>>> ..[snip]..
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> The test seems to be marked complete as soon as mem_table is copied.
>>>>>>> However, this patch 3/3 changes the behaviour of the SET_MEM_TABLE vhost command implementation with qemu. SET_MEM_TABLE now sends out a new message GET_FEATURES, and the call is only completed once it receives features from the remote application. (or the test framework, as is the case here.)
>>>>>>
>>>>>> Hmm but why does it matter that data_cond is woken up?
>>>>>
>>>>> Michael, sorry, I didn’t quite understand that. Could you pls explain ?
>>>>>
>>>>>>
>>>>>>
>>>>>>> While the test itself can be modified (Do not signal completion until we’ve sent a follow-up response to GET_FEATURES), I am now wondering if this patch may break existing vhost applications too ? If so, reverting it possibly better.
>>>>>>
>>>>>> What bothers me is that the new feature might cause the same
>>>>>> issue once we enable it in the test.
>>>>>
>>>>> No it wont. The new feature is a protocol extension, and only works if it has been negotiated with. If not negotiated, that part of code is never executed.
>>>>>
>>>>>>
>>>>>> How about a patch to tests/vhost-user-test.c adding the new
>>>>>> protocol feature? I would be quite interested to see what
>>>>>> is going on with it.
>>>>>
>>>>> Yes that can be done. But you can see that the protocol extension patch will not change the behaviour of the _existing_ test.
>>>>>
>>>>>>
>>>>>>
>>>>>>> What confuses me is why it doesn’t fail all the time, but only about 20% to 30% time as Fam reports.
>>>>>>
>>>>>> And succeeds every time on my systems :(
>>>>>
>>>>> +1 to that :( I have had no luck repro’ing it.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thoughts : Michael, Fam, MarcAndre ?
>>>>
>>>> I have managed to reproduce the hang by adding some debug prints into
>>>> vhost_user_get_features().
>>>>
>>>> Doing this the issue is reproducible quite easily.
>>>> Another way to reproduce it in one shot is to strace (with following
>>>> forks option) vhost-user-test execution.
>>>>
>>>> So, by adding debug prints at vhost_user_get_features() entry and exit,
>>>> we can see we never return from this function when hang happens.
>>>> Strace of Qemu instance shows that its thread keeps retrying to receive
>>>> GET_FEATURE reply:
>>>>
>>>> write(1, "vhost_user_get_features IN: \n", 29) = 29
>>>> sendmsg(11, {msg_name=NULL, msg_namelen=0,
>>>>         msg_iov=[{iov_base="\1\0\0\0\1\0\0\0\0\0\0\0", iov_len=12}],
>>>>         msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 12
>>>> recvmsg(11, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN
>>>> nanosleep({0, 100000}, 0x7fff29f8dd70)  = 0
>>>> ...
>>>> recvmsg(11, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN
>>>> nanosleep({0, 100000}, 0x7fff29f8dd70)  = 0
>>>>
>>>> The reason is that vhost-user-test never replies to Qemu,
>>>> because its thread handling the GET_FEATURES command is waiting for
>>>> the s->data_mutex lock.
>>>> This lock is held by the other vhost-user-test thread, executing
>>>> read_guest_mem().
>>>>
>>>> The lock is never released because the thread is blocked in read
>>>> syscall, when read_guest_mem() is doing the readl().
>>>>
>>>> This is because on Qemu side, the thread polling the qtest socket is
>>>> waiting for the qemu_global_mutex (in os_host_main_loop_wait()), but
>>>> the mutex is held by the thread trying to get the GET_FEATURE reply
>>>> (the TCG one).
>>>>
>>>> So here is the deadlock.
>>>>
>>>> That said, I don't see a clean way to solve this.
>>>> Any thoughts?
>>>>
>>>> Regards,
>>>> Maxime
>>>
>>> My thought is that we really need to do what I said:
>>> avoid doing GET_FEATURES (and setting reply_ack)
>>> on the first set_mem, and I quote:
>>>
>>> 	OK this all looks very reasonable (and I do like patch 1 too)
>>> 	but there's one source of waste here: we do not need to
>>> 	synchronize when we set up device the first time
>>> 	when hdev->memory_changed is false.
>>>
>>> 	I think we should test that and skip synch in both patches
>>> 	unless  hdev->memory_changed is set.
>>>
>>> with that change the test will start passing.
>>
>> Actually, it looks like memory_changed is true even at the first
>> SET_MEM_TABLE request.
>>
>> Thanks,
>> Maxime
>
> Let's add another flag then? What we care about is that it's not
> the first time we specify translations for a given address.

I added a dedicated flag that skips the sync in two cases:
  1. First set_mem_table call
  2. Only new regions are added

It solves the hang seen with the vhost-user-test app, and I think the
patch makes sense.

But IMHO the problem is deeper than that, and we could under some
conditions still hang when running in TCG mode.
Imagine Qemu sends some other request such as "GET_FEATURE" after the
set_mem_table, and vhost-user-test's read_guest_mem() is executed just
before this second call (let's say it was not scheduled for some time).

In this case, the read_guest_mem() thread owns the data_mutex and starts
doing readl() calls. On the Qemu side, as we are sending an update of the
mem table, we own the qemu_global_mutex, and the deadlock happens again:
  - Vhost-user-test
    * read_guest_mem() thread: blocked in readl(), waiting for Qemu to
      handle it (TCG mode only), while owning the data_mutex lock.
    * Command handler thread: received the GET_FEATURE event, but waits
      for data_mutex ownership to handle it.

  - Qemu
    * FDs polling thread: waits for qemu_global_mutex ownership, to be
      able to handle the readl() request from vhost-user-test.
    * TCG thread: owns the qemu_global_mutex, and polls to receive the
      GET_FEATURE reply.

Maybe the GET_FEATURE case is not realistic, but what about
GET_VRING_BASE, which gets called by vhost_net_stop()?

Thanks in advance,
Maxime

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table.
  2016-09-05 13:06                   ` Maxime Coquelin
@ 2016-09-06  2:22                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2016-09-06  2:22 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: Prerna Saxena, marcandre.lureau, Peter Maydell, Fam Zheng, qemu-devel

On Mon, Sep 05, 2016 at 03:06:09PM +0200, Maxime Coquelin wrote:
> 
> 
> On 09/02/2016 07:29 PM, Michael S. Tsirkin wrote:
> > On Fri, Sep 02, 2016 at 10:57:17AM +0200, Maxime Coquelin wrote:
> > > 
> > > 
> > > On 09/01/2016 03:46 PM, Michael S. Tsirkin wrote:
> > > > On Wed, Aug 31, 2016 at 01:19:47PM +0200, Maxime Coquelin wrote:
> > > > > 
> > > > > 
> > > > > On 08/14/2016 11:42 AM, Prerna Saxena wrote:
> > > > > > On 14/08/16 8:21 am, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > 
> > > > > > 
> > > > > > > On Fri, Aug 12, 2016 at 07:16:34AM +0000, Prerna Saxena wrote:
> > > > > > > > 
> > > > > > > > On 12/08/16 12:08 pm, "Fam Zheng" <famz@redhat.com> wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > On Wed, 08/10 18:30, Michael S. Tsirkin wrote:
> > > > > > > > > > From: Prerna Saxena <prerna.saxena@nutanix.com>
> > > > > > > > > > 
> > > > > > > > > > The set_mem_table command currently does not seek a reply. Hence, there is
> > > > > > > > > > no easy way for a remote application to notify to QEMU when it finished
> > > > > > > > > > setting up memory, or if there were errors doing so.
> > > > > > > > > > 
> > > > > > > > > > As an example:
> > > > > > > > > > (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net
> > > > > > > > > > application). SET_MEM_TABLE does not require a reply according to the spec.
> > > > > > > > > > (2) Qemu commits the memory to the guest.
> > > > > > > > > > (3) Guest issues an I/O operation over a new memory region which was configured on (1).
> > > > > > > > > > (4) The application has not yet remapped the memory, but it sees the I/O request.
> > > > > > > > > > (5) The application cannot satisfy the request because it does not know about those GPAs.
> > > > > > > > > > 
> > > > > > > > > > While a guaranteed fix would require a protocol extension (committed separately),
> > > > > > > > > > a best-effort workaround for existing applications is to send a GET_FEATURES
> > > > > > > > > > message before completing the vhost_user_set_mem_table() call.
> > > > > > > > > > Since GET_FEATURES requires a reply, an application that processes vhost-user
> > > > > > > > > > messages synchronously would probably have completed the SET_MEM_TABLE before replying.
> > > > > > > > > > 
> > > > > > > > > > Signed-off-by: Prerna Saxena <prerna.saxena@nutanix.com>
> > > > > > > > > > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > > 
> > > > > > > > > Sporadic hangs are seen with test-vhost-user after this patch:
> > > > > > > > > 
> > > > > > > > > https://travis-ci.org/qemu/qemu/builds
> > > > > > > > > 
> > > > > > > > > Reverting seems to fix it for me.
> > > > > > > > > 
> > > > > > > > > Is this a known problem?
> > > > > > > > > 
> > > > > > > > > Fam
> > > > > > > > 
> > > > > > > > Hi Fam,
> > > > > > > > Thanks for reporting the sporadic hangs. I had seen ‘make check’ pass on my CentOS 6 environment, so I missed this.
> > > > > > > > I am setting up the docker test env to repro this, but I think I can guess the problem:
> > > > > > > > 
> > > > > > > > In tests/vhost-user-test.c:
> > > > > > > > 
> > > > > > > > static void chr_read(void *opaque, const uint8_t *buf, int size)
> > > > > > > > {
> > > > > > > > ..[snip]..
> > > > > > > > 
> > > > > > > >     case VHOST_USER_SET_MEM_TABLE:
> > > > > > > >        /* received the mem table */
> > > > > > > >        memcpy(&s->memory, &msg.payload.memory, sizeof(msg.payload.memory));
> > > > > > > >        s->fds_num = qemu_chr_fe_get_msgfds(chr, s->fds, G_N_ELEMENTS(s->fds));
> > > > > > > > 
> > > > > > > > 
> > > > > > > >        /* signal the test that it can continue */
> > > > > > > >        g_cond_signal(&s->data_cond);
> > > > > > > >        break;
> > > > > > > > ..[snip]..
> > > > > > > > }
> > > > > > > > 
> > > > > > > > 
> > > > > > > > The test seems to be marked complete as soon as mem_table is copied.
> > > > > > > > However, this patch 3/3 changes the behaviour of the SET_MEM_TABLE vhost command implementation within qemu: SET_MEM_TABLE now sends out a new GET_FEATURES message, and the call only completes once it receives the features from the remote application (or the test framework, as is the case here).
> > > > > > > 
> > > > > > > Hmm but why does it matter that data_cond is woken up?
> > > > > > 
> > > > > > Michael, sorry, I didn’t quite understand that. Could you pls explain ?
> > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > > > While the test itself can be modified (do not signal completion until we’ve sent a follow-up response to GET_FEATURES), I am now wondering if this patch may break existing vhost applications too? If so, reverting it is possibly the better option.
> > > > > > > 
> > > > > > > What bothers me is that the new feature might cause the same
> > > > > > > issue once we enable it in the test.
> > > > > > 
> > > > > > > No, it won't. The new feature is a protocol extension, and only takes effect if it has been negotiated. If not negotiated, that part of the code is never executed.
> > > > > > 
> > > > > > > 
> > > > > > > How about a patch to tests/vhost-user-test.c adding the new
> > > > > > > protocol feature? I would be quite interested to see what
> > > > > > > is going on with it.
> > > > > > 
> > > > > > Yes that can be done. But you can see that the protocol extension patch will not change the behaviour of the _existing_ test.
> > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > > What confuses me is why it doesn’t fail all the time, but only about 20% to 30% of the time, as Fam reports.
> > > > > > > 
> > > > > > > And succeeds every time on my systems :(
> > > > > > 
> > > > > > +1 to that :( I have had no luck repro’ing it.
> > > > > > 
> > > > > > > 
> > > > > > > > 
> > > > > > > > Thoughts : Michael, Fam, MarcAndre ?
> > > > > 
> > > > > I have managed to reproduce the hang by adding some debug prints into
> > > > > vhost_user_get_features().
> > > > > 
> > > > > Doing this, the issue is reproducible quite easily.
> > > > > Another way to reproduce it in one shot is to strace (with the
> > > > > follow-forks option, -f) the vhost-user-test execution.
> > > > > 
> > > > > So, by adding debug prints at vhost_user_get_features() entry and exit,
> > > > > we can see we never return from this function when the hang happens.
> > > > > An strace of the Qemu instance shows that its thread keeps retrying to
> > > > > receive the GET_FEATURES reply:
> > > > > 
> > > > > write(1, "vhost_user_get_features IN: \n", 29) = 29
> > > > > sendmsg(11, {msg_name=NULL, msg_namelen=0,
> > > > >         msg_iov=[{iov_base="\1\0\0\0\1\0\0\0\0\0\0\0", iov_len=12}],
> > > > >         msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 12
> > > > > recvmsg(11, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN
> > > > > nanosleep({0, 100000}, 0x7fff29f8dd70)  = 0
> > > > > ...
> > > > > recvmsg(11, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN
> > > > > nanosleep({0, 100000}, 0x7fff29f8dd70)  = 0
> > > > > 
> > > > > The reason is that vhost-user-test never replies to Qemu,
> > > > > because its thread handling the GET_FEATURES command is waiting for
> > > > > the s->data_mutex lock.
> > > > > This lock is held by the other vhost-user-test thread, executing
> > > > > read_guest_mem().
> > > > > 
> > > > > The lock is never released because the thread is blocked in the read
> > > > > syscall while read_guest_mem() is doing the readl().
> > > > > 
> > > > > This is because, on the Qemu side, the thread polling the qtest socket
> > > > > is waiting for the qemu_global_mutex (in os_host_main_loop_wait()), but
> > > > > the mutex is held by the thread trying to get the GET_FEATURES reply
> > > > > (the TCG one).
> > > > > 
> > > > > So here is the deadlock.
> > > > > 
> > > > > That said, I don't see a clean way to solve this.
> > > > > Any thoughts?
> > > > > 
> > > > > Regards,
> > > > > Maxime
> > > > 
> > > > My thought is that we really need to do what I said:
> > > > avoid doing GET_FEATURES (and setting reply_ack)
> > > > on the first set_mem, and I quote:
> > > > 
> > > > 	OK this all looks very reasonable (and I do like patch 1 too)
> > > > 	but there's one source of waste here: we do not need to
> > > > 	synchronize when we set up device the first time
> > > > 	when hdev->memory_changed is false.
> > > > 
> > > > 	I think we should test that and skip synch in both patches
> > > > 	unless  hdev->memory_changed is set.
> > > > 
> > > > with that change test will start passing.
> > > 
> > > Actually, it looks like memory_changed is true even at first
> > > SET_MEM_TABLE request.
> > > 
> > > Thanks,
> > > Maxime
> > 
> > Let's add another flag then? What we care about is that it's not
> > the first time we specify translations for a given address.
> 
> I added a dedicated flag that skips the sync in two cases:
>  1. First set_mem_table call
>  2. If only new regions are added
> 
> It solves the hang seen with vhost-user-test app, and I think the patch
> makes sense.
> 
> But IMHO the problem is deeper than that, and could under some
> conditions still hang when running in TCG mode.
> Imagine Qemu sends a random GET_FEATURES request after the
> set_mem_table, and vhost-user-test's read_guest_mem() is executed just before
> this second call (let's say it was not scheduled for some time).
>
> In this case, the read_guest_mem() thread owns the data_mutex and starts
> doing readl() calls. On the Qemu side, as we are sending an update of the
> mem table, we own the qemu_global_mutex, and the deadlock happens again:
>  - Vhost-user-test
>    * read_guest_mem() thread: Blocked in readl(), waiting for Qemu to
> handle it (TCG mode only), owning the data_mutex lock.
>    * Command handler thread: Received GET_FEATURE event, but wait for
> data_mutex ownership to handle it.
> 
>  - Qemu
>    * FDs polling thread: Wait for qemu_global_mutex ownership, to be
> able to handle the readl() request from vhost-user-test.
>    * TCG thread: Own the qemu_global_mutex, and poll to receive the
> GET_FEATURE reply.
> 
> Maybe the GET_FEATURES case is not realistic, but what about
> GET_VRING_BASE, which gets called by vhost_net_stop()?
> 
> Thanks in advance,
> Maxime

I think most applications expect to be able to handle GET_VRING_BASE
at any time.
If an application isn't ready to handle GET_VRING_BASE at any time,
we won't be able to stop the device.

This is not the case for this test, but that's because it's
just a unit test, and hence incomplete.


-- 
MST


end of thread, other threads:[~2016-09-06  2:22 UTC | newest]

Thread overview: 26+ messages
2016-08-10 15:30 [Qemu-devel] [PULL 0/3] virtio/vhost: fixes Michael S. Tsirkin
2016-08-10 15:30 ` [Qemu-devel] [PULL 1/3] vhost: check for vhost_ops before using Michael S. Tsirkin
2016-08-10 15:30 ` [Qemu-devel] [PULL 2/3] vhost-user: Introduce a new protocol feature REPLY_ACK Michael S. Tsirkin
2016-08-10 15:30 ` [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table Michael S. Tsirkin
2016-08-12  6:38   ` Fam Zheng
2016-08-12  7:16     ` Prerna Saxena
2016-08-12  7:20       ` Marc-André Lureau
2016-08-12 12:01         ` Peter Maydell
2016-08-12 15:49           ` Michael S. Tsirkin
2016-08-15 10:20             ` Peter Maydell
2016-08-12 15:47         ` Michael S. Tsirkin
2016-08-12 15:54           ` Marc-André Lureau
2016-08-12 21:12             ` Michael S. Tsirkin
2016-08-13  6:13               ` Marc-André Lureau
2016-08-14  2:30                 ` Michael S. Tsirkin
2016-08-14  2:44                 ` Michael S. Tsirkin
2016-08-14  2:51       ` Michael S. Tsirkin
2016-08-14  9:42         ` Prerna Saxena
2016-08-14 21:39           ` Michael S. Tsirkin
2016-08-31 11:19           ` Maxime Coquelin
2016-09-01 13:46             ` Michael S. Tsirkin
2016-09-02  8:57               ` Maxime Coquelin
2016-09-02 17:29                 ` Michael S. Tsirkin
2016-09-05 13:06                   ` Maxime Coquelin
2016-09-06  2:22                     ` Michael S. Tsirkin
2016-08-10 17:32 ` [Qemu-devel] [PULL 0/3] virtio/vhost: fixes Peter Maydell
