qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] Linux vhost-user interrupt management fixes
@ 2015-12-03  9:53 Didier Pallard
  2015-12-03  9:53 ` [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full Didier Pallard
                   ` (4 more replies)
  0 siblings, 5 replies; 33+ messages in thread
From: Didier Pallard @ 2015-12-03  9:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: thibaut.collet, jmg, Didier Pallard

Hi,

I recently did some stress tests of a vhost-user interface using an UDP
traffic generator. Traffic generator was connected to 2 physical ports
that are in turn connected to 2 virtio ports through a linux bridge, VM
(running linux) doing routing to forward packets between the 2 virtio ports.
When traffic reaches high pps rates of small packets, I faced the 2 following
problems:

- at some time, my qemu socket becomes full, causing qemu to send incomplete
SET_VRING_CALL messages to vhost-user backend (without proper fd set in
ancillary data).
- after some time, some interrupts are lost, causing the VM to stop
transmitting packets.

Both problems come from the fact that interrupt masking/unmasking of the VM
is deferred to vhost-user backend through the linux socket.
First problem comes from the fact that socket buffer gets full; it is corrected
in the first patch of the serie.
Second problem is a bit more complex. From what i understand of the code,
when VM wants to mask/unmask interrupts, qemu traps the command and sends a
SET_VRING_CALL to the vhost-user to swap interrupt notifications either to
a dummy descriptor or to an fd that was given to kvm module to route
interruption to the VM. After sending SET_VRING_CALL message through
the socket, VM code continues to run, assuming that the interrupts are now
masked; but due to linux socket, this message may be buffered and not currently
treated by the vhost-user backend, ie interrupts are not really masked/unmasked
as they ought to be.
I think it can be solved in two different ways:
- by waiting for an acknowledgement of vhost-user backend before exiting
interrupt masking function, this ensures that interrupt masking is correctly
taken into account before giving back hand to the VM, but it has a very high
cost in term of cycles. Moreover, unless specifying a new option, it will
break current API, since existing vhost-user implementations do not
expect a return message on a SET_VRING_CALL call.
- second way could be, in case vhost-user is involved, to restore the
initial behaviour of interrupt masking (ie masking/unmasking interrupts
by taking/giving vring_call fd from/to kernel kvm module).
Patches 2 and 3 of the serie propose an implementation of this second option.

Didier

Didier Pallard (3):
  char: fix vhost-user socket full
  virtio-pci: add an option to bypass guest_notifier_mask
  vhost-net: force guest_notifier_mask bypass in vhost-user case

 hw/net/vhost_net.c     | 19 ++++++++++++++++++-
 hw/virtio/vhost.c      | 13 +++++++++++++
 hw/virtio/virtio-pci.c | 29 +++++++++++++++++++++++------
 hw/virtio/virtio-pci.h |  6 ++++++
 qemu-char.c            | 10 ++++++++++
 5 files changed, 70 insertions(+), 7 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full
  2015-12-03  9:53 [Qemu-devel] Linux vhost-user interrupt management fixes Didier Pallard
@ 2015-12-03  9:53 ` Didier Pallard
  2015-12-07 13:31   ` Marc-André Lureau
                     ` (2 more replies)
  2015-12-03  9:53 ` [Qemu-devel] [PATCH 2/3] virtio-pci: add an option to bypass guest_notifier_mask Didier Pallard
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 33+ messages in thread
From: Didier Pallard @ 2015-12-03  9:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: thibaut.collet, jmg, Didier Pallard

unix_send_msgfds is used by vhost-user control socket. qemu_chr_fe_write_all
is used to send a message and retries as long as EAGAIN errno is set,
but write_msgfds buffer is freed after first EAGAIN failure, causing
message to be sent without proper fds attachment.

In case unix_send_msgfds is called through qemu_chr_fe_write, it will be
user responsability to resend message as is or to free write_msgfds
using set_msgfds(0)

Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
---
 qemu-char.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/qemu-char.c b/qemu-char.c
index 5448b0f..26d5f2e 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2614,6 +2614,16 @@ static int unix_send_msgfds(CharDriverState *chr, const uint8_t *buf, int len)
         r = sendmsg(s->fd, &msgh, 0);
     } while (r < 0 && errno == EINTR);
 
+    /* Ancillary data are not sent if no byte is written
+     * so don't free msgfds buffer if return value is EAGAIN
+     * If called from qemu_chr_fe_write_all retry will come soon
+     * If called from qemu_chr_fe_write, it is the user responsibility
+     * to resend message or free fds using set_msgfds(0)
+     */
+    if (r < 0 && errno == EAGAIN) {
+        return r;
+    }
+
     /* free the written msgfds, no matter what */
     if (s->write_msgfds_num) {
         g_free(s->write_msgfds);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [PATCH 2/3] virtio-pci: add an option to bypass guest_notifier_mask
  2015-12-03  9:53 [Qemu-devel] Linux vhost-user interrupt management fixes Didier Pallard
  2015-12-03  9:53 ` [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full Didier Pallard
@ 2015-12-03  9:53 ` Didier Pallard
  2015-12-07 13:37   ` Marc-André Lureau
                     ` (2 more replies)
  2015-12-03  9:53 ` [Qemu-devel] [PATCH 3/3] vhost-net: force guest_notifier_mask bypass in vhost-user case Didier Pallard
                   ` (2 subsequent siblings)
  4 siblings, 3 replies; 33+ messages in thread
From: Didier Pallard @ 2015-12-03  9:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: thibaut.collet, jmg, Didier Pallard

Using guest_notifier_mask function in vhost-user case may
break interrupt mask paradigm, because mask/unmask is not
really done when returning from guest_notifier_mask call, instead
message is posted in a unix socket, and processed later.

Add an option bit to disable the use of guest_notifier_mask
in virtio pci.

Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
---
 hw/virtio/virtio-pci.c | 29 +++++++++++++++++++++++------
 hw/virtio/virtio-pci.h |  6 ++++++
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index dd48562..26bb617 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -806,7 +806,8 @@ static int kvm_virtio_pci_vector_use(VirtIOPCIProxy *proxy, int nvqs)
         /* If guest supports masking, set up irqfd now.
          * Otherwise, delay until unmasked in the frontend.
          */
-        if (k->guest_notifier_mask) {
+        if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
+            k->guest_notifier_mask) {
             ret = kvm_virtio_pci_irqfd_use(proxy, queue_no, vector);
             if (ret < 0) {
                 kvm_virtio_pci_vq_vector_release(proxy, vector);
@@ -822,7 +823,8 @@ undo:
         if (vector >= msix_nr_vectors_allocated(dev)) {
             continue;
         }
-        if (k->guest_notifier_mask) {
+        if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
+            k->guest_notifier_mask) {
             kvm_virtio_pci_irqfd_release(proxy, queue_no, vector);
         }
         kvm_virtio_pci_vq_vector_release(proxy, vector);
@@ -849,7 +851,8 @@ static void kvm_virtio_pci_vector_release(VirtIOPCIProxy *proxy, int nvqs)
         /* If guest supports masking, clean up irqfd now.
          * Otherwise, it was cleaned when masked in the frontend.
          */
-        if (k->guest_notifier_mask) {
+        if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
+            k->guest_notifier_mask) {
             kvm_virtio_pci_irqfd_release(proxy, queue_no, vector);
         }
         kvm_virtio_pci_vq_vector_release(proxy, vector);
@@ -882,7 +885,8 @@ static int virtio_pci_vq_vector_unmask(VirtIOPCIProxy *proxy,
     /* If guest supports masking, irqfd is already setup, unmask it.
      * Otherwise, set it up now.
      */
-    if (k->guest_notifier_mask) {
+    if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
+        k->guest_notifier_mask) {
         k->guest_notifier_mask(vdev, queue_no, false);
         /* Test after unmasking to avoid losing events. */
         if (k->guest_notifier_pending &&
@@ -905,7 +909,8 @@ static void virtio_pci_vq_vector_mask(VirtIOPCIProxy *proxy,
     /* If guest supports masking, keep irqfd but mask it.
      * Otherwise, clean it up now.
      */ 
-    if (k->guest_notifier_mask) {
+    if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
+        k->guest_notifier_mask) {
         k->guest_notifier_mask(vdev, queue_no, true);
     } else {
         kvm_virtio_pci_irqfd_release(proxy, queue_no, vector);
@@ -1022,7 +1027,9 @@ static int virtio_pci_set_guest_notifier(DeviceState *d, int n, bool assign,
         event_notifier_cleanup(notifier);
     }
 
-    if (!msix_enabled(&proxy->pci_dev) && vdc->guest_notifier_mask) {
+    if (!msix_enabled(&proxy->pci_dev) &&
+        (proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
+        vdc->guest_notifier_mask) {
         vdc->guest_notifier_mask(vdev, n, !assign);
     }
 
@@ -1164,6 +1171,8 @@ static void virtio_9p_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
 static Property virtio_9p_pci_properties[] = {
     DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
                     VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
+    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
+                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
     DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
     DEFINE_PROP_END_OF_LIST(),
 };
@@ -1908,6 +1917,8 @@ static Property virtio_blk_pci_properties[] = {
     DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
     DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
                     VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
+    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
+                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
     DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
     DEFINE_PROP_END_OF_LIST(),
 };
@@ -1961,6 +1972,8 @@ static const TypeInfo virtio_blk_pci_info = {
 static Property virtio_scsi_pci_properties[] = {
     DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
                     VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
+    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
+                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
     DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors,
                        DEV_NVECTORS_UNSPECIFIED),
     DEFINE_PROP_END_OF_LIST(),
@@ -2175,6 +2188,8 @@ static void virtio_serial_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
 static Property virtio_serial_pci_properties[] = {
     DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
                     VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
+    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
+                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
     DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
     DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
     DEFINE_PROP_END_OF_LIST(),
@@ -2215,6 +2230,8 @@ static const TypeInfo virtio_serial_pci_info = {
 static Property virtio_net_properties[] = {
     DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
                     VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, false),
+    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
+                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
     DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 3),
     DEFINE_PROP_END_OF_LIST(),
 };
diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index ffb74bb..aecd4eb 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -86,6 +86,12 @@ typedef struct VirtioBusClass VirtioPCIBusClass;
 #define VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY \
     (1 << VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY_BIT)
 
+/* Where vhost-user implementation exists, using the guest notifier mask
+ * feature can lead to improper interrupt management. Add a flag to
+ * allow to disable this guest notifier mask if desired */
+#define VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT 6
+#define VIRTIO_PCI_FLAG_USE_NOTIFIERMASK (1 << VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT)
+
 typedef struct {
     MSIMessage msg;
     int virq;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [PATCH 3/3] vhost-net: force guest_notifier_mask bypass in vhost-user case
  2015-12-03  9:53 [Qemu-devel] Linux vhost-user interrupt management fixes Didier Pallard
  2015-12-03  9:53 ` [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full Didier Pallard
  2015-12-03  9:53 ` [Qemu-devel] [PATCH 2/3] virtio-pci: add an option to bypass guest_notifier_mask Didier Pallard
@ 2015-12-03  9:53 ` Didier Pallard
  2016-02-04 13:06   ` Michael S. Tsirkin
  2015-12-04 10:04 ` [Qemu-devel] Linux vhost-user interrupt management fixes Didier Pallard
  2016-01-25  9:22 ` Victor Kaplansky
  4 siblings, 1 reply; 33+ messages in thread
From: Didier Pallard @ 2015-12-03  9:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: thibaut.collet, jmg, Didier Pallard

Since guest_mask_notifier can not be used in vhost-user
mode due to buffering implied by unix control socket,
force VIRTIO_PCI_FLAG_USE_NOTIFIERMASK on virtio pci
of vhost-user interfaces, and send correct callfd
to the guest at vhost start.

Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
---
 hw/net/vhost_net.c | 19 ++++++++++++++++++-
 hw/virtio/vhost.c  | 13 +++++++++++++
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 318c3e6..74318dc 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -36,9 +36,11 @@
 #include <stdio.h>
 
 #include "standard-headers/linux/virtio_ring.h"
+#include "hw/pci/pci.h"
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
+#include "hw/virtio/virtio-pci.h"
 
 struct vhost_net {
     struct vhost_dev dev;
@@ -314,7 +316,22 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
     }
 
     for (i = 0; i < total_queues; i++) {
-        vhost_net_set_vq_index(get_vhost_net(ncs[i].peer), i * 2);
+        struct vhost_net *net= get_vhost_net(ncs[i].peer);
+        vhost_net_set_vq_index(net, i * 2);
+
+        /* Force VIRTIO_PCI_FLAG_USE_NOTIFIERMASK reset in vhost user case
+         * Must be done before set_guest_notifier call
+         */
+        if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) {
+            BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
+            DeviceState *d = DEVICE(qbus->parent);
+            if (!strcmp(object_get_typename(OBJECT(d)), TYPE_VIRTIO_NET_PCI)) {
+                VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
+
+                /* Force proxy to not use mask notifier */
+                proxy->flags &= ~VIRTIO_PCI_FLAG_USE_NOTIFIERMASK;
+            }
+        }
     }
 
     r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index de29968..7a4c1d3 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -854,8 +854,21 @@ static int vhost_virtqueue_start(struct vhost_dev *dev,
     /* Clear and discard previous events if any. */
     event_notifier_test_and_clear(&vq->masked_notifier);
 
+    /* If vhost user, register now the call eventfd, guest_notifier_mask
+     * function is not used anymore
+     */
+    if (dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER) {
+        file.fd = event_notifier_get_fd(virtio_queue_get_guest_notifier(vvq));
+        r = dev->vhost_ops->vhost_set_vring_call(dev, &file);
+        if (r) {
+            r = -errno;
+            goto fail_call;
+        }
+    }
+
     return 0;
 
+fail_call:
 fail_kick:
 fail_alloc:
     cpu_physical_memory_unmap(vq->ring, virtio_queue_get_ring_size(vdev, idx),
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] Linux vhost-user interrupt management fixes
  2015-12-03  9:53 [Qemu-devel] Linux vhost-user interrupt management fixes Didier Pallard
                   ` (2 preceding siblings ...)
  2015-12-03  9:53 ` [Qemu-devel] [PATCH 3/3] vhost-net: force guest_notifier_mask bypass in vhost-user case Didier Pallard
@ 2015-12-04 10:04 ` Didier Pallard
  2016-01-25  9:22 ` Victor Kaplansky
  4 siblings, 0 replies; 33+ messages in thread
From: Didier Pallard @ 2015-12-04 10:04 UTC (permalink / raw)
  To: qemu-devel, mst, pbonzini; +Cc: thibaut.collet, jmg

On 12/03/2015 10:53 AM, Didier Pallard wrote:
> Hi,
>
> I recently did some stress tests of a vhost-user interface using an UDP
> traffic generator. Traffic generator was connected to 2 physical ports
> that are in turn connected to 2 virtio ports through a linux bridge, VM
> (running linux) doing routing to forward packets between the 2 virtio ports.
> When traffic reaches high pps rates of small packets, I faced the 2 following
> problems:
>
> - at some time, my qemu socket becomes full, causing qemu to send incomplete
> SET_VRING_CALL messages to vhost-user backend (without proper fd set in
> ancillary data).
> - after some time, some interrupts are lost, causing the VM to stop
> transmitting packets.
>
> Both problems come from the fact that interrupt masking/unmasking of the VM
> is deferred to vhost-user backend through the linux socket.
> First problem comes from the fact that socket buffer gets full; it is corrected
> in the first patch of the serie.
> Second problem is a bit more complex. From what i understand of the code,
> when VM wants to mask/unmask interrupts, qemu traps the command and sends a
> SET_VRING_CALL to the vhost-user to swap interrupt notifications either to
> a dummy descriptor or to an fd that was given to kvm module to route
> interruption to the VM. After sending SET_VRING_CALL message through
> the socket, VM code continues to run, assuming that the interrupts are now
> masked; but due to linux socket, this message may be buffered and not currently
> treated by the vhost-user backend, ie interrupts are not really masked/unmasked
> as they ought to be.
> I think it can be solved in two different ways:
> - by waiting for an acknowledgement of vhost-user backend before exiting
> interrupt masking function, this ensures that interrupt masking is correctly
> taken into account before giving back hand to the VM, but it has a very high
> cost in term of cycles. Moreover, unless specifying a new option, it will
> break current API, since existing vhost-user implementations do not
> expect a return message on a SET_VRING_CALL call.
> - second way could be, in case vhost-user is involved, to restore the
> initial behaviour of interrupt masking (ie masking/unmasking interrupts
> by taking/giving vring_call fd from/to kernel kvm module).
> Patches 2 and 3 of the serie propose an implementation of this second option.
>
> Didier
>
> Didier Pallard (3):
>    char: fix vhost-user socket full
>    virtio-pci: add an option to bypass guest_notifier_mask
>    vhost-net: force guest_notifier_mask bypass in vhost-user case
>
>   hw/net/vhost_net.c     | 19 ++++++++++++++++++-
>   hw/virtio/vhost.c      | 13 +++++++++++++
>   hw/virtio/virtio-pci.c | 29 +++++++++++++++++++++++------
>   hw/virtio/virtio-pci.h |  6 ++++++
>   qemu-char.c            | 10 ++++++++++
>   5 files changed, 70 insertions(+), 7 deletions(-)
>

+maintainers

sorry, i forgot them in initial mail

didier

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full
  2015-12-03  9:53 ` [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full Didier Pallard
@ 2015-12-07 13:31   ` Marc-André Lureau
  2015-12-09 15:59     ` Victor Kaplansky
  2016-02-04 13:13   ` Michael S. Tsirkin
  2016-02-04 14:10   ` Michael S. Tsirkin
  2 siblings, 1 reply; 33+ messages in thread
From: Marc-André Lureau @ 2015-12-07 13:31 UTC (permalink / raw)
  To: Didier Pallard; +Cc: Thibaut Collet, Jean-Mickael Guerin, QEMU

Hi

On Thu, Dec 3, 2015 at 10:53 AM, Didier Pallard
<didier.pallard@6wind.com> wrote:
> unix_send_msgfds is used by vhost-user control socket. qemu_chr_fe_write_all
> is used to send a message and retries as long as EAGAIN errno is set,
> but write_msgfds buffer is freed after first EAGAIN failure, causing
> message to be sent without proper fds attachment.
>
> In case unix_send_msgfds is called through qemu_chr_fe_write, it will be
> user responsability to resend message as is or to free write_msgfds
> using set_msgfds(0)
>
> Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
> Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
> ---
>  qemu-char.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/qemu-char.c b/qemu-char.c
> index 5448b0f..26d5f2e 100644
> --- a/qemu-char.c
> +++ b/qemu-char.c
> @@ -2614,6 +2614,16 @@ static int unix_send_msgfds(CharDriverState *chr, const uint8_t *buf, int len)
>          r = sendmsg(s->fd, &msgh, 0);
>      } while (r < 0 && errno == EINTR);
>
> +    /* Ancillary data are not sent if no byte is written
> +     * so don't free msgfds buffer if return value is EAGAIN
> +     * If called from qemu_chr_fe_write_all retry will come soon
> +     * If called from qemu_chr_fe_write, it is the user responsibility
> +     * to resend message or free fds using set_msgfds(0)
> +     */
> +    if (r < 0 && errno == EAGAIN) {
> +        return r;
> +    }
> +

This looks reasonable to me. However, I don't know what happens with
partial write of ancillary data. Hopefully it's all or nothing.
Apparently, reading unix_stream_sendmsg() in kernel shows that as long
as a few bytes have been sent, the ancillary data is sent. So it looks
like it still does the right thing in case of a partial write.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>

>      /* free the written msgfds, no matter what */
>      if (s->write_msgfds_num) {
>          g_free(s->write_msgfds);
> --
> 2.1.4
>
>



-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] virtio-pci: add an option to bypass guest_notifier_mask
  2015-12-03  9:53 ` [Qemu-devel] [PATCH 2/3] virtio-pci: add an option to bypass guest_notifier_mask Didier Pallard
@ 2015-12-07 13:37   ` Marc-André Lureau
  2015-12-07 13:59     ` Marc-André Lureau
  2016-02-04 13:08   ` Michael S. Tsirkin
  2016-02-15 15:38   ` Victor Kaplansky
  2 siblings, 1 reply; 33+ messages in thread
From: Marc-André Lureau @ 2015-12-07 13:37 UTC (permalink / raw)
  To: Didier Pallard; +Cc: Thibaut Collet, Jean-Mickael Guerin, QEMU

Hi

On Thu, Dec 3, 2015 at 10:53 AM, Didier Pallard
<didier.pallard@6wind.com> wrote:
> Using guest_notifier_mask function in vhost-user case may
> break interrupt mask paradigm, because mask/unmask is not
> really done when returning from guest_notifier_mask call, instead
> message is posted in a unix socket, and processed later.
>
> Add an option bit to disable the use of guest_notifier_mask
> in virtio pci.

Why not disabling it whenever vhost-user is used if it's broken and
inefficient to be fixed?

>
> Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
> Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
> ---
>  hw/virtio/virtio-pci.c | 29 +++++++++++++++++++++++------
>  hw/virtio/virtio-pci.h |  6 ++++++
>  2 files changed, 29 insertions(+), 6 deletions(-)
>
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index dd48562..26bb617 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -806,7 +806,8 @@ static int kvm_virtio_pci_vector_use(VirtIOPCIProxy *proxy, int nvqs)
>          /* If guest supports masking, set up irqfd now.
>           * Otherwise, delay until unmasked in the frontend.
>           */
> -        if (k->guest_notifier_mask) {
> +        if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
> +            k->guest_notifier_mask) {
>              ret = kvm_virtio_pci_irqfd_use(proxy, queue_no, vector);
>              if (ret < 0) {
>                  kvm_virtio_pci_vq_vector_release(proxy, vector);
> @@ -822,7 +823,8 @@ undo:
>          if (vector >= msix_nr_vectors_allocated(dev)) {
>              continue;
>          }
> -        if (k->guest_notifier_mask) {
> +        if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
> +            k->guest_notifier_mask) {
>              kvm_virtio_pci_irqfd_release(proxy, queue_no, vector);
>          }
>          kvm_virtio_pci_vq_vector_release(proxy, vector);
> @@ -849,7 +851,8 @@ static void kvm_virtio_pci_vector_release(VirtIOPCIProxy *proxy, int nvqs)
>          /* If guest supports masking, clean up irqfd now.
>           * Otherwise, it was cleaned when masked in the frontend.
>           */
> -        if (k->guest_notifier_mask) {
> +        if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
> +            k->guest_notifier_mask) {
>              kvm_virtio_pci_irqfd_release(proxy, queue_no, vector);
>          }
>          kvm_virtio_pci_vq_vector_release(proxy, vector);
> @@ -882,7 +885,8 @@ static int virtio_pci_vq_vector_unmask(VirtIOPCIProxy *proxy,
>      /* If guest supports masking, irqfd is already setup, unmask it.
>       * Otherwise, set it up now.
>       */
> -    if (k->guest_notifier_mask) {
> +    if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
> +        k->guest_notifier_mask) {
>          k->guest_notifier_mask(vdev, queue_no, false);
>          /* Test after unmasking to avoid losing events. */
>          if (k->guest_notifier_pending &&
> @@ -905,7 +909,8 @@ static void virtio_pci_vq_vector_mask(VirtIOPCIProxy *proxy,
>      /* If guest supports masking, keep irqfd but mask it.
>       * Otherwise, clean it up now.
>       */
> -    if (k->guest_notifier_mask) {
> +    if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
> +        k->guest_notifier_mask) {
>          k->guest_notifier_mask(vdev, queue_no, true);
>      } else {
>          kvm_virtio_pci_irqfd_release(proxy, queue_no, vector);
> @@ -1022,7 +1027,9 @@ static int virtio_pci_set_guest_notifier(DeviceState *d, int n, bool assign,
>          event_notifier_cleanup(notifier);
>      }
>
> -    if (!msix_enabled(&proxy->pci_dev) && vdc->guest_notifier_mask) {
> +    if (!msix_enabled(&proxy->pci_dev) &&
> +        (proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
> +        vdc->guest_notifier_mask) {
>          vdc->guest_notifier_mask(vdev, n, !assign);
>      }
>
> @@ -1164,6 +1171,8 @@ static void virtio_9p_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
>  static Property virtio_9p_pci_properties[] = {
>      DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>                      VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>      DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
>      DEFINE_PROP_END_OF_LIST(),
>  };
> @@ -1908,6 +1917,8 @@ static Property virtio_blk_pci_properties[] = {
>      DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
>      DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>                      VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>      DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
>      DEFINE_PROP_END_OF_LIST(),
>  };
> @@ -1961,6 +1972,8 @@ static const TypeInfo virtio_blk_pci_info = {
>  static Property virtio_scsi_pci_properties[] = {
>      DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>                      VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>      DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors,
>                         DEV_NVECTORS_UNSPECIFIED),
>      DEFINE_PROP_END_OF_LIST(),
> @@ -2175,6 +2188,8 @@ static void virtio_serial_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
>  static Property virtio_serial_pci_properties[] = {
>      DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>                      VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>      DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
>      DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
>      DEFINE_PROP_END_OF_LIST(),
> @@ -2215,6 +2230,8 @@ static const TypeInfo virtio_serial_pci_info = {
>  static Property virtio_net_properties[] = {
>      DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>                      VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, false),
> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>      DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 3),
>      DEFINE_PROP_END_OF_LIST(),
>  };
> diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
> index ffb74bb..aecd4eb 100644
> --- a/hw/virtio/virtio-pci.h
> +++ b/hw/virtio/virtio-pci.h
> @@ -86,6 +86,12 @@ typedef struct VirtioBusClass VirtioPCIBusClass;
>  #define VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY \
>      (1 << VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY_BIT)
>
> +/* Where vhost-user implementation exists, using the guest notifier mask
> + * feature can lead to improper interrupt management. Add a flag to
> + * allow to disable this guest notifier mask if desired */
> +#define VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT 6
> +#define VIRTIO_PCI_FLAG_USE_NOTIFIERMASK (1 << VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT)
> +
>  typedef struct {
>      MSIMessage msg;
>      int virq;
> --
> 2.1.4
>
>



-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] virtio-pci: add an option to bypass guest_notifier_mask
  2015-12-07 13:37   ` Marc-André Lureau
@ 2015-12-07 13:59     ` Marc-André Lureau
  2015-12-09 15:06       ` Didier Pallard
  0 siblings, 1 reply; 33+ messages in thread
From: Marc-André Lureau @ 2015-12-07 13:59 UTC (permalink / raw)
  To: Didier Pallard; +Cc: Thibaut Collet, Jean-Mickael Guerin, QEMU

Hi

On Mon, Dec 7, 2015 at 2:37 PM, Marc-André Lureau
<marcandre.lureau@gmail.com> wrote:
> Hi
>
> On Thu, Dec 3, 2015 at 10:53 AM, Didier Pallard
> <didier.pallard@6wind.com> wrote:
>> Using guest_notifier_mask function in vhost-user case may
>> break interrupt mask paradigm, because mask/unmask is not
>> really done when returning from guest_notifier_mask call, instead
>> message is posted in a unix socket, and processed later.
>>
>> Add an option bit to disable the use of guest_notifier_mask
>> in virtio pci.
>
> Why not disabling it whenever vhost-user is used if it's broken and
> inefficient to be fixed?

Sorry, I missed the following patch.

I am not sure it's necessary to have a user-visible property,
especially if it's overwritten later. But apparently, other properties
are already forced.

looks good overall to me, but maintainers might have different opinions.

>>
>> Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
>> Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
>> ---
>>  hw/virtio/virtio-pci.c | 29 +++++++++++++++++++++++------
>>  hw/virtio/virtio-pci.h |  6 ++++++
>>  2 files changed, 29 insertions(+), 6 deletions(-)
>>
>> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
>> index dd48562..26bb617 100644
>> --- a/hw/virtio/virtio-pci.c
>> +++ b/hw/virtio/virtio-pci.c
>> @@ -806,7 +806,8 @@ static int kvm_virtio_pci_vector_use(VirtIOPCIProxy *proxy, int nvqs)
>>          /* If guest supports masking, set up irqfd now.
>>           * Otherwise, delay until unmasked in the frontend.
>>           */
>> -        if (k->guest_notifier_mask) {
>> +        if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
>> +            k->guest_notifier_mask) {
>>              ret = kvm_virtio_pci_irqfd_use(proxy, queue_no, vector);
>>              if (ret < 0) {
>>                  kvm_virtio_pci_vq_vector_release(proxy, vector);
>> @@ -822,7 +823,8 @@ undo:
>>          if (vector >= msix_nr_vectors_allocated(dev)) {
>>              continue;
>>          }
>> -        if (k->guest_notifier_mask) {
>> +        if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
>> +            k->guest_notifier_mask) {
>>              kvm_virtio_pci_irqfd_release(proxy, queue_no, vector);
>>          }
>>          kvm_virtio_pci_vq_vector_release(proxy, vector);
>> @@ -849,7 +851,8 @@ static void kvm_virtio_pci_vector_release(VirtIOPCIProxy *proxy, int nvqs)
>>          /* If guest supports masking, clean up irqfd now.
>>           * Otherwise, it was cleaned when masked in the frontend.
>>           */
>> -        if (k->guest_notifier_mask) {
>> +        if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
>> +            k->guest_notifier_mask) {
>>              kvm_virtio_pci_irqfd_release(proxy, queue_no, vector);
>>          }
>>          kvm_virtio_pci_vq_vector_release(proxy, vector);
>> @@ -882,7 +885,8 @@ static int virtio_pci_vq_vector_unmask(VirtIOPCIProxy *proxy,
>>      /* If guest supports masking, irqfd is already setup, unmask it.
>>       * Otherwise, set it up now.
>>       */
>> -    if (k->guest_notifier_mask) {
>> +    if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
>> +        k->guest_notifier_mask) {
>>          k->guest_notifier_mask(vdev, queue_no, false);
>>          /* Test after unmasking to avoid losing events. */
>>          if (k->guest_notifier_pending &&
>> @@ -905,7 +909,8 @@ static void virtio_pci_vq_vector_mask(VirtIOPCIProxy *proxy,
>>      /* If guest supports masking, keep irqfd but mask it.
>>       * Otherwise, clean it up now.
>>       */
>> -    if (k->guest_notifier_mask) {
>> +    if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
>> +        k->guest_notifier_mask) {
>>          k->guest_notifier_mask(vdev, queue_no, true);
>>      } else {
>>          kvm_virtio_pci_irqfd_release(proxy, queue_no, vector);
>> @@ -1022,7 +1027,9 @@ static int virtio_pci_set_guest_notifier(DeviceState *d, int n, bool assign,
>>          event_notifier_cleanup(notifier);
>>      }
>>
>> -    if (!msix_enabled(&proxy->pci_dev) && vdc->guest_notifier_mask) {
>> +    if (!msix_enabled(&proxy->pci_dev) &&
>> +        (proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
>> +        vdc->guest_notifier_mask) {
>>          vdc->guest_notifier_mask(vdev, n, !assign);
>>      }
>>
>> @@ -1164,6 +1171,8 @@ static void virtio_9p_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
>>  static Property virtio_9p_pci_properties[] = {
>>      DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>>                      VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
>> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
>> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>>      DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
>>      DEFINE_PROP_END_OF_LIST(),
>>  };
>> @@ -1908,6 +1917,8 @@ static Property virtio_blk_pci_properties[] = {
>>      DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
>>      DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>>                      VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
>> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
>> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>>      DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
>>      DEFINE_PROP_END_OF_LIST(),
>>  };
>> @@ -1961,6 +1972,8 @@ static const TypeInfo virtio_blk_pci_info = {
>>  static Property virtio_scsi_pci_properties[] = {
>>      DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>>                      VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
>> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
>> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>>      DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors,
>>                         DEV_NVECTORS_UNSPECIFIED),
>>      DEFINE_PROP_END_OF_LIST(),
>> @@ -2175,6 +2188,8 @@ static void virtio_serial_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
>>  static Property virtio_serial_pci_properties[] = {
>>      DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>>                      VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
>> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
>> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>>      DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
>>      DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
>>      DEFINE_PROP_END_OF_LIST(),
>> @@ -2215,6 +2230,8 @@ static const TypeInfo virtio_serial_pci_info = {
>>  static Property virtio_net_properties[] = {
>>      DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>>                      VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, false),
>> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
>> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>>      DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 3),
>>      DEFINE_PROP_END_OF_LIST(),
>>  };
>> diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
>> index ffb74bb..aecd4eb 100644
>> --- a/hw/virtio/virtio-pci.h
>> +++ b/hw/virtio/virtio-pci.h
>> @@ -86,6 +86,12 @@ typedef struct VirtioBusClass VirtioPCIBusClass;
>>  #define VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY \
>>      (1 << VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY_BIT)
>>
>> +/* Where vhost-user implementation exists, using the guest notifier mask
>> + * feature can lead to improper interrupt management. Add a flag to
>> + * allow to disable this guest notifier mask if desired */
>> +#define VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT 6
>> +#define VIRTIO_PCI_FLAG_USE_NOTIFIERMASK (1 << VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT)
>> +
>>  typedef struct {
>>      MSIMessage msg;
>>      int virq;
>> --
>> 2.1.4
>>
>>
>
>
>
> --
> Marc-André Lureau



-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] virtio-pci: add an option to bypass guest_notifier_mask
  2015-12-07 13:59     ` Marc-André Lureau
@ 2015-12-09 15:06       ` Didier Pallard
  0 siblings, 0 replies; 33+ messages in thread
From: Didier Pallard @ 2015-12-09 15:06 UTC (permalink / raw)
  To: Marc-André Lureau; +Cc: Thibaut Collet, Jean-Mickael Guerin, QEMU

On 12/07/2015 02:59 PM, Marc-André Lureau wrote:
> Hi
>
> On Mon, Dec 7, 2015 at 2:37 PM, Marc-André Lureau
> <marcandre.lureau@gmail.com> wrote:
>> Hi
>>
>> On Thu, Dec 3, 2015 at 10:53 AM, Didier Pallard
>> <didier.pallard@6wind.com> wrote:
>>> Using guest_notifier_mask function in vhost-user case may
>>> break interrupt mask paradigm, because mask/unmask is not
>>> really done when returning from guest_notifier_mask call, instead
>>> message is posted in a unix socket, and processed later.
>>>
>>> Add an option bit to disable the use of guest_notifier_mask
>>> in virtio pci.
>> Why not disabling it whenever vhost-user is used if it's broken and
>> inefficient to be fixed?
> Sorry, I missed the following patch.
>
> I am not sure it's necessary to have a user-visible property,
> especially if it's overwritten later. But apparently, other properties
> are already forced.
>
> looks good overall to me, but maintainers might have different opinions.
>
>>> Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
>>> Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
>>> ---
>>>   hw/virtio/virtio-pci.c | 29 +++++++++++++++++++++++------
>>>   hw/virtio/virtio-pci.h |  6 ++++++
>>>   2 files changed, 29 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
>>> index dd48562..26bb617 100644
>>> --- a/hw/virtio/virtio-pci.c
>>> +++ b/hw/virtio/virtio-pci.c
>>> @@ -806,7 +806,8 @@ static int kvm_virtio_pci_vector_use(VirtIOPCIProxy *proxy, int nvqs)
>>>           /* If guest supports masking, set up irqfd now.
>>>            * Otherwise, delay until unmasked in the frontend.
>>>            */
>>> -        if (k->guest_notifier_mask) {
>>> +        if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
>>> +            k->guest_notifier_mask) {
>>>               ret = kvm_virtio_pci_irqfd_use(proxy, queue_no, vector);
>>>               if (ret < 0) {
>>>                   kvm_virtio_pci_vq_vector_release(proxy, vector);
>>> @@ -822,7 +823,8 @@ undo:
>>>           if (vector >= msix_nr_vectors_allocated(dev)) {
>>>               continue;
>>>           }
>>> -        if (k->guest_notifier_mask) {
>>> +        if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
>>> +            k->guest_notifier_mask) {
>>>               kvm_virtio_pci_irqfd_release(proxy, queue_no, vector);
>>>           }
>>>           kvm_virtio_pci_vq_vector_release(proxy, vector);
>>> @@ -849,7 +851,8 @@ static void kvm_virtio_pci_vector_release(VirtIOPCIProxy *proxy, int nvqs)
>>>           /* If guest supports masking, clean up irqfd now.
>>>            * Otherwise, it was cleaned when masked in the frontend.
>>>            */
>>> -        if (k->guest_notifier_mask) {
>>> +        if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
>>> +            k->guest_notifier_mask) {
>>>               kvm_virtio_pci_irqfd_release(proxy, queue_no, vector);
>>>           }
>>>           kvm_virtio_pci_vq_vector_release(proxy, vector);
>>> @@ -882,7 +885,8 @@ static int virtio_pci_vq_vector_unmask(VirtIOPCIProxy *proxy,
>>>       /* If guest supports masking, irqfd is already setup, unmask it.
>>>        * Otherwise, set it up now.
>>>        */
>>> -    if (k->guest_notifier_mask) {
>>> +    if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
>>> +        k->guest_notifier_mask) {
>>>           k->guest_notifier_mask(vdev, queue_no, false);
>>>           /* Test after unmasking to avoid losing events. */
>>>           if (k->guest_notifier_pending &&
>>> @@ -905,7 +909,8 @@ static void virtio_pci_vq_vector_mask(VirtIOPCIProxy *proxy,
>>>       /* If guest supports masking, keep irqfd but mask it.
>>>        * Otherwise, clean it up now.
>>>        */
>>> -    if (k->guest_notifier_mask) {
>>> +    if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
>>> +        k->guest_notifier_mask) {
>>>           k->guest_notifier_mask(vdev, queue_no, true);
>>>       } else {
>>>           kvm_virtio_pci_irqfd_release(proxy, queue_no, vector);
>>> @@ -1022,7 +1027,9 @@ static int virtio_pci_set_guest_notifier(DeviceState *d, int n, bool assign,
>>>           event_notifier_cleanup(notifier);
>>>       }
>>>
>>> -    if (!msix_enabled(&proxy->pci_dev) && vdc->guest_notifier_mask) {
>>> +    if (!msix_enabled(&proxy->pci_dev) &&
>>> +        (proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
>>> +        vdc->guest_notifier_mask) {
>>>           vdc->guest_notifier_mask(vdev, n, !assign);
>>>       }
>>>
>>> @@ -1164,6 +1171,8 @@ static void virtio_9p_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
>>>   static Property virtio_9p_pci_properties[] = {
>>>       DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>>>                       VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
>>> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
>>> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>>>       DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
>>>       DEFINE_PROP_END_OF_LIST(),
>>>   };
>>> @@ -1908,6 +1917,8 @@ static Property virtio_blk_pci_properties[] = {
>>>       DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
>>>       DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>>>                       VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
>>> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
>>> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>>>       DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
>>>       DEFINE_PROP_END_OF_LIST(),
>>>   };
>>> @@ -1961,6 +1972,8 @@ static const TypeInfo virtio_blk_pci_info = {
>>>   static Property virtio_scsi_pci_properties[] = {
>>>       DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>>>                       VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
>>> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
>>> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>>>       DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors,
>>>                          DEV_NVECTORS_UNSPECIFIED),
>>>       DEFINE_PROP_END_OF_LIST(),
>>> @@ -2175,6 +2188,8 @@ static void virtio_serial_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
>>>   static Property virtio_serial_pci_properties[] = {
>>>       DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>>>                       VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
>>> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
>>> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>>>       DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
>>>       DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
>>>       DEFINE_PROP_END_OF_LIST(),
>>> @@ -2215,6 +2230,8 @@ static const TypeInfo virtio_serial_pci_info = {
>>>   static Property virtio_net_properties[] = {
>>>       DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>>>                       VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, false),
>>> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
>>> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>>>       DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 3),
>>>       DEFINE_PROP_END_OF_LIST(),
>>>   };
>>> diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
>>> index ffb74bb..aecd4eb 100644
>>> --- a/hw/virtio/virtio-pci.h
>>> +++ b/hw/virtio/virtio-pci.h
>>> @@ -86,6 +86,12 @@ typedef struct VirtioBusClass VirtioPCIBusClass;
>>>   #define VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY \
>>>       (1 << VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY_BIT)
>>>
>>> +/* Where vhost-user implementation exists, using the guest notifier mask
>>> + * feature can lead to improper interrupt management. Add a flag to
>>> + * allow to disable this guest notifier mask if desired */
>>> +#define VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT 6
>>> +#define VIRTIO_PCI_FLAG_USE_NOTIFIERMASK (1 << VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT)
>>> +
>>>   typedef struct {
>>>       MSIMessage msg;
>>>       int virq;
>>> --
>>> 2.1.4
>>>
>>>
>>
>>
>> --
>> Marc-André Lureau

Hi marc-André,

thanks for reviewing.

> I am not sure it's necessary to have a user-visible property,

I make this a user visible property, because I thought it should perhaps be
   useful to have the ability to force this behaviour on some non-pci 
bus as well,
   (since the patch that forces the option only applies to PCI buses) 
and I have
   no knowledge of non PCI buses in qemu or if vhost-user can be used on 
such
   buses.

Didier

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full
  2015-12-07 13:31   ` Marc-André Lureau
@ 2015-12-09 15:59     ` Victor Kaplansky
  2015-12-09 17:06       ` Didier Pallard
  0 siblings, 1 reply; 33+ messages in thread
From: Victor Kaplansky @ 2015-12-09 15:59 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Thibaut Collet, Jean-Mickael Guerin, QEMU, Didier Pallard,
	Michael S. Tsirkin

On Mon, Dec 07, 2015 at 02:31:36PM +0100, Marc-André Lureau wrote:
> Hi
> 
> On Thu, Dec 3, 2015 at 10:53 AM, Didier Pallard
> <didier.pallard@6wind.com> wrote:
> > unix_send_msgfds is used by vhost-user control socket. qemu_chr_fe_write_all
> > is used to send a message and retries as long as EAGAIN errno is set,
> > but write_msgfds buffer is freed after first EAGAIN failure, causing
> > message to be sent without proper fds attachment.
> >
> > In case unix_send_msgfds is called through qemu_chr_fe_write, it will be
> > user responsability to resend message as is or to free write_msgfds
> > using set_msgfds(0)
> >
> > Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
> > Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
> > ---
> >  qemu-char.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/qemu-char.c b/qemu-char.c
> > index 5448b0f..26d5f2e 100644
> > --- a/qemu-char.c
> > +++ b/qemu-char.c
> > @@ -2614,6 +2614,16 @@ static int unix_send_msgfds(CharDriverState *chr, const uint8_t *buf, int len)
> >          r = sendmsg(s->fd, &msgh, 0);
> >      } while (r < 0 && errno == EINTR);
> >
> > +    /* Ancillary data are not sent if no byte is written
> > +     * so don't free msgfds buffer if return value is EAGAIN
> > +     * If called from qemu_chr_fe_write_all retry will come soon
> > +     * If called from qemu_chr_fe_write, it is the user responsibility
> > +     * to resend message or free fds using set_msgfds(0)
> > +     */
> > +    if (r < 0 && errno == EAGAIN) {
> > +        return r;
> > +    }
> > +
> 
> This looks reasonable to me. However, I don't know what happens with
> partial write of ancillary data. Hopefully it's all or nothing.
> Apparently, reading unix_stream_sendmsg() in kernel shows that as long
> as a few bytes have been sent, the ancillary data is sent. So it looks
> like it still does the right thing in case of a partial write.

If I may put my two cents in, it looks to me very similar to an
fd leakage on back-end side. When a new set_call_fd request
arrives, it is very easy to forget closing the previous file
descriptor. As result, if interrupts are actively maksed/unmasked
by the guest, the back-end can easily reach maximum fds, which
will cause receiving side silently drop new fds in aux data.
--Victor

> 
> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> 
> >      /* free the written msgfds, no matter what */
> >      if (s->write_msgfds_num) {
> >          g_free(s->write_msgfds);
> > --
> > 2.1.4
> >
> >
> 
> 
> 
> -- 
> Marc-André Lureau
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full
  2015-12-09 15:59     ` Victor Kaplansky
@ 2015-12-09 17:06       ` Didier Pallard
  2015-12-10 12:56         ` Victor Kaplansky
  0 siblings, 1 reply; 33+ messages in thread
From: Didier Pallard @ 2015-12-09 17:06 UTC (permalink / raw)
  To: Victor Kaplansky, Marc-André Lureau
  Cc: Thibaut Collet, Jean-Mickael Guerin, pbonzini, QEMU, Michael S. Tsirkin

On 12/09/2015 04:59 PM, Victor Kaplansky wrote:
> On Mon, Dec 07, 2015 at 02:31:36PM +0100, Marc-André Lureau wrote:
>> Hi
>>
>> On Thu, Dec 3, 2015 at 10:53 AM, Didier Pallard
>> <didier.pallard@6wind.com> wrote:
>>> unix_send_msgfds is used by vhost-user control socket. qemu_chr_fe_write_all
>>> is used to send a message and retries as long as EAGAIN errno is set,
>>> but write_msgfds buffer is freed after first EAGAIN failure, causing
>>> message to be sent without proper fds attachment.
>>>
>>> In case unix_send_msgfds is called through qemu_chr_fe_write, it will be
>>> user responsability to resend message as is or to free write_msgfds
>>> using set_msgfds(0)
>>>
>>> Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
>>> Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
>>> ---
>>>   qemu-char.c | 10 ++++++++++
>>>   1 file changed, 10 insertions(+)
>>>
>>> diff --git a/qemu-char.c b/qemu-char.c
>>> index 5448b0f..26d5f2e 100644
>>> --- a/qemu-char.c
>>> +++ b/qemu-char.c
>>> @@ -2614,6 +2614,16 @@ static int unix_send_msgfds(CharDriverState *chr, const uint8_t *buf, int len)
>>>           r = sendmsg(s->fd, &msgh, 0);
>>>       } while (r < 0 && errno == EINTR);
>>>
>>> +    /* Ancillary data are not sent if no byte is written
>>> +     * so don't free msgfds buffer if return value is EAGAIN
>>> +     * If called from qemu_chr_fe_write_all retry will come soon
>>> +     * If called from qemu_chr_fe_write, it is the user responsibility
>>> +     * to resend message or free fds using set_msgfds(0)
>>> +     */
>>> +    if (r < 0 && errno == EAGAIN) {
>>> +        return r;
>>> +    }
>>> +
>>
>> This looks reasonable to me. However, I don't know what happens with
>> partial write of ancillary data. Hopefully it's all or nothing.
>> Apparently, reading unix_stream_sendmsg() in kernel shows that as long
>> as a few bytes have been sent, the ancillary data is sent. So it looks
>> like it still does the right thing in case of a partial write.
>
> If I may put my two cents in, it looks to me very similar to an
> fd leakage on back-end side. When a new set_call_fd request
> arrives, it is very easy to forget closing the previous file
> descriptor. As result, if interrupts are actively maksed/unmasked
> by the guest, the back-end can easily reach maximum fds, which
> will cause receiving side silently drop new fds in aux data.
> --Victor
>

Hi victor,
This is not a problem of fd exausted. This was my first axe of 
investigation, but fd management is correct in our vhost-user backend, 
there is no fd leakage.
And i guess you are refering to the problem fixed by patches 2 and 3, 
since the problem corrected by this patch is a message arriving from 
qemu without ancillary data, whatever the state of the fds in the 
vhost-user backend.

thanks
didier

>>
>> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>>
>>>       /* free the written msgfds, no matter what */
>>>       if (s->write_msgfds_num) {
>>>           g_free(s->write_msgfds);
>>> --
>>> 2.1.4
>>>
>>>
>>
>>
>>
>> --
>> Marc-André Lureau
>>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full
  2015-12-09 17:06       ` Didier Pallard
@ 2015-12-10 12:56         ` Victor Kaplansky
  2015-12-10 15:09           ` Didier Pallard
  0 siblings, 1 reply; 33+ messages in thread
From: Victor Kaplansky @ 2015-12-10 12:56 UTC (permalink / raw)
  To: Didier Pallard
  Cc: Michael S. Tsirkin, Thibaut Collet, Jean-Mickael Guerin, QEMU,
	Marc-André Lureau, pbonzini

On Wed, Dec 09, 2015 at 06:06:06PM +0100, Didier Pallard wrote:
> On 12/09/2015 04:59 PM, Victor Kaplansky wrote:
> >On Mon, Dec 07, 2015 at 02:31:36PM +0100, Marc-André Lureau wrote:
> >>Hi
> >>
> >>On Thu, Dec 3, 2015 at 10:53 AM, Didier Pallard
> >><didier.pallard@6wind.com> wrote:
> >>>unix_send_msgfds is used by vhost-user control socket. qemu_chr_fe_write_all
> >>>is used to send a message and retries as long as EAGAIN errno is set,
> >>>but write_msgfds buffer is freed after first EAGAIN failure, causing
> >>>message to be sent without proper fds attachment.
> >>>
> >>>In case unix_send_msgfds is called through qemu_chr_fe_write, it will be
> >>>user responsability to resend message as is or to free write_msgfds
> >>>using set_msgfds(0)
> >>>
> >>>Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
> >>>Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
> >>>---
> >>>  qemu-char.c | 10 ++++++++++
> >>>  1 file changed, 10 insertions(+)
> >>>
> >>>diff --git a/qemu-char.c b/qemu-char.c
> >>>index 5448b0f..26d5f2e 100644
> >>>--- a/qemu-char.c
> >>>+++ b/qemu-char.c
> >>>@@ -2614,6 +2614,16 @@ static int unix_send_msgfds(CharDriverState *chr, const uint8_t *buf, int len)
> >>>          r = sendmsg(s->fd, &msgh, 0);
> >>>      } while (r < 0 && errno == EINTR);
> >>>
> >>>+    /* Ancillary data are not sent if no byte is written
> >>>+     * so don't free msgfds buffer if return value is EAGAIN
> >>>+     * If called from qemu_chr_fe_write_all retry will come soon
> >>>+     * If called from qemu_chr_fe_write, it is the user responsibility
> >>>+     * to resend message or free fds using set_msgfds(0)
> >>>+     */
> >>>+    if (r < 0 && errno == EAGAIN) {
> >>>+        return r;
> >>>+    }
> >>>+
> >>
> >>This looks reasonable to me. However, I don't know what happens with
> >>partial write of ancillary data. Hopefully it's all or nothing.
> >>Apparently, reading unix_stream_sendmsg() in kernel shows that as long
> >>as a few bytes have been sent, the ancillary data is sent. So it looks
> >>like it still does the right thing in case of a partial write.
> >
> >If I may put my two cents in, it looks to me very similar to an
> >fd leakage on back-end side. When a new set_call_fd request
> >arrives, it is very easy to forget closing the previous file
> >descriptor. As result, if interrupts are actively maksed/unmasked
> >by the guest, the back-end can easily reach maximum fds, which
> >will cause receiving side silently drop new fds in aux data.
> >--Victor
> >
> 
> Hi victor,
> This is not a problem of fd exausted. This was my first axe of
> investigation, but fd management is correct in our vhost-user backend, there
> is no fd leakage.

That's good.

> And i guess you are refering to the problem fixed by patches 2 and 3, since
> the problem corrected by this patch is a message arriving from qemu without
> ancillary data, whatever the state of the fds in the vhost-user backend.

I'm talking about the problem that supposed to be fixed by the
first patch. It is not clear to me how the patch fixes the
partial send. sendmsg() is called in qemu-char.c:unix_send_msgfds
with zero flags, which means a blocking operation, so I'm
surprised that sendmsg can return with errno == EAGAIN. 

> 
> thanks
> didier
> 
> >>
> >>Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> >>
> >>>      /* free the written msgfds, no matter what */
> >>>      if (s->write_msgfds_num) {
> >>>          g_free(s->write_msgfds);
> >>>--
> >>>2.1.4
> >>>
> >>>
> >>
> >>
> >>
> >>--
> >>Marc-André Lureau
> >>
> 
> 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full
  2015-12-10 12:56         ` Victor Kaplansky
@ 2015-12-10 15:09           ` Didier Pallard
  2015-12-17 14:41             ` Victor Kaplansky
  0 siblings, 1 reply; 33+ messages in thread
From: Didier Pallard @ 2015-12-10 15:09 UTC (permalink / raw)
  To: Victor Kaplansky
  Cc: Michael S. Tsirkin, Thibaut Collet, Jean-Mickael Guerin, QEMU,
	Marc-André Lureau, pbonzini

On 12/10/2015 01:56 PM, Victor Kaplansky wrote:
> On Wed, Dec 09, 2015 at 06:06:06PM +0100, Didier Pallard wrote:
>> On 12/09/2015 04:59 PM, Victor Kaplansky wrote:
>>> On Mon, Dec 07, 2015 at 02:31:36PM +0100, Marc-André Lureau wrote:
>>>> Hi
>>>>
>>>> On Thu, Dec 3, 2015 at 10:53 AM, Didier Pallard
>>>> <didier.pallard@6wind.com> wrote:
>>>>> unix_send_msgfds is used by vhost-user control socket. qemu_chr_fe_write_all
>>>>> is used to send a message and retries as long as EAGAIN errno is set,
>>>>> but write_msgfds buffer is freed after first EAGAIN failure, causing
>>>>> message to be sent without proper fds attachment.
>>>>>
>>>>> In case unix_send_msgfds is called through qemu_chr_fe_write, it will be
>>>>> user responsability to resend message as is or to free write_msgfds
>>>>> using set_msgfds(0)
>>>>>
>>>>> Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
>>>>> Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
>>>>> ---
>>>>>   qemu-char.c | 10 ++++++++++
>>>>>   1 file changed, 10 insertions(+)
>>>>>
>>>>> diff --git a/qemu-char.c b/qemu-char.c
>>>>> index 5448b0f..26d5f2e 100644
>>>>> --- a/qemu-char.c
>>>>> +++ b/qemu-char.c
>>>>> @@ -2614,6 +2614,16 @@ static int unix_send_msgfds(CharDriverState *chr, const uint8_t *buf, int len)
>>>>>           r = sendmsg(s->fd, &msgh, 0);
>>>>>       } while (r < 0 && errno == EINTR);
>>>>>
>>>>> +    /* Ancillary data are not sent if no byte is written
>>>>> +     * so don't free msgfds buffer if return value is EAGAIN
>>>>> +     * If called from qemu_chr_fe_write_all retry will come soon
>>>>> +     * If called from qemu_chr_fe_write, it is the user responsibility
>>>>> +     * to resend message or free fds using set_msgfds(0)
>>>>> +     */
>>>>> +    if (r < 0 && errno == EAGAIN) {
>>>>> +        return r;
>>>>> +    }
>>>>> +
>>>>
>>>> This looks reasonable to me. However, I don't know what happens with
>>>> partial write of ancillary data. Hopefully it's all or nothing.
>>>> Apparently, reading unix_stream_sendmsg() in kernel shows that as long
>>>> as a few bytes have been sent, the ancillary data is sent. So it looks
>>>> like it still does the right thing in case of a partial write.
>>>
>>> If I may put my two cents in, it looks to me very similar to an
>>> fd leakage on back-end side. When a new set_call_fd request
>>> arrives, it is very easy to forget closing the previous file
>>> descriptor. As result, if interrupts are actively maksed/unmasked
>>> by the guest, the back-end can easily reach maximum fds, which
>>> will cause receiving side silently drop new fds in aux data.
>>> --Victor
>>>
>>
>> Hi victor,
>> This is not a problem of fd exausted. This was my first axe of
>> investigation, but fd management is correct in our vhost-user backend, there
>> is no fd leakage.
>
> That's good.
>
>> And i guess you are refering to the problem fixed by patches 2 and 3, since
>> the problem corrected by this patch is a message arriving from qemu without
>> ancillary data, whatever the state of the fds in the vhost-user backend.
>
> I'm talking about the problem that supposed to be fixed by the
> first patch. It is not clear to me how the patch fixes the
> partial send. sendmsg() is called in qemu-char.c:unix_send_msgfds
> with zero flags, which means a blocking operation, so I'm
> surprised that sendmsg can return with errno == EAGAIN.
>

Well, vhost-user socket is started with following chardev:
-chardev socket,id=vhostuserchr0,path=/tmp/vhost_sock0,server
and according to code in tcp_chr_add_client:
static int tcp_chr_add_client(CharDriverState *chr, int fd)
{
...
     qemu_set_nonblock(fd);

So fd is set in non blocking mode. This is enough to have an
EAGAIN returned value on socket buffer full, whatever flags used in 
sendmsg, i think.
Perhaps changing the blocking mode here may also correct the first 
problem, but I am not able to measure the impact that may have such a 
modification...


>>
>> thanks
>> didier
>>
>>>>
>>>> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>>>>
>>>>>       /* free the written msgfds, no matter what */
>>>>>       if (s->write_msgfds_num) {
>>>>>           g_free(s->write_msgfds);
>>>>> --
>>>>> 2.1.4
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Marc-André Lureau
>>>>
>>
>>
>>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full
  2015-12-10 15:09           ` Didier Pallard
@ 2015-12-17 14:41             ` Victor Kaplansky
  0 siblings, 0 replies; 33+ messages in thread
From: Victor Kaplansky @ 2015-12-17 14:41 UTC (permalink / raw)
  To: Didier Pallard
  Cc: Michael S. Tsirkin, Thibaut Collet, Jean-Mickael Guerin, QEMU,
	Marc-André Lureau, pbonzini

On Thu, Dec 10, 2015 at 04:09:23PM +0100, Didier Pallard wrote:
> On 12/10/2015 01:56 PM, Victor Kaplansky wrote:
> >On Wed, Dec 09, 2015 at 06:06:06PM +0100, Didier Pallard wrote:
> >>On 12/09/2015 04:59 PM, Victor Kaplansky wrote:
> >>>On Mon, Dec 07, 2015 at 02:31:36PM +0100, Marc-André Lureau wrote:
> >>>>Hi
> >>>>
> >>>>On Thu, Dec 3, 2015 at 10:53 AM, Didier Pallard
> >>>><didier.pallard@6wind.com> wrote:
> >>>>>unix_send_msgfds is used by vhost-user control socket. qemu_chr_fe_write_all
> >>>>>is used to send a message and retries as long as EAGAIN errno is set,
> >>>>>but write_msgfds buffer is freed after first EAGAIN failure, causing
> >>>>>message to be sent without proper fds attachment.
> >>>>>
> >>>>>In case unix_send_msgfds is called through qemu_chr_fe_write, it will be
> >>>>>user responsability to resend message as is or to free write_msgfds
> >>>>>using set_msgfds(0)
> >>>>>
> >>>>>Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
> >>>>>Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
> >>>>>---
> >>>>>  qemu-char.c | 10 ++++++++++
> >>>>>  1 file changed, 10 insertions(+)
> >>>>>
> >>>>>diff --git a/qemu-char.c b/qemu-char.c
> >>>>>index 5448b0f..26d5f2e 100644
> >>>>>--- a/qemu-char.c
> >>>>>+++ b/qemu-char.c
> >>>>>@@ -2614,6 +2614,16 @@ static int unix_send_msgfds(CharDriverState *chr, const uint8_t *buf, int len)
> >>>>>          r = sendmsg(s->fd, &msgh, 0);
> >>>>>      } while (r < 0 && errno == EINTR);
> >>>>>
> >>>>>+    /* Ancillary data are not sent if no byte is written
> >>>>>+     * so don't free msgfds buffer if return value is EAGAIN
> >>>>>+     * If called from qemu_chr_fe_write_all retry will come soon
> >>>>>+     * If called from qemu_chr_fe_write, it is the user responsibility
> >>>>>+     * to resend message or free fds using set_msgfds(0)
> >>>>>+     */
> >>>>>+    if (r < 0 && errno == EAGAIN) {
> >>>>>+        return r;
> >>>>>+    }
> >>>>>+
> >>>>
> >>>>This looks reasonable to me. However, I don't know what happens with
> >>>>partial write of ancillary data. Hopefully it's all or nothing.
> >>>>Apparently, reading unix_stream_sendmsg() in kernel shows that as long
> >>>>as a few bytes have been sent, the ancillary data is sent. So it looks
> >>>>like it still does the right thing in case of a partial write.
> >>>
> >>>If I may put my two cents in, it looks to me very similar to an
> >>>fd leakage on back-end side. When a new set_call_fd request
> >>>arrives, it is very easy to forget closing the previous fil
> >>>descriptor. As result, if interrupts are actively maksed/unmasked
> >>>by the guest, the back-end can easily reach maximum fds, which
> >>>will cause receiving side silently drop new fds in aux data.
> >>>--Victor
> >>>
> >>
> >>Hi victor,
> >>This is not a problem of fd exausted. This was my first axe of
> >>investigation, but fd management is correct in our vhost-user backend, there
> >>is no fd leakage.
> >
> >That's good.
> >
> >>And i guess you are refering to the problem fixed by patches 2 and 3, since
> >>the problem corrected by this patch is a message arriving from qemu without
> >>ancillary data, whatever the state of the fds in the vhost-user backend.
> >
> >I'm talking about the problem that supposed to be fixed by the
> >first patch. It is not clear to me how the patch fixes the
> >partial send. sendmsg() is called in qemu-char.c:unix_send_msgfds
> >with zero flags, which means a blocking operation, so I'm
> >surprised that sendmsg can return with errno == EAGAIN.
> >
> 
> Well, vhost-user socket is started with following chardev:
> -chardev socket,id=vhostuserchr0,path=/tmp/vhost_sock0,server
> and according to code in tcp_chr_add_client:
> static int tcp_chr_add_client(CharDriverState *chr, int fd)
> {
> ...
>     qemu_set_nonblock(fd);
> 
> So fd is set in non blocking mode. This is enough to have an
> EAGAIN returned value on socket buffer full, whatever flags used in sendmsg,
> i think.
> Perhaps changing the blocking mode here may also correct the first problem,
> but I am not able to measure the impact that may have such a modification...

Thanks for the clarification. I was able to reproduce both issues
- with send_msgfds() partial send and lost interrupts using code
based on vhost-user-bridge test application. 
I'm working on a fix for the lost interrupts. Will send an RFC
patch by the Sunday.

-- Victor
> 
> 
> >>
> >>thanks
> >>didier
> >>
> >>>>
> >>>>Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> >>>>
> >>>>>      /* free the written msgfds, no matter what */
> >>>>>      if (s->write_msgfds_num) {
> >>>>>          g_free(s->write_msgfds);
> >>>>>--
> >>>>>2.1.4
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>>--
> >>>>Marc-André Lureau
> >>>>
> >>
> >>
> >>
> 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] Linux vhost-user interrupt management fixes
  2015-12-03  9:53 [Qemu-devel] Linux vhost-user interrupt management fixes Didier Pallard
                   ` (3 preceding siblings ...)
  2015-12-04 10:04 ` [Qemu-devel] Linux vhost-user interrupt management fixes Didier Pallard
@ 2016-01-25  9:22 ` Victor Kaplansky
  2016-01-26  9:25   ` Didier Pallard
  4 siblings, 1 reply; 33+ messages in thread
From: Victor Kaplansky @ 2016-01-25  9:22 UTC (permalink / raw)
  To: Didier Pallard; +Cc: thibaut.collet, jmg, qemu-devel

On Thu, Dec 03, 2015 at 10:53:16AM +0100, Didier Pallard wrote:
> Hi,
> 
> I recently did some stress tests of a vhost-user interface using an UDP
> traffic generator. Traffic generator was connected to 2 physical ports
> that are in turn connected to 2 virtio ports through a linux bridge, VM
> (running linux) doing routing to forward packets between the 2 virtio ports.
> When traffic reaches high pps rates of small packets, I faced the 2 following
> problems:
> 
> - at some time, my qemu socket becomes full, causing qemu to send incomplete
> SET_VRING_CALL messages to vhost-user backend (without proper fd set in
> ancillary data).
> - after some time, some interrupts are lost, causing the VM to stop
> transmitting packets.
> 
> Both problems come from the fact that interrupt masking/unmasking of the VM
> is deferred to vhost-user backend through the linux socket.
> First problem comes from the fact that socket buffer gets full; it is corrected
> in the first patch of the serie.
> Second problem is a bit more complex. From what i understand of the code,
> when VM wants to mask/unmask interrupts, qemu traps the command and sends a
> SET_VRING_CALL to the vhost-user to swap interrupt notifications either to
> a dummy descriptor or to an fd that was given to kvm module to route
> interruption to the VM. After sending SET_VRING_CALL message through
> the socket, VM code continues to run, assuming that the interrupts are now
> masked; but due to linux socket, this message may be buffered and not currently
> treated by the vhost-user backend, ie interrupts are not really masked/unmasked
> as they ought to be.
> I think it can be solved in two different ways:
> - by waiting for an acknowledgement of vhost-user backend before exiting
> interrupt masking function, this ensures that interrupt masking is correctly
> taken into account before giving back hand to the VM, but it has a very high
> cost in term of cycles. Moreover, unless specifying a new option, it will
> break current API, since existing vhost-user implementations do not
> expect a return message on a SET_VRING_CALL call.
> - second way could be, in case vhost-user is involved, to restore the
> initial behaviour of interrupt masking (ie masking/unmasking interrupts
> by taking/giving vring_call fd from/to kernel kvm module).
> Patches 2 and 3 of the serie propose an implementation of this second option.
> 
> Didier
> 

Tested-by: Victor Kaplansky <victork@redhat.com>    

> Didier Pallard (3):
>   char: fix vhost-user socket full
>   virtio-pci: add an option to bypass guest_notifier_mask
>   vhost-net: force guest_notifier_mask bypass in vhost-user case
> 
>  hw/net/vhost_net.c     | 19 ++++++++++++++++++-
>  hw/virtio/vhost.c      | 13 +++++++++++++
>  hw/virtio/virtio-pci.c | 29 +++++++++++++++++++++++------
>  hw/virtio/virtio-pci.h |  6 ++++++
>  qemu-char.c            | 10 ++++++++++
>  5 files changed, 70 insertions(+), 7 deletions(-)
> 
> -- 
> 2.1.4
> 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] Linux vhost-user interrupt management fixes
  2016-01-25  9:22 ` Victor Kaplansky
@ 2016-01-26  9:25   ` Didier Pallard
  0 siblings, 0 replies; 33+ messages in thread
From: Didier Pallard @ 2016-01-26  9:25 UTC (permalink / raw)
  To: Victor Kaplansky; +Cc: thibaut.collet, jmg, qemu-devel

On 01/25/2016 10:22 AM, Victor Kaplansky wrote:
> On Thu, Dec 03, 2015 at 10:53:16AM +0100, Didier Pallard wrote:
>> Hi,
>>
>> I recently did some stress tests of a vhost-user interface using an UDP
>> traffic generator. Traffic generator was connected to 2 physical ports
>> that are in turn connected to 2 virtio ports through a linux bridge, VM
>> (running linux) doing routing to forward packets between the 2 virtio ports.
>> When traffic reaches high pps rates of small packets, I faced the 2 following
>> problems:
>>
>> - at some time, my qemu socket becomes full, causing qemu to send incomplete
>> SET_VRING_CALL messages to vhost-user backend (without proper fd set in
>> ancillary data).
>> - after some time, some interrupts are lost, causing the VM to stop
>> transmitting packets.
>>
>> Both problems come from the fact that interrupt masking/unmasking of the VM
>> is deferred to vhost-user backend through the linux socket.
>> First problem comes from the fact that socket buffer gets full; it is corrected
>> in the first patch of the serie.
>> Second problem is a bit more complex. From what i understand of the code,
>> when VM wants to mask/unmask interrupts, qemu traps the command and sends a
>> SET_VRING_CALL to the vhost-user to swap interrupt notifications either to
>> a dummy descriptor or to an fd that was given to kvm module to route
>> interruption to the VM. After sending SET_VRING_CALL message through
>> the socket, VM code continues to run, assuming that the interrupts are now
>> masked; but due to linux socket, this message may be buffered and not currently
>> treated by the vhost-user backend, ie interrupts are not really masked/unmasked
>> as they ought to be.
>> I think it can be solved in two different ways:
>> - by waiting for an acknowledgement of vhost-user backend before exiting
>> interrupt masking function, this ensures that interrupt masking is correctly
>> taken into account before giving back hand to the VM, but it has a very high
>> cost in term of cycles. Moreover, unless specifying a new option, it will
>> break current API, since existing vhost-user implementations do not
>> expect a return message on a SET_VRING_CALL call.
>> - second way could be, in case vhost-user is involved, to restore the
>> initial behaviour of interrupt masking (ie masking/unmasking interrupts
>> by taking/giving vring_call fd from/to kernel kvm module).
>> Patches 2 and 3 of the serie propose an implementation of this second option.
>>
>> Didier
>>
> Tested-by: Victor Kaplansky <victork@redhat.com>
>
>> Didier Pallard (3):
>>    char: fix vhost-user socket full
>>    virtio-pci: add an option to bypass guest_notifier_mask
>>    vhost-net: force guest_notifier_mask bypass in vhost-user case
>>
>>   hw/net/vhost_net.c     | 19 ++++++++++++++++++-
>>   hw/virtio/vhost.c      | 13 +++++++++++++
>>   hw/virtio/virtio-pci.c | 29 +++++++++++++++++++++++------
>>   hw/virtio/virtio-pci.h |  6 ++++++
>>   qemu-char.c            | 10 ++++++++++
>>   5 files changed, 70 insertions(+), 7 deletions(-)
>>
>> -- 
>> 2.1.4
>>
>>

thanks for the tests, victor

didier

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] vhost-net: force guest_notifier_mask bypass in vhost-user case
  2015-12-03  9:53 ` [Qemu-devel] [PATCH 3/3] vhost-net: force guest_notifier_mask bypass in vhost-user case Didier Pallard
@ 2016-02-04 13:06   ` Michael S. Tsirkin
  0 siblings, 0 replies; 33+ messages in thread
From: Michael S. Tsirkin @ 2016-02-04 13:06 UTC (permalink / raw)
  To: Didier Pallard; +Cc: thibaut.collet, jmg, qemu-devel

On Thu, Dec 03, 2015 at 10:53:19AM +0100, Didier Pallard wrote:
> Since guest_mask_notifier can not be used in vhost-user
> mode due to buffering implied by unix control socket,
> force VIRTIO_PCI_FLAG_USE_NOTIFIERMASK on virtio pci
> of vhost-user interfaces, and send correct callfd
> to the guest at vhost start.
> 
> Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
> Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>

I queued this now so we have a bugfix, but I think we
should clean this up using a property and avoid
depending on virtio pci.

> ---
>  hw/net/vhost_net.c | 19 ++++++++++++++++++-
>  hw/virtio/vhost.c  | 13 +++++++++++++
>  2 files changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 318c3e6..74318dc 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -36,9 +36,11 @@
>  #include <stdio.h>
>  
>  #include "standard-headers/linux/virtio_ring.h"
> +#include "hw/pci/pci.h"
>  #include "hw/virtio/vhost.h"
>  #include "hw/virtio/virtio-bus.h"
>  #include "hw/virtio/virtio-access.h"
> +#include "hw/virtio/virtio-pci.h"
>  
>  struct vhost_net {
>      struct vhost_dev dev;
> @@ -314,7 +316,22 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>      }
>  
>      for (i = 0; i < total_queues; i++) {
> -        vhost_net_set_vq_index(get_vhost_net(ncs[i].peer), i * 2);
> +        struct vhost_net *net= get_vhost_net(ncs[i].peer);
> +        vhost_net_set_vq_index(net, i * 2);
> +
> +        /* Force VIRTIO_PCI_FLAG_USE_NOTIFIERMASK reset in vhost user case
> +         * Must be done before set_guest_notifier call
> +         */
> +        if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) {
> +            BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
> +            DeviceState *d = DEVICE(qbus->parent);
> +            if (!strcmp(object_get_typename(OBJECT(d)), TYPE_VIRTIO_NET_PCI)) {
> +                VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
> +
> +                /* Force proxy to not use mask notifier */
> +                proxy->flags &= ~VIRTIO_PCI_FLAG_USE_NOTIFIERMASK;
> +            }
> +        }
>      }
>  
>      r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index de29968..7a4c1d3 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -854,8 +854,21 @@ static int vhost_virtqueue_start(struct vhost_dev *dev,
>      /* Clear and discard previous events if any. */
>      event_notifier_test_and_clear(&vq->masked_notifier);
>  
> +    /* If vhost user, register now the call eventfd, guest_notifier_mask
> +     * function is not used anymore
> +     */
> +    if (dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER) {
> +        file.fd = event_notifier_get_fd(virtio_queue_get_guest_notifier(vvq));
> +        r = dev->vhost_ops->vhost_set_vring_call(dev, &file);
> +        if (r) {
> +            r = -errno;
> +            goto fail_call;
> +        }
> +    }
> +
>      return 0;
>  
> +fail_call:
>  fail_kick:
>  fail_alloc:
>      cpu_physical_memory_unmap(vq->ring, virtio_queue_get_ring_size(vdev, idx),
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] virtio-pci: add an option to bypass guest_notifier_mask
  2015-12-03  9:53 ` [Qemu-devel] [PATCH 2/3] virtio-pci: add an option to bypass guest_notifier_mask Didier Pallard
  2015-12-07 13:37   ` Marc-André Lureau
@ 2016-02-04 13:08   ` Michael S. Tsirkin
  2016-02-08 13:24     ` Didier Pallard
  2016-02-15 15:38   ` Victor Kaplansky
  2 siblings, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2016-02-04 13:08 UTC (permalink / raw)
  To: Didier Pallard; +Cc: thibaut.collet, jmg, qemu-devel

On Thu, Dec 03, 2015 at 10:53:18AM +0100, Didier Pallard wrote:
> Using guest_notifier_mask function in vhost-user case may
> break interrupt mask paradigm, because mask/unmask is not
> really done when returning from guest_notifier_mask call, instead
> message is posted in a unix socket, and processed later.
> 
> Add an option bit to disable the use of guest_notifier_mask
> in virtio pci.
> 
> Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
> Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
> ---
>  hw/virtio/virtio-pci.c | 29 +++++++++++++++++++++++------
>  hw/virtio/virtio-pci.h |  6 ++++++
>  2 files changed, 29 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index dd48562..26bb617 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -806,7 +806,8 @@ static int kvm_virtio_pci_vector_use(VirtIOPCIProxy *proxy, int nvqs)
>          /* If guest supports masking, set up irqfd now.
>           * Otherwise, delay until unmasked in the frontend.
>           */
> -        if (k->guest_notifier_mask) {
> +        if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
> +            k->guest_notifier_mask) {
>              ret = kvm_virtio_pci_irqfd_use(proxy, queue_no, vector);
>              if (ret < 0) {
>                  kvm_virtio_pci_vq_vector_release(proxy, vector);
> @@ -822,7 +823,8 @@ undo:
>          if (vector >= msix_nr_vectors_allocated(dev)) {
>              continue;
>          }
> -        if (k->guest_notifier_mask) {
> +        if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
> +            k->guest_notifier_mask) {
>              kvm_virtio_pci_irqfd_release(proxy, queue_no, vector);
>          }
>          kvm_virtio_pci_vq_vector_release(proxy, vector);
> @@ -849,7 +851,8 @@ static void kvm_virtio_pci_vector_release(VirtIOPCIProxy *proxy, int nvqs)
>          /* If guest supports masking, clean up irqfd now.
>           * Otherwise, it was cleaned when masked in the frontend.
>           */
> -        if (k->guest_notifier_mask) {
> +        if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
> +            k->guest_notifier_mask) {
>              kvm_virtio_pci_irqfd_release(proxy, queue_no, vector);
>          }
>          kvm_virtio_pci_vq_vector_release(proxy, vector);
> @@ -882,7 +885,8 @@ static int virtio_pci_vq_vector_unmask(VirtIOPCIProxy *proxy,
>      /* If guest supports masking, irqfd is already setup, unmask it.
>       * Otherwise, set it up now.
>       */
> -    if (k->guest_notifier_mask) {
> +    if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
> +        k->guest_notifier_mask) {
>          k->guest_notifier_mask(vdev, queue_no, false);
>          /* Test after unmasking to avoid losing events. */
>          if (k->guest_notifier_pending &&
> @@ -905,7 +909,8 @@ static void virtio_pci_vq_vector_mask(VirtIOPCIProxy *proxy,
>      /* If guest supports masking, keep irqfd but mask it.
>       * Otherwise, clean it up now.
>       */ 
> -    if (k->guest_notifier_mask) {
> +    if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
> +        k->guest_notifier_mask) {
>          k->guest_notifier_mask(vdev, queue_no, true);
>      } else {
>          kvm_virtio_pci_irqfd_release(proxy, queue_no, vector);
> @@ -1022,7 +1027,9 @@ static int virtio_pci_set_guest_notifier(DeviceState *d, int n, bool assign,
>          event_notifier_cleanup(notifier);
>      }
>  
> -    if (!msix_enabled(&proxy->pci_dev) && vdc->guest_notifier_mask) {
> +    if (!msix_enabled(&proxy->pci_dev) &&
> +        (proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
> +        vdc->guest_notifier_mask) {
>          vdc->guest_notifier_mask(vdev, n, !assign);
>      }
>  
> @@ -1164,6 +1171,8 @@ static void virtio_9p_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
>  static Property virtio_9p_pci_properties[] = {
>      DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>                      VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>      DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
>      DEFINE_PROP_END_OF_LIST(),
>  };
> @@ -1908,6 +1917,8 @@ static Property virtio_blk_pci_properties[] = {
>      DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
>      DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>                      VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>      DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
>      DEFINE_PROP_END_OF_LIST(),
>  };
> @@ -1961,6 +1972,8 @@ static const TypeInfo virtio_blk_pci_info = {
>  static Property virtio_scsi_pci_properties[] = {
>      DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>                      VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>      DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors,
>                         DEV_NVECTORS_UNSPECIFIED),
>      DEFINE_PROP_END_OF_LIST(),
> @@ -2175,6 +2188,8 @@ static void virtio_serial_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
>  static Property virtio_serial_pci_properties[] = {
>      DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>                      VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>      DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
>      DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
>      DEFINE_PROP_END_OF_LIST(),
> @@ -2215,6 +2230,8 @@ static const TypeInfo virtio_serial_pci_info = {
>  static Property virtio_net_properties[] = {
>      DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>                      VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, false),
> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>      DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 3),
>      DEFINE_PROP_END_OF_LIST(),
>  };

I think we should rename this x-use-notifier mask.
And I'd rather drop it from all devices where it's not relevant.


> diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
> index ffb74bb..aecd4eb 100644
> --- a/hw/virtio/virtio-pci.h
> +++ b/hw/virtio/virtio-pci.h
> @@ -86,6 +86,12 @@ typedef struct VirtioBusClass VirtioPCIBusClass;
>  #define VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY \
>      (1 << VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY_BIT)
>  
> +/* Where vhost-user implementation exists, using the guest notifier mask
> + * feature can lead to improper interrupt management. Add a flag to
> + * allow to disable this guest notifier mask if desired */
> +#define VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT 6
> +#define VIRTIO_PCI_FLAG_USE_NOTIFIERMASK (1 << VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT)
> +
>  typedef struct {
>      MSIMessage msg;
>      int virq;
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full
  2015-12-03  9:53 ` [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full Didier Pallard
  2015-12-07 13:31   ` Marc-André Lureau
@ 2016-02-04 13:13   ` Michael S. Tsirkin
  2016-02-04 14:10   ` Michael S. Tsirkin
  2 siblings, 0 replies; 33+ messages in thread
From: Michael S. Tsirkin @ 2016-02-04 13:13 UTC (permalink / raw)
  To: Didier Pallard; +Cc: thibaut.collet, jmg, qemu-devel

On Thu, Dec 03, 2015 at 10:53:17AM +0100, Didier Pallard wrote:
> unix_send_msgfds is used by vhost-user control socket. qemu_chr_fe_write_all
> is used to send a message and retries as long as EAGAIN errno is set,
> but write_msgfds buffer is freed after first EAGAIN failure, causing
> message to be sent without proper fds attachment.
> 
> In case unix_send_msgfds is called through qemu_chr_fe_write, it will be
> user responsability to resend message as is or to free write_msgfds
> using set_msgfds(0)
> 
> Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
> Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>

I don't see any callers actually retrying though.
Where is it?

> ---
>  qemu-char.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/qemu-char.c b/qemu-char.c
> index 5448b0f..26d5f2e 100644
> --- a/qemu-char.c
> +++ b/qemu-char.c
> @@ -2614,6 +2614,16 @@ static int unix_send_msgfds(CharDriverState *chr, const uint8_t *buf, int len)
>          r = sendmsg(s->fd, &msgh, 0);
>      } while (r < 0 && errno == EINTR);
>  
> +    /* Ancillary data are not sent if no byte is written
> +     * so don't free msgfds buffer if return value is EAGAIN
> +     * If called from qemu_chr_fe_write_all retry will come soon
> +     * If called from qemu_chr_fe_write, it is the user responsibility
> +     * to resend message or free fds using set_msgfds(0)
> +     */
> +    if (r < 0 && errno == EAGAIN) {
> +        return r;
> +    }
> +
>      /* free the written msgfds, no matter what */
>      if (s->write_msgfds_num) {
>          g_free(s->write_msgfds);
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full
  2015-12-03  9:53 ` [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full Didier Pallard
  2015-12-07 13:31   ` Marc-André Lureau
  2016-02-04 13:13   ` Michael S. Tsirkin
@ 2016-02-04 14:10   ` Michael S. Tsirkin
  2016-02-08 13:12     ` Didier Pallard
  2016-02-09 11:48     ` Daniel P. Berrange
  2 siblings, 2 replies; 33+ messages in thread
From: Michael S. Tsirkin @ 2016-02-04 14:10 UTC (permalink / raw)
  To: Didier Pallard; +Cc: thibaut.collet, jmg, qemu-devel

On Thu, Dec 03, 2015 at 10:53:17AM +0100, Didier Pallard wrote:
> unix_send_msgfds is used by vhost-user control socket. qemu_chr_fe_write_all
> is used to send a message and retries as long as EAGAIN errno is set,
> but write_msgfds buffer is freed after first EAGAIN failure, causing
> message to be sent without proper fds attachment.
> 
> In case unix_send_msgfds is called through qemu_chr_fe_write, it will be
> user responsability to resend message as is or to free write_msgfds
> using set_msgfds(0)
> 
> Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
> Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
> ---
>  qemu-char.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/qemu-char.c b/qemu-char.c
> index 5448b0f..26d5f2e 100644
> --- a/qemu-char.c
> +++ b/qemu-char.c
> @@ -2614,6 +2614,16 @@ static int unix_send_msgfds(CharDriverState *chr, const uint8_t *buf, int len)
>          r = sendmsg(s->fd, &msgh, 0);
>      } while (r < 0 && errno == EINTR);
>  
> +    /* Ancillary data are not sent if no byte is written
> +     * so don't free msgfds buffer if return value is EAGAIN
> +     * If called from qemu_chr_fe_write_all retry will come soon
> +     * If called from qemu_chr_fe_write, it is the user responsibility
> +     * to resend message or free fds using set_msgfds(0)

Problem is, if ever anyone tries to send messages
with qemu_chr_fe_write and does not retry, there will
be a memory leak.

Rather than this, how about adding an assert in qemu_chr_fe_write
to make sure its not used with msgfds?


> +     */
> +    if (r < 0 && errno == EAGAIN) {
> +        return r;
> +    }
> +
>      /* free the written msgfds, no matter what */
>      if (s->write_msgfds_num) {
>          g_free(s->write_msgfds);
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full
  2016-02-04 14:10   ` Michael S. Tsirkin
@ 2016-02-08 13:12     ` Didier Pallard
  2016-02-09 11:37       ` Michael S. Tsirkin
  2016-02-09 11:48     ` Daniel P. Berrange
  1 sibling, 1 reply; 33+ messages in thread
From: Didier Pallard @ 2016-02-08 13:12 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: thibaut.collet, jmg, qemu-devel

On 02/04/2016 03:10 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 03, 2015 at 10:53:17AM +0100, Didier Pallard wrote:
>> unix_send_msgfds is used by vhost-user control socket. qemu_chr_fe_write_all
>> is used to send a message and retries as long as EAGAIN errno is set,
>> but write_msgfds buffer is freed after first EAGAIN failure, causing
>> message to be sent without proper fds attachment.
>>
>> In case unix_send_msgfds is called through qemu_chr_fe_write, it will be
>> user responsability to resend message as is or to free write_msgfds
>> using set_msgfds(0)
>>
>> Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
>> Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
>> ---
>>   qemu-char.c | 10 ++++++++++
>>   1 file changed, 10 insertions(+)
>>
>> diff --git a/qemu-char.c b/qemu-char.c
>> index 5448b0f..26d5f2e 100644
>> --- a/qemu-char.c
>> +++ b/qemu-char.c
>> @@ -2614,6 +2614,16 @@ static int unix_send_msgfds(CharDriverState *chr, const uint8_t *buf, int len)
>>           r = sendmsg(s->fd, &msgh, 0);
>>       } while (r < 0 && errno == EINTR);
>>   
>> +    /* Ancillary data are not sent if no byte is written
>> +     * so don't free msgfds buffer if return value is EAGAIN
>> +     * If called from qemu_chr_fe_write_all retry will come soon
>> +     * If called from qemu_chr_fe_write, it is the user responsibility
>> +     * to resend message or free fds using set_msgfds(0)
> Problem is, if ever anyone tries to send messages
> with qemu_chr_fe_write and does not retry, there will
> be a memory leak.
>
> Rather than this, how about adding an assert in qemu_chr_fe_write
> to make sure its not used with msgfds?

Well, in qemu_chr_fe_write, all i have is a CharDriverState.
Information on number of fds to pass are located in TCPCharDriver structure.

should i assert as soon as a set_msgfds method is set:
assert(s->set_msgfds == NULL);
but it may trigger on unix sockets were message fds are never used.
or try to detect the implementation and really check the number of fds 
passed:
assert((s->chr_write != tcp_chr_write) && ((TCPCharDriver 
*)s->opaque)->write_msgfds_num == 0))
but i didn't find better than checking a method to detect type of 
underlying driver and I don't like this kind of code.

Another solution should be to add a new method to get pending fds 
number, assert would become:
assert((s->get_write_msgfds_num == NULL) || (s->get_write_msgfds_num(s) 
== 0))

s->get_write_msgfds_num being set only for tcp_ socket, with a simple 
function:

tcp_chr_get_write_msgfds_num(CharDriverState *chr)
{
     TCPCharDriver *s = chr->opaque;
     return s->write_msgfds_num;
}

...
s->get_write_msgfds_num = tcp_chr_get_write_msgfds_num;
...

?

>
>> +     */
>> +    if (r < 0 && errno == EAGAIN) {
>> +        return r;
>> +    }
>> +
>>       /* free the written msgfds, no matter what */
>>       if (s->write_msgfds_num) {
>>           g_free(s->write_msgfds);
>> -- 
>> 2.1.4
>>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] virtio-pci: add an option to bypass guest_notifier_mask
  2016-02-04 13:08   ` Michael S. Tsirkin
@ 2016-02-08 13:24     ` Didier Pallard
  0 siblings, 0 replies; 33+ messages in thread
From: Didier Pallard @ 2016-02-08 13:24 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: thibaut.collet, jmg, qemu-devel

On 02/04/2016 02:08 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 03, 2015 at 10:53:18AM +0100, Didier Pallard wrote:
>> Using guest_notifier_mask function in vhost-user case may
>> break interrupt mask paradigm, because mask/unmask is not
>> really done when returning from guest_notifier_mask call, instead
>> message is posted in a unix socket, and processed later.
>>
>> Add an option bit to disable the use of guest_notifier_mask
>> in virtio pci.
>>
>> Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
>> Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
>> ---
>>   hw/virtio/virtio-pci.c | 29 +++++++++++++++++++++++------
>>   hw/virtio/virtio-pci.h |  6 ++++++
>>   2 files changed, 29 insertions(+), 6 deletions(-)
>>
>> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
>> index dd48562..26bb617 100644
>> --- a/hw/virtio/virtio-pci.c
>> +++ b/hw/virtio/virtio-pci.c
>> @@ -806,7 +806,8 @@ static int kvm_virtio_pci_vector_use(VirtIOPCIProxy *proxy, int nvqs)
>>           /* If guest supports masking, set up irqfd now.
>>            * Otherwise, delay until unmasked in the frontend.
>>            */
>> -        if (k->guest_notifier_mask) {
>> +        if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
>> +            k->guest_notifier_mask) {
>>               ret = kvm_virtio_pci_irqfd_use(proxy, queue_no, vector);
>>               if (ret < 0) {
>>                   kvm_virtio_pci_vq_vector_release(proxy, vector);
>> @@ -822,7 +823,8 @@ undo:
>>           if (vector >= msix_nr_vectors_allocated(dev)) {
>>               continue;
>>           }
>> -        if (k->guest_notifier_mask) {
>> +        if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
>> +            k->guest_notifier_mask) {
>>               kvm_virtio_pci_irqfd_release(proxy, queue_no, vector);
>>           }
>>           kvm_virtio_pci_vq_vector_release(proxy, vector);
>> @@ -849,7 +851,8 @@ static void kvm_virtio_pci_vector_release(VirtIOPCIProxy *proxy, int nvqs)
>>           /* If guest supports masking, clean up irqfd now.
>>            * Otherwise, it was cleaned when masked in the frontend.
>>            */
>> -        if (k->guest_notifier_mask) {
>> +        if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
>> +            k->guest_notifier_mask) {
>>               kvm_virtio_pci_irqfd_release(proxy, queue_no, vector);
>>           }
>>           kvm_virtio_pci_vq_vector_release(proxy, vector);
>> @@ -882,7 +885,8 @@ static int virtio_pci_vq_vector_unmask(VirtIOPCIProxy *proxy,
>>       /* If guest supports masking, irqfd is already setup, unmask it.
>>        * Otherwise, set it up now.
>>        */
>> -    if (k->guest_notifier_mask) {
>> +    if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
>> +        k->guest_notifier_mask) {
>>           k->guest_notifier_mask(vdev, queue_no, false);
>>           /* Test after unmasking to avoid losing events. */
>>           if (k->guest_notifier_pending &&
>> @@ -905,7 +909,8 @@ static void virtio_pci_vq_vector_mask(VirtIOPCIProxy *proxy,
>>       /* If guest supports masking, keep irqfd but mask it.
>>        * Otherwise, clean it up now.
>>        */
>> -    if (k->guest_notifier_mask) {
>> +    if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
>> +        k->guest_notifier_mask) {
>>           k->guest_notifier_mask(vdev, queue_no, true);
>>       } else {
>>           kvm_virtio_pci_irqfd_release(proxy, queue_no, vector);
>> @@ -1022,7 +1027,9 @@ static int virtio_pci_set_guest_notifier(DeviceState *d, int n, bool assign,
>>           event_notifier_cleanup(notifier);
>>       }
>>   
>> -    if (!msix_enabled(&proxy->pci_dev) && vdc->guest_notifier_mask) {
>> +    if (!msix_enabled(&proxy->pci_dev) &&
>> +        (proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
>> +        vdc->guest_notifier_mask) {
>>           vdc->guest_notifier_mask(vdev, n, !assign);
>>       }
>>   
>> @@ -1164,6 +1171,8 @@ static void virtio_9p_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
>>   static Property virtio_9p_pci_properties[] = {
>>       DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>>                       VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
>> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
>> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>>       DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
>>       DEFINE_PROP_END_OF_LIST(),
>>   };
>> @@ -1908,6 +1917,8 @@ static Property virtio_blk_pci_properties[] = {
>>       DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
>>       DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>>                       VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
>> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
>> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>>       DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
>>       DEFINE_PROP_END_OF_LIST(),
>>   };
>> @@ -1961,6 +1972,8 @@ static const TypeInfo virtio_blk_pci_info = {
>>   static Property virtio_scsi_pci_properties[] = {
>>       DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>>                       VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
>> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
>> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>>       DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors,
>>                          DEV_NVECTORS_UNSPECIFIED),
>>       DEFINE_PROP_END_OF_LIST(),
>> @@ -2175,6 +2188,8 @@ static void virtio_serial_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
>>   static Property virtio_serial_pci_properties[] = {
>>       DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>>                       VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
>> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
>> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>>       DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
>>       DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
>>       DEFINE_PROP_END_OF_LIST(),
>> @@ -2215,6 +2230,8 @@ static const TypeInfo virtio_serial_pci_info = {
>>   static Property virtio_net_properties[] = {
>>       DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>>                       VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, false),
>> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
>> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>>       DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 3),
>>       DEFINE_PROP_END_OF_LIST(),
>>   };
> I think we should rename this x-use-notifier mask.
> And I'd rather drop it from all devices where it's not relevant.
I will rename the option x-use-notifier-mask.
Currently, only vhost-net has a vhost-user implementation, isn't it?
so it is the only place were i need to set the option, that's it?

>
>> diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
>> index ffb74bb..aecd4eb 100644
>> --- a/hw/virtio/virtio-pci.h
>> +++ b/hw/virtio/virtio-pci.h
>> @@ -86,6 +86,12 @@ typedef struct VirtioBusClass VirtioPCIBusClass;
>>   #define VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY \
>>       (1 << VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY_BIT)
>>   
>> +/* Where vhost-user implementation exists, using the guest notifier mask
>> + * feature can lead to improper interrupt management. Add a flag to
>> + * allow to disable this guest notifier mask if desired */
>> +#define VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT 6
>> +#define VIRTIO_PCI_FLAG_USE_NOTIFIERMASK (1 << VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT)
>> +
>>   typedef struct {
>>       MSIMessage msg;
>>       int virq;
>> -- 
>> 2.1.4
>>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full
  2016-02-08 13:12     ` Didier Pallard
@ 2016-02-09 11:37       ` Michael S. Tsirkin
  0 siblings, 0 replies; 33+ messages in thread
From: Michael S. Tsirkin @ 2016-02-09 11:37 UTC (permalink / raw)
  To: Didier Pallard; +Cc: thibaut.collet, jmg, qemu-devel

On Mon, Feb 08, 2016 at 02:12:14PM +0100, Didier Pallard wrote:
> On 02/04/2016 03:10 PM, Michael S. Tsirkin wrote:
> >On Thu, Dec 03, 2015 at 10:53:17AM +0100, Didier Pallard wrote:
> >>unix_send_msgfds is used by vhost-user control socket. qemu_chr_fe_write_all
> >>is used to send a message and retries as long as EAGAIN errno is set,
> >>but write_msgfds buffer is freed after first EAGAIN failure, causing
> >>message to be sent without proper fds attachment.
> >>
> >>In case unix_send_msgfds is called through qemu_chr_fe_write, it will be
> >>user responsability to resend message as is or to free write_msgfds
> >>using set_msgfds(0)
> >>
> >>Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
> >>Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
> >>---
> >>  qemu-char.c | 10 ++++++++++
> >>  1 file changed, 10 insertions(+)
> >>
> >>diff --git a/qemu-char.c b/qemu-char.c
> >>index 5448b0f..26d5f2e 100644
> >>--- a/qemu-char.c
> >>+++ b/qemu-char.c
> >>@@ -2614,6 +2614,16 @@ static int unix_send_msgfds(CharDriverState *chr, const uint8_t *buf, int len)
> >>          r = sendmsg(s->fd, &msgh, 0);
> >>      } while (r < 0 && errno == EINTR);
> >>+    /* Ancillary data are not sent if no byte is written
> >>+     * so don't free msgfds buffer if return value is EAGAIN
> >>+     * If called from qemu_chr_fe_write_all retry will come soon
> >>+     * If called from qemu_chr_fe_write, it is the user responsibility
> >>+     * to resend message or free fds using set_msgfds(0)
> >Problem is, if ever anyone tries to send messages
> >with qemu_chr_fe_write and does not retry, there will
> >be a memory leak.
> >
> >Rather than this, how about adding an assert in qemu_chr_fe_write
> >to make sure its not used with msgfds?
> 
> Well, in qemu_chr_fe_write, all i have is a CharDriverState.
> Information on number of fds to pass are located in TCPCharDriver structure.
> 
> should i assert as soon as a set_msgfds method is set:
> assert(s->set_msgfds == NULL);
> but it may trigger on unix sockets were message fds are never used.

Why? Because this field is never initialized?
Let's initialize it unconditionally then?

> or try to detect the implementation and really check the number of fds
> passed:
> assert((s->chr_write != tcp_chr_write) && ((TCPCharDriver
> *)s->opaque)->write_msgfds_num == 0))
> but i didn't find better than checking a method to detect type of underlying
> driver and I don't like this kind of code.
> 
> Another solution should be to add a new method to get pending fds number,
> assert would become:
> assert((s->get_write_msgfds_num == NULL) || (s->get_write_msgfds_num(s) ==
> 0))
> 
> s->get_write_msgfds_num being set only for tcp_ socket, with a simple
> function:
> 
> tcp_chr_get_write_msgfds_num(CharDriverState *chr)
> {
>     TCPCharDriver *s = chr->opaque;
>     return s->write_msgfds_num;
> }
> 
> ...
> s->get_write_msgfds_num = tcp_chr_get_write_msgfds_num;
> ...
> 
> ?
> 
> >
> >>+     */
> >>+    if (r < 0 && errno == EAGAIN) {
> >>+        return r;
> >>+    }
> >>+
> >>      /* free the written msgfds, no matter what */
> >>      if (s->write_msgfds_num) {
> >>          g_free(s->write_msgfds);
> >>-- 
> >>2.1.4
> >>
> >
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full
  2016-02-04 14:10   ` Michael S. Tsirkin
  2016-02-08 13:12     ` Didier Pallard
@ 2016-02-09 11:48     ` Daniel P. Berrange
  2016-02-09 12:21       ` Michael S. Tsirkin
  1 sibling, 1 reply; 33+ messages in thread
From: Daniel P. Berrange @ 2016-02-09 11:48 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: thibaut.collet, jmg, qemu-devel, Didier Pallard

On Thu, Feb 04, 2016 at 04:10:38PM +0200, Michael S. Tsirkin wrote:
> On Thu, Dec 03, 2015 at 10:53:17AM +0100, Didier Pallard wrote:
> > unix_send_msgfds is used by vhost-user control socket. qemu_chr_fe_write_all
> > is used to send a message and retries as long as EAGAIN errno is set,
> > but write_msgfds buffer is freed after first EAGAIN failure, causing
> > message to be sent without proper fds attachment.
> > 
> > In case unix_send_msgfds is called through qemu_chr_fe_write, it will be
> > user responsability to resend message as is or to free write_msgfds
> > using set_msgfds(0)
> > 
> > Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
> > Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
> > ---
> >  qemu-char.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/qemu-char.c b/qemu-char.c
> > index 5448b0f..26d5f2e 100644
> > --- a/qemu-char.c
> > +++ b/qemu-char.c
> > @@ -2614,6 +2614,16 @@ static int unix_send_msgfds(CharDriverState *chr, const uint8_t *buf, int len)
> >          r = sendmsg(s->fd, &msgh, 0);
> >      } while (r < 0 && errno == EINTR);
> >  
> > +    /* Ancillary data are not sent if no byte is written
> > +     * so don't free msgfds buffer if return value is EAGAIN
> > +     * If called from qemu_chr_fe_write_all retry will come soon
> > +     * If called from qemu_chr_fe_write, it is the user responsibility
> > +     * to resend message or free fds using set_msgfds(0)
> 
> Problem is, if ever anyone tries to send messages
> with qemu_chr_fe_write and does not retry, there will
> be a memory leak.
> 
> Rather than this, how about adding an assert in qemu_chr_fe_write
> to make sure its not used with msgfds?

NB, this TCP chardev code has completely changed since this patch
was submitted. It now uses QIOChannel instead of raw sockets APIs.
The same problem still exists though - when we get EAGAIN from
the sendmsg, we're releasing the file descriptors even though
they've not been sent.

Adding an assert in qemu_chr_fe_write() won't solve it - the
same problem still exists in qemu_chr_fe_write_all() - although
that loops to re-try on EAGAIN, the FDs are free'd before it
does the retry.

We need to update tcp_chr_write() to not free the FDs when
io_channel_send_full() returns with errno==EAGAIN, in the
same way this patch was doing.


> > +     */
> > +    if (r < 0 && errno == EAGAIN) {
> > +        return r;
> > +    }
> > +
> >      /* free the written msgfds, no matter what */
> >      if (s->write_msgfds_num) {
> >          g_free(s->write_msgfds);

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full
  2016-02-09 11:48     ` Daniel P. Berrange
@ 2016-02-09 12:21       ` Michael S. Tsirkin
  2016-02-09 16:17         ` Didier Pallard
  0 siblings, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2016-02-09 12:21 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: thibaut.collet, jmg, qemu-devel, Didier Pallard

On Tue, Feb 09, 2016 at 11:48:13AM +0000, Daniel P. Berrange wrote:
> On Thu, Feb 04, 2016 at 04:10:38PM +0200, Michael S. Tsirkin wrote:
> > On Thu, Dec 03, 2015 at 10:53:17AM +0100, Didier Pallard wrote:
> > > unix_send_msgfds is used by vhost-user control socket. qemu_chr_fe_write_all
> > > is used to send a message and retries as long as EAGAIN errno is set,
> > > but write_msgfds buffer is freed after first EAGAIN failure, causing
> > > message to be sent without proper fds attachment.
> > > 
> > > In case unix_send_msgfds is called through qemu_chr_fe_write, it will be
> > > user responsability to resend message as is or to free write_msgfds
> > > using set_msgfds(0)
> > > 
> > > Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
> > > Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
> > > ---
> > >  qemu-char.c | 10 ++++++++++
> > >  1 file changed, 10 insertions(+)
> > > 
> > > diff --git a/qemu-char.c b/qemu-char.c
> > > index 5448b0f..26d5f2e 100644
> > > --- a/qemu-char.c
> > > +++ b/qemu-char.c
> > > @@ -2614,6 +2614,16 @@ static int unix_send_msgfds(CharDriverState *chr, const uint8_t *buf, int len)
> > >          r = sendmsg(s->fd, &msgh, 0);
> > >      } while (r < 0 && errno == EINTR);
> > >  
> > > +    /* Ancillary data are not sent if no byte is written
> > > +     * so don't free msgfds buffer if return value is EAGAIN
> > > +     * If called from qemu_chr_fe_write_all retry will come soon
> > > +     * If called from qemu_chr_fe_write, it is the user responsibility
> > > +     * to resend message or free fds using set_msgfds(0)
> > 
> > Problem is, if ever anyone tries to send messages
> > with qemu_chr_fe_write and does not retry, there will
> > be a memory leak.
> > 
> > Rather than this, how about adding an assert in qemu_chr_fe_write
> > to make sure its not used with msgfds?
> 
> NB, this TCP chardev code has completely changed since this patch
> was submitted. It now uses QIOChannel instead of raw sockets APIs.
> The same problem still exists though - when we get EAGAIN from
> the sendmsg, we're releasing the file descriptors even though
> they've not been sent.
> 
> Adding an assert in qemu_chr_fe_write() won't solve it - the
> same problem still exists in qemu_chr_fe_write_all() - although
> that loops to re-try on EAGAIN, the FDs are free'd before it
> does the retry.
> 
> We need to update tcp_chr_write() to not free the FDs when
> io_channel_send_full() returns with errno==EAGAIN, in the
> same way this patch was doing.

Absolutely. We need to fix qemu_chr_fe_write_all.
I am just worried that someone tries to use
qemu_chr_fe_write with fds and then we get
a memory leak if qemu_chr_fe_write is not retried.

So I propose we teach qemu_chr_fe_write
that it can't send msg fds.

> 
> > > +     */
> > > +    if (r < 0 && errno == EAGAIN) {
> > > +        return r;
> > > +    }
> > > +
> > >      /* free the written msgfds, no matter what */
> > >      if (s->write_msgfds_num) {
> > >          g_free(s->write_msgfds);
> 
> Regards,
> Daniel
> -- 
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full
  2016-02-09 12:21       ` Michael S. Tsirkin
@ 2016-02-09 16:17         ` Didier Pallard
  2016-02-09 16:50           ` Michael S. Tsirkin
  2016-02-09 17:04           ` Daniel P. Berrange
  0 siblings, 2 replies; 33+ messages in thread
From: Didier Pallard @ 2016-02-09 16:17 UTC (permalink / raw)
  To: Michael S. Tsirkin, Daniel P. Berrange; +Cc: thibaut.collet, jmg, qemu-devel

On 02/09/2016 01:21 PM, Michael S. Tsirkin wrote:
> On Tue, Feb 09, 2016 at 11:48:13AM +0000, Daniel P. Berrange wrote:
>> On Thu, Feb 04, 2016 at 04:10:38PM +0200, Michael S. Tsirkin wrote:
>>> On Thu, Dec 03, 2015 at 10:53:17AM +0100, Didier Pallard wrote:
>>>> unix_send_msgfds is used by vhost-user control socket. qemu_chr_fe_write_all
>>>> is used to send a message and retries as long as EAGAIN errno is set,
>>>> but write_msgfds buffer is freed after first EAGAIN failure, causing
>>>> message to be sent without proper fds attachment.
>>>>
>>>> In case unix_send_msgfds is called through qemu_chr_fe_write, it will be
>>>> user responsability to resend message as is or to free write_msgfds
>>>> using set_msgfds(0)
>>>>
>>>> Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
>>>> Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
>>>> ---
>>>>   qemu-char.c | 10 ++++++++++
>>>>   1 file changed, 10 insertions(+)
>>>>
>>>> diff --git a/qemu-char.c b/qemu-char.c
>>>> index 5448b0f..26d5f2e 100644
>>>> --- a/qemu-char.c
>>>> +++ b/qemu-char.c
>>>> @@ -2614,6 +2614,16 @@ static int unix_send_msgfds(CharDriverState *chr, const uint8_t *buf, int len)
>>>>           r = sendmsg(s->fd, &msgh, 0);
>>>>       } while (r < 0 && errno == EINTR);
>>>>   
>>>> +    /* Ancillary data are not sent if no byte is written
>>>> +     * so don't free msgfds buffer if return value is EAGAIN
>>>> +     * If called from qemu_chr_fe_write_all retry will come soon
>>>> +     * If called from qemu_chr_fe_write, it is the user responsibility
>>>> +     * to resend message or free fds using set_msgfds(0)
>>> Problem is, if ever anyone tries to send messages
>>> with qemu_chr_fe_write and does not retry, there will
>>> be a memory leak.
>>>
>>> Rather than this, how about adding an assert in qemu_chr_fe_write
>>> to make sure its not used with msgfds?
>> NB, this TCP chardev code has completely changed since this patch
>> was submitted. It now uses QIOChannel instead of raw sockets APIs.
>> The same problem still exists though - when we get EAGAIN from
>> the sendmsg, we're releasing the file descriptors even though
>> they've not been sent.
>>
>> Adding an assert in qemu_chr_fe_write() won't solve it - the
>> same problem still exists in qemu_chr_fe_write_all() - although
>> that loops to re-try on EAGAIN, the FDs are free'd before it
>> does the retry.
>>
>> We need to update tcp_chr_write() to not free the FDs when
>> io_channel_send_full() returns with errno==EAGAIN, in the
>> same way this patch was doing.
> Absolutely. We need to fix qemu_chr_fe_write_all.
> I am just worried that someone tries to use
> qemu_chr_fe_write with fds and then we get
> a memory leak if qemu_chr_fe_write is not retried.
>
> So I propose we teach qemu_chr_fe_write
> that it can't send msg fds.
Patch is easy to port on head version.

My concern with following assert in qemu_chr_fe_write:

assert(s->set_msgfds == NULL);

Is that it may trigger on a tcp/linux socket that never wants
to send message fds but uses qemu_chr_fe_write instead
of qemu_chr_fe_write_all. Are we supposed to always use
qemu_chr_fe_write_all on a tcp/linux socket?

I think that only vhost-user socket is using the ability to send
message fds through socket, but i don't know all places were a linux/tcp
socket can be used through qemu_chr_fe_write, without wanting
to send any fd...

anyway, i can put it in the code, but i don't know how to test that it does
not break in some unexpected cases.

thanks

>>>> +     */
>>>> +    if (r < 0 && errno == EAGAIN) {
>>>> +        return r;
>>>> +    }
>>>> +
>>>>       /* free the written msgfds, no matter what */
>>>>       if (s->write_msgfds_num) {
>>>>           g_free(s->write_msgfds);
>> Regards,
>> Daniel
>> -- 
>> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
>> |: http://libvirt.org              -o-             http://virt-manager.org :|
>> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
>> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full
  2016-02-09 16:17         ` Didier Pallard
@ 2016-02-09 16:50           ` Michael S. Tsirkin
  2016-02-09 17:04           ` Daniel P. Berrange
  1 sibling, 0 replies; 33+ messages in thread
From: Michael S. Tsirkin @ 2016-02-09 16:50 UTC (permalink / raw)
  To: Didier Pallard; +Cc: thibaut.collet, jmg, qemu-devel

On Tue, Feb 09, 2016 at 05:17:16PM +0100, Didier Pallard wrote:
> On 02/09/2016 01:21 PM, Michael S. Tsirkin wrote:
> >On Tue, Feb 09, 2016 at 11:48:13AM +0000, Daniel P. Berrange wrote:
> >>On Thu, Feb 04, 2016 at 04:10:38PM +0200, Michael S. Tsirkin wrote:
> >>>On Thu, Dec 03, 2015 at 10:53:17AM +0100, Didier Pallard wrote:
> >>>>unix_send_msgfds is used by vhost-user control socket. qemu_chr_fe_write_all
> >>>>is used to send a message and retries as long as EAGAIN errno is set,
> >>>>but write_msgfds buffer is freed after first EAGAIN failure, causing
> >>>>message to be sent without proper fds attachment.
> >>>>
> >>>>In case unix_send_msgfds is called through qemu_chr_fe_write, it will be
> >>>>user responsability to resend message as is or to free write_msgfds
> >>>>using set_msgfds(0)
> >>>>
> >>>>Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
> >>>>Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
> >>>>---
> >>>>  qemu-char.c | 10 ++++++++++
> >>>>  1 file changed, 10 insertions(+)
> >>>>
> >>>>diff --git a/qemu-char.c b/qemu-char.c
> >>>>index 5448b0f..26d5f2e 100644
> >>>>--- a/qemu-char.c
> >>>>+++ b/qemu-char.c
> >>>>@@ -2614,6 +2614,16 @@ static int unix_send_msgfds(CharDriverState *chr, const uint8_t *buf, int len)
> >>>>          r = sendmsg(s->fd, &msgh, 0);
> >>>>      } while (r < 0 && errno == EINTR);
> >>>>+    /* Ancillary data are not sent if no byte is written
> >>>>+     * so don't free msgfds buffer if return value is EAGAIN
> >>>>+     * If called from qemu_chr_fe_write_all retry will come soon
> >>>>+     * If called from qemu_chr_fe_write, it is the user responsibility
> >>>>+     * to resend message or free fds using set_msgfds(0)
> >>>Problem is, if ever anyone tries to send messages
> >>>with qemu_chr_fe_write and does not retry, there will
> >>>be a memory leak.
> >>>
> >>>Rather than this, how about adding an assert in qemu_chr_fe_write
> >>>to make sure its not used with msgfds?
> >>NB, this TCP chardev code has completely changed since this patch
> >>was submitted. It now uses QIOChannel instead of raw sockets APIs.
> >>The same problem still exists though - when we get EAGAIN from
> >>the sendmsg, we're releasing the file descriptors even though
> >>they've not been sent.
> >>
> >>Adding an assert in qemu_chr_fe_write() won't solve it - the
> >>same problem still exists in qemu_chr_fe_write_all() - although
> >>that loops to re-try on EAGAIN, the FDs are free'd before it
> >>does the retry.
> >>
> >>We need to update tcp_chr_write() to not free the FDs when
> >>io_channel_send_full() returns with errno==EAGAIN, in the
> >>same way this patch was doing.
> >Absolutely. We need to fix qemu_chr_fe_write_all.
> >I am just worried that someone tries to use
> >qemu_chr_fe_write with fds and then we get
> >a memory leak if qemu_chr_fe_write is not retried.
> >
> >So I propose we teach qemu_chr_fe_write
> >that it can't send msg fds.
> Patch is easy to port on head version.
> 
> My concern with following assert in qemu_chr_fe_write:
> 
> assert(s->set_msgfds == NULL);
> 
> Is that it may trigger on a tcp/linux socket that never wants
> to send message fds but uses qemu_chr_fe_write instead
> of qemu_chr_fe_write_all. Are we supposed to always use
> qemu_chr_fe_write_all on a tcp/linux socket?

No but why will set_msgfds be != NULL if you don't send fds? There's
probably something obvious I'm missing.

> I think that only vhost-user socket is using the ability to send
> message fds through socket, but i don't know all places were a linux/tcp
> socket can be used through qemu_chr_fe_write, without wanting
> to send any fd...
> 
> anyway, i can put it in the code, but i don't know how to test that it does
> not break in some unexpected cases.
> 
> thanks
> 
> >>>>+     */
> >>>>+    if (r < 0 && errno == EAGAIN) {
> >>>>+        return r;
> >>>>+    }
> >>>>+
> >>>>      /* free the written msgfds, no matter what */
> >>>>      if (s->write_msgfds_num) {
> >>>>          g_free(s->write_msgfds);
> >>Regards,
> >>Daniel
> >>-- 
> >>|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> >>|: http://libvirt.org              -o-             http://virt-manager.org :|
> >>|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> >>|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full
  2016-02-09 16:17         ` Didier Pallard
  2016-02-09 16:50           ` Michael S. Tsirkin
@ 2016-02-09 17:04           ` Daniel P. Berrange
  2016-02-10  9:35             ` Didier Pallard
  1 sibling, 1 reply; 33+ messages in thread
From: Daniel P. Berrange @ 2016-02-09 17:04 UTC (permalink / raw)
  To: Didier Pallard; +Cc: thibaut.collet, jmg, qemu-devel, Michael S. Tsirkin

On Tue, Feb 09, 2016 at 05:17:16PM +0100, Didier Pallard wrote:
> On 02/09/2016 01:21 PM, Michael S. Tsirkin wrote:
> >On Tue, Feb 09, 2016 at 11:48:13AM +0000, Daniel P. Berrange wrote:
> >>On Thu, Feb 04, 2016 at 04:10:38PM +0200, Michael S. Tsirkin wrote:
> >>>On Thu, Dec 03, 2015 at 10:53:17AM +0100, Didier Pallard wrote:
> >>>>unix_send_msgfds is used by vhost-user control socket. qemu_chr_fe_write_all
> >>>>is used to send a message and retries as long as EAGAIN errno is set,
> >>>>but write_msgfds buffer is freed after first EAGAIN failure, causing
> >>>>message to be sent without proper fds attachment.
> >>>>
> >>>>In case unix_send_msgfds is called through qemu_chr_fe_write, it will be
> >>>>user responsability to resend message as is or to free write_msgfds
> >>>>using set_msgfds(0)
> >>>>
> >>>>Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
> >>>>Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
> >>>>---
> >>>>  qemu-char.c | 10 ++++++++++
> >>>>  1 file changed, 10 insertions(+)
> >>>>
> >>>>diff --git a/qemu-char.c b/qemu-char.c
> >>>>index 5448b0f..26d5f2e 100644
> >>>>--- a/qemu-char.c
> >>>>+++ b/qemu-char.c
> >>>>@@ -2614,6 +2614,16 @@ static int unix_send_msgfds(CharDriverState *chr, const uint8_t *buf, int len)
> >>>>          r = sendmsg(s->fd, &msgh, 0);
> >>>>      } while (r < 0 && errno == EINTR);
> >>>>+    /* Ancillary data are not sent if no byte is written
> >>>>+     * so don't free msgfds buffer if return value is EAGAIN
> >>>>+     * If called from qemu_chr_fe_write_all retry will come soon
> >>>>+     * If called from qemu_chr_fe_write, it is the user responsibility
> >>>>+     * to resend message or free fds using set_msgfds(0)
> >>>Problem is, if ever anyone tries to send messages
> >>>with qemu_chr_fe_write and does not retry, there will
> >>>be a memory leak.
> >>>
> >>>Rather than this, how about adding an assert in qemu_chr_fe_write
> >>>to make sure its not used with msgfds?
> >>NB, this TCP chardev code has completely changed since this patch
> >>was submitted. It now uses QIOChannel instead of raw sockets APIs.
> >>The same problem still exists though - when we get EAGAIN from
> >>the sendmsg, we're releasing the file descriptors even though
> >>they've not been sent.
> >>
> >>Adding an assert in qemu_chr_fe_write() won't solve it - the
> >>same problem still exists in qemu_chr_fe_write_all() - although
> >>that loops to re-try on EAGAIN, the FDs are free'd before it
> >>does the retry.
> >>
> >>We need to update tcp_chr_write() to not free the FDs when
> >>io_channel_send_full() returns with errno==EAGAIN, in the
> >>same way this patch was doing.
> >Absolutely. We need to fix qemu_chr_fe_write_all.
> >I am just worried that someone tries to use
> >qemu_chr_fe_write with fds and then we get
> >a memory leak if qemu_chr_fe_write is not retried.
> >
> >So I propose we teach qemu_chr_fe_write
> >that it can't send msg fds.
> Patch is easy to port on head version.
> 
> My concern with following assert in qemu_chr_fe_write:
> 
> assert(s->set_msgfds == NULL);
> 
> Is that it may trigger on a tcp/linux socket that never wants
> to send message fds but uses qemu_chr_fe_write instead
> of qemu_chr_fe_write_all. Are we supposed to always use
> qemu_chr_fe_write_all on a tcp/linux socket?
> 
> I think that only vhost-user socket is using the ability to send
> message fds through socket, but i don't know all places were a linux/tcp
> socket can be used through qemu_chr_fe_write, without wanting
> to send any fd...

The qemu_chr_fe_set_msgfds() function is only called from one
place in QEMU:

$ git grep qemu_chr_fe_set_msgfds
hw/virtio/vhost-user.c:        qemu_chr_fe_set_msgfds(chr, fds, fd_num);
include/sysemu/char.h: * @qemu_chr_fe_set_msgfds:
include/sysemu/char.h:int qemu_chr_fe_set_msgfds(CharDriverState *s, int *fds, int num);
qemu-char.c:int qemu_chr_fe_set_msgfds(CharDriverState *s, int *fds, int num)

so if vhost-user.c does the write thing calling write_all(),
then you're fine.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full
  2016-02-09 17:04           ` Daniel P. Berrange
@ 2016-02-10  9:35             ` Didier Pallard
  2016-02-10 11:53               ` Michael S. Tsirkin
  0 siblings, 1 reply; 33+ messages in thread
From: Didier Pallard @ 2016-02-10  9:35 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: thibaut.collet, jmg, qemu-devel, Michael S. Tsirkin

On 02/09/2016 06:04 PM, Daniel P. Berrange wrote:
> On Tue, Feb 09, 2016 at 05:17:16PM +0100, Didier Pallard wrote:
>> On 02/09/2016 01:21 PM, Michael S. Tsirkin wrote:
>>> On Tue, Feb 09, 2016 at 11:48:13AM +0000, Daniel P. Berrange wrote:
>>>> On Thu, Feb 04, 2016 at 04:10:38PM +0200, Michael S. Tsirkin wrote:
>>>>> On Thu, Dec 03, 2015 at 10:53:17AM +0100, Didier Pallard wrote:
>>>>>> unix_send_msgfds is used by vhost-user control socket. qemu_chr_fe_write_all
>>>>>> is used to send a message and retries as long as EAGAIN errno is set,
>>>>>> but write_msgfds buffer is freed after first EAGAIN failure, causing
>>>>>> message to be sent without proper fds attachment.
>>>>>>
>>>>>> In case unix_send_msgfds is called through qemu_chr_fe_write, it will be
>>>>>> user responsability to resend message as is or to free write_msgfds
>>>>>> using set_msgfds(0)
>>>>>>
>>>>>> Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
>>>>>> Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
>>>>>> ---
>>>>>>   qemu-char.c | 10 ++++++++++
>>>>>>   1 file changed, 10 insertions(+)
>>>>>>
>>>>>> diff --git a/qemu-char.c b/qemu-char.c
>>>>>> index 5448b0f..26d5f2e 100644
>>>>>> --- a/qemu-char.c
>>>>>> +++ b/qemu-char.c
>>>>>> @@ -2614,6 +2614,16 @@ static int unix_send_msgfds(CharDriverState *chr, const uint8_t *buf, int len)
>>>>>>           r = sendmsg(s->fd, &msgh, 0);
>>>>>>       } while (r < 0 && errno == EINTR);
>>>>>> +    /* Ancillary data are not sent if no byte is written
>>>>>> +     * so don't free msgfds buffer if return value is EAGAIN
>>>>>> +     * If called from qemu_chr_fe_write_all retry will come soon
>>>>>> +     * If called from qemu_chr_fe_write, it is the user responsibility
>>>>>> +     * to resend message or free fds using set_msgfds(0)
>>>>> Problem is, if ever anyone tries to send messages
>>>>> with qemu_chr_fe_write and does not retry, there will
>>>>> be a memory leak.
>>>>>
>>>>> Rather than this, how about adding an assert in qemu_chr_fe_write
>>>>> to make sure its not used with msgfds?
>>>> NB, this TCP chardev code has completely changed since this patch
>>>> was submitted. It now uses QIOChannel instead of raw sockets APIs.
>>>> The same problem still exists though - when we get EAGAIN from
>>>> the sendmsg, we're releasing the file descriptors even though
>>>> they've not been sent.
>>>>
>>>> Adding an assert in qemu_chr_fe_write() won't solve it - the
>>>> same problem still exists in qemu_chr_fe_write_all() - although
>>>> that loops to re-try on EAGAIN, the FDs are free'd before it
>>>> does the retry.
>>>>
>>>> We need to update tcp_chr_write() to not free the FDs when
>>>> io_channel_send_full() returns with errno==EAGAIN, in the
>>>> same way this patch was doing.
>>> Absolutely. We need to fix qemu_chr_fe_write_all.
>>> I am just worried that someone tries to use
>>> qemu_chr_fe_write with fds and then we get
>>> a memory leak if qemu_chr_fe_write is not retried.
>>>
>>> So I propose we teach qemu_chr_fe_write
>>> that it can't send msg fds.
>> Patch is easy to port on head version.
>>
>> My concern with following assert in qemu_chr_fe_write:
>>
>> assert(s->set_msgfds == NULL);
>>
>> Is that it may trigger on a tcp/linux socket that never wants
>> to send message fds but uses qemu_chr_fe_write instead
>> of qemu_chr_fe_write_all. Are we supposed to always use
>> qemu_chr_fe_write_all on a tcp/linux socket?
>>
>> I think that only vhost-user socket is using the ability to send
>> message fds through socket, but i don't know all places were a linux/tcp
>> socket can be used through qemu_chr_fe_write, without wanting
>> to send any fd...
> The qemu_chr_fe_set_msgfds() function is only called from one
> place in QEMU:
>
> $ git grep qemu_chr_fe_set_msgfds
> hw/virtio/vhost-user.c:        qemu_chr_fe_set_msgfds(chr, fds, fd_num);
> include/sysemu/char.h: * @qemu_chr_fe_set_msgfds:
> include/sysemu/char.h:int qemu_chr_fe_set_msgfds(CharDriverState *s, int *fds, int num);
> qemu-char.c:int qemu_chr_fe_set_msgfds(CharDriverState *s, int *fds, int num)
>
> so if vhost-user.c does the write thing calling write_all(),
> then you're fine.
>
> Regards,
> Daniel

I agree that it will work as expected for vhost-user, but set_msgfds 
will be set
to tcp_set_msgfds for all "socket" type chardev. If such chardev is 
configured
to be used by some part of code using qemu_chr_fe_write instead of 
qemu_chr_fe_write_all,
it will trigger the assert.
And i just don't know if this case can happen.

I will write a new patchset with the assert in a separate commit.
thanks

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full
  2016-02-10  9:35             ` Didier Pallard
@ 2016-02-10 11:53               ` Michael S. Tsirkin
  2016-02-10 12:15                 ` Daniel P. Berrange
  0 siblings, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2016-02-10 11:53 UTC (permalink / raw)
  To: Didier Pallard; +Cc: thibaut.collet, jmg, qemu-devel

On Wed, Feb 10, 2016 at 10:35:13AM +0100, Didier Pallard wrote:
> On 02/09/2016 06:04 PM, Daniel P. Berrange wrote:
> >On Tue, Feb 09, 2016 at 05:17:16PM +0100, Didier Pallard wrote:
> >>On 02/09/2016 01:21 PM, Michael S. Tsirkin wrote:
> >>>On Tue, Feb 09, 2016 at 11:48:13AM +0000, Daniel P. Berrange wrote:
> >>>>On Thu, Feb 04, 2016 at 04:10:38PM +0200, Michael S. Tsirkin wrote:
> >>>>>On Thu, Dec 03, 2015 at 10:53:17AM +0100, Didier Pallard wrote:
> >>>>>>unix_send_msgfds is used by vhost-user control socket. qemu_chr_fe_write_all
> >>>>>>is used to send a message and retries as long as EAGAIN errno is set,
> >>>>>>but write_msgfds buffer is freed after first EAGAIN failure, causing
> >>>>>>message to be sent without proper fds attachment.
> >>>>>>
> >>>>>>In case unix_send_msgfds is called through qemu_chr_fe_write, it will be
> >>>>>>user responsability to resend message as is or to free write_msgfds
> >>>>>>using set_msgfds(0)
> >>>>>>
> >>>>>>Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
> >>>>>>Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
> >>>>>>---
> >>>>>>  qemu-char.c | 10 ++++++++++
> >>>>>>  1 file changed, 10 insertions(+)
> >>>>>>
> >>>>>>diff --git a/qemu-char.c b/qemu-char.c
> >>>>>>index 5448b0f..26d5f2e 100644
> >>>>>>--- a/qemu-char.c
> >>>>>>+++ b/qemu-char.c
> >>>>>>@@ -2614,6 +2614,16 @@ static int unix_send_msgfds(CharDriverState *chr, const uint8_t *buf, int len)
> >>>>>>          r = sendmsg(s->fd, &msgh, 0);
> >>>>>>      } while (r < 0 && errno == EINTR);
> >>>>>>+    /* Ancillary data are not sent if no byte is written
> >>>>>>+     * so don't free msgfds buffer if return value is EAGAIN
> >>>>>>+     * If called from qemu_chr_fe_write_all retry will come soon
> >>>>>>+     * If called from qemu_chr_fe_write, it is the user responsibility
> >>>>>>+     * to resend message or free fds using set_msgfds(0)
> >>>>>Problem is, if ever anyone tries to send messages
> >>>>>with qemu_chr_fe_write and does not retry, there will
> >>>>>be a memory leak.
> >>>>>
> >>>>>Rather than this, how about adding an assert in qemu_chr_fe_write
> >>>>>to make sure its not used with msgfds?
> >>>>NB, this TCP chardev code has completely changed since this patch
> >>>>was submitted. It now uses QIOChannel instead of raw sockets APIs.
> >>>>The same problem still exists though - when we get EAGAIN from
> >>>>the sendmsg, we're releasing the file descriptors even though
> >>>>they've not been sent.
> >>>>
> >>>>Adding an assert in qemu_chr_fe_write() won't solve it - the
> >>>>same problem still exists in qemu_chr_fe_write_all() - although
> >>>>that loops to re-try on EAGAIN, the FDs are free'd before it
> >>>>does the retry.
> >>>>
> >>>>We need to update tcp_chr_write() to not free the FDs when
> >>>>io_channel_send_full() returns with errno==EAGAIN, in the
> >>>>same way this patch was doing.
> >>>Absolutely. We need to fix qemu_chr_fe_write_all.
> >>>I am just worried that someone tries to use
> >>>qemu_chr_fe_write with fds and then we get
> >>>a memory leak if qemu_chr_fe_write is not retried.
> >>>
> >>>So I propose we teach qemu_chr_fe_write
> >>>that it can't send msg fds.
> >>Patch is easy to port on head version.
> >>
> >>My concern with following assert in qemu_chr_fe_write:
> >>
> >>assert(s->set_msgfds == NULL);
> >>
> >>Is that it may trigger on a tcp/linux socket that never wants
> >>to send message fds but uses qemu_chr_fe_write instead
> >>of qemu_chr_fe_write_all. Are we supposed to always use
> >>qemu_chr_fe_write_all on a tcp/linux socket?
> >>
> >>I think that only vhost-user socket is using the ability to send
> >>message fds through socket, but i don't know all places were a linux/tcp
> >>socket can be used through qemu_chr_fe_write, without wanting
> >>to send any fd...
> >The qemu_chr_fe_set_msgfds() function is only called from one
> >place in QEMU:
> >
> >$ git grep qemu_chr_fe_set_msgfds
> >hw/virtio/vhost-user.c:        qemu_chr_fe_set_msgfds(chr, fds, fd_num);
> >include/sysemu/char.h: * @qemu_chr_fe_set_msgfds:
> >include/sysemu/char.h:int qemu_chr_fe_set_msgfds(CharDriverState *s, int *fds, int num);
> >qemu-char.c:int qemu_chr_fe_set_msgfds(CharDriverState *s, int *fds, int num)
> >
> >so if vhost-user.c does the write thing calling write_all(),
> >then you're fine.
> >
> >Regards,
> >Daniel
> 
> I agree that it will work as expected for vhost-user, but set_msgfds will be
> set
> to tcp_set_msgfds for all "socket" type chardev. If such chardev is
> configured
> to be used by some part of code using qemu_chr_fe_write instead of
> qemu_chr_fe_write_all,
> it will trigger the assert.
> And i just don't know if this case can happen.
> 
> I will write a new patchset with the assert in a separate commit.
> thanks

Oh, I see. I really meant

assert(!s->write_msgfds && !s->write_msgfds_num);

this assert can go into tcp_chr_write.

-- 
MST

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full
  2016-02-10 11:53               ` Michael S. Tsirkin
@ 2016-02-10 12:15                 ` Daniel P. Berrange
  2016-02-19  9:09                   ` Didier Pallard
  0 siblings, 1 reply; 33+ messages in thread
From: Daniel P. Berrange @ 2016-02-10 12:15 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: thibaut.collet, jmg, qemu-devel, Didier Pallard

On Wed, Feb 10, 2016 at 01:53:49PM +0200, Michael S. Tsirkin wrote:
> On Wed, Feb 10, 2016 at 10:35:13AM +0100, Didier Pallard wrote:
> > On 02/09/2016 06:04 PM, Daniel P. Berrange wrote:
> > >On Tue, Feb 09, 2016 at 05:17:16PM +0100, Didier Pallard wrote:
> > >>On 02/09/2016 01:21 PM, Michael S. Tsirkin wrote:
> > >>>On Tue, Feb 09, 2016 at 11:48:13AM +0000, Daniel P. Berrange wrote:
> > >>>>On Thu, Feb 04, 2016 at 04:10:38PM +0200, Michael S. Tsirkin wrote:
> > >>>>>On Thu, Dec 03, 2015 at 10:53:17AM +0100, Didier Pallard wrote:
> > >>>>>>unix_send_msgfds is used by vhost-user control socket. qemu_chr_fe_write_all
> > >>>>>>is used to send a message and retries as long as EAGAIN errno is set,
> > >>>>>>but write_msgfds buffer is freed after first EAGAIN failure, causing
> > >>>>>>message to be sent without proper fds attachment.
> > >>>>>>
> > >>>>>>In case unix_send_msgfds is called through qemu_chr_fe_write, it will be
> > >>>>>>user responsability to resend message as is or to free write_msgfds
> > >>>>>>using set_msgfds(0)
> > >>>>>>
> > >>>>>>Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
> > >>>>>>Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
> > >>>>>>---
> > >>>>>>  qemu-char.c | 10 ++++++++++
> > >>>>>>  1 file changed, 10 insertions(+)
> > >>>>>>
> > >>>>>>diff --git a/qemu-char.c b/qemu-char.c
> > >>>>>>index 5448b0f..26d5f2e 100644
> > >>>>>>--- a/qemu-char.c
> > >>>>>>+++ b/qemu-char.c
> > >>>>>>@@ -2614,6 +2614,16 @@ static int unix_send_msgfds(CharDriverState *chr, const uint8_t *buf, int len)
> > >>>>>>          r = sendmsg(s->fd, &msgh, 0);
> > >>>>>>      } while (r < 0 && errno == EINTR);
> > >>>>>>+    /* Ancillary data are not sent if no byte is written
> > >>>>>>+     * so don't free msgfds buffer if return value is EAGAIN
> > >>>>>>+     * If called from qemu_chr_fe_write_all retry will come soon
> > >>>>>>+     * If called from qemu_chr_fe_write, it is the user responsibility
> > >>>>>>+     * to resend message or free fds using set_msgfds(0)
> > >>>>>Problem is, if ever anyone tries to send messages
> > >>>>>with qemu_chr_fe_write and does not retry, there will
> > >>>>>be a memory leak.
> > >>>>>
> > >>>>>Rather than this, how about adding an assert in qemu_chr_fe_write
> > >>>>>to make sure its not used with msgfds?
> > >>>>NB, this TCP chardev code has completely changed since this patch
> > >>>>was submitted. It now uses QIOChannel instead of raw sockets APIs.
> > >>>>The same problem still exists though - when we get EAGAIN from
> > >>>>the sendmsg, we're releasing the file descriptors even though
> > >>>>they've not been sent.
> > >>>>
> > >>>>Adding an assert in qemu_chr_fe_write() won't solve it - the
> > >>>>same problem still exists in qemu_chr_fe_write_all() - although
> > >>>>that loops to re-try on EAGAIN, the FDs are free'd before it
> > >>>>does the retry.
> > >>>>
> > >>>>We need to update tcp_chr_write() to not free the FDs when
> > >>>>io_channel_send_full() returns with errno==EAGAIN, in the
> > >>>>same way this patch was doing.
> > >>>Absolutely. We need to fix qemu_chr_fe_write_all.
> > >>>I am just worried that someone tries to use
> > >>>qemu_chr_fe_write with fds and then we get
> > >>>a memory leak if qemu_chr_fe_write is not retried.
> > >>>
> > >>>So I propose we teach qemu_chr_fe_write
> > >>>that it can't send msg fds.
> > >>Patch is easy to port on head version.
> > >>
> > >>My concern with following assert in qemu_chr_fe_write:
> > >>
> > >>assert(s->set_msgfds == NULL);
> > >>
> > >>Is that it may trigger on a tcp/linux socket that never wants
> > >>to send message fds but uses qemu_chr_fe_write instead
> > >>of qemu_chr_fe_write_all. Are we supposed to always use
> > >>qemu_chr_fe_write_all on a tcp/linux socket?
> > >>
> > >>I think that only vhost-user socket is using the ability to send
> > >>message fds through socket, but i don't know all places were a linux/tcp
> > >>socket can be used through qemu_chr_fe_write, without wanting
> > >>to send any fd...
> > >The qemu_chr_fe_set_msgfds() function is only called from one
> > >place in QEMU:
> > >
> > >$ git grep qemu_chr_fe_set_msgfds
> > >hw/virtio/vhost-user.c:        qemu_chr_fe_set_msgfds(chr, fds, fd_num);
> > >include/sysemu/char.h: * @qemu_chr_fe_set_msgfds:
> > >include/sysemu/char.h:int qemu_chr_fe_set_msgfds(CharDriverState *s, int *fds, int num);
> > >qemu-char.c:int qemu_chr_fe_set_msgfds(CharDriverState *s, int *fds, int num)
> > >
> > >so if vhost-user.c does the write thing calling write_all(),
> > >then you're fine.
> > >
> > >Regards,
> > >Daniel
> > 
> > I agree that it will work as expected for vhost-user, but set_msgfds will be
> > set
> > to tcp_set_msgfds for all "socket" type chardev. If such chardev is
> > configured
> > to be used by some part of code using qemu_chr_fe_write instead of
> > qemu_chr_fe_write_all,
> > it will trigger the assert.
> > And i just don't know if this case can happen.
> > 
> > I will write a new patchset with the assert in a separate commit.
> > thanks
> 
> Oh, I see. I really meant
> 
> assert(!s->write_msgfds && !s->write_msgfds_num);
> 
> this assert can go into tcp_chr_write.

Err, no it can't. tcp_chr_write() is the thing responsible for sending
FDs, so we can't assert that no FDs are set, as that'll make it impossible
for anything to send FDs.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] virtio-pci: add an option to bypass guest_notifier_mask
  2015-12-03  9:53 ` [Qemu-devel] [PATCH 2/3] virtio-pci: add an option to bypass guest_notifier_mask Didier Pallard
  2015-12-07 13:37   ` Marc-André Lureau
  2016-02-04 13:08   ` Michael S. Tsirkin
@ 2016-02-15 15:38   ` Victor Kaplansky
  2 siblings, 0 replies; 33+ messages in thread
From: Victor Kaplansky @ 2016-02-15 15:38 UTC (permalink / raw)
  To: Didier Pallard
  Cc: Marcel Apfelbaum, thibaut.collet, jmg, qemu-devel, Michael S. Tsirkin

On Thu, Dec 03, 2015 at 10:53:18AM +0100, Didier Pallard wrote:
> Using guest_notifier_mask function in vhost-user case may
> break interrupt mask paradigm, because mask/unmask is not
> really done when returning from guest_notifier_mask call, instead
> message is posted in a unix socket, and processed later.
> 
> Add an option bit to disable the use of guest_notifier_mask
> in virtio pci.
> 
> Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
> Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
> ---
>  hw/virtio/virtio-pci.c | 29 +++++++++++++++++++++++------
>  hw/virtio/virtio-pci.h |  6 ++++++
>  2 files changed, 29 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index dd48562..26bb617 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -806,7 +806,8 @@ static int kvm_virtio_pci_vector_use(VirtIOPCIProxy *proxy, int nvqs)
>          /* If guest supports masking, set up irqfd now.
>           * Otherwise, delay until unmasked in the frontend.
>           */
> -        if (k->guest_notifier_mask) {
> +        if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
> +            k->guest_notifier_mask) {
>              ret = kvm_virtio_pci_irqfd_use(proxy, queue_no, vector);
>              if (ret < 0) {
>                  kvm_virtio_pci_vq_vector_release(proxy, vector);
> @@ -822,7 +823,8 @@ undo:
>          if (vector >= msix_nr_vectors_allocated(dev)) {
>              continue;
>          }
> -        if (k->guest_notifier_mask) {
> +        if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
> +            k->guest_notifier_mask) {
>              kvm_virtio_pci_irqfd_release(proxy, queue_no, vector);
>          }
>          kvm_virtio_pci_vq_vector_release(proxy, vector);
> @@ -849,7 +851,8 @@ static void kvm_virtio_pci_vector_release(VirtIOPCIProxy *proxy, int nvqs)
>          /* If guest supports masking, clean up irqfd now.
>           * Otherwise, it was cleaned when masked in the frontend.
>           */
> -        if (k->guest_notifier_mask) {
> +        if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
> +            k->guest_notifier_mask) {
>              kvm_virtio_pci_irqfd_release(proxy, queue_no, vector);
>          }
>          kvm_virtio_pci_vq_vector_release(proxy, vector);
> @@ -882,7 +885,8 @@ static int virtio_pci_vq_vector_unmask(VirtIOPCIProxy *proxy,
>      /* If guest supports masking, irqfd is already setup, unmask it.
>       * Otherwise, set it up now.
>       */
> -    if (k->guest_notifier_mask) {
> +    if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
> +        k->guest_notifier_mask) {
>          k->guest_notifier_mask(vdev, queue_no, false);
>          /* Test after unmasking to avoid losing events. */
>          if (k->guest_notifier_pending &&
> @@ -905,7 +909,8 @@ static void virtio_pci_vq_vector_mask(VirtIOPCIProxy *proxy,
>      /* If guest supports masking, keep irqfd but mask it.
>       * Otherwise, clean it up now.
>       */ 
> -    if (k->guest_notifier_mask) {
> +    if ((proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
> +        k->guest_notifier_mask) {
>          k->guest_notifier_mask(vdev, queue_no, true);
>      } else {
>          kvm_virtio_pci_irqfd_release(proxy, queue_no, vector);
> @@ -1022,7 +1027,9 @@ static int virtio_pci_set_guest_notifier(DeviceState *d, int n, bool assign,
>          event_notifier_cleanup(notifier);
>      }
>  
> -    if (!msix_enabled(&proxy->pci_dev) && vdc->guest_notifier_mask) {
> +    if (!msix_enabled(&proxy->pci_dev) &&
> +        (proxy->flags & VIRTIO_PCI_FLAG_USE_NOTIFIERMASK) &&
> +        vdc->guest_notifier_mask) {
>          vdc->guest_notifier_mask(vdev, n, !assign);
>      }
>  
> @@ -1164,6 +1171,8 @@ static void virtio_9p_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
>  static Property virtio_9p_pci_properties[] = {
>      DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>                      VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),

Is there a reason why we add the new property to specific classes
like virtio_9p_pci_realize, virtio_blk_pci_properties and so on,
instead of adding it to the more general class
virtio_pci_properties, and thus the new property having
inherited, obtained by all PCI devices?

>      DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
>      DEFINE_PROP_END_OF_LIST(),
>  };
> @@ -1908,6 +1917,8 @@ static Property virtio_blk_pci_properties[] = {
>      DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
>      DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>                      VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>      DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
>      DEFINE_PROP_END_OF_LIST(),
>  };
> @@ -1961,6 +1972,8 @@ static const TypeInfo virtio_blk_pci_info = {
>  static Property virtio_scsi_pci_properties[] = {
>      DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>                      VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>      DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors,
>                         DEV_NVECTORS_UNSPECIFIED),
>      DEFINE_PROP_END_OF_LIST(),
> @@ -2175,6 +2188,8 @@ static void virtio_serial_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
>  static Property virtio_serial_pci_properties[] = {
>      DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>                      VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>      DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
>      DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
>      DEFINE_PROP_END_OF_LIST(),
> @@ -2215,6 +2230,8 @@ static const TypeInfo virtio_serial_pci_info = {
>  static Property virtio_net_properties[] = {
>      DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
>                      VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, false),
> +    DEFINE_PROP_BIT("usenotifiermask", VirtIOPCIProxy, flags,
> +                    VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT, true),
>      DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 3),
>      DEFINE_PROP_END_OF_LIST(),
>  };
> diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
> index ffb74bb..aecd4eb 100644
> --- a/hw/virtio/virtio-pci.h
> +++ b/hw/virtio/virtio-pci.h
> @@ -86,6 +86,12 @@ typedef struct VirtioBusClass VirtioPCIBusClass;
>  #define VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY \
>      (1 << VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY_BIT)
>  
> +/* Where vhost-user implementation exists, using the guest notifier mask
> + * feature can lead to improper interrupt management. Add a flag to
> + * allow to disable this guest notifier mask if desired */
> +#define VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT 6
> +#define VIRTIO_PCI_FLAG_USE_NOTIFIERMASK (1 << VIRTIO_PCI_FLAG_USE_NOTIFIERMASK_BIT)
> +
>  typedef struct {
>      MSIMessage msg;
>      int virq;
> -- 
> 2.1.4
> 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full
  2016-02-10 12:15                 ` Daniel P. Berrange
@ 2016-02-19  9:09                   ` Didier Pallard
  0 siblings, 0 replies; 33+ messages in thread
From: Didier Pallard @ 2016-02-19  9:09 UTC (permalink / raw)
  To: Daniel P. Berrange, Michael S. Tsirkin; +Cc: thibaut.collet, jmg, qemu-devel

On 02/10/2016 01:15 PM, Daniel P. Berrange wrote:
> On Wed, Feb 10, 2016 at 01:53:49PM +0200, Michael S. Tsirkin wrote:
>> On Wed, Feb 10, 2016 at 10:35:13AM +0100, Didier Pallard wrote:
>>> On 02/09/2016 06:04 PM, Daniel P. Berrange wrote:
>>>> On Tue, Feb 09, 2016 at 05:17:16PM +0100, Didier Pallard wrote:
>>>>> On 02/09/2016 01:21 PM, Michael S. Tsirkin wrote:
>>>>>> On Tue, Feb 09, 2016 at 11:48:13AM +0000, Daniel P. Berrange wrote:
>>>>>>> On Thu, Feb 04, 2016 at 04:10:38PM +0200, Michael S. Tsirkin wrote:
>>>>>>>> On Thu, Dec 03, 2015 at 10:53:17AM +0100, Didier Pallard wrote:
>>>>>>>>> unix_send_msgfds is used by vhost-user control socket. qemu_chr_fe_write_all
>>>>>>>>> is used to send a message and retries as long as EAGAIN errno is set,
>>>>>>>>> but write_msgfds buffer is freed after first EAGAIN failure, causing
>>>>>>>>> message to be sent without proper fds attachment.
>>>>>>>>>
>>>>>>>>> In case unix_send_msgfds is called through qemu_chr_fe_write, it will be
>>>>>>>>> user responsability to resend message as is or to free write_msgfds
>>>>>>>>> using set_msgfds(0)
>>>>>>>>>
>>>>>>>>> Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
>>>>>>>>> Reviewed-by: Thibaut Collet <thibaut.collet@6wind.com>
>>>>>>>>> ---
>>>>>>>>>   qemu-char.c | 10 ++++++++++
>>>>>>>>>   1 file changed, 10 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/qemu-char.c b/qemu-char.c
>>>>>>>>> index 5448b0f..26d5f2e 100644
>>>>>>>>> --- a/qemu-char.c
>>>>>>>>> +++ b/qemu-char.c
>>>>>>>>> @@ -2614,6 +2614,16 @@ static int unix_send_msgfds(CharDriverState *chr, const uint8_t *buf, int len)
>>>>>>>>>           r = sendmsg(s->fd, &msgh, 0);
>>>>>>>>>       } while (r < 0 && errno == EINTR);
>>>>>>>>> +    /* Ancillary data are not sent if no byte is written
>>>>>>>>> +     * so don't free msgfds buffer if return value is EAGAIN
>>>>>>>>> +     * If called from qemu_chr_fe_write_all retry will come soon
>>>>>>>>> +     * If called from qemu_chr_fe_write, it is the user responsibility
>>>>>>>>> +     * to resend message or free fds using set_msgfds(0)
>>>>>>>> Problem is, if ever anyone tries to send messages
>>>>>>>> with qemu_chr_fe_write and does not retry, there will
>>>>>>>> be a memory leak.
>>>>>>>>
>>>>>>>> Rather than this, how about adding an assert in qemu_chr_fe_write
>>>>>>>> to make sure its not used with msgfds?
>>>>>>> NB, this TCP chardev code has completely changed since this patch
>>>>>>> was submitted. It now uses QIOChannel instead of raw sockets APIs.
>>>>>>> The same problem still exists though - when we get EAGAIN from
>>>>>>> the sendmsg, we're releasing the file descriptors even though
>>>>>>> they've not been sent.
>>>>>>>
>>>>>>> Adding an assert in qemu_chr_fe_write() won't solve it - the
>>>>>>> same problem still exists in qemu_chr_fe_write_all() - although
>>>>>>> that loops to re-try on EAGAIN, the FDs are free'd before it
>>>>>>> does the retry.
>>>>>>>
>>>>>>> We need to update tcp_chr_write() to not free the FDs when
>>>>>>> io_channel_send_full() returns with errno==EAGAIN, in the
>>>>>>> same way this patch was doing.
>>>>>> Absolutely. We need to fix qemu_chr_fe_write_all.
>>>>>> I am just worried that someone tries to use
>>>>>> qemu_chr_fe_write with fds and then we get
>>>>>> a memory leak if qemu_chr_fe_write is not retried.
>>>>>>
>>>>>> So I propose we teach qemu_chr_fe_write
>>>>>> that it can't send msg fds.
>>>>> Patch is easy to port on head version.
>>>>>
>>>>> My concern with following assert in qemu_chr_fe_write:
>>>>>
>>>>> assert(s->set_msgfds == NULL);
>>>>>
>>>>> Is that it may trigger on a tcp/linux socket that never wants
>>>>> to send message fds but uses qemu_chr_fe_write instead
>>>>> of qemu_chr_fe_write_all. Are we supposed to always use
>>>>> qemu_chr_fe_write_all on a tcp/linux socket?
>>>>>
>>>>> I think that only vhost-user socket is using the ability to send
>>>>> message fds through socket, but i don't know all places were a linux/tcp
>>>>> socket can be used through qemu_chr_fe_write, without wanting
>>>>> to send any fd...
>>>> The qemu_chr_fe_set_msgfds() function is only called from one
>>>> place in QEMU:
>>>>
>>>> $ git grep qemu_chr_fe_set_msgfds
>>>> hw/virtio/vhost-user.c:        qemu_chr_fe_set_msgfds(chr, fds, fd_num);
>>>> include/sysemu/char.h: * @qemu_chr_fe_set_msgfds:
>>>> include/sysemu/char.h:int qemu_chr_fe_set_msgfds(CharDriverState *s, int *fds, int num);
>>>> qemu-char.c:int qemu_chr_fe_set_msgfds(CharDriverState *s, int *fds, int num)
>>>>
>>>> so if vhost-user.c does the write thing calling write_all(),
>>>> then you're fine.
>>>>
>>>> Regards,
>>>> Daniel
>>> I agree that it will work as expected for vhost-user, but set_msgfds will be
>>> set
>>> to tcp_set_msgfds for all "socket" type chardev. If such chardev is
>>> configured
>>> to be used by some part of code using qemu_chr_fe_write instead of
>>> qemu_chr_fe_write_all,
>>> it will trigger the assert.
>>> And i just don't know if this case can happen.
>>>
>>> I will write a new patchset with the assert in a separate commit.
>>> thanks
>> Oh, I see. I really meant
>>
>> assert(!s->write_msgfds && !s->write_msgfds_num);
>>
>> this assert can go into tcp_chr_write.
> Err, no it can't. tcp_chr_write() is the thing responsible for sending
> FDs, so we can't assert that no FDs are set, as that'll make it impossible
> for anything to send FDs.
>
> Regards,
> Daniel
I fully agree with daniel, i think it can't be done in tcp_chr_write as is.
So finally, what should i do?
Add a method that may return the number of pending msgfds to be sent?
It may allow to check that no message fds are pending when using
tcp_chr_write...

thanks
didier

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2016-02-19  9:09 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-03  9:53 [Qemu-devel] Linux vhost-user interrupt management fixes Didier Pallard
2015-12-03  9:53 ` [Qemu-devel] [PATCH 1/3] char: fix vhost-user socket full Didier Pallard
2015-12-07 13:31   ` Marc-André Lureau
2015-12-09 15:59     ` Victor Kaplansky
2015-12-09 17:06       ` Didier Pallard
2015-12-10 12:56         ` Victor Kaplansky
2015-12-10 15:09           ` Didier Pallard
2015-12-17 14:41             ` Victor Kaplansky
2016-02-04 13:13   ` Michael S. Tsirkin
2016-02-04 14:10   ` Michael S. Tsirkin
2016-02-08 13:12     ` Didier Pallard
2016-02-09 11:37       ` Michael S. Tsirkin
2016-02-09 11:48     ` Daniel P. Berrange
2016-02-09 12:21       ` Michael S. Tsirkin
2016-02-09 16:17         ` Didier Pallard
2016-02-09 16:50           ` Michael S. Tsirkin
2016-02-09 17:04           ` Daniel P. Berrange
2016-02-10  9:35             ` Didier Pallard
2016-02-10 11:53               ` Michael S. Tsirkin
2016-02-10 12:15                 ` Daniel P. Berrange
2016-02-19  9:09                   ` Didier Pallard
2015-12-03  9:53 ` [Qemu-devel] [PATCH 2/3] virtio-pci: add an option to bypass guest_notifier_mask Didier Pallard
2015-12-07 13:37   ` Marc-André Lureau
2015-12-07 13:59     ` Marc-André Lureau
2015-12-09 15:06       ` Didier Pallard
2016-02-04 13:08   ` Michael S. Tsirkin
2016-02-08 13:24     ` Didier Pallard
2016-02-15 15:38   ` Victor Kaplansky
2015-12-03  9:53 ` [Qemu-devel] [PATCH 3/3] vhost-net: force guest_notifier_mask bypass in vhost-user case Didier Pallard
2016-02-04 13:06   ` Michael S. Tsirkin
2015-12-04 10:04 ` [Qemu-devel] Linux vhost-user interrupt management fixes Didier Pallard
2016-01-25  9:22 ` Victor Kaplansky
2016-01-26  9:25   ` Didier Pallard

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).