QEMU-Devel Archive on lore.kernel.org
* [Qemu-devel] [RFC] vhost-user simulation extension
@ 2019-09-02 12:12 Johannes Berg
  2019-09-02 12:12 ` [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages Johannes Berg
  2019-09-06 12:13 ` [Qemu-devel] [RFC] libvhost-user: implement VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS Johannes Berg
  0 siblings, 2 replies; 38+ messages in thread
From: Johannes Berg @ 2019-09-02 12:12 UTC (permalink / raw)
  To: qemu-devel; +Cc: Marc-André Lureau, Stefan Hajnoczi

Hi all,

First of all, I'm not sure whether QEMU actually maintains the master copy
of this document, or whether it's handled like virtio. It *looks* like QEMU
does, given that the updates are "do something" rather than "sync to some
version", but please let me know if I need to send this elsewhere.

Secondly, I'm not sure how amenable you are to changes that are unlikely
to see "production" use. There are many people using Linux for
simulation, but you don't necessarily have to make it easy for them :-)

Third, if this is accepted I'm not sure we'd currently implement it in
QEMU itself. We're currently working on User-Mode Linux (UML) because I haven't
found a really good way to control time in QEMU simulations (though other
people have, e.g. VMSimInt[1] has implemented this, AFAICT though only
in full simulation, not with KVM.) So again, this may or may not be a
show-stopper, let me know. The UML patch[2] to implement vhost-user has
been posted to the list but we expect to be making revisions to it before
it can be integrated.

With all that said, let me explain a bit more what this is about. In a
simulation system like VMSimInt or what we're working on, you have what
the VMSimInt paper[3] calls the "simulation calendar", where each of the
simulated components adds an entry and then gets told when (and for how
much simulated time) to run. This is necessary to ensure the simulation
will run the different components in the proper timeline, and if none of
the components has anything to do, time can just skip forward, similar to
the time-travel mode I implemented in UML[4].
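To make the calendar idea concrete, here is a minimal sketch (all names and
data structures are hypothetical, not taken from VMSimInt or UML):

```c
#include <assert.h>
#include <stdint.h>

/* A minimal event calendar: each component enqueues the next time it
 * needs to run; when everyone is idle, simulated time jumps straight
 * to the earliest entry instead of waiting in real time. */
struct cal_event {
    uint64_t time;           /* simulated time at which to run */
    void (*run)(void *data); /* component callback */
    void *data;
};

#define MAX_EVENTS 64
static struct cal_event calendar[MAX_EVENTS];
static unsigned int n_events;
static uint64_t sim_time;

static void cal_add(uint64_t time, void (*run)(void *), void *data)
{
    assert(n_events < MAX_EVENTS);
    calendar[n_events++] = (struct cal_event){ time, run, data };
}

/* Pick the earliest entry, skip time forward to it, and run it.
 * Returns 0 when the calendar is empty. */
static int cal_step(void)
{
    unsigned int i, best = 0;
    struct cal_event ev;

    if (n_events == 0)
        return 0;
    for (i = 1; i < n_events; i++)
        if (calendar[i].time < calendar[best].time)
            best = i;

    ev = calendar[best];
    calendar[best] = calendar[--n_events];

    sim_time = ev.time;      /* time "travels": no real waiting */
    ev.run(ev.data);
    return 1;
}
```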

Now, of course, multiple VMs sharing the same simulation time aren't very
useful if they cannot exchange data. VMSimInt has its own network hooks in
its code changes[5], but I'm also interested in simulating devices (using
the same simulation calendar), not just networks.

The obvious way of integrating devices seemed to be virtio/vhost-user, and
that's why we already implemented vhost-user in UML. Now, a bit later, I
realized that due to the asynchronous nature of kick/call on vrings, the
simulation calendar update cannot be guaranteed properly without some extra
synchronisation.

I haven't implemented it yet, but I'm convinced that the proposed addition
to the vhost-user spec would let us implement this properly, as we could
have the vring update message (VHOST_USER_VQ_KICK) put an entry on the
simulation calendar for actually processing the message before sending an
ACK reply back, and the same for VHOST_USER_VQ_CALL in the other direction.
The actual processing of the message on the vring would then happen when
the simulation calendar told that component to run, i.e. when the event
on the calendar actually gets scheduled.
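A rough sketch of how the master side might build such a kick message
(the header layout and the need_reply bit follow the existing vhost-user
spec, message ID 34 and the reserved num field come from the accompanying
patch; the helper name itself is made up for illustration):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Header layout of the vhost-user protocol: version in flag bits 0-1,
 * need_reply in bit 3 (requires VHOST_USER_PROTOCOL_F_REPLY_ACK). */
#define VHOST_USER_VERSION        1u
#define VHOST_USER_NEED_REPLY     (1u << 3)
#define VHOST_USER_VRING_KICK     34  /* ID from the RFC patch */

struct vhost_user_vring_state {
    uint32_t index; /* vring index */
    uint32_t num;   /* reserved, must be 0 for this message */
};

struct vhost_user_msg {
    uint32_t request;
    uint32_t flags;
    uint32_t size;
    struct vhost_user_vring_state state;
};

/* Build a kick message that demands an ACK, so the sender can block
 * until the peer has put its calendar entry in place. */
static struct vhost_user_msg make_kick(uint32_t vring_index)
{
    struct vhost_user_msg msg;

    memset(&msg, 0, sizeof(msg));
    msg.request = VHOST_USER_VRING_KICK;
    msg.flags = VHOST_USER_VERSION | VHOST_USER_NEED_REPLY;
    msg.size = sizeof(msg.state);
    msg.state.index = vring_index;
    msg.state.num = 0;
    return msg;
}
```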

There are obviously other ways of implementing this, for instance I could
use vhost-user polling mode, and then not really implement polling but
have an out-of-band message (e.g. through the communication channel that's
also used for the simulation time/calendar [6]) to indicate kick/call.
However, that is uglier in the code; extending the protocol in this way
would just require me to configure the drivers/devices to negotiate these
extensions and hook the interrupt handling on both sides into the simulation
calendar, the latter of which is necessary anyway.

Hopefully I've managed to explain what I'm trying to do here; questions or
suggestions are always welcome :-)

johannes

[1] http://www.ikr.uni-stuttgart.de/INDSimLib/
[2] https://patchwork.ozlabs.org/patch/1140033/
[3] http://www.ikr.uni-stuttgart.de/Content/Publications/Archive/We_SIMUTools_2014_40209.pdf
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=065038706f77a56754e8f0c2556dab7e22dfe577
[5] I've extracted them from their sources as a diff against QEMU v1.6.0:
    https://p.sipsolutions.net/af9a68ded948c07e.txt
[6] which I'm currently planning to implement using a separate virtio device
    with an appropriate driver that hooks into low-level UML things



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages
  2019-09-02 12:12 [Qemu-devel] [RFC] vhost-user simulation extension Johannes Berg
@ 2019-09-02 12:12 ` Johannes Berg
  2019-09-05 20:28   ` Johannes Berg
  2019-09-11 15:31   ` Stefan Hajnoczi
  2019-09-06 12:13 ` [Qemu-devel] [RFC] libvhost-user: implement VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS Johannes Berg
  1 sibling, 2 replies; 38+ messages in thread
From: Johannes Berg @ 2019-09-02 12:12 UTC (permalink / raw)
  To: qemu-devel; +Cc: Marc-André Lureau, Johannes Berg, Stefan Hajnoczi

From: Johannes Berg <johannes.berg@intel.com>

For good reason, vhost-user is currently built asynchronously: that
way, better performance can be obtained. However, for certain use
cases, such as simulation, this is problematic.

Consider an event-based simulation in which both the device and the CPU
are scheduled according to a simulation "calendar". Now, for
example, consider the CPU sending a command to the device, over a
vring in the vhost-user protocol. In this case, the CPU must wait
for the vring update to have been processed, so that the device has
time to put an entry onto the simulation calendar to obtain time to
handle the interrupt, before the CPU goes back to scheduling and
thus updates the simulation calendar or even has the simulation
move time forward.

This cannot easily be achieved with the eventfd-based vring kick/call
design.

Extend the protocol slightly, so that a message can be used for kick
and call instead, if the new VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS is
negotiated. This in itself doesn't guarantee synchronisation, but both
sides can also negotiate VHOST_USER_PROTOCOL_F_REPLY_ACK and thus get
a reply to this message by setting the need_reply flag, and ensure
synchronisation this way.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 docs/interop/vhost-user.rst | 36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index 7827b710aa0a..1270885fecf2 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -2,6 +2,7 @@
 Vhost-user Protocol
 ===================
 :Copyright: 2014 Virtual Open Systems Sarl.
+:Copyright: 2019 Intel Corporation
 :Licence: This work is licensed under the terms of the GNU GPL,
           version 2 or later. See the COPYING file in the top-level
           directory.
@@ -785,6 +786,7 @@ Protocol features
   #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD  10
   #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  11
   #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12
+  #define VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS 13
 
 Master message types
 --------------------
@@ -946,7 +948,9 @@ Master message types
   Bits (0-7) of the payload contain the vring index. Bit 8 is the
   invalid FD flag. This flag is set when there is no file descriptor
   in the ancillary data. This signals that polling should be used
-  instead of waiting for a kick.
+  instead of waiting for a kick; however, if the protocol feature
+  ``VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS`` has been negotiated, it instead
+  means the updates should be done using the messages.
 
 ``VHOST_USER_SET_VRING_CALL``
   :id: 13
@@ -959,7 +963,9 @@ Master message types
   Bits (0-7) of the payload contain the vring index. Bit 8 is the
   invalid FD flag. This flag is set when there is no file descriptor
   in the ancillary data. This signals that polling will be used
-  instead of waiting for the call.
+  instead of waiting for the call; however, if the protocol feature
+  ``VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS`` has been negotiated, it instead
+  means the updates should be done using the messages.
 
 ``VHOST_USER_SET_VRING_ERR``
   :id: 14
@@ -1190,6 +1196,19 @@ Master message types
   ancillary data. The GPU protocol is used to inform the master of
   rendering state and updates. See vhost-user-gpu.rst for details.
 
+``VHOST_USER_VQ_CALL``
+  :id: 34
+  :equivalent ioctl: N/A
+  :slave payload: vring state description
+  :master payload: N/A
+
+  When the ``VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS`` protocol feature has
+  been successfully negotiated, this message may be submitted by the master
+  to indicate that a buffer was added to the vring instead of signalling it
+  using the vring's event FD or having the slave rely on polling.
+
+  The state.num field is currently reserved and must be set to 0.
+
 Slave message types
 -------------------
 
@@ -1246,6 +1265,19 @@ Slave message types
   ``VHOST_USER_PROTOCOL_F_HOST_NOTIFIER`` protocol feature has been
   successfully negotiated.
 
+``VHOST_USER_VQ_KICK``
+  :id: 4
+  :equivalent ioctl: N/A
+  :slave payload: vring state description
+  :master payload: N/A
+
+  When the ``VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS`` protocol feature has
+  been successfully negotiated, this message may be submitted by the slave
+  to indicate that a buffer was used from the vring instead of signalling it
+  using the vring's kick FD or having the master rely on polling.
+
+  The state.num field is currently reserved and must be set to 0.
+
 .. _reply_ack:
 
 VHOST_USER_PROTOCOL_F_REPLY_ACK
-- 
2.20.1




* Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages
  2019-09-02 12:12 ` [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages Johannes Berg
@ 2019-09-05 20:28   ` Johannes Berg
  2019-09-09 16:00     ` Stefan Hajnoczi
  2019-09-11 15:31   ` Stefan Hajnoczi
  1 sibling, 1 reply; 38+ messages in thread
From: Johannes Berg @ 2019-09-05 20:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Marc-André Lureau, Stefan Hajnoczi, mst

 
> +``VHOST_USER_VQ_CALL``
> +  :id: 34
> +  :equivalent ioctl: N/A
> +  :slave payload: vring state description
> +  :master payload: N/A

Oops. This message should be called VHOST_USER_VRING_KICK.

This file doesn't talk about virtqueues, just vrings, and I inverted the
call/kick.

[...]
 
> +``VHOST_USER_VQ_KICK``
> +  :id: 4
> +  :equivalent ioctl: N/A
> +  :slave payload: vring state description
> +  :master payload: N/A

Similarly, that should be called VHOST_USER_SLAVE_VRING_CALL.

Anyway, some comments would be appreciated. I'm working on an
implementation now for my simulation environment, and I guess I can keep
that private etc. but if there's interest I can submit an (optional)
implementation of this for libvhost-user too, I think.

johannes




* [Qemu-devel] [RFC] libvhost-user: implement VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS
  2019-09-02 12:12 [Qemu-devel] [RFC] vhost-user simulation extension Johannes Berg
  2019-09-02 12:12 ` [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages Johannes Berg
@ 2019-09-06 12:13 ` Johannes Berg
  2019-09-06 14:22   ` Michael S. Tsirkin
  1 sibling, 1 reply; 38+ messages in thread
From: Johannes Berg @ 2019-09-06 12:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Johannes Berg, Michael S . Tsirkin

From: Johannes Berg <johannes.berg@intel.com>

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 contrib/libvhost-user/libvhost-user.c | 61 +++++++++++++++++++++++----
 contrib/libvhost-user/libvhost-user.h |  3 ++
 2 files changed, 56 insertions(+), 8 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index fba291c13db4..550f6416a211 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -136,6 +136,7 @@ vu_request_to_string(unsigned int req)
         REQ(VHOST_USER_GET_INFLIGHT_FD),
         REQ(VHOST_USER_SET_INFLIGHT_FD),
         REQ(VHOST_USER_GPU_SET_SOCKET),
+        REQ(VHOST_USER_VRING_KICK),
         REQ(VHOST_USER_MAX),
     };
 #undef REQ
@@ -920,6 +921,7 @@ static bool
 vu_check_queue_msg_file(VuDev *dev, VhostUserMsg *vmsg)
 {
     int index = vmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
+    bool nofd = vmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK;
 
     if (index >= dev->max_queues) {
         vmsg_close_fds(vmsg);
@@ -927,8 +929,12 @@ vu_check_queue_msg_file(VuDev *dev, VhostUserMsg *vmsg)
         return false;
     }
 
-    if (vmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK ||
-        vmsg->fd_num != 1) {
+    if (nofd) {
+        vmsg_close_fds(vmsg);
+        return true;
+    }
+
+    if (vmsg->fd_num != 1) {
         vmsg_close_fds(vmsg);
         vu_panic(dev, "Invalid fds in request: %d", vmsg->request);
         return false;
@@ -1025,6 +1031,7 @@ static bool
 vu_set_vring_kick_exec(VuDev *dev, VhostUserMsg *vmsg)
 {
     int index = vmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
+    bool nofd = vmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK;
 
     DPRINT("u64: 0x%016"PRIx64"\n", vmsg->payload.u64);
 
@@ -1038,8 +1045,8 @@ vu_set_vring_kick_exec(VuDev *dev, VhostUserMsg *vmsg)
         dev->vq[index].kick_fd = -1;
     }
 
-    dev->vq[index].kick_fd = vmsg->fds[0];
-    DPRINT("Got kick_fd: %d for vq: %d\n", vmsg->fds[0], index);
+    dev->vq[index].kick_fd = nofd ? -1 : vmsg->fds[0];
+    DPRINT("Got kick_fd: %d for vq: %d\n", dev->vq[index].kick_fd, index);
 
     dev->vq[index].started = true;
     if (dev->iface->queue_set_started) {
@@ -1116,6 +1123,7 @@ static bool
 vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
 {
     int index = vmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
+    bool nofd = vmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK;
 
     DPRINT("u64: 0x%016"PRIx64"\n", vmsg->payload.u64);
 
@@ -1128,14 +1136,14 @@ vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
         dev->vq[index].call_fd = -1;
     }
 
-    dev->vq[index].call_fd = vmsg->fds[0];
+    dev->vq[index].call_fd = nofd ? -1 : vmsg->fds[0];
 
     /* in case of I/O hang after reconnecting */
-    if (eventfd_write(vmsg->fds[0], 1)) {
+    if (dev->vq[index].call_fd != -1 && eventfd_write(vmsg->fds[0], 1)) {
         return -1;
     }
 
-    DPRINT("Got call_fd: %d for vq: %d\n", vmsg->fds[0], index);
+    DPRINT("Got call_fd: %d for vq: %d\n", dev->vq[index].call_fd, index);
 
     return false;
 }
@@ -1169,7 +1177,8 @@ vu_get_protocol_features_exec(VuDev *dev, VhostUserMsg *vmsg)
                         1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ |
                         1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER |
                         1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD |
-                        1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK;
+                        1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK |
+                        1ULL << VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS;
 
     if (have_userfault()) {
         features |= 1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT;
@@ -1456,6 +1465,25 @@ vu_set_inflight_fd(VuDev *dev, VhostUserMsg *vmsg)
     return false;
 }
 
+static bool
+vu_handle_vring_kick(VuDev *dev, VhostUserMsg *vmsg)
+{
+    unsigned int index = vmsg->payload.state.index;
+
+    if (index >= dev->max_queues) {
+        vu_panic(dev, "Invalid queue index: %u", index);
+        return false;
+    }
+
+    DPRINT("Got kick message: handler:%p idx:%d\n",
+           dev->vq[index].handler, index);
+    if (dev->vq[index].handler) {
+        dev->vq[index].handler(dev, index);
+    }
+
+    return false;
+}
+
 static bool
 vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
 {
@@ -1538,6 +1566,8 @@ vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
         return vu_get_inflight_fd(dev, vmsg);
     case VHOST_USER_SET_INFLIGHT_FD:
         return vu_set_inflight_fd(dev, vmsg);
+    case VHOST_USER_VRING_KICK:
+        return vu_handle_vring_kick(dev, vmsg);
     default:
         vmsg_close_fds(vmsg);
         vu_panic(dev, "Unhandled request: %d", vmsg->request);
@@ -2010,6 +2040,21 @@ vu_queue_notify(VuDev *dev, VuVirtq *vq)
         return;
     }
 
+    if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS) &&
+        vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_SLAVE_REQ)) {
+        VhostUserMsg vmsg = {
+            .request = VHOST_USER_SLAVE_VRING_CALL,
+            .flags = VHOST_USER_VERSION,
+            .size = sizeof(vmsg.payload.state),
+            .payload.state = {
+                .index = vq - dev->vq,
+            },
+        };
+
+        vu_message_write(dev, dev->slave_fd, &vmsg);
+        return;
+    }
+
     if (eventfd_write(vq->call_fd, 1) < 0) {
         vu_panic(dev, "Error writing eventfd: %s", strerror(errno));
     }
diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index 46b600799b2e..392dea306bb9 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -53,6 +53,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10,
     VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
     VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD = 12,
+    VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS = 13,
 
     VHOST_USER_PROTOCOL_F_MAX
 };
@@ -94,6 +95,7 @@ typedef enum VhostUserRequest {
     VHOST_USER_GET_INFLIGHT_FD = 31,
     VHOST_USER_SET_INFLIGHT_FD = 32,
     VHOST_USER_GPU_SET_SOCKET = 33,
+    VHOST_USER_VRING_KICK = 34,
     VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -102,6 +104,7 @@ typedef enum VhostUserSlaveRequest {
     VHOST_USER_SLAVE_IOTLB_MSG = 1,
     VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
     VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
+    VHOST_USER_SLAVE_VRING_CALL = 4,
     VHOST_USER_SLAVE_MAX
 }  VhostUserSlaveRequest;
 
-- 
2.23.0




* Re: [Qemu-devel] [RFC] libvhost-user: implement VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS
  2019-09-06 12:13 ` [Qemu-devel] [RFC] libvhost-user: implement VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS Johannes Berg
@ 2019-09-06 14:22   ` Michael S. Tsirkin
  2019-09-06 14:48     ` Johannes Berg
  0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2019-09-06 14:22 UTC (permalink / raw)
  To: Johannes Berg; +Cc: qemu-devel, Johannes Berg

On Fri, Sep 06, 2019 at 03:13:50PM +0300, Johannes Berg wrote:
> From: Johannes Berg <johannes.berg@intel.com>
> 
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>

a bit more content here about the motivation for this?

> ---
>  contrib/libvhost-user/libvhost-user.c | 61 +++++++++++++++++++++++----
>  contrib/libvhost-user/libvhost-user.h |  3 ++
>  2 files changed, 56 insertions(+), 8 deletions(-)
> 
> diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
> index fba291c13db4..550f6416a211 100644
> --- a/contrib/libvhost-user/libvhost-user.c
> +++ b/contrib/libvhost-user/libvhost-user.c
> @@ -136,6 +136,7 @@ vu_request_to_string(unsigned int req)
>          REQ(VHOST_USER_GET_INFLIGHT_FD),
>          REQ(VHOST_USER_SET_INFLIGHT_FD),
>          REQ(VHOST_USER_GPU_SET_SOCKET),
> +        REQ(VHOST_USER_VRING_KICK),
>          REQ(VHOST_USER_MAX),
>      };
>  #undef REQ
> @@ -920,6 +921,7 @@ static bool
>  vu_check_queue_msg_file(VuDev *dev, VhostUserMsg *vmsg)
>  {
>      int index = vmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
> +    bool nofd = vmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK;
>  
>      if (index >= dev->max_queues) {
>          vmsg_close_fds(vmsg);
> @@ -927,8 +929,12 @@ vu_check_queue_msg_file(VuDev *dev, VhostUserMsg *vmsg)
>          return false;
>      }
>  
> -    if (vmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK ||
> -        vmsg->fd_num != 1) {
> +    if (nofd) {
> +        vmsg_close_fds(vmsg);
> +        return true;
> +    }
> +
> +    if (vmsg->fd_num != 1) {
>          vmsg_close_fds(vmsg);
>          vu_panic(dev, "Invalid fds in request: %d", vmsg->request);
>          return false;
> @@ -1025,6 +1031,7 @@ static bool
>  vu_set_vring_kick_exec(VuDev *dev, VhostUserMsg *vmsg)
>  {
>      int index = vmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
> +    bool nofd = vmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK;
>  
>      DPRINT("u64: 0x%016"PRIx64"\n", vmsg->payload.u64);
>  
> @@ -1038,8 +1045,8 @@ vu_set_vring_kick_exec(VuDev *dev, VhostUserMsg *vmsg)
>          dev->vq[index].kick_fd = -1;
>      }
>  
> -    dev->vq[index].kick_fd = vmsg->fds[0];
> -    DPRINT("Got kick_fd: %d for vq: %d\n", vmsg->fds[0], index);
> +    dev->vq[index].kick_fd = nofd ? -1 : vmsg->fds[0];
> +    DPRINT("Got kick_fd: %d for vq: %d\n", dev->vq[index].kick_fd, index);
>  
>      dev->vq[index].started = true;
>      if (dev->iface->queue_set_started) {
> @@ -1116,6 +1123,7 @@ static bool
>  vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
>  {
>      int index = vmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
> +    bool nofd = vmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK;
>  
>      DPRINT("u64: 0x%016"PRIx64"\n", vmsg->payload.u64);
>  
> @@ -1128,14 +1136,14 @@ vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
>          dev->vq[index].call_fd = -1;
>      }
>  
> -    dev->vq[index].call_fd = vmsg->fds[0];
> +    dev->vq[index].call_fd = nofd ? -1 : vmsg->fds[0];
>  
>      /* in case of I/O hang after reconnecting */
> -    if (eventfd_write(vmsg->fds[0], 1)) {
> +    if (dev->vq[index].call_fd != -1 && eventfd_write(vmsg->fds[0], 1)) {
>          return -1;
>      }
>  
> -    DPRINT("Got call_fd: %d for vq: %d\n", vmsg->fds[0], index);
> +    DPRINT("Got call_fd: %d for vq: %d\n", dev->vq[index].call_fd, index);
>  
>      return false;
>  }
> @@ -1169,7 +1177,8 @@ vu_get_protocol_features_exec(VuDev *dev, VhostUserMsg *vmsg)
>                          1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ |
>                          1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER |
>                          1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD |
> -                        1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK;
> +                        1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK |
> +                        1ULL << VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS;
>  
>      if (have_userfault()) {
>          features |= 1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT;
> @@ -1456,6 +1465,25 @@ vu_set_inflight_fd(VuDev *dev, VhostUserMsg *vmsg)
>      return false;
>  }
>  
> +static bool
> +vu_handle_vring_kick(VuDev *dev, VhostUserMsg *vmsg)
> +{
> +    unsigned int index = vmsg->payload.state.index;
> +
> +    if (index >= dev->max_queues) {
> +        vu_panic(dev, "Invalid queue index: %u", index);
> +        return false;
> +    }
> +
> +    DPRINT("Got kick message: handler:%p idx:%d\n",
> +	   dev->vq[index].handler, index);
> +    if (dev->vq[index].handler) {
> +        dev->vq[index].handler(dev, index);
> +    }
> +
> +    return false;
> +}
> +
>  static bool
>  vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
>  {
> @@ -1538,6 +1566,8 @@ vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
>          return vu_get_inflight_fd(dev, vmsg);
>      case VHOST_USER_SET_INFLIGHT_FD:
>          return vu_set_inflight_fd(dev, vmsg);
> +    case VHOST_USER_VRING_KICK:
> +        return vu_handle_vring_kick(dev, vmsg);
>      default:
>          vmsg_close_fds(vmsg);
>          vu_panic(dev, "Unhandled request: %d", vmsg->request);
> @@ -2010,6 +2040,21 @@ vu_queue_notify(VuDev *dev, VuVirtq *vq)
>          return;
>      }
>  
> +    if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS) &&
> +        vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_SLAVE_REQ)) {
> +        VhostUserMsg vmsg = {
> +            .request = VHOST_USER_SLAVE_VRING_CALL,
> +            .flags = VHOST_USER_VERSION,
> +            .size = sizeof(vmsg.payload.state),
> +            .payload.state = {
> +                .index = vq - dev->vq,
> +            },
> +        };
> +
> +        vu_message_write(dev, dev->slave_fd, &vmsg);
> +        return;
> +    }
> +
>      if (eventfd_write(vq->call_fd, 1) < 0) {
>          vu_panic(dev, "Error writing eventfd: %s", strerror(errno));
>      }
> diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
> index 46b600799b2e..392dea306bb9 100644
> --- a/contrib/libvhost-user/libvhost-user.h
> +++ b/contrib/libvhost-user/libvhost-user.h
> @@ -53,6 +53,7 @@ enum VhostUserProtocolFeature {
>      VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10,
>      VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
>      VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD = 12,
> +    VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS = 13,
>  
>      VHOST_USER_PROTOCOL_F_MAX
>  };
> @@ -94,6 +95,7 @@ typedef enum VhostUserRequest {
>      VHOST_USER_GET_INFLIGHT_FD = 31,
>      VHOST_USER_SET_INFLIGHT_FD = 32,
>      VHOST_USER_GPU_SET_SOCKET = 33,
> +    VHOST_USER_VRING_KICK = 34,
>      VHOST_USER_MAX
>  } VhostUserRequest;
>  
> @@ -102,6 +104,7 @@ typedef enum VhostUserSlaveRequest {
>      VHOST_USER_SLAVE_IOTLB_MSG = 1,
>      VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
>      VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
> +    VHOST_USER_SLAVE_VRING_CALL = 4,
>      VHOST_USER_SLAVE_MAX
>  }  VhostUserSlaveRequest;
>  
> -- 
> 2.23.0
> 



* Re: [Qemu-devel] [RFC] libvhost-user: implement VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS
  2019-09-06 14:22   ` Michael S. Tsirkin
@ 2019-09-06 14:48     ` Johannes Berg
  2019-09-06 15:12       ` Michael S. Tsirkin
  0 siblings, 1 reply; 38+ messages in thread
From: Johannes Berg @ 2019-09-06 14:48 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

Hi,

On Fri, 2019-09-06 at 10:22 -0400, Michael S. Tsirkin wrote:
> On Fri, Sep 06, 2019 at 03:13:50PM +0300, Johannes Berg wrote:
> > From: Johannes Berg <johannes.berg@intel.com>
> > 
> > Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> 
> a bit more content here about the motivation for this?

Heh, right, definitely needed.

I was just sending it out as the corresponding patch to the spec change
RFC, where I explained more, so didn't really bother here yet. However,
I evidently forgot to CC you on that:

https://lore.kernel.org/qemu-devel/20190902121233.13382-1-johannes@sipsolutions.net/

I'm still trying to implement it in User-Mode Linux (UML, ARCH=um),
we've submitted patches for virtio/vhost-user support to that, but the
simulation-bound IRQ handling is a bit complicated. I need to see how it
turns out once I actually get it to work - I've gotten this extension,
SLAVE_REQ and REPLY_ACK to work now, so need to "just" integrate with
the time-travel mode I already have.

In any case, if you think that this is a stupid extension and say you
will never accept it, I'll probably just implement a slightly more
hackish way, setting vhost-user to polling mode and using out-of-band
signalling or so. This seems a bit cleaner though, and if it's properly
spec'ed and with sample code and all then it'll possibly be far more
useful to others. (**)



I think I also forgot to CC you on these two:
https://lore.kernel.org/qemu-devel/20190828083401.2342-1-johannes@sipsolutions.net/
https://lore.kernel.org/qemu-devel/20190903192505.10686-1-johannes@sipsolutions.net/

Again, sorry about that.

Btw, at least one of these files doesn't even have an entry in the
maintainers file. Don't remember if it was the spec though or the
libvhost-user stuff.


(**) For example, there's the VMSimInt paper (***) that shows a very
similar thing with QEMU, but it works only with CPU emulation. With UML's
time-travel mode made to work over virtio we can do similar things
without CPU emulation. I suspect it's also possible to emulate the HPET
or so in a KVM-based system, but seems far more tricky (to me at least).

(***) http://www.ikr.uni-stuttgart.de/Content/Publications/Archive/We_SIMUTools_2014_40209.pdf

Thanks,
johannes




* Re: [Qemu-devel] [RFC] libvhost-user: implement VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS
  2019-09-06 14:48     ` Johannes Berg
@ 2019-09-06 15:12       ` Michael S. Tsirkin
  2019-09-06 15:32         ` Johannes Berg
  0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2019-09-06 15:12 UTC (permalink / raw)
  To: Johannes Berg; +Cc: qemu-devel

On Fri, Sep 06, 2019 at 04:48:39PM +0200, Johannes Berg wrote:
> Hi,
> 
> On Fri, 2019-09-06 at 10:22 -0400, Michael S. Tsirkin wrote:
> > On Fri, Sep 06, 2019 at 03:13:50PM +0300, Johannes Berg wrote:
> > > From: Johannes Berg <johannes.berg@intel.com>
> > > 
> > > Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> > 
> > a bit more content here about the motivation for this?
> 
> Heh, right, definitely needed.
> 
> I was just sending it out as the corresponding patch to the spec change
> RFC, where I explained more, so didn't really bother here yet. However,
> I evidently forgot to CC you on that:
> 
> https://lore.kernel.org/qemu-devel/20190902121233.13382-1-johannes@sipsolutions.net/


Oh. Apparently qemu mailman chose this time to kick me out
of list subscription (too many bounces or something?)
so I didn't see it.

> I'm still trying to implement it in User-Mode Linux (UML, ARCH=um),
> we've submitted patches for virtio/vhost-user support to that, but the
> simulation-bound IRQ handling is a bit complicated. I need to see how it
> turns out once I actually get it to work - I've gotten this extension,
> SLAVE_REQ and REPLY_ACK to work now, so need to "just" integrate with
> the time-travel mode I already have.
> 
> In any case, if you think that this is a stupid extension and say you
> will never accept it, I'll probably just implement a slightly more
> hackish way, setting vhost-user to polling mode and using out-of-band
> signalling or so. This seems a bit cleaner though, and if it's properly
> spec'ed and with sample code and all then it'll possibly be far more
> useful to others. (**)


What worries me is the load this places on the socket.
ATM if socket buffer is full qemu locks up, so we
need to be careful not to send too many messages.

> 
> 
> I think I also forgot to CC you on these two:
> https://lore.kernel.org/qemu-devel/20190828083401.2342-1-johannes@sipsolutions.net/
> https://lore.kernel.org/qemu-devel/20190903192505.10686-1-johannes@sipsolutions.net/
> 
> Again, sorry about that.
> 
> Btw, at least one of these files doesn't even have an entry in the
> maintainers file. Don't remember if it was the spec though or the
> libvhost-user stuff.
> 
> 
> (**) For example, there's the VMSimInt paper (***) that shows a very
> similar thing with QEMU, but works only with CPU emulation. With UML's
> time-travel mode made to work over virtio we can do similar things
> without CPU emulation. I suspect it's also possible to emulate the HPET
> or so in a KVM-based system, but seems far more tricky (to me at least).
> 
> (***) http://www.ikr.uni-stuttgart.de/Content/Publications/Archive/We_SIMUTools_2014_40209.pdf
> 
> Thanks,
> johannes



* Re: [Qemu-devel] [RFC] libvhost-user: implement VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS
  2019-09-06 15:12       ` Michael S. Tsirkin
@ 2019-09-06 15:32         ` Johannes Berg
  2019-09-08 13:13           ` Michael S. Tsirkin
  0 siblings, 1 reply; 38+ messages in thread
From: Johannes Berg @ 2019-09-06 15:32 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

Hi,

> Oh. Apparently qemu mailman chose this time to kick me out
> of list subscription (too many bounces or something?)
> so I didn't see it.

D'oh. Well, it's really my mistake, I should've CC'ed you.

> What worries me is the load this places on the socket.
> ATM if socket buffer is full qemu locks up, so we
> need to be careful not to send too many messages.

Right, sure. I really don't think you ever want to use this extension in
a "normal VM" use case. :-)

I think the only use for this extension would be for simulation
purposes, and even then only combined with the REPLY_ACK and SLAVE_REQ
extensions, i.e. you explicitly *want* your virtual machine to lock up /
wait for a response to the KICK command (and respectively, the device to
wait for a response to the CALL command).

Note that this is basically its sole purpose: ensuring exactly this
synchronisation! Yes, it's bad for speed, but it's needed in simulation
when time isn't "real".

Let me try to explain again, most likely my previous explanation was too
long winded. WLOG, I'll focus on the "kick" use case, the "call" is the
same, just the other way around. I'm sure you know that the call is
asynchronous, i.e. the VM will increment the eventfd counter, and
"eventually" it becomes readable to the device. Now the device does
something (as fast as it can, presumably) and returns the buffer to the
VM.

Now, imagine you're running in simulation time, i.e. "time travel" mode.
Briefly, this hacks the idle loop of the (UML) VM to just skip forward
when there's nothing to do, i.e. if you have a timer firing in 100ms and
get to idle, time is immediately incremented by 100ms and the timer
fires. For a single VM/device this is already implemented in UML, and
while it's already very useful that's only half the story to me.

Once you have multiple devices and/or VMs, you basically have to keep a
"simulation calendar" where each participant (VM/device) can put an
entry, and then whenever they become idle they don't immediately move
time forward, but instead ask the calendar what's next, and the calendar
determines who runs.
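As a rough illustration (made-up names and structure, not the actual UML code), such a calendar boils down to a min-heap of pending events, where "going idle" means jumping simulated time straight to the earliest entry:

```python
import heapq

# Made-up sketch of a "simulation calendar"; not the actual UML code.
class SimulationCalendar:
    def __init__(self):
        self.now = 0       # current simulated time
        self._events = []  # min-heap of (absolute time, participant)

    def post(self, delay, participant):
        # A participant announces it has something to do `delay` from now.
        heapq.heappush(self._events, (self.now + delay, participant))

    def run_next(self):
        # Everyone is idle: skip simulated time forward to the next
        # event and report which participant should run.
        self.now, participant = heapq.heappop(self._events)
        return participant

cal = SimulationCalendar()
cal.post(100, "vm: timer fires")
cal.post(30, "device: pending work")
assert cal.run_next() == "device: pending work" and cal.now == 30
assert cal.run_next() == "vm: timer fires" and cal.now == 100
```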

Now, for these simulation cases, consider vhost-user again. It's
absolutely necessary that the calendar is updated all the time, and the
asynchronous nature of the call breaks that - the device cannot update
the calendar to put an event there to process the call message.

With this extension, the device would work in the following way. Assume
that the device is idle, and waiting for the simulation calendar to tell
it to run. Now,

 1) it has an incoming call (message) from VM (which waits for reply)
 2) the device will now put a new event on the simulation scheduler for
    a time slot to process the message
 3) return reply to VM
 4) device goes back to sleep - this stuff was asynchronously handled
    outside of the simulation basically.

In a sense, the code that just ran isn't considered part of the
simulated device, it's just the transport protocol and part of the
simulation environment.

At this point, the device is still waiting for its calendar event to be
triggered, but now it has a new one to process the message. Now, once
the VM goes to sleep, the scheduler will check the calendar and
presumably tell the device to run, which runs and processes the message.
This repeats for as long as the simulation runs, going both ways (or
multiple ways if there are more than 2 participants).
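The four steps above can be sketched like so (all names here are invented for illustration; this is not the libvhost-user API):

```python
import heapq
import itertools

calendar = []              # shared min-heap of (time, seq, callback) events
_seq = itertools.count()   # tiebreaker so equal times never compare callbacks
now = 0
processed = []             # vrings whose kicks were actually processed

def post_event(delay, callback):
    heapq.heappush(calendar, (now + delay, next(_seq), callback))

def device_handle_kick_message(vring_idx):
    # 1) the in-band KICK message arrives; the VM is blocked on the reply
    # 2) schedule the real processing on the simulation calendar,
    #    rather than doing it now
    post_event(0, lambda: processed.append(vring_idx))
    # 3) send the reply; the VM may continue (in simulated time)
    return "REPLY_ACK"
    # 4) ... and the device goes back to sleep until the calendar runs it

reply = device_handle_kick_message(0)
assert reply == "REPLY_ACK" and processed == []  # only scheduled so far
when, _, run = heapq.heappop(calendar)           # later the scheduler ...
now = when
run()                                            # ... tells the device to run
assert processed == [0]
```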


Now, what if you didn't have this synchronisation, i.e. we don't have
this extension or we don't have REPLY_ACK or whatnot?

In that case, after the step 1 above, the VM will immediately continue
running. Let's say it'll wait for a response from the device for a few
hundred milliseconds (of now simulated time). However, depending on the
scheduling, the device has quite likely not yet put the new event on the
simulation calendar (that happens in step 2 above). This means that the
VM's calendar event to wake it up after a few hundred milliseconds will
immediately trigger, and the simulation ends with the driver getting a
timeout from the device.


So - yes, while I understand your concern, I basically think this is not
something anyone will want to use outside of such simulations. OTOH,
there are various use cases (I'm doing device simulation, others are
doing network simulation) that use such a behaviour, and it might be
nice to support it in a more standard way, rather than everyone having
their own local hacks for everything, like e.g. the VMSimInt paper(**).


But again, like I said, no hard feelings if you think such simulation
has no place in upstream vhost-user.


(**) I put a copy of their qemu changes on top of 1.6.0 here:
     https://p.sipsolutions.net/af9a68ded948c07e.txt

johannes




* Re: [Qemu-devel] [RFC] libvhost-user: implement VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS
  2019-09-06 15:32         ` Johannes Berg
@ 2019-09-08 13:13           ` Michael S. Tsirkin
  2019-09-09 11:35             ` Johannes Berg
  0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2019-09-08 13:13 UTC (permalink / raw)
  To: Johannes Berg; +Cc: qemu-devel

On Fri, Sep 06, 2019 at 05:32:02PM +0200, Johannes Berg wrote:
> Hi,
> 
> > Oh. Apparently qemu mailman chose this time to kick me out
> > of list subscription (too many bounces or something?)
> > so I didn't see it.
> 
> D'oh. Well, it's really my mistake, I should've CC'ed you.
> 
> > What worries me is the load this places on the socket.
> > ATM if socket buffer is full qemu locks up, so we
> > need to be careful not to send too many messages.
> 
> Right, sure. I really don't think you ever want to use this extension in
> a "normal VM" use case. :-)
> 
> I think the only use for this extension would be for simulation
> purposes, and even then only combined with the REPLY_ACK and SLAVE_REQ
> extensions, i.e. you explicitly *want* your virtual machine to lock up /
> wait for a response to the KICK command (and respectively, the device to
> wait for a response to the CALL command).

OK so when combined with these, it's OK I think.
Do we want to force this restriction in code maybe then?


> Note that this is basically its sole purpose: ensuring exactly this
> synchronisation! Yes, it's bad for speed, but it's needed in simulation
> when time isn't "real".
> 
> Let me try to explain again, most likely my previous explanation was too
> long winded. WLOG, I'll focus on the "kick" use case, the "call" is the
> same, just the other way around. I'm sure you know that the call is
> asynchronous, i.e. the VM will increment the eventfd counter, and
> "eventually" it becomes readable to the device. Now the device does
> something (as fast as it can, presumably) and returns the buffer to the
> VM.
> 
> Now, imagine you're running in simulation time, i.e. "time travel" mode.
> Briefly, this hacks the idle loop of the (UML) VM to just skip forward
> when there's nothing to do, i.e. if you have a timer firing in 100ms and
> get to idle, time is immediately incremented by 100ms and the timer
> fires. For a single VM/device this is already implemented in UML, and
> while it's already very useful that's only half the story to me.
> 
> Once you have multiple devices and/or VMs, you basically have to keep a
> "simulation calendar" where each participant (VM/device) can put an
> entry, and then whenever they become idle they don't immediately move
> time forward, but instead ask the calendar what's next, and the calendar
> determines who runs.
> 
> Now, for these simulation cases, consider vhost-user again. It's
> absolutely necessary that the calendar is updated all the time, and the
> asynchronous nature of the call breaks that - the device cannot update
> the calendar to put an event there to process the call message.
> 
> With this extension, the device would work in the following way. Assume
> that the device is idle, and waiting for the simulation calendar to tell
> it to run. Now,
> 
>  1) it has an incoming call (message) from VM (which waits for reply)
>  2) the device will now put a new event on the simulation scheduler for
>     a time slot to process the message
>  3) return reply to VM
>  4) device goes back to sleep - this stuff was asynchronously handled
>     outside of the simulation basically.
> 
> In a sense, the code that just ran isn't considered part of the
> simulated device, it's just the transport protocol and part of the
> simulation environment.
> 
> At this point, the device is still waiting for its calendar event to be
> triggered, but now it has a new one to process the message. Now, once
> the VM goes to sleep, the scheduler will check the calendar and
> presumably tell the device to run, which runs and processes the message.
> This repeats for as long as the simulation runs, going both ways (or
> multiple ways if there are more than 2 participants).
> 
> 
> Now, what if you didn't have this synchronisation, i.e. we don't have
> this extension or we don't have REPLY_ACK or whatnot?
> 
> In that case, after the step 1 above, the VM will immediately continue
> running. Let's say it'll wait for a response from the device for a few
> hundred milliseconds (of now simulated time). However, depending on the
> scheduling, the device has quite likely not yet put the new event on the
> simulation calendar (that happens in step 2 above). This means that the
> VM's calendar event to wake it up after a few hundred milliseconds will
> immediately trigger, and the simulation ends with the driver getting a
> timeout from the device.
> 
> 
> So - yes, while I understand your concern, I basically think this is not
> something anyone will want to use outside of such simulations. OTOH,
> there are various use cases (I'm doing device simulation, others are
> doing network simulation) that use such a behaviour, and it might be
> nice to support it in a more standard way, rather than everyone having
> their own local hacks for everything, like e.g. the VMSimInt paper(**).
> 
> 
> But again, like I said, no hard feelings if you think such simulation
> has no place in upstream vhost-user.
> 
> 
> (**) I put a copy of their qemu changes on top of 1.6.0 here:
>      https://p.sipsolutions.net/af9a68ded948c07e.txt
> 
> johannes



* Re: [Qemu-devel] [RFC] libvhost-user: implement VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS
  2019-09-08 13:13           ` Michael S. Tsirkin
@ 2019-09-09 11:35             ` Johannes Berg
  2019-09-09 12:41               ` Michael S. Tsirkin
  0 siblings, 1 reply; 38+ messages in thread
From: Johannes Berg @ 2019-09-09 11:35 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On Sun, 2019-09-08 at 09:13 -0400, Michael S. Tsirkin wrote:

> > I think the only use for this extension would be for simulation
> > purposes, and even then only combined with the REPLY_ACK and SLAVE_REQ
> > extensions, i.e. you explicitly *want* your virtual machine to lock up /
> > wait for a response to the KICK command (and respectively, the device to
> > wait for a response to the CALL command).
> 
> OK so when combined with these, it's OK I think.

OK.

> Do we want to force this restriction in code maybe then?

Unlike in this patch, I was planning to not actually advertise
VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS, and do that only if the user of
the library wants to advertise it, since devices used for simulation
also have to be careful about how they use this.

However, if I understand correctly we cannot enforce that all of them
are used at the same time - we're the device side, so we only advertise
our features and the master actually sets the ones it wants to use, no?

The only thing we could do is crash if it wants to use this feature
without the others, but would that really be helpful?

I'm starting to actually get this working, so I'll post new patches in a
few days or so.

johannes




* Re: [Qemu-devel] [RFC] libvhost-user: implement VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS
  2019-09-09 11:35             ` Johannes Berg
@ 2019-09-09 12:41               ` Michael S. Tsirkin
  2019-09-09 13:05                 ` Johannes Berg
  0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2019-09-09 12:41 UTC (permalink / raw)
  To: Johannes Berg; +Cc: qemu-devel

On Mon, Sep 09, 2019 at 01:35:19PM +0200, Johannes Berg wrote:
> On Sun, 2019-09-08 at 09:13 -0400, Michael S. Tsirkin wrote:
> 
> > > I think the only use for this extension would be for simulation
> > > purposes, and even then only combined with the REPLY_ACK and SLAVE_REQ
> > > extensions, i.e. you explicitly *want* your virtual machine to lock up /
> > > wait for a response to the KICK command (and respectively, the device to
> > > wait for a response to the CALL command).
> > 
> > OK so when combined with these, it's OK I think.
> 
> OK.
> 
> > Do we want to force this restriction in code maybe then?
> 
> Unlike in this patch, I was planning to not actually advertise
> VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS, and do that only if the user of
> the library wants to advertise it, since devices used for simulation
> also have to be careful about how they use this.
> 
> However, if I understand correctly we cannot enforce that all of them
> are used at the same time - we're the device side, so we only advertise
> our features and the master actually sets the ones it wants to use, no?
> 
> The only thing we could do is crash if it wants to use this feature
> without the others, but would that really be helpful?

We can return failure from SET_PROTOCOL_FEATURES.
We can also fail all following messages.


> I'm starting to actually get this working, so I'll post new patches in a
> few days or so.
> 
> johannes



* Re: [Qemu-devel] [RFC] libvhost-user: implement VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS
  2019-09-09 12:41               ` Michael S. Tsirkin
@ 2019-09-09 13:05                 ` Johannes Berg
  2019-09-09 13:48                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 38+ messages in thread
From: Johannes Berg @ 2019-09-09 13:05 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On Mon, 2019-09-09 at 08:41 -0400, Michael S. Tsirkin wrote:

> > The only thing we could do is crash if it wants to use this feature
> > without the others, but would that really be helpful?
> 
> We can return failure from SET_PROTOCOL_FEATURES.
> We can also fail all following messages.

Only if we have F_REPLY_ACK, I think?

I suppose we could fail a later command that already needs a reply
without REPLY_ACK, but that seems difficult to debug?

Anyway, if you feel that we should have this as some sort of safeguard I
can try to do that; to me it feels rather pointless as libvhost-user is
more of a sample implementation than anything else.

Unless you also wanted to write into the spec that F_KICK_CALL_MSGS
absolutely *requires* F_REPLY_ACK, and add some text that tells a device
how to behave when F_KICK_CALL_MSGS is negotiated without F_REPLY_ACK?

johannes




* Re: [Qemu-devel] [RFC] libvhost-user: implement VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS
  2019-09-09 13:05                 ` Johannes Berg
@ 2019-09-09 13:48                   ` Michael S. Tsirkin
  2019-09-09 13:50                     ` Johannes Berg
  0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2019-09-09 13:48 UTC (permalink / raw)
  To: Johannes Berg; +Cc: qemu-devel

On Mon, Sep 09, 2019 at 03:05:08PM +0200, Johannes Berg wrote:
> On Mon, 2019-09-09 at 08:41 -0400, Michael S. Tsirkin wrote:
> 
> > > The only thing we could do is crash if it wants to use this feature
> > > without the others, but would that really be helpful?
> > 
> > We can return failure from SET_PROTOCOL_FEATURES.
> > We can also fail all following messages.
> 
> Only if we have F_REPLY_ACK, I think?
> 
> I suppose we could fail a later command that already needs a reply
> without REPLY_ACK, but that seems difficult to debug?


Next command is GET_FEATURES. Return an error response from that
and device init will fail.

> Anyway, if you feel that we should have this as some sort of safeguard I
> can try to do that; to me it feels rather pointless as libvhost-user is
> more of a sample implementation than anything else.


Exactly for this reason :)

> Unless you also wanted to write into the spec that F_KICK_CALL_MSGS
> absolutely *requires* F_REPLY_ACK,

yep

> and add some text that tells a device
> how to behave when F_KICK_CALL_MSGS is negotiated without F_REPLY_ACK?
> 
> johannes

We can document how to behave in case of inconsistent protocol features,
yes.



* Re: [Qemu-devel] [RFC] libvhost-user: implement VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS
  2019-09-09 13:48                   ` Michael S. Tsirkin
@ 2019-09-09 13:50                     ` Johannes Berg
  2019-09-09 14:59                       ` Michael S. Tsirkin
  2019-09-10 15:52                       ` Johannes Berg
  0 siblings, 2 replies; 38+ messages in thread
From: Johannes Berg @ 2019-09-09 13:50 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On Mon, 2019-09-09 at 09:48 -0400, Michael S. Tsirkin wrote:

> > I suppose we could fail a later command that already needs a reply
> > without REPLY_ACK, but that seems difficult to debug?
> 
> Next command is GET_FEATURES. Return an error response from that
> and device init will fail.

Hmm, what's an error here though? You can only return a value, no?

> > Anyway, if you feel that we should have this as some sort of safeguard I
> > can try to do that; to me it feels rather pointless as libvhost-user is
> > more of a sample implementation than anything else.
> 
> Exactly for this reason :)

:)

> > Unless you also wanted to write into the spec that F_KICK_CALL_MSGS
> > absolutely *requires* F_REPLY_ACK,
> 
> yep

Sure, I'm fine with that.

> We can document how to behave in case of inconsistent protocol features,
> yes.

OK.

johannes




* Re: [Qemu-devel] [RFC] libvhost-user: implement VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS
  2019-09-09 13:50                     ` Johannes Berg
@ 2019-09-09 14:59                       ` Michael S. Tsirkin
  2019-09-09 15:26                         ` Johannes Berg
  2019-09-10 15:52                       ` Johannes Berg
  1 sibling, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2019-09-09 14:59 UTC (permalink / raw)
  To: Johannes Berg; +Cc: qemu-devel

On Mon, Sep 09, 2019 at 03:50:48PM +0200, Johannes Berg wrote:
> On Mon, 2019-09-09 at 09:48 -0400, Michael S. Tsirkin wrote:
> 
> > > I suppose we could fail a later command that already needs a reply
> > > without REPLY_ACK, but that seems difficult to debug?
> > 
> > Next command is GET_FEATURES. Return an error response from that
> > and device init will fail.
> 
> Hmm, what's an error here though? You can only return a value, no?

Either returning id that does not match GET_FEATURES or
returning size != 8 bytes will work.

Using 0 size payload has precedent in VHOST_USER_GET_CONFIG:
	vhost-user slave uses zero length of
	  payload to indicate an error to vhost-user master.

It's annoying that we don't get an error indicator in that case though.
Worth returning e.g. a 4 byte code?

And let's generalize for all other messages with response?
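For illustration, a master-side check along those lines could look like this (a sketch with the header layout from the spec, but otherwise made-up function names, not qemu's actual code):

```python
import struct

# vhost-user message header (from the spec): u32 request, u32 flags,
# u32 size, followed by `size` bytes of payload.
HDR = struct.Struct("<III")
VHOST_USER_GET_FEATURES = 1
FLAGS_VERSION_1_REPLY = 0x1 | 0x4  # version 1 | reply bit

def parse_get_features_reply(data):
    # Treat a zero-length (or otherwise wrong-sized) payload as the
    # slave signalling failure, as suggested above.
    request, flags, size = HDR.unpack_from(data)
    if request != VHOST_USER_GET_FEATURES or size != 8:
        return None  # error indication
    (features,) = struct.unpack_from("<Q", data, HDR.size)
    return features

ok = HDR.pack(VHOST_USER_GET_FEATURES, FLAGS_VERSION_1_REPLY, 8) \
     + struct.pack("<Q", 0x123)
err = HDR.pack(VHOST_USER_GET_FEATURES, FLAGS_VERSION_1_REPLY, 0)
assert parse_get_features_reply(ok) == 0x123
assert parse_get_features_reply(err) is None
```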

> > > Anyway, if you feel that we should have this as some sort of safeguard I
> > > can try to do that; to me it feels rather pointless as libvhost-user is
> > > more of a sample implementation than anything else.
> > 
> > Exactly for this reason :)
> 
> :)
> 
> > > Unless you also wanted to write into the spec that F_KICK_CALL_MSGS
> > > absolutely *requires* F_REPLY_ACK,
> > 
> > yep
> 
> Sure, I'm fine with that.
> 
> > We can document how to behave in case of inconsistent protocol features,
> > yes.
> 
> OK.
> 
> johannes



* Re: [Qemu-devel] [RFC] libvhost-user: implement VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS
  2019-09-09 14:59                       ` Michael S. Tsirkin
@ 2019-09-09 15:26                         ` Johannes Berg
  2019-09-09 15:34                           ` Johannes Berg
  0 siblings, 1 reply; 38+ messages in thread
From: Johannes Berg @ 2019-09-09 15:26 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On Mon, 2019-09-09 at 10:59 -0400, Michael S. Tsirkin wrote:

> > > Next command is GET_FEATURES. Return an error response from that
> > > and device init will fail.
> > 
> > Hmm, what's an error here though? You can only return a value, no?
> 
> Either returning id that does not match GET_FEATURES or
> returning size != 8 bytes will work.

Hmm, right.

> Using 0 size payload has precedent in VHOST_USER_GET_CONFIG:
> 	vhost-user slave uses zero length of
> 	  payload to indicate an error to vhost-user master.

OK, I think zero-len makes sense, that's certainly something that must
be checked by the receiver already.

> It's annoying that we don't get an error indicator in that case though.
> Worth returning e.g. a 4 byte code?

That also would need to be checked by the receiver, but

1) Then we have to start defining an error code enum, as there's no
   natural space. Is that worth it for what's basically a corner case?
2) It'd also be annoying that we now have a 4-byte error code, but with
   F_REPLY_ACK / need_reply you get an 8-byte status code, which would
   be incompatible in a way.

> And let's generalize for all other messages with response?

We could theoretically have a message in the future that wants a 4-byte
response, though by convention basically everything uses u64 anyway ...

OTOH, 0-byte as error doesn't generalize, VHOST_USER_POSTCOPY_ADVISE has
only a file descriptor as slave payload AFAICT... But then possibly in
those cases the master also wouldn't check the response size at all,
since it never uses it. Qemu does though, and is probably the currently
relevant implementation? Our UML implementation doesn't use this.


Maybe instead we should just add a "VHOST_USER_REPLY_ERROR" bit (e.g.
bit 4 after NEED_REPLY). Qemu in vhost_user_read_header() validates that
it received REPLY_MASK | VERSION, so it would reject the message at that
point.

Another possibility would be to define the highest bit of the 'request'
field to indicate an error, so for GET_FEATURES we'd return the value
0x80000000 | GET_FEATURES.

For even more safety we could make a 4-byte response which is never
valid for anything right now, but it feels overly cautious given that we
currently do check both the request and flag fields. If you think a
response status is needed and want to define an enum with possible
values, then I'd probably prefer an 8-byte status since that's the way
everything works now, but I don't really care much either way.

johannes




* Re: [Qemu-devel] [RFC] libvhost-user: implement VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS
  2019-09-09 15:26                         ` Johannes Berg
@ 2019-09-09 15:34                           ` Johannes Berg
  2019-09-09 15:45                             ` Michael S. Tsirkin
  0 siblings, 1 reply; 38+ messages in thread
From: Johannes Berg @ 2019-09-09 15:34 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On Mon, 2019-09-09 at 17:26 +0200, Johannes Berg wrote:
> 
> Maybe instead we should just add a "VHOST_USER_REPLY_ERROR" bit (e.g.
> bit 4 after NEED_REPLY). Qemu in vhost_user_read_header() validates that
> it received REPLY_MASK | VERSION, so it would reject the message at that
> point.
> 
> Another possibility would be to define the highest bit of the 'request'
> field to indicate an error, so for GET_FEATURES we'd return the value
> 0x80000000 | GET_FEATURES.

However, one way or another, that basically leaves us with three
different ways of indicating an error:

 1) already defined errors in existing messages - we can't change them
    since those are handled at runtime now, e.g. VHOST_USER_POSTCOPY_END
    returns a u64 value with an error status, and current code cannot
    deal with an error flag in the 'request' or 'flags' field
 2) F_REPLY_ACK errors to messages that do not specify a response at all
 3) this new way of indicating an error back from messages that specify
    a response, but the response has no inherent way of returning an
    error

To me that really feels a bit too complex from the spec POV. But I don't
see a way to generalize this without another extension, and again the
device cannot choose which extensions it supports since the master
chooses them and just sets them.

Perhaps I really should just stick a "g_assert()" into the code at that
point, and have it crash, since it's likely that F_KICK_CALL_MSGS isn't
even going to be implemented in qemu (unless it grows simulation support
and then it'd all be conditional on some simulation command-line option)



And actually ... you got the order wrong:

> > Next command is GET_FEATURES. Return an error response from that
> > and device init will fail.

That's not the case. We *start* with GET_FEATURES, if that includes
protocol features then we do GET_PROTOCOL_FEATURES next, and then we get
the # of queues next ...

Though the whole discussion pretty much applies equivalently to
GET_QUEUE_NUM instead of GET_FEATURES.

johannes




* Re: [Qemu-devel] [RFC] libvhost-user: implement VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS
  2019-09-09 15:34                           ` Johannes Berg
@ 2019-09-09 15:45                             ` Michael S. Tsirkin
  2019-09-09 15:47                               ` Johannes Berg
  0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2019-09-09 15:45 UTC (permalink / raw)
  To: Johannes Berg; +Cc: qemu-devel

On Mon, Sep 09, 2019 at 05:34:13PM +0200, Johannes Berg wrote:
> On Mon, 2019-09-09 at 17:26 +0200, Johannes Berg wrote:
> > 
> > Maybe instead we should just add a "VHOST_USER_REPLY_ERROR" bit (e.g.
> > bit 4 after NEED_REPLY). Qemu in vhost_user_read_header() validates that
> > it received REPLY_MASK | VERSION, so it would reject the message at that
> > point.
> > 
> > Another possibility would be to define the highest bit of the 'request'
> > field to indicate an error, so for GET_FEATURES we'd return the value
> > 0x80000000 | GET_FEATURES.
> 
> However, one way or another, that basically leaves us with three
> different ways of indicating an error:
> 
>  1) already defined errors in existing messages - we can't change them
>     since those are handled at runtime now, e.g. VHOST_USER_POSTCOPY_END
>     returns a u64 value with an error status, and current code cannot
>     deal with an error flag in the 'request' or 'flags' field
>  2) F_REPLY_ACK errors to messages that do not specify a response at all
>  3) this new way of indicating an error back from messages that specify
>     a response, but the response has no inherent way of returning an
>     error
> 
> To me that really feels a bit too complex from the spec POV. But I don't
> see a way to generalize this without another extension, and again the
> device cannot choose which extensions it supports since the master
> chooses them and just sets them.
> 
> Perhaps I really should just stick a "g_assert()" into the code at that
> point,

There's the old way: close the socket.
This will make reads fail gracefully.
If we don't want complexity right now, I'd go with that.


> and have it crash, since it's likely that F_KICK_CALL_MSGS isn't
> even going to be implemented in qemu (unless it grows simulation support
> and then it'd all be conditional on some simulation command-line option)
> 
> 
> 
> And actually ... you got the order wrong:
> 
> > > Next command is GET_FEATURES. Return an error response from that
> > > and device init will fail.
> 
> That's not the case. We *start* with GET_FEATURES, if that includes
> protocol features then we do GET_PROTOCOL_FEATURES next, and then we get
> the # of queues next ...
> 
> Though the whole discussion pretty much applies equivalently to
> GET_QUEUE_NUM instead of GET_FEATURES.
> 
> johannes



* Re: [Qemu-devel] [RFC] libvhost-user: implement VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS
  2019-09-09 15:45                             ` Michael S. Tsirkin
@ 2019-09-09 15:47                               ` Johannes Berg
  0 siblings, 0 replies; 38+ messages in thread
From: Johannes Berg @ 2019-09-09 15:47 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On Mon, 2019-09-09 at 11:45 -0400, Michael S. Tsirkin wrote:
> On Mon, Sep 09, 2019 at 05:34:13PM +0200, Johannes Berg wrote:
> > On Mon, 2019-09-09 at 17:26 +0200, Johannes Berg wrote:
> > > Maybe instead we should just add a "VHOST_USER_REPLY_ERROR" bit (e.g.
> > > bit 4 after NEED_REPLY). Qemu in vhost_user_read_header() validates that
> > > it received REPLY_MASK | VERSION, so it would reject the message at that
> > > point.
> > > 
> > > Another possibility would be to define the highest bit of the 'request'
> > > field to indicate an error, so for GET_FEATURES we'd return the value
> > > 0x80000000 | GET_FEATURES.
> > 
> > However, one way or another, that basically leaves us with three
> > different ways of indicating an error:
> > 
> >  1) already defined errors in existing messages - we can't change them
> >     since those are handled at runtime now, e.g. VHOST_USER_POSTCOPY_END
> >     returns a u64 value with an error status, and current code cannot
> >     deal with an error flag in the 'request' or 'flags' field
> >  2) F_REPLY_ACK errors to messages that do not specify a response at all
> >  3) this new way of indicating an error back from messages that specify
> >     a response, but the response has no inherent way of returning an
> >     error
> > 
> > To me that really feels a bit too complex from the spec POV. But I don't
> > see a way to generalize this without another extension, and again the
> > device cannot choose which extensions it supports since the master
> > chooses them and just sets them.
> > 
> > Perhaps I really should just stick a "g_assert()" into the code at that
> > point,
> 
> There's the old way: close the socket.
> This will make reads fail gracefully.
> If we don't want complexity right now, I'd go with that.

D'oh, good point. OK, I'll do that. Though it's almost equivalent in
libvhost-user to just asserting, since it's mostly set up to just handle
a single connection and then quit.

Alright, thanks. Like I said, I'll send some more patches all around
once I get it working, right now I'm crashing in some weird ways that I
need to debug :)

johannes




* Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages
  2019-09-05 20:28   ` Johannes Berg
@ 2019-09-09 16:00     ` Stefan Hajnoczi
  2019-09-09 17:34       ` Johannes Berg
  0 siblings, 1 reply; 38+ messages in thread
From: Stefan Hajnoczi @ 2019-09-09 16:00 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Marc-André Lureau, qemu-devel, mst


On Thu, Sep 05, 2019 at 10:28:33PM +0200, Johannes Berg wrote:
>  
> > +``VHOST_USER_VQ_CALL``
> > +  :id: 34
> > +  :equivalent ioctl: N/A
> > +  :slave payload: vring state description
> > +  :master payload: N/A
> 
> Oops. This message should be called VHOST_USER_VRING_KICK.
> 
> This file doesn't talk about virtqueues, just vrings, and I inverted the
> call/kick.
> 
> [...]
>  
> > +``VHOST_USER_VQ_KICK``
> > +  :id: 4
> > +  :equivalent ioctl: N/A
> > +  :slave payload: vring state description
> > +  :master payload: N/A
> 
> Similarly, that should be called VHOST_USER_SLAVE_VRING_CALL.
> 
> Anyway, some comments would be appreciated. I'm working on an
> implementation now for my simulation environment, and I guess I can keep
> that private etc. but if there's interest I can submit an (optional)
> implementation of this for libvhost-user too, I think.

Is this really necessary?  Can the simulation interpose between the
call/kick eventfds in order to control when events happen?

  CPU --cpu_kickfd--> Simulation --vhost_kickfd--> vhost-user device

and:

  vhost-user device --vhost_callfd--> Simulation -->cpu_callfd-> CPU

The simulation controls when the CPU's kick is seen by the device and
also when the call is seen by the CPU.

I don't understand why new vhost-user protocol messages are required.

Stefan



* Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages
  2019-09-09 16:00     ` Stefan Hajnoczi
@ 2019-09-09 17:34       ` Johannes Berg
  2019-09-10 15:03         ` Stefan Hajnoczi
  0 siblings, 1 reply; 38+ messages in thread
From: Johannes Berg @ 2019-09-09 17:34 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Marc-André Lureau, qemu-devel, mst

On Mon, 2019-09-09 at 18:00 +0200, Stefan Hajnoczi wrote:

> Is this really necessary?  

Yes* :)

> Can the simulation interpose between the
> call/kick eventfds in order to control when events happen?
> 
>   CPU --cpu_kickfd--> Simulation --vhost_kickfd--> vhost-user device
> 
> and:
> 
>   vhost-user device --vhost_callfd--> Simulation -->cpu_callfd-> CPU
> 
> The simulation controls when the CPU's kick is seen by the device and
> also when the call is seen by the CPU.

The point isn't to let the simulation know about anything that happens.
The CPU and the device are *part* of the simulation.

> I don't understand why new vhost-user protocol messages are required.

I guess I haven't really explained it well then :-)

So let's say, WLOG, I have a simulated network and a bunch of Linux
machines that are running on simulation time. Today I can do that only
with user-mode Linux, but we'll see.

Now in order to run everything on simulation time, *everything* that
happens in the simulation needs to request a simulation calendar entry,
and gets told when that entry is scheduled.

So think, for example, you have

CPU ---[kick]---> device

Now, this is essentially triggering an interrupt in the device. However,
the simulation code has to ensure that the simulated device's interrupt
handling only happens at a scheduler entry. Fundamentally, the
simulation serializes all processing, contrary to what you want in a
real system.

Now, this means that the CPU (that's part of the simulation) has to
*wait* for the device to add an entry to the simulation calendar in
response to the kick... That means that it really has to look like

CPU               device                   calendar
     ---[kick]-->
                         ---[add entry]-->
                         <---[return]-----
     <-[return]--

so that the CPU won't get to idle or some other processing where it asks
the simulation calendar for its own next entry...

Yes, like I said before, I realize that all of this is completely
opposed to what you want in a real system, but then in a real system you
also have real timeouts, and don't just skip time forward when the
simulation calendar says so ...


* Now, of course I lied, this is software after all. The *concept* is
necessary, but it's not strictly necessary to do this in-band in the
vhost-user protocol.
We could do an out-of-band simulation socket for the kick signal and
just pretend we're using polling mode as far as the vhost-user protocol
is concerned, but it'd probably be harder to implement, and we couldn't
do it in a way that we could actually contribute anything upstream.
There are quite a few papers proposing such simulation systems, I only
found the VMSimInt one publishing their code, but even that is some
hacks on top of qemu 1.6.0...

johannes




* Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages
  2019-09-09 17:34       ` Johannes Berg
@ 2019-09-10 15:03         ` Stefan Hajnoczi
  2019-09-10 15:14           ` Johannes Berg
  0 siblings, 1 reply; 38+ messages in thread
From: Stefan Hajnoczi @ 2019-09-10 15:03 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Marc-André Lureau, qemu-devel, mst


On Mon, Sep 09, 2019 at 07:34:24PM +0200, Johannes Berg wrote:
> On Mon, 2019-09-09 at 18:00 +0200, Stefan Hajnoczi wrote:
> 
> > Is this really necessary?  
> 
> Yes* :)
> 
> > Can the simulation interpose between the
> > call/kick eventfds in order to control when events happen?
> > 
> >   CPU --cpu_kickfd--> Simulation --vhost_kickfd--> vhost-user device
> > 
> > and:
> > 
> >   vhost-user device --vhost_callfd--> Simulation -->cpu_callfd-> CPU
> > 
> > The simulation controls when the CPU's kick is seen by the device and
> > also when the call is seen by the CPU.
> 
> The point isn't to let the simulation know about anything that happens.
> The CPU and the device are *part* of the simulation.

I was thinking more of letting the simulation decide when to signal the
second eventfd in each pair, but now I understand you are trying to make
everything synchronous and ioeventfd is async by definition.

> 
> > I don't understand why new vhost-user protocol messages are required.
> 
> I guess I haven't really explained it well then :-)
> 
> So let's say, WLOG, I have a simulated network and a bunch of Linux
> machines that are running on simulation time. Today I can do that only
> with user-mode Linux, but we'll see.
> 
> Now in order to run everything on simulation time, *everything* that
> happens in the simulation needs to request a simulation calendar entry,
> and gets told when that entry is scheduled.
> 
> So think, for example, you have
> 
> CPU ---[kick]---> device
> 
> Now, this is essentially triggering an interrupt in the device. However,
> the simulation code has to ensure that the simulated device's interrupt
> handling only happens at a scheduler entry. Fundamentally, the
> simulation serializes all processing, contrary to what you want in a
> real system.
> 
> Now, this means that the CPU (that's part of the simulation) has to
> *wait* for the device to add an entry to the simulation calendar in
> response to the kick... That means that it really has to look like
> 
> CPU               device                   calendar
>      ---[kick]-->
>                          ---[add entry]-->
>                          <---[return]-----

What are the semantics of returning from the calendar?  Does it mean
"it's now your turn to run?", "your entry has been added and you'll be
notified later when it's time to run?", or something else?

>      <-[return]--

What are the semantics of returning to the CPU?  Does it mean the
submitted I/O request is now complete?

> 
> so that the CPU won't get to idle or some other processing where it asks
> the simulation calendar for its own next entry...
>
> Yes, like I said before, I realize that all of this is completely
> opposed to what you want in a real system, but then in a real system you
> also have real timeouts, and don't just skip time forward when the
> simulation calendar says so ...

Sorry for these questions.  If this becomes tedious for you I will look
into the paper you linked.

Stefan



* Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages
  2019-09-10 15:03         ` Stefan Hajnoczi
@ 2019-09-10 15:14           ` Johannes Berg
  2019-09-10 15:33             ` Michael S. Tsirkin
  2019-09-11  7:35             ` Stefan Hajnoczi
  0 siblings, 2 replies; 38+ messages in thread
From: Johannes Berg @ 2019-09-10 15:14 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Marc-André Lureau, qemu-devel, mst

On Tue, 2019-09-10 at 17:03 +0200, Stefan Hajnoczi wrote:
> 
> > Now, this means that the CPU (that's part of the simulation) has to
> > *wait* for the device to add an entry to the simulation calendar in
> > response to the kick... That means that it really has to look like
> > 
> > CPU               device                   calendar
> >      ---[kick]-->
> >                          ---[add entry]-->
> >                          <---[return]-----
> 
> What are the semantics of returning from the calendar?  Does it mean
> "it's now your turn to run?", "your entry has been added and you'll be
> notified later when it's time to run?", or something else?

The latter - the entry was added, and you'll be notified when it's time
to run; but we need to have that state on the calendar so the CPU won't
converse with the calendar before that state is committed.

> >      <-[return]--
> 
> What are the semantics of returning to the CPU?  Does it mean the
> submitted I/O request is now complete?

No, it just means that the interrupt was triggered; it will be handled
once the calendar decides that it's the device's turn to run its
interrupt processing event.

The I/O request completion will be done later in a very similar fashion
(just inverting the participants, and replacing "kick" by "call").

> Sorry for these questions.  If this becomes tedious for you I will look
> into the paper you linked.

No no, don't worry, it's fine. Also, the paper doesn't really concern
itself with issues such as this, they just assume everything is
synchronous and make their code that way - they have a simulated device
in qemu that they wrote themselves, and it just happens to be
synchronous by way of their implementation...

What I'm trying to do here now is not really replicate that paper or
anything, but - because I'm working on similar things and simulation -
get some pieces into the upstreams (qemu for vhost-user, Linux for user-
mode-Linux running in a simulation) so (a) I don't have to maintain it
out-of-tree, and (b) it's available for others for use without a monster
patch to an ancient version of the software stack :-)

Happy to answer any questions.

Aside from the interrupt processing semantics in UML which I'll work on
next I do have the whole simulation calendar working etc., i.e. I have
full UML "VM" controlled entirely by the simulation calendar, also over
virtio.



Btw, that reminds me ... there's yet another process to add a virtio
device ID. I'm currently defining a virtio simulation calendar protocol
like this:

+/*
+ * A message passed on either of the VQs (input and output).
+ */
+struct virtio_simtime_msg {
+       __u64 op;               /* see below */
+       __u64 time;             /* time in nanoseconds */
+};
+
+/**
+ * enum virtio_simtime_ops - Operation codes
+ */
+enum virtio_simtime_ops {
+       /**
+        * @VIRTIO_SIMTIME_REQUEST: request from device to run at the given time
+        */
+       VIRTIO_SIMTIME_REQUEST          = 1,
+
+       /**
+        * @VIRTIO_SIMTIME_WAIT: wait for the requested runtime, new requests
+        *      may be made while waiting (due to interrupts); the time field
+        *      is ignored
+        */
+       VIRTIO_SIMTIME_WAIT             = 2,
+
+       /**
+        * @VIRTIO_SIMTIME_GET: ask device to fill the buffer with the current
+        *      simulation time
+        */
+       VIRTIO_SIMTIME_GET              = 3,
+
+       /**
+        * @VIRTIO_SIMTIME_UPDATE: time update to/from the control device
+        */
+       VIRTIO_SIMTIME_UPDATE           = 4,
+
+       /**
+        * @VIRTIO_SIMTIME_RUN: run time request granted, current time is in
+        *      the time field
+        */
+       VIRTIO_SIMTIME_RUN              = 5,
+
+       /**
+        * @VIRTIO_SIMTIME_FREE_UNTIL: enable free-running until the given time,
+        *      this is a message from the device telling the user that it can do
+        *      its own scheduling for anything before the given time
+        */
+       VIRTIO_SIMTIME_FREE_UNTIL       = 6,
+};


Above, I've basically only described _REQUEST, _WAIT and _RUN, I'm
pretty sure I need _GET for interrupt handling (or maybe more
efficiently a _REQUEST_NOW, rather than using _GET && _REQUEST); I think
I need _UPDATE before a "kick" to the device if I have _FREE_UNTIL which
is an optimization to not have to talk to the calendar all the time.
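For illustration, here is a hedged sketch of the _REQUEST/_WAIT/_RUN handshake from the driver's side — message construction only, since the virtqueue plumbing and the actual device are out of scope. The helper names (simtime_request_and_wait, simtime_grant) are made up for this sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the proposed virtio_simtime_msg layout from the diff above. */
struct virtio_simtime_msg {
    uint64_t op;
    uint64_t time;  /* nanoseconds */
};

enum {
    VIRTIO_SIMTIME_REQUEST = 1,
    VIRTIO_SIMTIME_WAIT    = 2,
    VIRTIO_SIMTIME_RUN     = 5,
};

/* Driver side (hypothetical helper): fill the two messages a guest
 * queues to ask for a run slot at 'when' and then block until it is
 * granted. */
void simtime_request_and_wait(struct virtio_simtime_msg out[2],
                              uint64_t when)
{
    out[0].op = VIRTIO_SIMTIME_REQUEST;
    out[0].time = when;
    out[1].op = VIRTIO_SIMTIME_WAIT;
    out[1].time = 0;    /* time field is ignored for WAIT */
}

/* Device side (hypothetical helper): grant the request; the RUN reply
 * carries the current (== requested) simulation time. */
struct virtio_simtime_msg simtime_grant(uint64_t when)
{
    struct virtio_simtime_msg run = {
        .op = VIRTIO_SIMTIME_RUN,
        .time = when,
    };
    return run;
}
```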


Is any of you familiar with the process of getting a virtio device ID
assigned, and if so, do you think it'd be feasible? Without that, it'd
probably be difficult to upstream the patch to support this protocol to
user-mode Linux.

Thanks,
johannes




* Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages
  2019-09-10 15:14           ` Johannes Berg
@ 2019-09-10 15:33             ` Michael S. Tsirkin
  2019-09-10 15:34               ` Johannes Berg
  2019-09-11  7:35             ` Stefan Hajnoczi
  1 sibling, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2019-09-10 15:33 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Marc-André Lureau, qemu-devel, Stefan Hajnoczi

On Tue, Sep 10, 2019 at 05:14:36PM +0200, Johannes Berg wrote:
> Is any of you familiar with the process of getting a virtio device ID
> assigned, and if so, do you think it'd be feasible? Without that, it'd
> probably be difficult to upstream the patch to support this protocol to
> user-mode Linux.

Sure, subscribe then send a patch to virtio-comment@lists.oasis-open.org

We do expect people to eventually get around to documenting the device
and upstreaming it though. If there's no plan to do it at all, you might
still be able to reuse the virtio code, in that case let's talk.

-- 
MST



* Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages
  2019-09-10 15:33             ` Michael S. Tsirkin
@ 2019-09-10 15:34               ` Johannes Berg
  2019-09-11  6:56                 ` Stefan Hajnoczi
  0 siblings, 1 reply; 38+ messages in thread
From: Johannes Berg @ 2019-09-10 15:34 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Marc-André Lureau, qemu-devel, Stefan Hajnoczi

On Tue, 2019-09-10 at 11:33 -0400, Michael S. Tsirkin wrote:
> On Tue, Sep 10, 2019 at 05:14:36PM +0200, Johannes Berg wrote:
> > Is any of you familiar with the process of getting a virtio device ID
> > assigned, and if so, do you think it'd be feasible? Without that, it'd
> > probably be difficult to upstream the patch to support this protocol to
> > user-mode Linux.
> 
> Sure, subscribe then send a patch to virtio-comment@lists.oasis-open.org

Ok, great.

> We do expect people to eventually get around to documenting the device
> and upstreaming it though. If there's no plan to do it at all, you might
> still be able to reuse the virtio code, in that case let's talk.

Right, no, I do want to and am working on the code now, but it's a bit
of a chicken & egg - without an ID I can't really send any code upstream
:-)

I can accompany the request for a new ID with working patches.

What kind of documentation beyond the header file should be added, and
where?

johannes




* Re: [Qemu-devel] [RFC] libvhost-user: implement VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS
  2019-09-09 13:50                     ` Johannes Berg
  2019-09-09 14:59                       ` Michael S. Tsirkin
@ 2019-09-10 15:52                       ` Johannes Berg
  2019-09-11  9:16                         ` Michael S. Tsirkin
  1 sibling, 1 reply; 38+ messages in thread
From: Johannes Berg @ 2019-09-10 15:52 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On Mon, 2019-09-09 at 15:50 +0200, Johannes Berg wrote:

> > We can document how to behave in case of inconsistent protocol features,
> > yes.
> 
> OK.

Coming back to this, I was just looking at it.

How/where would you like to see this done?

There isn't really any section that lists and explains the various
protocol features, there's only a list. I could add a new section for
"Simulation" or something like that that explains it, but then most
people would probably skip that and not ever read the text about how you
shouldn't implement F_KICK_CALL_MSGS :-)

Any thoughts?

johannes




* Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages
  2019-09-10 15:34               ` Johannes Berg
@ 2019-09-11  6:56                 ` Stefan Hajnoczi
  0 siblings, 0 replies; 38+ messages in thread
From: Stefan Hajnoczi @ 2019-09-11  6:56 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Marc-André Lureau, qemu-devel, Michael S. Tsirkin


On Tue, Sep 10, 2019 at 05:34:57PM +0200, Johannes Berg wrote:
> On Tue, 2019-09-10 at 11:33 -0400, Michael S. Tsirkin wrote:
> > On Tue, Sep 10, 2019 at 05:14:36PM +0200, Johannes Berg wrote:
> > > Is any of you familiar with the process of getting a virtio device ID
> > > assigned, and if so, do you think it'd be feasible? Without that, it'd
> > > probably be difficult to upstream the patch to support this protocol to
> > > user-mode Linux.
> > 
> > Sure, subscribe then send a patch to virtio-comment@lists.oasis-open.org
> 
> Ok, great.
> 
> > We do expect people to eventually get around to documenting the device
> > and upstreaming it though. If there's no plan to do it at all, you might
> > still be able to reuse the virtio code, in that case let's talk.
> 
> Right, no, I do want to and am working on the code now, but it's a bit
> of a chicken & egg - without an ID I can't really send any code upstream
> :-)
> 
> I can accompany the request for a new ID with working patches.
> 
> What kind of documentation beyond the header file should be added, and
> where?

You can reserve the device ID without any header files or documentation.
Just a patch like this one will suffice:

  https://github.com/oasis-tcs/virtio-spec/commit/9454b568c29baab7f3e4b1a384627d0061f71eba

I have checked that device ID 29 appears to be free so you could use it.

For the actual VIRTIO device specification, please follow the same
format as the other devices.  Here is the virtio-net section in the
VIRTIO spec:

  https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-1940001

It documents the virtqueues, configuration space layout, theory of
operation, and also includes normative statements that compliant drivers
and devices are expected to follow.

The goal of the spec is to provide all the information needed by driver
and device emulation authors to create an implementation from scratch
(without studying existing code in Linux, QEMU, etc).

The VIRTIO spec contains pseudo-C struct and constant definitions, but
not a real header file.  The header file for a Linux driver would live
in include/uapi/linux/virtio_foo.h (see existing devices for examples).
This would be part of your Linux patches and separate from the virtio
spec.

Stefan



* Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages
  2019-09-10 15:14           ` Johannes Berg
  2019-09-10 15:33             ` Michael S. Tsirkin
@ 2019-09-11  7:35             ` Stefan Hajnoczi
  2019-09-11  8:26               ` Johannes Berg
  1 sibling, 1 reply; 38+ messages in thread
From: Stefan Hajnoczi @ 2019-09-11  7:35 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Marc-André Lureau, qemu-devel, mst


On Tue, Sep 10, 2019 at 05:14:36PM +0200, Johannes Berg wrote:
> On Tue, 2019-09-10 at 17:03 +0200, Stefan Hajnoczi wrote:
> > 
> > > Now, this means that the CPU (that's part of the simulation) has to
> > > *wait* for the device to add an entry to the simulation calendar in
> > > response to the kick... That means that it really has to look like
> > > 
> > > CPU               device                   calendar
> > >      ---[kick]-->
> > >                          ---[add entry]-->
> > >                          <---[return]-----
> > 
> > What are the semantics of returning from the calendar?  Does it mean
> > "it's now your turn to run?", "your entry has been added and you'll be
> > notified later when it's time to run?", or something else?
> 
> The latter - the entry was added, and you'll be notified when it's time
> to run; but we need to have that state on the calendar so the CPU won't
> converse with the calendar before that state is committed.

Is the device only adding a calendar entry and not doing any actual
device emulation at this stage?

If yes, then this suggests the system could be structured more cleanly.
The vhost-user device process should focus on device emulation.  It
should not be aware of the calendar.  The vhost-user protocol also
shouldn't require modifications.

Instead, Linux arch/um code would add the entry to the calendar when the
CPU wants to kick a vhost-user device.  I assume the CPU is suspended
until arch/um code completes adding the entry to the calendar.

When the calendar decides to run the device entry it signals the
vhost-user kick eventfd.  The vhost-user device processes the virtqueue
as if it had been directly signalled by the CPU, totally unaware that
it's running within a simulation system.

The irq path is similar: the device signals the callfd and the calendar
adds an entry to notify UML that the request has completed.

Some pseudo-code:

arch/um/drivers/.../vhost-user.c:

  void um_vu_kick(struct um_vu_vq *vq)
  {
      if (simulation_mode) {
          calendar_add_entry({
              .type = CAL_ENTRY_TYPE_VHOST_USER_KICK,
              .device = vq->dev,
              .vq_idx = vq->idx,
          });
          return;
      }

      /* The normal code path: signal the kickfd */
      uint64_t val = 1;
      write(vq->kickfd, &val, sizeof(val));
  }

I'm suggesting this because it seems like a cleaner approach than
exposing the calendar concept to the vhost-user devices.  I'm not 100%
sure it offers the semantics you need to make everything deterministic
though.

A different topic: vhost-user does not have a 1:1 vq buffer:kick
relationship.  It's possible to add multiple buffers and kick only once.
It is also possible for the device to complete multiple buffers and only
call once.

This could pose a problem for simulation because it allows a degree of
non-determinism.  But as long as both the CPU and the I/O completion
of the device happen on a strict schedule this isn't a problem.

Stefan



* Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages
  2019-09-11  7:35             ` Stefan Hajnoczi
@ 2019-09-11  8:26               ` Johannes Berg
  2019-09-11 15:17                 ` Stefan Hajnoczi
  0 siblings, 1 reply; 38+ messages in thread
From: Johannes Berg @ 2019-09-11  8:26 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Marc-André Lureau, qemu-devel, mst

On Wed, 2019-09-11 at 09:35 +0200, Stefan Hajnoczi wrote:
> On Tue, Sep 10, 2019 at 05:14:36PM +0200, Johannes Berg wrote:
> > On Tue, 2019-09-10 at 17:03 +0200, Stefan Hajnoczi wrote:
> > > > Now, this means that the CPU (that's part of the simulation) has to
> > > > *wait* for the device to add an entry to the simulation calendar in
> > > > response to the kick... That means that it really has to look like
> > > > 
> > > > CPU               device                   calendar
> > > >      ---[kick]-->
> > > >                          ---[add entry]-->
> > > >                          <---[return]-----
> > > 
> > > What are the semantics of returning from the calendar?  Does it mean
> > > "it's now your turn to run?", "your entry has been added and you'll be
> > > notified later when it's time to run?", or something else?
> > 
> > The latter - the entry was added, and you'll be notified when it's time
> > to run; but we need to have that state on the calendar so the CPU won't
> > converse with the calendar before that state is committed.
> 
> Is the device only adding a calendar entry and not doing any actual
> device emulation at this stage?

Correct, yes.

With one exception: in the case of the simtime (calendar) device, the
actual processing *does* happen at this stage, of course - the calendar
entry has to have been added before we return.

> If yes, then this suggests the system could be structured more cleanly.
> The vhost-user device process should focus on device emulation.  It
> should not be aware of the calendar.

Decoupling the device entirely from the simulation doesn't work, at
least it depends on what you want to emulate. If you don't care that
everything in the device happens immediately (in simulation time), then
this might be OK - but in most cases you do want to model some latency,
processing time or similar in the device, and that means the device has
to request more calendar entries for its further processing.

Take a network device for example that wants to model a 50ms latency. It
has to first have a calendar event to take the packet from the driver
onto the wire, and then have another calendar event to push the packet
from the wire out some other driver. The first of those events could be
modelled by what you suggest below, the second cannot.

> The vhost-user protocol also shouldn't require modifications.
> 
> Instead, Linux arch/um code would add the entry to the calendar when the
> CPU wants to kick a vhost-user device.  I assume the CPU is suspended
> until arch/um code completes adding the entry to the calendar.

Right, OK, so far I'm following, and it seems perfectly reasonable.

Though as I said above (the simtime exception) - suspending the CPU
while adding an entry to the calendar actually also relies on the
KICK/ACK message semantics right now. This could easily be implemented
differently in this particular device though, e.g. by waiting for an ACK
message on a response virtqueue after sending the "add-entry" request on
the command virtqueue.

> When the calendar decides to run the device entry it signals the
> vhost-user kick eventfd.

Now you have to send those FDs also to the calendar, but I guess the
calendar is a vhost-user device too anyway, so we can send it the FD
along with the request to add the calendar entry, i.e. instead of adding
a calendar entry "please tell me" you can add a calendar entry with
"please kick this FD". Seems reasonable, though it requires much deeper
integration of the virtio implementation with the calendar than I was
planning, though possibly a bit less such integration on the device
side.

> The vhost-user device processes the virtqueue
> as if it had been directly signalled by the CPU, totally unaware that
> it's running within a simulation system.

As I said above, it cannot be entirely unaware unless it's a very
trivial device emulation. That *might* be something you actually
want/don't care, for example for a "control network" within the
simulation where you don't need to model any latencies, however it also
very easily introduces issues, say if the vhost-user device emulation,
focusing completely on emulation, starts doing 'sleep()' or similar OS
calls; they really should be going to the simulation instead.


However, really the place where this breaks down is that you don't know
when the device has finished processing.

As a totally made-up example, say you're emulating a strange /dev/null
device, but as a special quirk it can only ever consume an even number
of buffers. You give it a buffer on a virtqueue and kick it, nothing
happens; you give it another one and kick it, it'll consume both and
free (call) up the two entries, doing nothing else.

In the simulation of this device, it has to essentially behave like
this:
On kick, it schedules a calendar entry to process the interrupt. Once
that entry is signalled, the interrupt processing code runs and checks
the state of the virtqueue; if there's an even number of buffers it
releases them both. Regardless of whether that happened, it will then
relinquish its time slice, telling the calendar it's done processing.

In the model you propose, that last bit of "relinquish my time slice"
part will be missing. This _might_ be fine for some devices that always
reply *immediately* and never defer any work, i.e. every single kick
will be followed by a call essentially immediately. But already in my
simple hypothetical example above the call never happens for the first
buffer, so the simulation will get completely stuck.
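To make that failure mode concrete, here is a tiny self-contained model of the hypothetical even-only device; the evendev_* names are made up for this sketch:

```c
#include <assert.h>

/* Toy model of the hypothetical "even-only /dev/null" device above:
 * it consumes buffers only in pairs, so a single kick may complete
 * nothing at all. */
struct evendev { int pending; };

/* Handle a kick; returns the number of buffers completed, i.e. how
 * many the 'call' would cover. */
int evendev_kick(struct evendev *d, int new_buffers)
{
    d->pending += new_buffers;
    if (d->pending % 2 == 0) {
        int done = d->pending;
        d->pending = 0;
        return done;    /* call fires for these buffers */
    }
    return 0;           /* no call: device keeps waiting */
}
```

After the first single-buffer kick nothing completes, so if "kick is followed by call" were the only way the device could relinquish its time slice, the calendar would wait forever; an explicit "done processing" step back to the calendar is what avoids the deadlock.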

Similarly, you can easily imagine other devices that don't just process
the buffers but also have some more internal state, or internal timers,
or something like that.

Thus I don't think the part of your proposed model that disconnects the
device from the calendar is workable, or at least is only workable for a
very limited number of devices that never defer anything and have a very
simple command/response behaviour.

Now, the question remains though if the protocol changes could be
avoided. While I think that in theory that is possible using a model
such as yours, I suspect (and this is more gut feeling for now than
really well-argued) that we're better off with the protocol changes, for
example because:

1) This lets us detect that the device was set up for simulation, and if
   it doesn't support F_KICK_CALL_MSGS we can refuse to connect to it in
   simulation mode. This is a bit handwaving, since in simulation you
   control fully your devices and everything, but for somebody who
   hasn't read all of the detailed discussions here that might save some
   time understanding they cannot just add arbitrary devices to the
   simulation and expect things to work.
   But yes, doing this would also mean sort of abusing the
   F_KICK_CALL_MSGS as a moniker for simulation, which isn't really how
   I spec'ed it, because I didn't want to introduce any simulation
   calendar language into the vhost-user spec ...
   (IOW, any other device might implement F_KICK_CALL_MSGS and you could
   connect it, but I'm sort of banking on the fact that implementing
   this protocol extension outside of simulation is stupid.)

2) When we add a calendar entry on behalf of the device, we'd have to
   associate it with the device somehow, so when the device then later
   relinquishes its time slice the calendar knows what's going on.
   Theoretically this isn't needed as only one thing should be running
   (the device that was told to via the kick-fd-calendar-entry), but in
   practice bugs will happen and detecting them helps a lot.


> I'm suggesting this because it seems like a cleaner approach than
> exposing the calendar concept to the vhost-user devices.

As I argue above, for all intents and purposes that only works for
extremely trivial devices, to the extent that I'd rather exclude them
than risk unexpected behaviour.

After all, the main use cases for this are going to be
1) modelling more complex devices (I intend to model a wifi device)
2) modelling network topologies in the virtual network switch (to which
   the UML instances connect via vhost-user, requiring calendar
   integration as described above)
3) similarly modelling wireless networks
etc.

> I'm not 100%
> sure it offers the semantics you need to make everything deterministic
> though.

It sounds like it does, but it doesn't seem like a big win to me.

But like I said before, if there's enough objection to adding something
like this to the protocol then certainly we can get away without it, and
the model you propose is even a bit nicer than the model I had in mind
(I considered putting all VQs into polling mode, and then having the
calendar tell the devices to poll rather than having the calendar kick
the FDs.)

Overall, I think it becomes more manageable with the extension though.

> A different topic: vhost-user does not have a 1:1 vq buffer:kick
> relationship.  It's possible to add multiple buffers and kick only once.
> It is also possible for the device to complete multiple buffers and only
> call once.
> 
> This could pose a problem for simulation because it allows a degree of
> non-determinism.  But as long as the both the CPU and the I/O completion
> of the device happen on a strict schedule this isn't a problem.

Yes, I'm aware, but this is fine. The device will not see those buffers
until it's kicked. It could even be that the device schedules the
interrupt processing for some future time (say, if it's super busy
now), while the CPU continues running to add more buffers and even kick
again. Or the CPU doesn't release its time slice because it simply
has code like

  add_outbuf(vq, buf)
  kick(vq)
  add_outbuf(vq, buf)
  kick(vq)
  wait_for_response() // release time slice here

Now, most drivers would probably avoid that first kick, but it still
models fine.
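The kicks in the sequence above would, with the proposed extension, become messages rather than eventfd writes. A minimal sketch of building such a message follows; the header layout (request, flags, size) and the need_reply bit follow the vhost-user framing, but `VHOST_USER_VQ_KICK_ID` (the id 34 proposed in this RFC) and the struct name are placeholders, not an established API:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VHOST_USER_VERSION      0x1        /* flags bits 0-1: version */
#define VHOST_USER_NEED_REPLY   (1u << 3)  /* REPLY_ACK request bit */
#define VHOST_USER_VQ_KICK_ID   34         /* id proposed in this RFC */

struct vhost_user_msg {
    uint32_t request;
    uint32_t flags;
    uint32_t size;
    uint64_t payload;   /* vring state: index in bits 0-7, num = 0 */
};

/* Build an in-band kick for the given vring.  With need_reply set,
 * the master blocks on the reply and thus knows the device has seen
 * the new buffers before simulation time moves on. */
static struct vhost_user_msg make_kick(unsigned vring_idx, int need_reply)
{
    struct vhost_user_msg m;

    memset(&m, 0, sizeof(m));
    m.request = VHOST_USER_VQ_KICK_ID;
    m.flags = VHOST_USER_VERSION;
    if (need_reply) {
        m.flags |= VHOST_USER_NEED_REPLY;
    }
    m.size = sizeof(uint64_t);
    m.payload = vring_idx & 0xff;
    return m;
}
```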

johannes



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [RFC] libvhost-user: implement VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS
  2019-09-10 15:52                       ` Johannes Berg
@ 2019-09-11  9:16                         ` Michael S. Tsirkin
  2019-09-11  9:20                           ` Johannes Berg
  0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2019-09-11  9:16 UTC (permalink / raw)
  To: Johannes Berg; +Cc: qemu-devel

On Tue, Sep 10, 2019 at 05:52:36PM +0200, Johannes Berg wrote:
> On Mon, 2019-09-09 at 15:50 +0200, Johannes Berg wrote:
> 
> > > We can document how to behave in case of inconsistent protocol features,
> > > yes.
> > 
> > OK.
> 
> Coming back to this, I was just looking at it.
> 
> How/where would you like to see this done?
> 
> There isn't really any section that explains the various protocol
> features; there's only a bare list. I could add a new section for
> "Simulation" or something like that that explains it, but then most
> people would probably skip that and not ever read the text about how you
> shouldn't implement F_KICK_CALL_MSGS :-)
> 
> Any thoughts?
> 
> johannes

Each feature is documented near the description of the functionality it
enables; that can work for this. I don't much like F_KICK_CALL_MSGS as
it's not generic enough, but it isn't simulation as such either:
IN_BAND_NOTIFICATIONS?


As for how to handle errors, that probably belongs near
"Communication".

Or maybe add a new "Error handling" section.




* Re: [Qemu-devel] [RFC] libvhost-user: implement VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS
  2019-09-11  9:16                         ` Michael S. Tsirkin
@ 2019-09-11  9:20                           ` Johannes Berg
  2019-09-11  9:54                             ` Michael S. Tsirkin
  0 siblings, 1 reply; 38+ messages in thread
From: Johannes Berg @ 2019-09-11  9:20 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel


> Each feature is documented near the description of the functionality it
> enables, that can work for this. 

Hmm, so you mean I should add a section on in-band notifications, and
document things there?

> I don't much like F_KICK_CALL_MSGS as
> not generic enough but it's not simulation as such:
> IN_BAND_NOTIFICATIONS?

Sure, sounds good to me, I guess I'm not good at naming things :)

> As for how to handle errors, that probably belongs near
> "Communication".
> 
> Or maybe add a new "Error handling" section.

OK.

Btw, I tried this yesterday in libvhost-user: if I just do vu_panic()
it aborts that message's handling and then hangs, and if I forcefully
close the FD it ends up crashing later ...

I'm tempted to go with vu_panic() only for now, as that seems to be the
normal way to handle unexpected protocol errors there; many such other
errors should probably also close the FD?
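The inconsistency being discussed can be reduced to a feature-mask check. This is a standalone, hypothetical sketch of that check (bit numbers as in the RFC; in libvhost-user the failure path would be vu_panic() rather than a return code):

```c
#include <assert.h>
#include <stdint.h>

#define VHOST_USER_PROTOCOL_F_REPLY_ACK      3
#define VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS 13

/* In-band notifications only give the synchronisation guarantee when
 * the master can request replies, so treat KICK_CALL_MSGS without
 * REPLY_ACK as an inconsistent negotiation.  Returns 1 if the
 * negotiated features are consistent, 0 otherwise. */
static int features_consistent(uint64_t protocol_features)
{
    uint64_t msgs = 1ull << VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS;
    uint64_t ack  = 1ull << VHOST_USER_PROTOCOL_F_REPLY_ACK;

    if ((protocol_features & msgs) && !(protocol_features & ack)) {
        return 0;
    }
    return 1;
}
```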

johannes




* Re: [Qemu-devel] [RFC] libvhost-user: implement VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS
  2019-09-11  9:20                           ` Johannes Berg
@ 2019-09-11  9:54                             ` Michael S. Tsirkin
  0 siblings, 0 replies; 38+ messages in thread
From: Michael S. Tsirkin @ 2019-09-11  9:54 UTC (permalink / raw)
  To: Johannes Berg; +Cc: qemu-devel

On Wed, Sep 11, 2019 at 11:20:40AM +0200, Johannes Berg wrote:
> 
> > Each feature is documented near the description of the functionality it
> > enables, that can work for this. 
> 
> Hmm, so you mean I should add a section on in-band notifications, and
> document things there?

Like other messages - look at the inflight description, for example.


There's also a bunch of work in finding all the places that deal with
kick/call FDs and noting that they only make sense without the new
feature flag.

> > I don't much like F_KICK_CALL_MSGS as
> > not generic enough but it's not simulation as such:
> > IN_BAND_NOTIFICATIONS?
> 
> Sure, sounds good to me, I guess I'm not good at naming things :)
> 
> > As for how to handle errors, that probably belongs near
> > "Communication".
> > 
> > Or maybe add a new "Error handling" section.
> 
> OK.
> 
> Btw, I tried this yesterday in libvhost-user, but if I just do
> vu_panic() it just aborts that message handling and hangs, if I
> forcefully close the FD then it ends up crashing later ...
> 
> I'm tempted to go with vu_panic() only for now as that seems to be the
> normal way to handle unexpected protocol errors there, many such other
> errors probably should also close the FD?
> 
> johannes

I'm fine with a TODO for now.

-- 
MST



* Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages
  2019-09-11  8:26               ` Johannes Berg
@ 2019-09-11 15:17                 ` Stefan Hajnoczi
  0 siblings, 0 replies; 38+ messages in thread
From: Stefan Hajnoczi @ 2019-09-11 15:17 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Marc-André Lureau, qemu-devel, Stefan Hajnoczi, Michael S. Tsirkin

On Wed, Sep 11, 2019 at 11:13 AM Johannes Berg
<johannes@sipsolutions.net> wrote:
> On Wed, 2019-09-11 at 09:35 +0200, Stefan Hajnoczi wrote:
> > On Tue, Sep 10, 2019 at 05:14:36PM +0200, Johannes Berg wrote:
> > > On Tue, 2019-09-10 at 17:03 +0200, Stefan Hajnoczi wrote:
> > I'm suggesting this because it seems like a cleaner approach than
> > exposing the calendar concept to the vhost-user devices.
>
> As I argue above, for all intents and purposes that only works for
> extremely trivial devices, to the extent that I'd rather exclude them
> than risk unexpected behaviour.

That makes sense, I think a vhost-user protocol extension is necessary
here.  Thanks for explaining.

Stefan



* Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages
  2019-09-02 12:12 ` [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages Johannes Berg
  2019-09-05 20:28   ` Johannes Berg
@ 2019-09-11 15:31   ` Stefan Hajnoczi
  2019-09-11 15:36     ` Johannes Berg
  1 sibling, 1 reply; 38+ messages in thread
From: Stefan Hajnoczi @ 2019-09-11 15:31 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Marc-André Lureau, Stefan Hajnoczi, qemu-devel, Johannes Berg

On Mon, Sep 2, 2019 at 2:13 PM Johannes Berg <johannes@sipsolutions.net> wrote:
>
> From: Johannes Berg <johannes.berg@intel.com>
>
> For good reason, vhost-user is currently built asynchronously, that
> way better performance can be obtained. However, for certain use
> cases such as simulation, this is problematic.
>
> Consider an event-based simulation in which both the device and the
> CPU are scheduled according to a simulation "calendar". Now, for
> example, consider the CPU sending a command to the device, over a
> vring in the vhost-user protocol. In this case, the CPU must wait
> for the vring update to have been processed, so that the device has
> time to put an entry onto the simulation calendar to obtain time to
> handle the interrupt, before the CPU goes back to scheduling and
> thus updates the simulation calendar or even has the simulation
> move time forward.
>
> This cannot easily be achieved with the eventfd-based vring kick/call
> design.
>
> Extend the protocol slightly, so that a message can be used for kick
> and call instead, if the new VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS is
> negotiated. This in itself doesn't guarantee synchronisation, but both
> sides can also negotiate VHOST_USER_PROTOCOL_F_REPLY_ACK and thus get
> a reply to this message by setting the need_reply flag, and ensure
> synchronisation this way.
>
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> ---
>  docs/interop/vhost-user.rst | 36 ++++++++++++++++++++++++++++++++++--
>  1 file changed, 34 insertions(+), 2 deletions(-)
>
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index 7827b710aa0a..1270885fecf2 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -2,6 +2,7 @@
>  Vhost-user Protocol
>  ===================
>  :Copyright: 2014 Virtual Open Systems Sarl.
> +:Copyright: 2019 Intel Corporation
>  :Licence: This work is licensed under the terms of the GNU GPL,
>            version 2 or later. See the COPYING file in the top-level
>            directory.
> @@ -785,6 +786,7 @@ Protocol features
>    #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD  10
>    #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  11
>    #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12
> +  #define VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS 13
>
>  Master message types
>  --------------------
> @@ -946,7 +948,9 @@ Master message types
>    Bits (0-7) of the payload contain the vring index. Bit 8 is the
>    invalid FD flag. This flag is set when there is no file descriptor
>    in the ancillary data. This signals that polling should be used
> -  instead of waiting for a kick.
> +  instead of waiting for a kick; however, if the protocol feature
> +  ``VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS`` has been negotiated, it
> +  instead means that updates should be signalled using messages.
>
>  ``VHOST_USER_SET_VRING_CALL``
>    :id: 13
> @@ -959,7 +963,9 @@ Master message types
>    Bits (0-7) of the payload contain the vring index. Bit 8 is the
>    invalid FD flag. This flag is set when there is no file descriptor
>    in the ancillary data. This signals that polling will be used
> -  instead of waiting for the call.
> +  instead of waiting for the call; however, if the protocol feature
> +  ``VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS`` has been negotiated, it
> +  instead means that updates should be signalled using messages.
>
>  ``VHOST_USER_SET_VRING_ERR``
>    :id: 14
> @@ -1190,6 +1196,19 @@ Master message types
>    ancillary data. The GPU protocol is used to inform the master of
>    rendering state and updates. See vhost-user-gpu.rst for details.
>
> +``VHOST_USER_VQ_CALL``

"call" should be "kick".  "kick" is the driver->device request
submission notification and "call" is the device->driver request
completion notification.

> +  :id: 34
> +  :equivalent ioctl: N/A
> +  :slave payload: vring state description
> +  :master payload: N/A
> +
> +  When the ``VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS`` protocol feature has
> +  been successfully negotiated, this message may be submitted by the master
> +  to indicate that a buffer was added to the vring instead of signalling it
> +  using the vring's event FD or having the slave rely on polling.
> +
> +  The state.num field is currently reserved and must be set to 0.

Please include an explanation of how this message is different from
the existing methods.  Maybe something like:

  Unlike eventfd or polling, this message allows the master to control
precisely when virtqueue processing happens.  When the master uses
``need_reply`` to receive a reply, it knows the device has become
aware of the virtqueue activity.

>  Slave message types
>  -------------------
>
> @@ -1246,6 +1265,19 @@ Slave message types
>    ``VHOST_USER_PROTOCOL_F_HOST_NOTIFIER`` protocol feature has been
>    successfully negotiated.
>
> +``VHOST_USER_VQ_KICK``

s/KICK/CALL/



* Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages
  2019-09-11 15:31   ` Stefan Hajnoczi
@ 2019-09-11 15:36     ` Johannes Berg
  2019-09-11 15:38       ` Johannes Berg
  2019-09-12 12:22       ` Stefan Hajnoczi
  0 siblings, 2 replies; 38+ messages in thread
From: Johannes Berg @ 2019-09-11 15:36 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Marc-André Lureau, qemu-devel, Stefan Hajnoczi

On Wed, 2019-09-11 at 17:31 +0200, Stefan Hajnoczi wrote:

> > +``VHOST_USER_VQ_CALL``
> 
> "call" should be "kick".  "kick" is the driver->device request
> submission notification and "call" is the device->driver request
> completion notification.

Ahrg, yes. I confuse this all the time, sorry, will fix.

Btw, speaking of confusion ... this document may need some cleanup wrt.
the "client" language since it states that both sides might be a socket
server or client, but then goes on to talk about the "client" as though
it was equivalent to "slave", where normally it's actually the other way
around (the device is the one listening on the socket, i.e. it's the
server). This is most pronounced in the section "Starting and stopping
rings".

IMHO, it'd be good to talk more about "device" and "host" instead of
"slave" and "master" too since that's clearer, and just replace *all*
that language accordingly, but YMMV.

> > +  :id: 34
> > +  :equivalent ioctl: N/A
> > +  :slave payload: vring state description
> > +  :master payload: N/A
> > +
> > +  When the ``VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS`` protocol feature has
> > +  been successfully negotiated, this message may be submitted by the master
> > +  to indicate that a buffer was added to the vring instead of signalling it
> > +  using the vring's event FD or having the slave rely on polling.
> > +
> > +  The state.num field is currently reserved and must be set to 0.
> 
> Please include an explanation of how this message is different from
> the existing methods.  Maybe something like:
> 
>   Unlike eventfd or polling, this message allows the master to control
> precisely when virtqueue processing happens.  When the master uses
> ``need_reply`` to receive a reply, it knows the device has become
> aware of the virtqueue activity.

Sure, thanks, I'll incorporate that.

> >  Slave message types
> >  -------------------
> > 
> > @@ -1246,6 +1265,19 @@ Slave message types
> >    ``VHOST_USER_PROTOCOL_F_HOST_NOTIFIER`` protocol feature has been
> >    successfully negotiated.
> > 
> > +``VHOST_USER_VQ_KICK``
> 
> s/KICK/CALL/

Indeed. Wait, that should be VHOST_USER_SLAVE_VQ_CALL, actually. Maybe I
already did fix that in v2?

johannes




* Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages
  2019-09-11 15:36     ` Johannes Berg
@ 2019-09-11 15:38       ` Johannes Berg
  2019-09-12 12:22       ` Stefan Hajnoczi
  1 sibling, 0 replies; 38+ messages in thread
From: Johannes Berg @ 2019-09-11 15:38 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Marc-André Lureau, qemu-devel, Stefan Hajnoczi

On Wed, 2019-09-11 at 17:36 +0200, Johannes Berg wrote:
> On Wed, 2019-09-11 at 17:31 +0200, Stefan Hajnoczi wrote:
> 
> > > +``VHOST_USER_VQ_CALL``

It should also be VRING, not VQ, but I did indeed fix that in v2 already
along with the CALL<->KICK inversion.

So I guess I just have to include the part about how it's different from
existing methods, and solve the discussion with Michael about polling
(and possibly use another flag there for message-based mode).

johannes




* Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages
  2019-09-11 15:36     ` Johannes Berg
  2019-09-11 15:38       ` Johannes Berg
@ 2019-09-12 12:22       ` Stefan Hajnoczi
  2019-09-12 20:37         ` Johannes Berg
  1 sibling, 1 reply; 38+ messages in thread
From: Stefan Hajnoczi @ 2019-09-12 12:22 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Marc-André Lureau, qemu-devel, Stefan Hajnoczi

On Wed, Sep 11, 2019 at 05:36:32PM +0200, Johannes Berg wrote:
> On Wed, 2019-09-11 at 17:31 +0200, Stefan Hajnoczi wrote:
> 
> > > +``VHOST_USER_VQ_CALL``
> > 
> > "call" should be "kick".  "kick" is the driver->device request
> > submission notification and "call" is the device->driver request
> > completion notification.
> 
> Ahrg, yes. I confuse this all the time, sorry, will fix.
> 
> Btw, speaking of confusion ... this document may need some cleanup wrt.
> the "client" language since it states that both sides might be a socket
> server or client, but then goes on to talk about the "client" as though
> it was equivalent to "slave", where normally it's actually the other way
> around (the device is the one listening on the socket, i.e. it's the
> server). This is most pronounced in the section "Starting and stopping
> rings".
> 
> IMHO, it'd be good to talk more about "device" and "host" instead of
> "slave" and "master" too since that's clearer, and just replace *all*
> that language accordingly, but YMMV.

The vhost-user spec is unclear and inconsistent.  Patches are welcome.
A footnote describing the old terminology would be necessary so that
existing documentation, code, etc. can still be deciphered when the spec
changes the terminology :).



* Re: [Qemu-devel] [RFC] docs: vhost-user: add in-band kick/call messages
  2019-09-12 12:22       ` Stefan Hajnoczi
@ 2019-09-12 20:37         ` Johannes Berg
  0 siblings, 0 replies; 38+ messages in thread
From: Johannes Berg @ 2019-09-12 20:37 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Marc-André Lureau, qemu-devel, Stefan Hajnoczi

On Thu, 2019-09-12 at 14:22 +0200, Stefan Hajnoczi wrote:
> 
> The vhost-user spec is unclear and inconsistent.  Patches are welcome.

:)

> A footnote describing the old terminology would be necessary so that
> existing documentation, code, etc can still be decyphered when the spec
> changes the terminology :).

Yeah.

But we (you) would have to decide first what the terminology actually
*should* be :-)

I'd have said something like "host" and "device", but then "host" can
get confusing in the context of qemu: is it the host that the VM is
running on? It's really the VM that's hosting the device, which makes
"host" ambiguous here.

Client/server might work, but it's not very obvious either (**), and
quite a bit of the text actually gets it wrong right now, so it'd be
very confusing to swap that and have a footnote or similar indicate that
it was described the other way around previously ...

Calling it master/slave as the text does now is a bit confusing to me,
because the protocol involves DMA (of sorts) and that's called "bus
mastering" in most other contexts. Perhaps that would still work best,
though I'm not really a fan of that terminology.

Perhaps master/device would be nicer, getting the term "device" in there
somewhere would seem appropriate, even if it's not really actually a
device, but from the view inside the "master" VM it is a device...


(**) Btw, is it really true that there's a way to have the device make
the connection to a listening unix domain socket, as implied by the
spec? I cannot find evidence for such a mode in any code...

johannes



