qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/25] virtiofs dax patches
@ 2021-04-14 15:51 Dr. David Alan Gilbert (git)
  2021-04-14 15:51 ` [PATCH v2 01/25] DAX: vhost-user: Rework slave return values Dr. David Alan Gilbert (git)
                   ` (24 more replies)
  0 siblings, 25 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-14 15:51 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, virtio-fs

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

  This series adds support for acceleration of virtiofs via DAX
mapping, using features added in the 5.11 Linux kernel.

  DAX originally existed in the kernel for mapping real storage
devices directly into memory, so that reads/writes turn into
reads/writes directly mapped into the storage device.

  virtiofs's DAX support is similar; a PCI BAR is exposed on the
virtiofs device corresponding to a DAX 'cache' of a user defined size.
The guest daemon then requests files to be mapped into that cache;
when that happens the virtiofsd sends filedescriptors and commands back
to the QEMU that mmap's those files directly into the memory slot
exposed to kvm.  The guest can then directly read/write to the files
exposed by virtiofs by reading/writing into the BAR.

  A typical invocation would be:
     -device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=myfs,cache-size=4G 

and then the guest must mount with -o dax

  Note that the cache doesn't really take VM up on the host, because
everything placed there is just an mmap of a file, so you can afford
to use quite a large cache size.

  Unlike a real DAX device, the cache is a finite size that's
potentially smaller than the underlying filesystem (especially when
mapping granuality is taken into account).  Mapping, unmapping and
remapping must take place to juggle files into the cache if it's too
small.  Some workloads benefit more than others.

Gotchas:
  a) If something else on the host truncates an mmap'd file,
kvm gets rather upset;  for this reason it's advised that DAX is
currently only suitable for use on non-shared filesystems.

Dave

v2
  Cleanups from first review, rebase on current head

Dr. David Alan Gilbert (20):
  DAX: vhost-user: Rework slave return values
  virtiofsd: Don't assume header layout
  DAX: libvhost-user: Route slave message payload
  DAX: libvhost-user: Allow popping a queue element with bad pointers
  DAX subprojects/libvhost-user: Add virtio-fs slave types
  DAX: virtio: Add shared memory capability
  DAX: virtio-fs: Add cache BAR
  DAX: virtio-fs: Add vhost-user slave commands for mapping
  DAX: virtio-fs: Fill in slave commands for mapping
  DAX: virtiofsd Add cache accessor functions
  DAX: virtiofsd: Add setup/remove mappings fuse commands
  DAX: virtiofsd: Add setup/remove mapping handlers to passthrough_ll
  DAX: virtiofsd: Wire up passthrough_ll's lo_setupmapping
  DAX: virtiofsd: route se down to destroy method
  DAX: virtiofsd: Perform an unmap on destroy
  DAX/unmap: virtiofsd: Add VHOST_USER_SLAVE_FS_IO
  DAX/unmap virtiofsd: Add wrappers for VHOST_USER_SLAVE_FS_IO
  DAX/unmap virtiofsd: Parse unmappable elements
  DAX/unmap virtiofsd: Route unmappable reads
  DAX/unmap virtiofsd: route unmappable write to slave command

Stefan Hajnoczi (1):
  DAX:virtiofsd: implement FUSE_INIT map_alignment field

Vivek Goyal (4):
  DAX: virtiofsd: Make lo_removemapping() work
  vhost-user-fs: Extend VhostUserFSSlaveMsg to pass additional info
  vhost-user-fs: Implement drop CAP_FSETID functionality
  virtiofsd: Ask qemu to drop CAP_FSETID if client asked for it

 block/export/vhost-user-blk-server.c      |   2 +-
 contrib/vhost-user-blk/vhost-user-blk.c   |   3 +-
 contrib/vhost-user-gpu/vhost-user-gpu.c   |   5 +-
 contrib/vhost-user-input/main.c           |   4 +-
 contrib/vhost-user-scsi/vhost-user-scsi.c |   2 +-
 docs/interop/vhost-user.rst               |  37 ++
 hw/virtio/meson.build                     |   1 +
 hw/virtio/trace-events                    |   6 +
 hw/virtio/vhost-backend.c                 |   6 +-
 hw/virtio/vhost-user-fs-pci.c             |  32 ++
 hw/virtio/vhost-user-fs.c                 | 395 ++++++++++++++++++++++
 hw/virtio/vhost-user.c                    |  60 +++-
 hw/virtio/virtio-pci.c                    |  20 ++
 hw/virtio/virtio-pci.h                    |   4 +
 include/hw/virtio/vhost-backend.h         |   2 +-
 include/hw/virtio/vhost-user-fs.h         |  43 +++
 meson.build                               |   6 +
 subprojects/libvhost-user/libvhost-user.c | 113 ++++++-
 subprojects/libvhost-user/libvhost-user.h |  57 +++-
 tests/vhost-user-bridge.c                 |   4 +-
 tools/virtiofsd/buffer.c                  |  22 +-
 tools/virtiofsd/fuse_common.h             |  17 +-
 tools/virtiofsd/fuse_lowlevel.c           |  92 ++++-
 tools/virtiofsd/fuse_lowlevel.h           |  78 ++++-
 tools/virtiofsd/fuse_virtio.c             | 372 ++++++++++++++++----
 tools/virtiofsd/passthrough_ll.c          | 117 ++++++-
 26 files changed, 1380 insertions(+), 120 deletions(-)

-- 
2.31.1



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v2 01/25] DAX: vhost-user: Rework slave return values
  2021-04-14 15:51 [PATCH v2 00/25] virtiofs dax patches Dr. David Alan Gilbert (git)
@ 2021-04-14 15:51 ` Dr. David Alan Gilbert (git)
  2021-04-16 10:59   ` [Virtio-fs] " Greg Kurz
  2021-04-14 15:51 ` [PATCH v2 02/25] virtiofsd: Don't assume header layout Dr. David Alan Gilbert (git)
                   ` (23 subsequent siblings)
  24 siblings, 1 reply; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-14 15:51 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, virtio-fs

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

All the current slave handlers on the qemu side generate an 'int'
return value that's squashed down to a bool (!!ret) and stuffed into
a uint64_t (field of a union) to be returned.

Move the uint64_t type back up through the individual handlers so
that we can make one actually return a full uint64_t.

Note that the definition in the interop spec says most of these
cases are defined as returning 0 on success and non-0 for failure,
so it's OK to change from a bool to another non-0.

Vivek:
This is needed because upcoming patches in series will add new functions
which want to return full error code. Existing functions continue to
return true/false so, it should not lead to change of behavior for
existing users.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hw/virtio/vhost-backend.c         |  6 +++---
 hw/virtio/vhost-user.c            | 31 ++++++++++++++++---------------
 include/hw/virtio/vhost-backend.h |  2 +-
 3 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
index 31b33bde37..1686c94767 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -401,8 +401,8 @@ int vhost_backend_invalidate_device_iotlb(struct vhost_dev *dev,
     return -ENODEV;
 }
 
-int vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
-                                          struct vhost_iotlb_msg *imsg)
+uint64_t vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
+                                        struct vhost_iotlb_msg *imsg)
 {
     int ret = 0;
 
@@ -429,5 +429,5 @@ int vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
         break;
     }
 
-    return ret;
+    return !!ret;
 }
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index ded0c10453..3e4a25e108 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1405,24 +1405,25 @@ static int vhost_user_reset_device(struct vhost_dev *dev)
     return 0;
 }
 
-static int vhost_user_slave_handle_config_change(struct vhost_dev *dev)
+static uint64_t vhost_user_slave_handle_config_change(struct vhost_dev *dev)
 {
     int ret = -1;
 
     if (!dev->config_ops) {
-        return -1;
+        return true;
     }
 
     if (dev->config_ops->vhost_dev_config_notifier) {
         ret = dev->config_ops->vhost_dev_config_notifier(dev);
     }
 
-    return ret;
+    return !!ret;
 }
 
-static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
-                                                       VhostUserVringArea *area,
-                                                       int fd)
+static uint64_t vhost_user_slave_handle_vring_host_notifier(
+                struct vhost_dev *dev,
+               VhostUserVringArea *area,
+               int fd)
 {
     int queue_idx = area->u64 & VHOST_USER_VRING_IDX_MASK;
     size_t page_size = qemu_real_host_page_size;
@@ -1436,7 +1437,7 @@ static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
     if (!virtio_has_feature(dev->protocol_features,
                             VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) ||
         vdev == NULL || queue_idx >= virtio_get_num_queues(vdev)) {
-        return -1;
+        return true;
     }
 
     n = &user->notifier[queue_idx];
@@ -1449,18 +1450,18 @@ static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
     }
 
     if (area->u64 & VHOST_USER_VRING_NOFD_MASK) {
-        return 0;
+        return false;
     }
 
     /* Sanity check. */
     if (area->size != page_size) {
-        return -1;
+        return true;
     }
 
     addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
                 fd, area->offset);
     if (addr == MAP_FAILED) {
-        return -1;
+        return true;
     }
 
     name = g_strdup_printf("vhost-user/host-notifier@%p mmaps[%d]",
@@ -1471,13 +1472,13 @@ static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
 
     if (virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true)) {
         munmap(addr, page_size);
-        return -1;
+        return true;
     }
 
     n->addr = addr;
     n->set = true;
 
-    return 0;
+    return false;
 }
 
 static void close_slave_channel(struct vhost_user *u)
@@ -1498,7 +1499,7 @@ static gboolean slave_read(QIOChannel *ioc, GIOCondition condition,
     VhostUserPayload payload = { 0, };
     Error *local_err = NULL;
     gboolean rc = G_SOURCE_CONTINUE;
-    int ret = 0;
+    uint64_t ret = 0;
     struct iovec iov;
     g_autofree int *fd = NULL;
     size_t fdsize = 0;
@@ -1539,7 +1540,7 @@ static gboolean slave_read(QIOChannel *ioc, GIOCondition condition,
         break;
     default:
         error_report("Received unexpected msg type: %d.", hdr.request);
-        ret = -EINVAL;
+        ret = true;
     }
 
     /*
@@ -1553,7 +1554,7 @@ static gboolean slave_read(QIOChannel *ioc, GIOCondition condition,
         hdr.flags &= ~VHOST_USER_NEED_REPLY_MASK;
         hdr.flags |= VHOST_USER_REPLY_MASK;
 
-        payload.u64 = !!ret;
+        payload.u64 = ret;
         hdr.size = sizeof(payload.u64);
 
         iovec[0].iov_base = &hdr;
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index 8a6f8e2a7a..64ac6b6444 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -186,7 +186,7 @@ int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
 int vhost_backend_invalidate_device_iotlb(struct vhost_dev *dev,
                                                  uint64_t iova, uint64_t len);
 
-int vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
+uint64_t vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
                                           struct vhost_iotlb_msg *imsg);
 
 int vhost_user_gpu_set_socket(struct vhost_dev *dev, int fd);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 02/25] virtiofsd: Don't assume header layout
  2021-04-14 15:51 [PATCH v2 00/25] virtiofs dax patches Dr. David Alan Gilbert (git)
  2021-04-14 15:51 ` [PATCH v2 01/25] DAX: vhost-user: Rework slave return values Dr. David Alan Gilbert (git)
@ 2021-04-14 15:51 ` Dr. David Alan Gilbert (git)
  2021-04-14 15:51 ` [PATCH v2 03/25] DAX: libvhost-user: Route slave message payload Dr. David Alan Gilbert (git)
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-14 15:51 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, virtio-fs

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

virtiofsd incorrectly assumed a fixed set of header layout in the virt
queue; assuming that the fuse and write headers were conveniently
separated from the data;  the spec doesn't allow us to take that
convenience, so fix it up to deal with it the hard way.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/fuse_virtio.c | 94 +++++++++++++++++++++++++++--------
 1 file changed, 73 insertions(+), 21 deletions(-)

diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 3e13997406..6dd73c9b72 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -129,18 +129,55 @@ static void fv_panic(VuDev *dev, const char *err)
  * Copy from an iovec into a fuse_buf (memory only)
  * Caller must ensure there is space
  */
-static void copy_from_iov(struct fuse_buf *buf, size_t out_num,
-                          const struct iovec *out_sg)
+static size_t copy_from_iov(struct fuse_buf *buf, size_t out_num,
+                            const struct iovec *out_sg,
+                            size_t max)
 {
     void *dest = buf->mem;
+    size_t copied = 0;
 
-    while (out_num) {
+    while (out_num && max) {
         size_t onelen = out_sg->iov_len;
+        onelen = MIN(onelen, max);
         memcpy(dest, out_sg->iov_base, onelen);
         dest += onelen;
+        copied += onelen;
         out_sg++;
         out_num--;
+        max -= onelen;
     }
+
+    return copied;
+}
+
+/*
+ * Skip 'skip' bytes in the iov; 'sg_1stindex' is set as
+ * the index for the 1st iovec to read data from, and
+ * 'sg_1stskip' is the number of bytes to skip in that entry.
+ *
+ * Returns True if there are at least 'skip' bytes in the iovec
+ *
+ */
+static bool skip_iov(const struct iovec *sg, size_t sg_size,
+                     size_t skip,
+                     size_t *sg_1stindex, size_t *sg_1stskip)
+{
+    size_t vec;
+
+    for (vec = 0; vec < sg_size; vec++) {
+        if (sg[vec].iov_len > skip) {
+            *sg_1stskip = skip;
+            *sg_1stindex = vec;
+
+            return true;
+        }
+
+        skip -= sg[vec].iov_len;
+    }
+
+    *sg_1stindex = vec;
+    *sg_1stskip = 0;
+    return skip == 0;
 }
 
 /*
@@ -457,6 +494,7 @@ static void fv_queue_worker(gpointer data, gpointer user_data)
     bool allocated_bufv = false;
     struct fuse_bufvec bufv;
     struct fuse_bufvec *pbufv;
+    struct fuse_in_header inh;
 
     assert(se->bufsize > sizeof(struct fuse_in_header));
 
@@ -505,14 +543,15 @@ static void fv_queue_worker(gpointer data, gpointer user_data)
                  elem->index);
         assert(0); /* TODO */
     }
-    /* Copy just the first element and look at it */
-    copy_from_iov(&fbuf, 1, out_sg);
+    /* Copy just the fuse_in_header and look at it */
+    copy_from_iov(&fbuf, out_num, out_sg,
+                  sizeof(struct fuse_in_header));
+    memcpy(&inh, fbuf.mem, sizeof(struct fuse_in_header));
 
     pbufv = NULL; /* Compiler thinks an unitialised path */
-    if (out_num > 2 &&
-        out_sg[0].iov_len == sizeof(struct fuse_in_header) &&
-        ((struct fuse_in_header *)fbuf.mem)->opcode == FUSE_WRITE &&
-        out_sg[1].iov_len == sizeof(struct fuse_write_in)) {
+    if (inh.opcode == FUSE_WRITE &&
+        out_len >= (sizeof(struct fuse_in_header) +
+                    sizeof(struct fuse_write_in))) {
         /*
          * For a write we don't actually need to copy the
          * data, we can just do it straight out of guest memory
@@ -521,15 +560,15 @@ static void fv_queue_worker(gpointer data, gpointer user_data)
          */
         fuse_log(FUSE_LOG_DEBUG, "%s: Write special case\n", __func__);
 
-        /* copy the fuse_write_in header afte rthe fuse_in_header */
-        fbuf.mem += out_sg->iov_len;
-        copy_from_iov(&fbuf, 1, out_sg + 1);
-        fbuf.mem -= out_sg->iov_len;
-        fbuf.size = out_sg[0].iov_len + out_sg[1].iov_len;
+        fbuf.size = copy_from_iov(&fbuf, out_num, out_sg,
+                                  sizeof(struct fuse_in_header) +
+                                  sizeof(struct fuse_write_in));
+        /* That copy reread the in_header, make sure we use the original */
+        memcpy(fbuf.mem, &inh, sizeof(struct fuse_in_header));
 
         /* Allocate the bufv, with space for the rest of the iov */
         pbufv = malloc(sizeof(struct fuse_bufvec) +
-                       sizeof(struct fuse_buf) * (out_num - 2));
+                       sizeof(struct fuse_buf) * out_num);
         if (!pbufv) {
             fuse_log(FUSE_LOG_ERR, "%s: pbufv malloc failed\n",
                     __func__);
@@ -540,24 +579,37 @@ static void fv_queue_worker(gpointer data, gpointer user_data)
         pbufv->count = 1;
         pbufv->buf[0] = fbuf;
 
-        size_t iovindex, pbufvindex;
-        iovindex = 2; /* 2 headers, separate iovs */
+        size_t iovindex, pbufvindex, iov_bytes_skip;
         pbufvindex = 1; /* 2 headers, 1 fusebuf */
 
+        if (!skip_iov(out_sg, out_num,
+                      sizeof(struct fuse_in_header) +
+                      sizeof(struct fuse_write_in),
+                      &iovindex, &iov_bytes_skip)) {
+            fuse_log(FUSE_LOG_ERR, "%s: skip failed\n",
+                    __func__);
+            goto out;
+        }
+
         for (; iovindex < out_num; iovindex++, pbufvindex++) {
             pbufv->count++;
             pbufv->buf[pbufvindex].pos = ~0; /* Dummy */
             pbufv->buf[pbufvindex].flags = 0;
             pbufv->buf[pbufvindex].mem = out_sg[iovindex].iov_base;
             pbufv->buf[pbufvindex].size = out_sg[iovindex].iov_len;
+
+            if (iov_bytes_skip) {
+                pbufv->buf[pbufvindex].mem += iov_bytes_skip;
+                pbufv->buf[pbufvindex].size -= iov_bytes_skip;
+                iov_bytes_skip = 0;
+            }
         }
     } else {
         /* Normal (non fast write) path */
 
-        /* Copy the rest of the buffer */
-        fbuf.mem += out_sg->iov_len;
-        copy_from_iov(&fbuf, out_num - 1, out_sg + 1);
-        fbuf.mem -= out_sg->iov_len;
+        copy_from_iov(&fbuf, out_num, out_sg, se->bufsize);
+        /* That copy reread the in_header, make sure we use the original */
+        memcpy(fbuf.mem, &inh, sizeof(struct fuse_in_header));
         fbuf.size = out_len;
 
         /* TODO! Endianness of header */
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 03/25] DAX: libvhost-user: Route slave message payload
  2021-04-14 15:51 [PATCH v2 00/25] virtiofs dax patches Dr. David Alan Gilbert (git)
  2021-04-14 15:51 ` [PATCH v2 01/25] DAX: vhost-user: Rework slave return values Dr. David Alan Gilbert (git)
  2021-04-14 15:51 ` [PATCH v2 02/25] virtiofsd: Don't assume header layout Dr. David Alan Gilbert (git)
@ 2021-04-14 15:51 ` Dr. David Alan Gilbert (git)
  2021-04-14 15:51 ` [PATCH v2 04/25] DAX: libvhost-user: Allow popping a queue element with bad pointers Dr. David Alan Gilbert (git)
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-14 15:51 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, virtio-fs

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Route the uint64 payload from message replies on the slave back up
through vu_process_message_reply and to the callers.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 subprojects/libvhost-user/libvhost-user.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index fab7ca17ee..937f64480d 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -403,9 +403,11 @@ vu_send_reply(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
  * Processes a reply on the slave channel.
  * Entered with slave_mutex held and releases it before exit.
  * Returns true on success.
+ * *payload is written on success
  */
 static bool
-vu_process_message_reply(VuDev *dev, const VhostUserMsg *vmsg)
+vu_process_message_reply(VuDev *dev, const VhostUserMsg *vmsg,
+                         uint64_t *payload)
 {
     VhostUserMsg msg_reply;
     bool result = false;
@@ -425,7 +427,8 @@ vu_process_message_reply(VuDev *dev, const VhostUserMsg *vmsg)
         goto out;
     }
 
-    result = msg_reply.payload.u64 == 0;
+    *payload = msg_reply.payload.u64;
+    result = true;
 
 out:
     pthread_mutex_unlock(&dev->slave_mutex);
@@ -1312,6 +1315,8 @@ bool vu_set_queue_host_notifier(VuDev *dev, VuVirtq *vq, int fd,
 {
     int qidx = vq - dev->vq;
     int fd_num = 0;
+    bool res;
+    uint64_t payload = 0;
     VhostUserMsg vmsg = {
         .request = VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG,
         .flags = VHOST_USER_VERSION | VHOST_USER_NEED_REPLY_MASK,
@@ -1342,7 +1347,10 @@ bool vu_set_queue_host_notifier(VuDev *dev, VuVirtq *vq, int fd,
     }
 
     /* Also unlocks the slave_mutex */
-    return vu_process_message_reply(dev, &vmsg);
+    res = vu_process_message_reply(dev, &vmsg, &payload);
+    res = res && (payload == 0);
+
+    return res;
 }
 
 static bool
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 04/25] DAX: libvhost-user: Allow popping a queue element with bad pointers
  2021-04-14 15:51 [PATCH v2 00/25] virtiofs dax patches Dr. David Alan Gilbert (git)
                   ` (2 preceding siblings ...)
  2021-04-14 15:51 ` [PATCH v2 03/25] DAX: libvhost-user: Route slave message payload Dr. David Alan Gilbert (git)
@ 2021-04-14 15:51 ` Dr. David Alan Gilbert (git)
  2021-04-14 15:51 ` [PATCH v2 05/25] DAX subprojects/libvhost-user: Add virtio-fs slave types Dr. David Alan Gilbert (git)
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-14 15:51 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, virtio-fs

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Allow a daemon implemented with libvhost-user to accept an
element with pointers to memory that aren't in the mapping table.
The daemon might have some special way to deal with some special
cases of this.

The default behaviour doesn't change.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/export/vhost-user-blk-server.c      |  2 +-
 contrib/vhost-user-blk/vhost-user-blk.c   |  3 +-
 contrib/vhost-user-gpu/vhost-user-gpu.c   |  5 ++-
 contrib/vhost-user-input/main.c           |  4 +-
 contrib/vhost-user-scsi/vhost-user-scsi.c |  2 +-
 subprojects/libvhost-user/libvhost-user.c | 51 ++++++++++++++++++-----
 subprojects/libvhost-user/libvhost-user.h |  8 +++-
 tests/vhost-user-bridge.c                 |  4 +-
 tools/virtiofsd/fuse_virtio.c             |  3 +-
 9 files changed, 60 insertions(+), 22 deletions(-)

diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index fa06996d37..84c6432325 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -293,7 +293,7 @@ static void vu_blk_process_vq(VuDev *vu_dev, int idx)
     while (1) {
         VuBlkReq *req;
 
-        req = vu_queue_pop(vu_dev, vq, sizeof(VuBlkReq));
+        req = vu_queue_pop(vu_dev, vq, sizeof(VuBlkReq), NULL, NULL);
         if (!req) {
             break;
         }
diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c
index d14b2896bf..01193552e9 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -235,7 +235,8 @@ static int vub_virtio_process_req(VubDev *vdev_blk,
     unsigned out_num;
     VubReq *req;
 
-    elem = vu_queue_pop(vu_dev, vq, sizeof(VuVirtqElement) + sizeof(VubReq));
+    elem = vu_queue_pop(vu_dev, vq, sizeof(VuVirtqElement) + sizeof(VubReq),
+                        NULL, NULL);
     if (!elem) {
         return -1;
     }
diff --git a/contrib/vhost-user-gpu/vhost-user-gpu.c b/contrib/vhost-user-gpu/vhost-user-gpu.c
index f73f292c9f..827d15af00 100644
--- a/contrib/vhost-user-gpu/vhost-user-gpu.c
+++ b/contrib/vhost-user-gpu/vhost-user-gpu.c
@@ -840,7 +840,8 @@ vg_handle_ctrl(VuDev *dev, int qidx)
             return;
         }
 
-        cmd = vu_queue_pop(dev, vq, sizeof(struct virtio_gpu_ctrl_command));
+        cmd = vu_queue_pop(dev, vq, sizeof(struct virtio_gpu_ctrl_command),
+                           NULL, NULL);
         if (!cmd) {
             break;
         }
@@ -949,7 +950,7 @@ vg_handle_cursor(VuDev *dev, int qidx)
     struct virtio_gpu_update_cursor cursor;
 
     for (;;) {
-        elem = vu_queue_pop(dev, vq, sizeof(VuVirtqElement));
+        elem = vu_queue_pop(dev, vq, sizeof(VuVirtqElement), NULL, NULL);
         if (!elem) {
             break;
         }
diff --git a/contrib/vhost-user-input/main.c b/contrib/vhost-user-input/main.c
index c15d18c33f..d5c435605c 100644
--- a/contrib/vhost-user-input/main.c
+++ b/contrib/vhost-user-input/main.c
@@ -57,7 +57,7 @@ static void vi_input_send(VuInput *vi, struct virtio_input_event *event)
 
     /* ... then check available space ... */
     for (i = 0; i < vi->qindex; i++) {
-        elem = vu_queue_pop(dev, vq, sizeof(VuVirtqElement));
+        elem = vu_queue_pop(dev, vq, sizeof(VuVirtqElement), NULL, NULL);
         if (!elem) {
             while (--i >= 0) {
                 vu_queue_unpop(dev, vq, vi->queue[i].elem, 0);
@@ -141,7 +141,7 @@ static void vi_handle_sts(VuDev *dev, int qidx)
     g_debug("%s", G_STRFUNC);
 
     for (;;) {
-        elem = vu_queue_pop(dev, vq, sizeof(VuVirtqElement));
+        elem = vu_queue_pop(dev, vq, sizeof(VuVirtqElement), NULL, NULL);
         if (!elem) {
             break;
         }
diff --git a/contrib/vhost-user-scsi/vhost-user-scsi.c b/contrib/vhost-user-scsi/vhost-user-scsi.c
index 4f6e3e2a24..7564d6ab2d 100644
--- a/contrib/vhost-user-scsi/vhost-user-scsi.c
+++ b/contrib/vhost-user-scsi/vhost-user-scsi.c
@@ -252,7 +252,7 @@ static void vus_proc_req(VuDev *vu_dev, int idx)
         VirtIOSCSICmdReq *req;
         VirtIOSCSICmdResp *rsp;
 
-        elem = vu_queue_pop(vu_dev, vq, sizeof(VuVirtqElement));
+        elem = vu_queue_pop(vu_dev, vq, sizeof(VuVirtqElement), NULL, NULL);
         if (!elem) {
             g_debug("No more elements pending on vq[%d]@%p", idx, vq);
             break;
diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index 937f64480d..68eb165755 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -2469,7 +2469,8 @@ vu_queue_set_notification(VuDev *dev, VuVirtq *vq, int enable)
 
 static bool
 virtqueue_map_desc(VuDev *dev,
-                   unsigned int *p_num_sg, struct iovec *iov,
+                   unsigned int *p_num_sg, unsigned int *p_bad_sg,
+                   struct iovec *iov,
                    unsigned int max_num_sg, bool is_write,
                    uint64_t pa, size_t sz)
 {
@@ -2490,10 +2491,35 @@ virtqueue_map_desc(VuDev *dev,
             return false;
         }
 
-        iov[num_sg].iov_base = vu_gpa_to_va(dev, &len, pa);
-        if (iov[num_sg].iov_base == NULL) {
-            vu_panic(dev, "virtio: invalid address for buffers");
-            return false;
+        if (p_bad_sg && *p_bad_sg) {
+            /* A previous mapping was bad, we won't try and map this either */
+            *p_bad_sg = *p_bad_sg + 1;
+        }
+        if (!p_bad_sg || !*p_bad_sg) {
+            /* No bad mappings so far, lets try mapping this one */
+            iov[num_sg].iov_base = vu_gpa_to_va(dev, &len, pa);
+            if (iov[num_sg].iov_base == NULL) {
+                /*
+                 * OK, it won't map, either panic or if the caller can handle
+                 * it, then count it.
+                 */
+                if (!p_bad_sg) {
+                    vu_panic(dev, "virtio: invalid address for buffers");
+                    return false;
+                } else {
+                    *p_bad_sg = *p_bad_sg + 1;
+                }
+            }
+        }
+        if (p_bad_sg && *p_bad_sg) {
+            /*
+             * There was a bad mapping, either now or previously, since
+             * the caller set p_bad_sg it means it's prepared to deal with
+             * it, so give it the pa in the iov
+             * Note: In this case len will be the whole sz, so we won't
+             * go around again for this descriptor
+             */
+            iov[num_sg].iov_base = (void *)(uintptr_t)pa;
         }
         iov[num_sg].iov_len = len;
         num_sg++;
@@ -2524,7 +2550,8 @@ virtqueue_alloc_element(size_t sz,
 }
 
 static void *
-vu_queue_map_desc(VuDev *dev, VuVirtq *vq, unsigned int idx, size_t sz)
+vu_queue_map_desc(VuDev *dev, VuVirtq *vq, unsigned int idx, size_t sz,
+                  unsigned int *p_bad_in, unsigned int *p_bad_out)
 {
     struct vring_desc *desc = vq->vring.desc;
     uint64_t desc_addr, read_len;
@@ -2568,7 +2595,7 @@ vu_queue_map_desc(VuDev *dev, VuVirtq *vq, unsigned int idx, size_t sz)
     /* Collect all the descriptors */
     do {
         if (le16toh(desc[i].flags) & VRING_DESC_F_WRITE) {
-            if (!virtqueue_map_desc(dev, &in_num, iov + out_num,
+            if (!virtqueue_map_desc(dev, &in_num, p_bad_in, iov + out_num,
                                VIRTQUEUE_MAX_SIZE - out_num, true,
                                le64toh(desc[i].addr),
                                le32toh(desc[i].len))) {
@@ -2579,7 +2606,7 @@ vu_queue_map_desc(VuDev *dev, VuVirtq *vq, unsigned int idx, size_t sz)
                 vu_panic(dev, "Incorrect order for descriptors");
                 return NULL;
             }
-            if (!virtqueue_map_desc(dev, &out_num, iov,
+            if (!virtqueue_map_desc(dev, &out_num, p_bad_out, iov,
                                VIRTQUEUE_MAX_SIZE, false,
                                le64toh(desc[i].addr),
                                le32toh(desc[i].len))) {
@@ -2669,7 +2696,8 @@ vu_queue_inflight_post_put(VuDev *dev, VuVirtq *vq, int desc_idx)
 }
 
 void *
-vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
+vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz,
+             unsigned int *p_bad_in, unsigned int *p_bad_out)
 {
     int i;
     unsigned int head;
@@ -2682,7 +2710,8 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
 
     if (unlikely(vq->resubmit_list && vq->resubmit_num > 0)) {
         i = (--vq->resubmit_num);
-        elem = vu_queue_map_desc(dev, vq, vq->resubmit_list[i].index, sz);
+        elem = vu_queue_map_desc(dev, vq, vq->resubmit_list[i].index, sz,
+                                 p_bad_in, p_bad_out);
 
         if (!vq->resubmit_num) {
             free(vq->resubmit_list);
@@ -2714,7 +2743,7 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
         vring_set_avail_event(vq, vq->last_avail_idx);
     }
 
-    elem = vu_queue_map_desc(dev, vq, head, sz);
+    elem = vu_queue_map_desc(dev, vq, head, sz, p_bad_in, p_bad_out);
 
     if (!elem) {
         return NULL;
diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
index 3d13dfadde..330b61c005 100644
--- a/subprojects/libvhost-user/libvhost-user.h
+++ b/subprojects/libvhost-user/libvhost-user.h
@@ -589,11 +589,17 @@ void vu_queue_notify_sync(VuDev *dev, VuVirtq *vq);
  * @dev: a VuDev context
  * @vq: a VuVirtq queue
  * @sz: the size of struct to return (must be >= VuVirtqElement)
+ * @p_bad_in: If none NULL, a pointer to an integer count of
+ *            unmappable regions in input descriptors
+ * @p_bad_out: If none NULL, a pointer to an integer count of
+ *            unmappable regions in output descriptors
+ *
  *
  * Returns: a VuVirtqElement filled from the queue or NULL. The
  * returned element must be free()-d by the caller.
  */
-void *vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz);
+void *vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz,
+                   unsigned int *p_bad_in, unsigned int *p_bad_out);
 
 
 /**
diff --git a/tests/vhost-user-bridge.c b/tests/vhost-user-bridge.c
index 24815920b2..4f6829e6c3 100644
--- a/tests/vhost-user-bridge.c
+++ b/tests/vhost-user-bridge.c
@@ -184,7 +184,7 @@ vubr_handle_tx(VuDev *dev, int qidx)
         unsigned int out_num;
         struct iovec sg[VIRTQUEUE_MAX_SIZE], *out_sg;
 
-        elem = vu_queue_pop(dev, vq, sizeof(VuVirtqElement));
+        elem = vu_queue_pop(dev, vq, sizeof(VuVirtqElement), NULL, NULL);
         if (!elem) {
             break;
         }
@@ -299,7 +299,7 @@ vubr_backend_recv_cb(int sock, void *ctx)
         ssize_t ret, total = 0;
         unsigned int num;
 
-        elem = vu_queue_pop(dev, vq, sizeof(VuVirtqElement));
+        elem = vu_queue_pop(dev, vq, sizeof(VuVirtqElement), NULL, NULL);
         if (!elem) {
             break;
         }
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 6dd73c9b72..2604e7f418 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -732,7 +732,8 @@ static void *fv_queue_thread(void *opaque)
                  __func__, qi->qidx, (size_t)evalue, in_bytes, out_bytes);
 
         while (1) {
-            FVRequest *req = vu_queue_pop(dev, q, sizeof(FVRequest));
+            FVRequest *req = vu_queue_pop(dev, q, sizeof(FVRequest),
+                                          NULL, NULL);
             if (!req) {
                 break;
             }
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 05/25] DAX subprojects/libvhost-user: Add virtio-fs slave types
  2021-04-14 15:51 [PATCH v2 00/25] virtiofs dax patches Dr. David Alan Gilbert (git)
                   ` (3 preceding siblings ...)
  2021-04-14 15:51 ` [PATCH v2 04/25] DAX: libvhost-user: Allow popping a queue element with bad pointers Dr. David Alan Gilbert (git)
@ 2021-04-14 15:51 ` Dr. David Alan Gilbert (git)
  2021-04-14 15:51 ` [PATCH v2 06/25] DAX: virtio: Add shared memory capability Dr. David Alan Gilbert (git)
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-14 15:51 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, virtio-fs

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add virtio-fs definitions to libvhost-user

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 subprojects/libvhost-user/libvhost-user.c | 48 +++++++++++++++++++++++
 subprojects/libvhost-user/libvhost-user.h | 40 +++++++++++++++++++
 2 files changed, 88 insertions(+)

diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index 68eb165755..97c909c6a8 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -2918,3 +2918,51 @@ vu_queue_push(VuDev *dev, VuVirtq *vq,
     vu_queue_flush(dev, vq, 1);
     vu_queue_inflight_post_put(dev, vq, elem->index);
 }
+
+int64_t vu_fs_cache_request(VuDev *dev, VhostUserSlaveRequest req, int fd,
+                            VhostUserFSSlaveMsg *fsm)
+{
+    int fd_num = 0;
+    bool res;
+    uint64_t payload = 0;
+    VhostUserMsg vmsg = {
+        .request = req,
+        .flags = VHOST_USER_VERSION | VHOST_USER_NEED_REPLY_MASK,
+        .payload.fs = *fsm,
+    };
+
+    if (fsm->count > VHOST_USER_FS_SLAVE_MAX_ENTRIES) {
+        return -EINVAL;
+    }
+
+    vmsg.size = sizeof(VhostUserFSSlaveMsg) +
+                fsm->count * sizeof(VhostUserFSSlaveMsgEntry);
+    memcpy(&vmsg.payload.fs, fsm, vmsg.size);
+
+    if (fd != -1) {
+        vmsg.fds[fd_num++] = fd;
+    }
+
+    vmsg.fd_num = fd_num;
+
+    if (!vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD)) {
+        return -EINVAL;
+    }
+
+    pthread_mutex_lock(&dev->slave_mutex);
+    if (!vu_message_write(dev, dev->slave_fd, &vmsg)) {
+        pthread_mutex_unlock(&dev->slave_mutex);
+        return -EIO;
+    }
+
+    /* Also unlocks the slave_mutex */
+    res = vu_process_message_reply(dev, &vmsg, &payload);
+    if (!res) {
+        return -EIO;
+    }
+    /*
+     * Payload is delivered as uint64_t but is actually signed for
+     * errors.
+     */
+    return (int64_t)payload;
+}
diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
index 330b61c005..70fc61171f 100644
--- a/subprojects/libvhost-user/libvhost-user.h
+++ b/subprojects/libvhost-user/libvhost-user.h
@@ -122,6 +122,33 @@ typedef enum VhostUserSlaveRequest {
     VHOST_USER_SLAVE_MAX
 }  VhostUserSlaveRequest;
 
+/* Structures carried over the slave channel back to QEMU */
+#define VHOST_USER_FS_SLAVE_MAX_ENTRIES 32
+
+/* For the flags field of VhostUserFSSlaveMsg */
+#define VHOST_USER_FS_FLAG_MAP_R (1u << 0)
+#define VHOST_USER_FS_FLAG_MAP_W (1u << 1)
+
+typedef struct {
+    /* Offsets within the file being mapped */
+    uint64_t fd_offset;
+    /* Offsets within the cache */
+    uint64_t c_offset;
+    /* Lengths of sections */
+    uint64_t len;
+    /* Flags, from VHOST_USER_FS_FLAG_* */
+    uint64_t flags;
+} VhostUserFSSlaveMsgEntry;
+
+typedef struct {
+    /* Number of entries */
+    uint16_t count;
+    /* Spare */
+    uint16_t align;
+
+    VhostUserFSSlaveMsgEntry entries[];
+} VhostUserFSSlaveMsg;
+
 typedef struct VhostUserMemoryRegion {
     uint64_t guest_phys_addr;
     uint64_t memory_size;
@@ -197,6 +224,7 @@ typedef struct VhostUserMsg {
         VhostUserConfig config;
         VhostUserVringArea area;
         VhostUserInflight inflight;
+        VhostUserFSSlaveMsg fs;
     } payload;
 
     int fds[VHOST_MEMORY_BASELINE_NREGIONS];
@@ -693,4 +721,16 @@ void vu_queue_get_avail_bytes(VuDev *vdev, VuVirtq *vq, unsigned int *in_bytes,
 bool vu_queue_avail_bytes(VuDev *dev, VuVirtq *vq, unsigned int in_bytes,
                           unsigned int out_bytes);
 
+/**
+ * vu_fs_cache_request: Send a slave message for an fs client
+ * @dev: a VuDev context
+ * @req: The request type (map, unmap, sync)
+ * @fd: an fd (only required for map, else must be -1)
+ * @fsm: The body of the message
+ *
+ * Returns: 0 or above for success, nevative errno on error
+ */
+int64_t vu_fs_cache_request(VuDev *dev, VhostUserSlaveRequest req, int fd,
+                            VhostUserFSSlaveMsg *fsm);
+
 #endif /* LIBVHOST_USER_H */
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 06/25] DAX: virtio: Add shared memory capability
  2021-04-14 15:51 [PATCH v2 00/25] virtiofs dax patches Dr. David Alan Gilbert (git)
                   ` (4 preceding siblings ...)
  2021-04-14 15:51 ` [PATCH v2 05/25] DAX subprojects/libvhost-user: Add virtio-fs slave types Dr. David Alan Gilbert (git)
@ 2021-04-14 15:51 ` Dr. David Alan Gilbert (git)
  2021-04-14 15:51 ` [PATCH v2 07/25] DAX: virtio-fs: Add cache BAR Dr. David Alan Gilbert (git)
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-14 15:51 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, virtio-fs

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Define a new capability type 'VIRTIO_PCI_CAP_SHARED_MEMORY_CFG'
and the data structure 'virtio_pci_cap64' to go with it.
They allow defining shared memory regions with sizes and offsets
of 2^32 and more.
Multiple instances of the capability are allowed and distinguished
by the 'id' field in the base capability.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/virtio/virtio-pci.c | 20 ++++++++++++++++++++
 hw/virtio/virtio-pci.h |  4 ++++
 2 files changed, 24 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index b321604d9b..493014fdf7 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1138,6 +1138,26 @@ static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
     return offset;
 }
 
+int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy,
+                           uint8_t bar, uint64_t offset, uint64_t length,
+                           uint8_t id)
+{
+    struct virtio_pci_cap64 cap = {
+        .cap.cap_len = sizeof cap,
+        .cap.cfg_type = VIRTIO_PCI_CAP_SHARED_MEMORY_CFG,
+    };
+    uint32_t mask32 = ~0;
+
+    cap.cap.bar = bar;
+    cap.cap.id = id;
+    cap.cap.length = cpu_to_le32(length & mask32);
+    cap.length_hi = cpu_to_le32((length >> 32) & mask32);
+    cap.cap.offset = cpu_to_le32(offset & mask32);
+    cap.offset_hi = cpu_to_le32((offset >> 32) & mask32);
+
+    return virtio_pci_add_mem_cap(proxy, &cap.cap);
+}
+
 static uint64_t virtio_pci_common_read(void *opaque, hwaddr addr,
                                        unsigned size)
 {
diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index 2446dcd9ae..5e5c4a4c6d 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -252,4 +252,8 @@ void virtio_pci_types_register(const VirtioPCIDeviceTypeInfo *t);
  */
 unsigned virtio_pci_optimal_num_queues(unsigned fixed_queues);
 
+int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy,
+                           uint8_t bar, uint64_t offset, uint64_t length,
+                           uint8_t id);
+
 #endif
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 07/25] DAX: virtio-fs: Add cache BAR
  2021-04-14 15:51 [PATCH v2 00/25] virtiofs dax patches Dr. David Alan Gilbert (git)
                   ` (5 preceding siblings ...)
  2021-04-14 15:51 ` [PATCH v2 06/25] DAX: virtio: Add shared memory capability Dr. David Alan Gilbert (git)
@ 2021-04-14 15:51 ` Dr. David Alan Gilbert (git)
  2021-04-14 15:51 ` [PATCH v2 08/25] DAX: virtio-fs: Add vhost-user slave commands for mapping Dr. David Alan Gilbert (git)
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-14 15:51 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, virtio-fs

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add a cache BAR into which files will be directly mapped.
The size can be set with the cache-size= property, e.g.
   -device vhost-user-fs-pci,chardev=char0,tag=myfs,cache-size=16G

The default is no cache.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
with PPC fixes by:
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
---
 hw/virtio/vhost-user-fs-pci.c     | 32 +++++++++++++++++++++++++++++++
 hw/virtio/vhost-user-fs.c         | 32 +++++++++++++++++++++++++++++++
 include/hw/virtio/vhost-user-fs.h |  2 ++
 3 files changed, 66 insertions(+)

diff --git a/hw/virtio/vhost-user-fs-pci.c b/hw/virtio/vhost-user-fs-pci.c
index 2ed8492b3f..20e447631f 100644
--- a/hw/virtio/vhost-user-fs-pci.c
+++ b/hw/virtio/vhost-user-fs-pci.c
@@ -12,14 +12,19 @@
  */
 
 #include "qemu/osdep.h"
+#include "qapi/error.h"
 #include "hw/qdev-properties.h"
 #include "hw/virtio/vhost-user-fs.h"
 #include "virtio-pci.h"
 #include "qom/object.h"
+#include "standard-headers/linux/virtio_fs.h"
+
+#define VIRTIO_FS_PCI_CACHE_BAR 2
 
 struct VHostUserFSPCI {
     VirtIOPCIProxy parent_obj;
     VHostUserFS vdev;
+    MemoryRegion cachebar;
 };
 
 typedef struct VHostUserFSPCI VHostUserFSPCI;
@@ -38,7 +43,9 @@ static Property vhost_user_fs_pci_properties[] = {
 static void vhost_user_fs_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
 {
     VHostUserFSPCI *dev = VHOST_USER_FS_PCI(vpci_dev);
+    bool modern_pio = vpci_dev->flags & VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY;
     DeviceState *vdev = DEVICE(&dev->vdev);
+    uint64_t cachesize;
 
     if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
         /* Also reserve config change and hiprio queue vectors */
@@ -46,6 +53,31 @@ static void vhost_user_fs_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
     }
 
     qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
+    cachesize = dev->vdev.conf.cache_size;
+
+    if (cachesize && modern_pio) {
+        error_setg(errp, "DAX Cache can not be used together with modern_pio");
+        return;
+    }
+
+    /*
+     * The bar starts with the data/DAX cache
+     * Others will be added later.
+     */
+    memory_region_init(&dev->cachebar, OBJECT(vpci_dev),
+                       "vhost-user-fs-pci-cachebar", cachesize);
+    if (cachesize) {
+        memory_region_add_subregion(&dev->cachebar, 0, &dev->vdev.cache);
+        virtio_pci_add_shm_cap(vpci_dev, VIRTIO_FS_PCI_CACHE_BAR, 0, cachesize,
+                               VIRTIO_FS_SHMCAP_ID_CACHE);
+
+        /* After 'realized' so the memory region exists */
+        pci_register_bar(&vpci_dev->pci_dev, VIRTIO_FS_PCI_CACHE_BAR,
+                         PCI_BASE_ADDRESS_SPACE_MEMORY |
+                         PCI_BASE_ADDRESS_MEM_PREFETCH |
+                         PCI_BASE_ADDRESS_MEM_TYPE_64,
+                         &dev->cachebar);
+    }
 }
 
 static void vhost_user_fs_pci_class_init(ObjectClass *klass, void *data)
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 6f7f91533d..dd0a02aa99 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -35,6 +35,16 @@ static const int user_feature_bits[] = {
     VHOST_INVALID_FEATURE_BIT
 };
 
+/*
+ * The powerpc kernel code expects the memory to be accessible during
+ * addition/removal.
+ */
+#if defined(TARGET_PPC64) && defined(CONFIG_LINUX)
+#define DAX_WINDOW_PROT PROT_READ
+#else
+#define DAX_WINDOW_PROT PROT_NONE
+#endif
+
 static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
 {
     VHostUserFS *fs = VHOST_USER_FS(vdev);
@@ -175,6 +185,7 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VHostUserFS *fs = VHOST_USER_FS(dev);
+    void *cache_ptr;
     unsigned int i;
     size_t len;
     int ret;
@@ -214,6 +225,26 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
                    VIRTQUEUE_MAX_SIZE);
         return;
     }
+    if (fs->conf.cache_size &&
+        (!is_power_of_2(fs->conf.cache_size) ||
+          fs->conf.cache_size < qemu_real_host_page_size)) {
+        error_setg(errp, "cache-size property must be a power of 2 "
+                         "no smaller than the page size");
+        return;
+    }
+    if (fs->conf.cache_size) {
+        /* Anonymous, private memory is not counted as overcommit */
+        cache_ptr = mmap(NULL, fs->conf.cache_size, DAX_WINDOW_PROT,
+                         MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+        if (cache_ptr == MAP_FAILED) {
+            error_setg(errp, "Unable to mmap blank cache");
+            return;
+        }
+
+        memory_region_init_ram_ptr(&fs->cache, OBJECT(vdev),
+                                   "virtio-fs-cache",
+                                   fs->conf.cache_size, cache_ptr);
+    }
 
     if (!vhost_user_init(&fs->vhost_user, &fs->conf.chardev, errp)) {
         return;
@@ -289,6 +320,7 @@ static Property vuf_properties[] = {
     DEFINE_PROP_UINT16("num-request-queues", VHostUserFS,
                        conf.num_request_queues, 1),
     DEFINE_PROP_UINT16("queue-size", VHostUserFS, conf.queue_size, 128),
+    DEFINE_PROP_SIZE("cache-size", VHostUserFS, conf.cache_size, 0),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
index 0d62834c25..04596799e3 100644
--- a/include/hw/virtio/vhost-user-fs.h
+++ b/include/hw/virtio/vhost-user-fs.h
@@ -28,6 +28,7 @@ typedef struct {
     char *tag;
     uint16_t num_request_queues;
     uint16_t queue_size;
+    uint64_t cache_size;
 } VHostUserFSConf;
 
 struct VHostUserFS {
@@ -42,6 +43,7 @@ struct VHostUserFS {
     int32_t bootindex;
 
     /*< public >*/
+    MemoryRegion cache;
 };
 
 #endif /* _QEMU_VHOST_USER_FS_H */
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 08/25] DAX: virtio-fs: Add vhost-user slave commands for mapping
  2021-04-14 15:51 [PATCH v2 00/25] virtiofs dax patches Dr. David Alan Gilbert (git)
                   ` (6 preceding siblings ...)
  2021-04-14 15:51 ` [PATCH v2 07/25] DAX: virtio-fs: Add cache BAR Dr. David Alan Gilbert (git)
@ 2021-04-14 15:51 ` Dr. David Alan Gilbert (git)
  2021-04-14 16:35   ` [Virtio-fs] " Greg Kurz
  2021-04-14 15:51 ` [PATCH v2 09/25] DAX: virtio-fs: Fill in " Dr. David Alan Gilbert (git)
                   ` (16 subsequent siblings)
  24 siblings, 1 reply; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-14 15:51 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, virtio-fs

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The daemon may request that fd's be mapped into the virtio-fs cache
visible to the guest.
These mappings are triggered by commands sent over the slave fd
from the daemon.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 docs/interop/vhost-user.rst               | 21 ++++++++
 hw/virtio/vhost-user-fs.c                 | 66 +++++++++++++++++++++++
 hw/virtio/vhost-user.c                    | 25 +++++++++
 include/hw/virtio/vhost-user-fs.h         | 33 ++++++++++++
 subprojects/libvhost-user/libvhost-user.h |  2 +
 5 files changed, 147 insertions(+)

diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index d6085f7045..09aee3565d 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -1432,6 +1432,27 @@ Slave message types
 
   The state.num field is currently reserved and must be set to 0.
 
+``VHOST_USER_SLAVE_FS_MAP``
+  :id: 6
+  :equivalent ioctl: N/A
+  :slave payload: ``struct VhostUserFSSlaveMsg``
+  :master payload: N/A
+
+  Requests that an fd, provided in the ancillary data, be mmapped
+  into the virtio-fs cache; multiple chunks can be mapped in one
+  command.
+  A reply is generated indicating whether mapping succeeded.
+
+``VHOST_USER_SLAVE_FS_UNMAP``
+  :id: 7
+  :equivalent ioctl: N/A
+  :slave payload: ``struct VhostUserFSSlaveMsg``
+  :master payload: N/A
+
+  Requests that the range in the virtio-fs cache is unmapped;
+  multiple chunks can be unmapped in one command.
+  A reply is generated indicating whether unmapping succeeded.
+
 .. _reply_ack:
 
 VHOST_USER_PROTOCOL_F_REPLY_ACK
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index dd0a02aa99..169a146e72 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -45,6 +45,72 @@ static const int user_feature_bits[] = {
 #define DAX_WINDOW_PROT PROT_NONE
 #endif
 
+/*
+ * The message apparently had 'received_size' bytes, check this
+ * matches the count in the message.
+ *
+ * Returns true if the size matches.
+ */
+static bool check_slave_message_entries(const VhostUserFSSlaveMsg *sm,
+                                        int received_size)
+{
+    int tmp;
+
+    /*
+     * VhostUserFSSlaveMsg consists of a body followed by 'n' entries,
+     * (each VhostUserFSSlaveMsgEntry).  There's a maximum of
+     * VHOST_USER_FS_SLAVE_MAX_ENTRIES of these.
+     */
+    if (received_size <= sizeof(VhostUserFSSlaveMsg)) {
+        error_report("%s: Short VhostUserFSSlaveMsg size, %d", __func__,
+                     received_size);
+        return false;
+    }
+
+    tmp = received_size - sizeof(VhostUserFSSlaveMsg);
+    if (tmp % sizeof(VhostUserFSSlaveMsgEntry)) {
+        error_report("%s: Non-multiple VhostUserFSSlaveMsg size, %d", __func__,
+                     received_size);
+        return false;
+    }
+
+    tmp /= sizeof(VhostUserFSSlaveMsgEntry);
+    if (tmp != sm->count) {
+        error_report("%s: VhostUserFSSlaveMsg count mismatch, %d count: %d",
+                     __func__, tmp, sm->count);
+        return false;
+    }
+
+    if (sm->count > VHOST_USER_FS_SLAVE_MAX_ENTRIES) {
+        error_report("%s: VhostUserFSSlaveMsg too many entries: %d",
+                     __func__, sm->count);
+        return false;
+    }
+    return true;
+}
+
+uint64_t vhost_user_fs_slave_map(struct vhost_dev *dev, int message_size,
+                                 VhostUserFSSlaveMsg *sm, int fd)
+{
+    if (!check_slave_message_entries(sm, message_size)) {
+        return (uint64_t)-1;
+    }
+
+    /* TODO */
+    return (uint64_t)-1;
+}
+
+uint64_t vhost_user_fs_slave_unmap(struct vhost_dev *dev, int message_size,
+                                   VhostUserFSSlaveMsg *sm)
+{
+    if (!check_slave_message_entries(sm, message_size)) {
+        return (uint64_t)-1;
+    }
+
+    /* TODO */
+    return (uint64_t)-1;
+}
+
 static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
 {
     VHostUserFS *fs = VHOST_USER_FS(vdev);
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 3e4a25e108..ad9170f8dc 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -12,6 +12,7 @@
 #include "qapi/error.h"
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-user.h"
+#include "hw/virtio/vhost-user-fs.h"
 #include "hw/virtio/vhost-backend.h"
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/virtio-net.h"
@@ -133,6 +134,10 @@ typedef enum VhostUserSlaveRequest {
     VHOST_USER_SLAVE_IOTLB_MSG = 1,
     VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
     VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
+    VHOST_USER_SLAVE_VRING_CALL = 4,
+    VHOST_USER_SLAVE_VRING_ERR = 5,
+    VHOST_USER_SLAVE_FS_MAP = 6,
+    VHOST_USER_SLAVE_FS_UNMAP = 7,
     VHOST_USER_SLAVE_MAX
 }  VhostUserSlaveRequest;
 
@@ -205,6 +210,16 @@ typedef struct {
     uint32_t size; /* the following payload size */
 } QEMU_PACKED VhostUserHeader;
 
+/*
+ * VhostUserFSSlaveMsg is special since it has a variable entry count,
+ * but it does have a maximum, so make a type for that to fit in our union
+ * for max size.
+ */
+typedef struct {
+    VhostUserFSSlaveMsg fs;
+    VhostUserFSSlaveMsgEntry entries[VHOST_USER_FS_SLAVE_MAX_ENTRIES];
+} QEMU_PACKED VhostUserFSSlaveMsgMax;
+
 typedef union {
 #define VHOST_USER_VRING_IDX_MASK   (0xff)
 #define VHOST_USER_VRING_NOFD_MASK  (0x1<<8)
@@ -219,6 +234,8 @@ typedef union {
         VhostUserCryptoSession session;
         VhostUserVringArea area;
         VhostUserInflight inflight;
+        VhostUserFSSlaveMsg fs;
+        VhostUserFSSlaveMsg fs_max; /* Never actually used */
 } VhostUserPayload;
 
 typedef struct VhostUserMsg {
@@ -1538,6 +1555,14 @@ static gboolean slave_read(QIOChannel *ioc, GIOCondition condition,
         ret = vhost_user_slave_handle_vring_host_notifier(dev, &payload.area,
                                                           fd ? fd[0] : -1);
         break;
+#ifdef CONFIG_VHOST_USER_FS
+    case VHOST_USER_SLAVE_FS_MAP:
+        ret = vhost_user_fs_slave_map(dev, hdr.size, &payload.fs, fd[0]);
+        break;
+    case VHOST_USER_SLAVE_FS_UNMAP:
+        ret = vhost_user_fs_slave_unmap(dev, hdr.size, &payload.fs);
+        break;
+#endif
     default:
         error_report("Received unexpected msg type: %d.", hdr.request);
         ret = true;
diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
index 04596799e3..0766f17548 100644
--- a/include/hw/virtio/vhost-user-fs.h
+++ b/include/hw/virtio/vhost-user-fs.h
@@ -23,6 +23,33 @@
 #define TYPE_VHOST_USER_FS "vhost-user-fs-device"
 OBJECT_DECLARE_SIMPLE_TYPE(VHostUserFS, VHOST_USER_FS)
 
+/* Structures carried over the slave channel back to QEMU */
+#define VHOST_USER_FS_SLAVE_MAX_ENTRIES 32
+
+/* For the flags field of VhostUserFSSlaveMsg */
+#define VHOST_USER_FS_FLAG_MAP_R (1u << 0)
+#define VHOST_USER_FS_FLAG_MAP_W (1u << 1)
+
+typedef struct {
+    /* Offsets within the file being mapped */
+    uint64_t fd_offset;
+    /* Offsets within the cache */
+    uint64_t c_offset;
+    /* Lengths of sections */
+    uint64_t len;
+    /* Flags, from VHOST_USER_FS_FLAG_* */
+    uint64_t flags;
+} VhostUserFSSlaveMsgEntry;
+
+typedef struct {
+    /* Number of entries */
+    uint16_t count;
+    /* Spare */
+    uint16_t align;
+
+    VhostUserFSSlaveMsgEntry entries[];
+} VhostUserFSSlaveMsg;
+
 typedef struct {
     CharBackend chardev;
     char *tag;
@@ -46,4 +73,10 @@ struct VHostUserFS {
     MemoryRegion cache;
 };
 
+/* Callbacks from the vhost-user code for slave commands */
+uint64_t vhost_user_fs_slave_map(struct vhost_dev *dev, int message_size,
+                                 VhostUserFSSlaveMsg *sm, int fd);
+uint64_t vhost_user_fs_slave_unmap(struct vhost_dev *dev, int message_size,
+                                   VhostUserFSSlaveMsg *sm);
+
 #endif /* _QEMU_VHOST_USER_FS_H */
diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
index 70fc61171f..a98c5f5c11 100644
--- a/subprojects/libvhost-user/libvhost-user.h
+++ b/subprojects/libvhost-user/libvhost-user.h
@@ -119,6 +119,8 @@ typedef enum VhostUserSlaveRequest {
     VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
     VHOST_USER_SLAVE_VRING_CALL = 4,
     VHOST_USER_SLAVE_VRING_ERR = 5,
+    VHOST_USER_SLAVE_FS_MAP = 6,
+    VHOST_USER_SLAVE_FS_UNMAP = 7,
     VHOST_USER_SLAVE_MAX
 }  VhostUserSlaveRequest;
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 09/25] DAX: virtio-fs: Fill in slave commands for mapping
  2021-04-14 15:51 [PATCH v2 00/25] virtiofs dax patches Dr. David Alan Gilbert (git)
                   ` (7 preceding siblings ...)
  2021-04-14 15:51 ` [PATCH v2 08/25] DAX: virtio-fs: Add vhost-user slave commands for mapping Dr. David Alan Gilbert (git)
@ 2021-04-14 15:51 ` Dr. David Alan Gilbert (git)
  2021-04-14 15:51 ` [PATCH v2 10/25] DAX: virtiofsd Add cache accessor functions Dr. David Alan Gilbert (git)
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-14 15:51 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, virtio-fs

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Fill in definitions for map, unmap and sync commands.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
with fix by misono.tomohiro@fujitsu.com
---
 hw/virtio/vhost-user-fs.c | 117 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 113 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 169a146e72..963f694435 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -92,23 +92,132 @@ static bool check_slave_message_entries(const VhostUserFSSlaveMsg *sm,
 uint64_t vhost_user_fs_slave_map(struct vhost_dev *dev, int message_size,
                                  VhostUserFSSlaveMsg *sm, int fd)
 {
+    VHostUserFS *fs = (VHostUserFS *)object_dynamic_cast(OBJECT(dev->vdev),
+                          TYPE_VHOST_USER_FS);
+    if (!fs) {
+        error_report("%s: Bad fs ptr", __func__);
+        return (uint64_t)-1;
+    }
     if (!check_slave_message_entries(sm, message_size)) {
         return (uint64_t)-1;
     }
 
-    /* TODO */
-    return (uint64_t)-1;
+    size_t cache_size = fs->conf.cache_size;
+    if (!cache_size) {
+        error_report("map called when DAX cache not present");
+        return (uint64_t)-1;
+    }
+    void *cache_host = memory_region_get_ram_ptr(&fs->cache);
+
+    unsigned int i;
+    int res = 0;
+
+    if (fd < 0) {
+        error_report("Bad fd for map");
+        return (uint64_t)-1;
+    }
+
+    for (i = 0; i < sm->count; i++) {
+        VhostUserFSSlaveMsgEntry *e = &sm->entries[i];
+        if (e->len == 0) {
+            continue;
+        }
+
+        if ((e->c_offset + e->len) < e->len ||
+            (e->c_offset + e->len) > cache_size) {
+            error_report("Bad offset/len for map [%d] %" PRIx64 "+%" PRIx64,
+                         i, e->c_offset, e->len);
+            res = -1;
+            break;
+        }
+
+        if (mmap(cache_host + e->c_offset, e->len,
+                 ((e->flags & VHOST_USER_FS_FLAG_MAP_R) ? PROT_READ : 0) |
+                 ((e->flags & VHOST_USER_FS_FLAG_MAP_W) ? PROT_WRITE : 0),
+                 MAP_SHARED | MAP_FIXED,
+                 fd, e->fd_offset) != (cache_host + e->c_offset)) {
+            res = -errno;
+            error_report("map failed err %d [%d] %" PRIx64 "+%" PRIx64 " from %"
+                         PRIx64, errno, i, e->c_offset, e->len,
+                         e->fd_offset);
+            break;
+        }
+    }
+
+    if (res) {
+        /* Something went wrong, unmap them all */
+        vhost_user_fs_slave_unmap(dev, message_size, sm);
+    }
+    return (uint64_t)res;
 }
 
 uint64_t vhost_user_fs_slave_unmap(struct vhost_dev *dev, int message_size,
                                    VhostUserFSSlaveMsg *sm)
 {
+    VHostUserFS *fs = (VHostUserFS *)object_dynamic_cast(OBJECT(dev->vdev),
+                          TYPE_VHOST_USER_FS);
+    if (!fs) {
+        error_report("%s: Bad fs ptr", __func__);
+        return (uint64_t)-1;
+    }
     if (!check_slave_message_entries(sm, message_size)) {
         return (uint64_t)-1;
     }
 
-    /* TODO */
-    return (uint64_t)-1;
+    size_t cache_size = fs->conf.cache_size;
+    if (!cache_size) {
+        /*
+         * Since dax cache is disabled, there should be no unmap request.
+         * Howerver we still receives whole range unmap request during umount
+         * for cleanup. Ignore it.
+         */
+        if (sm->entries[0].len == ~(uint64_t)0) {
+            return 0;
+        }
+
+        error_report("unmap called when DAX cache not present");
+        return (uint64_t)-1;
+    }
+    void *cache_host = memory_region_get_ram_ptr(&fs->cache);
+
+    unsigned int i;
+    int res = 0;
+
+    /*
+     * Note even if one unmap fails we try the rest, since the effect
+     * is to clean up as much as possible.
+     */
+    for (i = 0; i < sm->count; i++) {
+        VhostUserFSSlaveMsgEntry *e = &sm->entries[i];
+        void *ptr;
+        if (e->len == 0) {
+            continue;
+        }
+
+        if (e->len == ~(uint64_t)0) {
+            /* Special case meaning the whole arena */
+            e->len = cache_size;
+        }
+
+        if ((e->c_offset + e->len) < e->len ||
+            (e->c_offset + e->len) > cache_size) {
+            error_report("Bad offset/len for unmap [%d] %" PRIx64 "+%" PRIx64,
+                         i, e->c_offset, e->len);
+            res = -1;
+            continue;
+        }
+
+        ptr = mmap(cache_host + e->c_offset, e->len, DAX_WINDOW_PROT,
+                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+        if (ptr != (cache_host + e->c_offset)) {
+            res = -errno;
+            error_report("mmap failed (%s) [%d] %" PRIx64 "+%" PRIx64 " from %"
+                         PRIx64 " res: %p", strerror(errno), i, e->c_offset,
+                         e->len, e->fd_offset, ptr);
+        }
+    }
+
+    return (uint64_t)res;
 }
 
 static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 10/25] DAX: virtiofsd Add cache accessor functions
  2021-04-14 15:51 [PATCH v2 00/25] virtiofs dax patches Dr. David Alan Gilbert (git)
                   ` (8 preceding siblings ...)
  2021-04-14 15:51 ` [PATCH v2 09/25] DAX: virtio-fs: Fill in " Dr. David Alan Gilbert (git)
@ 2021-04-14 15:51 ` Dr. David Alan Gilbert (git)
  2021-04-14 15:51 ` [PATCH v2 11/25] DAX: virtiofsd: Add setup/remove mappings fuse commands Dr. David Alan Gilbert (git)
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-14 15:51 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, virtio-fs

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add low level functions that the clients can use to map/unmap cache
areas.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/fuse_lowlevel.h | 21 +++++++++++++++++++++
 tools/virtiofsd/fuse_virtio.c   | 18 ++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
index 3bf786b034..3383e3a8a0 100644
--- a/tools/virtiofsd/fuse_lowlevel.h
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -29,6 +29,8 @@
 #include <sys/uio.h>
 #include <utime.h>
 
+#include "subprojects/libvhost-user/libvhost-user.h"
+
 /*
  * Miscellaneous definitions
  */
@@ -1971,4 +1973,23 @@ void fuse_session_process_buf(struct fuse_session *se,
  */
 int fuse_session_receive_buf(struct fuse_session *se, struct fuse_buf *buf);
 
+/**
+ * For use with virtio-fs; request an fd be mapped into the cache
+ *
+ * @param req The request that triggered this action
+ * @param msg A set of mapping requests
+ * @param fd The fd to map
+ * @return Zero on success
+ */
+int64_t fuse_virtio_map(fuse_req_t req, VhostUserFSSlaveMsg *msg, int fd);
+
+/**
+ * For use with virtio-fs; request unmapping of part of the cache
+ *
+ * @param se The session this request is on
+ * @param msg A set of unmapping requests
+ * @return Zero on success
+ */
+int64_t fuse_virtio_unmap(struct fuse_session *se, VhostUserFSSlaveMsg *msg);
+
 #endif /* FUSE_LOWLEVEL_H_ */
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 2604e7f418..85d90ca595 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -1123,3 +1123,21 @@ void virtio_session_close(struct fuse_session *se)
     free(se->virtio_dev);
     se->virtio_dev = NULL;
 }
+
+int64_t fuse_virtio_map(fuse_req_t req, VhostUserFSSlaveMsg *msg, int fd)
+{
+    if (!req->se->virtio_dev) {
+        return -ENODEV;
+    }
+    return vu_fs_cache_request(&req->se->virtio_dev->dev,
+                               VHOST_USER_SLAVE_FS_MAP, fd, msg);
+}
+
+int64_t fuse_virtio_unmap(struct fuse_session *se, VhostUserFSSlaveMsg *msg)
+{
+    if (!se->virtio_dev) {
+        return -ENODEV;
+    }
+    return vu_fs_cache_request(&se->virtio_dev->dev, VHOST_USER_SLAVE_FS_UNMAP,
+                               -1, msg);
+}
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 11/25] DAX: virtiofsd: Add setup/remove mappings fuse commands
  2021-04-14 15:51 [PATCH v2 00/25] virtiofs dax patches Dr. David Alan Gilbert (git)
                   ` (9 preceding siblings ...)
  2021-04-14 15:51 ` [PATCH v2 10/25] DAX: virtiofsd Add cache accessor functions Dr. David Alan Gilbert (git)
@ 2021-04-14 15:51 ` Dr. David Alan Gilbert (git)
  2021-04-14 15:51 ` [PATCH v2 12/25] DAX: virtiofsd: Add setup/remove mapping handlers to passthrough_ll Dr. David Alan Gilbert (git)
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-14 15:51 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, virtio-fs

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add commands so that the guest kernel can ask the daemon to map file
sections into a guest kernel visible cache.

Note: Catherine Ho had sent a patch to fix an issue with multiple
removemapping. It was a merge issue though.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Peng Tao <tao.peng@linux.alibaba.com>
Including-fixes: Catherine Ho <catherine.hecx@gmail.com>
Signed-off-by: Catherine Ho <catherine.hecx@gmail.com>
---
 tools/virtiofsd/fuse_lowlevel.c | 69 +++++++++++++++++++++++++++++++++
 tools/virtiofsd/fuse_lowlevel.h | 23 ++++++++++-
 2 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 58e32fc963..1a00307879 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -1870,6 +1870,73 @@ static void do_lseek(fuse_req_t req, fuse_ino_t nodeid,
     }
 }
 
+static void do_setupmapping(fuse_req_t req, fuse_ino_t nodeid,
+                            struct fuse_mbuf_iter *iter)
+{
+    struct fuse_setupmapping_in *arg;
+    struct fuse_file_info fi;
+
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
+    memset(&fi, 0, sizeof(fi));
+    fi.fh = arg->fh;
+
+    /*
+     *  TODO: Need to come up with a better definition of flags here; it can't
+     * be the kernel view of the flags, since that's abstracted from the client
+     * similarly, it's not the vhost-user set
+     * for now just use O_ flags
+     */
+    uint64_t genflags;
+
+    genflags = O_RDONLY;
+    if (arg->flags & FUSE_SETUPMAPPING_FLAG_WRITE) {
+        genflags = O_RDWR;
+    }
+
+    if (req->se->op.setupmapping) {
+        req->se->op.setupmapping(req, nodeid, arg->foffset, arg->len,
+                                 arg->moffset, genflags, &fi);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
+}
+
+static void do_removemapping(fuse_req_t req, fuse_ino_t nodeid,
+                             struct fuse_mbuf_iter *iter)
+{
+    struct fuse_removemapping_in *arg;
+    struct fuse_removemapping_one *one;
+
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg || !arg->count ||
+        (uint64_t)arg->count * sizeof(*one) >= SIZE_MAX) {
+        fuse_log(FUSE_LOG_ERR, "do_removemapping: invalid arg %p\n", arg);
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
+    one = fuse_mbuf_iter_advance(iter, arg->count * sizeof(*one));
+    if (!one) {
+        fuse_log(
+            FUSE_LOG_ERR,
+            "do_removemapping: invalid in, expected %d * %ld, has %ld - %ld\n",
+            arg->count, sizeof(*one), iter->size, iter->pos);
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
+    if (req->se->op.removemapping) {
+        req->se->op.removemapping(req, req->se, nodeid, arg->count, one);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
+}
+
 static void do_init(fuse_req_t req, fuse_ino_t nodeid,
                     struct fuse_mbuf_iter *iter)
 {
@@ -2267,6 +2334,8 @@ static struct {
     [FUSE_RENAME2] = { do_rename2, "RENAME2" },
     [FUSE_COPY_FILE_RANGE] = { do_copy_file_range, "COPY_FILE_RANGE" },
     [FUSE_LSEEK] = { do_lseek, "LSEEK" },
+    [FUSE_SETUPMAPPING] = { do_setupmapping, "SETUPMAPPING" },
+    [FUSE_REMOVEMAPPING] = { do_removemapping, "REMOVEMAPPING" },
 };
 
 #define FUSE_MAXOP (sizeof(fuse_ll_ops) / sizeof(fuse_ll_ops[0]))
diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
index 3383e3a8a0..0bf206264d 100644
--- a/tools/virtiofsd/fuse_lowlevel.h
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -24,6 +24,7 @@
 #endif
 
 #include "fuse_common.h"
+#include "standard-headers/linux/fuse.h"
 
 #include <sys/statvfs.h>
 #include <sys/uio.h>
@@ -1171,7 +1172,6 @@ struct fuse_lowlevel_ops {
      */
     void (*readdirplus)(fuse_req_t req, fuse_ino_t ino, size_t size, off_t off,
                         struct fuse_file_info *fi);
-
     /**
      * Copy a range of data from one file to another
      *
@@ -1227,6 +1227,27 @@ struct fuse_lowlevel_ops {
      */
     void (*lseek)(fuse_req_t req, fuse_ino_t ino, off_t off, int whence,
                   struct fuse_file_info *fi);
+
+    /*
+     * Map file sections into kernel visible cache
+     *
+     * Map a section of the file into address space visible to the kernel
+     * mounting the filesystem.
+     * TODO
+     */
+    void (*setupmapping)(fuse_req_t req, fuse_ino_t ino, uint64_t foffset,
+                         uint64_t len, uint64_t moffset, uint64_t flags,
+                         struct fuse_file_info *fi);
+
+    /*
+     * Unmap file sections in kernel visible cache
+     *
+     * Unmap sections previously mapped by setupmapping
+     * TODO
+     */
+    void (*removemapping)(fuse_req_t req, struct fuse_session *se,
+                          fuse_ino_t ino, unsigned num,
+                          struct fuse_removemapping_one *argp);
 };
 
 /**
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 12/25] DAX: virtiofsd: Add setup/remove mapping handlers to passthrough_ll
  2021-04-14 15:51 [PATCH v2 00/25] virtiofs dax patches Dr. David Alan Gilbert (git)
                   ` (10 preceding siblings ...)
  2021-04-14 15:51 ` [PATCH v2 11/25] DAX: virtiofsd: Add setup/remove mappings fuse commands Dr. David Alan Gilbert (git)
@ 2021-04-14 15:51 ` Dr. David Alan Gilbert (git)
  2021-04-14 15:51 ` [PATCH v2 13/25] DAX: virtiofsd: Wire up passthrough_ll's lo_setupmapping Dr. David Alan Gilbert (git)
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-14 15:51 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, virtio-fs

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 1553d2ef45..afa86650c1 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -3144,6 +3144,22 @@ static void lo_destroy(void *userdata)
     pthread_mutex_unlock(&lo->mutex);
 }
 
+static void lo_setupmapping(fuse_req_t req, fuse_ino_t ino, uint64_t foffset,
+                            uint64_t len, uint64_t moffset, uint64_t flags,
+                            struct fuse_file_info *fi)
+{
+    /* TODO */
+    fuse_reply_err(req, ENOSYS);
+}
+
+static void lo_removemapping(fuse_req_t req, struct fuse_session *se,
+                             fuse_ino_t ino, unsigned num,
+                             struct fuse_removemapping_one *argp)
+{
+    /* TODO */
+    fuse_reply_err(req, ENOSYS);
+}
+
 static struct fuse_lowlevel_ops lo_oper = {
     .init = lo_init,
     .lookup = lo_lookup,
@@ -3185,6 +3201,8 @@ static struct fuse_lowlevel_ops lo_oper = {
 #endif
     .lseek = lo_lseek,
     .destroy = lo_destroy,
+    .setupmapping = lo_setupmapping,
+    .removemapping = lo_removemapping,
 };
 
 /* Print vhost-user.json backend program capabilities */
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 13/25] DAX: virtiofsd: Wire up passthrough_ll's lo_setupmapping
  2021-04-14 15:51 [PATCH v2 00/25] virtiofs dax patches Dr. David Alan Gilbert (git)
                   ` (11 preceding siblings ...)
  2021-04-14 15:51 ` [PATCH v2 12/25] DAX: virtiofsd: Add setup/remove mapping handlers to passthrough_ll Dr. David Alan Gilbert (git)
@ 2021-04-14 15:51 ` Dr. David Alan Gilbert (git)
  2021-04-14 15:51 ` [PATCH v2 14/25] DAX: virtiofsd: Make lo_removemapping() work Dr. David Alan Gilbert (git)
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-14 15:51 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, virtio-fs

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Wire up passthrough_ll's setupmapping to allocate, send to virtio
and then reply OK.

Guest might not pass file pointer. In that case using inode info, open
the file again, mmap() and close fd.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
With fix from:
Signed-off-by: Fotis Xenakis <foxen@windowslive.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/fuse_lowlevel.c  | 13 ++++++--
 tools/virtiofsd/passthrough_ll.c | 57 ++++++++++++++++++++++++++++++--
 2 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 1a00307879..81c4c2b373 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -1899,8 +1899,17 @@ static void do_setupmapping(fuse_req_t req, fuse_ino_t nodeid,
     }
 
     if (req->se->op.setupmapping) {
-        req->se->op.setupmapping(req, nodeid, arg->foffset, arg->len,
-                                 arg->moffset, genflags, &fi);
+        /*
+         * TODO: Add a flag to request which tells if arg->fh is
+         * valid or not.
+         */
+        if (fi.fh == (uint64_t)-1) {
+            req->se->op.setupmapping(req, nodeid, arg->foffset, arg->len,
+                                     arg->moffset, genflags, NULL);
+        } else {
+            req->se->op.setupmapping(req, nodeid, arg->foffset, arg->len,
+                                     arg->moffset, genflags, &fi);
+        }
     } else {
         fuse_reply_err(req, ENOSYS);
     }
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index afa86650c1..bf7a7f3c23 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -3148,8 +3148,61 @@ static void lo_setupmapping(fuse_req_t req, fuse_ino_t ino, uint64_t foffset,
                             uint64_t len, uint64_t moffset, uint64_t flags,
                             struct fuse_file_info *fi)
 {
-    /* TODO */
-    fuse_reply_err(req, ENOSYS);
+    struct lo_data *lo = lo_data(req);
+    int ret = 0, fd;
+    VhostUserFSSlaveMsg *msg = g_malloc0(sizeof(VhostUserFSSlaveMsg) +
+                                         sizeof(VhostUserFSSlaveMsgEntry));
+    uint64_t vhu_flags;
+    char *buf;
+    bool writable = flags & O_RDWR;
+
+    fuse_log(FUSE_LOG_DEBUG,
+             "lo_setupmapping(ino=%" PRIu64 ", fi=0x%p,"
+             " foffset=%" PRIu64 ", len=%" PRIu64 ", moffset=%" PRIu64
+             ", flags=%" PRIu64 ")\n",
+             ino, (void *)fi, foffset, len, moffset, flags);
+
+    vhu_flags = VHOST_USER_FS_FLAG_MAP_R;
+    if (writable) {
+        vhu_flags |= VHOST_USER_FS_FLAG_MAP_W;
+    }
+
+    msg->count = 1;
+    msg->entries[0].fd_offset = foffset;
+    msg->entries[0].len = len;
+    msg->entries[0].c_offset = moffset;
+    msg->entries[0].flags = vhu_flags;
+
+    if (fi) {
+        fd = lo_fi_fd(req, fi);
+    } else {
+        ret = asprintf(&buf, "%i", lo_fd(req, ino));
+        if (ret == -1) {
+            g_free(msg);
+            return (void)fuse_reply_err(req, errno);
+        }
+
+        fd = openat(lo->proc_self_fd, buf, flags);
+        free(buf);
+        if (fd == -1) {
+            g_free(msg);
+            return (void)fuse_reply_err(req, errno);
+        }
+    }
+
+    ret = fuse_virtio_map(req, msg, fd);
+    if (ret < 0) {
+        fuse_log(FUSE_LOG_ERR,
+                 "%s: map over virtio failed (ino=%" PRId64
+                 "fd=%d moffset=0x%" PRIx64 "). err = %d\n",
+                 __func__, ino, fd, moffset, ret);
+    }
+
+    if (!fi) {
+        close(fd);
+    }
+    fuse_reply_err(req, -ret);
+    g_free(msg);
 }
 
 static void lo_removemapping(fuse_req_t req, struct fuse_session *se,
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 14/25] DAX: virtiofsd: Make lo_removemapping() work
  2021-04-14 15:51 [PATCH v2 00/25] virtiofs dax patches Dr. David Alan Gilbert (git)
                   ` (12 preceding siblings ...)
  2021-04-14 15:51 ` [PATCH v2 13/25] DAX: virtiofsd: Wire up passthrough_ll's lo_setupmapping Dr. David Alan Gilbert (git)
@ 2021-04-14 15:51 ` Dr. David Alan Gilbert (git)
  2021-04-14 15:51 ` [PATCH v2 15/25] DAX: virtiofsd: route se down to destroy method Dr. David Alan Gilbert (git)
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-14 15:51 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, virtio-fs

From: Vivek Goyal <vgoyal@redhat.com>

Let guest pass in the offset in dax window a mapping is currently
mapped at and needs to be removed.

Vivek added the initial support to remove single mapping and later Peng
added patch to support removing multiple mappings in single command.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Peng Tao <tao.peng@linux.alibaba.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index bf7a7f3c23..a9f0505414 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -3209,8 +3209,36 @@ static void lo_removemapping(fuse_req_t req, struct fuse_session *se,
                              fuse_ino_t ino, unsigned num,
                              struct fuse_removemapping_one *argp)
 {
-    /* TODO */
-    fuse_reply_err(req, ENOSYS);
+    VhostUserFSSlaveMsg *msg;
+    size_t alloc_count = (num > VHOST_USER_FS_SLAVE_MAX_ENTRIES) ?
+                              VHOST_USER_FS_SLAVE_MAX_ENTRIES : num;
+    int ret = 0;
+    msg = g_malloc0(sizeof(VhostUserFSSlaveMsg) +
+                    alloc_count * sizeof(VhostUserFSSlaveMsgEntry));
+
+    for (int i = 0, o = 0; num > 0; i++, argp++) {
+        VhostUserFSSlaveMsgEntry *e = &msg->entries[o];
+
+        e->len = argp->len;
+        e->c_offset = argp->moffset;
+
+        o++;
+        if (--num == 0 || o == VHOST_USER_FS_SLAVE_MAX_ENTRIES) {
+            msg->count = o;
+            ret = fuse_virtio_unmap(se, msg);
+            if (ret < 0) {
+                fuse_log(FUSE_LOG_ERR,
+                         "%s: unmap over virtio failed "
+                         "(offset=0x%lx, len=0x%lx). err=%d\n",
+                         __func__, argp->moffset, argp->len, ret);
+                break;
+            }
+            o = 0;
+        }
+    }
+
+    fuse_reply_err(req, -ret);
+    g_free(msg);
 }
 
 static struct fuse_lowlevel_ops lo_oper = {
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 15/25] DAX: virtiofsd: route se down to destroy method
  2021-04-14 15:51 [PATCH v2 00/25] virtiofs dax patches Dr. David Alan Gilbert (git)
                   ` (13 preceding siblings ...)
  2021-04-14 15:51 ` [PATCH v2 14/25] DAX: virtiofsd: Make lo_removemapping() work Dr. David Alan Gilbert (git)
@ 2021-04-14 15:51 ` Dr. David Alan Gilbert (git)
  2021-04-14 15:51 ` [PATCH v2 16/25] DAX: virtiofsd: Perform an unmap on destroy Dr. David Alan Gilbert (git)
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-14 15:51 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, virtio-fs

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

We're going to need to pass the session down to destroy so that it can
pass it back to do the remove mapping.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/fuse_lowlevel.c  | 6 +++---
 tools/virtiofsd/fuse_lowlevel.h  | 2 +-
 tools/virtiofsd/passthrough_ll.c | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 81c4c2b373..e82edce72a 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -2222,7 +2222,7 @@ static void do_destroy(fuse_req_t req, fuse_ino_t nodeid,
     se->got_destroy = 1;
     se->got_init = 0;
     if (se->op.destroy) {
-        se->op.destroy(se->userdata);
+        se->op.destroy(se->userdata, se);
     }
 
     send_reply_ok(req, NULL, 0);
@@ -2449,7 +2449,7 @@ void fuse_session_process_buf_int(struct fuse_session *se,
             se->got_destroy = 1;
             se->got_init = 0;
             if (se->op.destroy) {
-                se->op.destroy(se->userdata);
+                se->op.destroy(se->userdata, se);
             }
         } else {
             goto reply_err;
@@ -2538,7 +2538,7 @@ void fuse_session_destroy(struct fuse_session *se)
 {
     if (se->got_init && !se->got_destroy) {
         if (se->op.destroy) {
-            se->op.destroy(se->userdata);
+            se->op.destroy(se->userdata, se);
         }
     }
     pthread_rwlock_destroy(&se->init_rwlock);
diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
index 0bf206264d..27b07bfc22 100644
--- a/tools/virtiofsd/fuse_lowlevel.h
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -209,7 +209,7 @@ struct fuse_lowlevel_ops {
      *
      * @param userdata the user data passed to fuse_session_new()
      */
-    void (*destroy)(void *userdata);
+    void (*destroy)(void *userdata, struct fuse_session *se);
 
     /**
      * Look up a directory entry by name and get its attributes.
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index a9f0505414..910c4831fb 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -3124,7 +3124,7 @@ static void lo_lseek(fuse_req_t req, fuse_ino_t ino, off_t off, int whence,
     }
 }
 
-static void lo_destroy(void *userdata)
+static void lo_destroy(void *userdata, struct fuse_session *se)
 {
     struct lo_data *lo = (struct lo_data *)userdata;
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 16/25] DAX: virtiofsd: Perform an unmap on destroy
  2021-04-14 15:51 [PATCH v2 00/25] virtiofs dax patches Dr. David Alan Gilbert (git)
                   ` (14 preceding siblings ...)
  2021-04-14 15:51 ` [PATCH v2 15/25] DAX: virtiofsd: route se down to destroy method Dr. David Alan Gilbert (git)
@ 2021-04-14 15:51 ` Dr. David Alan Gilbert (git)
  2021-04-14 15:51 ` [PATCH v2 17/25] DAX/unmap: virtiofsd: Add VHOST_USER_SLAVE_FS_IO Dr. David Alan Gilbert (git)
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-14 15:51 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, virtio-fs

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Force unmap all remaining dax cache entries on a destroy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 910c4831fb..726343677e 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -3128,6 +3128,20 @@ static void lo_destroy(void *userdata, struct fuse_session *se)
 {
     struct lo_data *lo = (struct lo_data *)userdata;
 
+    if (fuse_lowlevel_is_virtio(se)) {
+        VhostUserFSSlaveMsg *msg = g_malloc0(sizeof(VhostUserFSSlaveMsg) +
+                                             sizeof(VhostUserFSSlaveMsgEntry));
+
+        msg->count = 0;
+        msg->entries[0].len = ~(uint64_t)0; /* Special: means 'all' */
+        msg->entries[0].c_offset = 0;
+        if (fuse_virtio_unmap(se, msg)) {
+            fuse_log(FUSE_LOG_ERR, "%s: unmap during destroy failed\n",
+                     __func__);
+        }
+        g_free(msg);
+    }
+
     pthread_mutex_lock(&lo->mutex);
     while (true) {
         GHashTableIter iter;
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 17/25] DAX/unmap: virtiofsd: Add VHOST_USER_SLAVE_FS_IO
  2021-04-14 15:51 [PATCH v2 00/25] virtiofs dax patches Dr. David Alan Gilbert (git)
                   ` (15 preceding siblings ...)
  2021-04-14 15:51 ` [PATCH v2 16/25] DAX: virtiofsd: Perform an unmap on destroy Dr. David Alan Gilbert (git)
@ 2021-04-14 15:51 ` Dr. David Alan Gilbert (git)
  2021-04-21 20:07   ` [Virtio-fs] " Vivek Goyal
  2021-04-14 15:51 ` [PATCH v2 18/25] DAX/unmap virtiofsd: Add wrappers for VHOST_USER_SLAVE_FS_IO Dr. David Alan Gilbert (git)
                   ` (7 subsequent siblings)
  24 siblings, 1 reply; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-14 15:51 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, virtio-fs

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Define a new slave command 'VHOST_USER_SLAVE_FS_IO' for a
client to ask qemu to perform a read/write from an fd directly
to GPA.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 docs/interop/vhost-user.rst               | 16 ++++
 hw/virtio/trace-events                    |  6 ++
 hw/virtio/vhost-user-fs.c                 | 95 +++++++++++++++++++++++
 hw/virtio/vhost-user.c                    |  4 +
 include/hw/virtio/vhost-user-fs.h         |  2 +
 subprojects/libvhost-user/libvhost-user.h |  1 +
 6 files changed, 124 insertions(+)

diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index 09aee3565d..2fa62ea451 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -1453,6 +1453,22 @@ Slave message types
   multiple chunks can be unmapped in one command.
   A reply is generated indicating whether unmapping succeeded.
 
+``VHOST_USER_SLAVE_FS_IO``
+  :id: 9
+  :equivalent ioctl: N/A
+  :slave payload: ``struct VhostUserFSSlaveMsg``
+  :master payload: N/A
+
+  Requests that IO be performed directly from an fd, passed in ancillary
+  data, to guest memory on behalf of the daemon; this is normally for a
+  case where a memory region isn't visible to the daemon. slave payload
+  has flags which determine the direction of IO operation.
+
+  The ``VHOST_USER_FS_FLAG_MAP_R`` flag must be set in the ``flags`` field to
+  read from the file into RAM.
+  The ``VHOST_USER_FS_FLAG_MAP_W`` flag must be set in the ``flags`` field to
+  write to the file from RAM.
+
 .. _reply_ack:
 
 VHOST_USER_PROTOCOL_F_REPLY_ACK
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index c62727f879..20557a078e 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -53,6 +53,12 @@ vhost_vdpa_get_features(void *dev, uint64_t features) "dev: %p features: 0x%"PRI
 vhost_vdpa_set_owner(void *dev) "dev: %p"
 vhost_vdpa_vq_get_addr(void *dev, void *vq, uint64_t desc_user_addr, uint64_t avail_user_addr, uint64_t used_user_addr) "dev: %p vq: %p desc_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64
 
+# vhost-user-fs.c
+
+vhost_user_fs_slave_io_loop(const char *name, uint64_t owr, int is_ram, int is_romd, size_t size) "region %s with internal offset 0x%"PRIx64 " ram=%d romd=%d mrs.size=%zd"
+vhost_user_fs_slave_io_loop_res(ssize_t transferred) "%zd"
+vhost_user_fs_slave_io_exit(int res, size_t done) "res: %d done: %zd"
+
 # virtio.c
 virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
 virtqueue_fill(void *vq, const void *elem, unsigned int len, unsigned int idx) "vq %p elem %p len %u idx %u"
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 963f694435..5511838f29 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -23,6 +23,8 @@
 #include "hw/virtio/vhost-user-fs.h"
 #include "monitor/monitor.h"
 #include "sysemu/sysemu.h"
+#include "exec/address-spaces.h"
+#include "trace.h"
 
 static const int user_feature_bits[] = {
     VIRTIO_F_VERSION_1,
@@ -220,6 +222,99 @@ uint64_t vhost_user_fs_slave_unmap(struct vhost_dev *dev, int message_size,
     return (uint64_t)res;
 }
 
+uint64_t vhost_user_fs_slave_io(struct vhost_dev *dev, int message_size,
+                                VhostUserFSSlaveMsg *sm, int fd)
+{
+    VHostUserFS *fs = (VHostUserFS *)object_dynamic_cast(OBJECT(dev->vdev),
+                          TYPE_VHOST_USER_FS);
+    if (!fs) {
+        error_report("%s: Bad fs ptr", __func__);
+        return (uint64_t)-1;
+    }
+    if (!check_slave_message_entries(sm, message_size)) {
+        return (uint64_t)-1;
+    }
+
+    unsigned int i;
+    int res = 0;
+    size_t done = 0;
+
+    if (fd < 0) {
+        error_report("Bad fd for map");
+        return (uint64_t)-1;
+    }
+
+    for (i = 0; i < sm->count && !res; i++) {
+        VhostUserFSSlaveMsgEntry *e = &sm->entries[i];
+        if (e->len == 0) {
+            continue;
+        }
+
+        size_t len = e->len;
+        uint64_t fd_offset = e->fd_offset;
+        hwaddr gpa = e->c_offset;
+
+        while (len && !res) {
+            hwaddr xlat, xlat_len;
+            bool is_write = e->flags & VHOST_USER_FS_FLAG_MAP_W;
+            MemoryRegion *mr = address_space_translate(dev->vdev->dma_as, gpa,
+                                                       &xlat, &xlat_len,
+                                                       is_write,
+                                                       MEMTXATTRS_UNSPECIFIED);
+            if (!mr || !xlat_len) {
+                error_report("No guest region found for 0x%" HWADDR_PRIx, gpa);
+                res = -EFAULT;
+                break;
+            }
+
+            trace_vhost_user_fs_slave_io_loop(mr->name,
+                                          (uint64_t)xlat,
+                                          memory_region_is_ram(mr),
+                                          memory_region_is_romd(mr),
+                                          (size_t)xlat_len);
+
+            void *hostptr = qemu_map_ram_ptr(mr->ram_block,
+                                             xlat);
+            ssize_t transferred;
+            if (e->flags & VHOST_USER_FS_FLAG_MAP_R) {
+                /* Read from file into RAM */
+                if (mr->readonly) {
+                    res = -EFAULT;
+                    break;
+                }
+                transferred = pread(fd, hostptr, xlat_len, fd_offset);
+            } else if (e->flags & VHOST_USER_FS_FLAG_MAP_W) {
+                /* Write into file from RAM */
+                transferred = pwrite(fd, hostptr, xlat_len, fd_offset);
+            } else {
+                transferred = EINVAL;
+            }
+
+            trace_vhost_user_fs_slave_io_loop_res(transferred);
+            if (transferred < 0) {
+                res = -errno;
+                break;
+            }
+            if (!transferred) {
+                /* EOF */
+                break;
+            }
+
+            done += transferred;
+            fd_offset += transferred;
+            gpa += transferred;
+            len -= transferred;
+        }
+    }
+    close(fd);
+
+    trace_vhost_user_fs_slave_io_exit(res, done);
+    if (res < 0) {
+        return (uint64_t)res;
+    }
+    return (uint64_t)done;
+}
+
 static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
 {
     VHostUserFS *fs = VHOST_USER_FS(vdev);
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index ad9170f8dc..b9699586ae 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -138,6 +138,7 @@ typedef enum VhostUserSlaveRequest {
     VHOST_USER_SLAVE_VRING_ERR = 5,
     VHOST_USER_SLAVE_FS_MAP = 6,
     VHOST_USER_SLAVE_FS_UNMAP = 7,
+    VHOST_USER_SLAVE_FS_IO = 8,
     VHOST_USER_SLAVE_MAX
 }  VhostUserSlaveRequest;
 
@@ -1562,6 +1563,9 @@ static gboolean slave_read(QIOChannel *ioc, GIOCondition condition,
     case VHOST_USER_SLAVE_FS_UNMAP:
         ret = vhost_user_fs_slave_unmap(dev, hdr.size, &payload.fs);
         break;
+    case VHOST_USER_SLAVE_FS_IO:
+        ret = vhost_user_fs_slave_io(dev, hdr.size, &payload.fs, fd[0]);
+        break;
 #endif
     default:
         error_report("Received unexpected msg type: %d.", hdr.request);
diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
index 0766f17548..2931164e23 100644
--- a/include/hw/virtio/vhost-user-fs.h
+++ b/include/hw/virtio/vhost-user-fs.h
@@ -78,5 +78,7 @@ uint64_t vhost_user_fs_slave_map(struct vhost_dev *dev, int message_size,
                                  VhostUserFSSlaveMsg *sm, int fd);
 uint64_t vhost_user_fs_slave_unmap(struct vhost_dev *dev, int message_size,
                                    VhostUserFSSlaveMsg *sm);
+uint64_t vhost_user_fs_slave_io(struct vhost_dev *dev, int message_size,
+                                VhostUserFSSlaveMsg *sm, int fd);
 
 #endif /* _QEMU_VHOST_USER_FS_H */
diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
index a98c5f5c11..42b0833c4b 100644
--- a/subprojects/libvhost-user/libvhost-user.h
+++ b/subprojects/libvhost-user/libvhost-user.h
@@ -121,6 +121,7 @@ typedef enum VhostUserSlaveRequest {
     VHOST_USER_SLAVE_VRING_ERR = 5,
     VHOST_USER_SLAVE_FS_MAP = 6,
     VHOST_USER_SLAVE_FS_UNMAP = 7,
+    VHOST_USER_SLAVE_FS_IO = 8,
     VHOST_USER_SLAVE_MAX
 }  VhostUserSlaveRequest;
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 18/25] DAX/unmap virtiofsd: Add wrappers for VHOST_USER_SLAVE_FS_IO
  2021-04-14 15:51 [PATCH v2 00/25] virtiofs dax patches Dr. David Alan Gilbert (git)
                   ` (16 preceding siblings ...)
  2021-04-14 15:51 ` [PATCH v2 17/25] DAX/unmap: virtiofsd: Add VHOST_USER_SLAVE_FS_IO Dr. David Alan Gilbert (git)
@ 2021-04-14 15:51 ` Dr. David Alan Gilbert (git)
  2021-04-14 15:51 ` [PATCH v2 19/25] DAX/unmap virtiofsd: Parse unmappable elements Dr. David Alan Gilbert (git)
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-14 15:51 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, virtio-fs

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add a wrapper to send VHOST_USER_SLAVE_FS_IO commands and a
further wrapper for sending a fuse_buf write using the FS_IO
slave command.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/fuse_lowlevel.h | 25 +++++++++++++++++++
 tools/virtiofsd/fuse_virtio.c   | 43 +++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+)

diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
index 27b07bfc22..757cdae49b 100644
--- a/tools/virtiofsd/fuse_lowlevel.h
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -2013,4 +2013,29 @@ int64_t fuse_virtio_map(fuse_req_t req, VhostUserFSSlaveMsg *msg, int fd);
  */
 int64_t fuse_virtio_unmap(struct fuse_session *se, VhostUserFSSlaveMsg *msg);
 
+/**
+ * For use with virtio-fs; request IO directly to memory
+ *
+ * @param se The current session
+ * @param msg A set of IO requests
+ * @param fd The fd to map
+ * @return Length on success, negative errno on error
+ */
+int64_t fuse_virtio_io(struct fuse_session *se, VhostUserFSSlaveMsg *msg,
+                       int fd);
+
+/**
+ * For use with virtio-fs; wrapper for fuse_virtio_io for writes
+ * from memory to an fd
+ * @param req The request that triggered this action
+ * @param dst The destination (file) memory buffer
+ * @param dst_off Byte offset in the file
+ * @param src The source (memory) buffer
+ * @param src_off The GPA
+ * @param len Length in bytes
+ */
+ssize_t fuse_virtio_write(fuse_req_t req, const struct fuse_buf *dst,
+                          size_t dst_off, const struct fuse_buf *src,
+                          size_t src_off, size_t len);
+
 #endif /* FUSE_LOWLEVEL_H_ */
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 85d90ca595..402e94dde6 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -1141,3 +1141,46 @@ int64_t fuse_virtio_unmap(struct fuse_session *se, VhostUserFSSlaveMsg *msg)
     return vu_fs_cache_request(&se->virtio_dev->dev, VHOST_USER_SLAVE_FS_UNMAP,
                                -1, msg);
 }
+
+int64_t fuse_virtio_io(struct fuse_session *se, VhostUserFSSlaveMsg *msg,
+                       int fd)
+{
+    if (!se->virtio_dev) {
+        return -ENODEV;
+    }
+    return vu_fs_cache_request(&se->virtio_dev->dev, VHOST_USER_SLAVE_FS_IO,
+                               fd, msg);
+}
+
+/*
+ * Write to a file (dst) from an area of guest GPA (src) that probably
+ * isn't visible to the daemon.
+ */
+ssize_t fuse_virtio_write(fuse_req_t req, const struct fuse_buf *dst,
+                          size_t dst_off, const struct fuse_buf *src,
+                          size_t src_off, size_t len)
+{
+    VhostUserFSSlaveMsg *msg = g_malloc0(sizeof(VhostUserFSSlaveMsg) +
+                                         sizeof(VhostUserFSSlaveMsgEntry));
+
+    msg->count = 1;
+
+    if (dst->flags & FUSE_BUF_FD_SEEK) {
+        msg->entries[0].fd_offset = dst->pos + dst_off;
+    } else {
+        off_t cur = lseek(dst->fd, 0, SEEK_CUR);
+        if (cur == (off_t)-1) {
+            g_free(msg);
+            return -errno;
+        }
+        msg->entries[0].fd_offset = cur;
+    }
+    msg->entries[0].c_offset = (uintptr_t)src->mem + src_off;
+    msg->entries[0].len = len;
+    msg->entries[0].flags = VHOST_USER_FS_FLAG_MAP_W;
+
+    int64_t result = fuse_virtio_io(req->se, msg, dst->fd);
+    fuse_log(FUSE_LOG_DEBUG, "%s: result=%ld\n", __func__, result);
+    g_free(msg);
+    return result;
+}
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 19/25] DAX/unmap virtiofsd: Parse unmappable elements
  2021-04-14 15:51 [PATCH v2 00/25] virtiofs dax patches Dr. David Alan Gilbert (git)
                   ` (17 preceding siblings ...)
  2021-04-14 15:51 ` [PATCH v2 18/25] DAX/unmap virtiofsd: Add wrappers for VHOST_USER_SLAVE_FS_IO Dr. David Alan Gilbert (git)
@ 2021-04-14 15:51 ` Dr. David Alan Gilbert (git)
  2021-04-14 15:51 ` [PATCH v2 20/25] DAX/unmap virtiofsd: Route unmappable reads Dr. David Alan Gilbert (git)
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-14 15:51 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, virtio-fs

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

For some read/writes the virtio queue elements are unmappable by
the daemon; these are cases where the data is to be read/written
from non-RAM.  In viritofs's case this is typically a direct read/write
into an mmap'd DAX file also on virtiofs (possibly on another instance).

When we receive a virtio queue element, check that we have enough
mappable data to handle the headers.  Make a note of the number of
unmappable 'in' entries (ie. for read data back to the VMM),
and flag the fuse_bufvec for 'out' entries with a new flag
FUSE_BUF_PHYS_ADDR.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
with fix by:
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
---
 tools/virtiofsd/buffer.c      |   4 +-
 tools/virtiofsd/fuse_common.h |   7 ++
 tools/virtiofsd/fuse_virtio.c | 230 ++++++++++++++++++++++++----------
 3 files changed, 173 insertions(+), 68 deletions(-)

diff --git a/tools/virtiofsd/buffer.c b/tools/virtiofsd/buffer.c
index 874f01c488..1a050aa441 100644
--- a/tools/virtiofsd/buffer.c
+++ b/tools/virtiofsd/buffer.c
@@ -77,6 +77,7 @@ static ssize_t fuse_buf_write(const struct fuse_buf *dst, size_t dst_off,
     ssize_t res = 0;
     size_t copied = 0;
 
+    assert(!(src->flags & FUSE_BUF_PHYS_ADDR));
     while (len) {
         if (dst->flags & FUSE_BUF_FD_SEEK) {
             res = pwrite(dst->fd, (char *)src->mem + src_off, len,
@@ -272,7 +273,8 @@ ssize_t fuse_buf_copy(struct fuse_bufvec *dstv, struct fuse_bufvec *srcv)
      * process
      */
     for (i = 0; i < srcv->count; i++) {
-        if (srcv->buf[i].flags & FUSE_BUF_IS_FD) {
+        if ((srcv->buf[i].flags & FUSE_BUF_PHYS_ADDR) ||
+            (srcv->buf[i].flags & FUSE_BUF_IS_FD)) {
             break;
         }
     }
diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
index fa9671872e..af43cf19f9 100644
--- a/tools/virtiofsd/fuse_common.h
+++ b/tools/virtiofsd/fuse_common.h
@@ -626,6 +626,13 @@ enum fuse_buf_flags {
      * detected.
      */
     FUSE_BUF_FD_RETRY = (1 << 3),
+
+    /**
+     * The addresses in the iovec represent guest physical addresses
+     * that can't be mapped by the daemon process.
+     * IO must be bounced back to the VMM to do it.
+     */
+    FUSE_BUF_PHYS_ADDR = (1 << 4),
 };
 
 /**
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 402e94dde6..5ed78bd8cf 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -49,6 +49,10 @@ typedef struct {
     VuVirtqElement elem;
     struct fuse_chan ch;
 
+    /* Number of unmappable iovecs */
+    unsigned bad_in_num;
+    unsigned bad_out_num;
+
     /* Used to complete requests that involve no reply */
     bool reply_sent;
 } FVRequest;
@@ -353,8 +357,10 @@ int virtio_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
 
     /* The 'in' part of the elem is to qemu */
     unsigned int in_num = elem->in_num;
+    unsigned int bad_in_num = req->bad_in_num;
     struct iovec *in_sg = elem->in_sg;
     size_t in_len = iov_size(in_sg, in_num);
+    size_t in_len_writeable = iov_size(in_sg, in_num - bad_in_num);
     fuse_log(FUSE_LOG_DEBUG, "%s: elem %d: with %d in desc of length %zd\n",
              __func__, elem->index, in_num, in_len);
 
@@ -362,7 +368,7 @@ int virtio_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
      * The elem should have room for a 'fuse_out_header' (out from fuse)
      * plus the data based on the len in the header.
      */
-    if (in_len < sizeof(struct fuse_out_header)) {
+    if (in_len_writeable < sizeof(struct fuse_out_header)) {
         fuse_log(FUSE_LOG_ERR, "%s: elem %d too short for out_header\n",
                  __func__, elem->index);
         ret = E2BIG;
@@ -389,7 +395,7 @@ int virtio_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
     memcpy(in_sg_cpy, in_sg, sizeof(struct iovec) * in_num);
     /* These get updated as we skip */
     struct iovec *in_sg_ptr = in_sg_cpy;
-    int in_sg_cpy_count = in_num;
+    int in_sg_cpy_count = in_num - bad_in_num;
 
     /* skip over parts of in_sg that contained the header iov */
     size_t skip_size = iov_len;
@@ -523,17 +529,21 @@ static void fv_queue_worker(gpointer data, gpointer user_data)
 
     /* The 'out' part of the elem is from qemu */
     unsigned int out_num = elem->out_num;
+    unsigned int out_num_readable = out_num - req->bad_out_num;
     struct iovec *out_sg = elem->out_sg;
     size_t out_len = iov_size(out_sg, out_num);
+    size_t out_len_readable = iov_size(out_sg, out_num_readable);
     fuse_log(FUSE_LOG_DEBUG,
-             "%s: elem %d: with %d out desc of length %zd\n",
-             __func__, elem->index, out_num, out_len);
+             "%s: elem %d: with %d out desc of length %zd"
+             " bad_in_num=%u bad_out_num=%u\n",
+             __func__, elem->index, out_num, out_len, req->bad_in_num,
+             req->bad_out_num);
 
     /*
      * The elem should contain a 'fuse_in_header' (in to fuse)
      * plus the data based on the len in the header.
      */
-    if (out_len < sizeof(struct fuse_in_header)) {
+    if (out_len_readable < sizeof(struct fuse_in_header)) {
         fuse_log(FUSE_LOG_ERR, "%s: elem %d too short for in_header\n",
                  __func__, elem->index);
         assert(0); /* TODO */
@@ -544,80 +554,163 @@ static void fv_queue_worker(gpointer data, gpointer user_data)
         assert(0); /* TODO */
     }
     /* Copy just the fuse_in_header and look at it */
-    copy_from_iov(&fbuf, out_num, out_sg,
+    copy_from_iov(&fbuf, out_num_readable, out_sg,
                   sizeof(struct fuse_in_header));
     memcpy(&inh, fbuf.mem, sizeof(struct fuse_in_header));
 
     pbufv = NULL; /* Compiler thinks an unitialised path */
-    if (inh.opcode == FUSE_WRITE &&
-        out_len >= (sizeof(struct fuse_in_header) +
-                    sizeof(struct fuse_write_in))) {
-        /*
-         * For a write we don't actually need to copy the
-         * data, we can just do it straight out of guest memory
-         * but we must still copy the headers in case the guest
-         * was nasty and changed them while we were using them.
-         */
-        fuse_log(FUSE_LOG_DEBUG, "%s: Write special case\n", __func__);
-
-        fbuf.size = copy_from_iov(&fbuf, out_num, out_sg,
-                                  sizeof(struct fuse_in_header) +
-                                  sizeof(struct fuse_write_in));
-        /* That copy reread the in_header, make sure we use the original */
-        memcpy(fbuf.mem, &inh, sizeof(struct fuse_in_header));
-
-        /* Allocate the bufv, with space for the rest of the iov */
-        pbufv = malloc(sizeof(struct fuse_bufvec) +
-                       sizeof(struct fuse_buf) * out_num);
-        if (!pbufv) {
-            fuse_log(FUSE_LOG_ERR, "%s: pbufv malloc failed\n",
-                    __func__);
-            goto out;
-        }
+    if (req->bad_in_num || req->bad_out_num) {
+        bool handled_unmappable = false;
+
+        if (!req->bad_in_num &&
+            inh.opcode == FUSE_WRITE &&
+            out_len_readable >= (sizeof(struct fuse_in_header) +
+                                 sizeof(struct fuse_write_in))) {
+            handled_unmappable = true;
+
+            /* copy the fuse_write_in header after fuse_in_header */
+            fbuf.size = copy_from_iov(&fbuf, out_num_readable, out_sg,
+                                      sizeof(struct fuse_in_header) +
+                                      sizeof(struct fuse_write_in));
+            /* That copy reread the in_header, make sure we use the original */
+            memcpy(fbuf.mem, &inh, sizeof(struct fuse_in_header));
+
+            /* Allocate the bufv, with space for the rest of the iov */
+            pbufv = malloc(sizeof(struct fuse_bufvec) +
+                           sizeof(struct fuse_buf) * out_num);
+            if (!pbufv) {
+                fuse_log(FUSE_LOG_ERR, "%s: pbufv malloc failed\n",
+                        __func__);
+                goto out;
+            }
 
-        allocated_bufv = true;
-        pbufv->count = 1;
-        pbufv->buf[0] = fbuf;
+            allocated_bufv = true;
+            pbufv->count = 1;
+            pbufv->buf[0] = fbuf;
 
-        size_t iovindex, pbufvindex, iov_bytes_skip;
-        pbufvindex = 1; /* 2 headers, 1 fusebuf */
+            size_t iovindex, pbufvindex, iov_bytes_skip;
+            pbufvindex = 1; /* 2 headers, 1 fusebuf */
 
-        if (!skip_iov(out_sg, out_num,
-                      sizeof(struct fuse_in_header) +
-                      sizeof(struct fuse_write_in),
-                      &iovindex, &iov_bytes_skip)) {
-            fuse_log(FUSE_LOG_ERR, "%s: skip failed\n",
-                    __func__);
-            goto out;
-        }
+            if (!skip_iov(out_sg, out_num,
+                          sizeof(struct fuse_in_header) +
+                          sizeof(struct fuse_write_in),
+                          &iovindex, &iov_bytes_skip)) {
+                fuse_log(FUSE_LOG_ERR, "%s: skip failed\n",
+                        __func__);
+                goto out;
+            }
 
-        for (; iovindex < out_num; iovindex++, pbufvindex++) {
-            pbufv->count++;
-            pbufv->buf[pbufvindex].pos = ~0; /* Dummy */
-            pbufv->buf[pbufvindex].flags = 0;
-            pbufv->buf[pbufvindex].mem = out_sg[iovindex].iov_base;
-            pbufv->buf[pbufvindex].size = out_sg[iovindex].iov_len;
-
-            if (iov_bytes_skip) {
-                pbufv->buf[pbufvindex].mem += iov_bytes_skip;
-                pbufv->buf[pbufvindex].size -= iov_bytes_skip;
-                iov_bytes_skip = 0;
+            for (; iovindex < out_num; iovindex++, pbufvindex++) {
+                pbufv->count++;
+                pbufv->buf[pbufvindex].pos = ~0; /* Dummy */
+                pbufv->buf[pbufvindex].flags =
+                    (iovindex < out_num_readable) ? 0 :
+                                                    FUSE_BUF_PHYS_ADDR;
+                pbufv->buf[pbufvindex].mem = out_sg[iovindex].iov_base;
+                pbufv->buf[pbufvindex].size = out_sg[iovindex].iov_len;
+
+                if (iov_bytes_skip) {
+                    pbufv->buf[pbufvindex].mem += iov_bytes_skip;
+                    pbufv->buf[pbufvindex].size -= iov_bytes_skip;
+                    iov_bytes_skip = 0;
+                }
             }
         }
-    } else {
-        /* Normal (non fast write) path */
 
-        copy_from_iov(&fbuf, out_num, out_sg, se->bufsize);
-        /* That copy reread the in_header, make sure we use the original */
-        memcpy(fbuf.mem, &inh, sizeof(struct fuse_in_header));
-        fbuf.size = out_len;
+        if (req->bad_in_num &&
+            inh.opcode == FUSE_READ &&
+            out_len_readable >=
+                (sizeof(struct fuse_in_header) + sizeof(struct fuse_read_in))) {
+            fuse_log(FUSE_LOG_DEBUG,
+                     "Unmappable read case "
+                     "in_num=%d bad_in_num=%d\n",
+                     elem->in_num, req->bad_in_num);
+            handled_unmappable = true;
+        }
+
+        if (!handled_unmappable) {
+            fuse_log(FUSE_LOG_ERR,
+                     "Unhandled unmappable element: out: %d(b:%d) in: "
+                     "%d(b:%d)",
+                     out_num, req->bad_out_num, elem->in_num, req->bad_in_num);
+            fv_panic(dev, "Unhandled unmappable element");
+        }
+    }
+
+    if (!req->bad_out_num) {
+        if (inh.opcode == FUSE_WRITE &&
+            out_len_readable >= (sizeof(struct fuse_in_header) +
+                                 sizeof(struct fuse_write_in))) {
+            /*
+             * For a write we don't actually need to copy the
+             * data, we can just do it straight out of guest memory
+             * but we must still copy the headers in case the guest
+             * was nasty and changed them while we were using them.
+             */
+            fuse_log(FUSE_LOG_DEBUG, "%s: Write special case\n",
+                     __func__);
+
+            fbuf.size = copy_from_iov(&fbuf, out_num, out_sg,
+                                      sizeof(struct fuse_in_header) +
+                                      sizeof(struct fuse_write_in));
+            /* That copy reread the in_header, make sure we use the original */
+            memcpy(fbuf.mem, &inh, sizeof(struct fuse_in_header));
+
+            /* Allocate the bufv, with space for the rest of the iov */
+            pbufv = malloc(sizeof(struct fuse_bufvec) +
+                           sizeof(struct fuse_buf) * out_num);
+            if (!pbufv) {
+                fuse_log(FUSE_LOG_ERR, "%s: pbufv malloc failed\n",
+                        __func__);
+                goto out;
+            }
+
+            allocated_bufv = true;
+            pbufv->count = 1;
+            pbufv->buf[0] = fbuf;
 
-        /* TODO! Endianness of header */
+            size_t iovindex, pbufvindex, iov_bytes_skip;
+            pbufvindex = 1; /* 2 headers, 1 fusebuf */
 
-        /* TODO: Add checks for fuse_session_exited */
-        bufv.buf[0] = fbuf;
-        bufv.count = 1;
-        pbufv = &bufv;
+            if (!skip_iov(out_sg, out_num,
+                          sizeof(struct fuse_in_header) +
+                          sizeof(struct fuse_write_in),
+                          &iovindex, &iov_bytes_skip)) {
+                fuse_log(FUSE_LOG_ERR, "%s: skip failed\n",
+                        __func__);
+                goto out;
+            }
+
+            for (; iovindex < out_num; iovindex++, pbufvindex++) {
+                pbufv->count++;
+                pbufv->buf[pbufvindex].pos = ~0; /* Dummy */
+                pbufv->buf[pbufvindex].flags = 0;
+                pbufv->buf[pbufvindex].mem = out_sg[iovindex].iov_base;
+                pbufv->buf[pbufvindex].size = out_sg[iovindex].iov_len;
+
+                if (iov_bytes_skip) {
+                    pbufv->buf[pbufvindex].mem += iov_bytes_skip;
+                    pbufv->buf[pbufvindex].size -= iov_bytes_skip;
+                    iov_bytes_skip = 0;
+                }
+            }
+        } else {
+            /* Normal (non fast write) path */
+
+            /* Copy the rest of the buffer */
+            copy_from_iov(&fbuf, out_num, out_sg, se->bufsize);
+            /* That copy reread the in_header, make sure we use the original */
+            memcpy(fbuf.mem, &inh, sizeof(struct fuse_in_header));
+
+            fbuf.size = out_len;
+
+            /* TODO! Endianness of header */
+
+            /* TODO: Add checks for fuse_session_exited */
+            bufv.buf[0] = fbuf;
+            bufv.count = 1;
+            pbufv = &bufv;
+        }
     }
     pbufv->idx = 0;
     pbufv->off = 0;
@@ -732,13 +825,16 @@ static void *fv_queue_thread(void *opaque)
                  __func__, qi->qidx, (size_t)evalue, in_bytes, out_bytes);
 
         while (1) {
+            unsigned int bad_in_num = 0, bad_out_num = 0;
             FVRequest *req = vu_queue_pop(dev, q, sizeof(FVRequest),
-                                          NULL, NULL);
+                                          &bad_in_num, &bad_out_num);
             if (!req) {
                 break;
             }
 
             req->reply_sent = false;
+            req->bad_in_num = bad_in_num;
+            req->bad_out_num = bad_out_num;
 
             if (!se->thread_pool_size) {
                 req_list = g_list_prepend(req_list, req);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 20/25] DAX/unmap virtiofsd: Route unmappable reads
  2021-04-14 15:51 [PATCH v2 00/25] virtiofs dax patches Dr. David Alan Gilbert (git)
                   ` (18 preceding siblings ...)
  2021-04-14 15:51 ` [PATCH v2 19/25] DAX/unmap virtiofsd: Parse unmappable elements Dr. David Alan Gilbert (git)
@ 2021-04-14 15:51 ` Dr. David Alan Gilbert (git)
  2021-04-14 15:51 ` [PATCH v2 21/25] DAX/unmap virtiofsd: route unmappable write to slave command Dr. David Alan Gilbert (git)
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-14 15:51 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, virtio-fs

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

When a read with unmappable buffers is found, map it to a slave
read command.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/fuse_virtio.c | 37 +++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 5ed78bd8cf..887e79a126 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -459,6 +459,43 @@ int virtio_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
         in_sg_left -= ret;
         len -= ret;
     } while (in_sg_left);
+
+    if (bad_in_num) {
+        /* TODO: Rework to send in fewer messages */
+        VhostUserFSSlaveMsg *msg = g_malloc0(sizeof(VhostUserFSSlaveMsg) +
+                                             sizeof(VhostUserFSSlaveMsgEntry));
+        while (len && bad_in_num) {
+            msg->count = 1;
+            msg->entries[0].flags = VHOST_USER_FS_FLAG_MAP_R;
+            msg->entries[0].fd_offset = buf->buf[0].pos;
+            msg->entries[0].c_offset =
+                (uint64_t)(uintptr_t)in_sg_ptr[0].iov_base;
+            msg->entries[0].len = in_sg_ptr[0].iov_len;
+            if (len < msg->entries[0].len) {
+                msg->entries[0].len = len;
+             }
+            int64_t req_res = fuse_virtio_io(se, msg, buf->buf[0].fd);
+            fuse_log(FUSE_LOG_DEBUG,
+                     "%s: bad loop; len=%zd bad_in_num=%d fd_offset=%zd "
+                     "c_offset=%p req_res=%ld\n",
+                     __func__, len, bad_in_num, buf->buf[0].pos,
+                     in_sg_ptr[0].iov_base, req_res);
+            if (req_res > 0) {
+                len -= msg->entries[0].len;
+                buf->buf[0].pos += msg->entries[0].len;
+                in_sg_ptr++;
+                bad_in_num--;
+            } else if (req_res == 0) {
+                break;
+            } else {
+                ret = req_res;
+                free(in_sg_cpy);
+                g_free(msg);
+                goto err;
+            }
+        }
+        g_free(msg);
+    }
     free(in_sg_cpy);
 
     /* Need to fix out->len on EOF */
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 21/25] DAX/unmap virtiofsd: route unmappable write to slave command
  2021-04-14 15:51 [PATCH v2 00/25] virtiofs dax patches Dr. David Alan Gilbert (git)
                   ` (19 preceding siblings ...)
  2021-04-14 15:51 ` [PATCH v2 20/25] DAX/unmap virtiofsd: Route unmappable reads Dr. David Alan Gilbert (git)
@ 2021-04-14 15:51 ` Dr. David Alan Gilbert (git)
  2021-04-14 15:51 ` [PATCH v2 22/25] DAX:virtiofsd: implement FUSE_INIT map_alignment field Dr. David Alan Gilbert (git)
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-14 15:51 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, virtio-fs

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

When a fuse_buf_copy is performed on an element with FUSE_BUF_PHYS_ADDR
route it to a fuse_virtio_write request that does a slave command to
perform the write.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/buffer.c         | 14 +++++++++++---
 tools/virtiofsd/fuse_common.h    |  6 +++++-
 tools/virtiofsd/fuse_lowlevel.h  |  3 ---
 tools/virtiofsd/passthrough_ll.c |  2 +-
 4 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/tools/virtiofsd/buffer.c b/tools/virtiofsd/buffer.c
index 1a050aa441..8135d52d2a 100644
--- a/tools/virtiofsd/buffer.c
+++ b/tools/virtiofsd/buffer.c
@@ -200,13 +200,20 @@ static ssize_t fuse_buf_fd_to_fd(const struct fuse_buf *dst, size_t dst_off,
     return copied;
 }
 
-static ssize_t fuse_buf_copy_one(const struct fuse_buf *dst, size_t dst_off,
+static ssize_t fuse_buf_copy_one(fuse_req_t req,
+                                 const struct fuse_buf *dst, size_t dst_off,
                                  const struct fuse_buf *src, size_t src_off,
                                  size_t len)
 {
     int src_is_fd = src->flags & FUSE_BUF_IS_FD;
     int dst_is_fd = dst->flags & FUSE_BUF_IS_FD;
+    int src_is_phys = src->flags & FUSE_BUF_PHYS_ADDR;
+    int dst_is_phys = src->flags & FUSE_BUF_PHYS_ADDR;
 
+    if (src_is_phys && !src_is_fd && dst_is_fd) {
+        return fuse_virtio_write(req, dst, dst_off, src, src_off, len);
+    }
+    assert(!src_is_phys && !dst_is_phys);
     if (!src_is_fd && !dst_is_fd) {
         char *dstmem = (char *)dst->mem + dst_off;
         char *srcmem = (char *)src->mem + src_off;
@@ -259,7 +266,8 @@ static int fuse_bufvec_advance(struct fuse_bufvec *bufv, size_t len)
     return 1;
 }
 
-ssize_t fuse_buf_copy(struct fuse_bufvec *dstv, struct fuse_bufvec *srcv)
+ssize_t fuse_buf_copy(fuse_req_t req, struct fuse_bufvec *dstv,
+                      struct fuse_bufvec *srcv)
 {
     size_t copied = 0, i;
 
@@ -301,7 +309,7 @@ ssize_t fuse_buf_copy(struct fuse_bufvec *dstv, struct fuse_bufvec *srcv)
         dst_len = dst->size - dstv->off;
         len = min_size(src_len, dst_len);
 
-        res = fuse_buf_copy_one(dst, dstv->off, src, srcv->off, len);
+        res = fuse_buf_copy_one(req, dst, dstv->off, src, srcv->off, len);
         if (res < 0) {
             if (!copied) {
                 return res;
diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
index af43cf19f9..beed03aa93 100644
--- a/tools/virtiofsd/fuse_common.h
+++ b/tools/virtiofsd/fuse_common.h
@@ -510,6 +510,8 @@ struct fuse_conn_info {
 struct fuse_session;
 struct fuse_pollhandle;
 struct fuse_conn_info_opts;
+struct fuse_req;
+typedef struct fuse_req *fuse_req_t;
 
 /**
  * This function parses several command-line options that can be used
@@ -728,11 +730,13 @@ size_t fuse_buf_size(const struct fuse_bufvec *bufv);
 /**
  * Copy data from one buffer vector to another
  *
+ * @param req The request this copy is part of
  * @param dst destination buffer vector
  * @param src source buffer vector
  * @return actual number of bytes copied or -errno on error
  */
-ssize_t fuse_buf_copy(struct fuse_bufvec *dst, struct fuse_bufvec *src);
+ssize_t fuse_buf_copy(fuse_req_t req,
+                      struct fuse_bufvec *dst, struct fuse_bufvec *src);
 
 /**
  * Memory buffer iterator
diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
index 757cdae49b..24e580aafe 100644
--- a/tools/virtiofsd/fuse_lowlevel.h
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -42,9 +42,6 @@
 /** Inode number type */
 typedef uint64_t fuse_ino_t;
 
-/** Request pointer type */
-typedef struct fuse_req *fuse_req_t;
-
 /**
  * Session
  *
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 726343677e..b3954376b3 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -2301,7 +2301,7 @@ static void lo_write_buf(fuse_req_t req, fuse_ino_t ino,
         }
     }
 
-    res = fuse_buf_copy(&out_buf, in_buf);
+    res = fuse_buf_copy(req, &out_buf, in_buf);
     if (res < 0) {
         fuse_reply_err(req, -res);
     } else {
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 22/25] DAX:virtiofsd: implement FUSE_INIT map_alignment field
  2021-04-14 15:51 [PATCH v2 00/25] virtiofs dax patches Dr. David Alan Gilbert (git)
                   ` (20 preceding siblings ...)
  2021-04-14 15:51 ` [PATCH v2 21/25] DAX/unmap virtiofsd: route unmappable write to slave command Dr. David Alan Gilbert (git)
@ 2021-04-14 15:51 ` Dr. David Alan Gilbert (git)
  2021-04-14 15:51 ` [PATCH v2 23/25] vhost-user-fs: Extend VhostUserFSSlaveMsg to pass additional info Dr. David Alan Gilbert (git)
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-14 15:51 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, virtio-fs

From: Stefan Hajnoczi <stefanha@redhat.com>

Communicate the host page size to the FUSE client so that
FUSE_SETUPMAPPING/FUSE_REMOVEMAPPING requests are aware of our alignment
constraints.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/fuse_lowlevel.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index e82edce72a..08999a6bc5 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -10,6 +10,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/host-utils.h"
 #include "fuse_i.h"
 #include "standard-headers/linux/fuse.h"
 #include "fuse_misc.h"
@@ -2194,6 +2195,12 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
     outarg.max_background = se->conn.max_background;
     outarg.congestion_threshold = se->conn.congestion_threshold;
     outarg.time_gran = se->conn.time_gran;
+    if (arg->flags & FUSE_MAP_ALIGNMENT) {
+        outarg.flags |= FUSE_MAP_ALIGNMENT;
+
+        /* This constraint comes from mmap(2) and munmap(2) */
+        outarg.map_alignment = ctz64(sysconf(_SC_PAGE_SIZE));
+    }
 
     if (se->conn.want & FUSE_CAP_HANDLE_KILLPRIV_V2) {
         outarg.flags |= FUSE_HANDLE_KILLPRIV_V2;
@@ -2207,6 +2214,7 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
     fuse_log(FUSE_LOG_DEBUG, "   congestion_threshold=%i\n",
              outarg.congestion_threshold);
     fuse_log(FUSE_LOG_DEBUG, "   time_gran=%u\n", outarg.time_gran);
+    fuse_log(FUSE_LOG_DEBUG, "   map_alignment=%u\n", outarg.map_alignment);
 
     send_reply_ok(req, &outarg, outargsize);
 }
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 23/25] vhost-user-fs: Extend VhostUserFSSlaveMsg to pass additional info
  2021-04-14 15:51 [PATCH v2 00/25] virtiofs dax patches Dr. David Alan Gilbert (git)
                   ` (21 preceding siblings ...)
  2021-04-14 15:51 ` [PATCH v2 22/25] DAX:virtiofsd: implement FUSE_INIT map_alignment field Dr. David Alan Gilbert (git)
@ 2021-04-14 15:51 ` Dr. David Alan Gilbert (git)
  2021-04-14 15:51 ` [PATCH v2 24/25] vhost-user-fs: Implement drop CAP_FSETID functionality Dr. David Alan Gilbert (git)
  2021-04-14 15:51 ` [PATCH v2 25/25] virtiofsd: Ask qemu to drop CAP_FSETID if client asked for it Dr. David Alan Gilbert (git)
  24 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-14 15:51 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, virtio-fs

From: Vivek Goyal <vgoyal@redhat.com>

Extend VhostUserFSSlaveMsg so that slave can ask it to drop CAP_FSETID
before doing I/O on fd.

In some cases, virtiofsd takes the onus of clearing setuid bit on a file
when WRITE happens. Generally virtiofsd does the WRITE to fd (from guest
memory which is mapped in virtiofsd as well), but if this memory is
unmappable in virtiofsd (like cache window), then virtiofsd asks qemu
to do the I/O instead.

To retain the capability to drop suid bit on write, qemu needs to
drop the CAP_FSETID as well before write to fd. Extend VhostUserFSSlaveMsg
so that virtiofsd can specify in message if CAP_FSETID needs to be
dropped.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 hw/virtio/vhost-user-fs.c                 | 5 +++++
 include/hw/virtio/vhost-user-fs.h         | 6 ++++++
 subprojects/libvhost-user/libvhost-user.h | 6 ++++++
 3 files changed, 17 insertions(+)

diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 5511838f29..23bb8436e1 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -244,6 +244,11 @@ uint64_t vhost_user_fs_slave_io(struct vhost_dev *dev, int message_size,
         return (uint64_t)-1;
     }
 
+    if (sm->flags & VHOST_USER_FS_GENFLAG_DROP_FSETID) {
+        error_report("Dropping CAP_FSETID is not supported");
+        return (uint64_t)-ENOTSUP;
+    }
+
     for (i = 0; i < sm->count && !res; i++) {
         VhostUserFSSlaveMsgEntry *e = &sm->entries[i];
         if (e->len == 0) {
diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
index 2931164e23..bcd797c0cc 100644
--- a/include/hw/virtio/vhost-user-fs.h
+++ b/include/hw/virtio/vhost-user-fs.h
@@ -30,6 +30,10 @@ OBJECT_DECLARE_SIMPLE_TYPE(VHostUserFS, VHOST_USER_FS)
 #define VHOST_USER_FS_FLAG_MAP_R (1u << 0)
 #define VHOST_USER_FS_FLAG_MAP_W (1u << 1)
 
+/* Generic flags for the overall message and not individual ranges */
+/* Drop capability CAP_FSETID during the operation */
+#define VHOST_USER_FS_GENFLAG_DROP_FSETID (1u << 0)
+
 typedef struct {
     /* Offsets within the file being mapped */
     uint64_t fd_offset;
@@ -42,6 +46,8 @@ typedef struct {
 } VhostUserFSSlaveMsgEntry;
 
 typedef struct {
+    /* Generic flags for the overall message */
+    uint32_t flags;
     /* Number of entries */
     uint16_t count;
     /* Spare */
diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
index 42b0833c4b..4dba4321f4 100644
--- a/subprojects/libvhost-user/libvhost-user.h
+++ b/subprojects/libvhost-user/libvhost-user.h
@@ -132,6 +132,10 @@ typedef enum VhostUserSlaveRequest {
 #define VHOST_USER_FS_FLAG_MAP_R (1u << 0)
 #define VHOST_USER_FS_FLAG_MAP_W (1u << 1)
 
+/* Generic flags for the overall message and not individual ranges */
+/* Drop capability CAP_FSETID during the operation */
+#define VHOST_USER_FS_GENFLAG_DROP_FSETID (1u << 0)
+
 typedef struct {
     /* Offsets within the file being mapped */
     uint64_t fd_offset;
@@ -144,6 +148,8 @@ typedef struct {
 } VhostUserFSSlaveMsgEntry;
 
 typedef struct {
+    /* Generic flags for the overall message */
+    uint32_t flags;
     /* Number of entries */
     uint16_t count;
     /* Spare */
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 24/25] vhost-user-fs: Implement drop CAP_FSETID functionality
  2021-04-14 15:51 [PATCH v2 00/25] virtiofs dax patches Dr. David Alan Gilbert (git)
                   ` (22 preceding siblings ...)
  2021-04-14 15:51 ` [PATCH v2 23/25] vhost-user-fs: Extend VhostUserFSSlaveMsg to pass additional info Dr. David Alan Gilbert (git)
@ 2021-04-14 15:51 ` Dr. David Alan Gilbert (git)
  2021-04-14 15:51 ` [PATCH v2 25/25] virtiofsd: Ask qemu to drop CAP_FSETID if client asked for it Dr. David Alan Gilbert (git)
  24 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-14 15:51 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, virtio-fs

From: Vivek Goyal <vgoyal@redhat.com>

As part of slave_io message, slave can ask to do I/O on an fd. Additionally
slave can ask for dropping CAP_FSETID (if master has it) before doing I/O.
Implement functionality to drop CAP_FSETID and gain it back after the
operation.

This also creates a dependency on libcap-ng.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 hw/virtio/meson.build     |  1 +
 hw/virtio/vhost-user-fs.c | 92 ++++++++++++++++++++++++++++++++++++++-
 meson.build               |  6 +++
 3 files changed, 97 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
index fbff9bc9d4..bdcdc82e13 100644
--- a/hw/virtio/meson.build
+++ b/hw/virtio/meson.build
@@ -18,6 +18,7 @@ virtio_ss.add(when: 'CONFIG_VIRTIO_BALLOON', if_true: files('virtio-balloon.c'))
 virtio_ss.add(when: 'CONFIG_VIRTIO_CRYPTO', if_true: files('virtio-crypto.c'))
 virtio_ss.add(when: ['CONFIG_VIRTIO_CRYPTO', 'CONFIG_VIRTIO_PCI'], if_true: files('virtio-crypto-pci.c'))
 virtio_ss.add(when: 'CONFIG_VHOST_USER_FS', if_true: files('vhost-user-fs.c'))
+virtio_ss.add(when: 'CONFIG_VHOST_USER_FS', if_true: libcap_ng)
 virtio_ss.add(when: ['CONFIG_VHOST_USER_FS', 'CONFIG_VIRTIO_PCI'], if_true: files('vhost-user-fs-pci.c'))
 virtio_ss.add(when: 'CONFIG_VIRTIO_PMEM', if_true: files('virtio-pmem.c'))
 virtio_ss.add(when: 'CONFIG_VHOST_VSOCK', if_true: files('vhost-vsock.c', 'vhost-vsock-common.c'))
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 23bb8436e1..09947257f1 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -13,6 +13,8 @@
 
 #include "qemu/osdep.h"
 #include <sys/ioctl.h>
+#include <cap-ng.h>
+#include <sys/syscall.h>
 #include "standard-headers/linux/virtio_fs.h"
 #include "qapi/error.h"
 #include "hw/qdev-properties.h"
@@ -91,6 +93,84 @@ static bool check_slave_message_entries(const VhostUserFSSlaveMsg *sm,
     return true;
 }
 
+/*
+ * Helpers for dropping and regaining effective capabilities. Returns 0
+ * on success, error otherwise
+ */
+static int drop_effective_cap(const char *cap_name, bool *cap_dropped)
+{
+    int cap, ret;
+
+    cap = capng_name_to_capability(cap_name);
+    if (cap < 0) {
+        ret = -errno;
+        error_report("capng_name_to_capability(%s) failed:%s", cap_name,
+                     strerror(errno));
+        goto out;
+    }
+
+    if (capng_get_caps_process()) {
+        ret = -errno;
+        error_report("capng_get_caps_process() failed:%s", strerror(errno));
+        goto out;
+    }
+
+    /* We dont have this capability in effective set already. */
+    if (!capng_have_capability(CAPNG_EFFECTIVE, cap)) {
+        ret = 0;
+        goto out;
+    }
+
+    if (capng_update(CAPNG_DROP, CAPNG_EFFECTIVE, cap)) {
+        ret = -errno;
+        error_report("capng_update(DROP,) failed");
+        goto out;
+    }
+    if (capng_apply(CAPNG_SELECT_CAPS)) {
+        ret = -errno;
+        error_report("drop:capng_apply() failed");
+        goto out;
+    }
+
+    ret = 0;
+    if (cap_dropped) {
+        *cap_dropped = true;
+    }
+
+out:
+    return ret;
+}
+
+static int gain_effective_cap(const char *cap_name)
+{
+    int cap;
+    int ret = 0;
+
+    cap = capng_name_to_capability(cap_name);
+    if (cap < 0) {
+        ret = -errno;
+        error_report("capng_name_to_capability(%s) failed:%s", cap_name,
+                     strerror(errno));
+        goto out;
+    }
+
+    if (capng_update(CAPNG_ADD, CAPNG_EFFECTIVE, cap)) {
+        ret = -errno;
+        error_report("capng_update(ADD,) failed");
+        goto out;
+    }
+
+    if (capng_apply(CAPNG_SELECT_CAPS)) {
+        ret = -errno;
+        error_report("gain:capng_apply() failed");
+        goto out;
+    }
+    ret = 0;
+
+out:
+    return ret;
+}
+
 uint64_t vhost_user_fs_slave_map(struct vhost_dev *dev, int message_size,
                                  VhostUserFSSlaveMsg *sm, int fd)
 {
@@ -238,6 +318,7 @@ uint64_t vhost_user_fs_slave_io(struct vhost_dev *dev, int message_size,
     unsigned int i;
     int res = 0;
     size_t done = 0;
+    bool cap_fsetid_dropped = false;
 
     if (fd < 0) {
         error_report("Bad fd for map");
@@ -245,8 +326,10 @@ uint64_t vhost_user_fs_slave_io(struct vhost_dev *dev, int message_size,
     }
 
     if (sm->flags & VHOST_USER_FS_GENFLAG_DROP_FSETID) {
-        error_report("Dropping CAP_FSETID is not supported");
-        return (uint64_t)-ENOTSUP;
+        res = drop_effective_cap("FSETID", &cap_fsetid_dropped);
+        if (res != 0) {
+            return (uint64_t)res;
+        }
     }
 
     for (i = 0; i < sm->count && !res; i++) {
@@ -313,6 +396,11 @@ uint64_t vhost_user_fs_slave_io(struct vhost_dev *dev, int message_size,
     }
     close(fd);
 
+    if (cap_fsetid_dropped) {
+        if (gain_effective_cap("FSETID")) {
+            error_report("Failed to gain CAP_FSETID");
+        }
+    }
     trace_vhost_user_fs_slave_io_exit(res, done);
     if (res < 0) {
         return (uint64_t)res;
diff --git a/meson.build b/meson.build
index c6f4b0cf5e..71899d0993 100644
--- a/meson.build
+++ b/meson.build
@@ -1081,6 +1081,12 @@ elif get_option('virtfs').disabled()
   have_virtfs = false
 endif
 
+if config_host.has_key('CONFIG_VHOST_USER_FS')
+  if not libcap_ng.found()
+    error('vhost-user-fs requires libcap-ng-devel')
+  endif
+endif
+
 config_host_data.set_quoted('CONFIG_BINDIR', get_option('prefix') / get_option('bindir'))
 config_host_data.set_quoted('CONFIG_PREFIX', get_option('prefix'))
 config_host_data.set_quoted('CONFIG_QEMU_CONFDIR', get_option('prefix') / qemu_confdir)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 25/25] virtiofsd: Ask qemu to drop CAP_FSETID if client asked for it
  2021-04-14 15:51 [PATCH v2 00/25] virtiofs dax patches Dr. David Alan Gilbert (git)
                   ` (23 preceding siblings ...)
  2021-04-14 15:51 ` [PATCH v2 24/25] vhost-user-fs: Implement drop CAP_FSETID functionality Dr. David Alan Gilbert (git)
@ 2021-04-14 15:51 ` Dr. David Alan Gilbert (git)
  24 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-14 15:51 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, virtio-fs

From: Vivek Goyal <vgoyal@redhat.com>

If qemu guest asked to drop CAP_FSETID upon write, send that info
to qemu in SLAVE_FS_IO message so that qemu can drop capability
before WRITE. This is to make sure that any setuid bit is killed
on fd (if there is one set).

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 tools/virtiofsd/buffer.c         | 10 ++++++----
 tools/virtiofsd/fuse_common.h    |  6 +++++-
 tools/virtiofsd/fuse_lowlevel.h  |  6 +++++-
 tools/virtiofsd/fuse_virtio.c    |  5 ++++-
 tools/virtiofsd/passthrough_ll.c |  2 +-
 5 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/tools/virtiofsd/buffer.c b/tools/virtiofsd/buffer.c
index 8135d52d2a..b4cda7db9a 100644
--- a/tools/virtiofsd/buffer.c
+++ b/tools/virtiofsd/buffer.c
@@ -203,7 +203,7 @@ static ssize_t fuse_buf_fd_to_fd(const struct fuse_buf *dst, size_t dst_off,
 static ssize_t fuse_buf_copy_one(fuse_req_t req,
                                  const struct fuse_buf *dst, size_t dst_off,
                                  const struct fuse_buf *src, size_t src_off,
-                                 size_t len)
+                                 size_t len, bool dropped_cap_fsetid)
 {
     int src_is_fd = src->flags & FUSE_BUF_IS_FD;
     int dst_is_fd = dst->flags & FUSE_BUF_IS_FD;
@@ -211,7 +211,8 @@ static ssize_t fuse_buf_copy_one(fuse_req_t req,
     int dst_is_phys = src->flags & FUSE_BUF_PHYS_ADDR;
 
     if (src_is_phys && !src_is_fd && dst_is_fd) {
-        return fuse_virtio_write(req, dst, dst_off, src, src_off, len);
+        return fuse_virtio_write(req, dst, dst_off, src, src_off, len,
+                                 dropped_cap_fsetid);
     }
     assert(!src_is_phys && !dst_is_phys);
     if (!src_is_fd && !dst_is_fd) {
@@ -267,7 +268,7 @@ static int fuse_bufvec_advance(struct fuse_bufvec *bufv, size_t len)
 }
 
 ssize_t fuse_buf_copy(fuse_req_t req, struct fuse_bufvec *dstv,
-                      struct fuse_bufvec *srcv)
+                      struct fuse_bufvec *srcv, bool dropped_cap_fsetid)
 {
     size_t copied = 0, i;
 
@@ -309,7 +310,8 @@ ssize_t fuse_buf_copy(fuse_req_t req, struct fuse_bufvec *dstv,
         dst_len = dst->size - dstv->off;
         len = min_size(src_len, dst_len);
 
-        res = fuse_buf_copy_one(req, dst, dstv->off, src, srcv->off, len);
+        res = fuse_buf_copy_one(req, dst, dstv->off, src, srcv->off, len,
+                                dropped_cap_fsetid);
         if (res < 0) {
             if (!copied) {
                 return res;
diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
index beed03aa93..8a75729be9 100644
--- a/tools/virtiofsd/fuse_common.h
+++ b/tools/virtiofsd/fuse_common.h
@@ -733,10 +733,14 @@ size_t fuse_buf_size(const struct fuse_bufvec *bufv);
  * @param req The request this copy is part of
  * @param dst destination buffer vector
  * @param src source buffer vector
+ * @param dropped_cap_fsetid Caller has dropped CAP_FSETID. If work is handed
+ *        over to a different thread/process, CAP_FSETID needs to be dropped
+ *        there as well.
  * @return actual number of bytes copied or -errno on error
  */
 ssize_t fuse_buf_copy(fuse_req_t req,
-                      struct fuse_bufvec *dst, struct fuse_bufvec *src);
+                      struct fuse_bufvec *dst, struct fuse_bufvec *src,
+                      bool dropped_cap_fsetid);
 
 /**
  * Memory buffer iterator
diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
index 24e580aafe..dfd7e1525c 100644
--- a/tools/virtiofsd/fuse_lowlevel.h
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -2030,9 +2030,13 @@ int64_t fuse_virtio_io(struct fuse_session *se, VhostUserFSSlaveMsg *msg,
  * @param src The source (memory) buffer
  * @param src_off The GPA
  * @param len Length in bytes
+ * @param dropped_cap_fsetid Caller dropped CAP_FSETID. If it is being handed
+ *        over to different thread/process, CAP_FSETID needs to be dropped
+ *        before write.
  */
 ssize_t fuse_virtio_write(fuse_req_t req, const struct fuse_buf *dst,
                           size_t dst_off, const struct fuse_buf *src,
-                          size_t src_off, size_t len);
+                          size_t src_off, size_t len,
+                          bool dropped_cap_fsetid);
 
 #endif /* FUSE_LOWLEVEL_H_ */
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 887e79a126..525452d036 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -1291,7 +1291,7 @@ int64_t fuse_virtio_io(struct fuse_session *se, VhostUserFSSlaveMsg *msg,
  */
 ssize_t fuse_virtio_write(fuse_req_t req, const struct fuse_buf *dst,
                           size_t dst_off, const struct fuse_buf *src,
-                          size_t src_off, size_t len)
+                          size_t src_off, size_t len, bool dropped_cap_fsetid)
 {
     VhostUserFSSlaveMsg *msg = g_malloc0(sizeof(VhostUserFSSlaveMsg) +
                                          sizeof(VhostUserFSSlaveMsgEntry));
@@ -1311,6 +1311,9 @@ ssize_t fuse_virtio_write(fuse_req_t req, const struct fuse_buf *dst,
     msg->entries[0].c_offset = (uintptr_t)src->mem + src_off;
     msg->entries[0].len = len;
     msg->entries[0].flags = VHOST_USER_FS_FLAG_MAP_W;
+    if (dropped_cap_fsetid) {
+        msg->flags |= VHOST_USER_FS_GENFLAG_DROP_FSETID;
+    }
 
     int64_t result = fuse_virtio_io(req->se, msg, dst->fd);
     fuse_log(FUSE_LOG_DEBUG, "%s: result=%ld\n", __func__, result);
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index b3954376b3..d0fa7de538 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -2301,7 +2301,7 @@ static void lo_write_buf(fuse_req_t req, fuse_ino_t ino,
         }
     }
 
-    res = fuse_buf_copy(req, &out_buf, in_buf);
+    res = fuse_buf_copy(req, &out_buf, in_buf, fi->kill_priv);
     if (res < 0) {
         fuse_reply_err(req, -res);
     } else {
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [Virtio-fs] [PATCH v2 08/25] DAX: virtio-fs: Add vhost-user slave commands for mapping
  2021-04-14 15:51 ` [PATCH v2 08/25] DAX: virtio-fs: Add vhost-user slave commands for mapping Dr. David Alan Gilbert (git)
@ 2021-04-14 16:35   ` Greg Kurz
  2021-04-21 17:49     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 34+ messages in thread
From: Greg Kurz @ 2021-04-14 16:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: virtio-fs, qemu-devel, stefanha, vgoyal

On Wed, 14 Apr 2021 16:51:20 +0100
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> The daemon may request that fd's be mapped into the virtio-fs cache
> visible to the guest.
> These mappings are triggered by commands sent over the slave fd
> from the daemon.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  docs/interop/vhost-user.rst               | 21 ++++++++
>  hw/virtio/vhost-user-fs.c                 | 66 +++++++++++++++++++++++
>  hw/virtio/vhost-user.c                    | 25 +++++++++
>  include/hw/virtio/vhost-user-fs.h         | 33 ++++++++++++
>  subprojects/libvhost-user/libvhost-user.h |  2 +
>  5 files changed, 147 insertions(+)
> 
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index d6085f7045..09aee3565d 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -1432,6 +1432,27 @@ Slave message types
>  
>    The state.num field is currently reserved and must be set to 0.
>  
> +``VHOST_USER_SLAVE_FS_MAP``
> +  :id: 6
> +  :equivalent ioctl: N/A
> +  :slave payload: ``struct VhostUserFSSlaveMsg``
> +  :master payload: N/A
> +
> +  Requests that an fd, provided in the ancillary data, be mmapped
> +  into the virtio-fs cache; multiple chunks can be mapped in one
> +  command.
> +  A reply is generated indicating whether mapping succeeded.
> +
> +``VHOST_USER_SLAVE_FS_UNMAP``
> +  :id: 7
> +  :equivalent ioctl: N/A
> +  :slave payload: ``struct VhostUserFSSlaveMsg``
> +  :master payload: N/A
> +
> +  Requests that the range in the virtio-fs cache is unmapped;
> +  multiple chunks can be unmapped in one command.
> +  A reply is generated indicating whether unmapping succeeded.
> +
>  .. _reply_ack:
>  
>  VHOST_USER_PROTOCOL_F_REPLY_ACK
> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> index dd0a02aa99..169a146e72 100644
> --- a/hw/virtio/vhost-user-fs.c
> +++ b/hw/virtio/vhost-user-fs.c
> @@ -45,6 +45,72 @@ static const int user_feature_bits[] = {
>  #define DAX_WINDOW_PROT PROT_NONE
>  #endif
>  
> +/*
> + * The message apparently had 'received_size' bytes, check this
> + * matches the count in the message.
> + *
> + * Returns true if the size matches.
> + */
> +static bool check_slave_message_entries(const VhostUserFSSlaveMsg *sm,
> +                                        int received_size)
> +{
> +    int tmp;
> +
> +    /*
> +     * VhostUserFSSlaveMsg consists of a body followed by 'n' entries,
> +     * (each VhostUserFSSlaveMsgEntry).  There's a maximum of
> +     * VHOST_USER_FS_SLAVE_MAX_ENTRIES of these.
> +     */
> +    if (received_size <= sizeof(VhostUserFSSlaveMsg)) {
> +        error_report("%s: Short VhostUserFSSlaveMsg size, %d", __func__,
> +                     received_size);
> +        return false;
> +    }
> +
> +    tmp = received_size - sizeof(VhostUserFSSlaveMsg);
> +    if (tmp % sizeof(VhostUserFSSlaveMsgEntry)) {
> +        error_report("%s: Non-multiple VhostUserFSSlaveMsg size, %d", __func__,
> +                     received_size);
> +        return false;
> +    }
> +
> +    tmp /= sizeof(VhostUserFSSlaveMsgEntry);
> +    if (tmp != sm->count) {
> +        error_report("%s: VhostUserFSSlaveMsg count mismatch, %d count: %d",
> +                     __func__, tmp, sm->count);
> +        return false;
> +    }
> +
> +    if (sm->count > VHOST_USER_FS_SLAVE_MAX_ENTRIES) {
> +        error_report("%s: VhostUserFSSlaveMsg too many entries: %d",
> +                     __func__, sm->count);
> +        return false;
> +    }
> +    return true;
> +}
> +
> +uint64_t vhost_user_fs_slave_map(struct vhost_dev *dev, int message_size,
> +                                 VhostUserFSSlaveMsg *sm, int fd)
> +{
> +    if (!check_slave_message_entries(sm, message_size)) {
> +        return (uint64_t)-1;
> +    }
> +
> +    /* TODO */
> +    return (uint64_t)-1;
> +}
> +
> +uint64_t vhost_user_fs_slave_unmap(struct vhost_dev *dev, int message_size,
> +                                   VhostUserFSSlaveMsg *sm)
> +{
> +    if (!check_slave_message_entries(sm, message_size)) {
> +        return (uint64_t)-1;
> +    }
> +
> +    /* TODO */
> +    return (uint64_t)-1;
> +}
> +
>  static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
>  {
>      VHostUserFS *fs = VHOST_USER_FS(vdev);
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 3e4a25e108..ad9170f8dc 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -12,6 +12,7 @@
>  #include "qapi/error.h"
>  #include "hw/virtio/vhost.h"
>  #include "hw/virtio/vhost-user.h"
> +#include "hw/virtio/vhost-user-fs.h"
>  #include "hw/virtio/vhost-backend.h"
>  #include "hw/virtio/virtio.h"
>  #include "hw/virtio/virtio-net.h"
> @@ -133,6 +134,10 @@ typedef enum VhostUserSlaveRequest {
>      VHOST_USER_SLAVE_IOTLB_MSG = 1,
>      VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
>      VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
> +    VHOST_USER_SLAVE_VRING_CALL = 4,
> +    VHOST_USER_SLAVE_VRING_ERR = 5,
> +    VHOST_USER_SLAVE_FS_MAP = 6,
> +    VHOST_USER_SLAVE_FS_UNMAP = 7,
>      VHOST_USER_SLAVE_MAX
>  }  VhostUserSlaveRequest;
>  
> @@ -205,6 +210,16 @@ typedef struct {
>      uint32_t size; /* the following payload size */
>  } QEMU_PACKED VhostUserHeader;
>  
> +/*
> + * VhostUserFSSlaveMsg is special since it has a variable entry count,
> + * but it does have a maximum, so make a type for that to fit in our union
> + * for max size.
> + */
> +typedef struct {
> +    VhostUserFSSlaveMsg fs;
> +    VhostUserFSSlaveMsgEntry entries[VHOST_USER_FS_SLAVE_MAX_ENTRIES];
> +} QEMU_PACKED VhostUserFSSlaveMsgMax;
> +
>  typedef union {
>  #define VHOST_USER_VRING_IDX_MASK   (0xff)
>  #define VHOST_USER_VRING_NOFD_MASK  (0x1<<8)
> @@ -219,6 +234,8 @@ typedef union {
>          VhostUserCryptoSession session;
>          VhostUserVringArea area;
>          VhostUserInflight inflight;
> +        VhostUserFSSlaveMsg fs;
> +        VhostUserFSSlaveMsg fs_max; /* Never actually used */
>  } VhostUserPayload;
>  
>  typedef struct VhostUserMsg {
> @@ -1538,6 +1555,14 @@ static gboolean slave_read(QIOChannel *ioc, GIOCondition condition,
>          ret = vhost_user_slave_handle_vring_host_notifier(dev, &payload.area,
>                                                            fd ? fd[0] : -1);
>          break;
> +#ifdef CONFIG_VHOST_USER_FS
> +    case VHOST_USER_SLAVE_FS_MAP:
> +        ret = vhost_user_fs_slave_map(dev, hdr.size, &payload.fs, fd[0]);

Since the QIOChannel conversion, the array of fds is dynamically allocated
by qio_channel_readv_full_all() when needed. You should do "fd ? fd[0] : -1"
like for vhost_user_slave_handle_vring_host_notifier() just above.

Same goes for patch 17 "DAX/unmap: virtiofsd: Add  VHOST_USER_SLAVE_FS_IO".

> +        break;
> +    case VHOST_USER_SLAVE_FS_UNMAP:
> +        ret = vhost_user_fs_slave_unmap(dev, hdr.size, &payload.fs);
> +        break;
> +#endif
>      default:
>          error_report("Received unexpected msg type: %d.", hdr.request);
>          ret = true;
> diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
> index 04596799e3..0766f17548 100644
> --- a/include/hw/virtio/vhost-user-fs.h
> +++ b/include/hw/virtio/vhost-user-fs.h
> @@ -23,6 +23,33 @@
>  #define TYPE_VHOST_USER_FS "vhost-user-fs-device"
>  OBJECT_DECLARE_SIMPLE_TYPE(VHostUserFS, VHOST_USER_FS)
>  
> +/* Structures carried over the slave channel back to QEMU */
> +#define VHOST_USER_FS_SLAVE_MAX_ENTRIES 32
> +
> +/* For the flags field of VhostUserFSSlaveMsg */
> +#define VHOST_USER_FS_FLAG_MAP_R (1u << 0)
> +#define VHOST_USER_FS_FLAG_MAP_W (1u << 1)
> +
> +typedef struct {
> +    /* Offsets within the file being mapped */
> +    uint64_t fd_offset;
> +    /* Offsets within the cache */
> +    uint64_t c_offset;
> +    /* Lengths of sections */
> +    uint64_t len;
> +    /* Flags, from VHOST_USER_FS_FLAG_* */
> +    uint64_t flags;
> +} VhostUserFSSlaveMsgEntry;
> +
> +typedef struct {
> +    /* Number of entries */
> +    uint16_t count;
> +    /* Spare */
> +    uint16_t align;
> +
> +    VhostUserFSSlaveMsgEntry entries[];
> +} VhostUserFSSlaveMsg;
> +
>  typedef struct {
>      CharBackend chardev;
>      char *tag;
> @@ -46,4 +73,10 @@ struct VHostUserFS {
>      MemoryRegion cache;
>  };
>  
> +/* Callbacks from the vhost-user code for slave commands */
> +uint64_t vhost_user_fs_slave_map(struct vhost_dev *dev, int message_size,
> +                                 VhostUserFSSlaveMsg *sm, int fd);
> +uint64_t vhost_user_fs_slave_unmap(struct vhost_dev *dev, int message_size,
> +                                   VhostUserFSSlaveMsg *sm);
> +
>  #endif /* _QEMU_VHOST_USER_FS_H */
> diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
> index 70fc61171f..a98c5f5c11 100644
> --- a/subprojects/libvhost-user/libvhost-user.h
> +++ b/subprojects/libvhost-user/libvhost-user.h
> @@ -119,6 +119,8 @@ typedef enum VhostUserSlaveRequest {
>      VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
>      VHOST_USER_SLAVE_VRING_CALL = 4,
>      VHOST_USER_SLAVE_VRING_ERR = 5,
> +    VHOST_USER_SLAVE_FS_MAP = 6,
> +    VHOST_USER_SLAVE_FS_UNMAP = 7,
>      VHOST_USER_SLAVE_MAX
>  }  VhostUserSlaveRequest;
>  



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Virtio-fs] [PATCH v2 01/25] DAX: vhost-user: Rework slave return values
  2021-04-14 15:51 ` [PATCH v2 01/25] DAX: vhost-user: Rework slave return values Dr. David Alan Gilbert (git)
@ 2021-04-16 10:59   ` Greg Kurz
  2021-04-21 17:31     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 34+ messages in thread
From: Greg Kurz @ 2021-04-16 10:59 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: virtio-fs, qemu-devel, stefanha, vgoyal

On Wed, 14 Apr 2021 16:51:13 +0100
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> All the current slave handlers on the qemu side generate an 'int'
> return value that's squashed down to a bool (!!ret) and stuffed into
> a uint64_t (field of a union) to be returned.
> 
> Move the uint64_t type back up through the individual handlers so
> that we can make one actually return a full uint64_t.
> 
> Note that the definition in the interop spec says most of these
> cases are defined as returning 0 on success and non-0 for failure,
> so it's OK to change from a bool to another non-0.
> 
> Vivek:
> This is needed because upcoming patches in series will add new functions
> which want to return full error code. Existing functions continue to
> return true/false so, it should not lead to change of behavior for
> existing users.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---

LGTM

Just an indentation nit...

>  hw/virtio/vhost-backend.c         |  6 +++---
>  hw/virtio/vhost-user.c            | 31 ++++++++++++++++---------------
>  include/hw/virtio/vhost-backend.h |  2 +-
>  3 files changed, 20 insertions(+), 19 deletions(-)
> 
> diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> index 31b33bde37..1686c94767 100644
> --- a/hw/virtio/vhost-backend.c
> +++ b/hw/virtio/vhost-backend.c
> @@ -401,8 +401,8 @@ int vhost_backend_invalidate_device_iotlb(struct vhost_dev *dev,
>      return -ENODEV;
>  }
>  
> -int vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
> -                                          struct vhost_iotlb_msg *imsg)
> +uint64_t vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
> +                                        struct vhost_iotlb_msg *imsg)
>  {
>      int ret = 0;
>  
> @@ -429,5 +429,5 @@ int vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
>          break;
>      }
>  
> -    return ret;
> +    return !!ret;
>  }
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index ded0c10453..3e4a25e108 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -1405,24 +1405,25 @@ static int vhost_user_reset_device(struct vhost_dev *dev)
>      return 0;
>  }
>  
> -static int vhost_user_slave_handle_config_change(struct vhost_dev *dev)
> +static uint64_t vhost_user_slave_handle_config_change(struct vhost_dev *dev)
>  {
>      int ret = -1;
>  
>      if (!dev->config_ops) {
> -        return -1;
> +        return true;
>      }
>  
>      if (dev->config_ops->vhost_dev_config_notifier) {
>          ret = dev->config_ops->vhost_dev_config_notifier(dev);
>      }
>  
> -    return ret;
> +    return !!ret;
>  }
>  
> -static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
> -                                                       VhostUserVringArea *area,
> -                                                       int fd)
> +static uint64_t vhost_user_slave_handle_vring_host_notifier(
> +                struct vhost_dev *dev,
> +               VhostUserVringArea *area,
> +               int fd)

... here.

Anyway,

Reviewed-by: Greg Kurz <groug@kaod.org>

>  {
>      int queue_idx = area->u64 & VHOST_USER_VRING_IDX_MASK;
>      size_t page_size = qemu_real_host_page_size;
> @@ -1436,7 +1437,7 @@ static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
>      if (!virtio_has_feature(dev->protocol_features,
>                              VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) ||
>          vdev == NULL || queue_idx >= virtio_get_num_queues(vdev)) {
> -        return -1;
> +        return true;
>      }
>  
>      n = &user->notifier[queue_idx];
> @@ -1449,18 +1450,18 @@ static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
>      }
>  
>      if (area->u64 & VHOST_USER_VRING_NOFD_MASK) {
> -        return 0;
> +        return false;
>      }
>  
>      /* Sanity check. */
>      if (area->size != page_size) {
> -        return -1;
> +        return true;
>      }
>  
>      addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
>                  fd, area->offset);
>      if (addr == MAP_FAILED) {
> -        return -1;
> +        return true;
>      }
>  
>      name = g_strdup_printf("vhost-user/host-notifier@%p mmaps[%d]",
> @@ -1471,13 +1472,13 @@ static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
>  
>      if (virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true)) {
>          munmap(addr, page_size);
> -        return -1;
> +        return true;
>      }
>  
>      n->addr = addr;
>      n->set = true;
>  
> -    return 0;
> +    return false;
>  }
>  
>  static void close_slave_channel(struct vhost_user *u)
> @@ -1498,7 +1499,7 @@ static gboolean slave_read(QIOChannel *ioc, GIOCondition condition,
>      VhostUserPayload payload = { 0, };
>      Error *local_err = NULL;
>      gboolean rc = G_SOURCE_CONTINUE;
> -    int ret = 0;
> +    uint64_t ret = 0;
>      struct iovec iov;
>      g_autofree int *fd = NULL;
>      size_t fdsize = 0;
> @@ -1539,7 +1540,7 @@ static gboolean slave_read(QIOChannel *ioc, GIOCondition condition,
>          break;
>      default:
>          error_report("Received unexpected msg type: %d.", hdr.request);
> -        ret = -EINVAL;
> +        ret = true;
>      }
>  
>      /*
> @@ -1553,7 +1554,7 @@ static gboolean slave_read(QIOChannel *ioc, GIOCondition condition,
>          hdr.flags &= ~VHOST_USER_NEED_REPLY_MASK;
>          hdr.flags |= VHOST_USER_REPLY_MASK;
>  
> -        payload.u64 = !!ret;
> +        payload.u64 = ret;
>          hdr.size = sizeof(payload.u64);
>  
>          iovec[0].iov_base = &hdr;
> diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> index 8a6f8e2a7a..64ac6b6444 100644
> --- a/include/hw/virtio/vhost-backend.h
> +++ b/include/hw/virtio/vhost-backend.h
> @@ -186,7 +186,7 @@ int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
>  int vhost_backend_invalidate_device_iotlb(struct vhost_dev *dev,
>                                                   uint64_t iova, uint64_t len);
>  
> -int vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
> +uint64_t vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
>                                            struct vhost_iotlb_msg *imsg);
>  
>  int vhost_user_gpu_set_socket(struct vhost_dev *dev, int fd);



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Virtio-fs] [PATCH v2 01/25] DAX: vhost-user: Rework slave return values
  2021-04-16 10:59   ` [Virtio-fs] " Greg Kurz
@ 2021-04-21 17:31     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert @ 2021-04-21 17:31 UTC (permalink / raw)
  To: Greg Kurz; +Cc: virtio-fs, qemu-devel, stefanha, vgoyal

* Greg Kurz (groug@kaod.org) wrote:
> On Wed, 14 Apr 2021 16:51:13 +0100
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> 
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > All the current slave handlers on the qemu side generate an 'int'
> > return value that's squashed down to a bool (!!ret) and stuffed into
> > a uint64_t (field of a union) to be returned.
> > 
> > Move the uint64_t type back up through the individual handlers so
> > that we can make one actually return a full uint64_t.
> > 
> > Note that the definition in the interop spec says most of these
> > cases are defined as returning 0 on success and non-0 for failure,
> > so it's OK to change from a bool to another non-0.
> > 
> > Vivek:
> > This is needed because upcoming patches in series will add new functions
> > which want to return full error code. Existing functions continue to
> > return true/false so, it should not lead to change of behavior for
> > existing users.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> 
> LGTM
> 
> Just an indentation nit...
> 
> >  hw/virtio/vhost-backend.c         |  6 +++---
> >  hw/virtio/vhost-user.c            | 31 ++++++++++++++++---------------
> >  include/hw/virtio/vhost-backend.h |  2 +-
> >  3 files changed, 20 insertions(+), 19 deletions(-)
> > 
> > diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> > index 31b33bde37..1686c94767 100644
> > --- a/hw/virtio/vhost-backend.c
> > +++ b/hw/virtio/vhost-backend.c
> > @@ -401,8 +401,8 @@ int vhost_backend_invalidate_device_iotlb(struct vhost_dev *dev,
> >      return -ENODEV;
> >  }
> >  
> > -int vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
> > -                                          struct vhost_iotlb_msg *imsg)
> > +uint64_t vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
> > +                                        struct vhost_iotlb_msg *imsg)
> >  {
> >      int ret = 0;
> >  
> > @@ -429,5 +429,5 @@ int vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
> >          break;
> >      }
> >  
> > -    return ret;
> > +    return !!ret;
> >  }
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index ded0c10453..3e4a25e108 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -1405,24 +1405,25 @@ static int vhost_user_reset_device(struct vhost_dev *dev)
> >      return 0;
> >  }
> >  
> > -static int vhost_user_slave_handle_config_change(struct vhost_dev *dev)
> > +static uint64_t vhost_user_slave_handle_config_change(struct vhost_dev *dev)
> >  {
> >      int ret = -1;
> >  
> >      if (!dev->config_ops) {
> > -        return -1;
> > +        return true;
> >      }
> >  
> >      if (dev->config_ops->vhost_dev_config_notifier) {
> >          ret = dev->config_ops->vhost_dev_config_notifier(dev);
> >      }
> >  
> > -    return ret;
> > +    return !!ret;
> >  }
> >  
> > -static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
> > -                                                       VhostUserVringArea *area,
> > -                                                       int fd)
> > +static uint64_t vhost_user_slave_handle_vring_host_notifier(
> > +                struct vhost_dev *dev,
> > +               VhostUserVringArea *area,
> > +               int fd)
> 
> ... here.

I've reworked that to:
+static uint64_t vhost_user_slave_handle_vring_host_notifier(
+                    struct vhost_dev *dev,
+                    VhostUserVringArea *area,
+                    int fd)

the function name is getting too hideously long to actually put the
arguments in line with the bracket.

> Anyway,
> 
> Reviewed-by: Greg Kurz <groug@kaod.org>

Thanks.

Dave

> 
> >  {
> >      int queue_idx = area->u64 & VHOST_USER_VRING_IDX_MASK;
> >      size_t page_size = qemu_real_host_page_size;
> > @@ -1436,7 +1437,7 @@ static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
> >      if (!virtio_has_feature(dev->protocol_features,
> >                              VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) ||
> >          vdev == NULL || queue_idx >= virtio_get_num_queues(vdev)) {
> > -        return -1;
> > +        return true;
> >      }
> >  
> >      n = &user->notifier[queue_idx];
> > @@ -1449,18 +1450,18 @@ static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
> >      }
> >  
> >      if (area->u64 & VHOST_USER_VRING_NOFD_MASK) {
> > -        return 0;
> > +        return false;
> >      }
> >  
> >      /* Sanity check. */
> >      if (area->size != page_size) {
> > -        return -1;
> > +        return true;
> >      }
> >  
> >      addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
> >                  fd, area->offset);
> >      if (addr == MAP_FAILED) {
> > -        return -1;
> > +        return true;
> >      }
> >  
> >      name = g_strdup_printf("vhost-user/host-notifier@%p mmaps[%d]",
> > @@ -1471,13 +1472,13 @@ static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
> >  
> >      if (virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true)) {
> >          munmap(addr, page_size);
> > -        return -1;
> > +        return true;
> >      }
> >  
> >      n->addr = addr;
> >      n->set = true;
> >  
> > -    return 0;
> > +    return false;
> >  }
> >  
> >  static void close_slave_channel(struct vhost_user *u)
> > @@ -1498,7 +1499,7 @@ static gboolean slave_read(QIOChannel *ioc, GIOCondition condition,
> >      VhostUserPayload payload = { 0, };
> >      Error *local_err = NULL;
> >      gboolean rc = G_SOURCE_CONTINUE;
> > -    int ret = 0;
> > +    uint64_t ret = 0;
> >      struct iovec iov;
> >      g_autofree int *fd = NULL;
> >      size_t fdsize = 0;
> > @@ -1539,7 +1540,7 @@ static gboolean slave_read(QIOChannel *ioc, GIOCondition condition,
> >          break;
> >      default:
> >          error_report("Received unexpected msg type: %d.", hdr.request);
> > -        ret = -EINVAL;
> > +        ret = true;
> >      }
> >  
> >      /*
> > @@ -1553,7 +1554,7 @@ static gboolean slave_read(QIOChannel *ioc, GIOCondition condition,
> >          hdr.flags &= ~VHOST_USER_NEED_REPLY_MASK;
> >          hdr.flags |= VHOST_USER_REPLY_MASK;
> >  
> > -        payload.u64 = !!ret;
> > +        payload.u64 = ret;
> >          hdr.size = sizeof(payload.u64);
> >  
> >          iovec[0].iov_base = &hdr;
> > diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> > index 8a6f8e2a7a..64ac6b6444 100644
> > --- a/include/hw/virtio/vhost-backend.h
> > +++ b/include/hw/virtio/vhost-backend.h
> > @@ -186,7 +186,7 @@ int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
> >  int vhost_backend_invalidate_device_iotlb(struct vhost_dev *dev,
> >                                                   uint64_t iova, uint64_t len);
> >  
> > -int vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
> > +uint64_t vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
> >                                            struct vhost_iotlb_msg *imsg);
> >  
> >  int vhost_user_gpu_set_socket(struct vhost_dev *dev, int fd);
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Virtio-fs] [PATCH v2 08/25] DAX: virtio-fs: Add vhost-user slave commands for mapping
  2021-04-14 16:35   ` [Virtio-fs] " Greg Kurz
@ 2021-04-21 17:49     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert @ 2021-04-21 17:49 UTC (permalink / raw)
  To: Greg Kurz; +Cc: virtio-fs, qemu-devel, stefanha, vgoyal

* Greg Kurz (groug@kaod.org) wrote:
> On Wed, 14 Apr 2021 16:51:20 +0100
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> 
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > The daemon may request that fd's be mapped into the virtio-fs cache
> > visible to the guest.
> > These mappings are triggered by commands sent over the slave fd
> > from the daemon.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  docs/interop/vhost-user.rst               | 21 ++++++++
> >  hw/virtio/vhost-user-fs.c                 | 66 +++++++++++++++++++++++
> >  hw/virtio/vhost-user.c                    | 25 +++++++++
> >  include/hw/virtio/vhost-user-fs.h         | 33 ++++++++++++
> >  subprojects/libvhost-user/libvhost-user.h |  2 +
> >  5 files changed, 147 insertions(+)
> > 
> > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> > index d6085f7045..09aee3565d 100644
> > --- a/docs/interop/vhost-user.rst
> > +++ b/docs/interop/vhost-user.rst
> > @@ -1432,6 +1432,27 @@ Slave message types
> >  
> >    The state.num field is currently reserved and must be set to 0.
> >  
> > +``VHOST_USER_SLAVE_FS_MAP``
> > +  :id: 6
> > +  :equivalent ioctl: N/A
> > +  :slave payload: ``struct VhostUserFSSlaveMsg``
> > +  :master payload: N/A
> > +
> > +  Requests that an fd, provided in the ancillary data, be mmapped
> > +  into the virtio-fs cache; multiple chunks can be mapped in one
> > +  command.
> > +  A reply is generated indicating whether mapping succeeded.
> > +
> > +``VHOST_USER_SLAVE_FS_UNMAP``
> > +  :id: 7
> > +  :equivalent ioctl: N/A
> > +  :slave payload: ``struct VhostUserFSSlaveMsg``
> > +  :master payload: N/A
> > +
> > +  Requests that the range in the virtio-fs cache is unmapped;
> > +  multiple chunks can be unmapped in one command.
> > +  A reply is generated indicating whether unmapping succeeded.
> > +
> >  .. _reply_ack:
> >  
> >  VHOST_USER_PROTOCOL_F_REPLY_ACK
> > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> > index dd0a02aa99..169a146e72 100644
> > --- a/hw/virtio/vhost-user-fs.c
> > +++ b/hw/virtio/vhost-user-fs.c
> > @@ -45,6 +45,72 @@ static const int user_feature_bits[] = {
> >  #define DAX_WINDOW_PROT PROT_NONE
> >  #endif
> >  
> > +/*
> > + * The message apparently had 'received_size' bytes, check this
> > + * matches the count in the message.
> > + *
> > + * Returns true if the size matches.
> > + */
> > +static bool check_slave_message_entries(const VhostUserFSSlaveMsg *sm,
> > +                                        int received_size)
> > +{
> > +    int tmp;
> > +
> > +    /*
> > +     * VhostUserFSSlaveMsg consists of a body followed by 'n' entries,
> > +     * (each VhostUserFSSlaveMsgEntry).  There's a maximum of
> > +     * VHOST_USER_FS_SLAVE_MAX_ENTRIES of these.
> > +     */
> > +    if (received_size <= sizeof(VhostUserFSSlaveMsg)) {
> > +        error_report("%s: Short VhostUserFSSlaveMsg size, %d", __func__,
> > +                     received_size);
> > +        return false;
> > +    }
> > +
> > +    tmp = received_size - sizeof(VhostUserFSSlaveMsg);
> > +    if (tmp % sizeof(VhostUserFSSlaveMsgEntry)) {
> > +        error_report("%s: Non-multiple VhostUserFSSlaveMsg size, %d", __func__,
> > +                     received_size);
> > +        return false;
> > +    }
> > +
> > +    tmp /= sizeof(VhostUserFSSlaveMsgEntry);
> > +    if (tmp != sm->count) {
> > +        error_report("%s: VhostUserFSSlaveMsg count mismatch, %d count: %d",
> > +                     __func__, tmp, sm->count);
> > +        return false;
> > +    }
> > +
> > +    if (sm->count > VHOST_USER_FS_SLAVE_MAX_ENTRIES) {
> > +        error_report("%s: VhostUserFSSlaveMsg too many entries: %d",
> > +                     __func__, sm->count);
> > +        return false;
> > +    }
> > +    return true;
> > +}
> > +
> > +uint64_t vhost_user_fs_slave_map(struct vhost_dev *dev, int message_size,
> > +                                 VhostUserFSSlaveMsg *sm, int fd)
> > +{
> > +    if (!check_slave_message_entries(sm, message_size)) {
> > +        return (uint64_t)-1;
> > +    }
> > +
> > +    /* TODO */
> > +    return (uint64_t)-1;
> > +}
> > +
> > +uint64_t vhost_user_fs_slave_unmap(struct vhost_dev *dev, int message_size,
> > +                                   VhostUserFSSlaveMsg *sm)
> > +{
> > +    if (!check_slave_message_entries(sm, message_size)) {
> > +        return (uint64_t)-1;
> > +    }
> > +
> > +    /* TODO */
> > +    return (uint64_t)-1;
> > +}
> > +
> >  static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
> >  {
> >      VHostUserFS *fs = VHOST_USER_FS(vdev);
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index 3e4a25e108..ad9170f8dc 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -12,6 +12,7 @@
> >  #include "qapi/error.h"
> >  #include "hw/virtio/vhost.h"
> >  #include "hw/virtio/vhost-user.h"
> > +#include "hw/virtio/vhost-user-fs.h"
> >  #include "hw/virtio/vhost-backend.h"
> >  #include "hw/virtio/virtio.h"
> >  #include "hw/virtio/virtio-net.h"
> > @@ -133,6 +134,10 @@ typedef enum VhostUserSlaveRequest {
> >      VHOST_USER_SLAVE_IOTLB_MSG = 1,
> >      VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
> >      VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
> > +    VHOST_USER_SLAVE_VRING_CALL = 4,
> > +    VHOST_USER_SLAVE_VRING_ERR = 5,
> > +    VHOST_USER_SLAVE_FS_MAP = 6,
> > +    VHOST_USER_SLAVE_FS_UNMAP = 7,
> >      VHOST_USER_SLAVE_MAX
> >  }  VhostUserSlaveRequest;
> >  
> > @@ -205,6 +210,16 @@ typedef struct {
> >      uint32_t size; /* the following payload size */
> >  } QEMU_PACKED VhostUserHeader;
> >  
> > +/*
> > + * VhostUserFSSlaveMsg is special since it has a variable entry count,
> > + * but it does have a maximum, so make a type for that to fit in our union
> > + * for max size.
> > + */
> > +typedef struct {
> > +    VhostUserFSSlaveMsg fs;
> > +    VhostUserFSSlaveMsgEntry entries[VHOST_USER_FS_SLAVE_MAX_ENTRIES];
> > +} QEMU_PACKED VhostUserFSSlaveMsgMax;
> > +
> >  typedef union {
> >  #define VHOST_USER_VRING_IDX_MASK   (0xff)
> >  #define VHOST_USER_VRING_NOFD_MASK  (0x1<<8)
> > @@ -219,6 +234,8 @@ typedef union {
> >          VhostUserCryptoSession session;
> >          VhostUserVringArea area;
> >          VhostUserInflight inflight;
> > +        VhostUserFSSlaveMsg fs;
> > +        VhostUserFSSlaveMsg fs_max; /* Never actually used */
> >  } VhostUserPayload;
> >  
> >  typedef struct VhostUserMsg {
> > @@ -1538,6 +1555,14 @@ static gboolean slave_read(QIOChannel *ioc, GIOCondition condition,
> >          ret = vhost_user_slave_handle_vring_host_notifier(dev, &payload.area,
> >                                                            fd ? fd[0] : -1);
> >          break;
> > +#ifdef CONFIG_VHOST_USER_FS
> > +    case VHOST_USER_SLAVE_FS_MAP:
> > +        ret = vhost_user_fs_slave_map(dev, hdr.size, &payload.fs, fd[0]);
> 
> Since the QIOChannel conversion, the array of fds is dynamically allocated
> by qio_channel_readv_full_all() when needed. You should do "fd ? fd[0] : -1"
> like for vhost_user_slave_handle_vring_host_notifier() just above.
> 
> Same goes for patch 17 "DAX/unmap: virtiofsd: Add  VHOST_USER_SLAVE_FS_IO".

Thanks; fixed!

Dave

> > +        break;
> > +    case VHOST_USER_SLAVE_FS_UNMAP:
> > +        ret = vhost_user_fs_slave_unmap(dev, hdr.size, &payload.fs);
> > +        break;
> > +#endif
> >      default:
> >          error_report("Received unexpected msg type: %d.", hdr.request);
> >          ret = true;
> > diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
> > index 04596799e3..0766f17548 100644
> > --- a/include/hw/virtio/vhost-user-fs.h
> > +++ b/include/hw/virtio/vhost-user-fs.h
> > @@ -23,6 +23,33 @@
> >  #define TYPE_VHOST_USER_FS "vhost-user-fs-device"
> >  OBJECT_DECLARE_SIMPLE_TYPE(VHostUserFS, VHOST_USER_FS)
> >  
> > +/* Structures carried over the slave channel back to QEMU */
> > +#define VHOST_USER_FS_SLAVE_MAX_ENTRIES 32
> > +
> > +/* For the flags field of VhostUserFSSlaveMsg */
> > +#define VHOST_USER_FS_FLAG_MAP_R (1u << 0)
> > +#define VHOST_USER_FS_FLAG_MAP_W (1u << 1)
> > +
> > +typedef struct {
> > +    /* Offsets within the file being mapped */
> > +    uint64_t fd_offset;
> > +    /* Offsets within the cache */
> > +    uint64_t c_offset;
> > +    /* Lengths of sections */
> > +    uint64_t len;
> > +    /* Flags, from VHOST_USER_FS_FLAG_* */
> > +    uint64_t flags;
> > +} VhostUserFSSlaveMsgEntry;
> > +
> > +typedef struct {
> > +    /* Number of entries */
> > +    uint16_t count;
> > +    /* Spare */
> > +    uint16_t align;
> > +
> > +    VhostUserFSSlaveMsgEntry entries[];
> > +} VhostUserFSSlaveMsg;
> > +
> >  typedef struct {
> >      CharBackend chardev;
> >      char *tag;
> > @@ -46,4 +73,10 @@ struct VHostUserFS {
> >      MemoryRegion cache;
> >  };
> >  
> > +/* Callbacks from the vhost-user code for slave commands */
> > +uint64_t vhost_user_fs_slave_map(struct vhost_dev *dev, int message_size,
> > +                                 VhostUserFSSlaveMsg *sm, int fd);
> > +uint64_t vhost_user_fs_slave_unmap(struct vhost_dev *dev, int message_size,
> > +                                   VhostUserFSSlaveMsg *sm);
> > +
> >  #endif /* _QEMU_VHOST_USER_FS_H */
> > diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
> > index 70fc61171f..a98c5f5c11 100644
> > --- a/subprojects/libvhost-user/libvhost-user.h
> > +++ b/subprojects/libvhost-user/libvhost-user.h
> > @@ -119,6 +119,8 @@ typedef enum VhostUserSlaveRequest {
> >      VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
> >      VHOST_USER_SLAVE_VRING_CALL = 4,
> >      VHOST_USER_SLAVE_VRING_ERR = 5,
> > +    VHOST_USER_SLAVE_FS_MAP = 6,
> > +    VHOST_USER_SLAVE_FS_UNMAP = 7,
> >      VHOST_USER_SLAVE_MAX
> >  }  VhostUserSlaveRequest;
> >  
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Virtio-fs] [PATCH v2 17/25] DAX/unmap: virtiofsd: Add VHOST_USER_SLAVE_FS_IO
  2021-04-14 15:51 ` [PATCH v2 17/25] DAX/unmap: virtiofsd: Add VHOST_USER_SLAVE_FS_IO Dr. David Alan Gilbert (git)
@ 2021-04-21 20:07   ` Vivek Goyal
  2021-04-22  9:29     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 34+ messages in thread
From: Vivek Goyal @ 2021-04-21 20:07 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: virtio-fs, qemu-devel, stefanha

On Wed, Apr 14, 2021 at 04:51:29PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Define a new slave command 'VHOST_USER_SLAVE_FS_IO' for a
> client to ask qemu to perform a read/write from an fd directly
> to GPA.

Hi David,

Today I came across process_vm_readv() and process_vm_writev().

https://man7.org/linux/man-pages/man2/process_vm_readv.2.html

I am wondering if we can use these to read from/write to qemu address
space (DAX window, which virtiofsd does not have access to).

So for the case of reading from DAX window into an fd, we probably
will have to first read from qemu DAX window to virtiofsd local memory
and then do a writev(fd).

Don't know if this approach is faster/slower as compared to sending
a vhost-user message to qemu.

Thanks
Vivek


> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  docs/interop/vhost-user.rst               | 16 ++++
>  hw/virtio/trace-events                    |  6 ++
>  hw/virtio/vhost-user-fs.c                 | 95 +++++++++++++++++++++++
>  hw/virtio/vhost-user.c                    |  4 +
>  include/hw/virtio/vhost-user-fs.h         |  2 +
>  subprojects/libvhost-user/libvhost-user.h |  1 +
>  6 files changed, 124 insertions(+)
> 
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index 09aee3565d..2fa62ea451 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -1453,6 +1453,22 @@ Slave message types
>    multiple chunks can be unmapped in one command.
>    A reply is generated indicating whether unmapping succeeded.
>  
> +``VHOST_USER_SLAVE_FS_IO``
> +  :id: 9
> +  :equivalent ioctl: N/A
> +  :slave payload: ``struct VhostUserFSSlaveMsg``
> +  :master payload: N/A
> +
> +  Requests that IO be performed directly from an fd, passed in ancillary
> +  data, to guest memory on behalf of the daemon; this is normally for a
> +  case where a memory region isn't visible to the daemon. slave payload
> +  has flags which determine the direction of IO operation.
> +
> +  The ``VHOST_USER_FS_FLAG_MAP_R`` flag must be set in the ``flags`` field to
> +  read from the file into RAM.
> +  The ``VHOST_USER_FS_FLAG_MAP_W`` flag must be set in the ``flags`` field to
> +  write to the file from RAM.
> +
>  .. _reply_ack:
>  
>  VHOST_USER_PROTOCOL_F_REPLY_ACK
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index c62727f879..20557a078e 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -53,6 +53,12 @@ vhost_vdpa_get_features(void *dev, uint64_t features) "dev: %p features: 0x%"PRI
>  vhost_vdpa_set_owner(void *dev) "dev: %p"
>  vhost_vdpa_vq_get_addr(void *dev, void *vq, uint64_t desc_user_addr, uint64_t avail_user_addr, uint64_t used_user_addr) "dev: %p vq: %p desc_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64
>  
> +# vhost-user-fs.c
> +
> +vhost_user_fs_slave_io_loop(const char *name, uint64_t owr, int is_ram, int is_romd, size_t size) "region %s with internal offset 0x%"PRIx64 " ram=%d romd=%d mrs.size=%zd"
> +vhost_user_fs_slave_io_loop_res(ssize_t transferred) "%zd"
> +vhost_user_fs_slave_io_exit(int res, size_t done) "res: %d done: %zd"
> +
>  # virtio.c
>  virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
>  virtqueue_fill(void *vq, const void *elem, unsigned int len, unsigned int idx) "vq %p elem %p len %u idx %u"
> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> index 963f694435..5511838f29 100644
> --- a/hw/virtio/vhost-user-fs.c
> +++ b/hw/virtio/vhost-user-fs.c
> @@ -23,6 +23,8 @@
>  #include "hw/virtio/vhost-user-fs.h"
>  #include "monitor/monitor.h"
>  #include "sysemu/sysemu.h"
> +#include "exec/address-spaces.h"
> +#include "trace.h"
>  
>  static const int user_feature_bits[] = {
>      VIRTIO_F_VERSION_1,
> @@ -220,6 +222,99 @@ uint64_t vhost_user_fs_slave_unmap(struct vhost_dev *dev, int message_size,
>      return (uint64_t)res;
>  }
>  
> +uint64_t vhost_user_fs_slave_io(struct vhost_dev *dev, int message_size,
> +                                VhostUserFSSlaveMsg *sm, int fd)
> +{
> +    VHostUserFS *fs = (VHostUserFS *)object_dynamic_cast(OBJECT(dev->vdev),
> +                          TYPE_VHOST_USER_FS);
> +    if (!fs) {
> +        error_report("%s: Bad fs ptr", __func__);
> +        return (uint64_t)-1;
> +    }
> +    if (!check_slave_message_entries(sm, message_size)) {
> +        return (uint64_t)-1;
> +    }
> +
> +    unsigned int i;
> +    int res = 0;
> +    size_t done = 0;
> +
> +    if (fd < 0) {
> +        error_report("Bad fd for map");
> +        return (uint64_t)-1;
> +    }
> +
> +    for (i = 0; i < sm->count && !res; i++) {
> +        VhostUserFSSlaveMsgEntry *e = &sm->entries[i];
> +        if (e->len == 0) {
> +            continue;
> +        }
> +
> +        size_t len = e->len;
> +        uint64_t fd_offset = e->fd_offset;
> +        hwaddr gpa = e->c_offset;
> +
> +        while (len && !res) {
> +            hwaddr xlat, xlat_len;
> +            bool is_write = e->flags & VHOST_USER_FS_FLAG_MAP_W;
> +            MemoryRegion *mr = address_space_translate(dev->vdev->dma_as, gpa,
> +                                                       &xlat, &xlat_len,
> +                                                       is_write,
> +                                                       MEMTXATTRS_UNSPECIFIED);
> +            if (!mr || !xlat_len) {
> +                error_report("No guest region found for 0x%" HWADDR_PRIx, gpa);
> +                res = -EFAULT;
> +                break;
> +            }
> +
> +            trace_vhost_user_fs_slave_io_loop(mr->name,
> +                                          (uint64_t)xlat,
> +                                          memory_region_is_ram(mr),
> +                                          memory_region_is_romd(mr),
> +                                          (size_t)xlat_len);
> +
> +            void *hostptr = qemu_map_ram_ptr(mr->ram_block,
> +                                             xlat);
> +            ssize_t transferred;
> +            if (e->flags & VHOST_USER_FS_FLAG_MAP_R) {
> +                /* Read from file into RAM */
> +                if (mr->readonly) {
> +                    res = -EFAULT;
> +                    break;
> +                }
> +                transferred = pread(fd, hostptr, xlat_len, fd_offset);
> +            } else if (e->flags & VHOST_USER_FS_FLAG_MAP_W) {
> +                /* Write into file from RAM */
> +                transferred = pwrite(fd, hostptr, xlat_len, fd_offset);
> +            } else {
> +                transferred = EINVAL;
> +            }
> +
> +            trace_vhost_user_fs_slave_io_loop_res(transferred);
> +            if (transferred < 0) {
> +                res = -errno;
> +                break;
> +            }
> +            if (!transferred) {
> +                /* EOF */
> +                break;
> +            }
> +
> +            done += transferred;
> +            fd_offset += transferred;
> +            gpa += transferred;
> +            len -= transferred;
> +        }
> +    }
> +    close(fd);
> +
> +    trace_vhost_user_fs_slave_io_exit(res, done);
> +    if (res < 0) {
> +        return (uint64_t)res;
> +    }
> +    return (uint64_t)done;
> +}
> +
>  static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
>  {
>      VHostUserFS *fs = VHOST_USER_FS(vdev);
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index ad9170f8dc..b9699586ae 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -138,6 +138,7 @@ typedef enum VhostUserSlaveRequest {
>      VHOST_USER_SLAVE_VRING_ERR = 5,
>      VHOST_USER_SLAVE_FS_MAP = 6,
>      VHOST_USER_SLAVE_FS_UNMAP = 7,
> +    VHOST_USER_SLAVE_FS_IO = 8,
>      VHOST_USER_SLAVE_MAX
>  }  VhostUserSlaveRequest;
>  
> @@ -1562,6 +1563,9 @@ static gboolean slave_read(QIOChannel *ioc, GIOCondition condition,
>      case VHOST_USER_SLAVE_FS_UNMAP:
>          ret = vhost_user_fs_slave_unmap(dev, hdr.size, &payload.fs);
>          break;
> +    case VHOST_USER_SLAVE_FS_IO:
> +        ret = vhost_user_fs_slave_io(dev, hdr.size, &payload.fs, fd[0]);
> +        break;
>  #endif
>      default:
>          error_report("Received unexpected msg type: %d.", hdr.request);
> diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
> index 0766f17548..2931164e23 100644
> --- a/include/hw/virtio/vhost-user-fs.h
> +++ b/include/hw/virtio/vhost-user-fs.h
> @@ -78,5 +78,7 @@ uint64_t vhost_user_fs_slave_map(struct vhost_dev *dev, int message_size,
>                                   VhostUserFSSlaveMsg *sm, int fd);
>  uint64_t vhost_user_fs_slave_unmap(struct vhost_dev *dev, int message_size,
>                                     VhostUserFSSlaveMsg *sm);
> +uint64_t vhost_user_fs_slave_io(struct vhost_dev *dev, int message_size,
> +                                VhostUserFSSlaveMsg *sm, int fd);
>  
>  #endif /* _QEMU_VHOST_USER_FS_H */
> diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
> index a98c5f5c11..42b0833c4b 100644
> --- a/subprojects/libvhost-user/libvhost-user.h
> +++ b/subprojects/libvhost-user/libvhost-user.h
> @@ -121,6 +121,7 @@ typedef enum VhostUserSlaveRequest {
>      VHOST_USER_SLAVE_VRING_ERR = 5,
>      VHOST_USER_SLAVE_FS_MAP = 6,
>      VHOST_USER_SLAVE_FS_UNMAP = 7,
> +    VHOST_USER_SLAVE_FS_IO = 8,
>      VHOST_USER_SLAVE_MAX
>  }  VhostUserSlaveRequest;
>  
> -- 
> 2.31.1
> 
> _______________________________________________
> Virtio-fs mailing list
> Virtio-fs@redhat.com
> https://listman.redhat.com/mailman/listinfo/virtio-fs



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Virtio-fs] [PATCH v2 17/25] DAX/unmap: virtiofsd: Add VHOST_USER_SLAVE_FS_IO
  2021-04-21 20:07   ` [Virtio-fs] " Vivek Goyal
@ 2021-04-22  9:29     ` Dr. David Alan Gilbert
  2021-04-22 15:40       ` Vivek Goyal
  0 siblings, 1 reply; 34+ messages in thread
From: Dr. David Alan Gilbert @ 2021-04-22  9:29 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel, stefanha

* Vivek Goyal (vgoyal@redhat.com) wrote:
> On Wed, Apr 14, 2021 at 04:51:29PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Define a new slave command 'VHOST_USER_SLAVE_FS_IO' for a
> > client to ask qemu to perform a read/write from an fd directly
> > to GPA.
> 
> Hi David,
> 
> Today I came across process_vm_readv() and process_vm_writev().
> 
> https://man7.org/linux/man-pages/man2/process_vm_readv.2.html

Yes, I just saw these (reading the lwn article on process_vm_exec)

> I am wondering if we can use these to read from/write to qemu address
> space (DAX window, which virtiofsd does not have access to).
> 
> So for the case of reading from DAX window into an fd, we probably
> will have to first read from qemu DAX window to virtiofsd local memory
> and then do a writev(fd).
> 
> Don't know if this approach is faster/slower as compared to sending
> a vhost-user message to qemu.

I think there are some other problems as well:
  a) I'm not sure the permissions currently work out - I think it's
saying you need to either have CAP_SYS_PTRACE or the same uid as the
other process; and I don't think that's normally the relationship
between the daemon and the qemu.

  b) The virtiofsd would have to know something about the addresses in
qemu's address space, rather than currently only knowing guest
addresses.

  c) No one said that mapping is linear/simple, especially for an area
where an fd wasn't passed for shared memory.

Also, this interface does protect qemu from the daemon writing to
arbitrary qemu data strctures.

Still, those interfaces do sound attractive for something - just not
quite figured out what.

Dave



> Thanks
> Vivek
> 
> 
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  docs/interop/vhost-user.rst               | 16 ++++
> >  hw/virtio/trace-events                    |  6 ++
> >  hw/virtio/vhost-user-fs.c                 | 95 +++++++++++++++++++++++
> >  hw/virtio/vhost-user.c                    |  4 +
> >  include/hw/virtio/vhost-user-fs.h         |  2 +
> >  subprojects/libvhost-user/libvhost-user.h |  1 +
> >  6 files changed, 124 insertions(+)
> > 
> > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> > index 09aee3565d..2fa62ea451 100644
> > --- a/docs/interop/vhost-user.rst
> > +++ b/docs/interop/vhost-user.rst
> > @@ -1453,6 +1453,22 @@ Slave message types
> >    multiple chunks can be unmapped in one command.
> >    A reply is generated indicating whether unmapping succeeded.
> >  
> > +``VHOST_USER_SLAVE_FS_IO``
> > +  :id: 9
> > +  :equivalent ioctl: N/A
> > +  :slave payload: ``struct VhostUserFSSlaveMsg``
> > +  :master payload: N/A
> > +
> > +  Requests that IO be performed directly from an fd, passed in ancillary
> > +  data, to guest memory on behalf of the daemon; this is normally for a
> > +  case where a memory region isn't visible to the daemon. slave payload
> > +  has flags which determine the direction of IO operation.
> > +
> > +  The ``VHOST_USER_FS_FLAG_MAP_R`` flag must be set in the ``flags`` field to
> > +  read from the file into RAM.
> > +  The ``VHOST_USER_FS_FLAG_MAP_W`` flag must be set in the ``flags`` field to
> > +  write to the file from RAM.
> > +
> >  .. _reply_ack:
> >  
> >  VHOST_USER_PROTOCOL_F_REPLY_ACK
> > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > index c62727f879..20557a078e 100644
> > --- a/hw/virtio/trace-events
> > +++ b/hw/virtio/trace-events
> > @@ -53,6 +53,12 @@ vhost_vdpa_get_features(void *dev, uint64_t features) "dev: %p features: 0x%"PRI
> >  vhost_vdpa_set_owner(void *dev) "dev: %p"
> >  vhost_vdpa_vq_get_addr(void *dev, void *vq, uint64_t desc_user_addr, uint64_t avail_user_addr, uint64_t used_user_addr) "dev: %p vq: %p desc_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64
> >  
> > +# vhost-user-fs.c
> > +
> > +vhost_user_fs_slave_io_loop(const char *name, uint64_t owr, int is_ram, int is_romd, size_t size) "region %s with internal offset 0x%"PRIx64 " ram=%d romd=%d mrs.size=%zd"
> > +vhost_user_fs_slave_io_loop_res(ssize_t transferred) "%zd"
> > +vhost_user_fs_slave_io_exit(int res, size_t done) "res: %d done: %zd"
> > +
> >  # virtio.c
> >  virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
> >  virtqueue_fill(void *vq, const void *elem, unsigned int len, unsigned int idx) "vq %p elem %p len %u idx %u"
> > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> > index 963f694435..5511838f29 100644
> > --- a/hw/virtio/vhost-user-fs.c
> > +++ b/hw/virtio/vhost-user-fs.c
> > @@ -23,6 +23,8 @@
> >  #include "hw/virtio/vhost-user-fs.h"
> >  #include "monitor/monitor.h"
> >  #include "sysemu/sysemu.h"
> > +#include "exec/address-spaces.h"
> > +#include "trace.h"
> >  
> >  static const int user_feature_bits[] = {
> >      VIRTIO_F_VERSION_1,
> > @@ -220,6 +222,99 @@ uint64_t vhost_user_fs_slave_unmap(struct vhost_dev *dev, int message_size,
> >      return (uint64_t)res;
> >  }
> >  
> > +uint64_t vhost_user_fs_slave_io(struct vhost_dev *dev, int message_size,
> > +                                VhostUserFSSlaveMsg *sm, int fd)
> > +{
> > +    VHostUserFS *fs = (VHostUserFS *)object_dynamic_cast(OBJECT(dev->vdev),
> > +                          TYPE_VHOST_USER_FS);
> > +    if (!fs) {
> > +        error_report("%s: Bad fs ptr", __func__);
> > +        return (uint64_t)-1;
> > +    }
> > +    if (!check_slave_message_entries(sm, message_size)) {
> > +        return (uint64_t)-1;
> > +    }
> > +
> > +    unsigned int i;
> > +    int res = 0;
> > +    size_t done = 0;
> > +
> > +    if (fd < 0) {
> > +        error_report("Bad fd for map");
> > +        return (uint64_t)-1;
> > +    }
> > +
> > +    for (i = 0; i < sm->count && !res; i++) {
> > +        VhostUserFSSlaveMsgEntry *e = &sm->entries[i];
> > +        if (e->len == 0) {
> > +            continue;
> > +        }
> > +
> > +        size_t len = e->len;
> > +        uint64_t fd_offset = e->fd_offset;
> > +        hwaddr gpa = e->c_offset;
> > +
> > +        while (len && !res) {
> > +            hwaddr xlat, xlat_len;
> > +            bool is_write = e->flags & VHOST_USER_FS_FLAG_MAP_W;
> > +            MemoryRegion *mr = address_space_translate(dev->vdev->dma_as, gpa,
> > +                                                       &xlat, &xlat_len,
> > +                                                       is_write,
> > +                                                       MEMTXATTRS_UNSPECIFIED);
> > +            if (!mr || !xlat_len) {
> > +                error_report("No guest region found for 0x%" HWADDR_PRIx, gpa);
> > +                res = -EFAULT;
> > +                break;
> > +            }
> > +
> > +            trace_vhost_user_fs_slave_io_loop(mr->name,
> > +                                          (uint64_t)xlat,
> > +                                          memory_region_is_ram(mr),
> > +                                          memory_region_is_romd(mr),
> > +                                          (size_t)xlat_len);
> > +
> > +            void *hostptr = qemu_map_ram_ptr(mr->ram_block,
> > +                                             xlat);
> > +            ssize_t transferred;
> > +            if (e->flags & VHOST_USER_FS_FLAG_MAP_R) {
> > +                /* Read from file into RAM */
> > +                if (mr->readonly) {
> > +                    res = -EFAULT;
> > +                    break;
> > +                }
> > +                transferred = pread(fd, hostptr, xlat_len, fd_offset);
> > +            } else if (e->flags & VHOST_USER_FS_FLAG_MAP_W) {
> > +                /* Write into file from RAM */
> > +                transferred = pwrite(fd, hostptr, xlat_len, fd_offset);
> > +            } else {
> > +                transferred = EINVAL;
> > +            }
> > +
> > +            trace_vhost_user_fs_slave_io_loop_res(transferred);
> > +            if (transferred < 0) {
> > +                res = -errno;
> > +                break;
> > +            }
> > +            if (!transferred) {
> > +                /* EOF */
> > +                break;
> > +            }
> > +
> > +            done += transferred;
> > +            fd_offset += transferred;
> > +            gpa += transferred;
> > +            len -= transferred;
> > +        }
> > +    }
> > +    close(fd);
> > +
> > +    trace_vhost_user_fs_slave_io_exit(res, done);
> > +    if (res < 0) {
> > +        return (uint64_t)res;
> > +    }
> > +    return (uint64_t)done;
> > +}
> > +
> >  static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
> >  {
> >      VHostUserFS *fs = VHOST_USER_FS(vdev);
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index ad9170f8dc..b9699586ae 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -138,6 +138,7 @@ typedef enum VhostUserSlaveRequest {
> >      VHOST_USER_SLAVE_VRING_ERR = 5,
> >      VHOST_USER_SLAVE_FS_MAP = 6,
> >      VHOST_USER_SLAVE_FS_UNMAP = 7,
> > +    VHOST_USER_SLAVE_FS_IO = 8,
> >      VHOST_USER_SLAVE_MAX
> >  }  VhostUserSlaveRequest;
> >  
> > @@ -1562,6 +1563,9 @@ static gboolean slave_read(QIOChannel *ioc, GIOCondition condition,
> >      case VHOST_USER_SLAVE_FS_UNMAP:
> >          ret = vhost_user_fs_slave_unmap(dev, hdr.size, &payload.fs);
> >          break;
> > +    case VHOST_USER_SLAVE_FS_IO:
> > +        ret = vhost_user_fs_slave_io(dev, hdr.size, &payload.fs, fd[0]);
> > +        break;
> >  #endif
> >      default:
> >          error_report("Received unexpected msg type: %d.", hdr.request);
> > diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
> > index 0766f17548..2931164e23 100644
> > --- a/include/hw/virtio/vhost-user-fs.h
> > +++ b/include/hw/virtio/vhost-user-fs.h
> > @@ -78,5 +78,7 @@ uint64_t vhost_user_fs_slave_map(struct vhost_dev *dev, int message_size,
> >                                   VhostUserFSSlaveMsg *sm, int fd);
> >  uint64_t vhost_user_fs_slave_unmap(struct vhost_dev *dev, int message_size,
> >                                     VhostUserFSSlaveMsg *sm);
> > +uint64_t vhost_user_fs_slave_io(struct vhost_dev *dev, int message_size,
> > +                                VhostUserFSSlaveMsg *sm, int fd);
> >  
> >  #endif /* _QEMU_VHOST_USER_FS_H */
> > diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
> > index a98c5f5c11..42b0833c4b 100644
> > --- a/subprojects/libvhost-user/libvhost-user.h
> > +++ b/subprojects/libvhost-user/libvhost-user.h
> > @@ -121,6 +121,7 @@ typedef enum VhostUserSlaveRequest {
> >      VHOST_USER_SLAVE_VRING_ERR = 5,
> >      VHOST_USER_SLAVE_FS_MAP = 6,
> >      VHOST_USER_SLAVE_FS_UNMAP = 7,
> > +    VHOST_USER_SLAVE_FS_IO = 8,
> >      VHOST_USER_SLAVE_MAX
> >  }  VhostUserSlaveRequest;
> >  
> > -- 
> > 2.31.1
> > 
> > _______________________________________________
> > Virtio-fs mailing list
> > Virtio-fs@redhat.com
> > https://listman.redhat.com/mailman/listinfo/virtio-fs
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Virtio-fs] [PATCH v2 17/25] DAX/unmap: virtiofsd: Add VHOST_USER_SLAVE_FS_IO
  2021-04-22  9:29     ` Dr. David Alan Gilbert
@ 2021-04-22 15:40       ` Vivek Goyal
  2021-04-22 15:48         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 34+ messages in thread
From: Vivek Goyal @ 2021-04-22 15:40 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: virtio-fs, qemu-devel, stefanha

On Thu, Apr 22, 2021 at 10:29:08AM +0100, Dr. David Alan Gilbert wrote:
> * Vivek Goyal (vgoyal@redhat.com) wrote:
> > On Wed, Apr 14, 2021 at 04:51:29PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > Define a new slave command 'VHOST_USER_SLAVE_FS_IO' for a
> > > client to ask qemu to perform a read/write from an fd directly
> > > to GPA.
> > 
> > Hi David,
> > 
> > Today I came across process_vm_readv() and process_vm_writev().
> > 
> > https://man7.org/linux/man-pages/man2/process_vm_readv.2.html
> 
> Yes, I just saw these (reading the lwn article on process_vm_exec)
> 
> > I am wondering if we can use these to read from/write to qemu address
> > space (DAX window, which virtiofsd does not have access to).
> > 
> > So for the case of reading from DAX window into an fd, we probably
> > will have to first read from qemu DAX window to virtiofsd local memory
> > and then do a writev(fd).
> > 
> > Don't know if this approach is faster/slower as compared to sending
> > a vhost-user message to qemu.
> 
> I think there are some other problems as well:
>   a) I'm not sure the permissions currently work out - I think it's
> saying you need to either have CAP_SYS_PTRACE or the same uid as the
> other process; and I don't think that's normally the relationship
> between the daemon and the qemu.

That's a good point. We don't give CAP_SYS_PTRACE yet. May be if
qemu and virtiofsd run in same user namespace then giving CAP_SYS_PTRACE
might not be a bad idea (hopefully these interfaces work if CAP_SYS_PTRACE
is not given in init_user_ns).

We don't have a plan to run virtiofsd as user "qemu". So that will not
work well.

> 
>   b) The virtiofsd would have to know something about the addresses in
> qemu's address space, rather than currently only knowing guest
> addresses.

Yes. But that could be one time message exchange to know where the
DAX window is in qemu address space?

> 
>   c) No one said that mapping is linear/simple, especially for an area
> where an fd wasn't passed for shared memory.

I don't understand this point. DAX window mapping is linear in 
qemu address space, right? And these new interfaces are only doing
two types of I/O/

- Read from fd and write to DAX window
- Read from DAX window and write to fd.

So it is assumed fd is there. 

I am probably not getting the point you referring to. Please elaborate
a bit more.

> 
> Also, this interface does protect qemu from the daemon writing to
> arbitrary qemu data strctures.

But daemon seems to be more priviliged here (running as root), as
compared to qemu. So I can understand that it protects from 
accidental writing.

> 
> Still, those interfaces do sound attractive for something - just not
> quite figured out what.

Yes, let's stick to current implementation for now. We can experiment
with these new interfaces down the line.

Vivek

> 
> Dave
> 
> 
> 
> > Thanks
> > Vivek
> > 
> > 
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > ---
> > >  docs/interop/vhost-user.rst               | 16 ++++
> > >  hw/virtio/trace-events                    |  6 ++
> > >  hw/virtio/vhost-user-fs.c                 | 95 +++++++++++++++++++++++
> > >  hw/virtio/vhost-user.c                    |  4 +
> > >  include/hw/virtio/vhost-user-fs.h         |  2 +
> > >  subprojects/libvhost-user/libvhost-user.h |  1 +
> > >  6 files changed, 124 insertions(+)
> > > 
> > > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> > > index 09aee3565d..2fa62ea451 100644
> > > --- a/docs/interop/vhost-user.rst
> > > +++ b/docs/interop/vhost-user.rst
> > > @@ -1453,6 +1453,22 @@ Slave message types
> > >    multiple chunks can be unmapped in one command.
> > >    A reply is generated indicating whether unmapping succeeded.
> > >  
> > > +``VHOST_USER_SLAVE_FS_IO``
> > > +  :id: 9
> > > +  :equivalent ioctl: N/A
> > > +  :slave payload: ``struct VhostUserFSSlaveMsg``
> > > +  :master payload: N/A
> > > +
> > > +  Requests that IO be performed directly from an fd, passed in ancillary
> > > +  data, to guest memory on behalf of the daemon; this is normally for a
> > > +  case where a memory region isn't visible to the daemon. slave payload
> > > +  has flags which determine the direction of IO operation.
> > > +
> > > +  The ``VHOST_USER_FS_FLAG_MAP_R`` flag must be set in the ``flags`` field to
> > > +  read from the file into RAM.
> > > +  The ``VHOST_USER_FS_FLAG_MAP_W`` flag must be set in the ``flags`` field to
> > > +  write to the file from RAM.
> > > +
> > >  .. _reply_ack:
> > >  
> > >  VHOST_USER_PROTOCOL_F_REPLY_ACK
> > > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > > index c62727f879..20557a078e 100644
> > > --- a/hw/virtio/trace-events
> > > +++ b/hw/virtio/trace-events
> > > @@ -53,6 +53,12 @@ vhost_vdpa_get_features(void *dev, uint64_t features) "dev: %p features: 0x%"PRI
> > >  vhost_vdpa_set_owner(void *dev) "dev: %p"
> > >  vhost_vdpa_vq_get_addr(void *dev, void *vq, uint64_t desc_user_addr, uint64_t avail_user_addr, uint64_t used_user_addr) "dev: %p vq: %p desc_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64
> > >  
> > > +# vhost-user-fs.c
> > > +
> > > +vhost_user_fs_slave_io_loop(const char *name, uint64_t owr, int is_ram, int is_romd, size_t size) "region %s with internal offset 0x%"PRIx64 " ram=%d romd=%d mrs.size=%zd"
> > > +vhost_user_fs_slave_io_loop_res(ssize_t transferred) "%zd"
> > > +vhost_user_fs_slave_io_exit(int res, size_t done) "res: %d done: %zd"
> > > +
> > >  # virtio.c
> > >  virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
> > >  virtqueue_fill(void *vq, const void *elem, unsigned int len, unsigned int idx) "vq %p elem %p len %u idx %u"
> > > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> > > index 963f694435..5511838f29 100644
> > > --- a/hw/virtio/vhost-user-fs.c
> > > +++ b/hw/virtio/vhost-user-fs.c
> > > @@ -23,6 +23,8 @@
> > >  #include "hw/virtio/vhost-user-fs.h"
> > >  #include "monitor/monitor.h"
> > >  #include "sysemu/sysemu.h"
> > > +#include "exec/address-spaces.h"
> > > +#include "trace.h"
> > >  
> > >  static const int user_feature_bits[] = {
> > >      VIRTIO_F_VERSION_1,
> > > @@ -220,6 +222,99 @@ uint64_t vhost_user_fs_slave_unmap(struct vhost_dev *dev, int message_size,
> > >      return (uint64_t)res;
> > >  }
> > >  
> > > +uint64_t vhost_user_fs_slave_io(struct vhost_dev *dev, int message_size,
> > > +                                VhostUserFSSlaveMsg *sm, int fd)
> > > +{
> > > +    VHostUserFS *fs = (VHostUserFS *)object_dynamic_cast(OBJECT(dev->vdev),
> > > +                          TYPE_VHOST_USER_FS);
> > > +    if (!fs) {
> > > +        error_report("%s: Bad fs ptr", __func__);
> > > +        return (uint64_t)-1;
> > > +    }
> > > +    if (!check_slave_message_entries(sm, message_size)) {
> > > +        return (uint64_t)-1;
> > > +    }
> > > +
> > > +    unsigned int i;
> > > +    int res = 0;
> > > +    size_t done = 0;
> > > +
> > > +    if (fd < 0) {
> > > +        error_report("Bad fd for map");
> > > +        return (uint64_t)-1;
> > > +    }
> > > +
> > > +    for (i = 0; i < sm->count && !res; i++) {
> > > +        VhostUserFSSlaveMsgEntry *e = &sm->entries[i];
> > > +        if (e->len == 0) {
> > > +            continue;
> > > +        }
> > > +
> > > +        size_t len = e->len;
> > > +        uint64_t fd_offset = e->fd_offset;
> > > +        hwaddr gpa = e->c_offset;
> > > +
> > > +        while (len && !res) {
> > > +            hwaddr xlat, xlat_len;
> > > +            bool is_write = e->flags & VHOST_USER_FS_FLAG_MAP_W;
> > > +            MemoryRegion *mr = address_space_translate(dev->vdev->dma_as, gpa,
> > > +                                                       &xlat, &xlat_len,
> > > +                                                       is_write,
> > > +                                                       MEMTXATTRS_UNSPECIFIED);
> > > +            if (!mr || !xlat_len) {
> > > +                error_report("No guest region found for 0x%" HWADDR_PRIx, gpa);
> > > +                res = -EFAULT;
> > > +                break;
> > > +            }
> > > +
> > > +            trace_vhost_user_fs_slave_io_loop(mr->name,
> > > +                                          (uint64_t)xlat,
> > > +                                          memory_region_is_ram(mr),
> > > +                                          memory_region_is_romd(mr),
> > > +                                          (size_t)xlat_len);
> > > +
> > > +            void *hostptr = qemu_map_ram_ptr(mr->ram_block,
> > > +                                             xlat);
> > > +            ssize_t transferred;
> > > +            if (e->flags & VHOST_USER_FS_FLAG_MAP_R) {
> > > +                /* Read from file into RAM */
> > > +                if (mr->readonly) {
> > > +                    res = -EFAULT;
> > > +                    break;
> > > +                }
> > > +                transferred = pread(fd, hostptr, xlat_len, fd_offset);
> > > +            } else if (e->flags & VHOST_USER_FS_FLAG_MAP_W) {
> > > +                /* Write into file from RAM */
> > > +                transferred = pwrite(fd, hostptr, xlat_len, fd_offset);
> > > +            } else {
> > > +                transferred = EINVAL;
> > > +            }
> > > +
> > > +            trace_vhost_user_fs_slave_io_loop_res(transferred);
> > > +            if (transferred < 0) {
> > > +                res = -errno;
> > > +                break;
> > > +            }
> > > +            if (!transferred) {
> > > +                /* EOF */
> > > +                break;
> > > +            }
> > > +
> > > +            done += transferred;
> > > +            fd_offset += transferred;
> > > +            gpa += transferred;
> > > +            len -= transferred;
> > > +        }
> > > +    }
> > > +    close(fd);
> > > +
> > > +    trace_vhost_user_fs_slave_io_exit(res, done);
> > > +    if (res < 0) {
> > > +        return (uint64_t)res;
> > > +    }
> > > +    return (uint64_t)done;
> > > +}
> > > +
> > >  static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
> > >  {
> > >      VHostUserFS *fs = VHOST_USER_FS(vdev);
> > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > index ad9170f8dc..b9699586ae 100644
> > > --- a/hw/virtio/vhost-user.c
> > > +++ b/hw/virtio/vhost-user.c
> > > @@ -138,6 +138,7 @@ typedef enum VhostUserSlaveRequest {
> > >      VHOST_USER_SLAVE_VRING_ERR = 5,
> > >      VHOST_USER_SLAVE_FS_MAP = 6,
> > >      VHOST_USER_SLAVE_FS_UNMAP = 7,
> > > +    VHOST_USER_SLAVE_FS_IO = 8,
> > >      VHOST_USER_SLAVE_MAX
> > >  }  VhostUserSlaveRequest;
> > >  
> > > @@ -1562,6 +1563,9 @@ static gboolean slave_read(QIOChannel *ioc, GIOCondition condition,
> > >      case VHOST_USER_SLAVE_FS_UNMAP:
> > >          ret = vhost_user_fs_slave_unmap(dev, hdr.size, &payload.fs);
> > >          break;
> > > +    case VHOST_USER_SLAVE_FS_IO:
> > > +        ret = vhost_user_fs_slave_io(dev, hdr.size, &payload.fs, fd[0]);
> > > +        break;
> > >  #endif
> > >      default:
> > >          error_report("Received unexpected msg type: %d.", hdr.request);
> > > diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
> > > index 0766f17548..2931164e23 100644
> > > --- a/include/hw/virtio/vhost-user-fs.h
> > > +++ b/include/hw/virtio/vhost-user-fs.h
> > > @@ -78,5 +78,7 @@ uint64_t vhost_user_fs_slave_map(struct vhost_dev *dev, int message_size,
> > >                                   VhostUserFSSlaveMsg *sm, int fd);
> > >  uint64_t vhost_user_fs_slave_unmap(struct vhost_dev *dev, int message_size,
> > >                                     VhostUserFSSlaveMsg *sm);
> > > +uint64_t vhost_user_fs_slave_io(struct vhost_dev *dev, int message_size,
> > > +                                VhostUserFSSlaveMsg *sm, int fd);
> > >  
> > >  #endif /* _QEMU_VHOST_USER_FS_H */
> > > diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
> > > index a98c5f5c11..42b0833c4b 100644
> > > --- a/subprojects/libvhost-user/libvhost-user.h
> > > +++ b/subprojects/libvhost-user/libvhost-user.h
> > > @@ -121,6 +121,7 @@ typedef enum VhostUserSlaveRequest {
> > >      VHOST_USER_SLAVE_VRING_ERR = 5,
> > >      VHOST_USER_SLAVE_FS_MAP = 6,
> > >      VHOST_USER_SLAVE_FS_UNMAP = 7,
> > > +    VHOST_USER_SLAVE_FS_IO = 8,
> > >      VHOST_USER_SLAVE_MAX
> > >  }  VhostUserSlaveRequest;
> > >  
> > > -- 
> > > 2.31.1
> > > 
> > > _______________________________________________
> > > Virtio-fs mailing list
> > > Virtio-fs@redhat.com
> > > https://listman.redhat.com/mailman/listinfo/virtio-fs
> -- 
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Virtio-fs] [PATCH v2 17/25] DAX/unmap: virtiofsd: Add VHOST_USER_SLAVE_FS_IO
  2021-04-22 15:40       ` Vivek Goyal
@ 2021-04-22 15:48         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert @ 2021-04-22 15:48 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel, stefanha

* Vivek Goyal (vgoyal@redhat.com) wrote:
> On Thu, Apr 22, 2021 at 10:29:08AM +0100, Dr. David Alan Gilbert wrote:
> > * Vivek Goyal (vgoyal@redhat.com) wrote:
> > > On Wed, Apr 14, 2021 at 04:51:29PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > 
> > > > Define a new slave command 'VHOST_USER_SLAVE_FS_IO' for a
> > > > client to ask qemu to perform a read/write from an fd directly
> > > > to GPA.
> > > 
> > > Hi David,
> > > 
> > > Today I came across process_vm_readv() and process_vm_writev().
> > > 
> > > https://man7.org/linux/man-pages/man2/process_vm_readv.2.html
> > 
> > Yes, I just saw these (reading the lwn article on process_vm_exec)
> > 
> > > I am wondering if we can use these to read from/write to qemu address
> > > space (DAX window, which virtiofsd does not have access to).
> > > 
> > > So for the case of reading from DAX window into an fd, we probably
> > > will have to first read from qemu DAX window to virtiofsd local memory
> > > and then do a writev(fd).
> > > 
> > > Don't know if this approach is faster/slower as compared to sending
> > > a vhost-user message to qemu.
> > 
> > I think there are some other problems as well:
> >   a) I'm not sure the permissions currently work out - I think it's
> > saying you need to either have CAP_SYS_PTRACE or the same uid as the
> > other process; and I don't think that's normally the relationship
> > between the daemon and the qemu.
> 
> That's a good point. We don't give CAP_SYS_PTRACE yet. May be if
> qemu and virtiofsd run in same user namespace then giving CAP_SYS_PTRACE
> might not be a bad idea (hopefully these interfaces work if CAP_SYS_PTRACE
> is not given in init_user_ns).
> 
> We don't have a plan to run virtiofsd as user "qemu". So that will not
> work well.
> 
> > 
> >   b) The virtiofsd would have to know something about the addresses in
> > qemu's address space, rather than currently only knowing guest
> > addresses.
> 
> Yes. But that could be one time message exchange to know where the
> DAX window is in qemu address space?
> 
> > 
> >   c) No one said that mapping is linear/simple, especially for an area
> > where an fd wasn't passed for shared memory.
> 
> I don't understand this point. DAX window mapping is linear in 
> qemu address space, right? And these new interfaces are only doing
> two types of I/O/
> 
> - Read from fd and write to DAX window
> - Read from DAX window and write to fd.
> 
> So it is assumed fd is there. 
> 
> I am probably not getting the point you referring to. Please elaborate
> a bit more.

This interface is a bit more flexible than DAX; for example, it might
well work for a read into a host-passthrough device - e.g. a read
from a virtiofs filesytsem into a physical GPU buffer or real pmem
PCI device - and now we're getting into not being able to assume how
easy it is to translate those addresses.

> > Also, this interface does protect qemu from the daemon writing to
> > arbitrary qemu data strctures.
> 
> But daemon seems to be more priviliged here (running as root), as
> compared to qemu. So I can understand that it protects from 
> accidental writing.

Yes, but in principal it doesn't need to be - e.g. the daemon could
be root in it's own namespace separate from qemu, or read-only
less privileged for configuration data.

> > Still, those interfaces do sound attractive for something - just not
> > quite figured out what.
> 
> Yes, let's stick to current implementation for now. We can experiment
> with these new interfaces down the line.

Yep.

Dave

> Vivek
> 
> > 
> > Dave
> > 
> > 
> > 
> > > Thanks
> > > Vivek
> > > 
> > > 
> > > > 
> > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > ---
> > > >  docs/interop/vhost-user.rst               | 16 ++++
> > > >  hw/virtio/trace-events                    |  6 ++
> > > >  hw/virtio/vhost-user-fs.c                 | 95 +++++++++++++++++++++++
> > > >  hw/virtio/vhost-user.c                    |  4 +
> > > >  include/hw/virtio/vhost-user-fs.h         |  2 +
> > > >  subprojects/libvhost-user/libvhost-user.h |  1 +
> > > >  6 files changed, 124 insertions(+)
> > > > 
> > > > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> > > > index 09aee3565d..2fa62ea451 100644
> > > > --- a/docs/interop/vhost-user.rst
> > > > +++ b/docs/interop/vhost-user.rst
> > > > @@ -1453,6 +1453,22 @@ Slave message types
> > > >    multiple chunks can be unmapped in one command.
> > > >    A reply is generated indicating whether unmapping succeeded.
> > > >  
> > > > +``VHOST_USER_SLAVE_FS_IO``
> > > > +  :id: 9
> > > > +  :equivalent ioctl: N/A
> > > > +  :slave payload: ``struct VhostUserFSSlaveMsg``
> > > > +  :master payload: N/A
> > > > +
> > > > +  Requests that IO be performed directly from an fd, passed in ancillary
> > > > +  data, to guest memory on behalf of the daemon; this is normally for a
> > > > +  case where a memory region isn't visible to the daemon. slave payload
> > > > +  has flags which determine the direction of IO operation.
> > > > +
> > > > +  The ``VHOST_USER_FS_FLAG_MAP_R`` flag must be set in the ``flags`` field to
> > > > +  read from the file into RAM.
> > > > +  The ``VHOST_USER_FS_FLAG_MAP_W`` flag must be set in the ``flags`` field to
> > > > +  write to the file from RAM.
> > > > +
> > > >  .. _reply_ack:
> > > >  
> > > >  VHOST_USER_PROTOCOL_F_REPLY_ACK
> > > > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > > > index c62727f879..20557a078e 100644
> > > > --- a/hw/virtio/trace-events
> > > > +++ b/hw/virtio/trace-events
> > > > @@ -53,6 +53,12 @@ vhost_vdpa_get_features(void *dev, uint64_t features) "dev: %p features: 0x%"PRI
> > > >  vhost_vdpa_set_owner(void *dev) "dev: %p"
> > > >  vhost_vdpa_vq_get_addr(void *dev, void *vq, uint64_t desc_user_addr, uint64_t avail_user_addr, uint64_t used_user_addr) "dev: %p vq: %p desc_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64
> > > >  
> > > > +# vhost-user-fs.c
> > > > +
> > > > +vhost_user_fs_slave_io_loop(const char *name, uint64_t owr, int is_ram, int is_romd, size_t size) "region %s with internal offset 0x%"PRIx64 " ram=%d romd=%d mrs.size=%zd"
> > > > +vhost_user_fs_slave_io_loop_res(ssize_t transferred) "%zd"
> > > > +vhost_user_fs_slave_io_exit(int res, size_t done) "res: %d done: %zd"
> > > > +
> > > >  # virtio.c
> > > >  virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
> > > >  virtqueue_fill(void *vq, const void *elem, unsigned int len, unsigned int idx) "vq %p elem %p len %u idx %u"
> > > > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> > > > index 963f694435..5511838f29 100644
> > > > --- a/hw/virtio/vhost-user-fs.c
> > > > +++ b/hw/virtio/vhost-user-fs.c
> > > > @@ -23,6 +23,8 @@
> > > >  #include "hw/virtio/vhost-user-fs.h"
> > > >  #include "monitor/monitor.h"
> > > >  #include "sysemu/sysemu.h"
> > > > +#include "exec/address-spaces.h"
> > > > +#include "trace.h"
> > > >  
> > > >  static const int user_feature_bits[] = {
> > > >      VIRTIO_F_VERSION_1,
> > > > @@ -220,6 +222,99 @@ uint64_t vhost_user_fs_slave_unmap(struct vhost_dev *dev, int message_size,
> > > >      return (uint64_t)res;
> > > >  }
> > > >  
> > > > +uint64_t vhost_user_fs_slave_io(struct vhost_dev *dev, int message_size,
> > > > +                                VhostUserFSSlaveMsg *sm, int fd)
> > > > +{
> > > > +    VHostUserFS *fs = (VHostUserFS *)object_dynamic_cast(OBJECT(dev->vdev),
> > > > +                          TYPE_VHOST_USER_FS);
> > > > +    if (!fs) {
> > > > +        error_report("%s: Bad fs ptr", __func__);
> > > > +        return (uint64_t)-1;
> > > > +    }
> > > > +    if (!check_slave_message_entries(sm, message_size)) {
> > > > +        return (uint64_t)-1;
> > > > +    }
> > > > +
> > > > +    unsigned int i;
> > > > +    int res = 0;
> > > > +    size_t done = 0;
> > > > +
> > > > +    if (fd < 0) {
> > > > +        error_report("Bad fd for map");
> > > > +        return (uint64_t)-1;
> > > > +    }
> > > > +
> > > > +    for (i = 0; i < sm->count && !res; i++) {
> > > > +        VhostUserFSSlaveMsgEntry *e = &sm->entries[i];
> > > > +        if (e->len == 0) {
> > > > +            continue;
> > > > +        }
> > > > +
> > > > +        size_t len = e->len;
> > > > +        uint64_t fd_offset = e->fd_offset;
> > > > +        hwaddr gpa = e->c_offset;
> > > > +
> > > > +        while (len && !res) {
> > > > +            hwaddr xlat, xlat_len;
> > > > +            bool is_write = e->flags & VHOST_USER_FS_FLAG_MAP_W;
> > > > +            MemoryRegion *mr = address_space_translate(dev->vdev->dma_as, gpa,
> > > > +                                                       &xlat, &xlat_len,
> > > > +                                                       is_write,
> > > > +                                                       MEMTXATTRS_UNSPECIFIED);
> > > > +            if (!mr || !xlat_len) {
> > > > +                error_report("No guest region found for 0x%" HWADDR_PRIx, gpa);
> > > > +                res = -EFAULT;
> > > > +                break;
> > > > +            }
> > > > +
> > > > +            trace_vhost_user_fs_slave_io_loop(mr->name,
> > > > +                                          (uint64_t)xlat,
> > > > +                                          memory_region_is_ram(mr),
> > > > +                                          memory_region_is_romd(mr),
> > > > +                                          (size_t)xlat_len);
> > > > +
> > > > +            void *hostptr = qemu_map_ram_ptr(mr->ram_block,
> > > > +                                             xlat);
> > > > +            ssize_t transferred;
> > > > +            if (e->flags & VHOST_USER_FS_FLAG_MAP_R) {
> > > > +                /* Read from file into RAM */
> > > > +                if (mr->readonly) {
> > > > +                    res = -EFAULT;
> > > > +                    break;
> > > > +                }
> > > > +                transferred = pread(fd, hostptr, xlat_len, fd_offset);
> > > > +            } else if (e->flags & VHOST_USER_FS_FLAG_MAP_W) {
> > > > +                /* Write into file from RAM */
> > > > +                transferred = pwrite(fd, hostptr, xlat_len, fd_offset);
> > > > +            } else {
> > > > +                transferred = EINVAL;
> > > > +            }
> > > > +
> > > > +            trace_vhost_user_fs_slave_io_loop_res(transferred);
> > > > +            if (transferred < 0) {
> > > > +                res = -errno;
> > > > +                break;
> > > > +            }
> > > > +            if (!transferred) {
> > > > +                /* EOF */
> > > > +                break;
> > > > +            }
> > > > +
> > > > +            done += transferred;
> > > > +            fd_offset += transferred;
> > > > +            gpa += transferred;
> > > > +            len -= transferred;
> > > > +        }
> > > > +    }
> > > > +    close(fd);
> > > > +
> > > > +    trace_vhost_user_fs_slave_io_exit(res, done);
> > > > +    if (res < 0) {
> > > > +        return (uint64_t)res;
> > > > +    }
> > > > +    return (uint64_t)done;
> > > > +}
> > > > +
> > > >  static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
> > > >  {
> > > >      VHostUserFS *fs = VHOST_USER_FS(vdev);
> > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > > index ad9170f8dc..b9699586ae 100644
> > > > --- a/hw/virtio/vhost-user.c
> > > > +++ b/hw/virtio/vhost-user.c
> > > > @@ -138,6 +138,7 @@ typedef enum VhostUserSlaveRequest {
> > > >      VHOST_USER_SLAVE_VRING_ERR = 5,
> > > >      VHOST_USER_SLAVE_FS_MAP = 6,
> > > >      VHOST_USER_SLAVE_FS_UNMAP = 7,
> > > > +    VHOST_USER_SLAVE_FS_IO = 8,
> > > >      VHOST_USER_SLAVE_MAX
> > > >  }  VhostUserSlaveRequest;
> > > >  
> > > > @@ -1562,6 +1563,9 @@ static gboolean slave_read(QIOChannel *ioc, GIOCondition condition,
> > > >      case VHOST_USER_SLAVE_FS_UNMAP:
> > > >          ret = vhost_user_fs_slave_unmap(dev, hdr.size, &payload.fs);
> > > >          break;
> > > > +    case VHOST_USER_SLAVE_FS_IO:
> > > > +        ret = vhost_user_fs_slave_io(dev, hdr.size, &payload.fs, fd[0]);
> > > > +        break;
> > > >  #endif
> > > >      default:
> > > >          error_report("Received unexpected msg type: %d.", hdr.request);
> > > > diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
> > > > index 0766f17548..2931164e23 100644
> > > > --- a/include/hw/virtio/vhost-user-fs.h
> > > > +++ b/include/hw/virtio/vhost-user-fs.h
> > > > @@ -78,5 +78,7 @@ uint64_t vhost_user_fs_slave_map(struct vhost_dev *dev, int message_size,
> > > >                                   VhostUserFSSlaveMsg *sm, int fd);
> > > >  uint64_t vhost_user_fs_slave_unmap(struct vhost_dev *dev, int message_size,
> > > >                                     VhostUserFSSlaveMsg *sm);
> > > > +uint64_t vhost_user_fs_slave_io(struct vhost_dev *dev, int message_size,
> > > > +                                VhostUserFSSlaveMsg *sm, int fd);
> > > >  
> > > >  #endif /* _QEMU_VHOST_USER_FS_H */
> > > > diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
> > > > index a98c5f5c11..42b0833c4b 100644
> > > > --- a/subprojects/libvhost-user/libvhost-user.h
> > > > +++ b/subprojects/libvhost-user/libvhost-user.h
> > > > @@ -121,6 +121,7 @@ typedef enum VhostUserSlaveRequest {
> > > >      VHOST_USER_SLAVE_VRING_ERR = 5,
> > > >      VHOST_USER_SLAVE_FS_MAP = 6,
> > > >      VHOST_USER_SLAVE_FS_UNMAP = 7,
> > > > +    VHOST_USER_SLAVE_FS_IO = 8,
> > > >      VHOST_USER_SLAVE_MAX
> > > >  }  VhostUserSlaveRequest;
> > > >  
> > > > -- 
> > > > 2.31.1
> > > > 
> > > > _______________________________________________
> > > > Virtio-fs mailing list
> > > > Virtio-fs@redhat.com
> > > > https://listman.redhat.com/mailman/listinfo/virtio-fs
> > -- 
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2021-04-22 15:55 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-14 15:51 [PATCH v2 00/25] virtiofs dax patches Dr. David Alan Gilbert (git)
2021-04-14 15:51 ` [PATCH v2 01/25] DAX: vhost-user: Rework slave return values Dr. David Alan Gilbert (git)
2021-04-16 10:59   ` [Virtio-fs] " Greg Kurz
2021-04-21 17:31     ` Dr. David Alan Gilbert
2021-04-14 15:51 ` [PATCH v2 02/25] virtiofsd: Don't assume header layout Dr. David Alan Gilbert (git)
2021-04-14 15:51 ` [PATCH v2 03/25] DAX: libvhost-user: Route slave message payload Dr. David Alan Gilbert (git)
2021-04-14 15:51 ` [PATCH v2 04/25] DAX: libvhost-user: Allow popping a queue element with bad pointers Dr. David Alan Gilbert (git)
2021-04-14 15:51 ` [PATCH v2 05/25] DAX subprojects/libvhost-user: Add virtio-fs slave types Dr. David Alan Gilbert (git)
2021-04-14 15:51 ` [PATCH v2 06/25] DAX: virtio: Add shared memory capability Dr. David Alan Gilbert (git)
2021-04-14 15:51 ` [PATCH v2 07/25] DAX: virtio-fs: Add cache BAR Dr. David Alan Gilbert (git)
2021-04-14 15:51 ` [PATCH v2 08/25] DAX: virtio-fs: Add vhost-user slave commands for mapping Dr. David Alan Gilbert (git)
2021-04-14 16:35   ` [Virtio-fs] " Greg Kurz
2021-04-21 17:49     ` Dr. David Alan Gilbert
2021-04-14 15:51 ` [PATCH v2 09/25] DAX: virtio-fs: Fill in " Dr. David Alan Gilbert (git)
2021-04-14 15:51 ` [PATCH v2 10/25] DAX: virtiofsd Add cache accessor functions Dr. David Alan Gilbert (git)
2021-04-14 15:51 ` [PATCH v2 11/25] DAX: virtiofsd: Add setup/remove mappings fuse commands Dr. David Alan Gilbert (git)
2021-04-14 15:51 ` [PATCH v2 12/25] DAX: virtiofsd: Add setup/remove mapping handlers to passthrough_ll Dr. David Alan Gilbert (git)
2021-04-14 15:51 ` [PATCH v2 13/25] DAX: virtiofsd: Wire up passthrough_ll's lo_setupmapping Dr. David Alan Gilbert (git)
2021-04-14 15:51 ` [PATCH v2 14/25] DAX: virtiofsd: Make lo_removemapping() work Dr. David Alan Gilbert (git)
2021-04-14 15:51 ` [PATCH v2 15/25] DAX: virtiofsd: route se down to destroy method Dr. David Alan Gilbert (git)
2021-04-14 15:51 ` [PATCH v2 16/25] DAX: virtiofsd: Perform an unmap on destroy Dr. David Alan Gilbert (git)
2021-04-14 15:51 ` [PATCH v2 17/25] DAX/unmap: virtiofsd: Add VHOST_USER_SLAVE_FS_IO Dr. David Alan Gilbert (git)
2021-04-21 20:07   ` [Virtio-fs] " Vivek Goyal
2021-04-22  9:29     ` Dr. David Alan Gilbert
2021-04-22 15:40       ` Vivek Goyal
2021-04-22 15:48         ` Dr. David Alan Gilbert
2021-04-14 15:51 ` [PATCH v2 18/25] DAX/unmap virtiofsd: Add wrappers for VHOST_USER_SLAVE_FS_IO Dr. David Alan Gilbert (git)
2021-04-14 15:51 ` [PATCH v2 19/25] DAX/unmap virtiofsd: Parse unmappable elements Dr. David Alan Gilbert (git)
2021-04-14 15:51 ` [PATCH v2 20/25] DAX/unmap virtiofsd: Route unmappable reads Dr. David Alan Gilbert (git)
2021-04-14 15:51 ` [PATCH v2 21/25] DAX/unmap virtiofsd: route unmappable write to slave command Dr. David Alan Gilbert (git)
2021-04-14 15:51 ` [PATCH v2 22/25] DAX:virtiofsd: implement FUSE_INIT map_alignment field Dr. David Alan Gilbert (git)
2021-04-14 15:51 ` [PATCH v2 23/25] vhost-user-fs: Extend VhostUserFSSlaveMsg to pass additional info Dr. David Alan Gilbert (git)
2021-04-14 15:51 ` [PATCH v2 24/25] vhost-user-fs: Implement drop CAP_FSETID functionality Dr. David Alan Gilbert (git)
2021-04-14 15:51 ` [PATCH v2 25/25] virtiofsd: Ask qemu to drop CAP_FSETID if client asked for it Dr. David Alan Gilbert (git)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).