* [PATCH 00/24] virtiofs dax patches
From: Dr. David Alan Gilbert (git) @ 2021-02-09 19:02 UTC
  To: qemu-devel, vgoyal, stefanha, virtio-fs, marcandre.lureau, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Hi,
  This series adds support for acceleration of virtiofs via DAX
mapping, using features added in the 5.11 Linux kernel.

  DAX originally existed in the kernel for mapping real storage
devices directly into memory, so that reads and writes go straight
to the storage device.

  virtiofs's DAX support is similar; a PCI BAR is exposed on the
virtiofs device corresponding to a DAX 'cache' of a user-defined size.
The guest then requests files to be mapped into that cache; when that
happens, virtiofsd sends file descriptors and commands back to QEMU,
which mmap()s those files directly into the memory slot exposed to
KVM.  The guest can then read and write the files exposed by virtiofs
by reading and writing into the BAR.
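
  For illustration, the map operation QEMU performs on virtiofsd's
behalf boils down to an mmap() over the cache window. A minimal
sketch, assuming names of my own (dax_map_one and its parameters are
illustrative, not the series' actual code):

    #include <stdint.h>
    #include <sys/mman.h>

    /* Sketch: map one section of a file into the DAX cache window */
    static void *dax_map_one(void *cache_host_addr, uint64_t c_offset,
                             int fd, uint64_t fd_offset, uint64_t len,
                             int prot /* PROT_READ and/or PROT_WRITE */)
    {
        /* MAP_FIXED replaces whatever was previously mapped here */
        return mmap((char *)cache_host_addr + c_offset, len, prot,
                    MAP_SHARED | MAP_FIXED, fd, fd_offset);
    }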

  A typical invocation would be:
     -device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=myfs,cache-size=4G 

and then the guest must mount the filesystem with the 'dax' option.
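
For example, with the tag 'myfs' from the invocation above:

     mount -t virtiofs myfs /mnt -o dax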

  Note that the cache doesn't really take up RAM on the host, because
everything placed in it is just an mmap of a file, so you can afford
to use quite a large cache size.

  Unlike a real DAX device, the cache is a finite size that's
potentially smaller than the underlying filesystem (especially when
mapping granularity is taken into account).  Mapping, unmapping and
remapping must take place to juggle files into the cache if it's too
small.  Some workloads benefit more than others.

Gotchas:
  a) The vhost-user slave channel has some bad reset behaviours;
these are fixed by Vivek's '[RFC PATCH 0/6] vhost-user: Shutdown/Flush
slave channel properly' series on the list.

  b) If something else on the host truncates an mmap'd file,
KVM gets rather upset; for this reason DAX is currently only
suitable for use on non-shared filesystems.

Thanks a lot to Vivek, who has spent a lot of time on the kernel side
and on cleaning this series up.

Dave

Dr. David Alan Gilbert (19):
  DAX: vhost-user: Rework slave return values
  DAX: libvhost-user: Route slave message payload
  DAX: libvhost-user: Allow popping a queue element with bad pointers
  DAX subprojects/libvhost-user: Add virtio-fs slave types
  DAX: virtio: Add shared memory capability
  DAX: virtio-fs: Add cache BAR
  DAX: virtio-fs: Add vhost-user slave commands for mapping
  DAX: virtio-fs: Fill in slave commands for mapping
  DAX: virtiofsd Add cache accessor functions
  DAX: virtiofsd: Add setup/remove mappings fuse commands
  DAX: virtiofsd: Add setup/remove mapping handlers to passthrough_ll
  DAX: virtiofsd: Wire up passthrough_ll's lo_setupmapping
  DAX: virtiofsd: route se down to destroy method
  DAX: virtiofsd: Perform an unmap on destroy
  DAX/unmap: virtiofsd: Add VHOST_USER_SLAVE_FS_IO
  DAX/unmap virtiofsd: Add wrappers for VHOST_USER_SLAVE_FS_IO
  DAX/unmap virtiofsd: Parse unmappable elements
  DAX/unmap virtiofsd: Route unmappable reads
  DAX/unmap virtiofsd: route unmappable write to slave command

Stefan Hajnoczi (1):
  DAX:virtiofsd: implement FUSE_INIT map_alignment field

Vivek Goyal (4):
  DAX: virtiofsd: Make lo_removemapping() work
  vhost-user-fs: Extend VhostUserFSSlaveMsg to pass additional info
  vhost-user-fs: Implement drop CAP_FSETID functionality
  virtiofsd: Ask qemu to drop CAP_FSETID if client asked for it

 block/export/vhost-user-blk-server.c      |   2 +-
 contrib/vhost-user-blk/vhost-user-blk.c   |   3 +-
 contrib/vhost-user-gpu/vhost-user-gpu.c   |   5 +-
 contrib/vhost-user-input/main.c           |   4 +-
 contrib/vhost-user-scsi/vhost-user-scsi.c |   2 +-
 docs/interop/vhost-user.rst               |  31 ++
 hw/virtio/meson.build                     |   1 +
 hw/virtio/trace-events                    |   6 +
 hw/virtio/vhost-backend.c                 |   4 +-
 hw/virtio/vhost-user-fs-pci.c             |  25 ++
 hw/virtio/vhost-user-fs.c                 | 330 ++++++++++++++++++++++
 hw/virtio/vhost-user.c                    |  50 +++-
 hw/virtio/virtio-pci.c                    |  20 ++
 hw/virtio/virtio-pci.h                    |   4 +
 include/hw/virtio/vhost-backend.h         |   2 +-
 include/hw/virtio/vhost-user-fs.h         |  34 +++
 meson.build                               |   6 +
 subprojects/libvhost-user/libvhost-user.c | 106 ++++++-
 subprojects/libvhost-user/libvhost-user.h |  48 +++-
 tests/vhost-user-bridge.c                 |   4 +-
 tools/virtiofsd/buffer.c                  |  22 +-
 tools/virtiofsd/fuse_common.h             |  17 +-
 tools/virtiofsd/fuse_lowlevel.c           |  91 +++++-
 tools/virtiofsd/fuse_lowlevel.h           |  78 ++++-
 tools/virtiofsd/fuse_virtio.c             | 282 ++++++++++++++----
 tools/virtiofsd/passthrough_ll.c          | 103 ++++++-
 26 files changed, 1166 insertions(+), 114 deletions(-)

-- 
2.29.2




* [PATCH 01/24] DAX: vhost-user: Rework slave return values
From: Dr. David Alan Gilbert (git) @ 2021-02-09 19:02 UTC
  To: qemu-devel, vgoyal, stefanha, virtio-fs, marcandre.lureau, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

All the current slave handlers on the qemu side generate an 'int'
return value that's squashed down to a bool (!!ret) and stuffed into
a uint64_t (field of a union) to be returned.

Move the uint64_t type back up through the individual handlers so
that we can make one actually return a full uint64_t.

Note that the interop spec defines most of these cases as returning
0 on success and non-0 for failure, so it's OK to change from a bool
to any other non-zero value.

Vivek:
This is needed because upcoming patches in the series will add new
functions that want to return a full error code. Existing functions
continue to return true/false, so this should not change behavior for
existing users.
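
Concretely, the reply-site change below is what preserves the value;
previously any error was collapsed to 1 before being sent back:

    payload.u64 = !!ret;   /* old: -EINVAL arrives at the slave as 1 */
    payload.u64 = ret;     /* new: the full 64-bit value is returned */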

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hw/virtio/vhost-backend.c         |  4 ++--
 hw/virtio/vhost-user.c            | 32 ++++++++++++++++---------------
 include/hw/virtio/vhost-backend.h |  2 +-
 3 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
index 31b33bde37..21082084ea 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -401,7 +401,7 @@ int vhost_backend_invalidate_device_iotlb(struct vhost_dev *dev,
     return -ENODEV;
 }
 
-int vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
+uint64_t vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
                                           struct vhost_iotlb_msg *imsg)
 {
     int ret = 0;
@@ -429,5 +429,5 @@ int vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
         break;
     }
 
-    return ret;
+    return !!ret;
 }
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 2fdd5daf74..13789cc55e 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1317,24 +1317,25 @@ static int vhost_user_reset_device(struct vhost_dev *dev)
     return 0;
 }
 
-static int vhost_user_slave_handle_config_change(struct vhost_dev *dev)
+static uint64_t vhost_user_slave_handle_config_change(struct vhost_dev *dev)
 {
     int ret = -1;
 
     if (!dev->config_ops) {
-        return -1;
+        return true;
     }
 
     if (dev->config_ops->vhost_dev_config_notifier) {
         ret = dev->config_ops->vhost_dev_config_notifier(dev);
     }
 
-    return ret;
+    return !!ret;
 }
 
-static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
-                                                       VhostUserVringArea *area,
-                                                       int fd)
+static uint64_t vhost_user_slave_handle_vring_host_notifier(
+                struct vhost_dev *dev,
+                VhostUserVringArea *area,
+                int fd)
 {
     int queue_idx = area->u64 & VHOST_USER_VRING_IDX_MASK;
     size_t page_size = qemu_real_host_page_size;
@@ -1348,7 +1349,7 @@ static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
     if (!virtio_has_feature(dev->protocol_features,
                             VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) ||
         vdev == NULL || queue_idx >= virtio_get_num_queues(vdev)) {
-        return -1;
+        return true;
     }
 
     n = &user->notifier[queue_idx];
@@ -1361,18 +1362,18 @@ static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
     }
 
     if (area->u64 & VHOST_USER_VRING_NOFD_MASK) {
-        return 0;
+        return false;
     }
 
     /* Sanity check. */
     if (area->size != page_size) {
-        return -1;
+        return true;
     }
 
     addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
                 fd, area->offset);
     if (addr == MAP_FAILED) {
-        return -1;
+        return true;
     }
 
     name = g_strdup_printf("vhost-user/host-notifier@%p mmaps[%d]",
@@ -1383,13 +1384,13 @@ static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
 
     if (virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true)) {
         munmap(addr, page_size);
-        return -1;
+        return true;
     }
 
     n->addr = addr;
     n->set = true;
 
-    return 0;
+    return false;
 }
 
 static void slave_read(void *opaque)
@@ -1398,7 +1399,8 @@ static void slave_read(void *opaque)
     struct vhost_user *u = dev->opaque;
     VhostUserHeader hdr = { 0, };
     VhostUserPayload payload = { 0, };
-    int size, ret = 0;
+    int size;
+    uint64_t ret = 0;
     struct iovec iov;
     struct msghdr msgh;
     int fd[VHOST_USER_SLAVE_MAX_FDS];
@@ -1472,7 +1474,7 @@ static void slave_read(void *opaque)
         break;
     default:
         error_report("Received unexpected msg type: %d.", hdr.request);
-        ret = -EINVAL;
+        ret = (uint64_t)-EINVAL;
     }
 
     /* Close the remaining file descriptors. */
@@ -1493,7 +1495,7 @@ static void slave_read(void *opaque)
         hdr.flags &= ~VHOST_USER_NEED_REPLY_MASK;
         hdr.flags |= VHOST_USER_REPLY_MASK;
 
-        payload.u64 = !!ret;
+        payload.u64 = ret;
         hdr.size = sizeof(payload.u64);
 
         iovec[0].iov_base = &hdr;
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index 8a6f8e2a7a..64ac6b6444 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -186,7 +186,7 @@ int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
 int vhost_backend_invalidate_device_iotlb(struct vhost_dev *dev,
                                                  uint64_t iova, uint64_t len);
 
-int vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
+uint64_t vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
                                           struct vhost_iotlb_msg *imsg);
 
 int vhost_user_gpu_set_socket(struct vhost_dev *dev, int fd);
-- 
2.29.2




* [PATCH 02/24] DAX: libvhost-user: Route slave message payload
From: Dr. David Alan Gilbert (git) @ 2021-02-09 19:02 UTC
  To: qemu-devel, vgoyal, stefanha, virtio-fs, marcandre.lureau, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Route the uint64_t payload from message replies on the slave channel
back up through vu_process_message_reply to the callers.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 subprojects/libvhost-user/libvhost-user.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index fab7ca17ee..937f64480d 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -403,9 +403,11 @@ vu_send_reply(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
  * Processes a reply on the slave channel.
  * Entered with slave_mutex held and releases it before exit.
  * Returns true on success.
+ * *payload is written on success
  */
 static bool
-vu_process_message_reply(VuDev *dev, const VhostUserMsg *vmsg)
+vu_process_message_reply(VuDev *dev, const VhostUserMsg *vmsg,
+                         uint64_t *payload)
 {
     VhostUserMsg msg_reply;
     bool result = false;
@@ -425,7 +427,8 @@ vu_process_message_reply(VuDev *dev, const VhostUserMsg *vmsg)
         goto out;
     }
 
-    result = msg_reply.payload.u64 == 0;
+    *payload = msg_reply.payload.u64;
+    result = true;
 
 out:
     pthread_mutex_unlock(&dev->slave_mutex);
@@ -1312,6 +1315,8 @@ bool vu_set_queue_host_notifier(VuDev *dev, VuVirtq *vq, int fd,
 {
     int qidx = vq - dev->vq;
     int fd_num = 0;
+    bool res;
+    uint64_t payload = 0;
     VhostUserMsg vmsg = {
         .request = VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG,
         .flags = VHOST_USER_VERSION | VHOST_USER_NEED_REPLY_MASK,
@@ -1342,7 +1347,10 @@ bool vu_set_queue_host_notifier(VuDev *dev, VuVirtq *vq, int fd,
     }
 
     /* Also unlocks the slave_mutex */
-    return vu_process_message_reply(dev, &vmsg);
+    res = vu_process_message_reply(dev, &vmsg, &payload);
+    res = res && (payload == 0);
+
+    return res;
 }
 
 static bool
-- 
2.29.2




* [PATCH 03/24] DAX: libvhost-user: Allow popping a queue element with bad pointers
From: Dr. David Alan Gilbert (git) @ 2021-02-09 19:02 UTC
  To: qemu-devel, vgoyal, stefanha, virtio-fs, marcandre.lureau, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Allow a daemon implemented with libvhost-user to accept an
element with pointers to memory that isn't in the mapping table.
The daemon might have a special way to deal with some cases of this.

The default behaviour doesn't change.
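
A caller that opts in might use the new parameters like this (a
sketch; existing callers pass NULL, NULL and keep the old panic
behaviour):

    unsigned int bad_in = 0, bad_out = 0;
    VuVirtqElement *elem = vu_queue_pop(dev, vq, sizeof(VuVirtqElement),
                                        &bad_in, &bad_out);
    if (elem && (bad_in || bad_out)) {
        /*
         * Some iov_base entries could not be translated; for those,
         * iov_base holds the guest physical address instead of a
         * host virtual address.
         */
    }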

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 block/export/vhost-user-blk-server.c      |  2 +-
 contrib/vhost-user-blk/vhost-user-blk.c   |  3 +-
 contrib/vhost-user-gpu/vhost-user-gpu.c   |  5 ++-
 contrib/vhost-user-input/main.c           |  4 +-
 contrib/vhost-user-scsi/vhost-user-scsi.c |  2 +-
 subprojects/libvhost-user/libvhost-user.c | 51 ++++++++++++++++++-----
 subprojects/libvhost-user/libvhost-user.h |  8 +++-
 tests/vhost-user-bridge.c                 |  4 +-
 tools/virtiofsd/fuse_virtio.c             |  3 +-
 9 files changed, 60 insertions(+), 22 deletions(-)

diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index ab2c4d44c4..ea2d302e33 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -205,7 +205,7 @@ static void vu_blk_process_vq(VuDev *vu_dev, int idx)
     while (1) {
         VuBlkReq *req;
 
-        req = vu_queue_pop(vu_dev, vq, sizeof(VuBlkReq));
+        req = vu_queue_pop(vu_dev, vq, sizeof(VuBlkReq), NULL, NULL);
         if (!req) {
             break;
         }
diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c
index d14b2896bf..01193552e9 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -235,7 +235,8 @@ static int vub_virtio_process_req(VubDev *vdev_blk,
     unsigned out_num;
     VubReq *req;
 
-    elem = vu_queue_pop(vu_dev, vq, sizeof(VuVirtqElement) + sizeof(VubReq));
+    elem = vu_queue_pop(vu_dev, vq, sizeof(VuVirtqElement) + sizeof(VubReq),
+                        NULL, NULL);
     if (!elem) {
         return -1;
     }
diff --git a/contrib/vhost-user-gpu/vhost-user-gpu.c b/contrib/vhost-user-gpu/vhost-user-gpu.c
index b27990ffdb..58f50ae83f 100644
--- a/contrib/vhost-user-gpu/vhost-user-gpu.c
+++ b/contrib/vhost-user-gpu/vhost-user-gpu.c
@@ -840,7 +840,8 @@ vg_handle_ctrl(VuDev *dev, int qidx)
             return;
         }
 
-        cmd = vu_queue_pop(dev, vq, sizeof(struct virtio_gpu_ctrl_command));
+        cmd = vu_queue_pop(dev, vq, sizeof(struct virtio_gpu_ctrl_command),
+                           NULL, NULL);
         if (!cmd) {
             break;
         }
@@ -943,7 +944,7 @@ vg_handle_cursor(VuDev *dev, int qidx)
     struct virtio_gpu_update_cursor cursor;
 
     for (;;) {
-        elem = vu_queue_pop(dev, vq, sizeof(VuVirtqElement));
+        elem = vu_queue_pop(dev, vq, sizeof(VuVirtqElement), NULL, NULL);
         if (!elem) {
             break;
         }
diff --git a/contrib/vhost-user-input/main.c b/contrib/vhost-user-input/main.c
index c15d18c33f..d5c435605c 100644
--- a/contrib/vhost-user-input/main.c
+++ b/contrib/vhost-user-input/main.c
@@ -57,7 +57,7 @@ static void vi_input_send(VuInput *vi, struct virtio_input_event *event)
 
     /* ... then check available space ... */
     for (i = 0; i < vi->qindex; i++) {
-        elem = vu_queue_pop(dev, vq, sizeof(VuVirtqElement));
+        elem = vu_queue_pop(dev, vq, sizeof(VuVirtqElement), NULL, NULL);
         if (!elem) {
             while (--i >= 0) {
                 vu_queue_unpop(dev, vq, vi->queue[i].elem, 0);
@@ -141,7 +141,7 @@ static void vi_handle_sts(VuDev *dev, int qidx)
     g_debug("%s", G_STRFUNC);
 
     for (;;) {
-        elem = vu_queue_pop(dev, vq, sizeof(VuVirtqElement));
+        elem = vu_queue_pop(dev, vq, sizeof(VuVirtqElement), NULL, NULL);
         if (!elem) {
             break;
         }
diff --git a/contrib/vhost-user-scsi/vhost-user-scsi.c b/contrib/vhost-user-scsi/vhost-user-scsi.c
index 4f6e3e2a24..7564d6ab2d 100644
--- a/contrib/vhost-user-scsi/vhost-user-scsi.c
+++ b/contrib/vhost-user-scsi/vhost-user-scsi.c
@@ -252,7 +252,7 @@ static void vus_proc_req(VuDev *vu_dev, int idx)
         VirtIOSCSICmdReq *req;
         VirtIOSCSICmdResp *rsp;
 
-        elem = vu_queue_pop(vu_dev, vq, sizeof(VuVirtqElement));
+        elem = vu_queue_pop(vu_dev, vq, sizeof(VuVirtqElement), NULL, NULL);
         if (!elem) {
             g_debug("No more elements pending on vq[%d]@%p", idx, vq);
             break;
diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index 937f64480d..68eb165755 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -2469,7 +2469,8 @@ vu_queue_set_notification(VuDev *dev, VuVirtq *vq, int enable)
 
 static bool
 virtqueue_map_desc(VuDev *dev,
-                   unsigned int *p_num_sg, struct iovec *iov,
+                   unsigned int *p_num_sg, unsigned int *p_bad_sg,
+                   struct iovec *iov,
                    unsigned int max_num_sg, bool is_write,
                    uint64_t pa, size_t sz)
 {
@@ -2490,10 +2491,35 @@ virtqueue_map_desc(VuDev *dev,
             return false;
         }
 
-        iov[num_sg].iov_base = vu_gpa_to_va(dev, &len, pa);
-        if (iov[num_sg].iov_base == NULL) {
-            vu_panic(dev, "virtio: invalid address for buffers");
-            return false;
+        if (p_bad_sg && *p_bad_sg) {
+            /* A previous mapping was bad, we won't try and map this either */
+            *p_bad_sg = *p_bad_sg + 1;
+        }
+        if (!p_bad_sg || !*p_bad_sg) {
+            /* No bad mappings so far, let's try mapping this one */
+            iov[num_sg].iov_base = vu_gpa_to_va(dev, &len, pa);
+            if (iov[num_sg].iov_base == NULL) {
+                /*
+                 * OK, it won't map, either panic or if the caller can handle
+                 * it, then count it.
+                 */
+                if (!p_bad_sg) {
+                    vu_panic(dev, "virtio: invalid address for buffers");
+                    return false;
+                } else {
+                    *p_bad_sg = *p_bad_sg + 1;
+                }
+            }
+        }
+        if (p_bad_sg && *p_bad_sg) {
+            /*
+             * There was a bad mapping, either now or previously, since
+             * the caller set p_bad_sg it means it's prepared to deal with
+             * it, so give it the pa in the iov
+             * Note: In this case len will be the whole sz, so we won't
+             * go around again for this descriptor
+             */
+            iov[num_sg].iov_base = (void *)(uintptr_t)pa;
         }
         iov[num_sg].iov_len = len;
         num_sg++;
@@ -2524,7 +2550,8 @@ virtqueue_alloc_element(size_t sz,
 }
 
 static void *
-vu_queue_map_desc(VuDev *dev, VuVirtq *vq, unsigned int idx, size_t sz)
+vu_queue_map_desc(VuDev *dev, VuVirtq *vq, unsigned int idx, size_t sz,
+                  unsigned int *p_bad_in, unsigned int *p_bad_out)
 {
     struct vring_desc *desc = vq->vring.desc;
     uint64_t desc_addr, read_len;
@@ -2568,7 +2595,7 @@ vu_queue_map_desc(VuDev *dev, VuVirtq *vq, unsigned int idx, size_t sz)
     /* Collect all the descriptors */
     do {
         if (le16toh(desc[i].flags) & VRING_DESC_F_WRITE) {
-            if (!virtqueue_map_desc(dev, &in_num, iov + out_num,
+            if (!virtqueue_map_desc(dev, &in_num, p_bad_in, iov + out_num,
                                VIRTQUEUE_MAX_SIZE - out_num, true,
                                le64toh(desc[i].addr),
                                le32toh(desc[i].len))) {
@@ -2579,7 +2606,7 @@ vu_queue_map_desc(VuDev *dev, VuVirtq *vq, unsigned int idx, size_t sz)
                 vu_panic(dev, "Incorrect order for descriptors");
                 return NULL;
             }
-            if (!virtqueue_map_desc(dev, &out_num, iov,
+            if (!virtqueue_map_desc(dev, &out_num, p_bad_out, iov,
                                VIRTQUEUE_MAX_SIZE, false,
                                le64toh(desc[i].addr),
                                le32toh(desc[i].len))) {
@@ -2669,7 +2696,8 @@ vu_queue_inflight_post_put(VuDev *dev, VuVirtq *vq, int desc_idx)
 }
 
 void *
-vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
+vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz,
+             unsigned int *p_bad_in, unsigned int *p_bad_out)
 {
     int i;
     unsigned int head;
@@ -2682,7 +2710,8 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
 
     if (unlikely(vq->resubmit_list && vq->resubmit_num > 0)) {
         i = (--vq->resubmit_num);
-        elem = vu_queue_map_desc(dev, vq, vq->resubmit_list[i].index, sz);
+        elem = vu_queue_map_desc(dev, vq, vq->resubmit_list[i].index, sz,
+                                 p_bad_in, p_bad_out);
 
         if (!vq->resubmit_num) {
             free(vq->resubmit_list);
@@ -2714,7 +2743,7 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
         vring_set_avail_event(vq, vq->last_avail_idx);
     }
 
-    elem = vu_queue_map_desc(dev, vq, head, sz);
+    elem = vu_queue_map_desc(dev, vq, head, sz, p_bad_in, p_bad_out);
 
     if (!elem) {
         return NULL;
diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
index 3d13dfadde..330b61c005 100644
--- a/subprojects/libvhost-user/libvhost-user.h
+++ b/subprojects/libvhost-user/libvhost-user.h
@@ -589,11 +589,17 @@ void vu_queue_notify_sync(VuDev *dev, VuVirtq *vq);
  * @dev: a VuDev context
  * @vq: a VuVirtq queue
  * @sz: the size of struct to return (must be >= VuVirtqElement)
+ * @p_bad_in: If non-NULL, a pointer to an integer count of
+ *            unmappable regions in input descriptors
+ * @p_bad_out: If non-NULL, a pointer to an integer count of
+ *            unmappable regions in output descriptors
+ *
  *
  * Returns: a VuVirtqElement filled from the queue or NULL. The
  * returned element must be free()-d by the caller.
  */
-void *vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz);
+void *vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz,
+                   unsigned int *p_bad_in, unsigned int *p_bad_out);
 
 
 /**
diff --git a/tests/vhost-user-bridge.c b/tests/vhost-user-bridge.c
index 24815920b2..4f6829e6c3 100644
--- a/tests/vhost-user-bridge.c
+++ b/tests/vhost-user-bridge.c
@@ -184,7 +184,7 @@ vubr_handle_tx(VuDev *dev, int qidx)
         unsigned int out_num;
         struct iovec sg[VIRTQUEUE_MAX_SIZE], *out_sg;
 
-        elem = vu_queue_pop(dev, vq, sizeof(VuVirtqElement));
+        elem = vu_queue_pop(dev, vq, sizeof(VuVirtqElement), NULL, NULL);
         if (!elem) {
             break;
         }
@@ -299,7 +299,7 @@ vubr_backend_recv_cb(int sock, void *ctx)
         ssize_t ret, total = 0;
         unsigned int num;
 
-        elem = vu_queue_pop(dev, vq, sizeof(VuVirtqElement));
+        elem = vu_queue_pop(dev, vq, sizeof(VuVirtqElement), NULL, NULL);
         if (!elem) {
             break;
         }
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index ddcefee427..bd19358437 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -657,7 +657,8 @@ static void *fv_queue_thread(void *opaque)
                  __func__, qi->qidx, (size_t)evalue, in_bytes, out_bytes);
 
         while (1) {
-            FVRequest *req = vu_queue_pop(dev, q, sizeof(FVRequest));
+            FVRequest *req = vu_queue_pop(dev, q, sizeof(FVRequest),
+                                          NULL, NULL);
             if (!req) {
                 break;
             }
-- 
2.29.2




* [PATCH 04/24] DAX subprojects/libvhost-user: Add virtio-fs slave types
From: Dr. David Alan Gilbert (git) @ 2021-02-09 19:02 UTC
  To: qemu-devel, vgoyal, stefanha, virtio-fs, marcandre.lureau, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add virtio-fs definitions to libvhost-user
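
As a sketch of how a daemon might use this (VHOST_USER_SLAVE_FS_MAP is
the map request type added later in this series; fd, cache_offset and
map_len are illustrative names, not the series' actual code):

    VhostUserFSSlaveMsg msg = { 0 };

    msg.fd_offset[0] = 0;            /* offset within the file */
    msg.c_offset[0]  = cache_offset; /* offset within the DAX cache */
    msg.len[0]       = map_len;
    msg.flags[0]     = VHOST_USER_FS_FLAG_MAP_R | VHOST_USER_FS_FLAG_MAP_W;

    int64_t ret = vu_fs_cache_request(dev, VHOST_USER_SLAVE_FS_MAP, fd, &msg);
    if (ret < 0) {
        /* ret is a negative errno from the slave channel */
    }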

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 subprojects/libvhost-user/libvhost-user.c | 41 +++++++++++++++++++++++
 subprojects/libvhost-user/libvhost-user.h | 31 +++++++++++++++++
 2 files changed, 72 insertions(+)

diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index 68eb165755..b35abdd9f9 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -2918,3 +2918,44 @@ vu_queue_push(VuDev *dev, VuVirtq *vq,
     vu_queue_flush(dev, vq, 1);
     vu_queue_inflight_post_put(dev, vq, elem->index);
 }
+
+int64_t vu_fs_cache_request(VuDev *dev, VhostUserSlaveRequest req, int fd,
+                            VhostUserFSSlaveMsg *fsm)
+{
+    int fd_num = 0;
+    bool res;
+    uint64_t payload = 0;
+    VhostUserMsg vmsg = {
+        .request = req,
+        .flags = VHOST_USER_VERSION | VHOST_USER_NEED_REPLY_MASK,
+        .size = sizeof(vmsg.payload.fs),
+        .payload.fs = *fsm,
+    };
+
+    if (fd != -1) {
+        vmsg.fds[fd_num++] = fd;
+    }
+
+    vmsg.fd_num = fd_num;
+
+    if (!vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD)) {
+        return -EINVAL;
+    }
+
+    pthread_mutex_lock(&dev->slave_mutex);
+    if (!vu_message_write(dev, dev->slave_fd, &vmsg)) {
+        pthread_mutex_unlock(&dev->slave_mutex);
+        return -EIO;
+    }
+
+    /* Also unlocks the slave_mutex */
+    res = vu_process_message_reply(dev, &vmsg, &payload);
+    if (!res) {
+        return -EIO;
+    }
+    /*
+     * Payload is delivered as uint64_t but is actually signed for
+     * errors.
+     */
+    return (int64_t)payload;
+}
diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
index 330b61c005..e12e9c1532 100644
--- a/subprojects/libvhost-user/libvhost-user.h
+++ b/subprojects/libvhost-user/libvhost-user.h
@@ -122,6 +122,24 @@ typedef enum VhostUserSlaveRequest {
     VHOST_USER_SLAVE_MAX
 }  VhostUserSlaveRequest;
 
+/* Structures carried over the slave channel back to QEMU */
+#define VHOST_USER_FS_SLAVE_ENTRIES 8
+
+/* For the flags field of VhostUserFSSlaveMsg */
+#define VHOST_USER_FS_FLAG_MAP_R (1ull << 0)
+#define VHOST_USER_FS_FLAG_MAP_W (1ull << 1)
+
+typedef struct {
+    /* Offsets within the file being mapped */
+    uint64_t fd_offset[VHOST_USER_FS_SLAVE_ENTRIES];
+    /* Offsets within the cache */
+    uint64_t c_offset[VHOST_USER_FS_SLAVE_ENTRIES];
+    /* Lengths of sections */
+    uint64_t len[VHOST_USER_FS_SLAVE_ENTRIES];
+    /* Flags, from VHOST_USER_FS_FLAG_* */
+    uint64_t flags[VHOST_USER_FS_SLAVE_ENTRIES];
+} VhostUserFSSlaveMsg;
+
 typedef struct VhostUserMemoryRegion {
     uint64_t guest_phys_addr;
     uint64_t memory_size;
@@ -197,6 +215,7 @@ typedef struct VhostUserMsg {
         VhostUserConfig config;
         VhostUserVringArea area;
         VhostUserInflight inflight;
+        VhostUserFSSlaveMsg fs;
     } payload;
 
     int fds[VHOST_MEMORY_BASELINE_NREGIONS];
@@ -693,4 +712,16 @@ void vu_queue_get_avail_bytes(VuDev *vdev, VuVirtq *vq, unsigned int *in_bytes,
 bool vu_queue_avail_bytes(VuDev *dev, VuVirtq *vq, unsigned int in_bytes,
                           unsigned int out_bytes);
 
+/**
+ * vu_fs_cache_request: Send a slave message for an fs client
+ * @dev: a VuDev context
+ * @req: The request type (map, unmap, sync)
+ * @fd: an fd (only required for map, else must be -1)
+ * @fsm: The body of the message
+ *
+ * Returns: 0 or above for success, negative errno on error
+ */
+int64_t vu_fs_cache_request(VuDev *dev, VhostUserSlaveRequest req, int fd,
+                            VhostUserFSSlaveMsg *fsm);
+
 #endif /* LIBVHOST_USER_H */
-- 
2.29.2




* [PATCH 05/24] DAX: virtio: Add shared memory capability
  2021-02-09 19:02 ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-09 19:02   ` Dr. David Alan Gilbert (git)
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-02-09 19:02 UTC (permalink / raw)
  To: qemu-devel, vgoyal, stefanha, virtio-fs, marcandre.lureau, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Define a new capability type 'VIRTIO_PCI_CAP_SHARED_MEMORY_CFG'
and the data structure 'virtio_pci_cap64' to go with it.
They allow defining shared memory regions with sizes and offsets
of 2^32 bytes and more.
Multiple instances of the capability are allowed and distinguished
by the 'id' field in the base capability.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hw/virtio/virtio-pci.c | 20 ++++++++++++++++++++
 hw/virtio/virtio-pci.h |  4 ++++
 2 files changed, 24 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 094c36aa3e..de378c594e 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1138,6 +1138,26 @@ static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
     return offset;
 }
 
+int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy,
+                           uint8_t bar, uint64_t offset, uint64_t length,
+                           uint8_t id)
+{
+    struct virtio_pci_cap64 cap = {
+        .cap.cap_len = sizeof cap,
+        .cap.cfg_type = VIRTIO_PCI_CAP_SHARED_MEMORY_CFG,
+    };
+    uint32_t mask32 = ~0;
+
+    cap.cap.bar = bar;
+    cap.cap.id = id;
+    cap.cap.length = cpu_to_le32(length & mask32);
+    cap.length_hi = cpu_to_le32((length >> 32) & mask32);
+    cap.cap.offset = cpu_to_le32(offset & mask32);
+    cap.offset_hi = cpu_to_le32((offset >> 32) & mask32);
+
+    return virtio_pci_add_mem_cap(proxy, &cap.cap);
+}
+
 static uint64_t virtio_pci_common_read(void *opaque, hwaddr addr,
                                        unsigned size)
 {
diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index d7d5d403a9..31ca339099 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -247,4 +247,8 @@ void virtio_pci_types_register(const VirtioPCIDeviceTypeInfo *t);
  */
 unsigned virtio_pci_optimal_num_queues(unsigned fixed_queues);
 
+int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy,
+                           uint8_t bar, uint64_t offset, uint64_t length,
+                           uint8_t id);
+
 #endif
-- 
2.29.2
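
For orientation, a consumer parsing the capability reassembles the
64-bit values from the split fields roughly like this (a sketch,
assuming 'cap' has already been read out of PCI config space):

    struct virtio_pci_cap64 cap;   /* filled from config space */
    uint64_t len    = le32_to_cpu(cap.cap.length) |
                      ((uint64_t)le32_to_cpu(cap.length_hi) << 32);
    uint64_t offset = le32_to_cpu(cap.cap.offset) |
                      ((uint64_t)le32_to_cpu(cap.offset_hi) << 32);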




* [PATCH 06/24] DAX: virtio-fs: Add cache BAR
  2021-02-09 19:02 ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-09 19:02   ` Dr. David Alan Gilbert (git)
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-02-09 19:02 UTC (permalink / raw)
  To: qemu-devel, vgoyal, stefanha, virtio-fs, marcandre.lureau, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add a cache BAR into which files will be directly mapped.
The size can be set with the cache-size= property, e.g.
   -device vhost-user-fs-pci,chardev=char0,tag=myfs,cache-size=16G

The default is no cache.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
with PPC fixes by:
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
---
 hw/virtio/vhost-user-fs-pci.c     | 25 ++++++++++++++++++++++++
 hw/virtio/vhost-user-fs.c         | 32 +++++++++++++++++++++++++++++++
 include/hw/virtio/vhost-user-fs.h |  2 ++
 3 files changed, 59 insertions(+)

diff --git a/hw/virtio/vhost-user-fs-pci.c b/hw/virtio/vhost-user-fs-pci.c
index 2ed8492b3f..0388e063c6 100644
--- a/hw/virtio/vhost-user-fs-pci.c
+++ b/hw/virtio/vhost-user-fs-pci.c
@@ -16,10 +16,14 @@
 #include "hw/virtio/vhost-user-fs.h"
 #include "virtio-pci.h"
 #include "qom/object.h"
+#include "standard-headers/linux/virtio_fs.h"
+
+#define VIRTIO_FS_PCI_CACHE_BAR 2
 
 struct VHostUserFSPCI {
     VirtIOPCIProxy parent_obj;
     VHostUserFS vdev;
+    MemoryRegion cachebar;
 };
 
 typedef struct VHostUserFSPCI VHostUserFSPCI;
@@ -39,6 +43,7 @@ static void vhost_user_fs_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
 {
     VHostUserFSPCI *dev = VHOST_USER_FS_PCI(vpci_dev);
     DeviceState *vdev = DEVICE(&dev->vdev);
+    uint64_t cachesize;
 
     if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
         /* Also reserve config change and hiprio queue vectors */
@@ -46,6 +51,26 @@ static void vhost_user_fs_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
     }
 
     qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
+    cachesize = dev->vdev.conf.cache_size;
+
+    /*
+     * The bar starts with the data/DAX cache
+     * Others will be added later.
+     */
+    memory_region_init(&dev->cachebar, OBJECT(vpci_dev),
+                       "vhost-fs-pci-cachebar", cachesize);
+    if (cachesize) {
+        memory_region_add_subregion(&dev->cachebar, 0, &dev->vdev.cache);
+        virtio_pci_add_shm_cap(vpci_dev, VIRTIO_FS_PCI_CACHE_BAR, 0, cachesize,
+                               VIRTIO_FS_SHMCAP_ID_CACHE);
+    }
+
+    /* After 'realized' so the memory region exists */
+    pci_register_bar(&vpci_dev->pci_dev, VIRTIO_FS_PCI_CACHE_BAR,
+                     PCI_BASE_ADDRESS_SPACE_MEMORY |
+                     PCI_BASE_ADDRESS_MEM_PREFETCH |
+                     PCI_BASE_ADDRESS_MEM_TYPE_64,
+                     &dev->cachebar);
 }
 
 static void vhost_user_fs_pci_class_init(ObjectClass *klass, void *data)
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index ac4fc34b36..b077d8e705 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -24,6 +24,16 @@
 #include "monitor/monitor.h"
 #include "sysemu/sysemu.h"
 
+/*
+ * The powerpc kernel code expects the memory to be accessible during
+ * addition/removal.
+ */
+#if defined(TARGET_PPC64) && defined(CONFIG_LINUX)
+#define DAX_WINDOW_PROT PROT_READ
+#else
+#define DAX_WINDOW_PROT PROT_NONE
+#endif
+
 static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
 {
     VHostUserFS *fs = VHOST_USER_FS(vdev);
@@ -163,6 +173,7 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VHostUserFS *fs = VHOST_USER_FS(dev);
+    void *cache_ptr;
     unsigned int i;
     size_t len;
     int ret;
@@ -202,6 +213,26 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
                    VIRTQUEUE_MAX_SIZE);
         return;
     }
+    if (fs->conf.cache_size &&
+        (!is_power_of_2(fs->conf.cache_size) ||
+          fs->conf.cache_size < qemu_real_host_page_size)) {
+        error_setg(errp, "cache-size property must be a power of 2 "
+                         "no smaller than the page size");
+        return;
+    }
+    if (fs->conf.cache_size) {
+        /* Anonymous, private memory is not counted as overcommit */
+        cache_ptr = mmap(NULL, fs->conf.cache_size, DAX_WINDOW_PROT,
+                         MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+        if (cache_ptr == MAP_FAILED) {
+            error_setg(errp, "Unable to mmap blank cache");
+            return;
+        }
+
+        memory_region_init_ram_ptr(&fs->cache, OBJECT(vdev),
+                                   "virtio-fs-cache",
+                                   fs->conf.cache_size, cache_ptr);
+    }
 
     if (!vhost_user_init(&fs->vhost_user, &fs->conf.chardev, errp)) {
         return;
@@ -277,6 +308,7 @@ static Property vuf_properties[] = {
     DEFINE_PROP_UINT16("num-request-queues", VHostUserFS,
                        conf.num_request_queues, 1),
     DEFINE_PROP_UINT16("queue-size", VHostUserFS, conf.queue_size, 128),
+    DEFINE_PROP_SIZE("cache-size", VHostUserFS, conf.cache_size, 0),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
index 0d62834c25..04596799e3 100644
--- a/include/hw/virtio/vhost-user-fs.h
+++ b/include/hw/virtio/vhost-user-fs.h
@@ -28,6 +28,7 @@ typedef struct {
     char *tag;
     uint16_t num_request_queues;
     uint16_t queue_size;
+    uint64_t cache_size;
 } VHostUserFSConf;
 
 struct VHostUserFS {
@@ -42,6 +43,7 @@ struct VHostUserFS {
     int32_t bootindex;
 
     /*< public >*/
+    MemoryRegion cache;
 };
 
 #endif /* _QEMU_VHOST_USER_FS_H */
-- 
2.29.2
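
The pattern the cache BAR relies on: realize reserves an inaccessible
anonymous window, and later map requests overlay file pages onto
pieces of it. Condensed sketch (Linux host assumed; variable names are
illustrative):

    #include <sys/mman.h>

    /* At realize: claim address space only; no RAM is committed */
    void *window = mmap(NULL, cache_size, PROT_NONE,
                        MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);

    /* Later, per map request: overlay file pages at a fixed address */
    mmap((char *)window + c_offset, len, PROT_READ,
         MAP_SHARED | MAP_FIXED, file_fd, fd_offset);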




* [PATCH 07/24] DAX: virtio-fs: Add vhost-user slave commands for mapping
  2021-02-09 19:02 ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-09 19:02   ` Dr. David Alan Gilbert (git)
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-02-09 19:02 UTC (permalink / raw)
  To: qemu-devel, vgoyal, stefanha, virtio-fs, marcandre.lureau, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The daemon may request that fds be mapped into the virtio-fs cache
visible to the guest.
These mappings are triggered by commands sent over the slave fd
from the daemon.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 docs/interop/vhost-user.rst               | 20 +++++++++++++++++++
 hw/virtio/vhost-user-fs.c                 | 14 +++++++++++++
 hw/virtio/vhost-user.c                    | 14 +++++++++++++
 include/hw/virtio/vhost-user-fs.h         | 24 +++++++++++++++++++++++
 subprojects/libvhost-user/libvhost-user.h |  2 ++
 5 files changed, 74 insertions(+)

diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index d6085f7045..1deedd3407 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -1432,6 +1432,26 @@ Slave message types
 
   The state.num field is currently reserved and must be set to 0.
 
+``VHOST_USER_SLAVE_FS_MAP``
+  :id: 6
+  :equivalent ioctl: N/A
+  :slave payload: fd + n * (offset + address + len)
+  :master payload: N/A
+
+  Requests that QEMU mmap the given fd into the virtio-fs cache;
+  multiple chunks can be mapped in one command.
+  A reply is generated indicating whether mapping succeeded.
+
+``VHOST_USER_SLAVE_FS_UNMAP``
+  :id: 7
+  :equivalent ioctl: N/A
+  :slave payload: n * (address + len)
+  :master payload: N/A
+
+  Requests that QEMU un-mmap the given range in the virtio-fs cache;
+  multiple chunks can be unmapped in one command.
+  A reply is generated indicating whether unmapping succeeded.
+
 .. _reply_ack:
 
 VHOST_USER_PROTOCOL_F_REPLY_ACK
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index b077d8e705..78401d2ff1 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -34,6 +34,20 @@
 #define DAX_WINDOW_PROT PROT_NONE
 #endif
 
+uint64_t vhost_user_fs_slave_map(struct vhost_dev *dev, VhostUserFSSlaveMsg *sm,
+                                 int fd)
+{
+    /* TODO */
+    return (uint64_t)-1;
+}
+
+uint64_t vhost_user_fs_slave_unmap(struct vhost_dev *dev,
+                                   VhostUserFSSlaveMsg *sm)
+{
+    /* TODO */
+    return (uint64_t)-1;
+}
+
 static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
 {
     VHostUserFS *fs = VHOST_USER_FS(vdev);
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 13789cc55e..21e40ff91a 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -12,6 +12,7 @@
 #include "qapi/error.h"
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-user.h"
+#include "hw/virtio/vhost-user-fs.h"
 #include "hw/virtio/vhost-backend.h"
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/virtio-net.h"
@@ -132,6 +133,10 @@ typedef enum VhostUserSlaveRequest {
     VHOST_USER_SLAVE_IOTLB_MSG = 1,
     VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
     VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
+    VHOST_USER_SLAVE_VRING_CALL = 4,
+    VHOST_USER_SLAVE_VRING_ERR = 5,
+    VHOST_USER_SLAVE_FS_MAP = 6,
+    VHOST_USER_SLAVE_FS_UNMAP = 7,
     VHOST_USER_SLAVE_MAX
 }  VhostUserSlaveRequest;
 
@@ -218,6 +223,7 @@ typedef union {
         VhostUserCryptoSession session;
         VhostUserVringArea area;
         VhostUserInflight inflight;
+        VhostUserFSSlaveMsg fs;
 } VhostUserPayload;
 
 typedef struct VhostUserMsg {
@@ -1472,6 +1478,14 @@ static void slave_read(void *opaque)
         ret = vhost_user_slave_handle_vring_host_notifier(dev, &payload.area,
                                                           fd[0]);
         break;
+#ifdef CONFIG_VHOST_USER_FS
+    case VHOST_USER_SLAVE_FS_MAP:
+        ret = vhost_user_fs_slave_map(dev, &payload.fs, fd[0]);
+        break;
+    case VHOST_USER_SLAVE_FS_UNMAP:
+        ret = vhost_user_fs_slave_unmap(dev, &payload.fs);
+        break;
+#endif
     default:
         error_report("Received unexpected msg type: %d.", hdr.request);
         ret = (uint64_t)-EINVAL;
diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
index 04596799e3..25e14ab17a 100644
--- a/include/hw/virtio/vhost-user-fs.h
+++ b/include/hw/virtio/vhost-user-fs.h
@@ -23,6 +23,24 @@
 #define TYPE_VHOST_USER_FS "vhost-user-fs-device"
 OBJECT_DECLARE_SIMPLE_TYPE(VHostUserFS, VHOST_USER_FS)
 
+/* Structures carried over the slave channel back to QEMU */
+#define VHOST_USER_FS_SLAVE_ENTRIES 8
+
+/* For the flags field of VhostUserFSSlaveMsg */
+#define VHOST_USER_FS_FLAG_MAP_R (1ull << 0)
+#define VHOST_USER_FS_FLAG_MAP_W (1ull << 1)
+
+typedef struct {
+    /* Offsets within the file being mapped */
+    uint64_t fd_offset[VHOST_USER_FS_SLAVE_ENTRIES];
+    /* Offsets within the cache */
+    uint64_t c_offset[VHOST_USER_FS_SLAVE_ENTRIES];
+    /* Lengths of sections */
+    uint64_t len[VHOST_USER_FS_SLAVE_ENTRIES];
+    /* Flags, from VHOST_USER_FS_FLAG_* */
+    uint64_t flags[VHOST_USER_FS_SLAVE_ENTRIES];
+} VhostUserFSSlaveMsg;
+
 typedef struct {
     CharBackend chardev;
     char *tag;
@@ -46,4 +64,10 @@ struct VHostUserFS {
     MemoryRegion cache;
 };
 
+/* Callbacks from the vhost-user code for slave commands */
+uint64_t vhost_user_fs_slave_map(struct vhost_dev *dev, VhostUserFSSlaveMsg *sm,
+                                 int fd);
+uint64_t vhost_user_fs_slave_unmap(struct vhost_dev *dev,
+                                   VhostUserFSSlaveMsg *sm);
+
 #endif /* _QEMU_VHOST_USER_FS_H */
diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
index e12e9c1532..150b1121cc 100644
--- a/subprojects/libvhost-user/libvhost-user.h
+++ b/subprojects/libvhost-user/libvhost-user.h
@@ -119,6 +119,8 @@ typedef enum VhostUserSlaveRequest {
     VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
     VHOST_USER_SLAVE_VRING_CALL = 4,
     VHOST_USER_SLAVE_VRING_ERR = 5,
+    VHOST_USER_SLAVE_FS_MAP = 6,
+    VHOST_USER_SLAVE_FS_UNMAP = 7,
     VHOST_USER_SLAVE_MAX
 }  VhostUserSlaveRequest;
 
-- 
2.29.2
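
One property of the payload worth noting: it is a fixed 256 bytes on
the wire, which a build-time check can document (a sketch, not part of
the patch):

    _Static_assert(sizeof(VhostUserFSSlaveMsg) ==
                   4 * VHOST_USER_FS_SLAVE_ENTRIES * sizeof(uint64_t),
                   "4 parallel arrays of 8 uint64_t == 256 bytes");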




* [PATCH 08/24] DAX: virtio-fs: Fill in slave commands for mapping
  2021-02-09 19:02 ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-09 19:02   ` Dr. David Alan Gilbert (git)
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-02-09 19:02 UTC (permalink / raw)
  To: qemu-devel, vgoyal, stefanha, virtio-fs, marcandre.lureau, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Fill in definitions for map, unmap and sync commands.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
with fix by misono.tomohiro@fujitsu.com
---
 hw/virtio/vhost-user-fs.c | 115 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 111 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 78401d2ff1..5f2fca4d82 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -37,15 +37,122 @@
 uint64_t vhost_user_fs_slave_map(struct vhost_dev *dev, VhostUserFSSlaveMsg *sm,
                                  int fd)
 {
-    /* TODO */
-    return (uint64_t)-1;
+    VHostUserFS *fs = VHOST_USER_FS(dev->vdev);
+    if (!fs) {
+        /* Shouldn't happen - but seen on error path */
+        error_report("Bad fs ptr");
+        return (uint64_t)-1;
+    }
+    size_t cache_size = fs->conf.cache_size;
+    if (!cache_size) {
+        error_report("map called when DAX cache not present");
+        return (uint64_t)-1;
+    }
+    void *cache_host = memory_region_get_ram_ptr(&fs->cache);
+
+    unsigned int i;
+    int res = 0;
+
+    if (fd < 0) {
+        error_report("Bad fd for map");
+        return (uint64_t)-1;
+    }
+
+    for (i = 0; i < VHOST_USER_FS_SLAVE_ENTRIES; i++) {
+        if (sm->len[i] == 0) {
+            continue;
+        }
+
+        if ((sm->c_offset[i] + sm->len[i]) < sm->len[i] ||
+            (sm->c_offset[i] + sm->len[i]) > cache_size) {
+            error_report("Bad offset/len for map [%d] %" PRIx64 "+%" PRIx64,
+                         i, sm->c_offset[i], sm->len[i]);
+            res = -1;
+            break;
+        }
+
+        if (mmap(cache_host + sm->c_offset[i], sm->len[i],
+                 ((sm->flags[i] & VHOST_USER_FS_FLAG_MAP_R) ? PROT_READ : 0) |
+                 ((sm->flags[i] & VHOST_USER_FS_FLAG_MAP_W) ? PROT_WRITE : 0),
+                 MAP_SHARED | MAP_FIXED,
+                 fd, sm->fd_offset[i]) != (cache_host + sm->c_offset[i])) {
+            res = -errno;
+            error_report("map failed err %d [%d] %" PRIx64 "+%" PRIx64 " from %"
+                         PRIx64, errno, i, sm->c_offset[i], sm->len[i],
+                         sm->fd_offset[i]);
+            break;
+        }
+    }
+
+    if (res) {
+        /* Something went wrong, unmap them all */
+        vhost_user_fs_slave_unmap(dev, sm);
+    }
+    return (uint64_t)res;
 }
 
 uint64_t vhost_user_fs_slave_unmap(struct vhost_dev *dev,
                                    VhostUserFSSlaveMsg *sm)
 {
-    /* TODO */
-    return (uint64_t)-1;
+    VHostUserFS *fs = VHOST_USER_FS(dev->vdev);
+    if (!fs) {
+        /* Shouldn't happen - but seen on error path */
+        error_report("Bad fs ptr");
+        return (uint64_t)-1;
+    }
+    size_t cache_size = fs->conf.cache_size;
+    if (!cache_size) {
+        /*
+         * Since dax cache is disabled, there should be no unmap request.
+         * Howerver we still receives whole range unmap request during umount
+         * for cleanup. Ignore it.
+         */
+        if (sm->len[0] == ~(uint64_t)0) {
+            return 0;
+        }
+
+        error_report("unmap called when DAX cache not present");
+        return (uint64_t)-1;
+    }
+    void *cache_host = memory_region_get_ram_ptr(&fs->cache);
+
+    unsigned int i;
+    int res = 0;
+
+    /*
+     * Note even if one unmap fails we try the rest, since the effect
+     * is to clean up as much as possible.
+     */
+    for (i = 0; i < VHOST_USER_FS_SLAVE_ENTRIES; i++) {
+        void *ptr;
+        if (sm->len[i] == 0) {
+            continue;
+        }
+
+        if (sm->len[i] == ~(uint64_t)0) {
+            /* Special case meaning the whole arena */
+            sm->len[i] = cache_size;
+        }
+
+        if ((sm->c_offset[i] + sm->len[i]) < sm->len[i] ||
+            (sm->c_offset[i] + sm->len[i]) > cache_size) {
+            error_report("Bad offset/len for unmap [%d] %" PRIx64 "+%" PRIx64,
+                         i, sm->c_offset[i], sm->len[i]);
+            res = -1;
+            continue;
+        }
+
+        ptr = mmap(cache_host + sm->c_offset[i], sm->len[i], DAX_WINDOW_PROT,
+                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+        if (ptr != (cache_host + sm->c_offset[i])) {
+            res = -errno;
+            error_report("mmap failed (%s) [%d] %" PRIx64 "+%" PRIx64 " from %"
+                         PRIx64 " res: %p", strerror(errno), i, sm->c_offset[i],
+                         sm->len[i], sm->fd_offset[i], ptr);
+        }
+    }
+
+    return (uint64_t)res;
 }
 
 static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
-- 
2.29.2
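
A note on the bounds test in the loops above: with 64-bit offsets,
c_offset + len can wrap around, so the sum is compared against len
before being compared against the cache size. Equivalently (sketch):

    uint64_t end = sm->c_offset[i] + sm->len[i]; /* may wrap */
    bool bad = end < sm->len[i]     /* wrapped past 2^64 */
            || end > cache_size;    /* past the end of the window */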




* [PATCH 09/24] DAX: virtiofsd Add cache accessor functions
  2021-02-09 19:02 ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-09 19:02   ` Dr. David Alan Gilbert (git)
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-02-09 19:02 UTC (permalink / raw)
  To: qemu-devel, vgoyal, stefanha, virtio-fs, marcandre.lureau, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add low-level functions that clients can use to map/unmap cache
areas.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/fuse_lowlevel.h | 21 +++++++++++++++++++++
 tools/virtiofsd/fuse_virtio.c   | 18 ++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
index 0e10a14bc9..c0ff4f07a4 100644
--- a/tools/virtiofsd/fuse_lowlevel.h
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -29,6 +29,8 @@
 #include <sys/uio.h>
 #include <utime.h>
 
+#include "subprojects/libvhost-user/libvhost-user.h"
+
 /*
  * Miscellaneous definitions
  */
@@ -1970,4 +1972,23 @@ void fuse_session_process_buf(struct fuse_session *se,
  */
 int fuse_session_receive_buf(struct fuse_session *se, struct fuse_buf *buf);
 
+/**
+ * For use with virtio-fs; request an fd be mapped into the cache
+ *
+ * @param req The request that triggered this action
+ * @param msg A set of mapping requests
+ * @param fd The fd to map
+ * @return Zero on success
+ */
+int64_t fuse_virtio_map(fuse_req_t req, VhostUserFSSlaveMsg *msg, int fd);
+
+/**
+ * For use with virtio-fs; request unmapping of part of the cache
+ *
+ * @param se The session this request is on
+ * @param msg A set of unmapping requests
+ * @return Zero on success
+ */
+int64_t fuse_virtio_unmap(struct fuse_session *se, VhostUserFSSlaveMsg *msg);
+
 #endif /* FUSE_LOWLEVEL_H_ */
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index bd19358437..f217a093c8 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -1044,3 +1044,21 @@ void virtio_session_close(struct fuse_session *se)
     free(se->virtio_dev);
     se->virtio_dev = NULL;
 }
+
+int64_t fuse_virtio_map(fuse_req_t req, VhostUserFSSlaveMsg *msg, int fd)
+{
+    if (!req->se->virtio_dev) {
+        return -ENODEV;
+    }
+    return vu_fs_cache_request(&req->se->virtio_dev->dev,
+                               VHOST_USER_SLAVE_FS_MAP, fd, msg);
+}
+
+int64_t fuse_virtio_unmap(struct fuse_session *se, VhostUserFSSlaveMsg *msg)
+{
+    if (!se->virtio_dev) {
+        return -ENODEV;
+    }
+    return vu_fs_cache_request(&se->virtio_dev->dev, VHOST_USER_SLAVE_FS_UNMAP,
+                               -1, msg);
+}
-- 
2.29.2
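
A hypothetical caller (the real one arrives with the passthrough_ll
patches) packages chunks and hands them over like this; 'req',
'file_fd', 'writable' and the offsets are assumed from the handler's
context:

    VhostUserFSSlaveMsg msg = { 0 };

    msg.fd_offset[0] = foffset;    /* offset within the file */
    msg.c_offset[0]  = moffset;    /* offset within the cache */
    msg.len[0]       = len;
    msg.flags[0]     = writable
                       ? VHOST_USER_FS_FLAG_MAP_R | VHOST_USER_FS_FLAG_MAP_W
                       : VHOST_USER_FS_FLAG_MAP_R;

    int64_t ret = fuse_virtio_map(req, &msg, file_fd);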




* [PATCH 10/24] DAX: virtiofsd: Add setup/remove mappings fuse commands
  2021-02-09 19:02 ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-09 19:02   ` Dr. David Alan Gilbert (git)
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-02-09 19:02 UTC (permalink / raw)
  To: qemu-devel, vgoyal, stefanha, virtio-fs, marcandre.lureau, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add commands so that the guest kernel can ask the daemon to map file
sections into a guest kernel visible cache.

Note: Catherine Ho had sent a patch to fix an issue with multiple
removemapping requests; it turned out to be a merge issue.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Peng Tao <tao.peng@linux.alibaba.com>
Including-fixes: Catherine Ho <catherine.hecx@gmail.com>
Signed-off-by: Catherine Ho <catherine.hecx@gmail.com>
---
 tools/virtiofsd/fuse_lowlevel.c | 68 +++++++++++++++++++++++++++++++++
 tools/virtiofsd/fuse_lowlevel.h | 23 ++++++++++-
 2 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index e94b71110b..0d3768b7d0 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -1868,6 +1868,72 @@ static void do_lseek(fuse_req_t req, fuse_ino_t nodeid,
     }
 }
 
+static void do_setupmapping(fuse_req_t req, fuse_ino_t nodeid,
+                            struct fuse_mbuf_iter *iter)
+{
+    struct fuse_setupmapping_in *arg;
+    struct fuse_file_info fi;
+
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
+    memset(&fi, 0, sizeof(fi));
+    fi.fh = arg->fh;
+
+    /*
+     *  TODO: Need to come up with a better definition of flags here; it can't
+     * be the kernel view of the flags, since that's abstracted from the
+     * client; similarly, it's not the vhost-user set. For now, just use
+     * the O_ flags.
+     */
+    uint64_t genflags;
+
+    genflags = O_RDONLY;
+    if (arg->flags & FUSE_SETUPMAPPING_FLAG_WRITE) {
+        genflags = O_RDWR;
+    }
+
+    if (req->se->op.setupmapping) {
+        req->se->op.setupmapping(req, nodeid, arg->foffset, arg->len,
+                                 arg->moffset, genflags, &fi);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
+}
+
+static void do_removemapping(fuse_req_t req, fuse_ino_t nodeid,
+                             struct fuse_mbuf_iter *iter)
+{
+    struct fuse_removemapping_in *arg;
+    struct fuse_removemapping_one *one;
+
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg || arg->count <= 0) {
+        fuse_log(FUSE_LOG_ERR, "do_removemapping: invalid arg %p\n", arg);
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
+    one = fuse_mbuf_iter_advance(iter, arg->count * sizeof(*one));
+    if (!one) {
+        fuse_log(
+            FUSE_LOG_ERR,
+            "do_removemapping: invalid in, expected %d * %ld, has %ld - %ld\n",
+            arg->count, sizeof(*one), iter->size, iter->pos);
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
+    if (req->se->op.removemapping) {
+        req->se->op.removemapping(req, req->se, nodeid, arg->count, one);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
+}
+
 static void do_init(fuse_req_t req, fuse_ino_t nodeid,
                     struct fuse_mbuf_iter *iter)
 {
@@ -2258,6 +2324,8 @@ static struct {
     [FUSE_RENAME2] = { do_rename2, "RENAME2" },
     [FUSE_COPY_FILE_RANGE] = { do_copy_file_range, "COPY_FILE_RANGE" },
     [FUSE_LSEEK] = { do_lseek, "LSEEK" },
+    [FUSE_SETUPMAPPING] = { do_setupmapping, "SETUPMAPPING" },
+    [FUSE_REMOVEMAPPING] = { do_removemapping, "REMOVEMAPPING" },
 };
 
 #define FUSE_MAXOP (sizeof(fuse_ll_ops) / sizeof(fuse_ll_ops[0]))
diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
index c0ff4f07a4..014564ff07 100644
--- a/tools/virtiofsd/fuse_lowlevel.h
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -24,6 +24,7 @@
 #endif
 
 #include "fuse_common.h"
+#include "standard-headers/linux/fuse.h"
 
 #include <sys/statvfs.h>
 #include <sys/uio.h>
@@ -1170,7 +1171,6 @@ struct fuse_lowlevel_ops {
      */
     void (*readdirplus)(fuse_req_t req, fuse_ino_t ino, size_t size, off_t off,
                         struct fuse_file_info *fi);
-
     /**
      * Copy a range of data from one file to another
      *
@@ -1226,6 +1226,27 @@ struct fuse_lowlevel_ops {
      */
     void (*lseek)(fuse_req_t req, fuse_ino_t ino, off_t off, int whence,
                   struct fuse_file_info *fi);
+
+    /*
+     * Map file sections into kernel visible cache
+     *
+     * Map a section of the file into address space visible to the kernel
+     * mounting the filesystem.
+     * TODO
+     */
+    void (*setupmapping)(fuse_req_t req, fuse_ino_t ino, uint64_t foffset,
+                         uint64_t len, uint64_t moffset, uint64_t flags,
+                         struct fuse_file_info *fi);
+
+    /*
+     * Unmap file sections in kernel visible cache
+     *
+     * Unmap sections previously mapped by setupmapping
+     * TODO
+     */
+    void (*removemapping)(fuse_req_t req, struct fuse_session *se,
+                          fuse_ino_t ino, unsigned num,
+                          struct fuse_removemapping_one *argp);
 };
 
 /**
-- 
2.29.2
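
A daemon opts in by filling the two new callbacks in its
fuse_lowlevel_ops table; the handler names below are hypothetical:

    static void my_setupmapping(fuse_req_t req, fuse_ino_t ino,
                                uint64_t foffset, uint64_t len,
                                uint64_t moffset, uint64_t flags,
                                struct fuse_file_info *fi);
    static void my_removemapping(fuse_req_t req, struct fuse_session *se,
                                 fuse_ino_t ino, unsigned num,
                                 struct fuse_removemapping_one *argp);

    static const struct fuse_lowlevel_ops my_ops = {
        /* ... the usual lookup/open/read/write handlers ... */
        .setupmapping  = my_setupmapping,
        .removemapping = my_removemapping,
    };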




* [PATCH 11/24] DAX: virtiofsd: Add setup/remove mapping handlers to passthrough_ll
  2021-02-09 19:02 ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-09 19:02   ` Dr. David Alan Gilbert (git)
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-02-09 19:02 UTC (permalink / raw)
  To: qemu-devel, vgoyal, stefanha, virtio-fs, marcandre.lureau, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 147b59338a..31c43d67a0 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -2963,6 +2963,22 @@ static void lo_destroy(void *userdata)
     pthread_mutex_unlock(&lo->mutex);
 }
 
+static void lo_setupmapping(fuse_req_t req, fuse_ino_t ino, uint64_t foffset,
+                            uint64_t len, uint64_t moffset, uint64_t flags,
+                            struct fuse_file_info *fi)
+{
+    /* TODO */
+    fuse_reply_err(req, ENOSYS);
+}
+
+static void lo_removemapping(fuse_req_t req, struct fuse_session *se,
+                             fuse_ino_t ino, unsigned num,
+                             struct fuse_removemapping_one *argp)
+{
+    /* TODO */
+    fuse_reply_err(req, ENOSYS);
+}
+
 static struct fuse_lowlevel_ops lo_oper = {
     .init = lo_init,
     .lookup = lo_lookup,
@@ -3004,6 +3020,8 @@ static struct fuse_lowlevel_ops lo_oper = {
 #endif
     .lseek = lo_lseek,
     .destroy = lo_destroy,
+    .setupmapping = lo_setupmapping,
+    .removemapping = lo_removemapping,
 };
 
 /* Print vhost-user.json backend program capabilities */
-- 
2.29.2



* [PATCH 12/24] DAX: virtiofsd: Wire up passthrough_ll's lo_setupmapping
  2021-02-09 19:02 ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-09 19:02   ` Dr. David Alan Gilbert (git)
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-02-09 19:02 UTC (permalink / raw)
  To: qemu-devel, vgoyal, stefanha, virtio-fs, marcandre.lureau, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Wire up passthrough_ll's setupmapping to build the mapping request,
send it over virtio and then reply OK.

The guest might not pass a file handle. In that case, use the inode
info to open the file again, perform the mapping, and close the fd.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
With fix from:
Signed-off-by: Fotis Xenakis <foxen@windowslive.com>
---
 tools/virtiofsd/fuse_lowlevel.c  | 13 ++++++--
 tools/virtiofsd/passthrough_ll.c | 52 ++++++++++++++++++++++++++++++--
 2 files changed, 61 insertions(+), 4 deletions(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 0d3768b7d0..f74583e095 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -1897,8 +1897,17 @@ static void do_setupmapping(fuse_req_t req, fuse_ino_t nodeid,
     }
 
     if (req->se->op.setupmapping) {
-        req->se->op.setupmapping(req, nodeid, arg->foffset, arg->len,
-                                 arg->moffset, genflags, &fi);
+        /*
+         * TODO: Add a flag to the request which tells whether arg->fh
+         * is valid.
+         */
+        if (fi.fh == (uint64_t)-1) {
+            req->se->op.setupmapping(req, nodeid, arg->foffset, arg->len,
+                                     arg->moffset, genflags, NULL);
+        } else {
+            req->se->op.setupmapping(req, nodeid, arg->foffset, arg->len,
+                                     arg->moffset, genflags, &fi);
+        }
     } else {
         fuse_reply_err(req, ENOSYS);
     }
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 31c43d67a0..0493f00756 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -2967,8 +2967,56 @@ static void lo_setupmapping(fuse_req_t req, fuse_ino_t ino, uint64_t foffset,
                             uint64_t len, uint64_t moffset, uint64_t flags,
                             struct fuse_file_info *fi)
 {
-    /* TODO */
-    fuse_reply_err(req, ENOSYS);
+    struct lo_data *lo = lo_data(req);
+    int ret = 0, fd;
+    VhostUserFSSlaveMsg msg = { 0 };
+    uint64_t vhu_flags;
+    char *buf;
+    bool writable = flags & O_RDWR;
+
+    fuse_log(FUSE_LOG_DEBUG,
+             "lo_setupmapping(ino=%" PRIu64 ", fi=0x%p,"
+             " foffset=%" PRIu64 ", len=%" PRIu64 ", moffset=%" PRIu64
+             ", flags=%" PRIu64 ")\n",
+             ino, (void *)fi, foffset, len, moffset, flags);
+
+    vhu_flags = VHOST_USER_FS_FLAG_MAP_R;
+    if (writable) {
+        vhu_flags |= VHOST_USER_FS_FLAG_MAP_W;
+    }
+
+    msg.fd_offset[0] = foffset;
+    msg.len[0] = len;
+    msg.c_offset[0] = moffset;
+    msg.flags[0] = vhu_flags;
+
+    if (fi) {
+        fd = lo_fi_fd(req, fi);
+    } else {
+        ret = asprintf(&buf, "%i", lo_fd(req, ino));
+        if (ret == -1) {
+            return (void)fuse_reply_err(req, errno);
+        }
+
+        fd = openat(lo->proc_self_fd, buf, flags);
+        free(buf);
+        if (fd == -1) {
+            return (void)fuse_reply_err(req, errno);
+        }
+    }
+
+    ret = fuse_virtio_map(req, &msg, fd);
+    if (ret < 0) {
+        fuse_log(FUSE_LOG_ERR,
+                 "%s: map over virtio failed (ino=%" PRId64
+                 "fd=%d moffset=0x%" PRIx64 "). err = %d\n",
+                 __func__, ino, fd, moffset, ret);
+    }
+
+    if (!fi) {
+        close(fd);
+    }
+    fuse_reply_err(req, -ret);
 }
 
 static void lo_removemapping(fuse_req_t req, struct fuse_session *se,
-- 
2.29.2
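
The fd-less branch above uses a standard virtiofsd trick: an fd kept
only as a handle for the inode can be re-opened with real access flags
by going through /proc/self/fd. A stand-alone sketch of just that step;
reopen_via_proc() is a hypothetical name, and proc_self_fd corresponds
to the lo->proc_self_fd directory fd kept open on /proc/self/fd:

    #define _GNU_SOURCE
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>

    /*
     * Re-open 'fd' (e.g. an fd held only as a handle for an inode) with
     * the access mode in 'flags' via the /proc/self/fd symlinks.
     * Returns the new fd, or -errno on failure.
     */
    static int reopen_via_proc(int proc_self_fd, int fd, int flags)
    {
        char *name;
        int newfd;

        if (asprintf(&name, "%i", fd) == -1) {
            return -ENOMEM;
        }
        newfd = openat(proc_self_fd, name, flags);
        free(name);
        return newfd == -1 ? -errno : newfd;
    }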



* [PATCH 13/24] DAX: virtiofsd: Make lo_removemapping() work
  2021-02-09 19:02 ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-09 19:02   ` Dr. David Alan Gilbert (git)
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-02-09 19:02 UTC (permalink / raw)
  To: qemu-devel, vgoyal, stefanha, virtio-fs, marcandre.lureau, mst

From: Vivek Goyal <vgoyal@redhat.com>

Let the guest pass in the offset in the DAX window at which a mapping
is currently mapped and needs to be removed.

Vivek added the initial support for removing a single mapping, and Peng
later added support for removing multiple mappings in a single command.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Peng Tao <tao.peng@linux.alibaba.com>
---
 tools/virtiofsd/passthrough_ll.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 0493f00756..971ff2b2ea 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -3023,8 +3023,30 @@ static void lo_removemapping(fuse_req_t req, struct fuse_session *se,
                              fuse_ino_t ino, unsigned num,
                              struct fuse_removemapping_one *argp)
 {
-    /* TODO */
-    fuse_reply_err(req, ENOSYS);
+    VhostUserFSSlaveMsg msg = { 0 };
+    int ret = 0;
+
+    for (int i = 0; num > 0; i++, argp++) {
+        msg.len[i] = argp->len;
+        msg.c_offset[i] = argp->moffset;
+
+        if (--num == 0 || i == VHOST_USER_FS_SLAVE_ENTRIES - 1) {
+            ret = fuse_virtio_unmap(se, &msg);
+            if (ret < 0) {
+                fuse_log(FUSE_LOG_ERR,
+                         "%s: unmap over virtio failed "
+                         "(offset=0x%lx, len=0x%lx). err=%d\n",
+                         __func__, argp->moffset, argp->len, ret);
+                break;
+            }
+            if (num > 0) {
+                i = 0;
+                memset(&msg, 0, sizeof(msg));
+            }
+        }
+    }
+
+    fuse_reply_err(req, -ret);
 }
 
 static struct fuse_lowlevel_ops lo_oper = {
-- 
2.29.2
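
A VhostUserFSSlaveMsg carries at most VHOST_USER_FS_SLAVE_ENTRIES
entries, so larger removal requests have to go out in batches, which
the loop above does by resetting its index mid-iteration. The same
pattern with a separate batch counter, as a sketch that assumes the
types and the fuse_virtio_unmap() wrapper from this series are in
scope; unmap_in_batches() is a hypothetical helper:

    #include <stdint.h>
    #include <string.h>

    static int unmap_in_batches(struct fuse_session *se,
                                const struct fuse_removemapping_one *argp,
                                unsigned num)
    {
        VhostUserFSSlaveMsg msg = { 0 };
        unsigned batch = 0;             /* next free slot in msg */
        int64_t ret = 0;

        for (unsigned i = 0; i < num; i++) {
            msg.len[batch] = argp[i].len;
            msg.c_offset[batch] = argp[i].moffset;
            batch++;

            /* Flush when the message is full or the input runs out */
            if (batch == VHOST_USER_FS_SLAVE_ENTRIES || i + 1 == num) {
                ret = fuse_virtio_unmap(se, &msg);
                if (ret < 0) {
                    break;
                }
                memset(&msg, 0, sizeof(msg));
                batch = 0;
            }
        }
        return ret < 0 ? (int)ret : 0;
    }

Keeping the counter separate from the input index also keeps slot 0 of
every follow-on message in use.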



* [PATCH 14/24] DAX: virtiofsd: route se down to destroy method
  2021-02-09 19:02 ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-09 19:02   ` Dr. David Alan Gilbert (git)
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-02-09 19:02 UTC (permalink / raw)
  To: qemu-devel, vgoyal, stefanha, virtio-fs, marcandre.lureau, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

We're going to need to pass the session down to the destroy method so
that it can be passed back when removing mappings.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/fuse_lowlevel.c  | 6 +++---
 tools/virtiofsd/fuse_lowlevel.h  | 2 +-
 tools/virtiofsd/passthrough_ll.c | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index f74583e095..99ba000c2e 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -2212,7 +2212,7 @@ static void do_destroy(fuse_req_t req, fuse_ino_t nodeid,
     se->got_destroy = 1;
     se->got_init = 0;
     if (se->op.destroy) {
-        se->op.destroy(se->userdata);
+        se->op.destroy(se->userdata, se);
     }
 
     send_reply_ok(req, NULL, 0);
@@ -2439,7 +2439,7 @@ void fuse_session_process_buf_int(struct fuse_session *se,
             se->got_destroy = 1;
             se->got_init = 0;
             if (se->op.destroy) {
-                se->op.destroy(se->userdata);
+                se->op.destroy(se->userdata, se);
             }
         } else {
             goto reply_err;
@@ -2527,7 +2527,7 @@ void fuse_session_destroy(struct fuse_session *se)
 {
     if (se->got_init && !se->got_destroy) {
         if (se->op.destroy) {
-            se->op.destroy(se->userdata);
+            se->op.destroy(se->userdata, se);
         }
     }
     pthread_rwlock_destroy(&se->init_rwlock);
diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
index 014564ff07..53439f5432 100644
--- a/tools/virtiofsd/fuse_lowlevel.h
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -208,7 +208,7 @@ struct fuse_lowlevel_ops {
      *
      * @param userdata the user data passed to fuse_session_new()
      */
-    void (*destroy)(void *userdata);
+    void (*destroy)(void *userdata, struct fuse_session *se);
 
     /**
      * Look up a directory entry by name and get its attributes.
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 971ff2b2ea..badac23fef 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -2943,7 +2943,7 @@ static void lo_lseek(fuse_req_t req, fuse_ino_t ino, off_t off, int whence,
     }
 }
 
-static void lo_destroy(void *userdata)
+static void lo_destroy(void *userdata, struct fuse_session *se)
 {
     struct lo_data *lo = (struct lo_data *)userdata;
 
-- 
2.29.2



* [PATCH 15/24] DAX: virtiofsd: Perform an unmap on destroy
  2021-02-09 19:02 ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-09 19:02   ` Dr. David Alan Gilbert (git)
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-02-09 19:02 UTC (permalink / raw)
  To: qemu-devel, vgoyal, stefanha, virtio-fs, marcandre.lureau, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Forcibly unmap all remaining DAX cache entries on destroy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index badac23fef..21ddb434ae 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -2947,6 +2947,17 @@ static void lo_destroy(void *userdata, struct fuse_session *se)
 {
     struct lo_data *lo = (struct lo_data *)userdata;
 
+    if (fuse_lowlevel_is_virtio(se)) {
+        VhostUserFSSlaveMsg msg = { 0 };
+
+        msg.len[0] = ~(uint64_t)0; /* Special: means 'all' */
+        msg.c_offset[0] = 0;
+        if (fuse_virtio_unmap(se, &msg)) {
+            fuse_log(FUSE_LOG_ERR, "%s: unmap during destroy failed\n",
+                     __func__);
+        }
+    }
+
     pthread_mutex_lock(&lo->mutex);
     while (true) {
         GHashTableIter iter;
-- 
2.29.2
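
The destroy path relies on a wildcard rather than tracking live
mappings: an entry whose length is all-ones asks the master to drop
everything. The convention as a tiny helper, assuming the series'
VhostUserFSSlaveMsg and fuse_virtio_unmap(); drop_all_mappings() is a
hypothetical name:

    #include <stdint.h>

    /* Ask QEMU to remove every mapping in the DAX window. */
    static int drop_all_mappings(struct fuse_session *se)
    {
        VhostUserFSSlaveMsg msg = { 0 };

        msg.len[0] = ~(uint64_t)0;      /* special: means 'all' */
        msg.c_offset[0] = 0;
        return fuse_virtio_unmap(se, &msg) < 0 ? -1 : 0;
    }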



* [PATCH 16/24] DAX/unmap: virtiofsd: Add VHOST_USER_SLAVE_FS_IO
  2021-02-09 19:02 ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-09 19:02   ` Dr. David Alan Gilbert (git)
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-02-09 19:02 UTC (permalink / raw)
  To: qemu-devel, vgoyal, stefanha, virtio-fs, marcandre.lureau, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Define a new slave command, 'VHOST_USER_SLAVE_FS_IO', which a client
can use to ask QEMU to perform a read/write between an fd and guest
memory (addressed by GPA) directly.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 docs/interop/vhost-user.rst               | 11 +++
 hw/virtio/trace-events                    |  6 ++
 hw/virtio/vhost-user-fs.c                 | 84 +++++++++++++++++++++++
 hw/virtio/vhost-user.c                    |  4 ++
 include/hw/virtio/vhost-user-fs.h         |  2 +
 subprojects/libvhost-user/libvhost-user.h |  1 +
 6 files changed, 108 insertions(+)

diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index 1deedd3407..821712f4a2 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -1452,6 +1452,17 @@ Slave message types
   multiple chunks can be unmapped in one command.
   A reply is generated indicating whether unmapping succeeded.
 
+``VHOST_USER_SLAVE_FS_IO``
+  :id: 8
+  :equivalent ioctl: N/A
+  :slave payload: fd + n * (offset + address + len)
+  :master payload: N/A
+
+  Requests that QEMU perform IO directly between an fd and guest memory
+  on behalf of the daemon; this is normally for the case where a memory
+  region isn't visible to the daemon. The slave payload has flags which
+  determine the direction of the IO operation.
+
 .. _reply_ack:
 
 VHOST_USER_PROTOCOL_F_REPLY_ACK
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index c62727f879..20557a078e 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -53,6 +53,12 @@ vhost_vdpa_get_features(void *dev, uint64_t features) "dev: %p features: 0x%"PRI
 vhost_vdpa_set_owner(void *dev) "dev: %p"
 vhost_vdpa_vq_get_addr(void *dev, void *vq, uint64_t desc_user_addr, uint64_t avail_user_addr, uint64_t used_user_addr) "dev: %p vq: %p desc_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64
 
+# vhost-user-fs.c
+
+vhost_user_fs_slave_io_loop(const char *name, uint64_t owr, int is_ram, int is_romd, size_t size) "region %s with internal offset 0x%"PRIx64 " ram=%d romd=%d mrs.size=%zd"
+vhost_user_fs_slave_io_loop_res(ssize_t transferred) "%zd"
+vhost_user_fs_slave_io_exit(int res, size_t done) "res: %d done: %zd"
+
 # virtio.c
 virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
 virtqueue_fill(void *vq, const void *elem, unsigned int len, unsigned int idx) "vq %p elem %p len %u idx %u"
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 5f2fca4d82..357bc1d04e 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -23,6 +23,8 @@
 #include "hw/virtio/vhost-user-fs.h"
 #include "monitor/monitor.h"
 #include "sysemu/sysemu.h"
+#include "exec/address-spaces.h"
+#include "trace.h"
 
 /*
  * The powerpc kernel code expects the memory to be accessible during
@@ -155,6 +157,88 @@ uint64_t vhost_user_fs_slave_unmap(struct vhost_dev *dev,
     return (uint64_t)res;
 }
 
+uint64_t vhost_user_fs_slave_io(struct vhost_dev *dev, VhostUserFSSlaveMsg *sm,
+                                int fd)
+{
+    VHostUserFS *fs = VHOST_USER_FS(dev->vdev);
+    if (!fs) {
+        /* Shouldn't happen - but seen it in error paths */
+        error_report("Bad fs ptr");
+        return (uint64_t)-1;
+    }
+
+    unsigned int i;
+    int res = 0;
+    size_t done = 0;
+
+    if (fd < 0) {
+        error_report("Bad fd for map");
+        return (uint64_t)-1;
+    }
+
+    for (i = 0; i < VHOST_USER_FS_SLAVE_ENTRIES && !res; i++) {
+        if (sm->len[i] == 0) {
+            continue;
+        }
+
+        size_t len = sm->len[i];
+        hwaddr gpa = sm->c_offset[i];
+        off_t fd_offset = sm->fd_offset[i];
+
+        while (len && !res) {
+            MemoryRegionSection mrs = memory_region_find(get_system_memory(),
+                                                         gpa, len);
+            size_t mrs_size = (size_t)int128_get64(mrs.size);
+
+            if (!mrs_size) {
+                error_report("No guest region found for 0x%" HWADDR_PRIx, gpa);
+                res = -EFAULT;
+                break;
+            }
+
+            trace_vhost_user_fs_slave_io_loop(mrs.mr->name,
+                                          (uint64_t)mrs.offset_within_region,
+                                          memory_region_is_ram(mrs.mr),
+                                          memory_region_is_romd(mrs.mr),
+                                          (size_t)mrs_size);
+
+            void *hostptr = qemu_map_ram_ptr(mrs.mr->ram_block,
+                                             mrs.offset_within_region);
+            ssize_t transferred;
+            if (sm->flags[i] & VHOST_USER_FS_FLAG_MAP_R) {
+                /* Read from file into RAM */
+                if (mrs.mr->readonly) {
+                    res = -EFAULT;
+                    break;
+                }
+                transferred = pread(fd, hostptr, mrs_size, fd_offset);
+            } else {
+                /* Write into file from RAM */
+                assert((sm->flags[i] & VHOST_USER_FS_FLAG_MAP_W));
+                transferred = pwrite(fd, hostptr, mrs_size, fd_offset);
+            }
+            trace_vhost_user_fs_slave_io_loop_res(transferred);
+            if (transferred < 0) {
+                res = -errno;
+                break;
+            }
+            if (!transferred) {
+                /* EOF */
+                break;
+            }
+
+            done += transferred;
+            len -= transferred;
+            gpa += transferred;
+            fd_offset += transferred;
+        }
+    }
+    close(fd);
+
+    trace_vhost_user_fs_slave_io_exit(res, done);
+    if (res < 0) {
+        return (uint64_t)res;
+    }
+    return (uint64_t)done;
+}
+
 static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
 {
     VHostUserFS *fs = VHOST_USER_FS(vdev);
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 21e40ff91a..0bc83c2714 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -137,6 +137,7 @@ typedef enum VhostUserSlaveRequest {
     VHOST_USER_SLAVE_VRING_ERR = 5,
     VHOST_USER_SLAVE_FS_MAP = 6,
     VHOST_USER_SLAVE_FS_UNMAP = 7,
+    VHOST_USER_SLAVE_FS_IO = 8,
     VHOST_USER_SLAVE_MAX
 }  VhostUserSlaveRequest;
 
@@ -1485,6 +1486,9 @@ static void slave_read(void *opaque)
     case VHOST_USER_SLAVE_FS_UNMAP:
         ret = vhost_user_fs_slave_unmap(dev, &payload.fs);
         break;
+    case VHOST_USER_SLAVE_FS_IO:
+        ret = vhost_user_fs_slave_io(dev, &payload.fs, fd[0]);
+        break;
 #endif
     default:
         error_report("Received unexpected msg type: %d.", hdr.request);
diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
index 25e14ab17a..ffd3165c29 100644
--- a/include/hw/virtio/vhost-user-fs.h
+++ b/include/hw/virtio/vhost-user-fs.h
@@ -69,5 +69,7 @@ uint64_t vhost_user_fs_slave_map(struct vhost_dev *dev, VhostUserFSSlaveMsg *sm,
                                  int fd);
 uint64_t vhost_user_fs_slave_unmap(struct vhost_dev *dev,
                                    VhostUserFSSlaveMsg *sm);
+uint64_t vhost_user_fs_slave_io(struct vhost_dev *dev, VhostUserFSSlaveMsg *sm,
+                                int fd);
 
 #endif /* _QEMU_VHOST_USER_FS_H */
diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
index 150b1121cc..a398148ed9 100644
--- a/subprojects/libvhost-user/libvhost-user.h
+++ b/subprojects/libvhost-user/libvhost-user.h
@@ -121,6 +121,7 @@ typedef enum VhostUserSlaveRequest {
     VHOST_USER_SLAVE_VRING_ERR = 5,
     VHOST_USER_SLAVE_FS_MAP = 6,
     VHOST_USER_SLAVE_FS_UNMAP = 7,
+    VHOST_USER_SLAVE_FS_IO = 8,
     VHOST_USER_SLAVE_MAX
 }  VhostUserSlaveRequest;
 
-- 
2.29.2
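
Stripped of the QEMU memory API, the heart of the new handler is a
chunked pread()/pwrite() loop: each entry is transferred piecewise, the
guest address and file offset advance by whatever each call manages,
and a zero return (EOF) ends the entry early. A stand-alone sketch of
the read direction against a single pre-resolved buffer; read_chunked()
is a hypothetical name:

    #include <errno.h>
    #include <stddef.h>
    #include <sys/types.h>
    #include <unistd.h>

    /*
     * Read 'len' bytes at file offset 'fd_offset' into 'dst', retrying
     * on short reads.  Returns bytes transferred, or -errno on error.
     * (The real handler re-resolves its destination pointer per chunk
     * with memory_region_find(), since one entry may span regions.)
     */
    static ssize_t read_chunked(int fd, void *dst, size_t len,
                                off_t fd_offset)
    {
        size_t done = 0;

        while (done < len) {
            ssize_t n = pread(fd, (char *)dst + done, len - done,
                              fd_offset + done);
            if (n < 0) {
                return -errno;
            }
            if (n == 0) {           /* EOF: report what we got */
                break;
            }
            done += n;
        }
        return done;
    }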



* [PATCH 17/24] DAX/unmap virtiofsd: Add wrappers for VHOST_USER_SLAVE_FS_IO
  2021-02-09 19:02 ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-09 19:02   ` Dr. David Alan Gilbert (git)
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-02-09 19:02 UTC (permalink / raw)
  To: qemu-devel, vgoyal, stefanha, virtio-fs, marcandre.lureau, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add a wrapper to send VHOST_USER_SLAVE_FS_IO commands and a
further wrapper for sending a fuse_buf write using the FS_IO
slave command.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/fuse_lowlevel.h | 25 ++++++++++++++++++++++
 tools/virtiofsd/fuse_virtio.c   | 38 +++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+)

diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
index 53439f5432..af928b262f 100644
--- a/tools/virtiofsd/fuse_lowlevel.h
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -2012,4 +2012,29 @@ int64_t fuse_virtio_map(fuse_req_t req, VhostUserFSSlaveMsg *msg, int fd);
  */
 int64_t fuse_virtio_unmap(struct fuse_session *se, VhostUserFSSlaveMsg *msg);
 
+/**
+ * For use with virtio-fs; request IO directly to memory
+ *
+ * @param se The current session
+ * @param msg A set of IO requests
+ * @param fd The fd to map
+ * @return Length on success, negative errno on error
+ */
+int64_t fuse_virtio_io(struct fuse_session *se, VhostUserFSSlaveMsg *msg,
+                       int fd);
+
+/**
+ * For use with virtio-fs; wrapper for fuse_virtio_io for writes
+ * from memory to an fd
+ * @param req The request that triggered this action
+ * @param dst The destination (file) memory buffer
+ * @param dst_off Byte offset in the file
+ * @param src The source (memory) buffer
+ * @param src_off Byte offset into the source (added to the GPA in src->mem)
+ * @param len Length in bytes
+ */
+ssize_t fuse_virtio_write(fuse_req_t req, const struct fuse_buf *dst,
+                          size_t dst_off, const struct fuse_buf *src,
+                          size_t src_off, size_t len);
+
 #endif /* FUSE_LOWLEVEL_H_ */
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index f217a093c8..8feb3c0261 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -1062,3 +1062,41 @@ int64_t fuse_virtio_unmap(struct fuse_session *se, VhostUserFSSlaveMsg *msg)
     return vu_fs_cache_request(&se->virtio_dev->dev, VHOST_USER_SLAVE_FS_UNMAP,
                                -1, msg);
 }
+
+int64_t fuse_virtio_io(struct fuse_session *se, VhostUserFSSlaveMsg *msg,
+                       int fd)
+{
+    if (!se->virtio_dev) {
+        return -ENODEV;
+    }
+    return vu_fs_cache_request(&se->virtio_dev->dev, VHOST_USER_SLAVE_FS_IO,
+                               fd, msg);
+}
+
+/*
+ * Write to a file (dst) from an area of guest GPA (src) that probably
+ * isn't visible to the daemon.
+ */
+ssize_t fuse_virtio_write(fuse_req_t req, const struct fuse_buf *dst,
+                          size_t dst_off, const struct fuse_buf *src,
+                          size_t src_off, size_t len)
+{
+    VhostUserFSSlaveMsg msg = { 0 };
+
+    if (dst->flags & FUSE_BUF_FD_SEEK) {
+        msg.fd_offset[0] = dst->pos + dst_off;
+    } else {
+        off_t cur = lseek(dst->fd, 0, SEEK_CUR);
+        if (cur == (off_t)-1) {
+            return -errno;
+        }
+        msg.fd_offset[0] = cur;
+    }
+    msg.c_offset[0] = (uintptr_t)src->mem + src_off;
+    msg.len[0] = len;
+    msg.flags[0] = VHOST_USER_FS_FLAG_MAP_W;
+
+    int64_t result = fuse_virtio_io(req->se, &msg, dst->fd);
+    fuse_log(FUSE_LOG_DEBUG, "%s: result=%ld\n", __func__, result);
+    return result;
+}
-- 
2.29.2
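
As a usage sketch, here is roughly how a write path might route one
source buffer through the new wrapper once buffers can arrive flagged
FUSE_BUF_PHYS_ADDR (introduced by the next patch in the series);
copy_one_buf() is a hypothetical helper and assumes dst carries
FUSE_BUF_FD_SEEK semantics:

    #include <errno.h>
    #include <unistd.h>

    /*
     * Copy one source buffer of a write to the destination file,
     * bouncing through QEMU when the source is a guest physical
     * address that the daemon cannot map.
     */
    static ssize_t copy_one_buf(fuse_req_t req, const struct fuse_buf *dst,
                                size_t dst_off, const struct fuse_buf *src,
                                size_t src_off, size_t len)
    {
        if (src->flags & FUSE_BUF_PHYS_ADDR) {
            /* src->mem holds a GPA: have QEMU do the IO on our behalf */
            return fuse_virtio_write(req, dst, dst_off, src, src_off, len);
        }

        /* Mappable memory: an ordinary pwrite() will do */
        ssize_t res = pwrite(dst->fd, (char *)src->mem + src_off, len,
                             dst->pos + dst_off);
        return res < 0 ? -errno : res;
    }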



* [PATCH 18/24] DAX/unmap virtiofsd: Parse unmappable elements
  2021-02-09 19:02 ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-09 19:02   ` Dr. David Alan Gilbert (git)
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-02-09 19:02 UTC (permalink / raw)
  To: qemu-devel, vgoyal, stefanha, virtio-fs, marcandre.lureau, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

For some read/writes the virtio queue elements are unmappable by
the daemon; these are cases where the data is to be read/written
from non-RAM.  In virtiofs's case this is typically a direct read/write
into an mmap'd DAX file also on virtiofs (possibly on another instance).

When we receive a virtio queue element, check that we have enough
mappable data to handle the headers.  Make a note of the number of
unmappable 'in' entries (i.e. those carrying data read back to the VMM),
and flag the fuse_bufvec for 'out' entries with a new flag
FUSE_BUF_PHYS_ADDR.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
with fix by:
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
---
 tools/virtiofsd/buffer.c      |   4 +-
 tools/virtiofsd/fuse_common.h |   7 ++
 tools/virtiofsd/fuse_virtio.c | 191 ++++++++++++++++++++++++----------
 3 files changed, 145 insertions(+), 57 deletions(-)

diff --git a/tools/virtiofsd/buffer.c b/tools/virtiofsd/buffer.c
index 874f01c488..1a050aa441 100644
--- a/tools/virtiofsd/buffer.c
+++ b/tools/virtiofsd/buffer.c
@@ -77,6 +77,7 @@ static ssize_t fuse_buf_write(const struct fuse_buf *dst, size_t dst_off,
     ssize_t res = 0;
     size_t copied = 0;
 
+    assert(!(src->flags & FUSE_BUF_PHYS_ADDR));
     while (len) {
         if (dst->flags & FUSE_BUF_FD_SEEK) {
             res = pwrite(dst->fd, (char *)src->mem + src_off, len,
@@ -272,7 +273,8 @@ ssize_t fuse_buf_copy(struct fuse_bufvec *dstv, struct fuse_bufvec *srcv)
      * process
      */
     for (i = 0; i < srcv->count; i++) {
-        if (srcv->buf[i].flags & FUSE_BUF_IS_FD) {
+        if ((srcv->buf[i].flags & FUSE_BUF_PHYS_ADDR) ||
+            (srcv->buf[i].flags & FUSE_BUF_IS_FD)) {
             break;
         }
     }
diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
index a090040bb2..ed9280de91 100644
--- a/tools/virtiofsd/fuse_common.h
+++ b/tools/virtiofsd/fuse_common.h
@@ -611,6 +611,13 @@ enum fuse_buf_flags {
      * detected.
      */
     FUSE_BUF_FD_RETRY = (1 << 3),
+
+    /**
+     * The addresses in the iovec represent guest physical addresses
+     * that can't be mapped by the daemon process.
+     * IO must be bounced back to the VMM to do it.
+     */
+    FUSE_BUF_PHYS_ADDR = (1 << 4),
 };
 
 /**
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 8feb3c0261..8fa438525f 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -49,6 +49,10 @@ typedef struct {
     VuVirtqElement elem;
     struct fuse_chan ch;
 
+    /* Number of unmappable iovecs */
+    unsigned bad_in_num;
+    unsigned bad_out_num;
+
     /* Used to complete requests that involve no reply */
     bool reply_sent;
 } FVRequest;
@@ -291,8 +295,10 @@ int virtio_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
 
     /* The 'in' part of the elem is to qemu */
     unsigned int in_num = elem->in_num;
+    unsigned int bad_in_num = req->bad_in_num;
     struct iovec *in_sg = elem->in_sg;
     size_t in_len = iov_size(in_sg, in_num);
+    size_t in_len_writeable = iov_size(in_sg, in_num - bad_in_num);
     fuse_log(FUSE_LOG_DEBUG, "%s: elem %d: with %d in desc of length %zd\n",
              __func__, elem->index, in_num, in_len);
 
@@ -300,7 +306,7 @@ int virtio_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
      * The elem should have room for a 'fuse_out_header' (out from fuse)
      * plus the data based on the len in the header.
      */
-    if (in_len < sizeof(struct fuse_out_header)) {
+    if (in_len_writeable < sizeof(struct fuse_out_header)) {
         fuse_log(FUSE_LOG_ERR, "%s: elem %d too short for out_header\n",
                  __func__, elem->index);
         ret = E2BIG;
@@ -327,7 +333,7 @@ int virtio_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
     memcpy(in_sg_cpy, in_sg, sizeof(struct iovec) * in_num);
     /* These get updated as we skip */
     struct iovec *in_sg_ptr = in_sg_cpy;
-    int in_sg_cpy_count = in_num;
+    int in_sg_cpy_count = in_num - bad_in_num;
 
     /* skip over parts of in_sg that contained the header iov */
     size_t skip_size = iov_len;
@@ -460,17 +466,21 @@ static void fv_queue_worker(gpointer data, gpointer user_data)
 
     /* The 'out' part of the elem is from qemu */
     unsigned int out_num = elem->out_num;
+    unsigned int out_num_readable = out_num - req->bad_out_num;
     struct iovec *out_sg = elem->out_sg;
     size_t out_len = iov_size(out_sg, out_num);
+    size_t out_len_readable = iov_size(out_sg, out_num_readable);
     fuse_log(FUSE_LOG_DEBUG,
-             "%s: elem %d: with %d out desc of length %zd\n",
-             __func__, elem->index, out_num, out_len);
+             "%s: elem %d: with %d out desc of length %zd"
+             " bad_in_num=%u bad_out_num=%u\n",
+             __func__, elem->index, out_num, out_len, req->bad_in_num,
+             req->bad_out_num);
 
     /*
      * The elem should contain a 'fuse_in_header' (in to fuse)
      * plus the data based on the len in the header.
      */
-    if (out_len < sizeof(struct fuse_in_header)) {
+    if (out_len_readable < sizeof(struct fuse_in_header)) {
         fuse_log(FUSE_LOG_ERR, "%s: elem %d too short for in_header\n",
                  __func__, elem->index);
         assert(0); /* TODO */
@@ -484,63 +494,129 @@ static void fv_queue_worker(gpointer data, gpointer user_data)
     copy_from_iov(&fbuf, 1, out_sg);
 
     pbufv = NULL; /* Compiler thinks an unitialised path */
-    if (out_num > 2 &&
-        out_sg[0].iov_len == sizeof(struct fuse_in_header) &&
-        ((struct fuse_in_header *)fbuf.mem)->opcode == FUSE_WRITE &&
-        out_sg[1].iov_len == sizeof(struct fuse_write_in)) {
-        /*
-         * For a write we don't actually need to copy the
-         * data, we can just do it straight out of guest memory
-         * but we must still copy the headers in case the guest
-         * was nasty and changed them while we were using them.
-         */
-        fuse_log(FUSE_LOG_DEBUG, "%s: Write special case\n", __func__);
-
-        /* copy the fuse_write_in header afte rthe fuse_in_header */
-        fbuf.mem += out_sg->iov_len;
-        copy_from_iov(&fbuf, 1, out_sg + 1);
-        fbuf.mem -= out_sg->iov_len;
-        fbuf.size = out_sg[0].iov_len + out_sg[1].iov_len;
-
-        /* Allocate the bufv, with space for the rest of the iov */
-        pbufv = malloc(sizeof(struct fuse_bufvec) +
-                       sizeof(struct fuse_buf) * (out_num - 2));
-        if (!pbufv) {
-            fuse_log(FUSE_LOG_ERR, "%s: pbufv malloc failed\n",
-                    __func__);
-            goto out;
-        }
+    if (req->bad_in_num || req->bad_out_num) {
+        bool handled_unmappable = false;
+
+        if (out_num > 2 && out_num_readable >= 2 && !req->bad_in_num &&
+            out_sg[0].iov_len == sizeof(struct fuse_in_header) &&
+            ((struct fuse_in_header *)fbuf.mem)->opcode == FUSE_WRITE &&
+            out_sg[1].iov_len == sizeof(struct fuse_write_in)) {
+            handled_unmappable = true;
+
+            /* copy the fuse_write_in header after fuse_in_header */
+            fbuf.mem += out_sg->iov_len;
+            copy_from_iov(&fbuf, 1, out_sg + 1);
+            fbuf.mem -= out_sg->iov_len;
+            fbuf.size = out_sg[0].iov_len + out_sg[1].iov_len;
+
+            /* Allocate the bufv, with space for the rest of the iov */
+            pbufv = malloc(sizeof(struct fuse_bufvec) +
+                           sizeof(struct fuse_buf) * (out_num - 2));
+            if (!pbufv) {
+                fuse_log(FUSE_LOG_ERR, "%s: pbufv malloc failed\n",
+                        __func__);
+                goto out;
+            }
 
-        allocated_bufv = true;
-        pbufv->count = 1;
-        pbufv->buf[0] = fbuf;
+            allocated_bufv = true;
+            pbufv->count = 1;
+            pbufv->buf[0] = fbuf;
+
+            size_t iovindex, pbufvindex;
+            iovindex = 2; /* 2 headers, separate iovs */
+            pbufvindex = 1; /* 2 headers, 1 fusebuf */
+
+            for (; iovindex < out_num; iovindex++, pbufvindex++) {
+                pbufv->count++;
+                pbufv->buf[pbufvindex].pos = ~0; /* Dummy */
+                pbufv->buf[pbufvindex].flags =
+                    (iovindex < out_num_readable) ? 0 :
+                                                    FUSE_BUF_PHYS_ADDR;
+                pbufv->buf[pbufvindex].mem = out_sg[iovindex].iov_base;
+                pbufv->buf[pbufvindex].size = out_sg[iovindex].iov_len;
+            }
+        }
 
-        size_t iovindex, pbufvindex;
-        iovindex = 2; /* 2 headers, separate iovs */
-        pbufvindex = 1; /* 2 headers, 1 fusebuf */
+        if (out_num == 2 && out_num_readable == 2 && req->bad_in_num &&
+            out_sg[0].iov_len == sizeof(struct fuse_in_header) &&
+            ((struct fuse_in_header *)fbuf.mem)->opcode == FUSE_READ &&
+            out_sg[1].iov_len == sizeof(struct fuse_read_in)) {
+            fuse_log(FUSE_LOG_DEBUG,
+                     "Unmappable read case "
+                     "in_num=%d bad_in_num=%d\n",
+                     elem->in_num, req->bad_in_num);
+            handled_unmappable = true;
+        }
 
-        for (; iovindex < out_num; iovindex++, pbufvindex++) {
-            pbufv->count++;
-            pbufv->buf[pbufvindex].pos = ~0; /* Dummy */
-            pbufv->buf[pbufvindex].flags = 0;
-            pbufv->buf[pbufvindex].mem = out_sg[iovindex].iov_base;
-            pbufv->buf[pbufvindex].size = out_sg[iovindex].iov_len;
+        if (!handled_unmappable) {
+            fuse_log(FUSE_LOG_ERR,
+                     "Unhandled unmappable element: out: %d(b:%d) in: "
+                     "%d(b:%d)\n",
+                     out_num, req->bad_out_num, elem->in_num, req->bad_in_num);
+            fv_panic(dev, "Unhandled unmappable element");
         }
-    } else {
-        /* Normal (non fast write) path */
+    }
+
+    if (!req->bad_out_num) {
+        if (out_num > 2 &&
+            out_sg[0].iov_len == sizeof(struct fuse_in_header) &&
+            ((struct fuse_in_header *)fbuf.mem)->opcode == FUSE_WRITE &&
+            out_sg[1].iov_len == sizeof(struct fuse_write_in)) {
+            /*
+             * For a write we don't actually need to copy the
+             * data, we can just do it straight out of guest memory
+             * but we must still copy the headers in case the guest
+             * was nasty and changed them while we were using them.
+             */
+            fuse_log(FUSE_LOG_DEBUG, "%s: Write special case\n",
+                     __func__);
+
+            /* copy the fuse_write_in header after fuse_in_header */
+            fbuf.mem += out_sg->iov_len;
+            copy_from_iov(&fbuf, 1, out_sg + 1);
+            fbuf.mem -= out_sg->iov_len;
+            fbuf.size = out_sg[0].iov_len + out_sg[1].iov_len;
+
+            /* Allocate the bufv, with space for the rest of the iov */
+            pbufv = malloc(sizeof(struct fuse_bufvec) +
+                           sizeof(struct fuse_buf) * (out_num - 2));
+            if (!pbufv) {
+                fuse_log(FUSE_LOG_ERR, "%s: pbufv malloc failed\n",
+                        __func__);
+                goto out;
+            }
 
-        /* Copy the rest of the buffer */
-        fbuf.mem += out_sg->iov_len;
-        copy_from_iov(&fbuf, out_num - 1, out_sg + 1);
-        fbuf.mem -= out_sg->iov_len;
-        fbuf.size = out_len;
+            allocated_bufv = true;
+            pbufv->count = 1;
+            pbufv->buf[0] = fbuf;
 
-        /* TODO! Endianness of header */
+            size_t iovindex, pbufvindex;
+            iovindex = 2; /* 2 headers, separate iovs */
+            pbufvindex = 1; /* 2 headers, 1 fusebuf */
 
-        /* TODO: Add checks for fuse_session_exited */
-        bufv.buf[0] = fbuf;
-        bufv.count = 1;
-        pbufv = &bufv;
+            for (; iovindex < out_num; iovindex++, pbufvindex++) {
+                pbufv->count++;
+                pbufv->buf[pbufvindex].pos = ~0; /* Dummy */
+                pbufv->buf[pbufvindex].flags = 0;
+                pbufv->buf[pbufvindex].mem = out_sg[iovindex].iov_base;
+                pbufv->buf[pbufvindex].size = out_sg[iovindex].iov_len;
+            }
+        } else {
+            /* Normal (non fast write) path */
+
+            /* Copy the rest of the buffer */
+            fbuf.mem += out_sg->iov_len;
+            copy_from_iov(&fbuf, out_num - 1, out_sg + 1);
+            fbuf.mem -= out_sg->iov_len;
+            fbuf.size = out_len;
+
+            /* TODO! Endianness of header */
+
+            /* TODO: Add checks for fuse_session_exited */
+            bufv.buf[0] = fbuf;
+            bufv.count = 1;
+            pbufv = &bufv;
+        }
     }
     pbufv->idx = 0;
     pbufv->off = 0;
@@ -657,13 +733,16 @@ static void *fv_queue_thread(void *opaque)
                  __func__, qi->qidx, (size_t)evalue, in_bytes, out_bytes);
 
         while (1) {
+            unsigned int bad_in_num = 0, bad_out_num = 0;
             FVRequest *req = vu_queue_pop(dev, q, sizeof(FVRequest),
-                                          NULL, NULL);
+                                          &bad_in_num, &bad_out_num);
             if (!req) {
                 break;
             }
 
             req->reply_sent = false;
+            req->bad_in_num = bad_in_num;
+            req->bad_out_num = bad_out_num;
 
             if (!se->thread_pool_size) {
                 req_list = g_list_prepend(req_list, req);
-- 
2.29.2



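As a reading aid for the bad_in_num/bad_out_num accounting above: the
patch assumes the unmappable descriptors sit at the tail of each sg
list, so the daemon-usable byte count is just the iov_size() of the
leading entries.  A minimal standalone sketch (editorial, not part of
the patch; mappable_bytes() is a hypothetical helper equivalent to
iov_size(sg, num - bad_num)):

    #include <stddef.h>
    #include <sys/uio.h>

    /*
     * Total bytes in the leading, daemon-mappable part of an sg list,
     * given the count of unmappable ('bad') descriptors that
     * vu_queue_pop() reports at the tail.
     */
    static size_t mappable_bytes(const struct iovec *sg, unsigned int num,
                                 unsigned int bad_num)
    {
        size_t total = 0;
        unsigned int i;

        for (i = 0; i < num - bad_num; i++) {
            total += sg[i].iov_len;
        }
        return total;
    }
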
* [PATCH 19/24] DAX/unmap virtiofsd: Route unmappable reads
  2021-02-09 19:02 ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-09 19:02   ` Dr. David Alan Gilbert (git)
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-02-09 19:02 UTC (permalink / raw)
  To: qemu-devel, vgoyal, stefanha, virtio-fs, marcandre.lureau, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

When a read with unmappable buffers is found, route it to a slave
read command.
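
Condensed, and with error handling elided, the loop the diff below adds
has the following shape (here file_pos, sg and fd stand for
buf->buf[0].pos, in_sg_ptr and buf->buf[0].fd in the diff):

    while (len && bad_in_num) {
        VhostUserFSSlaveMsg msg = { 0 };

        msg.flags[0]     = VHOST_USER_FS_FLAG_MAP_R;   /* VMM reads the file */
        msg.fd_offset[0] = file_pos;                   /* offset in the file */
        msg.c_offset[0]  = (uint64_t)(uintptr_t)sg->iov_base; /* guest phys dest */
        msg.len[0]       = sg->iov_len < len ? sg->iov_len : len;

        if (fuse_virtio_io(se, &msg, fd) <= 0) {
            break;                 /* 0 = EOF; negative = error (see diff) */
        }
        file_pos += msg.len[0];
        len      -= msg.len[0];
        sg++;                      /* next unmappable iovec */
        bad_in_num--;
    }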

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/fuse_virtio.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 8fa438525f..316d1f2463 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -397,6 +397,37 @@ int virtio_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
         in_sg_left -= ret;
         len -= ret;
     } while (in_sg_left);
+
+    if (bad_in_num) {
+        while (len && bad_in_num) {
+            VhostUserFSSlaveMsg msg = { 0 };
+            msg.flags[0] = VHOST_USER_FS_FLAG_MAP_R;
+            msg.fd_offset[0] = buf->buf[0].pos;
+            msg.c_offset[0] = (uint64_t)(uintptr_t)in_sg_ptr[0].iov_base;
+            msg.len[0] = in_sg_ptr[0].iov_len;
+            if (len < msg.len[0]) {
+                msg.len[0] = len;
+            }
+            int64_t req_res = fuse_virtio_io(se, &msg, buf->buf[0].fd);
+            fuse_log(FUSE_LOG_DEBUG,
+                     "%s: bad loop; len=%zd bad_in_num=%d fd_offset=%zd "
+                     "c_offset=%p req_res=%ld\n",
+                     __func__, len, bad_in_num, buf->buf[0].pos,
+                     in_sg_ptr[0].iov_base, req_res);
+            if (req_res > 0) {
+                len -= msg.len[0];
+                buf->buf[0].pos += msg.len[0];
+                in_sg_ptr++;
+                bad_in_num--;
+            } else if (req_res == 0) {
+                break;
+            } else {
+                ret = req_res;
+                free(in_sg_cpy);
+                goto err;
+            }
+        }
+    }
     free(in_sg_cpy);
 
     /* Need to fix out->len on EOF */
-- 
2.29.2




* [PATCH 20/24] DAX/unmap virtiofsd: route unmappable write to slave command
  2021-02-09 19:02 ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-09 19:02   ` Dr. David Alan Gilbert (git)
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-02-09 19:02 UTC (permalink / raw)
  To: qemu-devel, vgoyal, stefanha, virtio-fs, marcandre.lureau, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

When a fuse_buf_copy is performed on an element with FUSE_BUF_PHYS_ADDR,
route it to a fuse_virtio_write request that issues a slave command to
perform the write.
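
In isolation, the dispatch rule added to fuse_buf_copy_one() below is
(a sketch, not the full function):

    /*
     * A source marked FUSE_BUF_PHYS_ADDR cannot be touched directly by
     * the daemon; the copy is only possible by asking the VMM to do the
     * write, which in turn requires the destination to be an fd.
     */
    if ((src->flags & FUSE_BUF_PHYS_ADDR) && !(src->flags & FUSE_BUF_IS_FD) &&
        (dst->flags & FUSE_BUF_IS_FD)) {
        return fuse_virtio_write(req, dst, dst_off, src, src_off, len);
    }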

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/buffer.c         | 14 +++++++++++---
 tools/virtiofsd/fuse_common.h    |  6 +++++-
 tools/virtiofsd/fuse_lowlevel.h  |  3 ---
 tools/virtiofsd/passthrough_ll.c |  2 +-
 4 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/tools/virtiofsd/buffer.c b/tools/virtiofsd/buffer.c
index 1a050aa441..8135d52d2a 100644
--- a/tools/virtiofsd/buffer.c
+++ b/tools/virtiofsd/buffer.c
@@ -200,13 +200,20 @@ static ssize_t fuse_buf_fd_to_fd(const struct fuse_buf *dst, size_t dst_off,
     return copied;
 }
 
-static ssize_t fuse_buf_copy_one(const struct fuse_buf *dst, size_t dst_off,
+static ssize_t fuse_buf_copy_one(fuse_req_t req,
+                                 const struct fuse_buf *dst, size_t dst_off,
                                  const struct fuse_buf *src, size_t src_off,
                                  size_t len)
 {
     int src_is_fd = src->flags & FUSE_BUF_IS_FD;
     int dst_is_fd = dst->flags & FUSE_BUF_IS_FD;
+    int src_is_phys = src->flags & FUSE_BUF_PHYS_ADDR;
+    int dst_is_phys = dst->flags & FUSE_BUF_PHYS_ADDR;
 
+    if (src_is_phys && !src_is_fd && dst_is_fd) {
+        return fuse_virtio_write(req, dst, dst_off, src, src_off, len);
+    }
+    assert(!src_is_phys && !dst_is_phys);
     if (!src_is_fd && !dst_is_fd) {
         char *dstmem = (char *)dst->mem + dst_off;
         char *srcmem = (char *)src->mem + src_off;
@@ -259,7 +266,8 @@ static int fuse_bufvec_advance(struct fuse_bufvec *bufv, size_t len)
     return 1;
 }
 
-ssize_t fuse_buf_copy(struct fuse_bufvec *dstv, struct fuse_bufvec *srcv)
+ssize_t fuse_buf_copy(fuse_req_t req, struct fuse_bufvec *dstv,
+                      struct fuse_bufvec *srcv)
 {
     size_t copied = 0, i;
 
@@ -301,7 +309,7 @@ ssize_t fuse_buf_copy(struct fuse_bufvec *dstv, struct fuse_bufvec *srcv)
         dst_len = dst->size - dstv->off;
         len = min_size(src_len, dst_len);
 
-        res = fuse_buf_copy_one(dst, dstv->off, src, srcv->off, len);
+        res = fuse_buf_copy_one(req, dst, dstv->off, src, srcv->off, len);
         if (res < 0) {
             if (!copied) {
                 return res;
diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
index ed9280de91..05d56883dd 100644
--- a/tools/virtiofsd/fuse_common.h
+++ b/tools/virtiofsd/fuse_common.h
@@ -495,6 +495,8 @@ struct fuse_conn_info {
 struct fuse_session;
 struct fuse_pollhandle;
 struct fuse_conn_info_opts;
+struct fuse_req;
+typedef struct fuse_req *fuse_req_t;
 
 /**
  * This function parses several command-line options that can be used
@@ -713,11 +715,13 @@ size_t fuse_buf_size(const struct fuse_bufvec *bufv);
 /**
  * Copy data from one buffer vector to another
  *
+ * @param req The request this copy is part of
  * @param dst destination buffer vector
  * @param src source buffer vector
  * @return actual number of bytes copied or -errno on error
  */
-ssize_t fuse_buf_copy(struct fuse_bufvec *dst, struct fuse_bufvec *src);
+ssize_t fuse_buf_copy(fuse_req_t req,
+                      struct fuse_bufvec *dst, struct fuse_bufvec *src);
 
 /**
  * Memory buffer iterator
diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
index af928b262f..b36140c565 100644
--- a/tools/virtiofsd/fuse_lowlevel.h
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -42,9 +42,6 @@
 /** Inode number type */
 typedef uint64_t fuse_ino_t;
 
-/** Request pointer type */
-typedef struct fuse_req *fuse_req_t;
-
 /**
  * Session
  *
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 21ddb434ae..5baf4f1d50 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -2135,7 +2135,7 @@ static void lo_write_buf(fuse_req_t req, fuse_ino_t ino,
         }
     }
 
-    res = fuse_buf_copy(&out_buf, in_buf);
+    res = fuse_buf_copy(req, &out_buf, in_buf);
     if (res < 0) {
         fuse_reply_err(req, -res);
     } else {
-- 
2.29.2




* [PATCH 21/24] DAX:virtiofsd: implement FUSE_INIT map_alignment field
  2021-02-09 19:02 ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-09 19:02   ` Dr. David Alan Gilbert (git)
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-02-09 19:02 UTC (permalink / raw)
  To: qemu-devel, vgoyal, stefanha, virtio-fs, marcandre.lureau, mst

From: Stefan Hajnoczi <stefanha@redhat.com>

Communicate the host page size to the FUSE client so that
FUSE_SETUPMAPPING/FUSE_REMOVEMAPPING requests are aware of our alignment
constraints.
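
For example, on a host with 4 KiB pages the field carries
log2(4096) = 12, telling the client that FUSE_SETUPMAPPING offsets and
lengths must be multiples of 1 << 12.  A standalone sketch of the
computation (equivalent to the ctz64() call in the diff for a
power-of-two page size):

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        long page = sysconf(_SC_PAGE_SIZE);
        unsigned int shift = 0;

        /* count trailing zero bits of the page size */
        while (!((page >> shift) & 1)) {
            shift++;
        }
        printf("map_alignment = %u\n", shift); /* 12 for 4 KiB pages */
        return 0;
    }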

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/fuse_lowlevel.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 99ba000c2e..df4527acc9 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -10,6 +10,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/host-utils.h"
 #include "fuse_i.h"
 #include "standard-headers/linux/fuse.h"
 #include "fuse_misc.h"
@@ -2188,6 +2189,12 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
     outarg.max_background = se->conn.max_background;
     outarg.congestion_threshold = se->conn.congestion_threshold;
     outarg.time_gran = se->conn.time_gran;
+    if (arg->flags & FUSE_MAP_ALIGNMENT) {
+        outarg.flags |= FUSE_MAP_ALIGNMENT;
+
+        /* This constraint comes from mmap(2) and munmap(2) */
+        outarg.map_alignment = ctz64(sysconf(_SC_PAGE_SIZE));
+    }
 
     fuse_log(FUSE_LOG_DEBUG, "   INIT: %u.%u\n", outarg.major, outarg.minor);
     fuse_log(FUSE_LOG_DEBUG, "   flags=0x%08x\n", outarg.flags);
@@ -2197,6 +2204,7 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
     fuse_log(FUSE_LOG_DEBUG, "   congestion_threshold=%i\n",
              outarg.congestion_threshold);
     fuse_log(FUSE_LOG_DEBUG, "   time_gran=%u\n", outarg.time_gran);
+    fuse_log(FUSE_LOG_DEBUG, "   map_alignment=%u\n", outarg.map_alignment);
 
     send_reply_ok(req, &outarg, outargsize);
 }
-- 
2.29.2




* [PATCH 22/24] vhost-user-fs: Extend VhostUserFSSlaveMsg to pass additional info
  2021-02-09 19:02 ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-09 19:02   ` Dr. David Alan Gilbert (git)
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-02-09 19:02 UTC (permalink / raw)
  To: qemu-devel, vgoyal, stefanha, virtio-fs, marcandre.lureau, mst

From: Vivek Goyal <vgoyal@redhat.com>

Extend VhostUserFSSlaveMsg so that the slave can ask the master to drop
CAP_FSETID before doing I/O on the fd.

In some cases, virtiofsd takes on the onus of clearing the setuid bit on
a file when a WRITE happens. Generally virtiofsd does the WRITE to the
fd itself (from guest memory, which is mapped in virtiofsd as well), but
if this memory is unmappable in virtiofsd (like the cache window), then
virtiofsd asks qemu to do the I/O instead.

To retain the ability to drop the suid bit on write, qemu needs to drop
CAP_FSETID as well before writing to the fd. Extend VhostUserFSSlaveMsg
so that virtiofsd can specify in the message whether CAP_FSETID needs to
be dropped.
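
On the daemon side the flag would be set in the message before handing
the write to qemu; a hypothetical sketch (file_offset, guest_phys_addr,
write_len and kill_suid are placeholders, the field names are those
added below):

    VhostUserFSSlaveMsg msg = { 0 };

    msg.flags[0]     = VHOST_USER_FS_FLAG_MAP_W;
    msg.fd_offset[0] = file_offset;     /* where to write within the file */
    msg.c_offset[0]  = guest_phys_addr; /* unmappable source buffer */
    msg.len[0]       = write_len;

    if (kill_suid) {
        /* ask the master to drop CAP_FSETID around its pwrite */
        msg.gen_flags |= VHOST_USER_FS_GENFLAG_DROP_FSETID;
    }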

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 hw/virtio/vhost-user-fs.c                 | 5 +++++
 include/hw/virtio/vhost-user-fs.h         | 6 ++++++
 subprojects/libvhost-user/libvhost-user.h | 6 ++++++
 3 files changed, 17 insertions(+)

diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 357bc1d04e..61e891c82d 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -176,6 +176,11 @@ uint64_t vhost_user_fs_slave_io(struct vhost_dev *dev, VhostUserFSSlaveMsg *sm,
         return (uint64_t)-1;
     }
 
+    if (sm->gen_flags & VHOST_USER_FS_GENFLAG_DROP_FSETID) {
+        error_report("Dropping CAP_FSETID is not supported");
+        return (uint64_t)-ENOTSUP;
+    }
+
     for (i = 0; i < VHOST_USER_FS_SLAVE_ENTRIES && !res; i++) {
         if (sm->len[i] == 0) {
             continue;
diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
index ffd3165c29..e646eb004a 100644
--- a/include/hw/virtio/vhost-user-fs.h
+++ b/include/hw/virtio/vhost-user-fs.h
@@ -30,7 +30,13 @@ OBJECT_DECLARE_SIMPLE_TYPE(VHostUserFS, VHOST_USER_FS)
 #define VHOST_USER_FS_FLAG_MAP_R (1ull << 0)
 #define VHOST_USER_FS_FLAG_MAP_W (1ull << 1)
 
+/* Generic flags for the overall message and not individual ranges */
+/* Drop capability CAP_FSETID during the operation */
+#define VHOST_USER_FS_GENFLAG_DROP_FSETID (1ull << 0)
+
 typedef struct {
+    /* Generic flags for the overall message */
+    uint64_t gen_flags;
     /* Offsets within the file being mapped */
     uint64_t fd_offset[VHOST_USER_FS_SLAVE_ENTRIES];
     /* Offsets within the cache */
diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
index a398148ed9..f7de8f6387 100644
--- a/subprojects/libvhost-user/libvhost-user.h
+++ b/subprojects/libvhost-user/libvhost-user.h
@@ -132,7 +132,13 @@ typedef enum VhostUserSlaveRequest {
 #define VHOST_USER_FS_FLAG_MAP_R (1ull << 0)
 #define VHOST_USER_FS_FLAG_MAP_W (1ull << 1)
 
+/* Generic flags for the overall message and not individual ranges */
+/* Drop capability CAP_FSETID during the operation */
+#define VHOST_USER_FS_GENFLAG_DROP_FSETID (1ull << 0)
+
 typedef struct {
+    /* Generic flags for the overall message */
+    uint64_t gen_flags;
     /* Offsets within the file being mapped */
     uint64_t fd_offset[VHOST_USER_FS_SLAVE_ENTRIES];
     /* Offsets within the cache */
-- 
2.29.2




* [PATCH 23/24] vhost-user-fs: Implement drop CAP_FSETID functionality
  2021-02-09 19:02 ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-09 19:02   ` Dr. David Alan Gilbert (git)
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-02-09 19:02 UTC (permalink / raw)
  To: qemu-devel, vgoyal, stefanha, virtio-fs, marcandre.lureau, mst

From: Vivek Goyal <vgoyal@redhat.com>

As part of the slave_io message, the slave can ask qemu to do I/O on an
fd. Additionally, the slave can ask for CAP_FSETID to be dropped (if the
master has it) before doing the I/O. Implement the functionality to drop
CAP_FSETID and regain it after the operation.

This also creates a dependency on libcap-ng.
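
Condensed, the intended pattern in the slave_io path below is (helpers
as defined in the diff; the I/O loop and error handling trimmed):

    bool cap_fsetid_dropped = false;
    int res;

    if (sm->gen_flags & VHOST_USER_FS_GENFLAG_DROP_FSETID) {
        res = drop_effective_cap("FSETID", &cap_fsetid_dropped);
        if (res != 0) {
            return (uint64_t)res;      /* a -errno from the helper */
        }
    }

    /* ... the pread()/pwrite() loop on fd ... */

    if (cap_fsetid_dropped) {
        /* best effort: regaining is logged but cannot fail the I/O */
        if (gain_effective_cap("FSETID")) {
            error_report("Failed to gain CAP_FSETID");
        }
    }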

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 hw/virtio/meson.build     |  1 +
 hw/virtio/vhost-user-fs.c | 92 ++++++++++++++++++++++++++++++++++++++-
 meson.build               |  6 +++
 3 files changed, 97 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
index fbff9bc9d4..bdcdc82e13 100644
--- a/hw/virtio/meson.build
+++ b/hw/virtio/meson.build
@@ -18,6 +18,7 @@ virtio_ss.add(when: 'CONFIG_VIRTIO_BALLOON', if_true: files('virtio-balloon.c'))
 virtio_ss.add(when: 'CONFIG_VIRTIO_CRYPTO', if_true: files('virtio-crypto.c'))
 virtio_ss.add(when: ['CONFIG_VIRTIO_CRYPTO', 'CONFIG_VIRTIO_PCI'], if_true: files('virtio-crypto-pci.c'))
 virtio_ss.add(when: 'CONFIG_VHOST_USER_FS', if_true: files('vhost-user-fs.c'))
+virtio_ss.add(when: 'CONFIG_VHOST_USER_FS', if_true: libcap_ng)
 virtio_ss.add(when: ['CONFIG_VHOST_USER_FS', 'CONFIG_VIRTIO_PCI'], if_true: files('vhost-user-fs-pci.c'))
 virtio_ss.add(when: 'CONFIG_VIRTIO_PMEM', if_true: files('virtio-pmem.c'))
 virtio_ss.add(when: 'CONFIG_VHOST_VSOCK', if_true: files('vhost-vsock.c', 'vhost-vsock-common.c'))
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 61e891c82d..0d6ec27edd 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -13,6 +13,8 @@
 
 #include "qemu/osdep.h"
 #include <sys/ioctl.h>
+#include <cap-ng.h>
+#include <sys/syscall.h>
 #include "standard-headers/linux/virtio_fs.h"
 #include "qapi/error.h"
 #include "hw/qdev-properties.h"
@@ -36,6 +38,84 @@
 #define DAX_WINDOW_PROT PROT_NONE
 #endif
 
+/*
+ * Helpers for dropping and regaining effective capabilities. Returns 0
+ * on success, error otherwise
+ */
+static int drop_effective_cap(const char *cap_name, bool *cap_dropped)
+{
+    int cap, ret;
+
+    cap = capng_name_to_capability(cap_name);
+    if (cap < 0) {
+        ret = -errno;
+        error_report("capng_name_to_capability(%s) failed:%s", cap_name,
+                     strerror(errno));
+        goto out;
+    }
+
+    if (capng_get_caps_process()) {
+        ret = -errno;
+        error_report("capng_get_caps_process() failed:%s", strerror(errno));
+        goto out;
+    }
+
+    /* We don't have this capability in the effective set already. */
+    if (!capng_have_capability(CAPNG_EFFECTIVE, cap)) {
+        ret = 0;
+        goto out;
+    }
+
+    if (capng_update(CAPNG_DROP, CAPNG_EFFECTIVE, cap)) {
+        ret = -errno;
+        error_report("capng_update(DROP,) failed");
+        goto out;
+    }
+    if (capng_apply(CAPNG_SELECT_CAPS)) {
+        ret = -errno;
+        error_report("drop:capng_apply() failed");
+        goto out;
+    }
+
+    ret = 0;
+    if (cap_dropped) {
+        *cap_dropped = true;
+    }
+
+out:
+    return ret;
+}
+
+static int gain_effective_cap(const char *cap_name)
+{
+    int cap;
+    int ret = 0;
+
+    cap = capng_name_to_capability(cap_name);
+    if (cap < 0) {
+        ret = -errno;
+        error_report("capng_name_to_capability(%s) failed:%s", cap_name,
+                     strerror(errno));
+        goto out;
+    }
+
+    if (capng_update(CAPNG_ADD, CAPNG_EFFECTIVE, cap)) {
+        ret = -errno;
+        error_report("capng_update(ADD,) failed");
+        goto out;
+    }
+
+    if (capng_apply(CAPNG_SELECT_CAPS)) {
+        ret = -errno;
+        error_report("gain:capng_apply() failed");
+        goto out;
+    }
+    ret = 0;
+
+out:
+    return ret;
+}
+
 uint64_t vhost_user_fs_slave_map(struct vhost_dev *dev, VhostUserFSSlaveMsg *sm,
                                  int fd)
 {
@@ -170,6 +250,7 @@ uint64_t vhost_user_fs_slave_io(struct vhost_dev *dev, VhostUserFSSlaveMsg *sm,
     unsigned int i;
     int res = 0;
     size_t done = 0;
+    bool cap_fsetid_dropped = false;
 
     if (fd < 0) {
         error_report("Bad fd for map");
@@ -177,8 +258,10 @@ uint64_t vhost_user_fs_slave_io(struct vhost_dev *dev, VhostUserFSSlaveMsg *sm,
     }
 
     if (sm->gen_flags & VHOST_USER_FS_GENFLAG_DROP_FSETID) {
-        error_report("Dropping CAP_FSETID is not supported");
-        return (uint64_t)-ENOTSUP;
+        res = drop_effective_cap("FSETID", &cap_fsetid_dropped);
+        if (res != 0) {
+            return (uint64_t)res;
+        }
     }
 
     for (i = 0; i < VHOST_USER_FS_SLAVE_ENTRIES && !res; i++) {
@@ -237,6 +320,11 @@ uint64_t vhost_user_fs_slave_io(struct vhost_dev *dev, VhostUserFSSlaveMsg *sm,
     }
     close(fd);
 
+    if (cap_fsetid_dropped) {
+        if (gain_effective_cap("FSETID")) {
+            error_report("Failed to gain CAP_FSETID");
+        }
+    }
     trace_vhost_user_fs_slave_io_exit(res, done);
     if (res < 0) {
         return (uint64_t)res;
diff --git a/meson.build b/meson.build
index 2d8b433ff0..99a7fbacc1 100644
--- a/meson.build
+++ b/meson.build
@@ -1060,6 +1060,12 @@ elif get_option('virtfs').disabled()
   have_virtfs = false
 endif
 
+if config_host.has_key('CONFIG_VHOST_USER_FS')
+  if not libcap_ng.found()
+    error('vhost-user-fs requires libcap-ng-devel')
+  endif
+endif
+
 config_host_data.set_quoted('CONFIG_BINDIR', get_option('prefix') / get_option('bindir'))
 config_host_data.set_quoted('CONFIG_PREFIX', get_option('prefix'))
 config_host_data.set_quoted('CONFIG_QEMU_CONFDIR', get_option('prefix') / qemu_confdir)
-- 
2.29.2




* [PATCH 24/24] virtiofsd: Ask qemu to drop CAP_FSETID if client asked for it
  2021-02-09 19:02 ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-09 19:02   ` Dr. David Alan Gilbert (git)
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-02-09 19:02 UTC (permalink / raw)
  To: qemu-devel, vgoyal, stefanha, virtio-fs, marcandre.lureau, mst

From: Vivek Goyal <vgoyal@redhat.com>

If the guest asked to drop CAP_FSETID upon write, send that info
to qemu in the SLAVE_FS_IO message so that qemu can drop the
capability before the WRITE. This makes sure that any setuid bit
is killed on the fd (if one is set).

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 tools/virtiofsd/buffer.c         | 10 ++++++----
 tools/virtiofsd/fuse_common.h    |  6 +++++-
 tools/virtiofsd/fuse_lowlevel.h  |  6 +++++-
 tools/virtiofsd/fuse_virtio.c    |  5 ++++-
 tools/virtiofsd/passthrough_ll.c |  2 +-
 5 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/tools/virtiofsd/buffer.c b/tools/virtiofsd/buffer.c
index 8135d52d2a..b4cda7db9a 100644
--- a/tools/virtiofsd/buffer.c
+++ b/tools/virtiofsd/buffer.c
@@ -203,7 +203,7 @@ static ssize_t fuse_buf_fd_to_fd(const struct fuse_buf *dst, size_t dst_off,
 static ssize_t fuse_buf_copy_one(fuse_req_t req,
                                  const struct fuse_buf *dst, size_t dst_off,
                                  const struct fuse_buf *src, size_t src_off,
-                                 size_t len)
+                                 size_t len, bool dropped_cap_fsetid)
 {
     int src_is_fd = src->flags & FUSE_BUF_IS_FD;
     int dst_is_fd = dst->flags & FUSE_BUF_IS_FD;
@@ -211,7 +211,8 @@ static ssize_t fuse_buf_copy_one(fuse_req_t req,
     int dst_is_phys = src->flags & FUSE_BUF_PHYS_ADDR;
 
     if (src_is_phys && !src_is_fd && dst_is_fd) {
-        return fuse_virtio_write(req, dst, dst_off, src, src_off, len);
+        return fuse_virtio_write(req, dst, dst_off, src, src_off, len,
+                                 dropped_cap_fsetid);
     }
     assert(!src_is_phys && !dst_is_phys);
     if (!src_is_fd && !dst_is_fd) {
@@ -267,7 +268,7 @@ static int fuse_bufvec_advance(struct fuse_bufvec *bufv, size_t len)
 }
 
 ssize_t fuse_buf_copy(fuse_req_t req, struct fuse_bufvec *dstv,
-                      struct fuse_bufvec *srcv)
+                      struct fuse_bufvec *srcv, bool dropped_cap_fsetid)
 {
     size_t copied = 0, i;
 
@@ -309,7 +310,8 @@ ssize_t fuse_buf_copy(fuse_req_t req, struct fuse_bufvec *dstv,
         dst_len = dst->size - dstv->off;
         len = min_size(src_len, dst_len);
 
-        res = fuse_buf_copy_one(req, dst, dstv->off, src, srcv->off, len);
+        res = fuse_buf_copy_one(req, dst, dstv->off, src, srcv->off, len,
+                                dropped_cap_fsetid);
         if (res < 0) {
             if (!copied) {
                 return res;
diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
index 05d56883dd..8cf9a5544e 100644
--- a/tools/virtiofsd/fuse_common.h
+++ b/tools/virtiofsd/fuse_common.h
@@ -718,10 +718,14 @@ size_t fuse_buf_size(const struct fuse_bufvec *bufv);
  * @param req The request this copy is part of
  * @param dst destination buffer vector
  * @param src source buffer vector
+ * @param dropped_cap_fsetid Caller has dropped CAP_FSETID. If work is handed
+ *        over to a different thread/process, CAP_FSETID needs to be dropped
+ *        there as well.
  * @return actual number of bytes copied or -errno on error
  */
 ssize_t fuse_buf_copy(fuse_req_t req,
-                      struct fuse_bufvec *dst, struct fuse_bufvec *src);
+                      struct fuse_bufvec *dst, struct fuse_bufvec *src,
+                      bool dropped_cap_fsetid);
 
 /**
  * Memory buffer iterator
diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
index b36140c565..21e1ee24d0 100644
--- a/tools/virtiofsd/fuse_lowlevel.h
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -2029,9 +2029,13 @@ int64_t fuse_virtio_io(struct fuse_session *se, VhostUserFSSlaveMsg *msg,
  * @param src The source (memory) buffer
  * @param src_off The GPA
  * @param len Length in bytes
+ * @param dropped_cap_fsetid Caller dropped CAP_FSETID. If it is being handed
+ *        over to a different thread/process, CAP_FSETID needs to be dropped
+ *        before write.
  */
 ssize_t fuse_virtio_write(fuse_req_t req, const struct fuse_buf *dst,
                           size_t dst_off, const struct fuse_buf *src,
-                          size_t src_off, size_t len);
+                          size_t src_off, size_t len,
+                          bool dropped_cap_fsetid);
 
 #endif /* FUSE_LOWLEVEL_H_ */
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 316d1f2463..6cdf131bc7 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -1189,7 +1189,7 @@ int64_t fuse_virtio_io(struct fuse_session *se, VhostUserFSSlaveMsg *msg,
  */
 ssize_t fuse_virtio_write(fuse_req_t req, const struct fuse_buf *dst,
                           size_t dst_off, const struct fuse_buf *src,
-                          size_t src_off, size_t len)
+                          size_t src_off, size_t len, bool dropped_cap_fsetid)
 {
     VhostUserFSSlaveMsg msg = { 0 };
 
@@ -1205,6 +1205,9 @@ ssize_t fuse_virtio_write(fuse_req_t req, const struct fuse_buf *dst,
     msg.c_offset[0] = (uintptr_t)src->mem + src_off;
     msg.len[0] = len;
     msg.flags[0] = VHOST_USER_FS_FLAG_MAP_W;
+    if (dropped_cap_fsetid) {
+        msg.gen_flags |= VHOST_USER_FS_GENFLAG_DROP_FSETID;
+    }
 
     int64_t result = fuse_virtio_io(req->se, &msg, dst->fd);
     fuse_log(FUSE_LOG_DEBUG, "%s: result=%ld\n", __func__, result);
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 5baf4f1d50..8dba129785 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -2135,7 +2135,7 @@ static void lo_write_buf(fuse_req_t req, fuse_ino_t ino,
         }
     }
 
-    res = fuse_buf_copy(req, &out_buf, in_buf);
+    res = fuse_buf_copy(req, &out_buf, in_buf, fi->kill_priv);
     if (res < 0) {
         fuse_reply_err(req, -res);
     } else {
-- 
2.29.2




* Re: [PATCH 01/24] DAX: vhost-user: Rework slave return values
  2021-02-09 19:02   ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-11  9:59     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 138+ messages in thread
From: Stefan Hajnoczi @ 2021-02-11  9:59 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

On Tue, Feb 09, 2021 at 07:02:01PM +0000, Dr. David Alan Gilbert (git) wrote:
> +static uint64_t vhost_user_slave_handle_vring_host_notifier(
> +                struct vhost_dev *dev,
> +               VhostUserVringArea *area,
> +               int fd)

Indentation looks off. Only worth changing if you respin.

> @@ -1398,7 +1399,8 @@ static void slave_read(void *opaque)
>      struct vhost_user *u = dev->opaque;
>      VhostUserHeader hdr = { 0, };
>      VhostUserPayload payload = { 0, };
> -    int size, ret = 0;
> +    int size;
> +    uint64_t ret = 0;
>      struct iovec iov;
>      struct msghdr msgh;
>      int fd[VHOST_USER_SLAVE_MAX_FDS];
> @@ -1472,7 +1474,7 @@ static void slave_read(void *opaque)
>          break;
>      default:
>          error_report("Received unexpected msg type: %d.", hdr.request);
> -        ret = -EINVAL;
> +        ret = (uint64_t)-EINVAL;

The !!ret was removed below, so this value would previously have been
sent as true (1). Now a different value goes out on the wire.

If there is no specific reason to change the value, please keep it true
(1) just in case a vhost-user device backend depends on that value.
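
Something like this would keep the old on-wire value (a sketch against
the hunk quoted above, not a tested change):

    default:
        error_report("Received unexpected msg type: %d.", hdr.request);
        /* Previously !!(-EINVAL) truncated to 1 on the wire; keep that */
        ret = 1;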


* Re: [PATCH 02/24] DAX: libvhost-user: Route slave message payload
  2021-02-09 19:02   ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-11 10:05     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 138+ messages in thread
From: Stefan Hajnoczi @ 2021-02-11 10:05 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

On Tue, Feb 09, 2021 at 07:02:02PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Route the uint64 payload from message replies on the slave back up
> through vu_process_message_reply and to the callers.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  subprojects/libvhost-user/libvhost-user.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH 03/24] DAX: libvhost-user: Allow popping a queue element with bad pointers
  2021-02-09 19:02   ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-11 10:12     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 138+ messages in thread
From: Stefan Hajnoczi @ 2021-02-11 10:12 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

On Tue, Feb 09, 2021 at 07:02:03PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Allow a daemon implemented with libvhost-user to accept an
> element with pointers to memory that aren't in the mapping table.
> The daemon might have some special way to deal with some special
> cases of this.
> 
> The default behaviour doesn't change.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  block/export/vhost-user-blk-server.c      |  2 +-
>  contrib/vhost-user-blk/vhost-user-blk.c   |  3 +-
>  contrib/vhost-user-gpu/vhost-user-gpu.c   |  5 ++-
>  contrib/vhost-user-input/main.c           |  4 +-
>  contrib/vhost-user-scsi/vhost-user-scsi.c |  2 +-
>  subprojects/libvhost-user/libvhost-user.c | 51 ++++++++++++++++++-----
>  subprojects/libvhost-user/libvhost-user.h |  8 +++-
>  tests/vhost-user-bridge.c                 |  4 +-
>  tools/virtiofsd/fuse_virtio.c             |  3 +-
>  9 files changed, 60 insertions(+), 22 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH 04/24] DAX subprojects/libvhost-user: Add virtio-fs slave types
  2021-02-09 19:02   ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-11 10:16     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 138+ messages in thread
From: Stefan Hajnoczi @ 2021-02-11 10:16 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

On Tue, Feb 09, 2021 at 07:02:04PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Add virtio-fs definitions to libvhost-user
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  subprojects/libvhost-user/libvhost-user.c | 41 +++++++++++++++++++++++
>  subprojects/libvhost-user/libvhost-user.h | 31 +++++++++++++++++
>  2 files changed, 72 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH 05/24] DAX: virtio: Add shared memory capability
  2021-02-09 19:02   ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-11 10:17     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 138+ messages in thread
From: Stefan Hajnoczi @ 2021-02-11 10:17 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

On Tue, Feb 09, 2021 at 07:02:05PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Define a new capability type 'VIRTIO_PCI_CAP_SHARED_MEMORY_CFG'
> and the data structure 'virtio_pci_cap64' to go with it.
> They allow defining shared memory regions with sizes and offsets
> of 2^32 and more.
> Multiple instances of the capability are allowed and distinguished
> by the 'id' field in the base capability.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  hw/virtio/virtio-pci.c | 20 ++++++++++++++++++++
>  hw/virtio/virtio-pci.h |  4 ++++
>  2 files changed, 24 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH 06/24] DAX: virtio-fs: Add cache BAR
  2021-02-09 19:02   ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-11 10:25     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 138+ messages in thread
From: Stefan Hajnoczi @ 2021-02-11 10:25 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

On Tue, Feb 09, 2021 at 07:02:06PM +0000, Dr. David Alan Gilbert (git) wrote:
> @@ -46,6 +51,26 @@ static void vhost_user_fs_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
>      }
>  
>      qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
> +    cachesize = dev->vdev.conf.cache_size;
> +
> +    /*
> +     * The bar starts with the data/DAX cache
> +     * Others will be added later.
> +     */
> +    memory_region_init(&dev->cachebar, OBJECT(vpci_dev),
> +                       "vhost-fs-pci-cachebar", cachesize);

s/vhost-fs/vhost-user-fs/ for consistency. Only worth changing if you
respin.

> +    if (cachesize) {
> +        memory_region_add_subregion(&dev->cachebar, 0, &dev->vdev.cache);
> +        virtio_pci_add_shm_cap(vpci_dev, VIRTIO_FS_PCI_CACHE_BAR, 0, cachesize,
> +                               VIRTIO_FS_SHMCAP_ID_CACHE);
> +    }
> +
> +    /* After 'realized' so the memory region exists */
> +    pci_register_bar(&vpci_dev->pci_dev, VIRTIO_FS_PCI_CACHE_BAR,
> +                     PCI_BASE_ADDRESS_SPACE_MEMORY |
> +                     PCI_BASE_ADDRESS_MEM_PREFETCH |
> +                     PCI_BASE_ADDRESS_MEM_TYPE_64,
> +                     &dev->cachebar);

Please include a comment explaining why it's okay to use BAR 2, which is
already used for the virtio-pci modern io bar (off by default):

    /*
     * virtio pci bar layout used by default.
     * subclasses can re-arrange things if needed.
     *
     *   region 0   --  virtio legacy io bar
     *   region 1   --  msi-x bar
     *   region 2   --  virtio modern io bar (off by default)
     *   region 4+5 --  virtio modern memory (64bit) bar
     *
     */

I guess the idea is that the io bar is available since it's off by
default. What happens if the io bar is enabled?

Should this bar registration be conditional (i.e. done only when the
cache size is greater than 0)?
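
If so, a sketch based on the hunk above, with the registration moved
inside the existing cachesize check:

    if (cachesize) {
        memory_region_add_subregion(&dev->cachebar, 0, &dev->vdev.cache);
        virtio_pci_add_shm_cap(vpci_dev, VIRTIO_FS_PCI_CACHE_BAR, 0,
                               cachesize, VIRTIO_FS_SHMCAP_ID_CACHE);

        /* After 'realized' so the memory region exists */
        pci_register_bar(&vpci_dev->pci_dev, VIRTIO_FS_PCI_CACHE_BAR,
                         PCI_BASE_ADDRESS_SPACE_MEMORY |
                         PCI_BASE_ADDRESS_MEM_PREFETCH |
                         PCI_BASE_ADDRESS_MEM_TYPE_64,
                         &dev->cachebar);
    }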


* Re: [PATCH 07/24] DAX: virtio-fs: Add vhost-user slave commands for mapping
  2021-02-09 19:02   ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-11 10:32     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 138+ messages in thread
From: Stefan Hajnoczi @ 2021-02-11 10:32 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

On Tue, Feb 09, 2021 at 07:02:07PM +0000, Dr. David Alan Gilbert (git) wrote:
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index d6085f7045..1deedd3407 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -1432,6 +1432,26 @@ Slave message types
>  
>    The state.num field is currently reserved and must be set to 0.
>  
> +``VHOST_USER_SLAVE_FS_MAP``
> +  :id: 6
> +  :equivalent ioctl: N/A
> +  :slave payload: fd + n * (offset + address + len)

I'm not sure I understand this notation. '+' means field concatenation?
Is 'fd' a field or does it indicate file descriptor passing?

I suggest using a struct name instead of informal notation so that the
payload size and representation is clear.

The same applies for VHOST_USER_SLAVE_FS_UNMAP.

> +  :master payload: N/A
> +
> +  Requests that the QEMU mmap the given fd into the virtio-fs cache;

s/QEMU mmap the given fd/given fd be mmapped/

Please avoid mentioning QEMU specifically. Any VMM should be able to
implement this spec.

The same applies for VHOST_USER_SLAVE_FS_UNMAP.

> +  multiple chunks can be mapped in one command.
> +  A reply is generated indicating whether mapping succeeded.
> +
> +``VHOST_USER_SLAVE_FS_UNMAP``
> +  :id: 7
> +  :equivalent ioctl: N/A
> +  :slave payload: n * (address + len)
> +  :master payload: N/A
> +
> +  Requests that the QEMU un-mmap the given range in the virtio-fs cache;
> +  multiple chunks can be unmapped in one command.
> +  A reply is generated indicating whether unmapping succeeded.


* Re: [PATCH 08/24] DAX: virtio-fs: Fill in slave commands for mapping
  2021-02-09 19:02   ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-11 10:57     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 138+ messages in thread
From: Stefan Hajnoczi @ 2021-02-11 10:57 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

On Tue, Feb 09, 2021 at 07:02:08PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Fill in definitions for map, unmap and sync commands.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> with fix by misono.tomohiro@fujitsu.com
> ---
>  hw/virtio/vhost-user-fs.c | 115 ++++++++++++++++++++++++++++++++++++--
>  1 file changed, 111 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> index 78401d2ff1..5f2fca4d82 100644
> --- a/hw/virtio/vhost-user-fs.c
> +++ b/hw/virtio/vhost-user-fs.c
> @@ -37,15 +37,122 @@
>  uint64_t vhost_user_fs_slave_map(struct vhost_dev *dev, VhostUserFSSlaveMsg *sm,
>                                   int fd)
>  {
> -    /* TODO */
> -    return (uint64_t)-1;
> +    VHostUserFS *fs = VHOST_USER_FS(dev->vdev);
> +    if (!fs) {
> +        /* Shouldn't happen - but seen on error path */
> +        error_report("Bad fs ptr");
> +        return (uint64_t)-1;
> +    }

If a non-vhost-user-fs vhost-user device backend sends this message,
the VHOST_USER_FS() -> object_dynamic_cast_assert() chain will either
trigger an assertion failure (CONFIG_QOM_CAST_DEBUG) or silently cast
the pointer to the wrong type (!CONFIG_QOM_CAST_DEBUG).

Neither outcome is suitable for input validation. We need to fail
cleanly here:

  VHostUserFS *fs = (VHostUserFS *)object_dynamic_cast(OBJECT(dev->vdev),
                                                       TYPE_VHOST_USER_FS);
  if (!fs) {
      ...handle failure...
  }

>  uint64_t vhost_user_fs_slave_unmap(struct vhost_dev *dev,
>                                     VhostUserFSSlaveMsg *sm)
>  {
> -    /* TODO */
> -    return (uint64_t)-1;
> +    VHostUserFS *fs = VHOST_USER_FS(dev->vdev);
> +    if (!fs) {
> +        /* Shouldn't happen - but seen on error path */
> +        error_report("Bad fs ptr");
> +        return (uint64_t)-1;
> +    }

Same here.


* Re: [PATCH 09/24] DAX: virtiofsd Add cache accessor functions
  2021-02-09 19:02   ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-11 12:31     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 138+ messages in thread
From: Stefan Hajnoczi @ 2021-02-11 12:31 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

On Tue, Feb 09, 2021 at 07:02:09PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Add low level functions that the clients can use to map/unmap cache
> areas.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.h | 21 +++++++++++++++++++++
>  tools/virtiofsd/fuse_virtio.c   | 18 ++++++++++++++++++
>  2 files changed, 39 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH 10/24] DAX: virtiofsd: Add setup/remove mappings fuse commands
  2021-02-09 19:02   ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-11 12:37     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 138+ messages in thread
From: Stefan Hajnoczi @ 2021-02-11 12:37 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

On Tue, Feb 09, 2021 at 07:02:10PM +0000, Dr. David Alan Gilbert (git) wrote:
> +static void do_removemapping(fuse_req_t req, fuse_ino_t nodeid,
> +                             struct fuse_mbuf_iter *iter)
> +{
> +    struct fuse_removemapping_in *arg;
> +    struct fuse_removemapping_one *one;
> +
> +    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
> +    if (!arg || arg->count <= 0) {

arg->count is unsigned so < is tautologous.

> +        fuse_log(FUSE_LOG_ERR, "do_removemapping: invalid arg %p\n", arg);
> +        fuse_reply_err(req, EINVAL);
> +        return;
> +    }
> +
> +    one = fuse_mbuf_iter_advance(iter, arg->count * sizeof(*one));

arg->count * sizeof(*one) can overflow on 32-bit hosts. I think we
should be more defensive here since this input comes from the guest.
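
Something like this would cover both points (a sketch only; SIZE_MAX
comes from <stdint.h>):

    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
    /* count == 0 is invalid, and count * sizeof(*one) must not wrap */
    if (!arg || arg->count == 0 ||
        arg->count > SIZE_MAX / sizeof(*one)) {
        fuse_log(FUSE_LOG_ERR, "do_removemapping: invalid arg %p\n", arg);
        fuse_reply_err(req, EINVAL);
        return;
    }

    one = fuse_mbuf_iter_advance(iter, arg->count * sizeof(*one));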


* Re: [PATCH 11/24] DAX: virtiofsd: Add setup/remove mapping handlers to passthrough_ll
  2021-02-09 19:02   ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-11 12:37     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 138+ messages in thread
From: Stefan Hajnoczi @ 2021-02-11 12:37 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

On Tue, Feb 09, 2021 at 07:02:11PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH 12/24] DAX: virtiofsd: Wire up passthrough_ll's lo_setupmapping
  2021-02-09 19:02   ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-11 12:41     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 138+ messages in thread
From: Stefan Hajnoczi @ 2021-02-11 12:41 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

On Tue, Feb 09, 2021 at 07:02:12PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Wire up passthrough_ll's setupmapping to allocate, send to virtio
> and then reply OK.
> 
> Guest might not pass file pointer. In that case using inode info, open
> the file again, mmap() and close fd.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> With fix from:
> Signed-off-by: Fotis Xenakis <foxen@windowslive.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.c  | 13 ++++++--
>  tools/virtiofsd/passthrough_ll.c | 52 ++++++++++++++++++++++++++++++--
>  2 files changed, 61 insertions(+), 4 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH 13/24] DAX: virtiofsd: Make lo_removemapping() work
  2021-02-09 19:02   ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-11 12:41     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 138+ messages in thread
From: Stefan Hajnoczi @ 2021-02-11 12:41 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

On Tue, Feb 09, 2021 at 07:02:13PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Vivek Goyal <vgoyal@redhat.com>
> 
> Let guest pass in the offset in dax window a mapping is currently
> mapped at and needs to be removed.
> 
> Vivek added the initial support to remove single mapping and later Peng
> added patch to support removing multiple mappings in single command.
> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> Signed-off-by: Peng Tao <tao.peng@linux.alibaba.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH 14/24] DAX: virtiofsd: route se down to destroy method
  2021-02-09 19:02   ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-11 12:42     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 138+ messages in thread
From: Stefan Hajnoczi @ 2021-02-11 12:42 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

On Tue, Feb 09, 2021 at 07:02:14PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> We're going to need to pass the session down to destroy so that it can
> pass it back to do the remove mapping.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.c  | 6 +++---
>  tools/virtiofsd/fuse_lowlevel.h  | 2 +-
>  tools/virtiofsd/passthrough_ll.c | 2 +-
>  3 files changed, 5 insertions(+), 5 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH 15/24] DAX: virtiofsd: Perform an unmap on destroy
  2021-02-09 19:02   ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-11 12:42     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 138+ messages in thread
From: Stefan Hajnoczi @ 2021-02-11 12:42 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

On Tue, Feb 09, 2021 at 07:02:15PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Force unmap all remaining dax cache entries on a destroy.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

* Re: [PATCH 16/24] DAX/unmap: virtiofsd: Add VHOST_USER_SLAVE_FS_IO
  2021-02-09 19:02   ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-11 14:17     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 138+ messages in thread
From: Stefan Hajnoczi @ 2021-02-11 14:17 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

On Tue, Feb 09, 2021 at 07:02:16PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Define a new slave command 'VHOST_USER_SLAVE_FS_IO' for a
> client to ask qemu to perform a read/write from an fd directly
> to GPA.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  docs/interop/vhost-user.rst               | 11 +++
>  hw/virtio/trace-events                    |  6 ++
>  hw/virtio/vhost-user-fs.c                 | 84 +++++++++++++++++++++++
>  hw/virtio/vhost-user.c                    |  4 ++
>  include/hw/virtio/vhost-user-fs.h         |  2 +
>  subprojects/libvhost-user/libvhost-user.h |  1 +
>  6 files changed, 108 insertions(+)
> 
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index 1deedd3407..821712f4a2 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -1452,6 +1452,17 @@ Slave message types
>    multiple chunks can be unmapped in one command.
>    A reply is generated indicating whether unmapping succeeded.
>  
> +``VHOST_USER_SLAVE_FS_IO``
> +  :id: 9
> +  :equivalent ioctl: N/A
> +  :slave payload: fd + n * (offset + address + len)

Please clarify the payload representation. This is not enough for
someone to implement the spec.
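
For example, something along these lines (inferred from the
VhostUserFSSlaveMsg fields used elsewhere in this series; the field
order here is a guess and the real header is authoritative):

    typedef struct {
        /* offsets into the file referred to by the passed fd */
        uint64_t fd_offset[VHOST_USER_FS_SLAVE_ENTRIES];
        /* guest physical addresses (DAX window offsets for map) */
        uint64_t c_offset[VHOST_USER_FS_SLAVE_ENTRIES];
        /* length of each chunk in bytes; 0 marks an unused entry */
        uint64_t len[VHOST_USER_FS_SLAVE_ENTRIES];
        /* VHOST_USER_FS_FLAG_MAP_R / VHOST_USER_FS_FLAG_MAP_W */
        uint64_t flags[VHOST_USER_FS_SLAVE_ENTRIES];
    } VhostUserFSSlaveMsg;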

> +  :master payload: N/A
> +
> +  Requests that the QEMU performs IO directly from an fd to guest memory

To avoid naming a particular VMM:

s/the QEMU performs IO/IO be performed/

> +  on behalf of the daemon; this is normally for a case where a memory region
> +  isn't visible to the daemon. slave payload has flags which determine
> +  the direction of IO operation.

Please document the payload flags in the spec.

> +
>  .. _reply_ack:
>  
>  VHOST_USER_PROTOCOL_F_REPLY_ACK
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index c62727f879..20557a078e 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -53,6 +53,12 @@ vhost_vdpa_get_features(void *dev, uint64_t features) "dev: %p features: 0x%"PRI
>  vhost_vdpa_set_owner(void *dev) "dev: %p"
>  vhost_vdpa_vq_get_addr(void *dev, void *vq, uint64_t desc_user_addr, uint64_t avail_user_addr, uint64_t used_user_addr) "dev: %p vq: %p desc_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64
>  
> +# vhost-user-fs.c
> +
> +vhost_user_fs_slave_io_loop(const char *name, uint64_t owr, int is_ram, int is_romd, size_t size) "region %s with internal offset 0x%"PRIx64 " ram=%d romd=%d mrs.size=%zd"
> +vhost_user_fs_slave_io_loop_res(ssize_t transferred) "%zd"
> +vhost_user_fs_slave_io_exit(int res, size_t done) "res: %d done: %zd"
> +
>  # virtio.c
>  virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
>  virtqueue_fill(void *vq, const void *elem, unsigned int len, unsigned int idx) "vq %p elem %p len %u idx %u"
> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> index 5f2fca4d82..357bc1d04e 100644
> --- a/hw/virtio/vhost-user-fs.c
> +++ b/hw/virtio/vhost-user-fs.c
> @@ -23,6 +23,8 @@
>  #include "hw/virtio/vhost-user-fs.h"
>  #include "monitor/monitor.h"
>  #include "sysemu/sysemu.h"
> +#include "exec/address-spaces.h"
> +#include "trace.h"
>  
>  /*
>   * The powerpc kernel code expects the memory to be accessible during
> @@ -155,6 +157,88 @@ uint64_t vhost_user_fs_slave_unmap(struct vhost_dev *dev,
>      return (uint64_t)res;
>  }
>  
> +uint64_t vhost_user_fs_slave_io(struct vhost_dev *dev, VhostUserFSSlaveMsg *sm,
> +                                int fd)
> +{
> +    VHostUserFS *fs = VHOST_USER_FS(dev->vdev);
> +    if (!fs) {
> +        /* Shouldn't happen - but seen it in error paths */
> +        error_report("Bad fs ptr");
> +        return (uint64_t)-1;
> +    }

Same pointer casting issue as with map/unmap.

> +
> +    unsigned int i;
> +    int res = 0;
> +    size_t done = 0;
> +
> +    if (fd < 0) {
> +        error_report("Bad fd for map");
> +        return (uint64_t)-1;
> +    }
> +
> +    for (i = 0; i < VHOST_USER_FS_SLAVE_ENTRIES && !res; i++) {
> +        if (sm->len[i] == 0) {
> +            continue;
> +        }
> +
> +        size_t len = sm->len[i];
> +        hwaddr gpa = sm->c_offset[i];
> +
> +        while (len && !res) {
> +            MemoryRegionSection mrs = memory_region_find(get_system_memory(),
> +                                                         gpa, len);
> +            size_t mrs_size = (size_t)int128_get64(mrs.size);

If there is a vIOMMU then the vhost-user device backend should be
restricted to just areas of guest RAM that are mapped. I think this can
be achieved by using the vhost-user-fs device's address space instead of
get_system_memory(). For example, virtio_pci_get_dma_as().
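
Untested sketch of what I mean (assuming vdev->dma_as has already been
set up by the transport):

    VirtIODevice *vdev = dev->vdev;
    /*
     * Look the address up in the device's DMA address space rather
     * than global system memory, so that with a vIOMMU the backend
     * can only reach what the guest has actually mapped for it.
     */
    MemoryRegionSection mrs = memory_region_find(vdev->dma_as->root,
                                                 gpa, len);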

> +
> +            if (!mrs_size) {
> +                error_report("No guest region found for 0x%" HWADDR_PRIx, gpa);
> +                res = -EFAULT;
> +                break;
> +            }
> +
> +            trace_vhost_user_fs_slave_io_loop(mrs.mr->name,
> +                                          (uint64_t)mrs.offset_within_region,
> +                                          memory_region_is_ram(mrs.mr),
> +                                          memory_region_is_romd(mrs.mr),
> +                                          (size_t)mrs_size);
> +
> +            void *hostptr = qemu_map_ram_ptr(mrs.mr->ram_block,
> +                                             mrs.offset_within_region);
> +            ssize_t transferred;
> +            if (sm->flags[i] & VHOST_USER_FS_FLAG_MAP_R) {

The flag name is specific to map requests but it's shared with the IO
request. Perhaps rename the flags?

> +                /* Read from file into RAM */
> +                if (mrs.mr->readonly) {
> +                    res = -EFAULT;
> +                    break;
> +                }
> +                transferred = pread(fd, hostptr, mrs_size, sm->fd_offset[i]);
> +            } else {
> +                /* Write into file from RAM */
> +                assert((sm->flags[i] & VHOST_USER_FS_FLAG_MAP_W));

The vhost-user device backend must not be able to crash the VMM. Please
use an if statement and fail the request if the flags are invalid
instead of assert().
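
i.e. something like:

    } else if (sm->flags[i] & VHOST_USER_FS_FLAG_MAP_W) {
        /* Write into file from RAM */
        transferred = pwrite(fd, hostptr, mrs_size, sm->fd_offset[i]);
    } else {
        /* Neither R nor W set: malformed request from the backend */
        res = -EINVAL;
        break;
    }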

> +                transferred = pwrite(fd, hostptr, mrs_size, sm->fd_offset[i]);
> +            }
> +            trace_vhost_user_fs_slave_io_loop_res(transferred);
> +            if (transferred < 0) {
> +                res = -errno;
> +                break;
> +            }
> +            if (!transferred) {
> +                /* EOF */
> +                break;
> +            }
> +
> +            done += transferred;
> +            len -= transferred;

Is gpa += transferred missing so that this loop can handle crossing
MemoryRegion boundaries?

sm->fd_offset[i] also needs to be put into a local variable and
incremented by transferred each time around the loop.
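
Roughly:

    size_t len = sm->len[i];
    hwaddr gpa = sm->c_offset[i];
    uint64_t fd_offset = sm->fd_offset[i];

    while (len && !res) {
        ...
        done += transferred;
        len -= transferred;
        gpa += transferred;       /* advance past this MemoryRegion */
        fd_offset += transferred; /* and past the file bytes already done */
    }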

* Re: [PATCH 17/24] DAX/unmap virtiofsd: Add wrappers for VHOST_USER_SLAVE_FS_IO
  2021-02-09 19:02   ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-11 14:18     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 138+ messages in thread
From: Stefan Hajnoczi @ 2021-02-11 14:18 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

On Tue, Feb 09, 2021 at 07:02:17PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Add a wrapper to send VHOST_USER_SLAVE_FS_IO commands and a
> further wrapper for sending a fuse_buf write using the FS_IO
> slave command.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.h | 25 ++++++++++++++++++++++
>  tools/virtiofsd/fuse_virtio.c   | 38 +++++++++++++++++++++++++++++++++
>  2 files changed, 63 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

* Re: [PATCH 18/24] DAX/unmap virtiofsd: Parse unmappable elements
  2021-02-09 19:02   ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-11 14:29     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 138+ messages in thread
From: Stefan Hajnoczi @ 2021-02-11 14:29 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

On Tue, Feb 09, 2021 at 07:02:18PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> For some read/writes the virtio queue elements are unmappable by
> the daemon; these are cases where the data is to be read/written
> from non-RAM.  In virtiofs's case this is typically a direct read/write
> into an mmap'd DAX file also on virtiofs (possibly on another instance).
> 
> When we receive a virtio queue element, check that we have enough
> mappable data to handle the headers.  Make a note of the number of
> unmappable 'in' entries (ie. for read data back to the VMM),
> and flag the fuse_bufvec for 'out' entries with a new flag
> FUSE_BUF_PHYS_ADDR.

Looking back at this I think vhost-user will need generic
READ_MEMORY/WRITE_MEMORY commands. It's okay for virtio-fs to have its
own IO command (although not strictly necessary).

With generic READ_MEMORY/WRITE_MEMORY libvhost-user and other vhost-user
device backend implementations can handle vring descriptors that point
into the DAX window. This can be done transparently so individual device
implementations (net, blk, etc) don't even know when memory is copied vs
zero-copy shared memory access.

So this approach is okay for virtio-fs but it's not a long-term solution
for all of vhost-user. Eventually the long-term solution may be needed
so that other VIRTIO devices that have shared memory resources work.

Another bonus of READ_MEMORY/WRITE_MEMORY is that users that prefer an
enforcing vIOMMU can disable shared memory (maybe just keep the vring
itself mmapped).

I just wanted to share this idea but don't expect it to be addressed in
this patch series.
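
(Purely illustrative, nothing like this is specified anywhere yet: a
generic READ_MEMORY/WRITE_MEMORY pair might carry something like

    typedef struct {
        uint64_t guest_address; /* GPA, or IOVA when a vIOMMU is used */
        uint32_t size;          /* number of bytes to transfer */
        uint8_t  data[];        /* WRITE_MEMORY only: bytes to write */
    } VhostUserMemRWMsg;

with READ_MEMORY returning 'size' bytes of guest memory in its reply.)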

> diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
> index a090040bb2..ed9280de91 100644
> --- a/tools/virtiofsd/fuse_common.h
> +++ b/tools/virtiofsd/fuse_common.h
> @@ -611,6 +611,13 @@ enum fuse_buf_flags {
>       * detected.
>       */
>      FUSE_BUF_FD_RETRY = (1 << 3),
> +
> +    /**
> +     * The addresses in the iovec represent guest physical addresses
> +     * that can't be mapped by the daemon process.
> +     * IO must be bounced back to the VMM to do it.
> +     */
> +    FUSE_BUF_PHYS_ADDR = (1 << 4),

With a vIOMMU it's an IOVA. Without a vIOMMU it's a GPA. This constant
may need to be renamed in the future, but it is okay for now.

> +    if (req->bad_in_num || req->bad_out_num) {
> +        bool handled_unmappable = false;
> +
> +        if (out_num > 2 && out_num_readable >= 2 && !req->bad_in_num &&
> +            out_sg[0].iov_len == sizeof(struct fuse_in_header) &&
> +            ((struct fuse_in_header *)fbuf.mem)->opcode == FUSE_WRITE &&
> +            out_sg[1].iov_len == sizeof(struct fuse_write_in)) {

This violates the VIRTIO specification:

  2.6.4.1 Device Requirements: Message Framing

  The device MUST NOT make assumptions about the particular arrangement of descriptors.

  https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-280004

The driver is not obligated to submit separate iovecs. out_num == 1 is
valid and the device needs to process it byte-wise instead of making
assumptions about iovec layout.
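
For example, the device can gather the header bytes out of the sg list
without assuming anything about the framing (untested sketch; MIN() as
in qemu/osdep.h):

    /* Copy 'len' bytes starting 'skip' bytes into the element's
     * readable descriptors, however the driver chose to split them. */
    static size_t copy_from_sg(const struct iovec *sg, unsigned num,
                               size_t skip, void *dst, size_t len)
    {
        size_t done = 0;
        for (unsigned i = 0; i < num && done < len; i++) {
            if (skip >= sg[i].iov_len) {
                skip -= sg[i].iov_len;
                continue;
            }
            size_t n = MIN(sg[i].iov_len - skip, len - done);
            memcpy((char *)dst + done, (char *)sg[i].iov_base + skip, n);
            skip = 0;
            done += n;
        }
        return done; /* < len means the element was too short */
    }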

* Re: [PATCH 23/24] vhost-user-fs: Implement drop CAP_FSETID functionality
  2021-02-09 19:02   ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-11 14:35     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 138+ messages in thread
From: Stefan Hajnoczi @ 2021-02-11 14:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

On Tue, Feb 09, 2021 at 07:02:23PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Vivek Goyal <vgoyal@redhat.com>
> 
> As part of slave_io message, slave can ask to do I/O on an fd. Additionally
> slave can ask for dropping CAP_FSETID (if master has it) before doing I/O.
> Implement functionality to drop CAP_FSETID and gain it back after the
> operation.
> 
> This also creates a dependency on libcap-ng.

Is this patch only for the case where QEMU is running as root?

I'm not sure it will have any effect on a regular QEMU (e.g. launched by
libvirt).

* Re: [PATCH 23/24] vhost-user-fs: Implement drop CAP_FSETID functionality
  2021-02-11 14:35     ` [Virtio-fs] " Stefan Hajnoczi
@ 2021-02-11 14:40       ` Vivek Goyal
  -1 siblings, 0 replies; 138+ messages in thread
From: Vivek Goyal @ 2021-02-11 14:40 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: virtio-fs, marcandre.lureau, mst, Dr. David Alan Gilbert (git),
	qemu-devel

On Thu, Feb 11, 2021 at 02:35:42PM +0000, Stefan Hajnoczi wrote:
> On Tue, Feb 09, 2021 at 07:02:23PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: Vivek Goyal <vgoyal@redhat.com>
> > 
> > As part of slave_io message, slave can ask to do I/O on an fd. Additionally
> > slave can ask for dropping CAP_FSETID (if master has it) before doing I/O.
> > Implement functionality to drop CAP_FSETID and gain it back after the
> > operation.
> > 
> > This also creates a dependency on libcap-ng.
> 
> Is this patch only for the case where QEMU is running as root?
> 

Yes, it is primarily for the case where qemu is running as root, or
where somebody managed to launch it non-root while still holding the
CAP_FSETID capability.
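
Roughly along these lines with libcap-ng (from memory, the patch itself
is authoritative):

    capng_get_caps_process();
    if (capng_have_capability(CAPNG_EFFECTIVE, CAP_FSETID)) {
        /* Drop CAP_FSETID from the effective set around the I/O */
        capng_update(CAPNG_DROP, CAPNG_EFFECTIVE, CAP_FSETID);
        capng_apply(CAPNG_SELECT_CAPS);

        /* ... do the pwrite() on behalf of the slave ... */

        /* Regain it; only possible while it is still permitted */
        capng_update(CAPNG_ADD, CAPNG_EFFECTIVE, CAP_FSETID);
        capng_apply(CAPNG_SELECT_CAPS);
    }

If the process never had CAP_FSETID to begin with, there is nothing to
drop and the I/O proceeds as before.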

Vivek

> I'm not sure it will have any effect on a regular QEMU (e.g. launched by
> libvirt).

* Re: [PATCH 01/24] DAX: vhost-user: Rework slave return values
  2021-02-11  9:59     ` [Virtio-fs] " Stefan Hajnoczi
@ 2021-02-11 15:27       ` Vivek Goyal
  -1 siblings, 0 replies; 138+ messages in thread
From: Vivek Goyal @ 2021-02-11 15:27 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: virtio-fs, marcandre.lureau, mst, Dr. David Alan Gilbert (git),
	qemu-devel

On Thu, Feb 11, 2021 at 09:59:36AM +0000, Stefan Hajnoczi wrote:
> On Tue, Feb 09, 2021 at 07:02:01PM +0000, Dr. David Alan Gilbert (git) wrote:
> > +static uint64_t vhost_user_slave_handle_vring_host_notifier(
> > +                struct vhost_dev *dev,
> > +               VhostUserVringArea *area,
> > +               int fd)
> 
> Indentation looks off. Only worth changing if you respin.
> 
> > @@ -1398,7 +1399,8 @@ static void slave_read(void *opaque)
> >      struct vhost_user *u = dev->opaque;
> >      VhostUserHeader hdr = { 0, };
> >      VhostUserPayload payload = { 0, };
> > -    int size, ret = 0;
> > +    int size;
> > +    uint64_t ret = 0;
> >      struct iovec iov;
> >      struct msghdr msgh;
> >      int fd[VHOST_USER_SLAVE_MAX_FDS];
> > @@ -1472,7 +1474,7 @@ static void slave_read(void *opaque)
> >          break;
> >      default:
> >          error_report("Received unexpected msg type: %d.", hdr.request);
> > -        ret = -EINVAL;
> > +        ret = (uint64_t)-EINVAL;
> 
> The !!ret was removed below so it would have previously been true (1).
> Now it has changed value.
> 
> If there is no specific reason to change the value, please keep it true
> (1) just in case a vhost-user device backend depends on that value.

Good catch. It would be nice to send -EINVAL back, but we probably
can't change it now for backward compatibility, just in case someone is
relying on reading back true (instead of -EINVAL).

Vivek

* Re: [PATCH 12/24] DAX: virtiofsd: Wire up passthrough_ll's lo_setupmapping
  2021-02-09 19:02   ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-11 16:05     ` Vivek Goyal
  -1 siblings, 0 replies; 138+ messages in thread
From: Vivek Goyal @ 2021-02-11 16:05 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: virtio-fs, marcandre.lureau, qemu-devel, stefanha, mst

On Tue, Feb 09, 2021 at 07:02:12PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Wire up passthrough_ll's setupmapping to allocate, send to virtio
> and then reply OK.
> 
> The guest might not pass a file pointer. In that case, using the inode
> info, open the file again, mmap() it and close the fd.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> With fix from:
> Signed-off-by: Fotis Xenakis <foxen@windowslive.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.c  | 13 ++++++--
>  tools/virtiofsd/passthrough_ll.c | 52 ++++++++++++++++++++++++++++++--
>  2 files changed, 61 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> index 0d3768b7d0..f74583e095 100644
> --- a/tools/virtiofsd/fuse_lowlevel.c
> +++ b/tools/virtiofsd/fuse_lowlevel.c
> @@ -1897,8 +1897,17 @@ static void do_setupmapping(fuse_req_t req, fuse_ino_t nodeid,
>      }
>  
>      if (req->se->op.setupmapping) {
> -        req->se->op.setupmapping(req, nodeid, arg->foffset, arg->len,
> -                                 arg->moffset, genflags, &fi);
> +        /*
> +         * TODO: Add a flag to request which tells if arg->fh is
> +         * valid or not.
> +         */
> +        if (fi.fh == (uint64_t)-1) {
> +            req->se->op.setupmapping(req, nodeid, arg->foffset, arg->len,
> +                                     arg->moffset, genflags, NULL);
> +        } else {
> +            req->se->op.setupmapping(req, nodeid, arg->foffset, arg->len,
> +                                     arg->moffset, genflags, &fi);
> +        }
>      } else {
>          fuse_reply_err(req, ENOSYS);
>      }
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 31c43d67a0..0493f00756 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -2967,8 +2967,56 @@ static void lo_setupmapping(fuse_req_t req, fuse_ino_t ino, uint64_t foffset,
>                              uint64_t len, uint64_t moffset, uint64_t flags,
>                              struct fuse_file_info *fi)
>  {
> -    /* TODO */
> -    fuse_reply_err(req, ENOSYS);
> +    struct lo_data *lo = lo_data(req);
> +    int ret = 0, fd;
> +    VhostUserFSSlaveMsg msg = { 0 };
> +    uint64_t vhu_flags;
> +    char *buf;
> +    bool writable = flags & O_RDWR;
> +
> +    fuse_log(FUSE_LOG_DEBUG,
> +             "lo_setupmapping(ino=%" PRIu64 ", fi=0x%p,"
> +             " foffset=%" PRIu64 ", len=%" PRIu64 ", moffset=%" PRIu64
> +             ", flags=%" PRIu64 ")\n",
> +             ino, (void *)fi, foffset, len, moffset, flags);
> +
> +    vhu_flags = VHOST_USER_FS_FLAG_MAP_R;
> +    if (writable) {
> +        vhu_flags |= VHOST_USER_FS_FLAG_MAP_W;
> +    }
> +
> +    msg.fd_offset[0] = foffset;
> +    msg.len[0] = len;
> +    msg.c_offset[0] = moffset;
> +    msg.flags[0] = vhu_flags;
> +
> +    if (fi) {
> +        fd = lo_fi_fd(req, fi);
> +    } else {
> +        ret = asprintf(&buf, "%i", lo_fd(req, ino));
> +        if (ret == -1) {
> +            return (void)fuse_reply_err(req, errno);
> +        }
> +
> +        fd = openat(lo->proc_self_fd, buf, flags);
> +        free(buf);

We can probably use lo_inode_open() here instead now?
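
Something like this perhaps (untested):

    struct lo_inode *inode = lo_inode(req, ino);
    if (!inode) {
        return (void)fuse_reply_err(req, EBADF);
    }
    fd = lo_inode_open(lo, inode, flags);
    lo_inode_put(lo, &inode);
    if (fd < 0) {
        /* lo_inode_open() returns -errno on failure */
        return (void)fuse_reply_err(req, -fd);
    }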

Vivek

* Re: [PATCH 10/24] DAX: virtiofsd: Add setup/remove mappings fuse commands
  2021-02-11 12:37     ` [Virtio-fs] " Stefan Hajnoczi
@ 2021-02-11 16:39       ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert @ 2021-02-11 16:39 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Tue, Feb 09, 2021 at 07:02:10PM +0000, Dr. David Alan Gilbert (git) wrote:
> > +static void do_removemapping(fuse_req_t req, fuse_ino_t nodeid,
> > +                             struct fuse_mbuf_iter *iter)
> > +{
> > +    struct fuse_removemapping_in *arg;
> > +    struct fuse_removemapping_one *one;
> > +
> > +    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
> > +    if (!arg || arg->count <= 0) {
> 
> arg->count is unsigned so < is tautologous.
> 
> > +        fuse_log(FUSE_LOG_ERR, "do_removemapping: invalid arg %p\n", arg);
> > +        fuse_reply_err(req, EINVAL);
> > +        return;
> > +    }
> > +
> > +    one = fuse_mbuf_iter_advance(iter, arg->count * sizeof(*one));
> 
> arg->count * sizeof(*one) is an integer overflow on 32-bit hosts. I
> think we should be more defensive here since this input comes from the
> guest.

OK, so I've gone with:

    if (!arg || !arg->count || 
        (uint64_t)arg->count * sizeof(*one) >= SIZE_MAX) {
        fuse_log(FUSE_LOG_ERR, "do_removemapping: invalid arg %p\n", arg);
        fuse_reply_err(req, EINVAL);
        return;
    }

to fix both of those (the compiler likes to moan on 64bit about
that comparison being always false in the simpler ways I tried it).

Dave

-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [PATCH 10/24] DAX: virtiofsd: Add setup/remove mappings fuse commands
  2021-02-11 16:39       ` [Virtio-fs] " Dr. David Alan Gilbert
@ 2021-02-11 18:30         ` Vivek Goyal
  -1 siblings, 0 replies; 138+ messages in thread
From: Vivek Goyal @ 2021-02-11 18:30 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: virtio-fs, marcandre.lureau, qemu-devel, Stefan Hajnoczi, mst

On Thu, Feb 11, 2021 at 04:39:22PM +0000, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > On Tue, Feb 09, 2021 at 07:02:10PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > +static void do_removemapping(fuse_req_t req, fuse_ino_t nodeid,
> > > +                             struct fuse_mbuf_iter *iter)
> > > +{
> > > +    struct fuse_removemapping_in *arg;
> > > +    struct fuse_removemapping_one *one;
> > > +
> > > +    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
> > > +    if (!arg || arg->count <= 0) {
> > 
> > arg->count is unsigned so < is tautologous.
> > 
> > > +        fuse_log(FUSE_LOG_ERR, "do_removemapping: invalid arg %p\n", arg);
> > > +        fuse_reply_err(req, EINVAL);
> > > +        return;
> > > +    }
> > > +
> > > +    one = fuse_mbuf_iter_advance(iter, arg->count * sizeof(*one));
> > 
> > arg->count * sizeof(*one) is an integer overflow on 32-bit hosts. I
> > think we should be more defensive here since this input comes from the
> > guest.
> 
> OK, so I've gone with:
> 
>     if (!arg || !arg->count || 
>         (uint64_t)arg->count * sizeof(*one) >= SIZE_MAX) {
>         fuse_log(FUSE_LOG_ERR, "do_removemapping: invalid arg %p\n", arg);
>         fuse_reply_err(req, EINVAL);
>         return;

If we did not want to get into the uint64_t business, can we alternatively do:
     if (!arg || !arg->count || arg->count > SIZE_MAX/sizeof(*one)) {
     }

Vivek

* Re: [PATCH 10/24] DAX: virtiofsd: Add setup/remove mappings fuse commands
  2021-02-11 18:30         ` [Virtio-fs] " Vivek Goyal
@ 2021-02-11 19:50           ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert @ 2021-02-11 19:50 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, marcandre.lureau, qemu-devel, Stefan Hajnoczi, mst

* Vivek Goyal (vgoyal@redhat.com) wrote:
> On Thu, Feb 11, 2021 at 04:39:22PM +0000, Dr. David Alan Gilbert wrote:
> > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > On Tue, Feb 09, 2021 at 07:02:10PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > +static void do_removemapping(fuse_req_t req, fuse_ino_t nodeid,
> > > > +                             struct fuse_mbuf_iter *iter)
> > > > +{
> > > > +    struct fuse_removemapping_in *arg;
> > > > +    struct fuse_removemapping_one *one;
> > > > +
> > > > +    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
> > > > +    if (!arg || arg->count <= 0) {
> > > 
> > > arg->count is unsigned so < is tautologous.
> > > 
> > > > +        fuse_log(FUSE_LOG_ERR, "do_removemapping: invalid arg %p\n", arg);
> > > > +        fuse_reply_err(req, EINVAL);
> > > > +        return;
> > > > +    }
> > > > +
> > > > +    one = fuse_mbuf_iter_advance(iter, arg->count * sizeof(*one));
> > > 
> > > arg->count * sizeof(*one) is an integer overflow on 32-bit hosts. I
> > > think we should be more defensive here since this input comes from the
> > > guest.
> > 
> > OK, so I've gone with:
> > 
> >     if (!arg || !arg->count || 
> >         (uint64_t)arg->count * sizeof(*one) >= SIZE_MAX) {
> >         fuse_log(FUSE_LOG_ERR, "do_removemapping: invalid arg %p\n", arg);
> >         fuse_reply_err(req, EINVAL);
> >         return;
> 
> > If we did not want to get into the uint64_t business, can we alternatively do:
>      if (!arg || !arg->count || arg->count > SIZE_MAX/sizeof(*one)) {

I tried that and the compiler moaned that the comparison was always
false, which on a 64-bit host it is, since arg->count is uint32_t.

Dave

>      }
> 
> Vivek
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [PATCH 10/24] DAX: virtiofsd: Add setup/remove mappings fuse commands
  2021-02-11 19:50           ` [Virtio-fs] " Dr. David Alan Gilbert
@ 2021-02-11 20:15             ` Vivek Goyal
  -1 siblings, 0 replies; 138+ messages in thread
From: Vivek Goyal @ 2021-02-11 20:15 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: virtio-fs, marcandre.lureau, qemu-devel, Stefan Hajnoczi, mst

On Thu, Feb 11, 2021 at 07:50:37PM +0000, Dr. David Alan Gilbert wrote:
> * Vivek Goyal (vgoyal@redhat.com) wrote:
> > On Thu, Feb 11, 2021 at 04:39:22PM +0000, Dr. David Alan Gilbert wrote:
> > > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > > On Tue, Feb 09, 2021 at 07:02:10PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > > +static void do_removemapping(fuse_req_t req, fuse_ino_t nodeid,
> > > > > +                             struct fuse_mbuf_iter *iter)
> > > > > +{
> > > > > +    struct fuse_removemapping_in *arg;
> > > > > +    struct fuse_removemapping_one *one;
> > > > > +
> > > > > +    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
> > > > > +    if (!arg || arg->count <= 0) {
> > > > 
> > > > arg->count is unsigned so < is tautologous.
> > > > 
> > > > > +        fuse_log(FUSE_LOG_ERR, "do_removemapping: invalid arg %p\n", arg);
> > > > > +        fuse_reply_err(req, EINVAL);
> > > > > +        return;
> > > > > +    }
> > > > > +
> > > > > +    one = fuse_mbuf_iter_advance(iter, arg->count * sizeof(*one));
> > > > 
> > > > arg->count * sizeof(*one) is an integer overflow on 32-bit hosts. I
> > > > think we should be more defensive here since this input comes from the
> > > > guest.
> > > 
> > > OK, so I've gone with:
> > > 
> > >     if (!arg || !arg->count || 
> > >         (uint64_t)arg->count * sizeof(*one) >= SIZE_MAX) {
> > >         fuse_log(FUSE_LOG_ERR, "do_removemapping: invalid arg %p\n", arg);
> > >         fuse_reply_err(req, EINVAL);
> > >         return;
> > 
> > If we did not want to get into uint64_t business, can we alternatively do:
> >      if (!arg || !arg->count || arg->count > SIZE_MAX/sizeof(*one)) {
> 
> I tried that and the compiler moaned that it was always false, which on
> a 64-bit host it is, since arg->count is uint32_t.

Hmm.... Maybe something like:

bool is_arg_count_valid(struct fuse_removemapping_in *arg)
{
    if (!arg->count)
        return false;

#if __WORDSIZE == 64
    return true;
#else
    if (arg->count > SIZE_MAX / sizeof(struct fuse_removemapping_one))
        return false;

    return true;
#endif
}

if (!arg || !is_arg_count_valid(arg)) {
}

Vivek



^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Virtio-fs] [PATCH 07/24] DAX: virtio-fs: Add vhost-user slave commands for mapping
  2021-02-09 19:02   ` [Virtio-fs] " Dr. David Alan Gilbert (git)
@ 2021-02-15 10:35     ` Chirantan Ekbote
  -1 siblings, 0 replies; 138+ messages in thread
From: Chirantan Ekbote @ 2021-02-15 10:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: mst, qemu-devel, virtio-fs-list, Stefan Hajnoczi,
	marcandre.lureau, Vivek Goyal

On Wed, Feb 10, 2021 at 4:04 AM Dr. David Alan Gilbert (git)
<dgilbert@redhat.com> wrote:
>
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> +
> +typedef struct {
> +    /* Offsets within the file being mapped */
> +    uint64_t fd_offset[VHOST_USER_FS_SLAVE_ENTRIES];
> +    /* Offsets within the cache */
> +    uint64_t c_offset[VHOST_USER_FS_SLAVE_ENTRIES];
> +    /* Lengths of sections */
> +    uint64_t len[VHOST_USER_FS_SLAVE_ENTRIES];
> +    /* Flags, from VHOST_USER_FS_FLAG_* */
> +    uint64_t flags[VHOST_USER_FS_SLAVE_ENTRIES];
> +} VhostUserFSSlaveMsg;
> +

Is it too late to change this?  This struct allocates space for up to
8 entries but most of the time the server will only try to set up one
mapping at a time so only 32 out of the 256 bytes in the message are
actually being used.  We're just wasting time memcpy'ing bytes that
will never be used.  Is there some reason this can't be dynamically
sized?  Something like:

typedef struct {
    /* Number of mapping requests */
    uint16_t num_requests;
    /* `num_requests` mapping requests */
   MappingRequest requests[];
} VhostUserFSSlaveMsg;

typedef struct {
    /* Offset within the file being mapped */
    uint64_t fd_offset;
    /* Offset within the cache */
    uint64_t c_offset;
    /* Length of section */
    uint64_t len;
    /* Flags, from VHOST_USER_FS_FLAG_* */
    uint64_t flags;
} MappingRequest;

The current pre-allocated structure both wastes space when there are
fewer than 8 requests and requires extra messages to be sent when
there are more than 8 requests.  I realize that in the grand scheme of
things copying 224 extra bytes is basically not noticeable but it just
irks me that we could fix this really easily before it gets propagated
to too many other places.
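
(For what it's worth, a minimal sketch of how the proposed layout would
size out -- MappingRequest has to be declared first so the flexible
array member compiles; the alloc_msg() helper is just illustration:)

#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint64_t fd_offset;   /* Offset within the file being mapped */
    uint64_t c_offset;    /* Offset within the cache */
    uint64_t len;         /* Length of section */
    uint64_t flags;       /* Flags, from VHOST_USER_FS_FLAG_* */
} MappingRequest;

typedef struct {
    /* Number of mapping requests */
    uint16_t num_requests;
    /* `num_requests` mapping requests */
    MappingRequest requests[];
} VhostUserFSSlaveMsg;

/*
 * The wire size becomes header + n * 32 bytes (plus whatever padding
 * the protocol settles on), so the common single-mapping case sends
 * roughly 40 bytes instead of a fixed 256-byte body.
 */
static VhostUserFSSlaveMsg *alloc_msg(uint16_t n)
{
    VhostUserFSSlaveMsg *m = malloc(sizeof(*m) + n * sizeof(m->requests[0]));

    if (m) {
        m->num_requests = n;
    }
    return m;
}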

Chirantan

> --
> 2.29.2
>
> _______________________________________________
> Virtio-fs mailing list
> Virtio-fs@redhat.com
> https://www.redhat.com/mailman/listinfo/virtio-fs
>


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Virtio-fs] [PATCH 07/24] DAX: virtio-fs: Add vhost-user slave commands for mapping
  2021-02-15 10:35     ` Chirantan Ekbote
@ 2021-02-15 13:25       ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert @ 2021-02-15 13:25 UTC (permalink / raw)
  To: Chirantan Ekbote
  Cc: mst, qemu-devel, virtio-fs-list, Stefan Hajnoczi,
	marcandre.lureau, Vivek Goyal

* Chirantan Ekbote (chirantan@chromium.org) wrote:
> On Wed, Feb 10, 2021 at 4:04 AM Dr. David Alan Gilbert (git)
> <dgilbert@redhat.com> wrote:
> >
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > +
> > +typedef struct {
> > +    /* Offsets within the file being mapped */
> > +    uint64_t fd_offset[VHOST_USER_FS_SLAVE_ENTRIES];
> > +    /* Offsets within the cache */
> > +    uint64_t c_offset[VHOST_USER_FS_SLAVE_ENTRIES];
> > +    /* Lengths of sections */
> > +    uint64_t len[VHOST_USER_FS_SLAVE_ENTRIES];
> > +    /* Flags, from VHOST_USER_FS_FLAG_* */
> > +    uint64_t flags[VHOST_USER_FS_SLAVE_ENTRIES];
> > +} VhostUserFSSlaveMsg;
> > +
> 
> Is it too late to change this?

No; this is a message defined as part of this series so still up for
review.  It's not guest visible; just on the vhost-user pipe.

> This struct allocates space for up to
> 8 entries but most of the time the server will only try to set up one
> mapping at a time so only 32 out of the 256 bytes in the message are
> actually being used.  We're just wasting time memcpy'ing bytes that
> will never be used.  Is there some reason this can't be dynamically
> sized?  Something like:
> 
> typedef struct {
>     /* Number of mapping requests */
>     uint16_t num_requests;
>     /* `num_requests` mapping requests */
>    MappingRequest requests[];
> } VhostUserFSSlaveMsg;
> 
> typedef struct {
>     /* Offset within the file being mapped */
>     uint64_t fd_offset;
>     /* Offset within the cache */
>     uint64_t c_offset;
>     /* Length of section */
>     uint64_t len;
>     /* Flags, from VHOST_USER_FS_FLAG_* */
>     uint64_t flags;
> } MappingRequest;
> 
> The current pre-allocated structure both wastes space when there are
> fewer than 8 requests and requires extra messages to be sent when
> there are more than 8 requests.  I realize that in the grand scheme of
> things copying 224 extra bytes is basically not noticeable but it just
> irks me that we could fix this really easily before it gets propagated
> to too many other places.

Sure; I'll have a look.  I think at the moment the only
more-than-one-entry case is the remove mapping case.

Dave

> Chirantan
> 
> > --
> > 2.29.2
> >
> > _______________________________________________
> > Virtio-fs mailing list
> > Virtio-fs@redhat.com
> > https://www.redhat.com/mailman/listinfo/virtio-fs
> >
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Virtio-fs] [PATCH 07/24] DAX: virtio-fs: Add vhost-user slave commands for mapping
  2021-02-15 10:35     ` Chirantan Ekbote
@ 2021-02-15 14:24       ` Vivek Goyal
  -1 siblings, 0 replies; 138+ messages in thread
From: Vivek Goyal @ 2021-02-15 14:24 UTC (permalink / raw)
  To: Chirantan Ekbote
  Cc: mst, Dr. David Alan Gilbert (git),
	qemu-devel, virtio-fs-list, Stefan Hajnoczi, marcandre.lureau

On Mon, Feb 15, 2021 at 07:35:53PM +0900, Chirantan Ekbote wrote:
> On Wed, Feb 10, 2021 at 4:04 AM Dr. David Alan Gilbert (git)
> <dgilbert@redhat.com> wrote:
> >
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > +
> > +typedef struct {
> > +    /* Offsets within the file being mapped */
> > +    uint64_t fd_offset[VHOST_USER_FS_SLAVE_ENTRIES];
> > +    /* Offsets within the cache */
> > +    uint64_t c_offset[VHOST_USER_FS_SLAVE_ENTRIES];
> > +    /* Lengths of sections */
> > +    uint64_t len[VHOST_USER_FS_SLAVE_ENTRIES];
> > +    /* Flags, from VHOST_USER_FS_FLAG_* */
> > +    uint64_t flags[VHOST_USER_FS_SLAVE_ENTRIES];
> > +} VhostUserFSSlaveMsg;
> > +
> 
> Is it too late to change this?  This struct allocates space for up to
> 8 entries but most of the time the server will only try to set up one
> mapping at a time so only 32 out of the 256 bytes in the message are
> actually being used.  We're just wasting time memcpy'ing bytes that
> will never be used.  Is there some reason this can't be dynamically
> sized?  Something like:
> 
> typedef struct {
>     /* Number of mapping requests */
>     uint16_t num_requests;
>     /* `num_requests` mapping requests */
>    MappingRequest requests[];
> } VhostUserFSSlaveMsg;
> 
> typedef struct {
>     /* Offset within the file being mapped */
>     uint64_t fd_offset;
>     /* Offset within the cache */
>     uint64_t c_offset;
>     /* Length of section */
>     uint64_t len;
>     /* Flags, from VHOST_USER_FS_FLAG_* */
>     uint64_t flags;
> } MappingRequest;
> 
> The current pre-allocated structure both wastes space when there are
> fewer than 8 requests and requires extra messages to be sent when
> there are more than 8 requests.  I realize that in the grand scheme of
> things copying 224 extra bytes is basically not noticeable but it just
> irks me that we could fix this really easily before it gets propagated
> to too many other places.

Sounds like a reasonable idea. We probably will have to dynamically
allocate memory for removemapping; hopefully that does not have a
performance impact.

Vivek



^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 23/24] vhost-user-fs: Implement drop CAP_FSETID functionality
  2021-02-11 14:40       ` [Virtio-fs] " Vivek Goyal
@ 2021-02-15 15:57         ` Stefan Hajnoczi
  -1 siblings, 0 replies; 138+ messages in thread
From: Stefan Hajnoczi @ 2021-02-15 15:57 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: virtio-fs, marcandre.lureau, mst, Dr. David Alan Gilbert (git),
	qemu-devel

On Thu, Feb 11, 2021 at 09:40:31AM -0500, Vivek Goyal wrote:
> On Thu, Feb 11, 2021 at 02:35:42PM +0000, Stefan Hajnoczi wrote:
> > On Tue, Feb 09, 2021 at 07:02:23PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > From: Vivek Goyal <vgoyal@redhat.com>
> > > 
> > > As part of slave_io message, slave can ask to do I/O on an fd. Additionally
> > > slave can ask for dropping CAP_FSETID (if master has it) before doing I/O.
> > > Implement functionality to drop CAP_FSETID and gain it back after the
> > > operation.
> > > 
> > > This also creates a dependency on libcap-ng.
> > 
> > Is this patch only for the case where QEMU is running as root?
> > 
> 
> Yes, it primarily is for the case where qemu is running as root, or
> somebody managed to launch it non-root but with still having capability
> CAP_FSETID.

Running QEMU as root is not encouraged because the security model is
designed around the principle of least privilege (only give QEMU access
to resources that belong to the guest).

What happens in the case where QEMU is not root? Does that mean QEMU
will drop suid/sgid bits even if the FUSE client wanted them to be
preserved?

Stefan


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 23/24] vhost-user-fs: Implement drop CAP_FSETID functionality
  2021-02-15 15:57         ` [Virtio-fs] " Stefan Hajnoczi
@ 2021-02-16 15:57           ` Vivek Goyal
  -1 siblings, 0 replies; 138+ messages in thread
From: Vivek Goyal @ 2021-02-16 15:57 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: virtio-fs, marcandre.lureau, mst, Dr. David Alan Gilbert (git),
	qemu-devel

On Mon, Feb 15, 2021 at 03:57:11PM +0000, Stefan Hajnoczi wrote:
> On Thu, Feb 11, 2021 at 09:40:31AM -0500, Vivek Goyal wrote:
> > On Thu, Feb 11, 2021 at 02:35:42PM +0000, Stefan Hajnoczi wrote:
> > > On Tue, Feb 09, 2021 at 07:02:23PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > From: Vivek Goyal <vgoyal@redhat.com>
> > > > 
> > > > As part of slave_io message, slave can ask to do I/O on an fd. Additionally
> > > > slave can ask for dropping CAP_FSETID (if master has it) before doing I/O.
> > > > Implement functionality to drop CAP_FSETID and gain it back after the
> > > > operation.
> > > > 
> > > > This also creates a dependency on libcap-ng.
> > > 
> > > Is this patch only for the case where QEMU is running as root?
> > > 
> > 
> > Yes, it primarily is for the case where qemu is running as root, or
> > somebody managed to launch it non-root but with still having capability
> > CAP_FSETID.
> 
> Running QEMU as root is not encouraged because the security model is
> designed around the principle of least privilege (only give QEMU access
> to resources that belong to the guest).
> 
> What happens in the case where QEMU is not root? Does that mean QEMU
> > will drop suid/sgid bits even if the FUSE client wanted them to be
> preserved?

QEMU will drop CAP_FSETID only if vhost-user slave asked for it. There
is no notion of gaining CAP_FSETID.

IOW, yes, if qemu is running unprivileged and does not have CAP_FSETID,
then we will end up clearing the setuid bit on the host. Not sure how that
problem can be fixed.
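
(For reference, a minimal sketch of the drop-and-regain dance with
libcap-ng for the privileged case -- it only works while CAP_FSETID
stays in the permitted set, which is exactly the limitation above;
not the literal patch code:)

#include <linux/capability.h>
#include <cap-ng.h>

/* Drop CAP_FSETID from the effective set before doing the I/O */
static int drop_effective_fsetid(void)
{
    capng_get_caps_process();
    capng_update(CAPNG_DROP, CAPNG_EFFECTIVE, CAP_FSETID);
    return capng_apply(CAPNG_SELECT_CAPS);
}

/* Regain it afterwards; CAPNG_ADD back to the effective set only
 * succeeds because the capability never left the permitted set */
static int regain_effective_fsetid(void)
{
    capng_get_caps_process();
    capng_update(CAPNG_ADD, CAPNG_EFFECTIVE, CAP_FSETID);
    return capng_apply(CAPNG_SELECT_CAPS);
}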

Vivek



^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 08/24] DAX: virtio-fs: Fill in slave commands for mapping
  2021-02-11 10:57     ` [Virtio-fs] " Stefan Hajnoczi
@ 2021-02-18 10:59       ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert @ 2021-02-18 10:59 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Tue, Feb 09, 2021 at 07:02:08PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Fill in definitions for map, unmap and sync commands.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > with fix by misono.tomohiro@fujitsu.com
> > ---
> >  hw/virtio/vhost-user-fs.c | 115 ++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 111 insertions(+), 4 deletions(-)
> > 
> > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> > index 78401d2ff1..5f2fca4d82 100644
> > --- a/hw/virtio/vhost-user-fs.c
> > +++ b/hw/virtio/vhost-user-fs.c
> > @@ -37,15 +37,122 @@
> >  uint64_t vhost_user_fs_slave_map(struct vhost_dev *dev, VhostUserFSSlaveMsg *sm,
> >                                   int fd)
> >  {
> > -    /* TODO */
> > -    return (uint64_t)-1;
> > +    VHostUserFS *fs = VHOST_USER_FS(dev->vdev);
> > +    if (!fs) {
> > +        /* Shouldn't happen - but seen on error path */
> > +        error_report("Bad fs ptr");
> > +        return (uint64_t)-1;
> > +    }
> 
> If a non-vhost-user-fs vhost-user device backend sends this message
> VHOST_USER_FS() -> object_dynamic_cast_assert() there will either be an
> assertion failure (CONFIG_QOM_CAST_DEBUG) or the pointer will be
> silently cast to the wrong type (!CONFIG_QOM_CAST_DEBUG).
> 
> Both of these outcomes are not suitable for input validation. We need to
> fail cleanly here:
> 
>   VhostUserFS *fs = (VHostUserFS *)object_dynamic_cast(OBJECT(dev->vdev),
>                                                        TYPE_VHOST_USER_FS);
>   if (!fs) {
>       ...handle failure...
>   }
> 
> >  uint64_t vhost_user_fs_slave_unmap(struct vhost_dev *dev,
> >                                     VhostUserFSSlaveMsg *sm)
> >  {
> > -    /* TODO */
> > -    return (uint64_t)-1;
> > +    VHostUserFS *fs = VHOST_USER_FS(dev->vdev);
> > +    if (!fs) {
> > +        /* Shouldn't happen - but seen on error path */
> > +        error_report("Bad fs ptr");
> > +        return (uint64_t)-1;
> > +    }
> 
> Same here.

Thanks, fixed.
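
(i.e. the guard now looks something like this -- a sketch following
your suggestion, not the literal patch:)

    VHostUserFS *fs = (VHostUserFS *)object_dynamic_cast(OBJECT(dev->vdev),
                                                         TYPE_VHOST_USER_FS);
    if (!fs) {
        /* A non-vhost-user-fs backend sent this message; reject it
         * cleanly instead of tripping the QOM cast assertion */
        error_report("Bad fs ptr");
        return (uint64_t)-1;
    }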

-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 01/24] DAX: vhost-user: Rework slave return values
  2021-02-11  9:59     ` [Virtio-fs] " Stefan Hajnoczi
@ 2021-02-18 12:18       ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert @ 2021-02-18 12:18 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Tue, Feb 09, 2021 at 07:02:01PM +0000, Dr. David Alan Gilbert (git) wrote:
> > +static uint64_t vhost_user_slave_handle_vring_host_notifier(
> > +                struct vhost_dev *dev,
> > +               VhostUserVringArea *area,
> > +               int fd)
> 
> Indentation looks off. Only worth changing if you respin.

Done.

> > @@ -1398,7 +1399,8 @@ static void slave_read(void *opaque)
> >      struct vhost_user *u = dev->opaque;
> >      VhostUserHeader hdr = { 0, };
> >      VhostUserPayload payload = { 0, };
> > -    int size, ret = 0;
> > +    int size;
> > +    uint64_t ret = 0;
> >      struct iovec iov;
> >      struct msghdr msgh;
> >      int fd[VHOST_USER_SLAVE_MAX_FDS];
> > @@ -1472,7 +1474,7 @@ static void slave_read(void *opaque)
> >          break;
> >      default:
> >          error_report("Received unexpected msg type: %d.", hdr.request);
> > -        ret = -EINVAL;
> > +        ret = (uint64_t)-EINVAL;
> 
> The !!ret was removed below so it would have previously been true (1).
> Now it has changed value.
> 
> If there is no specific reason to change the value, please keep it true
> (1) just in case a vhost-user device backend depends on that value.

Done; although moving to errnos there feels a bit better to me; but yes
the callers aren't audited.

Dave

-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 06/24] DAX: virtio-fs: Add cache BAR
  2021-02-11 10:25     ` [Virtio-fs] " Stefan Hajnoczi
@ 2021-02-18 17:33       ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert @ 2021-02-18 17:33 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Tue, Feb 09, 2021 at 07:02:06PM +0000, Dr. David Alan Gilbert (git) wrote:
> > @@ -46,6 +51,26 @@ static void vhost_user_fs_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
> >      }
> >  
> >      qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
> > +    cachesize = dev->vdev.conf.cache_size;
> > +
> > +    /*
> > +     * The bar starts with the data/DAX cache
> > +     * Others will be added later.
> > +     */
> > +    memory_region_init(&dev->cachebar, OBJECT(vpci_dev),
> > +                       "vhost-fs-pci-cachebar", cachesize);
> 
> s/vhost-fs/vhost-user-fs/ for consistency. Only worth changing if you
> respin.

Done.

> > +    if (cachesize) {
> > +        memory_region_add_subregion(&dev->cachebar, 0, &dev->vdev.cache);
> > +        virtio_pci_add_shm_cap(vpci_dev, VIRTIO_FS_PCI_CACHE_BAR, 0, cachesize,
> > +                               VIRTIO_FS_SHMCAP_ID_CACHE);
> > +    }
> > +
> > +    /* After 'realized' so the memory region exists */
> > +    pci_register_bar(&vpci_dev->pci_dev, VIRTIO_FS_PCI_CACHE_BAR,
> > +                     PCI_BASE_ADDRESS_SPACE_MEMORY |
> > +                     PCI_BASE_ADDRESS_MEM_PREFETCH |
> > +                     PCI_BASE_ADDRESS_MEM_TYPE_64,
> > +                     &dev->cachebar);
> 
> Please include a comment explaining why it's okay to use BAR 2, which is
> already used for the virtio-pci modern io bar (off by default):
> 
>     /*
>      * virtio pci bar layout used by default.
>      * subclasses can re-arrange things if needed.
>      *
>      *   region 0   --  virtio legacy io bar
>      *   region 1   --  msi-x bar
>      *   region 2   --  virtio modern io bar (off by default)
>      *   region 4+5 --  virtio modern memory (64bit) bar
>      *
>      */
> 
> I guess the idea is that the io bar is available since it's off by
> default. What happens if the io bar is enabled?

We don't have many choices; the only other option would be to extend
the modern memory bar at 4/5.

For now, I've added a check:

qemu-system-x86_64: -device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=myfs,cache-size=4G,modern-pio-notify=true: Cache can not be used together with modern_pio
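
(Roughly this shape, assuming the existing VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY
flag -- a sketch of the realize-time guard, not the literal patch:)

    if (cachesize && (vpci_dev->flags & VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY)) {
        error_setg(errp, "Cache can not be used together with modern_pio");
        return;
    }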

> Should this bar registration be conditional (only when cache size
> is greater than 0)?

Yes, added.

Dave


-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 23/24] vhost-user-fs: Implement drop CAP_FSETID functionality
  2021-02-16 15:57           ` [Virtio-fs] " Vivek Goyal
@ 2021-02-22 16:53             ` Stefan Hajnoczi
  -1 siblings, 0 replies; 138+ messages in thread
From: Stefan Hajnoczi @ 2021-02-22 16:53 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: virtio-fs, marcandre.lureau, mst, Dr. David Alan Gilbert (git),
	qemu-devel

On Tue, Feb 16, 2021 at 10:57:10AM -0500, Vivek Goyal wrote:
> On Mon, Feb 15, 2021 at 03:57:11PM +0000, Stefan Hajnoczi wrote:
> > On Thu, Feb 11, 2021 at 09:40:31AM -0500, Vivek Goyal wrote:
> > > On Thu, Feb 11, 2021 at 02:35:42PM +0000, Stefan Hajnoczi wrote:
> > > > On Tue, Feb 09, 2021 at 07:02:23PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > > From: Vivek Goyal <vgoyal@redhat.com>
> > > > > 
> > > > > As part of slave_io message, slave can ask to do I/O on an fd. Additionally
> > > > > slave can ask for dropping CAP_FSETID (if master has it) before doing I/O.
> > > > > Implement functionality to drop CAP_FSETID and gain it back after the
> > > > > operation.
> > > > > 
> > > > > This also creates a dependency on libcap-ng.
> > > > 
> > > > Is this patch only for the case where QEMU is running as root?
> > > > 
> > > 
> > > Yes, it primarily is for the case where qemu is running as root, or
> > > somebody managed to launch it non-root but with still having capability
> > > CAP_FSETID.
> > 
> > Running QEMU as root is not encouraged because the security model is
> > designed around the principle of least privilege (only give QEMU access
> > to resources that belong to the guest).
> > 
> > What happens in the case where QEMU is not root? Does that mean QEMU
> will drop suid/sgid bits even if the FUSE client wanted them to be
> > preserved?
> 
> QEMU will drop CAP_FSETID only if vhost-user slave asked for it. There
> is no notion of gaining CAP_FSETID.
> 
> IOW, yes, if qemu is running unpriviliged and does not have CAP_FSETID,
> then we will end up clearing setuid bit on host. Not sure how that
> problem can be fixed.

Yeah, that seems problematic since the suid bit should stay set in that
case. The host cannot set the bit again (even if it has privileges)
because that would create a race condition where the guest expects the
bit set but it's cleared.

Stefan


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 18/24] DAX/unmap virtiofsd: Parse unmappable elements
  2021-02-11 14:29     ` [Virtio-fs] " Stefan Hajnoczi
@ 2021-02-25 10:19       ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert @ 2021-02-25 10:19 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Tue, Feb 09, 2021 at 07:02:18PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > For some read/writes the virtio queue elements are unmappable by
> > the daemon; these are cases where the data is to be read/written
> > from non-RAM.  In virtiofs's case this is typically a direct read/write
> > into an mmap'd DAX file also on virtiofs (possibly on another instance).
> > 
> > When we receive a virtio queue element, check that we have enough
> > mappable data to handle the headers.  Make a note of the number of
> > unmappable 'in' entries (ie. for read data back to the VMM),
> > and flag the fuse_bufvec for 'out' entries with a new flag
> > FUSE_BUF_PHYS_ADDR.
> 
> Looking back at this I think vhost-user will need generic
> READ_MEMORY/WRITE_MEMORY commands. It's okay for virtio-fs to have its
> own IO command (although not strictly necessary).
> 
> With generic READ_MEMORY/WRITE_MEMORY libvhost-user and other vhost-user
> device backend implementations can handle vring descriptors that point
> into the DAX window. This can be done transparently so individual device
> implementations (net, blk, etc) don't even know when memory is copied vs
> zero-copy shared memory access.
> 
> So this approach is okay for virtio-fs but it's not a long-term solution
> for all of vhost-user. Eventually the long-term solution may be needed
> so that other VIRTIO devices that have shared memory resources work.
> 
> Another bonus of READ_MEMORY/WRITE_MEMORY is that users that prefer an
> enforcing vIOMMU can disable shared memory (maybe just keep the vring
> itself mmapped).
> 
> I just wanted to share this idea but don't expect it to be addressed in
> this patch series.

Yes, that would be nice; although in this case it would imply an extra
memory copy; you'd have to do the IO in the daemon, and then perform a
read/write back across the socket.

> > diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
> > index a090040bb2..ed9280de91 100644
> > --- a/tools/virtiofsd/fuse_common.h
> > +++ b/tools/virtiofsd/fuse_common.h
> > @@ -611,6 +611,13 @@ enum fuse_buf_flags {
> >       * detected.
> >       */
> >      FUSE_BUF_FD_RETRY = (1 << 3),
> > +
> > +    /**
> > +     * The addresses in the iovec represent guest physical addresses
> > +     * that can't be mapped by the daemon process.
> > +     * IO must be bounced back to the VMM to do it.
> > +     */
> > +    FUSE_BUF_PHYS_ADDR = (1 << 4),
> 
> With a vIOMMU it's an IOVA. Without a vIOMMU it's a GPA. This constant
> may need to be renamed in the future, but it is okay for now.

Do we have any naming for something that's either a GPA or an IOVA?

> > +    if (req->bad_in_num || req->bad_out_num) {
> > +        bool handled_unmappable = false;
> > +
> > +        if (out_num > 2 && out_num_readable >= 2 && !req->bad_in_num &&
> > +            out_sg[0].iov_len == sizeof(struct fuse_in_header) &&
> > +            ((struct fuse_in_header *)fbuf.mem)->opcode == FUSE_WRITE &&
> > +            out_sg[1].iov_len == sizeof(struct fuse_write_in)) {
> 
> This violates the VIRTIO specification:
> 
>   2.6.4.1 Device Requirements: Message Framing
> 
>   The device MUST NOT make assumptions about the particular arrangement of descriptors.
> 
>   https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-280004
> 
> The driver is not obligated to submit separate iovecs. out_num == 1 is
> valid and the device needs to process it byte-wise instead of making
> assumptions about iovec layout.

Yes, it's actually not new in this patch, but I'll clean it up.
I took the shortcut all the way back in:
  e17f7a580e2c599330ad virtiofsd: Pass write iov's all the way through
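
(The framing-safe version amounts to gathering the header bytes from
the front of the sg list no matter how the driver split the
descriptors -- QEMU's iov_to_buf() does this; a standalone sketch:)

#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Copy 'len' bytes from the start of an iovec array, regardless of
 * descriptor layout; returns true if enough bytes were available */
static bool copy_from_iov_head(void *dst, const struct iovec *sg,
                               unsigned int num, size_t len)
{
    size_t done = 0;

    for (unsigned int i = 0; i < num && done < len; i++) {
        size_t n = sg[i].iov_len < len - done ? sg[i].iov_len : len - done;

        memcpy((char *)dst + done, sg[i].iov_base, n);
        done += n;
    }
    return done == len;
}

With that, the check becomes "can we gather sizeof(struct fuse_in_header)
plus sizeof(struct fuse_write_in) bytes" rather than assuming out_sg[0]
and out_sg[1] each hold exactly one of those structs.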

Dave

-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 07/24] DAX: virtio-fs: Add vhost-user slave commands for mapping
  2021-02-11 10:32     ` [Virtio-fs] " Stefan Hajnoczi
@ 2021-03-08 17:04       ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert @ 2021-03-08 17:04 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Tue, Feb 09, 2021 at 07:02:07PM +0000, Dr. David Alan Gilbert (git) wrote:
> > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> > index d6085f7045..1deedd3407 100644
> > --- a/docs/interop/vhost-user.rst
> > +++ b/docs/interop/vhost-user.rst
> > @@ -1432,6 +1432,26 @@ Slave message types
> >  
> >    The state.num field is currently reserved and must be set to 0.
> >  
> > +``VHOST_USER_SLAVE_FS_MAP``
> > +  :id: 6
> > +  :equivalent ioctl: N/A
> > +  :slave payload: fd + n * (offset + address + len)
> 
> I'm not sure I understand this notation. '+' means field concatenation?
> Is 'fd' a field or does it indicate file descriptor passing?
> 
> I suggest using a struct name instead of informal notation so that the
> payload size and representation is clear.
> 
> The same applies for VHOST_USER_SLAVE_FS_UNMAP.
> 
> > +  :master payload: N/A
> > +
> > +  Requests that the QEMU mmap the given fd into the virtio-fs cache;
> 
> s/QEMU mmap the given fd/given fd be mmapped/
> 
> Please avoid mentioning QEMU specifically. Any VMM should be able to
> implement this spec.
> 
> The same applies for VHOST_USER_SLAVE_FS_UNMAP.

OK, I've changed this to:

+``VHOST_USER_SLAVE_FS_MAP``
+  :id: 6
+  :equivalent ioctl: N/A
+  :slave payload: ``struct VhostUserFSSlaveMsg``
+  :master payload: N/A
+
+  Requests that an fd, provided in the ancillary data, be mmapped
+  into the virtio-fs cache; multiple chunks can be mapped in one
+  command.
+  A reply is generated indicating whether mapping succeeded.
+
+``VHOST_USER_SLAVE_FS_UNMAP``
+  :id: 7
+  :equivalent ioctl: N/A
+  :slave payload: ``struct VhostUserFSSlaveMsg``
+  :master payload: N/A
+
+  Requests that the range in the virtio-fs cache be unmapped;
+  multiple chunks can be unmapped in one command.
+  A reply is generated indicating whether unmapping succeeded.
+

(Although it'll get a little more complicated as I rework for
Chirantan's comment)

Dave

> > +  multiple chunks can be mapped in one command.
> > +  A reply is generated indicating whether mapping succeeded.
> > +
> > +``VHOST_USER_SLAVE_FS_UNMAP``
> > +  :id: 7
> > +  :equivalent ioctl: N/A
> > +  :slave payload: n * (address + len)
> > +  :master payload: N/A
> > +
> > +  Requests that the QEMU un-mmap the given range in the virtio-fs cache;
> > +  multiple chunks can be unmapped in one command.
> > +  A reply is generated indicating whether unmapping succeeded.


-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Virtio-fs] [PATCH 07/24] DAX: virtio-fs: Add vhost-user slave commands for mapping
  2021-02-15 10:35     ` Chirantan Ekbote
@ 2021-03-11 12:15       ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert @ 2021-03-11 12:15 UTC (permalink / raw)
  To: Chirantan Ekbote
  Cc: mst, qemu-devel, virtio-fs-list, Stefan Hajnoczi,
	marcandre.lureau, Vivek Goyal

* Chirantan Ekbote (chirantan@chromium.org) wrote:
> On Wed, Feb 10, 2021 at 4:04 AM Dr. David Alan Gilbert (git)
> <dgilbert@redhat.com> wrote:
> >
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > +
> > +typedef struct {
> > +    /* Offsets within the file being mapped */
> > +    uint64_t fd_offset[VHOST_USER_FS_SLAVE_ENTRIES];
> > +    /* Offsets within the cache */
> > +    uint64_t c_offset[VHOST_USER_FS_SLAVE_ENTRIES];
> > +    /* Lengths of sections */
> > +    uint64_t len[VHOST_USER_FS_SLAVE_ENTRIES];
> > +    /* Flags, from VHOST_USER_FS_FLAG_* */
> > +    uint64_t flags[VHOST_USER_FS_SLAVE_ENTRIES];
> > +} VhostUserFSSlaveMsg;
> > +
> 
> Is it too late to change this?  This struct allocates space for up to
> 8 entries but most of the time the server will only try to set up one
> mapping at a time so only 32 out of the 256 bytes in the message are
> actually being used.  We're just wasting time memcpy'ing bytes that
> will never be used.  Is there some reason this can't be dynamically
> sized?  Something like:
> 
> typedef struct {
>     /* Number of mapping requests */
>     uint16_t num_requests;
>     /* `num_requests` mapping requests */
>    MappingRequest requests[];
> } VhostUserFSSlaveMsg;
> 
> typedef struct {
>     /* Offset within the file being mapped */
>     uint64_t fd_offset;
>     /* Offset within the cache */
>     uint64_t c_offset;
>     /* Length of section */
>     uint64_t len;
>     /* Flags, from VHOST_USER_FS_FLAG_* */
>     uint64_t flags;
> } MappingRequest;
> 
> The current pre-allocated structure both wastes space when there are
> fewer than 8 requests and requires extra messages to be sent when
> there are more than 8 requests.  I realize that in the grand scheme of
> things copying 224 extra bytes is basically not noticeable but it just
> irks me that we could fix this really easily before it gets propagated
> to too many other places.

So this has come out as:

typedef struct {
    /* Offset within the file being mapped */
    uint64_t fd_offset;
    /* Offset within the cache */
    uint64_t c_offset;
    /* Length of section */
    uint64_t len;
    /* Flags, from VHOST_USER_FS_FLAG_* */
    uint64_t flags;
} VhostUserFSSlaveMsgEntry;
 
typedef struct {
    /* Generic flags for the overall message */
    uint32_t flags;
    /* Number of entries */
    uint16_t count;
    /* Spare */
    uint16_t align;
 
    VhostUserFSSlaveMsgEntry entries[];
} VhostUserFSSlaveMsg;

which seems to work OK.
I've still got a:
#define VHOST_USER_FS_SLAVE_MAX_ENTRIES 8

to limit the size VhostUserFSSlaveMsg can get to.
The variable length array makes the union in the reader a bit more
hairy, but it's OK.
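
As an aside, a minimal validation sketch for the reading side; the
function name is hypothetical, the struct and constant names are as
above, and the channel read/reply plumbing is assumed:

/*
 * Sanity-check a variable-length VhostUserFSSlaveMsg once
 * 'payload_size' bytes have been read off the slave channel.
 */
static bool fs_slave_msg_ok(const VhostUserFSSlaveMsg *sm,
                            size_t payload_size)
{
    if (payload_size < sizeof(*sm)) {
        return false;                   /* truncated header */
    }
    if (sm->count > VHOST_USER_FS_SLAVE_MAX_ENTRIES) {
        return false;                   /* both ends share this limit */
    }
    return payload_size ==
           sizeof(*sm) + sm->count * sizeof(VhostUserFSSlaveMsgEntry);
}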

Dave

> Chirantan
> 
> > --
> > 2.29.2
> >
> > _______________________________________________
> > Virtio-fs mailing list
> > Virtio-fs@redhat.com
> > https://www.redhat.com/mailman/listinfo/virtio-fs
> >
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Virtio-fs] [PATCH 07/24] DAX: virtio-fs: Add vhost-user slave commands for mapping
  2021-03-11 12:15       ` Dr. David Alan Gilbert
@ 2021-03-11 13:50         ` Vivek Goyal
  -1 siblings, 0 replies; 138+ messages in thread
From: Vivek Goyal @ 2021-03-11 13:50 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: mst, Chirantan Ekbote, qemu-devel, virtio-fs-list,
	Stefan Hajnoczi, marcandre.lureau

On Thu, Mar 11, 2021 at 12:15:09PM +0000, Dr. David Alan Gilbert wrote:
> * Chirantan Ekbote (chirantan@chromium.org) wrote:
> > On Wed, Feb 10, 2021 at 4:04 AM Dr. David Alan Gilbert (git)
> > <dgilbert@redhat.com> wrote:
> > >
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > +
> > > +typedef struct {
> > > +    /* Offsets within the file being mapped */
> > > +    uint64_t fd_offset[VHOST_USER_FS_SLAVE_ENTRIES];
> > > +    /* Offsets within the cache */
> > > +    uint64_t c_offset[VHOST_USER_FS_SLAVE_ENTRIES];
> > > +    /* Lengths of sections */
> > > +    uint64_t len[VHOST_USER_FS_SLAVE_ENTRIES];
> > > +    /* Flags, from VHOST_USER_FS_FLAG_* */
> > > +    uint64_t flags[VHOST_USER_FS_SLAVE_ENTRIES];
> > > +} VhostUserFSSlaveMsg;
> > > +
> > 
> > Is it too late to change this?  This struct allocates space for up to
> > 8 entries but most of the time the server will only try to set up one
> > mapping at a time so only 32 out of the 256 bytes in the message are
> > actually being used.  We're just wasting time memcpy'ing bytes that
> > will never be used.  Is there some reason this can't be dynamically
> > sized?  Something like:
> > 
> > typedef struct {
> >     /* Number of mapping requests */
> >     uint16_t num_requests;
> >     /* `num_requests` mapping requests */
> >    MappingRequest requests[];
> > } VhostUserFSSlaveMsg;
> > 
> > typedef struct {
> >     /* Offset within the file being mapped */
> >     uint64_t fd_offset;
> >     /* Offset within the cache */
> >     uint64_t c_offset;
> >     /* Length of section */
> >     uint64_t len;
> >     /* Flags, from VHOST_USER_FS_FLAG_* */
> >     uint64_t flags;
> > } MappingRequest;
> > 
> > The current pre-allocated structure both wastes space when there are
> > fewer than 8 requests and requires extra messages to be sent when
> > there are more than 8 requests.  I realize that in the grand scheme of
> > things copying 224 extra bytes is basically not noticeable but it just
> > irks me that we could fix this really easily before it gets propagated
> > to too many other places.
> 
> So this has come out as:
> 
> typedef struct {
>     /* Offset within the file being mapped */
>     uint64_t fd_offset;
>     /* Offset within the cache */
>     uint64_t c_offset;
>     /* Length of section */
>     uint64_t len;
>     /* Flags, from VHOST_USER_FS_FLAG_* */
>     uint64_t flags;
> } VhostUserFSSlaveMsgEntry;
>  
> typedef struct {
>     /* Generic flags for the overall message */
>     uint32_t flags;
>     /* Number of entries */
>     uint16_t count;
>     /* Spare */
>     uint16_t align;
>  
>     VhostUserFSSlaveMsgEntry entries[];
> } VhostUserFSSlaveMsg;
> 
> which seems to work OK.
> I've still got a:
> #define VHOST_USER_FS_SLAVE_MAX_ENTRIES 8

Hi Dave,

So if we were to raise this limit down the line, will it be just a matter
of changing this number and recompiling qemu + virtiofsd? Or is this just
a limit on the sender that qemu does not care about?

If qemu cares about the number of entries, then it would be good to raise
this limit to, say, 32 or 64.

Otherwise new definitions look good.

Thanks
Vivek

> 
> to limit the size VhostUserFSSlaveMsg can get to.
> The variable length array makes the union in the reader a bit more
> hairy, but it's OK.
> 
> Dave
> 
> > Chirantan
> > 
> > > --
> > > 2.29.2
> > >
> > > _______________________________________________
> > > Virtio-fs mailing list
> > > Virtio-fs@redhat.com
> > > https://www.redhat.com/mailman/listinfo/virtio-fs
> > >
> > 
> -- 
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [Virtio-fs] [PATCH 07/24] DAX: virtio-fs: Add vhost-user slave commands for mapping
  2021-03-11 13:50         ` Vivek Goyal
@ 2021-03-11 18:52           ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert @ 2021-03-11 18:52 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: mst, Chirantan Ekbote, qemu-devel, virtio-fs-list,
	Stefan Hajnoczi, marcandre.lureau

* Vivek Goyal (vgoyal@redhat.com) wrote:
> On Thu, Mar 11, 2021 at 12:15:09PM +0000, Dr. David Alan Gilbert wrote:
> > * Chirantan Ekbote (chirantan@chromium.org) wrote:
> > > On Wed, Feb 10, 2021 at 4:04 AM Dr. David Alan Gilbert (git)
> > > <dgilbert@redhat.com> wrote:
> > > >
> > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > +
> > > > +typedef struct {
> > > > +    /* Offsets within the file being mapped */
> > > > +    uint64_t fd_offset[VHOST_USER_FS_SLAVE_ENTRIES];
> > > > +    /* Offsets within the cache */
> > > > +    uint64_t c_offset[VHOST_USER_FS_SLAVE_ENTRIES];
> > > > +    /* Lengths of sections */
> > > > +    uint64_t len[VHOST_USER_FS_SLAVE_ENTRIES];
> > > > +    /* Flags, from VHOST_USER_FS_FLAG_* */
> > > > +    uint64_t flags[VHOST_USER_FS_SLAVE_ENTRIES];
> > > > +} VhostUserFSSlaveMsg;
> > > > +
> > > 
> > > Is it too late to change this?  This struct allocates space for up to
> > > 8 entries but most of the time the server will only try to set up one
> > > mapping at a time so only 32 out of the 256 bytes in the message are
> > > actually being used.  We're just wasting time memcpy'ing bytes that
> > > will never be used.  Is there some reason this can't be dynamically
> > > sized?  Something like:
> > > 
> > > typedef struct {
> > >     /* Number of mapping requests */
> > >     uint16_t num_requests;
> > >     /* `num_requests` mapping requests */
> > >    MappingRequest requests[];
> > > } VhostUserFSSlaveMsg;
> > > 
> > > typedef struct {
> > >     /* Offset within the file being mapped */
> > >     uint64_t fd_offset;
> > >     /* Offset within the cache */
> > >     uint64_t c_offset;
> > >     /* Length of section */
> > >     uint64_t len;
> > >     /* Flags, from VHOST_USER_FS_FLAG_* */
> > >     uint64_t flags;
> > > } MappingRequest;
> > > 
> > > The current pre-allocated structure both wastes space when there are
> > > fewer than 8 requests and requires extra messages to be sent when
> > > there are more than 8 requests.  I realize that in the grand scheme of
> > > things copying 224 extra bytes is basically not noticeable but it just
> > > irks me that we could fix this really easily before it gets propagated
> > > to too many other places.
> > 
> > So this has come out as:
> > 
> > typedef struct {
> >     /* Offset within the file being mapped */
> >     uint64_t fd_offset;
> >     /* Offset within the cache */
> >     uint64_t c_offset;
> >     /* Length of section */
> >     uint64_t len;
> >     /* Flags, from VHOST_USER_FS_FLAG_* */
> >     uint64_t flags;
> > } VhostUserFSSlaveMsgEntry;
> >  
> > typedef struct {
> >     /* Generic flags for the overall message */
> >     uint32_t flags;
> >     /* Number of entries */
> >     uint16_t count;
> >     /* Spare */
> >     uint16_t align;
> >  
> >     VhostUserFSSlaveMsgEntry entries[];
> > } VhostUserFSSlaveMsg;
> > 
> > which seems to work OK.
> > I've still got a:
> > #define VHOST_USER_FS_SLAVE_MAX_ENTRIES 8
> 
> Hi Dave,
> 
> So if we were to raise this limit down the line, will it be just a matter
> of changing this number and recompiling qemu + virtiofsd? Or is this just
> a limit on the sender that qemu does not care about?

They have to agree; 
> If qemu cares about the number of entries, then it would be good to raise
> this limit to, say, 32 or 64.

I've bumped it to 32.

Dave

> Otherwise new definitions look good.
> 
> Thanks
> Vivek
> 
> > 
> > to limit the size VhostUserFSSlaveMsg can get to.
> > The variable length array makes the union in the reader a bit more
> > hairy, but it's OK.
> > 
> > Dave
> > 
> > > Chirantan
> > > 
> > > > --
> > > > 2.29.2
> > > >
> > > > _______________________________________________
> > > > Virtio-fs mailing list
> > > > Virtio-fs@redhat.com
> > > > https://www.redhat.com/mailman/listinfo/virtio-fs
> > > >
> > > 
> > -- 
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 16/24] DAX/unmap: virtiofsd: Add VHOST_USER_SLAVE_FS_IO
  2021-02-11 14:17     ` [Virtio-fs] " Stefan Hajnoczi
@ 2021-03-16 19:59       ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert @ 2021-03-16 19:59 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Tue, Feb 09, 2021 at 07:02:16PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Define a new slave command 'VHOST_USER_SLAVE_FS_IO' for a
> > client to ask qemu to perform a read/write from an fd directly
> > to GPA.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  docs/interop/vhost-user.rst               | 11 +++
> >  hw/virtio/trace-events                    |  6 ++
> >  hw/virtio/vhost-user-fs.c                 | 84 +++++++++++++++++++++++
> >  hw/virtio/vhost-user.c                    |  4 ++
> >  include/hw/virtio/vhost-user-fs.h         |  2 +
> >  subprojects/libvhost-user/libvhost-user.h |  1 +
> >  6 files changed, 108 insertions(+)
> > 
> > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> > index 1deedd3407..821712f4a2 100644
> > --- a/docs/interop/vhost-user.rst
> > +++ b/docs/interop/vhost-user.rst
> > @@ -1452,6 +1452,17 @@ Slave message types
> >    multiple chunks can be unmapped in one command.
> >    A reply is generated indicating whether unmapping succeeded.
> >  
> > +``VHOST_USER_SLAVE_FS_IO``
> > +  :id: 9
> > +  :equivalent ioctl: N/A
> > +  :slave payload: fd + n * (offset + address + len)
> 
> Please clarify the payload representation. This is not enough for
> someone to implement the spec.

Done:
+  :slave payload: ``struct VhostUserFSSlaveMsg``
   :master payload: N/A
 
+  Requests that IO be performed directly from an fd, passed in ancillary
+  data, to guest memory on behalf of the daemon; this is normally for a
+  case where a memory region isn't visible to the daemon.  The slave
+  payload has flags which determine the direction of the IO operation.
+
 
 .. 
> > +  :master payload: N/A
> > +
> > +  Requests that the QEMU performs IO directly from an fd to guest memory
> 
> To avoid naming a particular VMM:
> 
> s/the QEMU performs IO/IO be performed/
> 
> > +  on behalf of the daemon; this is normally for a case where a memory region
> > +  isn't visible to the daemon. slave payload has flags which determine
> > +  the direction of IO operation.
> 
> Please document the payload flags in the spec.


+  The ``VHOST_USER_FS_FLAG_MAP_R`` flag must be set in the ``flags`` field to
+  read from the file into RAM.
+  The ``VHOST_USER_FS_FLAG_MAP_W`` flag must be set in the ``flags`` field to
+  write to the file from RAM.

> > +
> >  .. _reply_ack:
> >  
> >  VHOST_USER_PROTOCOL_F_REPLY_ACK
> > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > index c62727f879..20557a078e 100644
> > --- a/hw/virtio/trace-events
> > +++ b/hw/virtio/trace-events
> > @@ -53,6 +53,12 @@ vhost_vdpa_get_features(void *dev, uint64_t features) "dev: %p features: 0x%"PRI
> >  vhost_vdpa_set_owner(void *dev) "dev: %p"
> >  vhost_vdpa_vq_get_addr(void *dev, void *vq, uint64_t desc_user_addr, uint64_t avail_user_addr, uint64_t used_user_addr) "dev: %p vq: %p desc_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64
> >  
> > +# vhost-user-fs.c
> > +
> > +vhost_user_fs_slave_io_loop(const char *name, uint64_t owr, int is_ram, int is_romd, size_t size) "region %s with internal offset 0x%"PRIx64 " ram=%d romd=%d mrs.size=%zd"
> > +vhost_user_fs_slave_io_loop_res(ssize_t transferred) "%zd"
> > +vhost_user_fs_slave_io_exit(int res, size_t done) "res: %d done: %zd"
> > +
> >  # virtio.c
> >  virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
> >  virtqueue_fill(void *vq, const void *elem, unsigned int len, unsigned int idx) "vq %p elem %p len %u idx %u"
> > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> > index 5f2fca4d82..357bc1d04e 100644
> > --- a/hw/virtio/vhost-user-fs.c
> > +++ b/hw/virtio/vhost-user-fs.c
> > @@ -23,6 +23,8 @@
> >  #include "hw/virtio/vhost-user-fs.h"
> >  #include "monitor/monitor.h"
> >  #include "sysemu/sysemu.h"
> > +#include "exec/address-spaces.h"
> > +#include "trace.h"
> >  
> >  /*
> >   * The powerpc kernel code expects the memory to be accessible during
> > @@ -155,6 +157,88 @@ uint64_t vhost_user_fs_slave_unmap(struct vhost_dev *dev,
> >      return (uint64_t)res;
> >  }
> >  
> > +uint64_t vhost_user_fs_slave_io(struct vhost_dev *dev, VhostUserFSSlaveMsg *sm,
> > +                                int fd)
> > +{
> > +    VHostUserFS *fs = VHOST_USER_FS(dev->vdev);
> > +    if (!fs) {
> > +        /* Shouldn't happen - but seen it in error paths */
> > +        error_report("Bad fs ptr");
> > +        return (uint64_t)-1;
> > +    }
> 
> Same pointer casting issue as with map/unmap.

Done

> > +
> > +    unsigned int i;
> > +    int res = 0;
> > +    size_t done = 0;
> > +
> > +    if (fd < 0) {
> > +        error_report("Bad fd for map");
> > +        return (uint64_t)-1;
> > +    }
> > +
> > +    for (i = 0; i < VHOST_USER_FS_SLAVE_ENTRIES && !res; i++) {
> > +        if (sm->len[i] == 0) {
> > +            continue;
> > +        }
> > +
> > +        size_t len = sm->len[i];
> > +        hwaddr gpa = sm->c_offset[i];
> > +
> > +        while (len && !res) {
> > +            MemoryRegionSection mrs = memory_region_find(get_system_memory(),
> > +                                                         gpa, len);
> > +            size_t mrs_size = (size_t)int128_get64(mrs.size);
> 
> If there is a vIOMMU then the vhost-user device backend should be
> restricted to just areas of guest RAM that are mapped. I think this can
> be achieved by using the vhost-user-fs device's address space instead of
> get_system_memory(). For example, virtio_pci_get_dma_as().

Written but not yet tested, as:

            bool is_write = e->flags & VHOST_USER_FS_FLAG_MAP_W;
            MemoryRegion *mr = address_space_translate(dev->vdev->dma_as, gpa,
                                                       &xlat, &xlat_len,
                                                       is_write,
                                                       MEMTXATTRS_UNSPECIFIED);

> > +
> > +            if (!mrs_size) {
> > +                error_report("No guest region found for 0x%" HWADDR_PRIx, gpa);
> > +                res = -EFAULT;
> > +                break;
> > +            }
> > +
> > +            trace_vhost_user_fs_slave_io_loop(mrs.mr->name,
> > +                                          (uint64_t)mrs.offset_within_region,
> > +                                          memory_region_is_ram(mrs.mr),
> > +                                          memory_region_is_romd(mrs.mr),
> > +                                          (size_t)mrs_size);
> > +
> > +            void *hostptr = qemu_map_ram_ptr(mrs.mr->ram_block,
> > +                                             mrs.offset_within_region);
> > +            ssize_t transferred;
> > +            if (sm->flags[i] & VHOST_USER_FS_FLAG_MAP_R) {
> 
> The flag name is specific to map requests but it's shared with the IO
> request. Perhaps rename the flags?

They're both read/writes; do you have a preferred alternative?

> > +                /* Read from file into RAM */
> > +                if (mrs.mr->readonly) {
> > +                    res = -EFAULT;
> > +                    break;
> > +                }
> > +                transferred = pread(fd, hostptr, mrs_size, sm->fd_offset[i]);
> > +            } else {
> > +                /* Write into file from RAM */
> > +                assert((sm->flags[i] & VHOST_USER_FS_FLAG_MAP_W));
> 
> The vhost-user device backend must not be able to crash the VMM. Please
> use an if statement and fail the request if the flags are invalid
> instead of assert().

Done

> > +                transferred = pwrite(fd, hostptr, mrs_size, sm->fd_offset[i]);
> > +            }
> > +            trace_vhost_user_fs_slave_io_loop_res(transferred);
> > +            if (transferred < 0) {
> > +                res = -errno;
> > +                break;
> > +            }
> > +            if (!transferred) {
> > +                /* EOF */
> > +                break;
> > +            }
> > +
> > +            done += transferred;
> > +            len -= transferred;
> 
> Is gpa += transferred missing so that this loop can handle crossing
> MemoryRegion boundaries?
> 
> sm->fd_offset[i] also needs to be put into a local variable and
> incremented by transferred each time around the loop.

Hmm yes, both of those are right; this obviously needs more testing,
especially across boundaries.
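
For the record, an untested sketch of the corrected inner loop.
translate_len() is a stand-in for the address_space_translate() plus
qemu_map_ram_ptr() sequence and yields the host pointer and contiguous
length at 'gpa'; fs_slave_io_one() is the hypothetical direction helper
sketched earlier:

/* Both the guest address and the file offset live in locals and
 * advance by the amount actually transferred, so a single entry can
 * cross MemoryRegion boundaries. */
size_t len = e->len;
hwaddr gpa = e->c_offset;
uint64_t fd_offset = e->fd_offset;

while (len && !res) {
    size_t chunk;
    void *hostptr = translate_len(dev, gpa, len, &chunk);

    if (!hostptr) {
        res = -EFAULT;              /* no guest region at gpa */
        break;
    }
    ssize_t transferred = fs_slave_io_one(fd, hostptr, chunk,
                                          fd_offset, e->flags);
    if (transferred < 0) {
        res = transferred;          /* already a negative errno */
        break;
    }
    if (!transferred) {
        break;                      /* EOF */
    }
    done += transferred;
    len -= transferred;
    gpa += transferred;             /* advance the guest side */
    fd_offset += transferred;       /* and the file side */
}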

Dave

-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 18/24] DAX/unmap virtiofsd: Parse unmappable elements
  2021-02-11 14:29     ` [Virtio-fs] " Stefan Hajnoczi
@ 2021-03-17 10:33       ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 138+ messages in thread
From: Dr. David Alan Gilbert @ 2021-03-17 10:33 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Tue, Feb 09, 2021 at 07:02:18PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > For some read/writes the virtio queue elements are unmappable by
> > the daemon; these are cases where the data is to be read/written
> > from non-RAM.  In virtiofs's case this is typically a direct read/write
> > into an mmap'd DAX file also on virtiofs (possibly on another instance).
> > 
> > When we receive a virtio queue element, check that we have enough
> > mappable data to handle the headers.  Make a note of the number of
> > unmappable 'in' entries (ie. for read data back to the VMM),
> > and flag the fuse_bufvec for 'out' entries with a new flag
> > FUSE_BUF_PHYS_ADDR.
> 
> Looking back at this I think vhost-user will need generic
> READ_MEMORY/WRITE_MEMORY commands. It's okay for virtio-fs to have its
> own IO command (although not strictly necessary).
> 
> With generic READ_MEMORY/WRITE_MEMORY libvhost-user and other vhost-user
> device backend implementations can handle vring descriptors that point
> into the DAX window. This can be done transparently so individual device
> implementations (net, blk, etc) don't even know when memory is copied vs
> zero-copy shared memory access.
> 
> So this approach is okay for virtio-fs but it's not a long-term solution
> for all of vhost-user. Eventually the long-term solution may be needed
> so that other VIRTIO devices that have shared memory resources work.
> 
> Another bonus of READ_MEMORY/WRITE_MEMORY is that users that prefer an
> enforcing vIOMMU can disable shared memory (maybe just keep the vring
> itself mmapped).

Yes, although in this case we're doing read/write to an fd, rather than
passing arbitrary data to be read/written.

> I just wanted to share this idea but don't expect it to be addressed in
> this patch series.
> 
> > diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
> > index a090040bb2..ed9280de91 100644
> > --- a/tools/virtiofsd/fuse_common.h
> > +++ b/tools/virtiofsd/fuse_common.h
> > @@ -611,6 +611,13 @@ enum fuse_buf_flags {
> >       * detected.
> >       */
> >      FUSE_BUF_FD_RETRY = (1 << 3),
> > +
> > +    /**
> > +     * The addresses in the iovec represent guest physical addresses
> > +     * that can't be mapped by the daemon process.
> > +     * IO must be bounced back to the VMM to do it.
> > +     */
> > +    FUSE_BUF_PHYS_ADDR = (1 << 4),
> 
> With a vIOMMU it's an IOVA. Without a vIOMMU it's a GPA. This constant
> may need to be renamed in the future, but it is okay for now.

Do we have a name for something that's either an IOVA or a GPA?

> > +    if (req->bad_in_num || req->bad_out_num) {
> > +        bool handled_unmappable = false;
> > +
> > +        if (out_num > 2 && out_num_readable >= 2 && !req->bad_in_num &&
> > +            out_sg[0].iov_len == sizeof(struct fuse_in_header) &&
> > +            ((struct fuse_in_header *)fbuf.mem)->opcode == FUSE_WRITE &&
> > +            out_sg[1].iov_len == sizeof(struct fuse_write_in)) {
> 
> This violates the VIRTIO specification:
> 
>   2.6.4.1 Device Requirements: Message Framing
> 
>   The device MUST NOT make assumptions about the particular arrangement of descriptors.
> 
>   https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-280004
> 
> The driver is not obligated to submit separate iovecs. out_num == 1 is
> valid and the device needs to process it byte-wise instead of making
> assumptions about iovec layout.

Yep, already fixed.

Dave


-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 16/24] DAX/unmap: virtiofsd: Add VHOST_USER_SLAVE_FS_IO
  2021-03-16 19:59       ` [Virtio-fs] " Dr. David Alan Gilbert
@ 2021-03-31 10:12         ` Stefan Hajnoczi
  -1 siblings, 0 replies; 138+ messages in thread
From: Stefan Hajnoczi @ 2021-03-31 10:12 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst


On Tue, Mar 16, 2021 at 07:59:59PM +0000, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > On Tue, Feb 09, 2021 at 07:02:16PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > +            if (!mrs_size) {
> > > +                error_report("No guest region found for 0x%" HWADDR_PRIx, gpa);
> > > +                res = -EFAULT;
> > > +                break;
> > > +            }
> > > +
> > > +            trace_vhost_user_fs_slave_io_loop(mrs.mr->name,
> > > +                                          (uint64_t)mrs.offset_within_region,
> > > +                                          memory_region_is_ram(mrs.mr),
> > > +                                          memory_region_is_romd(mrs.mr),
> > > +                                          (size_t)mrs_size);
> > > +
> > > +            void *hostptr = qemu_map_ram_ptr(mrs.mr->ram_block,
> > > +                                             mrs.offset_within_region);
> > > +            ssize_t transferred;
> > > +            if (sm->flags[i] & VHOST_USER_FS_FLAG_MAP_R) {
> > 
> > The flag name is specific to map requests but it's shared with the IO
> > request. Perhaps rename the flags?
> 
> They're both read/writes; do you have a preferred alternative?

VHOST_USER_FS_FLAG_<what it does> (read? readwrite? etc)

Stefan


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 18/24] DAX/unmap virtiofsd: Parse unmappable elements
  2021-02-25 10:19       ` [Virtio-fs] " Dr. David Alan Gilbert
@ 2021-03-31 10:14         ` Stefan Hajnoczi
  -1 siblings, 0 replies; 138+ messages in thread
From: Stefan Hajnoczi @ 2021-03-31 10:14 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: virtio-fs, marcandre.lureau, qemu-devel, vgoyal, mst

On Thu, Feb 25, 2021 at 10:19:31AM +0000, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > On Tue, Feb 09, 2021 at 07:02:18PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
> > > index a090040bb2..ed9280de91 100644
> > > --- a/tools/virtiofsd/fuse_common.h
> > > +++ b/tools/virtiofsd/fuse_common.h
> > > @@ -611,6 +611,13 @@ enum fuse_buf_flags {
> > >       * detected.
> > >       */
> > >      FUSE_BUF_FD_RETRY = (1 << 3),
> > > +
> > > +    /**
> > > +     * The addresses in the iovec represent guest physical addresses
> > > +     * that can't be mapped by the daemon process.
> > > +     * IO must be bounced back to the VMM to do it.
> > > +     */
> > > +    FUSE_BUF_PHYS_ADDR = (1 << 4),
> > 
> > With a vIOMMU it's an IOVA. Without a vIOMMU it's a GPA. This constant
> > may need to be renamed in the future, but it is okay for now.
> 
> Do we have any naming for something that's either a GPA or an IOVA?

I don't remember, but I think the naming is confusing in core vhost code
too :). I can't recall whether it's called a "physical address" there.
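
(For illustration only -- a rough sketch of how a virtiofsd buffer walk
might branch on FUSE_BUF_PHYS_ADDR; bounce_io_to_vmm() and copy_local()
are hypothetical placeholders, not helpers from this series:)

    /* Sketch: per-element dispatch while walking a fuse_bufvec. */
    static ssize_t xfer_one_buf(const struct fuse_buf *buf)
    {
        if (buf->flags & FUSE_BUF_PHYS_ADDR) {
            /*
             * buf->mem holds a guest physical address (or an IOVA when
             * a vIOMMU is in use) that this process cannot dereference;
             * bounce the IO back to the VMM, e.g. via a
             * VHOST_USER_SLAVE_FS_IO request.
             */
            return bounce_io_to_vmm(buf);   /* hypothetical helper */
        }
        /* Otherwise buf->mem is an ordinary pointer in our address space. */
        return copy_local(buf);             /* hypothetical helper */
    }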

Stefan

end of thread

Thread overview: 138+ messages
2021-02-09 19:02 [PATCH 00/24] virtiofs dax patches Dr. David Alan Gilbert (git)
2021-02-09 19:02 ` [PATCH 01/24] DAX: vhost-user: Rework slave return values Dr. David Alan Gilbert (git)
2021-02-11  9:59   ` Stefan Hajnoczi
2021-02-11 15:27     ` Vivek Goyal
2021-02-18 12:18     ` Dr. David Alan Gilbert
2021-02-09 19:02 ` [PATCH 02/24] DAX: libvhost-user: Route slave message payload Dr. David Alan Gilbert (git)
2021-02-11 10:05   ` Stefan Hajnoczi
2021-02-09 19:02 ` [PATCH 03/24] DAX: libvhost-user: Allow popping a queue element with bad pointers Dr. David Alan Gilbert (git)
2021-02-11 10:12   ` Stefan Hajnoczi
2021-02-09 19:02 ` [PATCH 04/24] DAX subprojects/libvhost-user: Add virtio-fs slave types Dr. David Alan Gilbert (git)
2021-02-11 10:16   ` Stefan Hajnoczi
2021-02-09 19:02 ` [PATCH 05/24] DAX: virtio: Add shared memory capability Dr. David Alan Gilbert (git)
2021-02-11 10:17   ` Stefan Hajnoczi
2021-02-09 19:02 ` [PATCH 06/24] DAX: virtio-fs: Add cache BAR Dr. David Alan Gilbert (git)
2021-02-11 10:25   ` Stefan Hajnoczi
2021-02-18 17:33     ` Dr. David Alan Gilbert
2021-02-09 19:02 ` [PATCH 07/24] DAX: virtio-fs: Add vhost-user slave commands for mapping Dr. David Alan Gilbert (git)
2021-02-11 10:32   ` Stefan Hajnoczi
2021-03-08 17:04     ` Dr. David Alan Gilbert
2021-02-15 10:35   ` Chirantan Ekbote
2021-02-15 13:25     ` Dr. David Alan Gilbert
2021-02-15 14:24     ` Vivek Goyal
2021-03-11 12:15     ` Dr. David Alan Gilbert
2021-03-11 13:50       ` Vivek Goyal
2021-03-11 18:52         ` Dr. David Alan Gilbert
2021-02-09 19:02 ` [PATCH 08/24] DAX: virtio-fs: Fill in " Dr. David Alan Gilbert (git)
2021-02-11 10:57   ` Stefan Hajnoczi
2021-02-18 10:59     ` Dr. David Alan Gilbert
2021-02-09 19:02 ` [PATCH 09/24] DAX: virtiofsd Add cache accessor functions Dr. David Alan Gilbert (git)
2021-02-11 12:31   ` Stefan Hajnoczi
2021-02-09 19:02 ` [PATCH 10/24] DAX: virtiofsd: Add setup/remove mappings fuse commands Dr. David Alan Gilbert (git)
2021-02-11 12:37   ` Stefan Hajnoczi
2021-02-11 16:39     ` Dr. David Alan Gilbert
2021-02-11 18:30       ` Vivek Goyal
2021-02-11 19:50         ` Dr. David Alan Gilbert
2021-02-11 20:15           ` Vivek Goyal
2021-02-09 19:02 ` [PATCH 11/24] DAX: virtiofsd: Add setup/remove mapping handlers to passthrough_ll Dr. David Alan Gilbert (git)
2021-02-11 12:37   ` Stefan Hajnoczi
2021-02-09 19:02 ` [PATCH 12/24] DAX: virtiofsd: Wire up passthrough_ll's lo_setupmapping Dr. David Alan Gilbert (git)
2021-02-11 12:41   ` Stefan Hajnoczi
2021-02-11 16:05   ` Vivek Goyal
2021-02-09 19:02 ` [PATCH 13/24] DAX: virtiofsd: Make lo_removemapping() work Dr. David Alan Gilbert (git)
2021-02-11 12:41   ` Stefan Hajnoczi
2021-02-09 19:02 ` [PATCH 14/24] DAX: virtiofsd: route se down to destroy method Dr. David Alan Gilbert (git)
2021-02-11 12:42   ` Stefan Hajnoczi
2021-02-09 19:02 ` [PATCH 15/24] DAX: virtiofsd: Perform an unmap on destroy Dr. David Alan Gilbert (git)
2021-02-11 12:42   ` Stefan Hajnoczi
2021-02-09 19:02 ` [PATCH 16/24] DAX/unmap: virtiofsd: Add VHOST_USER_SLAVE_FS_IO Dr. David Alan Gilbert (git)
2021-02-11 14:17   ` Stefan Hajnoczi
2021-03-16 19:59     ` Dr. David Alan Gilbert
2021-03-31 10:12       ` Stefan Hajnoczi
2021-02-09 19:02 ` [PATCH 17/24] DAX/unmap virtiofsd: Add wrappers for VHOST_USER_SLAVE_FS_IO Dr. David Alan Gilbert (git)
2021-02-11 14:18   ` Stefan Hajnoczi
2021-02-09 19:02 ` [PATCH 18/24] DAX/unmap virtiofsd: Parse unmappable elements Dr. David Alan Gilbert (git)
2021-02-11 14:29   ` Stefan Hajnoczi
2021-02-25 10:19     ` Dr. David Alan Gilbert
2021-03-31 10:14       ` Stefan Hajnoczi
2021-03-17 10:33     ` Dr. David Alan Gilbert
2021-02-09 19:02 ` [PATCH 19/24] DAX/unmap virtiofsd: Route unmappable reads Dr. David Alan Gilbert (git)
2021-02-09 19:02 ` [PATCH 20/24] DAX/unmap virtiofsd: route unmappable write to slave command Dr. David Alan Gilbert (git)
2021-02-09 19:02 ` [PATCH 21/24] DAX:virtiofsd: implement FUSE_INIT map_alignment field Dr. David Alan Gilbert (git)
2021-02-09 19:02 ` [PATCH 22/24] vhost-user-fs: Extend VhostUserFSSlaveMsg to pass additional info Dr. David Alan Gilbert (git)
2021-02-09 19:02 ` [PATCH 23/24] vhost-user-fs: Implement drop CAP_FSETID functionality Dr. David Alan Gilbert (git)
2021-02-11 14:35   ` Stefan Hajnoczi
2021-02-11 14:40     ` Vivek Goyal
2021-02-15 15:57       ` Stefan Hajnoczi
2021-02-16 15:57         ` Vivek Goyal
2021-02-22 16:53           ` Stefan Hajnoczi
2021-02-09 19:02 ` [PATCH 24/24] virtiofsd: Ask qemu to drop CAP_FSETID if client asked for it Dr. David Alan Gilbert (git)
