* [PATCH 00/13] virtiofsd: Support notification queue and blocking posix locks
From: Vivek Goyal @ 2021-09-30 15:30 UTC
  To: qemu-devel, virtio-fs, stefanha
  Cc: jaggel, iangelak, dgilbert, vgoyal, miklos

Hi,

Here are the patches to support the notification queue and blocking
posix locks. One of the biggest changes since last time has been the
creation of a custom thread pool for handling locking requests.
Thanks to Ioannis for doing most of the work on the custom thread
pool.
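
For orientation, a minimal sketch of what such a custom thread pool
interface could look like (illustrative only; the series' actual
interface lives in tools/virtiofsd/tpool.h, and the names below are
made up for this sketch):

    struct fv_thread_pool;

    typedef void (*fv_work_fn)(void *arg);

    /* Create a pool with a fixed number of worker threads */
    struct fv_thread_pool *fv_thread_pool_init(int num_threads);

    /* Queue one locking request for execution on a worker */
    void fv_thread_pool_push(struct fv_thread_pool *tp,
                             fv_work_fn fn, void *arg);

    /* Wait for workers to finish and free the pool */
    void fv_thread_pool_destroy(struct fv_thread_pool *tp);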

I have posted the corresponding kernel changes here:

https://lore.kernel.org/linux-fsdevel/20210930143850.1188628-1-vgoyal@redhat.com/T/#mb2d0fbfdb580ef33b6e812d0acbd16333b11f2cf

Any feedback is welcome.

Thanks
Vivek

Vivek Goyal (13):
  virtio_fs.h: Add notification queue feature bit
  virtiofsd: fuse.h header file changes for lock notification
  virtiofsd: Remove unused virtio_fs_config definition
  virtiofsd: Add a helper to send element on virtqueue
  virtiofsd: Add a helper to stop all queues
  vhost-user-fs: Use helpers to create/cleanup virtqueue
  virtiofsd: Release file locks using F_UNLCK
  virtiofsd: Create a notification queue
  virtiofsd: Specify size of notification buffer using config space
  virtiofsd: Custom threadpool for remote blocking posix locks requests
  virtiofsd: Shutdown notification queue in the end
  virtiofsd: Implement blocking posix locks
  virtiofsd, seccomp: Add clock_nanosleep() to allow list

 hw/virtio/vhost-user-fs-pci.c              |   4 +-
 hw/virtio/vhost-user-fs.c                  | 158 ++++++++--
 include/hw/virtio/vhost-user-fs.h          |   4 +
 include/standard-headers/linux/fuse.h      |  11 +-
 include/standard-headers/linux/virtio_fs.h |   5 +
 tools/virtiofsd/fuse_i.h                   |   1 +
 tools/virtiofsd/fuse_lowlevel.c            |  37 ++-
 tools/virtiofsd/fuse_lowlevel.h            |  26 ++
 tools/virtiofsd/fuse_virtio.c              | 339 +++++++++++++++++----
 tools/virtiofsd/meson.build                |   1 +
 tools/virtiofsd/passthrough_ll.c           |  91 +++++-
 tools/virtiofsd/passthrough_seccomp.c      |   2 +
 tools/virtiofsd/tpool.c                    | 331 ++++++++++++++++++++
 tools/virtiofsd/tpool.h                    |  18 ++
 14 files changed, 915 insertions(+), 113 deletions(-)
 create mode 100644 tools/virtiofsd/tpool.c
 create mode 100644 tools/virtiofsd/tpool.h

-- 
2.31.1



* [PATCH 01/13] virtio_fs.h: Add notification queue feature bit
From: Vivek Goyal @ 2021-09-30 15:30 UTC
  To: qemu-devel, virtio-fs, stefanha
  Cc: jaggel, iangelak, dgilbert, vgoyal, miklos

This change will ultimately come from the kernel as a header file
update once the kernel patches are merged.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 include/standard-headers/linux/virtio_fs.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/standard-headers/linux/virtio_fs.h b/include/standard-headers/linux/virtio_fs.h
index a32fe8a64c..b7f015186e 100644
--- a/include/standard-headers/linux/virtio_fs.h
+++ b/include/standard-headers/linux/virtio_fs.h
@@ -8,6 +8,9 @@
 #include "standard-headers/linux/virtio_config.h"
 #include "standard-headers/linux/virtio_types.h"
 
+/* Feature bits. Notification queue supported */
+#define VIRTIO_FS_F_NOTIFICATION	0
+
 struct virtio_fs_config {
 	/* Filesystem name (UTF-8, not NUL-terminated, padded with NULs) */
 	uint8_t tag[36];
-- 
2.31.1
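
For orientation, a sketch of how this feature bit gets tested after
negotiation (illustrative only, not part of the patch; the
VIRTIO_F_VERSION_1 value mirrors virtio_config.h):

    #include <stdbool.h>
    #include <stdint.h>

    #define VIRTIO_F_VERSION_1       32  /* from virtio_config.h */
    #define VIRTIO_FS_F_NOTIFICATION  0  /* added by this patch */

    /* Did the driver accept notification queue support? */
    static bool fs_notification_negotiated(uint64_t negotiated)
    {
        return negotiated & (1ULL << VIRTIO_FS_F_NOTIFICATION);
    }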



* [PATCH 02/13] virtiofsd: fuse.h header file changes for lock notification
From: Vivek Goyal @ 2021-09-30 15:30 UTC
  To: qemu-devel, virtio-fs, stefanha
  Cc: jaggel, iangelak, dgilbert, vgoyal, miklos

This change comes from a fuse.h kernel header file update, hence it
is kept in a separate patch.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 include/standard-headers/linux/fuse.h | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/standard-headers/linux/fuse.h b/include/standard-headers/linux/fuse.h
index cce105bfba..0b6218d569 100644
--- a/include/standard-headers/linux/fuse.h
+++ b/include/standard-headers/linux/fuse.h
@@ -181,6 +181,8 @@
  *  - add FUSE_OPEN_KILL_SUIDGID
  *  - extend fuse_setxattr_in, add FUSE_SETXATTR_EXT
  *  - add FUSE_SETXATTR_ACL_KILL_SGID
+ *  7.35
+ *  - add FUSE_NOTIFY_LOCK
  */
 
 #ifndef _LINUX_FUSE_H
@@ -212,7 +214,7 @@
 #define FUSE_KERNEL_VERSION 7
 
 /** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 33
+#define FUSE_KERNEL_MINOR_VERSION 35
 
 /** The node ID of the root inode */
 #define FUSE_ROOT_ID 1
@@ -521,6 +523,7 @@ enum fuse_notify_code {
 	FUSE_NOTIFY_STORE = 4,
 	FUSE_NOTIFY_RETRIEVE = 5,
 	FUSE_NOTIFY_DELETE = 6,
+	FUSE_NOTIFY_LOCK = 7,
 	FUSE_NOTIFY_CODE_MAX,
 };
 
@@ -912,6 +915,12 @@ struct fuse_notify_retrieve_in {
 	uint64_t	dummy4;
 };
 
+struct fuse_notify_lock_out {
+	uint64_t	unique;
+	int32_t		error;
+	int32_t		padding;
+};
+
 /* Device ioctls: */
 #define FUSE_DEV_IOC_MAGIC		229
 #define FUSE_DEV_IOC_CLONE		_IOR(FUSE_DEV_IOC_MAGIC, 0, uint32_t)
-- 
2.31.1
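
For orientation, a sketch of how a server could fill this structure
when waking a blocked lock waiter (illustrative only; 'unique' echoes
the identifier of the original blocking request, and treating 'error'
as 0 or a negative error code is an assumption of this sketch):

    #include <stdint.h>
    #include <string.h>

    /* Mirrors the struct added by this patch */
    struct fuse_notify_lock_out {
        uint64_t unique;
        int32_t  error;
        int32_t  padding;
    };

    static void fill_lock_notify(struct fuse_notify_lock_out *out,
                                 uint64_t unique, int32_t error)
    {
        memset(out, 0, sizeof(*out));
        out->unique = unique;
        out->error  = error;
    }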



* [PATCH 03/13] virtiofsd: Remove unused virtio_fs_config definition
From: Vivek Goyal @ 2021-09-30 15:30 UTC
  To: qemu-devel, virtio-fs, stefanha
  Cc: jaggel, iangelak, dgilbert, vgoyal, miklos

"struct virtio_fs_config" definition seems to be unused in fuse_virtio.c.
Remove it.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 tools/virtiofsd/fuse_virtio.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 8f4fd165b9..da7b6a76bf 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -82,12 +82,6 @@ struct fv_VuDev {
     struct fv_QueueInfo **qi;
 };
 
-/* From spec */
-struct virtio_fs_config {
-    char tag[36];
-    uint32_t num_queues;
-};
-
 /* Callback from libvhost-user */
 static uint64_t fv_get_features(VuDev *dev)
 {
-- 
2.31.1



* [PATCH 04/13] virtiofsd: Add a helper to send element on virtqueue
From: Vivek Goyal @ 2021-09-30 15:30 UTC
  To: qemu-devel, virtio-fs, stefanha
  Cc: jaggel, iangelak, dgilbert, vgoyal, miklos

We have open-coded logic to take locks and push an element onto the
virtqueue in three places. Add a helper and use it everywhere. The
code becomes easier to read and a few lines shorter.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 tools/virtiofsd/fuse_virtio.c | 45 ++++++++++++++---------------------
 1 file changed, 18 insertions(+), 27 deletions(-)

diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index da7b6a76bf..fcf12db9cd 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -243,6 +243,21 @@ static void vu_dispatch_unlock(struct fv_VuDev *vud)
     assert(ret == 0);
 }
 
+static void vq_send_element(struct fv_QueueInfo *qi, VuVirtqElement *elem,
+                            ssize_t len)
+{
+    struct fuse_session *se = qi->virtio_dev->se;
+    VuDev *dev = &se->virtio_dev->dev;
+    VuVirtq *q = vu_get_queue(dev, qi->qidx);
+
+    vu_dispatch_rdlock(qi->virtio_dev);
+    pthread_mutex_lock(&qi->vq_lock);
+    vu_queue_push(dev, q, elem, len);
+    vu_queue_notify(dev, q);
+    pthread_mutex_unlock(&qi->vq_lock);
+    vu_dispatch_unlock(qi->virtio_dev);
+}
+
 /*
  * Called back by ll whenever it wants to send a reply/message back
  * The 1st element of the iov starts with the fuse_out_header
@@ -253,8 +268,6 @@ int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
 {
     FVRequest *req = container_of(ch, FVRequest, ch);
     struct fv_QueueInfo *qi = ch->qi;
-    VuDev *dev = &se->virtio_dev->dev;
-    VuVirtq *q = vu_get_queue(dev, qi->qidx);
     VuVirtqElement *elem = &req->elem;
     int ret = 0;
 
@@ -296,13 +309,7 @@ int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
 
     copy_iov(iov, count, in_sg, in_num, tosend_len);
 
-    vu_dispatch_rdlock(qi->virtio_dev);
-    pthread_mutex_lock(&qi->vq_lock);
-    vu_queue_push(dev, q, elem, tosend_len);
-    vu_queue_notify(dev, q);
-    pthread_mutex_unlock(&qi->vq_lock);
-    vu_dispatch_unlock(qi->virtio_dev);
-
+    vq_send_element(qi, elem, tosend_len);
     req->reply_sent = true;
 
 err:
@@ -321,8 +328,6 @@ int virtio_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
 {
     FVRequest *req = container_of(ch, FVRequest, ch);
     struct fv_QueueInfo *qi = ch->qi;
-    VuDev *dev = &se->virtio_dev->dev;
-    VuVirtq *q = vu_get_queue(dev, qi->qidx);
     VuVirtqElement *elem = &req->elem;
     int ret = 0;
     g_autofree struct iovec *in_sg_cpy = NULL;
@@ -430,12 +435,7 @@ int virtio_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
         out_sg->len = tosend_len;
     }
 
-    vu_dispatch_rdlock(qi->virtio_dev);
-    pthread_mutex_lock(&qi->vq_lock);
-    vu_queue_push(dev, q, elem, tosend_len);
-    vu_queue_notify(dev, q);
-    pthread_mutex_unlock(&qi->vq_lock);
-    vu_dispatch_unlock(qi->virtio_dev);
+    vq_send_element(qi, elem, tosend_len);
     req->reply_sent = true;
     return 0;
 }
@@ -447,7 +447,6 @@ static void fv_queue_worker(gpointer data, gpointer user_data)
 {
     struct fv_QueueInfo *qi = user_data;
     struct fuse_session *se = qi->virtio_dev->se;
-    struct VuDev *dev = &qi->virtio_dev->dev;
     FVRequest *req = data;
     VuVirtqElement *elem = &req->elem;
     struct fuse_buf fbuf = {};
@@ -589,17 +588,9 @@ out:
 
     /* If the request has no reply, still recycle the virtqueue element */
     if (!req->reply_sent) {
-        struct VuVirtq *q = vu_get_queue(dev, qi->qidx);
-
         fuse_log(FUSE_LOG_DEBUG, "%s: elem %d no reply sent\n", __func__,
                  elem->index);
-
-        vu_dispatch_rdlock(qi->virtio_dev);
-        pthread_mutex_lock(&qi->vq_lock);
-        vu_queue_push(dev, q, elem, 0);
-        vu_queue_notify(dev, q);
-        pthread_mutex_unlock(&qi->vq_lock);
-        vu_dispatch_unlock(qi->virtio_dev);
+        vq_send_element(qi, elem, 0);
     }
 
     pthread_mutex_destroy(&req->ch.lock);
-- 
2.31.1
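
For orientation, the call pattern after this patch (a summary of the
diff above, not new code): the helper bundles dispatch rdlock ->
vq_lock -> vu_queue_push() -> vu_queue_notify() -> unlock in reverse
order, so the call sites reduce to single lines:

    vq_send_element(qi, elem, tosend_len);  /* send a reply payload */
    vq_send_element(qi, elem, 0);           /* recycle element, no reply */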



* [PATCH 05/13] virtiofsd: Add a helper to stop all queues
From: Vivek Goyal @ 2021-09-30 15:30 UTC
  To: qemu-devel, virtio-fs, stefanha
  Cc: jaggel, iangelak, dgilbert, vgoyal, miklos

Use a helper to stop all the queues. I am planning to use this helper
in one more place later in the patch series.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 tools/virtiofsd/fuse_virtio.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index fcf12db9cd..baead08b28 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -740,6 +740,18 @@ static void fv_queue_cleanup_thread(struct fv_VuDev *vud, int qidx)
     vud->qi[qidx] = NULL;
 }
 
+static void stop_all_queues(struct fv_VuDev *vud)
+{
+    for (int i = 0; i < vud->nqueues; i++) {
+        if (!vud->qi[i]) {
+            continue;
+        }
+
+        fuse_log(FUSE_LOG_INFO, "%s: Stopping queue %d thread\n", __func__, i);
+        fv_queue_cleanup_thread(vud, i);
+    }
+}
+
 /* Callback from libvhost-user on start or stop of a queue */
 static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
 {
@@ -870,15 +882,7 @@ int virtio_loop(struct fuse_session *se)
      * Make sure all fv_queue_thread()s quit on exit, as we're about to
      * free virtio dev and fuse session, no one should access them anymore.
      */
-    for (int i = 0; i < se->virtio_dev->nqueues; i++) {
-        if (!se->virtio_dev->qi[i]) {
-            continue;
-        }
-
-        fuse_log(FUSE_LOG_INFO, "%s: Stopping queue %d thread\n", __func__, i);
-        fv_queue_cleanup_thread(se->virtio_dev, i);
-    }
-
+    stop_all_queues(se->virtio_dev);
     fuse_log(FUSE_LOG_INFO, "%s: Exit\n", __func__);
 
     return 0;
-- 
2.31.1



* [PATCH 06/13] vhost-user-fs: Use helpers to create/cleanup virtqueue
From: Vivek Goyal @ 2021-09-30 15:30 UTC
  To: qemu-devel, virtio-fs, stefanha
  Cc: jaggel, iangelak, dgilbert, vgoyal, miklos

Add helpers to create/clean up virtqueues and use those helpers. I
will need to reconfigure the queues in later patches, and using
helpers will allow reusing the code.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 hw/virtio/vhost-user-fs.c | 87 +++++++++++++++++++++++----------------
 1 file changed, 52 insertions(+), 35 deletions(-)

diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index c595957983..d1efbc5b18 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -139,6 +139,55 @@ static void vuf_set_status(VirtIODevice *vdev, uint8_t status)
     }
 }
 
+static void vuf_handle_output(VirtIODevice *vdev, VirtQueue *vq)
+{
+    /*
+     * Not normally called; it's the daemon that handles the queue;
+     * however virtio's cleanup path can call this.
+     */
+}
+
+static void vuf_create_vqs(VirtIODevice *vdev)
+{
+    VHostUserFS *fs = VHOST_USER_FS(vdev);
+    unsigned int i;
+
+    /* Hiprio queue */
+    fs->hiprio_vq = virtio_add_queue(vdev, fs->conf.queue_size,
+                                     vuf_handle_output);
+
+    /* Request queues */
+    fs->req_vqs = g_new(VirtQueue *, fs->conf.num_request_queues);
+    for (i = 0; i < fs->conf.num_request_queues; i++) {
+        fs->req_vqs[i] = virtio_add_queue(vdev, fs->conf.queue_size,
+                                          vuf_handle_output);
+    }
+
+    /* 1 high prio queue, plus the number configured */
+    fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
+    fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
+}
+
+static void vuf_cleanup_vqs(VirtIODevice *vdev)
+{
+    VHostUserFS *fs = VHOST_USER_FS(vdev);
+    unsigned int i;
+
+    virtio_delete_queue(fs->hiprio_vq);
+    fs->hiprio_vq = NULL;
+
+    for (i = 0; i < fs->conf.num_request_queues; i++) {
+        virtio_delete_queue(fs->req_vqs[i]);
+    }
+
+    g_free(fs->req_vqs);
+    fs->req_vqs = NULL;
+
+    fs->vhost_dev.nvqs = 0;
+    g_free(fs->vhost_dev.vqs);
+    fs->vhost_dev.vqs = NULL;
+}
+
 static uint64_t vuf_get_features(VirtIODevice *vdev,
                                  uint64_t features,
                                  Error **errp)
@@ -148,14 +197,6 @@ static uint64_t vuf_get_features(VirtIODevice *vdev,
     return vhost_get_features(&fs->vhost_dev, user_feature_bits, features);
 }
 
-static void vuf_handle_output(VirtIODevice *vdev, VirtQueue *vq)
-{
-    /*
-     * Not normally called; it's the daemon that handles the queue;
-     * however virtio's cleanup path can call this.
-     */
-}
-
 static void vuf_guest_notifier_mask(VirtIODevice *vdev, int idx,
                                             bool mask)
 {
@@ -175,7 +216,6 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VHostUserFS *fs = VHOST_USER_FS(dev);
-    unsigned int i;
     size_t len;
     int ret;
 
@@ -222,18 +262,7 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
     virtio_init(vdev, "vhost-user-fs", VIRTIO_ID_FS,
                 sizeof(struct virtio_fs_config));
 
-    /* Hiprio queue */
-    fs->hiprio_vq = virtio_add_queue(vdev, fs->conf.queue_size, vuf_handle_output);
-
-    /* Request queues */
-    fs->req_vqs = g_new(VirtQueue *, fs->conf.num_request_queues);
-    for (i = 0; i < fs->conf.num_request_queues; i++) {
-        fs->req_vqs[i] = virtio_add_queue(vdev, fs->conf.queue_size, vuf_handle_output);
-    }
-
-    /* 1 high prio queue, plus the number configured */
-    fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
-    fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
+    vuf_create_vqs(vdev);
     ret = vhost_dev_init(&fs->vhost_dev, &fs->vhost_user,
                          VHOST_BACKEND_TYPE_USER, 0, errp);
     if (ret < 0) {
@@ -244,13 +273,8 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
 
 err_virtio:
     vhost_user_cleanup(&fs->vhost_user);
-    virtio_delete_queue(fs->hiprio_vq);
-    for (i = 0; i < fs->conf.num_request_queues; i++) {
-        virtio_delete_queue(fs->req_vqs[i]);
-    }
-    g_free(fs->req_vqs);
+    vuf_cleanup_vqs(vdev);
     virtio_cleanup(vdev);
-    g_free(fs->vhost_dev.vqs);
     return;
 }
 
@@ -258,7 +282,6 @@ static void vuf_device_unrealize(DeviceState *dev)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VHostUserFS *fs = VHOST_USER_FS(dev);
-    int i;
 
     /* This will stop vhost backend if appropriate. */
     vuf_set_status(vdev, 0);
@@ -267,14 +290,8 @@ static void vuf_device_unrealize(DeviceState *dev)
 
     vhost_user_cleanup(&fs->vhost_user);
 
-    virtio_delete_queue(fs->hiprio_vq);
-    for (i = 0; i < fs->conf.num_request_queues; i++) {
-        virtio_delete_queue(fs->req_vqs[i]);
-    }
-    g_free(fs->req_vqs);
+    vuf_cleanup_vqs(vdev);
     virtio_cleanup(vdev);
-    g_free(fs->vhost_dev.vqs);
-    fs->vhost_dev.vqs = NULL;
 }
 
 static const VMStateDescription vuf_vmstate = {
-- 
2.31.1
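
For orientation, the reconfiguration pattern these helpers enable (a
sketch; a later patch in this series does this from the set_features
callback when the negotiated queue layout changes):

    /* Sketch only: tear down and recreate the device's virtqueues */
    static void vuf_reconfigure_vqs(VirtIODevice *vdev)
    {
        vuf_cleanup_vqs(vdev);   /* delete queues, free vhost vq array */
        vuf_create_vqs(vdev);    /* recreate with the new configuration */
    }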



* [PATCH 07/13] virtiofsd: Release file locks using F_UNLCK
From: Vivek Goyal @ 2021-09-30 15:30 UTC
  To: qemu-devel, virtio-fs, stefanha
  Cc: jaggel, iangelak, dgilbert, vgoyal, miklos

We are emulating posix locks for the guest using open file description
(OFD) locks in virtiofsd. When an fd is closed in the guest, we find
the associated OFD lock fd (if there is one) and close it to release
all the locks.

The assumption here is that no other thread is using the
lo_inode_plock structure or plock->fd, hence it is safe to do so.

But now we are about to introduce a blocking variant of locks
(SETLKW), which means we might be waiting for a lock to become
available while using plock->fd. That means there are still users of
the plock structure.

So release locks using fcntl(F_OFD_SETLK, F_UNLCK) instead of closing
the fd; plock will be freed later when the lo_inode is freed.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 38b2af8599..6928662e22 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1557,9 +1557,6 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
         lo_map_remove(&lo->ino_map, inode->fuse_ino);
         g_hash_table_remove(lo->inodes, &inode->key);
         if (lo->posix_lock) {
-            if (g_hash_table_size(inode->posix_locks)) {
-                fuse_log(FUSE_LOG_WARNING, "Hash table is not empty\n");
-            }
             g_hash_table_destroy(inode->posix_locks);
             pthread_mutex_destroy(&inode->plock_mutex);
         }
@@ -2266,6 +2263,8 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
     (void)ino;
     struct lo_inode *inode;
     struct lo_data *lo = lo_data(req);
+    struct lo_inode_plock *plock;
+    struct flock flock;
 
     inode = lo_inode(req, ino);
     if (!inode) {
@@ -2282,8 +2281,22 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
     /* An fd is going away. Cleanup associated posix locks */
     if (lo->posix_lock) {
         pthread_mutex_lock(&inode->plock_mutex);
-        g_hash_table_remove(inode->posix_locks,
+        plock = g_hash_table_lookup(inode->posix_locks,
             GUINT_TO_POINTER(fi->lock_owner));
+
+        if (plock) {
+            /*
+             * An fd is being closed. For posix locks, this means
+             * drop all the associated locks.
+             */
+            memset(&flock, 0, sizeof(struct flock));
+            flock.l_type = F_UNLCK;
+            flock.l_whence = SEEK_SET;
+            /* Unlock whole file */
+            flock.l_start = flock.l_len = 0;
+            fcntl(plock->fd, F_OFD_SETLK, &flock);
+        }
+
         pthread_mutex_unlock(&inode->plock_mutex);
     }
     res = close(dup(lo_fi_fd(req, fi)));
-- 
2.31.1
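
For reference, releasing every lock held through an OFD lock fd
without closing the fd comes down to the fcntl() call used above; a
standalone sketch:

    #define _GNU_SOURCE          /* for F_OFD_SETLK */
    #include <fcntl.h>
    #include <string.h>

    static int ofd_unlock_all(int fd)
    {
        struct flock fl;

        memset(&fl, 0, sizeof(fl));
        fl.l_type   = F_UNLCK;
        fl.l_whence = SEEK_SET;
        fl.l_start  = 0;
        fl.l_len    = 0;         /* length 0 means "to end of file" */
        return fcntl(fd, F_OFD_SETLK, &fl);
    }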



* [PATCH 08/13] virtiofsd: Create a notification queue
From: Vivek Goyal @ 2021-09-30 15:30 UTC
  To: qemu-devel, virtio-fs, stefanha
  Cc: jaggel, iangelak, dgilbert, vgoyal, miklos

Add a notification queue which will be used to send async notifications
for file lock availability.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
---
 hw/virtio/vhost-user-fs-pci.c     |  4 +-
 hw/virtio/vhost-user-fs.c         | 62 +++++++++++++++++++++++++--
 include/hw/virtio/vhost-user-fs.h |  2 +
 tools/virtiofsd/fuse_i.h          |  1 +
 tools/virtiofsd/fuse_virtio.c     | 70 +++++++++++++++++++++++--------
 5 files changed, 116 insertions(+), 23 deletions(-)

diff --git a/hw/virtio/vhost-user-fs-pci.c b/hw/virtio/vhost-user-fs-pci.c
index 2ed8492b3f..cdb9471088 100644
--- a/hw/virtio/vhost-user-fs-pci.c
+++ b/hw/virtio/vhost-user-fs-pci.c
@@ -41,8 +41,8 @@ static void vhost_user_fs_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
     DeviceState *vdev = DEVICE(&dev->vdev);
 
     if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
-        /* Also reserve config change and hiprio queue vectors */
-        vpci_dev->nvectors = dev->vdev.conf.num_request_queues + 2;
+        /* Also reserve config change, hiprio and notification queue vectors */
+        vpci_dev->nvectors = dev->vdev.conf.num_request_queues + 3;
     }
 
     qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index d1efbc5b18..6bafcf0243 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -31,6 +31,7 @@ static const int user_feature_bits[] = {
     VIRTIO_F_NOTIFY_ON_EMPTY,
     VIRTIO_F_RING_PACKED,
     VIRTIO_F_IOMMU_PLATFORM,
+    VIRTIO_FS_F_NOTIFICATION,
 
     VHOST_INVALID_FEATURE_BIT
 };
@@ -147,7 +148,7 @@ static void vuf_handle_output(VirtIODevice *vdev, VirtQueue *vq)
      */
 }
 
-static void vuf_create_vqs(VirtIODevice *vdev)
+static void vuf_create_vqs(VirtIODevice *vdev, bool notification_vq)
 {
     VHostUserFS *fs = VHOST_USER_FS(vdev);
     unsigned int i;
@@ -155,6 +156,15 @@ static void vuf_create_vqs(VirtIODevice *vdev)
     /* Hiprio queue */
     fs->hiprio_vq = virtio_add_queue(vdev, fs->conf.queue_size,
                                      vuf_handle_output);
+    /*
+     * Notification queue. Feature negotiation happens later. So at this
+     * point of time we don't know if driver will use notification queue
+     * or not.
+     */
+    if (notification_vq) {
+        fs->notification_vq = virtio_add_queue(vdev, fs->conf.queue_size,
+                                               vuf_handle_output);
+    }
 
     /* Request queues */
     fs->req_vqs = g_new(VirtQueue *, fs->conf.num_request_queues);
@@ -163,8 +173,12 @@ static void vuf_create_vqs(VirtIODevice *vdev)
                                           vuf_handle_output);
     }
 
-    /* 1 high prio queue, plus the number configured */
-    fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
+    /* 1 high prio queue, 1 notification queue plus the number configured */
+    if (notification_vq) {
+        fs->vhost_dev.nvqs = 2 + fs->conf.num_request_queues;
+    } else {
+        fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
+    }
     fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
 }
 
@@ -176,6 +190,11 @@ static void vuf_cleanup_vqs(VirtIODevice *vdev)
     virtio_delete_queue(fs->hiprio_vq);
     fs->hiprio_vq = NULL;
 
+    if (fs->notification_vq) {
+        virtio_delete_queue(fs->notification_vq);
+    }
+    fs->notification_vq = NULL;
+
     for (i = 0; i < fs->conf.num_request_queues; i++) {
         virtio_delete_queue(fs->req_vqs[i]);
     }
@@ -194,9 +213,43 @@ static uint64_t vuf_get_features(VirtIODevice *vdev,
 {
     VHostUserFS *fs = VHOST_USER_FS(vdev);
 
+    virtio_add_feature(&features, VIRTIO_FS_F_NOTIFICATION);
+
     return vhost_get_features(&fs->vhost_dev, user_feature_bits, features);
 }
 
+static void vuf_set_features(VirtIODevice *vdev, uint64_t features)
+{
+    VHostUserFS *fs = VHOST_USER_FS(vdev);
+
+    if (virtio_has_feature(features, VIRTIO_FS_F_NOTIFICATION)) {
+        fs->notify_enabled = true;
+        /*
+         * If guest first booted with no notification queue support and
+         * later rebooted with kernel which supports notification, we
+         * can end up here
+         */
+        if (!fs->notification_vq) {
+            vuf_cleanup_vqs(vdev);
+            vuf_create_vqs(vdev, true);
+        }
+        return;
+    }
+
+    fs->notify_enabled = false;
+    if (!fs->notification_vq) {
+        return;
+    }
+    /*
+     * Driver does not support notification queue. Reconfigure queues
+     * and do not create notification queue.
+     */
+    vuf_cleanup_vqs(vdev);
+
+    /* Create queues again */
+    vuf_create_vqs(vdev, false);
+}
+
 static void vuf_guest_notifier_mask(VirtIODevice *vdev, int idx,
                                             bool mask)
 {
@@ -262,7 +315,7 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
     virtio_init(vdev, "vhost-user-fs", VIRTIO_ID_FS,
                 sizeof(struct virtio_fs_config));
 
-    vuf_create_vqs(vdev);
+    vuf_create_vqs(vdev, true);
     ret = vhost_dev_init(&fs->vhost_dev, &fs->vhost_user,
                          VHOST_BACKEND_TYPE_USER, 0, errp);
     if (ret < 0) {
@@ -327,6 +380,7 @@ static void vuf_class_init(ObjectClass *klass, void *data)
     vdc->realize = vuf_device_realize;
     vdc->unrealize = vuf_device_unrealize;
     vdc->get_features = vuf_get_features;
+    vdc->set_features = vuf_set_features;
     vdc->get_config = vuf_get_config;
     vdc->set_status = vuf_set_status;
     vdc->guest_notifier_mask = vuf_guest_notifier_mask;
diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
index 0d62834c25..95dc0dd402 100644
--- a/include/hw/virtio/vhost-user-fs.h
+++ b/include/hw/virtio/vhost-user-fs.h
@@ -39,7 +39,9 @@ struct VHostUserFS {
     VhostUserState vhost_user;
     VirtQueue **req_vqs;
     VirtQueue *hiprio_vq;
+    VirtQueue *notification_vq;
     int32_t bootindex;
+    bool notify_enabled;
 
     /*< public >*/
 };
diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
index 492e002181..4942d080da 100644
--- a/tools/virtiofsd/fuse_i.h
+++ b/tools/virtiofsd/fuse_i.h
@@ -73,6 +73,7 @@ struct fuse_session {
     int   vu_socketfd;
     struct fv_VuDev *virtio_dev;
     int thread_pool_size;
+    bool notify_enabled;
 };
 
 struct fuse_chan {
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index baead08b28..f5b87a508a 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -14,6 +14,7 @@
 #include "qemu/osdep.h"
 #include "qemu/iov.h"
 #include "qapi/error.h"
+#include "standard-headers/linux/virtio_fs.h"
 #include "fuse_i.h"
 #include "standard-headers/linux/fuse.h"
 #include "fuse_misc.h"
@@ -85,12 +86,25 @@ struct fv_VuDev {
 /* Callback from libvhost-user */
 static uint64_t fv_get_features(VuDev *dev)
 {
-    return 1ULL << VIRTIO_F_VERSION_1;
+    uint64_t features;
+
+    features = 1ull << VIRTIO_F_VERSION_1 |
+               1ull << VIRTIO_FS_F_NOTIFICATION;
+
+    return features;
 }
 
 /* Callback from libvhost-user */
 static void fv_set_features(VuDev *dev, uint64_t features)
 {
+    struct fv_VuDev *vud = container_of(dev, struct fv_VuDev, dev);
+    struct fuse_session *se = vud->se;
+
+    if ((1ull << VIRTIO_FS_F_NOTIFICATION) & features) {
+        se->notify_enabled = true;
+    } else {
+        se->notify_enabled = false;
+    }
 }
 
 /*
@@ -719,22 +733,25 @@ static void fv_queue_cleanup_thread(struct fv_VuDev *vud, int qidx)
 {
     int ret;
     struct fv_QueueInfo *ourqi;
+    struct fuse_session *se = vud->se;
 
     assert(qidx < vud->nqueues);
     ourqi = vud->qi[qidx];
 
-    /* Kill the thread */
-    if (eventfd_write(ourqi->kill_fd, 1)) {
-        fuse_log(FUSE_LOG_ERR, "Eventfd_write for queue %d: %s\n",
-                 qidx, strerror(errno));
-    }
-    ret = pthread_join(ourqi->thread, NULL);
-    if (ret) {
-        fuse_log(FUSE_LOG_ERR, "%s: Failed to join thread idx %d err %d\n",
-                 __func__, qidx, ret);
+    /* qidx == 1 is the notification queue if notifications are enabled */
+    if (!se->notify_enabled || qidx != 1) {
+        /* Kill the thread */
+        if (eventfd_write(ourqi->kill_fd, 1)) {
+            fuse_log(FUSE_LOG_ERR, "Eventfd_read for queue: %m\n");
+        }
+        ret = pthread_join(ourqi->thread, NULL);
+        if (ret) {
+            fuse_log(FUSE_LOG_ERR, "%s: Failed to join thread idx %d err"
+                     " %d\n", __func__, qidx, ret);
+        }
+        close(ourqi->kill_fd);
     }
     pthread_mutex_destroy(&ourqi->vq_lock);
-    close(ourqi->kill_fd);
     ourqi->kick_fd = -1;
     g_free(vud->qi[qidx]);
     vud->qi[qidx] = NULL;
@@ -757,6 +774,9 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
 {
     struct fv_VuDev *vud = container_of(dev, struct fv_VuDev, dev);
     struct fv_QueueInfo *ourqi;
+    int valid_queues = 2; /* One hiprio queue and one request queue */
+    bool notification_q = false;
+    struct fuse_session *se = vud->se;
 
     fuse_log(FUSE_LOG_INFO, "%s: qidx=%d started=%d\n", __func__, qidx,
              started);
@@ -768,10 +788,19 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
      * well-behaved client in mind and may not protect against all types of
      * races yet.
      */
-    if (qidx > 1) {
-        fuse_log(FUSE_LOG_ERR,
-                 "%s: multiple request queues not yet implemented, please only "
-                 "configure 1 request queue\n",
+    if (se->notify_enabled) {
+        valid_queues++;
+        /*
+         * If notification queue is enabled, then qidx 1 is notification queue.
+         */
+        if (qidx == 1) {
+            notification_q = true;
+        }
+    }
+
+    if (qidx >= valid_queues) {
+        fuse_log(FUSE_LOG_ERR, "%s: multiple request queues not yet"
+                 "implemented, please only configure 1 request queue\n",
                  __func__);
         exit(EXIT_FAILURE);
     }
@@ -793,11 +822,18 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
             assert(vud->qi[qidx]->kick_fd == -1);
         }
         ourqi = vud->qi[qidx];
+        pthread_mutex_init(&ourqi->vq_lock, NULL);
+        /*
+         * For notification queue, we don't have to start a thread yet.
+         */
+        if (notification_q) {
+            return;
+        }
+
         ourqi->kick_fd = dev->vq[qidx].kick_fd;
 
         ourqi->kill_fd = eventfd(0, EFD_CLOEXEC | EFD_SEMAPHORE);
         assert(ourqi->kill_fd != -1);
-        pthread_mutex_init(&ourqi->vq_lock, NULL);
 
         if (pthread_create(&ourqi->thread, NULL, fv_queue_thread, ourqi)) {
             fuse_log(FUSE_LOG_ERR, "%s: Failed to create thread for queue %d\n",
@@ -1048,7 +1084,7 @@ int virtio_session_mount(struct fuse_session *se)
     se->vu_socketfd = data_sock;
     se->virtio_dev->se = se;
     pthread_rwlock_init(&se->virtio_dev->vu_dispatch_rwlock, NULL);
-    if (!vu_init(&se->virtio_dev->dev, 2, se->vu_socketfd, fv_panic, NULL,
+    if (!vu_init(&se->virtio_dev->dev, 3, se->vu_socketfd, fv_panic, NULL,
                  fv_set_watch, fv_remove_watch, &fv_iface)) {
         fuse_log(FUSE_LOG_ERR, "%s: vu_init failed\n", __func__);
         return -1;
-- 
2.31.1
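
For orientation, the queue layout after this patch depends on feature
negotiation: index 0 is the hiprio queue; index 1 is the notification
queue when VIRTIO_FS_F_NOTIFICATION was negotiated, otherwise the
first request queue. A small sketch of that mapping (illustrative
only):

    #include <stdbool.h>

    /* Map a virtqueue index to its role under both layouts */
    static const char *fs_queue_role(int qidx, bool notify_enabled)
    {
        if (qidx == 0) {
            return "hiprio";
        }
        if (notify_enabled && qidx == 1) {
            return "notification";
        }
        return "request";
    }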



^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH 09/13] virtiofsd: Specify size of notification buffer using config space
  2021-09-30 15:30 ` [Virtio-fs] " Vivek Goyal
@ 2021-09-30 15:30   ` Vivek Goyal
  -1 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-09-30 15:30 UTC (permalink / raw)
  To: qemu-devel, virtio-fs, stefanha
  Cc: jaggel, iangelak, dgilbert, vgoyal, miklos

The daemon specifies the size of the notification buffer it needs, and
that should be communicated to the guest through the config space.

Only the ->notify_buf_size value of the config space comes from the
daemon. The rest of it is filled in by the qemu device emulation code.
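
For illustration, this is roughly how the guest side would consume the
field. It is only a sketch of the guest-driver counterpart (the real
driver changes are in the corresponding kernel series), using the
standard virtio_cread() config accessor:

    /* Sketch: guest driver reads the advertised notification buffer size */
    static u32 virtio_fs_notify_buf_size(struct virtio_device *vdev)
    {
        u32 size;

        virtio_cread(vdev, struct virtio_fs_config, notify_buf_size, &size);
        return size;
    }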

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
---
 hw/virtio/vhost-user-fs.c                  | 27 +++++++++++++++++++
 include/hw/virtio/vhost-user-fs.h          |  2 ++
 include/standard-headers/linux/virtio_fs.h |  2 ++
 tools/virtiofsd/fuse_virtio.c              | 31 ++++++++++++++++++++++
 4 files changed, 62 insertions(+)

diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 6bafcf0243..68a94708b4 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -36,15 +36,41 @@ static const int user_feature_bits[] = {
     VHOST_INVALID_FEATURE_BIT
 };
 
+static int vhost_user_fs_handle_config_change(struct vhost_dev *dev)
+{
+    return 0;
+}
+
+const VhostDevConfigOps fs_ops = {
+    .vhost_dev_config_notifier = vhost_user_fs_handle_config_change,
+};
+
 static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
 {
     VHostUserFS *fs = VHOST_USER_FS(vdev);
     struct virtio_fs_config fscfg = {};
+    Error *local_err = NULL;
+    int ret;
+
+    /*
+     * As of now we only get notification buffer size from device. And that's
+     * needed only if notification queue is enabled.
+     */
+    if (fs->notify_enabled) {
+        ret = vhost_dev_get_config(&fs->vhost_dev, (uint8_t *)&fs->fscfg,
+                                   sizeof(struct virtio_fs_config),
+                                   &local_err);
+        if (ret) {
+            error_report_err(local_err);
+            return;
+        }
+    }
 
     memcpy((char *)fscfg.tag, fs->conf.tag,
            MIN(strlen(fs->conf.tag) + 1, sizeof(fscfg.tag)));
 
     virtio_stl_p(vdev, &fscfg.num_request_queues, fs->conf.num_request_queues);
+    virtio_stl_p(vdev, &fscfg.notify_buf_size, fs->fscfg.notify_buf_size);
 
     memcpy(config, &fscfg, sizeof(fscfg));
 }
@@ -316,6 +342,7 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
                 sizeof(struct virtio_fs_config));
 
     vuf_create_vqs(vdev, true);
+    vhost_dev_set_config_notifier(&fs->vhost_dev, &fs_ops);
     ret = vhost_dev_init(&fs->vhost_dev, &fs->vhost_user,
                          VHOST_BACKEND_TYPE_USER, 0, errp);
     if (ret < 0) {
diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
index 95dc0dd402..3b114ee260 100644
--- a/include/hw/virtio/vhost-user-fs.h
+++ b/include/hw/virtio/vhost-user-fs.h
@@ -14,6 +14,7 @@
 #ifndef _QEMU_VHOST_USER_FS_H
 #define _QEMU_VHOST_USER_FS_H
 
+#include "standard-headers/linux/virtio_fs.h"
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-user.h"
@@ -37,6 +38,7 @@ struct VHostUserFS {
     struct vhost_virtqueue *vhost_vqs;
     struct vhost_dev vhost_dev;
     VhostUserState vhost_user;
+    struct virtio_fs_config fscfg;
     VirtQueue **req_vqs;
     VirtQueue *hiprio_vq;
     VirtQueue *notification_vq;
diff --git a/include/standard-headers/linux/virtio_fs.h b/include/standard-headers/linux/virtio_fs.h
index b7f015186e..867d18acf6 100644
--- a/include/standard-headers/linux/virtio_fs.h
+++ b/include/standard-headers/linux/virtio_fs.h
@@ -17,6 +17,8 @@ struct virtio_fs_config {
 
 	/* Number of request queues */
 	uint32_t num_request_queues;
+	/* Size of notification buffer */
+	uint32_t notify_buf_size;
 } QEMU_PACKED;
 
 /* For the id field in virtio_pci_shm_cap */
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index f5b87a508a..3b720c5d4a 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -856,6 +856,35 @@ static bool fv_queue_order(VuDev *dev, int qidx)
     return false;
 }
 
+static uint64_t fv_get_protocol_features(VuDev *dev)
+{
+    return 1ull << VHOST_USER_PROTOCOL_F_CONFIG;
+}
+
+static int fv_get_config(VuDev *dev, uint8_t *config, uint32_t len)
+{
+    struct virtio_fs_config fscfg = {};
+    unsigned notify_size, roundto = 64;
+    union fuse_notify_union {
+        struct fuse_notify_poll_wakeup_out  wakeup_out;
+        struct fuse_notify_inval_inode_out  inode_out;
+        struct fuse_notify_inval_entry_out  entry_out;
+        struct fuse_notify_delete_out       delete_out;
+        struct fuse_notify_store_out        store_out;
+        struct fuse_notify_retrieve_out     retrieve_out;
+    };
+
+    notify_size = sizeof(struct fuse_out_header) +
+                  sizeof(union fuse_notify_union);
+    notify_size = ((notify_size + roundto - 1) / roundto) * roundto;
+
+    fscfg.notify_buf_size = notify_size;
+    memcpy(config, &fscfg, len);
+    fuse_log(FUSE_LOG_DEBUG, "%s: Setting notify_buf_size=%u\n", __func__,
+             fscfg.notify_buf_size);
+    return 0;
+}
+
 static const VuDevIface fv_iface = {
     .get_features = fv_get_features,
     .set_features = fv_set_features,
@@ -864,6 +893,8 @@ static const VuDevIface fv_iface = {
     .queue_set_started = fv_queue_set_started,
 
     .queue_is_processed_in_order = fv_queue_order,
+    .get_protocol_features = fv_get_protocol_features,
+    .get_config = fv_get_config,
 };
 
 /*
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH 10/13] virtiofsd: Custom threadpool for remote blocking posix locks requests
  2021-09-30 15:30 ` [Virtio-fs] " Vivek Goyal
@ 2021-09-30 15:30   ` Vivek Goyal
  -1 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-09-30 15:30 UTC (permalink / raw)
  To: qemu-devel, virtio-fs, stefanha
  Cc: jaggel, iangelak, dgilbert, vgoyal, miklos

Add a new custom threadpool, using POSIX threads, that specifically
services locking requests.

In the case of an fcntl(F_SETLKW) request, if the guest is waiting for
one or more locks and issues a hard reboot through SysRq, then virtiofsd
unblocks the blocked threads by sending them a signal and waking them
up.
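
The unblocking relies on standard POSIX behavior: if a signal handler is
installed without SA_RESTART, a thread blocked in fcntl(F_SETLKW) fails
with EINTR when it receives the signal. A minimal sketch of the
mechanism (fd and fl are placeholders; tpool.c below registers the
handler, while the actual blocking fcntl() sits in the lock request
handler):

    static void wakeup_handler(int sig)
    {
        /* No-op: its only purpose is to interrupt the blocked syscall */
    }

    static int blocking_lock(int fd, struct flock *fl)
    {
        struct sigaction sa = { .sa_handler = wakeup_handler };

        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;             /* no SA_RESTART: fcntl() not restarted */
        sigaction(SIGUSR1, &sa, NULL);

        /* Blocks until the lock is granted ... */
        if (fcntl(fd, F_SETLKW, fl) == -1 && errno == EINTR) {
            /* ... or until pthread_kill(thread, SIGUSR1) wakes us up */
            return -EINTR;
        }
        return 0;
    }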

The current threadpool (GThreadPool) is not adequate for servicing
locking requests that cause a thread to block, because GLib does not
provide an API to cancel a request while it is being serviced by a
thread. In addition, a user might be running virtiofsd without a
threadpool (--thread-pool-size=0); a locking request that blocks would
then block the main virtqueue thread and prevent it from servicing any
other requests.

The only exception occurs when the lock is of type F_UNLCK. In this
case the request is serviced by the main virtqueue thread or a
GThreadPool thread, to avoid a deadlock when all the threads in the
custom threadpool are blocked.

Then virtiofsd proceeds to clean up the state of the threads, release
them back to the system, and re-initialize.
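
The resulting API is deliberately close to GThreadPool. Its use looks
roughly like this (lock_worker, req and qi are placeholders for the
real request handler and its arguments):

    struct fv_ThreadPool *pool;

    pool = fv_thread_pool_init(4);       /* pool with four worker threads */
    if (!pool) {
        /* handle thread pool creation failure */
    }

    /* A worker thread will run lock_worker(req, qi) */
    fv_thread_pool_push(pool, lock_worker, req, qi);

    /* Signal blocked workers, wait for them to exit, free the pool */
    fv_thread_pool_destroy(pool);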

Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 tools/virtiofsd/fuse_virtio.c         |  90 ++++++-
 tools/virtiofsd/meson.build           |   1 +
 tools/virtiofsd/passthrough_seccomp.c |   1 +
 tools/virtiofsd/tpool.c               | 331 ++++++++++++++++++++++++++
 tools/virtiofsd/tpool.h               |  18 ++
 5 files changed, 440 insertions(+), 1 deletion(-)
 create mode 100644 tools/virtiofsd/tpool.c
 create mode 100644 tools/virtiofsd/tpool.h

diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 3b720c5d4a..c67c2e0e7a 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -20,6 +20,7 @@
 #include "fuse_misc.h"
 #include "fuse_opt.h"
 #include "fuse_virtio.h"
+#include "tpool.h"
 
 #include <sys/eventfd.h>
 #include <sys/socket.h>
@@ -612,6 +613,60 @@ out:
     free(req);
 }
 
+/*
+ * If the request is a locking request, use a custom locking thread pool.
+ */
+static bool use_lock_tpool(gpointer data, gpointer user_data)
+{
+    struct fv_QueueInfo *qi = user_data;
+    struct fuse_session *se = qi->virtio_dev->se;
+    FVRequest *req = data;
+    VuVirtqElement *elem = &req->elem;
+    struct fuse_buf fbuf = {};
+    struct fuse_in_header *inhp;
+    struct fuse_lk_in *lkinp;
+    size_t lk_req_len;
+    /* The 'out' part of the elem is from qemu */
+    unsigned int out_num = elem->out_num;
+    struct iovec *out_sg = elem->out_sg;
+    size_t out_len = iov_size(out_sg, out_num);
+    bool use_custom_tpool = false;
+
+    /*
+     * If notifications are not enabled, there is no point in using the
+     * custom lock thread pool.
+     */
+    if (!se->notify_enabled) {
+        return false;
+    }
+
+    assert(se->bufsize > sizeof(struct fuse_in_header));
+    lk_req_len = sizeof(struct fuse_in_header) + sizeof(struct fuse_lk_in);
+
+    if (out_len < lk_req_len) {
+        return false;
+    }
+
+    fbuf.mem = g_malloc(se->bufsize);
+    copy_from_iov(&fbuf, out_num, out_sg, lk_req_len);
+
+    inhp = fbuf.mem;
+    if (inhp->opcode != FUSE_SETLKW) {
+        goto out;
+    }
+
+    lkinp = fbuf.mem + sizeof(struct fuse_in_header);
+    if (lkinp->lk.type == F_UNLCK) {
+        goto out;
+    }
+
+    /* It's a blocking lock request. Use the custom thread pool. */
+    use_custom_tpool = true;
+out:
+    g_free(fbuf.mem);
+    return use_custom_tpool;
+}
+
 /* Thread function for individual queues, created when a queue is 'started' */
 static void *fv_queue_thread(void *opaque)
 {
@@ -619,6 +674,7 @@ static void *fv_queue_thread(void *opaque)
     struct VuDev *dev = &qi->virtio_dev->dev;
     struct VuVirtq *q = vu_get_queue(dev, qi->qidx);
     struct fuse_session *se = qi->virtio_dev->se;
+    struct fv_ThreadPool *lk_tpool = NULL;
     GThreadPool *pool = NULL;
     GList *req_list = NULL;
 
@@ -631,6 +687,24 @@ static void *fv_queue_thread(void *opaque)
             fuse_log(FUSE_LOG_ERR, "%s: g_thread_pool_new failed\n", __func__);
             return NULL;
         }
+
+    }
+
+    /*
+     * Create the custom thread pool to handle blocking locking requests.
+     * Do not create for hiprio queue (qidx=0).
+     */
+    if (qi->qidx) {
+        fuse_log(FUSE_LOG_DEBUG, "%s: Creating a locking thread pool for"
+                 " Queue %d with size %d\n", __func__, qi->qidx, 4);
+        lk_tpool = fv_thread_pool_init(4);
+        if (!lk_tpool) {
+            fuse_log(FUSE_LOG_ERR, "%s: fv_thread_pool failed\n", __func__);
+            if (pool) {
+                g_thread_pool_free(pool, FALSE, TRUE);
+            }
+            return NULL;
+        }
     }
 
     fuse_log(FUSE_LOG_INFO, "%s: Start for queue %d kick_fd %d\n", __func__,
@@ -703,7 +777,17 @@ static void *fv_queue_thread(void *opaque)
 
             req->reply_sent = false;
 
-            if (!se->thread_pool_size) {
+            /*
+             * In every case we get the opcode of the request and check if it
+             * is a locking request. If yes, we assign the request to the
+             * custom thread pool, except when the lock is of type F_UNLCK.
+             * In that case, to avoid a deadlock when all the custom threads
+             * are blocked, the request is serviced by the main virtqueue
+             * thread or a thread in the GThreadPool.
+             */
+            if (use_lock_tpool(req, qi)) {
+                fv_thread_pool_push(lk_tpool, fv_queue_worker, req, qi);
+            } else if (!se->thread_pool_size) {
                 req_list = g_list_prepend(req_list, req);
             } else {
                 g_thread_pool_push(pool, req, NULL);
@@ -726,6 +810,10 @@ static void *fv_queue_thread(void *opaque)
         g_thread_pool_free(pool, FALSE, TRUE);
     }
 
+    if (lk_tpool) {
+        fv_thread_pool_destroy(lk_tpool);
+    }
+
     return NULL;
 }
 
diff --git a/tools/virtiofsd/meson.build b/tools/virtiofsd/meson.build
index c134ba633f..203cd5613a 100644
--- a/tools/virtiofsd/meson.build
+++ b/tools/virtiofsd/meson.build
@@ -6,6 +6,7 @@ executable('virtiofsd', files(
   'fuse_signals.c',
   'fuse_virtio.c',
   'helper.c',
+  'tpool.c',
   'passthrough_ll.c',
   'passthrough_seccomp.c'),
   dependencies: [seccomp, qemuutil, libcap_ng, vhost_user],
diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
index a3ce9f898d..cd24b40b78 100644
--- a/tools/virtiofsd/passthrough_seccomp.c
+++ b/tools/virtiofsd/passthrough_seccomp.c
@@ -116,6 +116,7 @@ static const int syscall_allowlist[] = {
     SCMP_SYS(write),
     SCMP_SYS(writev),
     SCMP_SYS(umask),
+    SCMP_SYS(nanosleep),
 };
 
 /* Syscalls used when --syslog is enabled */
diff --git a/tools/virtiofsd/tpool.c b/tools/virtiofsd/tpool.c
new file mode 100644
index 0000000000..f9aa41b0c5
--- /dev/null
+++ b/tools/virtiofsd/tpool.c
@@ -0,0 +1,331 @@
+/*
+ * custom threadpool for virtiofsd
+ *
+ * Copyright (C) 2021 Red Hat, Inc.
+ *
+ * Authors:
+ *     Ioannis Angelakopoulos <iangelak@redhat.com>
+ *     Vivek Goyal <vgoyal@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include <pthread.h>
+#include <glib.h>
+#include <stdbool.h>
+#include <errno.h>
+#include "tpool.h"
+#include "fuse_log.h"
+
+struct fv_PoolReq {
+    struct fv_PoolReq *next;                        /* pointer to next task */
+    void (*worker_func)(void *arg1, void *arg2);    /* worker function */
+    void *arg1;                                     /* 1st arg: Request */
+    void *arg2;                                     /* 2nd arg: Virtqueue */
+};
+
+struct fv_PoolReqQueue {
+    pthread_mutex_t lock;
+    GQueue queue;
+    pthread_cond_t notify;                         /* Condition variable */
+};
+
+struct fv_PoolThread {
+    pthread_t pthread;
+    int alive;
+    int id;
+    struct fv_ThreadPool *tpool;
+};
+
+struct fv_ThreadPool {
+    struct fv_PoolThread **threads;
+    struct fv_PoolReqQueue *req_queue;
+    pthread_mutex_t tp_lock;
+
+    /* Total number of threads created */
+    int num_threads;
+
+    /* Number of threads running now */
+    int nr_running;
+    int destroy_pool;
+};
+
+/* Initialize the Locking Request Queue */
+static struct fv_PoolReqQueue *fv_pool_request_queue_init(void)
+{
+    struct fv_PoolReqQueue *rq;
+
+    rq = g_new0(struct fv_PoolReqQueue, 1);
+    pthread_mutex_init(&(rq->lock), NULL);
+    pthread_cond_init(&(rq->notify), NULL);
+    g_queue_init(&rq->queue);
+    return rq;
+}
+
+/* Push a new locking request to the queue */
+void fv_thread_pool_push(struct fv_ThreadPool *tpool,
+                         void (*worker_func)(void *, void *),
+                         void *arg1, void *arg2)
+{
+    struct fv_PoolReq *newreq;
+    struct fv_PoolReqQueue *rq = tpool->req_queue;
+
+    newreq = g_new(struct fv_PoolReq, 1);
+    newreq->worker_func = worker_func;
+    newreq->arg1 = arg1;
+    newreq->arg2 = arg2;
+    newreq->next = NULL;
+
+    /* Now add the request to the queue */
+    pthread_mutex_lock(&rq->lock);
+    g_queue_push_tail(&rq->queue, newreq);
+
+    /* Notify the threads that a request is available */
+    pthread_cond_signal(&rq->notify);
+    pthread_mutex_unlock(&rq->lock);
+
+}
+
+/* Pop a locking request from the queue */
+static struct fv_PoolReq *fv_tpool_pop(struct fv_ThreadPool *tpool)
+{
+    struct fv_PoolReq *pool_req = NULL;
+    struct fv_PoolReqQueue *rq = tpool->req_queue;
+
+    pthread_mutex_lock(&rq->lock);
+
+    pool_req = g_queue_pop_head(&rq->queue);
+
+    if (!g_queue_is_empty(&rq->queue)) {
+        pthread_cond_signal(&rq->notify);
+    }
+    pthread_mutex_unlock(&rq->lock);
+
+    return pool_req;
+}
+
+static void fv_pool_request_queue_destroy(struct fv_ThreadPool *tpool)
+{
+    struct fv_PoolReq *pool_req;
+
+    while ((pool_req = fv_tpool_pop(tpool))) {
+        g_free(pool_req);
+    }
+
+    /* Now free the actual queue itself */
+    g_free(tpool->req_queue);
+}
+
+/*
+ * Signal handler for blocking threads that wait on a remote lock to be released
+ * Called when virtiofsd does cleanup and wants to wake up these threads
+ */
+static void fv_thread_signal_handler(int signal)
+{
+    fuse_log(FUSE_LOG_DEBUG, "Thread received a signal.\n");
+    return;
+}
+
+static bool is_pool_stopping(struct fv_ThreadPool *tpool)
+{
+    bool destroy = false;
+
+    pthread_mutex_lock(&tpool->tp_lock);
+    destroy = tpool->destroy_pool;
+    pthread_mutex_unlock(&tpool->tp_lock);
+
+    return destroy;
+}
+
+static void *fv_thread_do_work(void *thread)
+{
+    struct fv_PoolThread *worker = (struct fv_PoolThread *)thread;
+    struct fv_ThreadPool *tpool = worker->tpool;
+    struct fv_PoolReq *pool_request;
+    /* Actual worker function and arguments. Same as non-locking requests */
+    void (*worker_func)(void*, void*);
+    void *arg1;
+    void *arg2;
+
+    while (1) {
+        if (is_pool_stopping(tpool)) {
+            break;
+        }
+
+        /*
+         * Get the queue lock first so that we can wait on the condition
+         * variable afterwards
+         */
+        pthread_mutex_lock(&tpool->req_queue->lock);
+
+        /* Wait on the condition variable until a request is available */
+        while (g_queue_is_empty(&tpool->req_queue->queue) &&
+               !is_pool_stopping(tpool)) {
+            pthread_cond_wait(&tpool->req_queue->notify,
+                              &tpool->req_queue->lock);
+        }
+
+        /* Unlock the queue for other threads */
+        pthread_mutex_unlock(&tpool->req_queue->lock);
+
+        if (is_pool_stopping(tpool)) {
+            break;
+        }
+
+        /* Now the request must be serviced */
+        pool_request = fv_tpool_pop(tpool);
+        if (pool_request) {
+            fuse_log(FUSE_LOG_DEBUG, "%s: Locking thread %d handling"
+                    " a request\n", __func__, worker->id);
+            worker_func = pool_request->worker_func;
+            arg1 = pool_request->arg1;
+            arg2 = pool_request->arg2;
+            worker_func(arg1, arg2);
+            g_free(pool_request);
+        }
+    }
+
+    /* Mark the thread as inactive */
+    pthread_mutex_lock(&tpool->tp_lock);
+    tpool->threads[worker->id]->alive = 0;
+    tpool->nr_running--;
+    pthread_mutex_unlock(&tpool->tp_lock);
+
+    return NULL;
+}
+
+/* Create a single thread that handles locking requests */
+static int fv_worker_thread_init(struct fv_ThreadPool *tpool,
+                                 struct fv_PoolThread **thread, int id)
+{
+    struct fv_PoolThread *worker;
+    int ret;
+
+    worker = g_new(struct fv_PoolThread, 1);
+    worker->tpool = tpool;
+    worker->id = id;
+    worker->alive = 1;
+
+    ret = pthread_create(&worker->pthread, NULL, fv_thread_do_work,
+                         worker);
+    if (ret) {
+        fuse_log(FUSE_LOG_ERR, "pthread_create() failed with err=%d\n", ret);
+        g_free(worker);
+        return ret;
+    }
+    pthread_detach(worker->pthread);
+    *thread = worker;
+    return 0;
+}
+
+static void send_signal_all(struct fv_ThreadPool *tpool)
+{
+    int i;
+
+    pthread_mutex_lock(&tpool->tp_lock);
+    for (i = 0; i < tpool->num_threads; i++) {
+        if (tpool->threads[i]->alive) {
+            pthread_kill(tpool->threads[i]->pthread, SIGUSR1);
+        }
+    }
+    pthread_mutex_unlock(&tpool->tp_lock);
+}
+
+static void do_pool_destroy(struct fv_ThreadPool *tpool, bool send_signal)
+{
+    int i, nr_running;
+
+    /* We want to destroy the pool */
+    pthread_mutex_lock(&tpool->tp_lock);
+    tpool->destroy_pool = 1;
+    pthread_mutex_unlock(&tpool->tp_lock);
+
+    /* Wake up threads waiting for requests */
+    pthread_mutex_lock(&tpool->req_queue->lock);
+    pthread_cond_broadcast(&tpool->req_queue->notify);
+    pthread_mutex_unlock(&tpool->req_queue->lock);
+
+    /* Send Signal and wait for all threads to exit. */
+    while (1) {
+        if (send_signal) {
+            send_signal_all(tpool);
+        }
+        pthread_mutex_lock(&tpool->tp_lock);
+        nr_running = tpool->nr_running;
+        pthread_mutex_unlock(&tpool->tp_lock);
+        if (!nr_running) {
+            break;
+        }
+        g_usleep(10000);
+    }
+
+    /* Destroy the locking request queue */
+    fv_pool_request_queue_destroy(tpool);
+    for (i = 0; i < tpool->num_threads; i++) {
+        g_free(tpool->threads[i]);
+    }
+
+    /* Now free the threadpool */
+    g_free(tpool->threads);
+    g_free(tpool);
+}
+
+void fv_thread_pool_destroy(struct fv_ThreadPool *tpool)
+{
+    if (!tpool) {
+        return;
+    }
+    do_pool_destroy(tpool, true);
+}
+
+static int register_sig_handler(void)
+{
+    struct sigaction sa;
+    sigemptyset(&sa.sa_mask);
+    sa.sa_flags = 0;
+    sa.sa_handler = fv_thread_signal_handler;
+    if (sigaction(SIGUSR1, &sa, NULL) == -1) {
+        fuse_log(FUSE_LOG_ERR, "Cannot register the signal handler:%s\n",
+                 strerror(errno));
+        return 1;
+    }
+    return 0;
+}
+
+/* Initialize the thread pool for the locking posix threads */
+struct fv_ThreadPool *fv_thread_pool_init(unsigned int thread_num)
+{
+    struct fv_ThreadPool *tpool = NULL;
+    int i, ret;
+
+    if (!thread_num) {
+        thread_num = 1;
+    }
+
+    if (register_sig_handler()) {
+        return NULL;
+    }
+    tpool = g_new0(struct fv_ThreadPool, 1);
+    pthread_mutex_init(&(tpool->tp_lock), NULL);
+
+    /* Initialize the Lock Request Queue */
+    tpool->req_queue = fv_pool_request_queue_init();
+
+    /* Create the threads in the pool */
+    tpool->threads = g_new(struct fv_PoolThread *, thread_num);
+
+    for (i = 0; i < thread_num; i++) {
+        ret = fv_worker_thread_init(tpool, &tpool->threads[i], i);
+        if (ret) {
+            goto out_err;
+        }
+        tpool->num_threads++;
+        tpool->nr_running++;
+    }
+
+    return tpool;
+out_err:
+    /* An error occurred. Cleanup and return NULL */
+    do_pool_destroy(tpool, false);
+    return NULL;
+}
diff --git a/tools/virtiofsd/tpool.h b/tools/virtiofsd/tpool.h
new file mode 100644
index 0000000000..48d67e9a50
--- /dev/null
+++ b/tools/virtiofsd/tpool.h
@@ -0,0 +1,18 @@
+/*
+ * custom threadpool for virtiofsd
+ *
+ * Copyright (C) 2021 Red Hat, Inc.
+ *
+ * Authors:
+ *     Ioannis Angelakopoulos <iangelak@redhat.com>
+ *     Vivek Goyal <vgoyal@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+struct fv_ThreadPool;
+
+struct fv_ThreadPool *fv_thread_pool_init(unsigned int thread_num);
+void fv_thread_pool_destroy(struct fv_ThreadPool *tpool);
+void fv_thread_pool_push(struct fv_ThreadPool *tpool,
+                   void (*worker_func)(void *, void *), void *arg1, void *arg2);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH 11/13] virtiofsd: Shutdown notification queue in the end
  2021-09-30 15:30 ` [Virtio-fs] " Vivek Goyal
@ 2021-09-30 15:30   ` Vivek Goyal
  -1 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-09-30 15:30 UTC (permalink / raw)
  To: qemu-devel, virtio-fs, stefanha
  Cc: jaggel, iangelak, dgilbert, vgoyal, miklos

So far we did not have the notion of cross queue traffic. That is, we
get request on a queue and send back response on same queue. So if a
request be being processed and at the same time a stop queue request
comes in, we wait for all pending requests to finish and then queue
is stopped and associated data structure cleaned.

But with notification queue, now it is possible that we get a locking
request on request queue and send the notification back on a different
queue (notificaiton queue). This means, we need to make sure that
notifiation queue has not already been shutdown or is not being
shutdown in parallel while we are trying to send a notification back.
Otherwise bad things are bound to happen.

One way to solve this problem is that stop notification queue in the
end. First stop hiprio and all request queues. That means by the
time we are trying to stop notification queue, we know no other
request can be in progress which can try to send something on
notification queue.

But the problem is that currently we don't have any control over the
order in which queues are stopped. If there were a notion of the whole
device being stopped, then we could decide in what order the queues
should be stopped.

Stefan mentioned that there is a command to stop the whole device,
VHOST_USER_SET_STATUS, but it is not implemented in libvhost-user yet.
Also, we probably could not move away from the per-queue stop logic we
have as of now.

As an alternative, he said that if we stop all queues when qidx 0 is
being stopped, it should be fine, and we can solve the issue of
notification queue shutdown order.

So in this patch I am shutting down all queues when queue 0 is being
shut down, and I have also changed the shutdown order so that the
notification queue is shut down last.

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 tools/virtiofsd/fuse_virtio.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index c67c2e0e7a..a87e88e286 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -826,6 +826,11 @@ static void fv_queue_cleanup_thread(struct fv_VuDev *vud, int qidx)
     assert(qidx < vud->nqueues);
     ourqi = vud->qi[qidx];
 
+    /* Queue is already stopped */
+    if (!ourqi) {
+        return;
+    }
+
     /* qidx == 1 is the notification queue if notifications are enabled */
     if (!se->notify_enabled || qidx != 1) {
         /* Kill the thread */
@@ -847,14 +852,25 @@ static void fv_queue_cleanup_thread(struct fv_VuDev *vud, int qidx)
 
 static void stop_all_queues(struct fv_VuDev *vud)
 {
+    struct fuse_session *se = vud->se;
+
     for (int i = 0; i < vud->nqueues; i++) {
         if (!vud->qi[i]) {
             continue;
         }
 
+        /* Shutdown notification queue in the end */
+        if (se->notify_enabled && i == 1) {
+            continue;
+        }
         fuse_log(FUSE_LOG_INFO, "%s: Stopping queue %d thread\n", __func__, i);
         fv_queue_cleanup_thread(vud, i);
     }
+
+    if (se->notify_enabled) {
+        fuse_log(FUSE_LOG_INFO, "%s: Stopping queue %d thread\n", __func__, 1);
+        fv_queue_cleanup_thread(vud, 1);
+    }
 }
 
 /* Callback from libvhost-user on start or stop of a queue */
@@ -934,7 +950,16 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
          * the queue thread doesn't block in virtio_send_msg().
          */
         vu_dispatch_unlock(vud);
-        fv_queue_cleanup_thread(vud, qidx);
+
+        /*
+         * If queue 0 is being shutdown, treat it as if device is being
+         * shutdown and stop all queues.
+         */
+        if (qidx == 0) {
+            stop_all_queues(vud);
+        } else {
+            fv_queue_cleanup_thread(vud, qidx);
+        }
         vu_dispatch_wrlock(vud);
     }
 }
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH 12/13] virtiofsd: Implement blocking posix locks
  2021-09-30 15:30 ` [Virtio-fs] " Vivek Goyal
@ 2021-09-30 15:30   ` Vivek Goyal
  -1 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-09-30 15:30 UTC (permalink / raw)
  To: qemu-devel, virtio-fs, stefanha
  Cc: jaggel, iangelak, dgilbert, vgoyal, miklos

As of now we don't support fcntl(F_SETLKW), and if we see one, we
return -EOPNOTSUPP.

Change that by accepting these requests and returning a reply
immediately, asking the caller to wait. Once the lock is available,
send a notification to the waiter indicating the lock is available.

In response to the lock request, we return the error value "1", which
signals the client to queue the lock request internally; later the
client will get a notification signaling that the lock was taken (or an
error occurred). The fuse client should then wake up the guest process.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
---
 tools/virtiofsd/fuse_lowlevel.c  | 37 ++++++++++++++++-
 tools/virtiofsd/fuse_lowlevel.h  | 26 ++++++++++++
 tools/virtiofsd/fuse_virtio.c    | 50 ++++++++++++++++++++---
 tools/virtiofsd/passthrough_ll.c | 70 ++++++++++++++++++++++++++++----
 4 files changed, 167 insertions(+), 16 deletions(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index e4679c73ab..2e7f4b786d 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -179,8 +179,8 @@ int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
         .unique = req->unique,
         .error = error,
     };
-
-    if (error <= -1000 || error > 0) {
+    /* error = 1 has been used to signal client to wait for notification */
+    if (error <= -1000 || error > 1) {
         fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n", error);
         out.error = -ERANGE;
     }
@@ -290,6 +290,11 @@ int fuse_reply_err(fuse_req_t req, int err)
     return send_reply(req, -err, NULL, 0);
 }
 
+int fuse_reply_wait(fuse_req_t req)
+{
+    return send_reply(req, 1, NULL, 0);
+}
+
 void fuse_reply_none(fuse_req_t req)
 {
     fuse_free_req(req);
@@ -2165,6 +2170,34 @@ static void do_destroy(fuse_req_t req, fuse_ino_t nodeid,
     send_reply_ok(req, NULL, 0);
 }
 
+static int send_notify_iov(struct fuse_session *se, int notify_code,
+                           struct iovec *iov, int count)
+{
+    struct fuse_out_header out;
+    if (!se->got_init) {
+        return -ENOTCONN;
+    }
+    out.unique = 0;
+    out.error = notify_code;
+    iov[0].iov_base = &out;
+    iov[0].iov_len = sizeof(struct fuse_out_header);
+    return fuse_send_msg(se, NULL, iov, count);
+}
+
+int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t unique,
+                  int32_t error)
+{
+    struct fuse_notify_lock_out outarg = {0};
+    struct iovec iov[2];
+
+    outarg.unique = unique;
+    outarg.error = -error;
+
+    iov[1].iov_base = &outarg;
+    iov[1].iov_len = sizeof(outarg);
+    return send_notify_iov(se, FUSE_NOTIFY_LOCK, iov, 2);
+}
+
 int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
                                off_t offset, struct fuse_bufvec *bufv)
 {
diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
index c55c0ca2fc..64624b48dc 100644
--- a/tools/virtiofsd/fuse_lowlevel.h
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -1251,6 +1251,22 @@ struct fuse_lowlevel_ops {
  */
 int fuse_reply_err(fuse_req_t req, int err);
 
+/**
+ * Ask caller to wait for lock.
+ *
+ * Possible requests:
+ *   setlkw
+ *
+ * If the caller sends a blocking lock request (setlkw), reply to the
+ * caller asking it to wait for the lock to become available. Once the
+ * lock is available, the caller will receive a notification with the
+ * request's unique id, carrying info on whether the lock was obtained.
+ *
+ * @param req request handle
+ * @return zero for success, -errno for failure to send reply
+ */
+int fuse_reply_wait(fuse_req_t req);
+
 /**
  * Don't send reply
  *
@@ -1685,6 +1701,16 @@ int fuse_lowlevel_notify_delete(struct fuse_session *se, fuse_ino_t parent,
 int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
                                off_t offset, struct fuse_bufvec *bufv);
 
+/**
+ * Notify event related to previous lock request
+ *
+ * @param se the session object
+ * @param unique the unique id of the request which requested setlkw
+ * @param error zero for success, -errno for the failure
+ */
+int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t unique,
+                              int32_t error);
+
 /*
  * Utility functions
  */
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index a87e88e286..bb2d4456fc 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -273,6 +273,23 @@ static void vq_send_element(struct fv_QueueInfo *qi, VuVirtqElement *elem,
     vu_dispatch_unlock(qi->virtio_dev);
 }
 
+/* Returns NULL if queue is empty */
+static FVRequest *vq_pop_notify_elem(struct fv_QueueInfo *qi)
+{
+    struct fuse_session *se = qi->virtio_dev->se;
+    VuDev *dev = &se->virtio_dev->dev;
+    VuVirtq *q = vu_get_queue(dev, qi->qidx);
+    FVRequest *req;
+
+    vu_dispatch_rdlock(qi->virtio_dev);
+    pthread_mutex_lock(&qi->vq_lock);
+    /* Pop an element from queue */
+    req = vu_queue_pop(dev, q, sizeof(FVRequest));
+    pthread_mutex_unlock(&qi->vq_lock);
+    vu_dispatch_unlock(qi->virtio_dev);
+    return req;
+}
+
 /*
  * Called back by ll whenever it wants to send a reply/message back
  * The 1st element of the iov starts with the fuse_out_header
@@ -281,9 +298,9 @@ static void vq_send_element(struct fv_QueueInfo *qi, VuVirtqElement *elem,
 int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
                     struct iovec *iov, int count)
 {
-    FVRequest *req = container_of(ch, FVRequest, ch);
-    struct fv_QueueInfo *qi = ch->qi;
-    VuVirtqElement *elem = &req->elem;
+    FVRequest *req;
+    struct fv_QueueInfo *qi;
+    VuVirtqElement *elem;
     int ret = 0;
 
     assert(count >= 1);
@@ -294,8 +311,30 @@ int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
 
     size_t tosend_len = iov_size(iov, count);
 
-    /* unique == 0 is notification, which we don't support */
-    assert(out->unique);
+    /* unique == 0 is notification */
+    if (!out->unique) {
+        if (!se->notify_enabled) {
+            return -EOPNOTSUPP;
+        }
+        /* If notifications are enabled, queue index 1 is notification queue */
+        qi = se->virtio_dev->qi[1];
+        req = vq_pop_notify_elem(qi);
+        if (!req) {
+            /*
+             * TODO: Implement some sort of ring buffer and queue notifications
+             * on that and send these later when notification queue has space
+             * available.
+             */
+            return -ENOSPC;
+        }
+        req->reply_sent = false;
+    } else {
+        assert(ch);
+        req = container_of(ch, FVRequest, ch);
+        qi = ch->qi;
+    }
+
+    elem = &req->elem;
     assert(!req->reply_sent);
 
     /* The 'in' part of the elem is to qemu */
@@ -985,6 +1024,7 @@ static int fv_get_config(VuDev *dev, uint8_t *config, uint32_t len)
         struct fuse_notify_delete_out       delete_out;
         struct fuse_notify_store_out        store_out;
         struct fuse_notify_retrieve_out     retrieve_out;
+        struct fuse_notify_lock_out         lock_out;
     };
 
     notify_size = sizeof(struct fuse_out_header) +
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 6928662e22..277f74762b 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -2131,13 +2131,35 @@ out:
     }
 }
 
+static void setlk_send_notification(struct fuse_session *se, uint64_t unique,
+                                    int saverr)
+{
+    int ret;
+
+    do {
+        ret = fuse_lowlevel_notify_lock(se, unique, saverr);
+        /*
+         * Retry sending the notification if the notification queue does
+         * not have a free descriptor yet; otherwise break out of the loop.
+         * Either we sent the notification or some other error occurred.
+         */
+        if (ret != -ENOSPC) {
+            break;
+        }
+        usleep(10000);
+    } while (1);
+}
+
 static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
                      struct flock *lock, int sleep)
 {
     struct lo_data *lo = lo_data(req);
     struct lo_inode *inode;
     struct lo_inode_plock *plock;
-    int ret, saverr = 0;
+    int ret, saverr = 0, ofd;
+    uint64_t unique;
+    struct fuse_session *se = req->se;
+    bool blocking_lock = false;
 
     fuse_log(FUSE_LOG_DEBUG,
              "lo_setlk(ino=%" PRIu64 ", flags=%d)"
@@ -2151,11 +2173,6 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
         return;
     }
 
-    if (sleep) {
-        fuse_reply_err(req, EOPNOTSUPP);
-        return;
-    }
-
     inode = lo_inode(req, ino);
     if (!inode) {
         fuse_reply_err(req, EBADF);
@@ -2168,21 +2185,56 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
 
     if (!plock) {
         saverr = ret;
+        pthread_mutex_unlock(&inode->plock_mutex);
         goto out;
     }
 
+    /*
+     * plock is now released when inode is going away. We already have
+     * a reference on inode, so it is guaranteed that plock->fd is
+     * still around even after dropping inode->plock_mutex lock
+     */
+    ofd = plock->fd;
+    pthread_mutex_unlock(&inode->plock_mutex);
+
+    /*
+     * If this lock request can block, request caller to wait for
+     * notification. Do not access req after this. Once lock is
+     * available, send a notification instead.
+     */
+    if (sleep && lock->l_type != F_UNLCK) {
+        /*
+         * If notification queue is not enabled, can't support async
+         * locks.
+         */
+        if (!se->notify_enabled) {
+            saverr = EOPNOTSUPP;
+            goto out;
+        }
+        blocking_lock = true;
+        unique = req->unique;
+        fuse_reply_wait(req);
+    }
+
     /* TODO: Is it alright to modify flock? */
     lock->l_pid = 0;
-    ret = fcntl(plock->fd, F_OFD_SETLK, lock);
+    if (blocking_lock) {
+        ret = fcntl(ofd, F_OFD_SETLKW, lock);
+    } else {
+        ret = fcntl(ofd, F_OFD_SETLK, lock);
+    }
     if (ret == -1) {
         saverr = errno;
     }
 
 out:
-    pthread_mutex_unlock(&inode->plock_mutex);
     lo_inode_put(lo, &inode);
 
-    fuse_reply_err(req, saverr);
+    if (!blocking_lock) {
+        fuse_reply_err(req, saverr);
+    } else {
+        setlk_send_notification(se, unique, saverr);
+    }
 }
 
 static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 106+ messages in thread
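To make the reply-then-notify protocol above concrete, here is a rough
schematic of the client (guest) side it implies. All names below are
invented for illustration; the real handling lives in the corresponding
kernel patches:

    #include <stdint.h>
    #include <stdbool.h>

    /* Hypothetical per-request state for a blocking setlkw */
    struct pending_lock {
        uint64_t unique;   /* unique id of the original setlkw request */
        int32_t  error;    /* final result, filled in by the notification */
        bool     done;
    };

    /* Assumed helper that wakes the process sleeping on this request */
    void wake_up_waiter(struct pending_lock *pl);

    /* Reply path: out.error == 1 means "lock not taken yet, keep waiting" */
    void handle_setlkw_reply(struct pending_lock *pl, int32_t error)
    {
        if (error == 1) {
            return;            /* server will send a notification later */
        }
        pl->error = error;     /* immediate result: 0 or a negative errno */
        pl->done = true;
        wake_up_waiter(pl);
    }

    /* Notification path: fields match fuse_notify_lock_out from patch 02 */
    void handle_notify_lock(struct pending_lock *pl,
                            uint64_t unique, int32_t error)
    {
        if (pl->unique != unique) {
            return;            /* notification is for some other request */
        }
        pl->error = error;     /* result from the server: 0 on success */
        pl->done = true;
        wake_up_waiter(pl);
    }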

* [PATCH 13/13] virtiofsd, seccomp: Add clock_nanosleep() to allow list
  2021-09-30 15:30 ` [Virtio-fs] " Vivek Goyal
@ 2021-09-30 15:30   ` Vivek Goyal
  -1 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-09-30 15:30 UTC (permalink / raw)
  To: qemu-devel, virtio-fs, stefanha
  Cc: jaggel, iangelak, dgilbert, vgoyal, miklos

g_usleep() calls nanosleep(), and that now seems to invoke the
clock_nanosleep() syscall. These patches make use of g_usleep(), so add
clock_nanosleep() to the list of allowed syscalls.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 tools/virtiofsd/passthrough_seccomp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
index cd24b40b78..03080806c0 100644
--- a/tools/virtiofsd/passthrough_seccomp.c
+++ b/tools/virtiofsd/passthrough_seccomp.c
@@ -117,6 +117,7 @@ static const int syscall_allowlist[] = {
     SCMP_SYS(writev),
     SCMP_SYS(umask),
     SCMP_SYS(nanosleep),
+    SCMP_SYS(clock_nanosleep),
 };
 
 /* Syscalls used when --syslog is enabled */
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 106+ messages in thread
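The claim that nanosleep() is now routed through the clock_nanosleep()
syscall can be checked with a small standalone program (an illustration
only, not part of the patch; assumes libseccomp is installed):

    #include <errno.h>
    #include <stdio.h>
    #include <time.h>
    #include <seccomp.h>

    int main(void)
    {
        /* Default-allow filter that denies only clock_nanosleep(2) */
        scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ALLOW);
        if (!ctx) {
            return 1;
        }
        seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM),
                         SCMP_SYS(clock_nanosleep), 0);
        seccomp_load(ctx);
        seccomp_release(ctx);

        struct timespec ts = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
        if (nanosleep(&ts, NULL) == -1) {
            /* On a glibc that implements nanosleep() via clock_nanosleep(),
             * this fails with EPERM -- hence the allowlist entry above. */
            perror("nanosleep");
        }
        return 0;
    }

Build with something like: gcc demo.c -lseccomp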

* Re: [PATCH 01/13] virtio_fs.h: Add notification queue feature bit
  2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
@ 2021-10-04 13:12     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 106+ messages in thread
From: Stefan Hajnoczi @ 2021-10-04 13:12 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

[-- Attachment #1: Type: text/plain, Size: 380 bytes --]

On Thu, Sep 30, 2021 at 11:30:25AM -0400, Vivek Goyal wrote:
> This change will ultimately come from kernel as kernel header file update
> when kernel patches get merged.
> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  include/standard-headers/linux/virtio_fs.h | 3 +++
>  1 file changed, 3 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 02/13] virtiofsd: fuse.h header file changes for lock notification
  2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
@ 2021-10-04 13:16     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 106+ messages in thread
From: Stefan Hajnoczi @ 2021-10-04 13:16 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

[-- Attachment #1: Type: text/plain, Size: 2074 bytes --]

On Thu, Sep 30, 2021 at 11:30:26AM -0400, Vivek Goyal wrote:
> This change comes from fuse.h kernel header file update. Hence keeping
> it in a separate patch.

QEMU syncs include/standard-headers/linux/ from linux.git. Please
indicate the status of this fuse.h change:
- Is it already in a Linux release?
- Or is it already in linux.git?
- Or is it awaiting review from the kernel FUSE maintainer?

We need to wait for the kernel change to get into linux.git before
merging this patch in QEMU. This ensures that QEMU uses actual released
kernel interfaces that are guaranteed to be stable.

> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  include/standard-headers/linux/fuse.h | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/include/standard-headers/linux/fuse.h b/include/standard-headers/linux/fuse.h
> index cce105bfba..0b6218d569 100644
> --- a/include/standard-headers/linux/fuse.h
> +++ b/include/standard-headers/linux/fuse.h
> @@ -181,6 +181,8 @@
>   *  - add FUSE_OPEN_KILL_SUIDGID
>   *  - extend fuse_setxattr_in, add FUSE_SETXATTR_EXT
>   *  - add FUSE_SETXATTR_ACL_KILL_SGID
> + *  7.35
> + *  - add FUSE_NOTIFY_LOCK
>   */
>  
>  #ifndef _LINUX_FUSE_H
> @@ -212,7 +214,7 @@
>  #define FUSE_KERNEL_VERSION 7
>  
>  /** Minor version number of this interface */
> -#define FUSE_KERNEL_MINOR_VERSION 33
> +#define FUSE_KERNEL_MINOR_VERSION 35
>  
>  /** The node ID of the root inode */
>  #define FUSE_ROOT_ID 1
> @@ -521,6 +523,7 @@ enum fuse_notify_code {
>  	FUSE_NOTIFY_STORE = 4,
>  	FUSE_NOTIFY_RETRIEVE = 5,
>  	FUSE_NOTIFY_DELETE = 6,
> +	FUSE_NOTIFY_LOCK = 7,
>  	FUSE_NOTIFY_CODE_MAX,
>  };
>  
> @@ -912,6 +915,12 @@ struct fuse_notify_retrieve_in {
>  	uint64_t	dummy4;
>  };
>  
> +struct fuse_notify_lock_out {
> +	uint64_t	unique;
> +	int32_t		error;
> +	int32_t		padding;
> +};
> +
>  /* Device ioctls: */
>  #define FUSE_DEV_IOC_MAGIC		229
>  #define FUSE_DEV_IOC_CLONE		_IOR(FUSE_DEV_IOC_MAGIC, 0, uint32_t)
> -- 
> 2.31.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 03/13] virtiofsd: Remove unused virtio_fs_config definition
  2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
@ 2021-10-04 13:17     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 106+ messages in thread
From: Stefan Hajnoczi @ 2021-10-04 13:17 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

[-- Attachment #1: Type: text/plain, Size: 431 bytes --]

On Thu, Sep 30, 2021 at 11:30:27AM -0400, Vivek Goyal wrote:
> "struct virtio_fs_config" definition seems to be unused in fuse_virtio.c.
> Remove it.
> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  tools/virtiofsd/fuse_virtio.c | 6 ------
>  1 file changed, 6 deletions(-)

In fact, this struct is defined in
include/standard-headers/linux/virtio_fs.h!

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 04/13] virtiofsd: Add a helper to send element on virtqueue
  2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
@ 2021-10-04 13:19     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 106+ messages in thread
From: Stefan Hajnoczi @ 2021-10-04 13:19 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

[-- Attachment #1: Type: text/plain, Size: 494 bytes --]

On Thu, Sep 30, 2021 at 11:30:28AM -0400, Vivek Goyal wrote:
> We have open coded logic to take locks and push element on virtqueue at
> three places. Add a helper and use it everywhere. Code is easier to read and
> less number of lines of code.
> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  tools/virtiofsd/fuse_virtio.c | 45 ++++++++++++++---------------------
>  1 file changed, 18 insertions(+), 27 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 05/13] virtiofsd: Add a helper to stop all queues
  2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
@ 2021-10-04 13:22     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 106+ messages in thread
From: Stefan Hajnoczi @ 2021-10-04 13:22 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

[-- Attachment #1: Type: text/plain, Size: 439 bytes --]

On Thu, Sep 30, 2021 at 11:30:29AM -0400, Vivek Goyal wrote:
> Use a helper to stop all the queues. Later in the patch series I am
> planning to use this helper at one more place.
> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  tools/virtiofsd/fuse_virtio.c | 22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 06/13] vhost-user-fs: Use helpers to create/cleanup virtqueue
  2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
@ 2021-10-04 13:54     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 106+ messages in thread
From: Stefan Hajnoczi @ 2021-10-04 13:54 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

[-- Attachment #1: Type: text/plain, Size: 2097 bytes --]

On Thu, Sep 30, 2021 at 11:30:30AM -0400, Vivek Goyal wrote:
> Add helpers to create/cleanup virtuqueues and use those helpers. I will

s/virtuqueues/virtqueues/

> need to reconfigure queues in later patches and using helpers will allow
> reusing the code.
> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  hw/virtio/vhost-user-fs.c | 87 +++++++++++++++++++++++----------------
>  1 file changed, 52 insertions(+), 35 deletions(-)
> 
> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> index c595957983..d1efbc5b18 100644
> --- a/hw/virtio/vhost-user-fs.c
> +++ b/hw/virtio/vhost-user-fs.c
> @@ -139,6 +139,55 @@ static void vuf_set_status(VirtIODevice *vdev, uint8_t status)
>      }
>  }
>  
> +static void vuf_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> +{
> +    /*
> +     * Not normally called; it's the daemon that handles the queue;
> +     * however virtio's cleanup path can call this.
> +     */
> +}
> +
> +static void vuf_create_vqs(VirtIODevice *vdev)
> +{
> +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> +    unsigned int i;
> +
> +    /* Hiprio queue */
> +    fs->hiprio_vq = virtio_add_queue(vdev, fs->conf.queue_size,
> +                                     vuf_handle_output);
> +
> +    /* Request queues */
> +    fs->req_vqs = g_new(VirtQueue *, fs->conf.num_request_queues);
> +    for (i = 0; i < fs->conf.num_request_queues; i++) {
> +        fs->req_vqs[i] = virtio_add_queue(vdev, fs->conf.queue_size,
> +                                          vuf_handle_output);
> +    }
> +
> +    /* 1 high prio queue, plus the number configured */
> +    fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
> +    fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);

These two lines prepare for vhost_dev_init(), so moving them here is
debatable. If a caller is going to use this function again in the future
then they need to be sure to also call vhost_dev_init(). For now it
looks safe, so I guess it's okay.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 106+ messages in thread
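In other words, creating the queues and initialising the vhost device
go together. A rough sketch of the pairing a future caller would need
(vuf_recreate_vqs is a hypothetical helper; error handling elided, and
the vhost_dev_init() call is assumed to be the same one
vuf_device_realize() already makes):

    static int vuf_recreate_vqs(VirtIODevice *vdev, Error **errp)
    {
        VHostUserFS *fs = VHOST_USER_FS(vdev);

        vuf_cleanup_vqs(vdev);
        vuf_create_vqs(vdev);  /* fills in vhost_dev.nvqs and vhost_dev.vqs */

        /* ...without this step the freshly set up vqs are never consumed */
        return vhost_dev_init(&fs->vhost_dev, &fs->vhost_user,
                              VHOST_BACKEND_TYPE_USER, 0, errp);
    }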

* Re: [PATCH 02/13] virtiofsd: fuse.h header file changes for lock notification
  2021-10-04 13:16     ` [Virtio-fs] " Stefan Hajnoczi
@ 2021-10-04 14:01       ` Vivek Goyal
  -1 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-10-04 14:01 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

On Mon, Oct 04, 2021 at 02:16:18PM +0100, Stefan Hajnoczi wrote:
> On Thu, Sep 30, 2021 at 11:30:26AM -0400, Vivek Goyal wrote:
> > This change comes from fuse.h kernel header file update. Hence keeping
> > it in a separate patch.
> 
> QEMU syncs include/standard-headers/linux/ from linux.git. Please
> indicate the status of this fuse.h change:
> - Is it already in a Linux release?
> - Or is it already in linux.git?
> - Or is it awaiting review from the kernel FUSE maintainer?

This is awaiting review from kernel FUSE maintainer.

I have posted kernel patches here.

https://lore.kernel.org/linux-fsdevel/20210930143850.1188628-1-vgoyal@redhat.com/

Vivek

> 
> We need to wait for the kernel change to get into linux.git before
> merging this patch in QEMU. This ensures that QEMU uses actual released
> kernel interfaces that are guaranteed to be stable.
> 
> > 
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  include/standard-headers/linux/fuse.h | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/standard-headers/linux/fuse.h b/include/standard-headers/linux/fuse.h
> > index cce105bfba..0b6218d569 100644
> > --- a/include/standard-headers/linux/fuse.h
> > +++ b/include/standard-headers/linux/fuse.h
> > @@ -181,6 +181,8 @@
> >   *  - add FUSE_OPEN_KILL_SUIDGID
> >   *  - extend fuse_setxattr_in, add FUSE_SETXATTR_EXT
> >   *  - add FUSE_SETXATTR_ACL_KILL_SGID
> > + *  7.35
> > + *  - add FUSE_NOTIFY_LOCK
> >   */
> >  
> >  #ifndef _LINUX_FUSE_H
> > @@ -212,7 +214,7 @@
> >  #define FUSE_KERNEL_VERSION 7
> >  
> >  /** Minor version number of this interface */
> > -#define FUSE_KERNEL_MINOR_VERSION 33
> > +#define FUSE_KERNEL_MINOR_VERSION 35
> >  
> >  /** The node ID of the root inode */
> >  #define FUSE_ROOT_ID 1
> > @@ -521,6 +523,7 @@ enum fuse_notify_code {
> >  	FUSE_NOTIFY_STORE = 4,
> >  	FUSE_NOTIFY_RETRIEVE = 5,
> >  	FUSE_NOTIFY_DELETE = 6,
> > +	FUSE_NOTIFY_LOCK = 7,
> >  	FUSE_NOTIFY_CODE_MAX,
> >  };
> >  
> > @@ -912,6 +915,12 @@ struct fuse_notify_retrieve_in {
> >  	uint64_t	dummy4;
> >  };
> >  
> > +struct fuse_notify_lock_out {
> > +	uint64_t	unique;
> > +	int32_t		error;
> > +	int32_t		padding;
> > +};
> > +
> >  /* Device ioctls: */
> >  #define FUSE_DEV_IOC_MAGIC		229
> >  #define FUSE_DEV_IOC_CLONE		_IOR(FUSE_DEV_IOC_MAGIC, 0, uint32_t)
> > -- 
> > 2.31.1
> > 




^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 08/13] virtiofsd: Create a notification queue
  2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
@ 2021-10-04 14:30     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 106+ messages in thread
From: Stefan Hajnoczi @ 2021-10-04 14:30 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

[-- Attachment #1: Type: text/plain, Size: 13596 bytes --]

On Thu, Sep 30, 2021 at 11:30:32AM -0400, Vivek Goyal wrote:
> Add a notification queue which will be used to send async notifications
> for file lock availability.
> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
> ---
>  hw/virtio/vhost-user-fs-pci.c     |  4 +-
>  hw/virtio/vhost-user-fs.c         | 62 +++++++++++++++++++++++++--
>  include/hw/virtio/vhost-user-fs.h |  2 +
>  tools/virtiofsd/fuse_i.h          |  1 +
>  tools/virtiofsd/fuse_virtio.c     | 70 +++++++++++++++++++++++--------
>  5 files changed, 116 insertions(+), 23 deletions(-)
> 
> diff --git a/hw/virtio/vhost-user-fs-pci.c b/hw/virtio/vhost-user-fs-pci.c
> index 2ed8492b3f..cdb9471088 100644
> --- a/hw/virtio/vhost-user-fs-pci.c
> +++ b/hw/virtio/vhost-user-fs-pci.c
> @@ -41,8 +41,8 @@ static void vhost_user_fs_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
>      DeviceState *vdev = DEVICE(&dev->vdev);
>  
>      if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
> -        /* Also reserve config change and hiprio queue vectors */
> -        vpci_dev->nvectors = dev->vdev.conf.num_request_queues + 2;
> +        /* Also reserve config change, hiprio and notification queue vectors */
> +        vpci_dev->nvectors = dev->vdev.conf.num_request_queues + 3;
>      }
>  
>      qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> index d1efbc5b18..6bafcf0243 100644
> --- a/hw/virtio/vhost-user-fs.c
> +++ b/hw/virtio/vhost-user-fs.c
> @@ -31,6 +31,7 @@ static const int user_feature_bits[] = {
>      VIRTIO_F_NOTIFY_ON_EMPTY,
>      VIRTIO_F_RING_PACKED,
>      VIRTIO_F_IOMMU_PLATFORM,
> +    VIRTIO_FS_F_NOTIFICATION,
>  
>      VHOST_INVALID_FEATURE_BIT
>  };
> @@ -147,7 +148,7 @@ static void vuf_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>       */
>  }
>  
> -static void vuf_create_vqs(VirtIODevice *vdev)
> +static void vuf_create_vqs(VirtIODevice *vdev, bool notification_vq)
>  {
>      VHostUserFS *fs = VHOST_USER_FS(vdev);
>      unsigned int i;
> @@ -155,6 +156,15 @@ static void vuf_create_vqs(VirtIODevice *vdev)
>      /* Hiprio queue */
>      fs->hiprio_vq = virtio_add_queue(vdev, fs->conf.queue_size,
>                                       vuf_handle_output);
> +    /*
> +     * Notification queue. Feature negotiation happens later. So at this
> +     * point of time we don't know if driver will use notification queue
> +     * or not.
> +     */
> +    if (notification_vq) {
> +        fs->notification_vq = virtio_add_queue(vdev, fs->conf.queue_size,
> +                                               vuf_handle_output);
> +    }
>  
>      /* Request queues */
>      fs->req_vqs = g_new(VirtQueue *, fs->conf.num_request_queues);
> @@ -163,8 +173,12 @@ static void vuf_create_vqs(VirtIODevice *vdev)
>                                            vuf_handle_output);
>      }
>  
> -    /* 1 high prio queue, plus the number configured */
> -    fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
> +    /* 1 high prio queue, 1 notification queue plus the number configured */
> +    if (notification_vq) {
> +        fs->vhost_dev.nvqs = 2 + fs->conf.num_request_queues;
> +    } else {
> +        fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
> +    }
>      fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
>  }
>  
> @@ -176,6 +190,11 @@ static void vuf_cleanup_vqs(VirtIODevice *vdev)
>      virtio_delete_queue(fs->hiprio_vq);
>      fs->hiprio_vq = NULL;
>  
> +    if (fs->notification_vq) {
> +        virtio_delete_queue(fs->notification_vq);
> +    }
> +    fs->notification_vq = NULL;
> +
>      for (i = 0; i < fs->conf.num_request_queues; i++) {
>          virtio_delete_queue(fs->req_vqs[i]);
>      }
> @@ -194,9 +213,43 @@ static uint64_t vuf_get_features(VirtIODevice *vdev,
>  {
>      VHostUserFS *fs = VHOST_USER_FS(vdev);
>  
> +    virtio_add_feature(&features, VIRTIO_FS_F_NOTIFICATION);
> +
>      return vhost_get_features(&fs->vhost_dev, user_feature_bits, features);
>  }
>  
> +static void vuf_set_features(VirtIODevice *vdev, uint64_t features)
> +{
> +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> +
> +    if (virtio_has_feature(features, VIRTIO_FS_F_NOTIFICATION)) {
> +        fs->notify_enabled = true;
> +        /*
> +         * If guest first booted with no notification queue support and
> +         * later rebooted with kernel which supports notification, we
> +         * can end up here
> +         */
> +        if (!fs->notification_vq) {
> +            vuf_cleanup_vqs(vdev);
> +            vuf_create_vqs(vdev, true);
> +        }

I would simplify things by unconditionally creating the notification vq
for the device and letting the vhost-user device backend decide whether
it wants to handle the vq or not. If the backend doesn't implement the
vq then it also won't advertise VIRTIO_FS_F_NOTIFICATION so the guest
driver won't submit virtqueue buffers.

I'm not 100% sure if that approach works. It should be tested with a
virtiofsd that doesn't implement the notification vq, for example. But I
think it's worth exploring that because the code will be simpler than
worrying about whether notifications are enabled or disabled.
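
As a rough, untested sketch of what that simplification could look like
(reusing only the calls already present in this patch, not a definitive
implementation):

  static void vuf_create_vqs(VirtIODevice *vdev)
  {
      VHostUserFS *fs = VHOST_USER_FS(vdev);
      unsigned int i;

      /* Hiprio queue */
      fs->hiprio_vq = virtio_add_queue(vdev, fs->conf.queue_size,
                                       vuf_handle_output);

      /*
       * Always create the notification queue; feature negotiation
       * decides whether the driver ever submits buffers to it.
       */
      fs->notification_vq = virtio_add_queue(vdev, fs->conf.queue_size,
                                             vuf_handle_output);

      /* Request queues */
      fs->req_vqs = g_new(VirtQueue *, fs->conf.num_request_queues);
      for (i = 0; i < fs->conf.num_request_queues; i++) {
          fs->req_vqs[i] = virtio_add_queue(vdev, fs->conf.queue_size,
                                            vuf_handle_output);
      }

      /* 1 hiprio queue, 1 notification queue, plus the number configured */
      fs->vhost_dev.nvqs = 2 + fs->conf.num_request_queues;
      fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
  }

vuf_set_features() would then no longer need to tear down and recreate
the queues during feature negotiation.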

> +        return;
> +    }
> +
> +    fs->notify_enabled = false;
> +    if (!fs->notification_vq) {
> +        return;
> +    }
> +    /*
> +     * Driver does not support notification queue. Reconfigure queues
> +     * and do not create notification queue.
> +     */
> +    vuf_cleanup_vqs(vdev);
> +
> +    /* Create queues again */
> +    vuf_create_vqs(vdev, false);
> +}
> +
>  static void vuf_guest_notifier_mask(VirtIODevice *vdev, int idx,
>                                              bool mask)
>  {
> @@ -262,7 +315,7 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
>      virtio_init(vdev, "vhost-user-fs", VIRTIO_ID_FS,
>                  sizeof(struct virtio_fs_config));
>  
> -    vuf_create_vqs(vdev);
> +    vuf_create_vqs(vdev, true);
>      ret = vhost_dev_init(&fs->vhost_dev, &fs->vhost_user,
>                           VHOST_BACKEND_TYPE_USER, 0, errp);
>      if (ret < 0) {
> @@ -327,6 +380,7 @@ static void vuf_class_init(ObjectClass *klass, void *data)
>      vdc->realize = vuf_device_realize;
>      vdc->unrealize = vuf_device_unrealize;
>      vdc->get_features = vuf_get_features;
> +    vdc->set_features = vuf_set_features;
>      vdc->get_config = vuf_get_config;
>      vdc->set_status = vuf_set_status;
>      vdc->guest_notifier_mask = vuf_guest_notifier_mask;
> diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
> index 0d62834c25..95dc0dd402 100644
> --- a/include/hw/virtio/vhost-user-fs.h
> +++ b/include/hw/virtio/vhost-user-fs.h
> @@ -39,7 +39,9 @@ struct VHostUserFS {
>      VhostUserState vhost_user;
>      VirtQueue **req_vqs;
>      VirtQueue *hiprio_vq;
> +    VirtQueue *notification_vq;
>      int32_t bootindex;
> +    bool notify_enabled;
>  
>      /*< public >*/
>  };
> diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
> index 492e002181..4942d080da 100644
> --- a/tools/virtiofsd/fuse_i.h
> +++ b/tools/virtiofsd/fuse_i.h
> @@ -73,6 +73,7 @@ struct fuse_session {
>      int   vu_socketfd;
>      struct fv_VuDev *virtio_dev;
>      int thread_pool_size;
> +    bool notify_enabled;
>  };
>  
>  struct fuse_chan {
> diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> index baead08b28..f5b87a508a 100644
> --- a/tools/virtiofsd/fuse_virtio.c
> +++ b/tools/virtiofsd/fuse_virtio.c
> @@ -14,6 +14,7 @@
>  #include "qemu/osdep.h"
>  #include "qemu/iov.h"
>  #include "qapi/error.h"
> +#include "standard-headers/linux/virtio_fs.h"
>  #include "fuse_i.h"
>  #include "standard-headers/linux/fuse.h"
>  #include "fuse_misc.h"
> @@ -85,12 +86,25 @@ struct fv_VuDev {
>  /* Callback from libvhost-user */
>  static uint64_t fv_get_features(VuDev *dev)
>  {
> -    return 1ULL << VIRTIO_F_VERSION_1;
> +    uint64_t features;
> +
> +    features = 1ull << VIRTIO_F_VERSION_1 |
> +               1ull << VIRTIO_FS_F_NOTIFICATION;
> +
> +    return features;
>  }
>  
>  /* Callback from libvhost-user */
>  static void fv_set_features(VuDev *dev, uint64_t features)
>  {
> +    struct fv_VuDev *vud = container_of(dev, struct fv_VuDev, dev);
> +    struct fuse_session *se = vud->se;
> +
> +    if ((1ull << VIRTIO_FS_F_NOTIFICATION) & features) {
> +        se->notify_enabled = true;
> +    } else {
> +        se->notify_enabled = false;
> +    }
>  }
>  
>  /*
> @@ -719,22 +733,25 @@ static void fv_queue_cleanup_thread(struct fv_VuDev *vud, int qidx)
>  {
>      int ret;
>      struct fv_QueueInfo *ourqi;
> +    struct fuse_session *se = vud->se;
>  
>      assert(qidx < vud->nqueues);
>      ourqi = vud->qi[qidx];
>  
> -    /* Kill the thread */
> -    if (eventfd_write(ourqi->kill_fd, 1)) {
> -        fuse_log(FUSE_LOG_ERR, "Eventfd_write for queue %d: %s\n",
> -                 qidx, strerror(errno));
> -    }
> -    ret = pthread_join(ourqi->thread, NULL);
> -    if (ret) {
> -        fuse_log(FUSE_LOG_ERR, "%s: Failed to join thread idx %d err %d\n",
> -                 __func__, qidx, ret);
> +    /* qidx == 1 is the notification queue if notifications are enabled */
> +    if (!se->notify_enabled || qidx != 1) {
> +        /* Kill the thread */
> +        if (eventfd_write(ourqi->kill_fd, 1)) {
> +            fuse_log(FUSE_LOG_ERR, "Eventfd_write for queue: %m\n");
> +        }
> +        ret = pthread_join(ourqi->thread, NULL);
> +        if (ret) {
> +            fuse_log(FUSE_LOG_ERR, "%s: Failed to join thread idx %d err"
> +                     " %d\n", __func__, qidx, ret);
> +        }
> +        close(ourqi->kill_fd);
>      }
>      pthread_mutex_destroy(&ourqi->vq_lock);
> -    close(ourqi->kill_fd);
>      ourqi->kick_fd = -1;
>      g_free(vud->qi[qidx]);
>      vud->qi[qidx] = NULL;
> @@ -757,6 +774,9 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
>  {
>      struct fv_VuDev *vud = container_of(dev, struct fv_VuDev, dev);
>      struct fv_QueueInfo *ourqi;
> +    int valid_queues = 2; /* One hiprio queue and one request queue */
> +    bool notification_q = false;
> +    struct fuse_session *se = vud->se;
>  
>      fuse_log(FUSE_LOG_INFO, "%s: qidx=%d started=%d\n", __func__, qidx,
>               started);
> @@ -768,10 +788,19 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
>       * well-behaved client in mind and may not protect against all types of
>       * races yet.
>       */
> -    if (qidx > 1) {
> -        fuse_log(FUSE_LOG_ERR,
> -                 "%s: multiple request queues not yet implemented, please only "
> -                 "configure 1 request queue\n",
> +    if (se->notify_enabled) {
> +        valid_queues++;
> +        /*
> +         * If notification queue is enabled, then qidx 1 is notificaiton queue.

s/notificaiton/notification/

> +         */
> +        if (qidx == 1) {
> +            notification_q = true;
> +        }
> +    }
> +
> +    if (qidx >= valid_queues) {
> +        fuse_log(FUSE_LOG_ERR, "%s: multiple request queues not yet"
> +                 "implemented, please only configure 1 request queue\n",
>                   __func__);
>          exit(EXIT_FAILURE);
>      }
> @@ -793,11 +822,18 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
>              assert(vud->qi[qidx]->kick_fd == -1);
>          }
>          ourqi = vud->qi[qidx];
> +        pthread_mutex_init(&ourqi->vq_lock, NULL);
> +        /*
> +         * For notification queue, we don't have to start a thread yet.
> +         */
> +        if (notification_q) {
> +            return;
> +        }
> +
>          ourqi->kick_fd = dev->vq[qidx].kick_fd;
>  
>          ourqi->kill_fd = eventfd(0, EFD_CLOEXEC | EFD_SEMAPHORE);
>          assert(ourqi->kill_fd != -1);
> -        pthread_mutex_init(&ourqi->vq_lock, NULL);
>  
>          if (pthread_create(&ourqi->thread, NULL, fv_queue_thread, ourqi)) {
>              fuse_log(FUSE_LOG_ERR, "%s: Failed to create thread for queue %d\n",
> @@ -1048,7 +1084,7 @@ int virtio_session_mount(struct fuse_session *se)
>      se->vu_socketfd = data_sock;
>      se->virtio_dev->se = se;
>      pthread_rwlock_init(&se->virtio_dev->vu_dispatch_rwlock, NULL);
> -    if (!vu_init(&se->virtio_dev->dev, 2, se->vu_socketfd, fv_panic, NULL,
> +    if (!vu_init(&se->virtio_dev->dev, 3, se->vu_socketfd, fv_panic, NULL,

The guest driver can invoke fv_queue_set_started() with qidx=2 even when
VIRTIO_FS_F_NOTIFICATION is off. Luckily the following check protects
fv_queue_set_started():

  if (qidx >= valid_queues) {
      fuse_log(FUSE_LOG_ERR, "%s: multiple request queues not yet"
               "implemented, please only configure 1 request queue\n",
               __func__);
      exit(EXIT_FAILURE);
  }

However, the error message suggests this is related to multiqueue. In
fact, we'll need to keep this check even once multiqueue has been
implemented. Maybe the error message should be tweaked or at least a
comment needs to be added to the code so this check isn't accidentally
removed once multiqueue is implemented.
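
For example, something along these lines (a hypothetical rewording, not
tested):

  if (qidx >= valid_queues) {
      /*
       * This check must stay even after multiqueue support is added:
       * it also rejects the notification queue index when
       * VIRTIO_FS_F_NOTIFICATION has not been negotiated.
       */
      fuse_log(FUSE_LOG_ERR, "%s: queue index %d out of range (max %d)\n",
               __func__, qidx, valid_queues - 1);
      exit(EXIT_FAILURE);
  }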

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 09/13] virtiofsd: Specify size of notification buffer using config space
  2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
@ 2021-10-04 14:33     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 106+ messages in thread
From: Stefan Hajnoczi @ 2021-10-04 14:33 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

[-- Attachment #1: Type: text/plain, Size: 5412 bytes --]

On Thu, Sep 30, 2021 at 11:30:33AM -0400, Vivek Goyal wrote:
> The daemon specifies the size of the notification buffer it needs, and
> that should be done using the config space.
> 
> Only the ->notify_buf_size value of the config space comes from the
> daemon; the rest is filled in by the qemu device emulation code.
> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
> ---
>  hw/virtio/vhost-user-fs.c                  | 27 +++++++++++++++++++
>  include/hw/virtio/vhost-user-fs.h          |  2 ++
>  include/standard-headers/linux/virtio_fs.h |  2 ++
>  tools/virtiofsd/fuse_virtio.c              | 31 ++++++++++++++++++++++
>  4 files changed, 62 insertions(+)
> 
> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> index 6bafcf0243..68a94708b4 100644
> --- a/hw/virtio/vhost-user-fs.c
> +++ b/hw/virtio/vhost-user-fs.c
> @@ -36,15 +36,41 @@ static const int user_feature_bits[] = {
>      VHOST_INVALID_FEATURE_BIT
>  };
>  
> +static int vhost_user_fs_handle_config_change(struct vhost_dev *dev)
> +{
> +    return 0;
> +}
> +
> +const VhostDevConfigOps fs_ops = {
> +    .vhost_dev_config_notifier = vhost_user_fs_handle_config_change,
> +};
> +
>  static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
>  {
>      VHostUserFS *fs = VHOST_USER_FS(vdev);
>      struct virtio_fs_config fscfg = {};
> +    Error *local_err = NULL;
> +    int ret;
> +
> +    /*
> +     * As of now we only get notification buffer size from device. And that's
> +     * needed only if notification queue is enabled.
> +     */
> +    if (fs->notify_enabled) {
> +        ret = vhost_dev_get_config(&fs->vhost_dev, (uint8_t *)&fs->fscfg,
> +                                   sizeof(struct virtio_fs_config),
> +                                   &local_err);
> +        if (ret) {
> +            error_report_err(local_err);
> +            return;
> +        }
> +    }
>  
>      memcpy((char *)fscfg.tag, fs->conf.tag,
>             MIN(strlen(fs->conf.tag) + 1, sizeof(fscfg.tag)));
>  
>      virtio_stl_p(vdev, &fscfg.num_request_queues, fs->conf.num_request_queues);
> +    virtio_stl_p(vdev, &fscfg.notify_buf_size, fs->fscfg.notify_buf_size);
>  
>      memcpy(config, &fscfg, sizeof(fscfg));
>  }
> @@ -316,6 +342,7 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
>                  sizeof(struct virtio_fs_config));
>  
>      vuf_create_vqs(vdev, true);
> +    vhost_dev_set_config_notifier(&fs->vhost_dev, &fs_ops);
>      ret = vhost_dev_init(&fs->vhost_dev, &fs->vhost_user,
>                           VHOST_BACKEND_TYPE_USER, 0, errp);
>      if (ret < 0) {
> diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
> index 95dc0dd402..3b114ee260 100644
> --- a/include/hw/virtio/vhost-user-fs.h
> +++ b/include/hw/virtio/vhost-user-fs.h
> @@ -14,6 +14,7 @@
>  #ifndef _QEMU_VHOST_USER_FS_H
>  #define _QEMU_VHOST_USER_FS_H
>  
> +#include "standard-headers/linux/virtio_fs.h"
>  #include "hw/virtio/virtio.h"
>  #include "hw/virtio/vhost.h"
>  #include "hw/virtio/vhost-user.h"
> @@ -37,6 +38,7 @@ struct VHostUserFS {
>      struct vhost_virtqueue *vhost_vqs;
>      struct vhost_dev vhost_dev;
>      VhostUserState vhost_user;
> +    struct virtio_fs_config fscfg;
>      VirtQueue **req_vqs;
>      VirtQueue *hiprio_vq;
>      VirtQueue *notification_vq;
> diff --git a/include/standard-headers/linux/virtio_fs.h b/include/standard-headers/linux/virtio_fs.h
> index b7f015186e..867d18acf6 100644
> --- a/include/standard-headers/linux/virtio_fs.h
> +++ b/include/standard-headers/linux/virtio_fs.h
> @@ -17,6 +17,8 @@ struct virtio_fs_config {
>  
>  	/* Number of request queues */
>  	uint32_t num_request_queues;
> +	/* Size of notification buffer */
> +	uint32_t notify_buf_size;
>  } QEMU_PACKED;
>  
>  /* For the id field in virtio_pci_shm_cap */

Please put all the include/standard-headers/linux/ changes into a single
commit that imports these changes from linux.git. Changes to this header
shouldn't be hand-written; use scripts/update-linux-headers.sh instead.

> diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> index f5b87a508a..3b720c5d4a 100644
> --- a/tools/virtiofsd/fuse_virtio.c
> +++ b/tools/virtiofsd/fuse_virtio.c
> @@ -856,6 +856,35 @@ static bool fv_queue_order(VuDev *dev, int qidx)
>      return false;
>  }
>  
> +static uint64_t fv_get_protocol_features(VuDev *dev)
> +{
> +    return 1ull << VHOST_USER_PROTOCOL_F_CONFIG;
> +}
> +
> +static int fv_get_config(VuDev *dev, uint8_t *config, uint32_t len)
> +{
> +    struct virtio_fs_config fscfg = {};
> +    unsigned notify_size, roundto = 64;
> +    union fuse_notify_union {
> +        struct fuse_notify_poll_wakeup_out  wakeup_out;
> +        struct fuse_notify_inval_inode_out  inode_out;
> +        struct fuse_notify_inval_entry_out  entry_out;
> +        struct fuse_notify_delete_out       delete_out;
> +        struct fuse_notify_store_out        store_out;
> +        struct fuse_notify_retrieve_out     retrieve_out;
> +    };
> +
> +    notify_size = sizeof(struct fuse_out_header) +
> +              sizeof(union fuse_notify_union);
> +    notify_size = ((notify_size + roundto) / roundto) * roundto;

Why is the size rounded to 64 bytes?
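
As an aside, the expression above rounds an already-aligned size up to
the next multiple (a notify_size of 64 becomes 128). If plain alignment
is the intent, QEMU's ROUND_UP() macro from qemu/osdep.h would be the
usual idiom, assuming it is usable here:

  notify_size = sizeof(struct fuse_out_header) +
                sizeof(union fuse_notify_union);
  notify_size = ROUND_UP(notify_size, 64);   /* 64 -> 64, 65 -> 128 */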

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 10/13] virtiofsd: Custom threadpool for remote blocking posix locks requests
  2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
@ 2021-10-04 14:54     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 106+ messages in thread
From: Stefan Hajnoczi @ 2021-10-04 14:54 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

[-- Attachment #1: Type: text/plain, Size: 3902 bytes --]

On Thu, Sep 30, 2021 at 11:30:34AM -0400, Vivek Goyal wrote:
> Add a new custom threadpool using POSIX threads that specifically
> services locking requests.
> 
> In the case of an fcntl(F_SETLKW) request, if the guest is waiting
> for a lock or locks and issues a hard reboot through SYSRQ, then
> virtiofsd unblocks the blocked threads by sending them a signal and
> waking them up.
> 
> The current threadpool (GThreadPool) is not adequate to service the
> locking requests that result in a thread blocking. That is because
> GLib does not provide an API to cancel a request while it is being
> serviced by a thread. In addition, a user might be running virtiofsd
> without a threadpool (--thread-pool-size=0); in that case a locking
> request that blocks would prevent the main virtqueue thread from
> servicing any other requests.
> 
> The only exception occurs when the lock is of type F_UNLCK. In this
> case the request is serviced by the main virtqueue thread or a
> GThreadPool thread, to avoid a deadlock when all the threads in the
> custom threadpool are blocked.
> 
> Then virtiofsd proceeds to clean up the state of the threads, release
> them back to the system, and re-initialize.

Is there another way to cancel SETLKW without resorting to a new thread
pool? Since this only matters when shutting down or restarting, can we
close all plock->fd file descriptors to kick the GThreadPool workers out
of fcntl()?
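
For reference, the unblocking mechanism the custom pool relies on is the
plain signal/EINTR behaviour of blocking fcntl() locks. A minimal,
standalone sketch (hypothetical file path and signal choice; it uses OFD
locks, as virtiofsd does, so that two open file descriptions conflict
even within a single process):

  #define _GNU_SOURCE
  #include <errno.h>
  #include <fcntl.h>
  #include <pthread.h>
  #include <signal.h>
  #include <stdio.h>
  #include <unistd.h>

  static void noop_handler(int sig)
  {
      (void)sig; /* installed without SA_RESTART, so fcntl() fails with EINTR */
  }

  static void *lock_worker(void *arg)
  {
      int fd = *(int *)arg;
      struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };

      /* Blocks: the other open file description already holds the lock */
      if (fcntl(fd, F_OFD_SETLKW, &fl) == -1 && errno == EINTR) {
          printf("worker: F_OFD_SETLKW interrupted, unwinding\n");
      }
      return NULL;
  }

  int main(void)
  {
      struct sigaction sa = { .sa_handler = noop_handler };
      struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
      int fd1 = open("/tmp/demo.lock", O_RDWR | O_CREAT, 0600);
      int fd2 = open("/tmp/demo.lock", O_RDWR);
      pthread_t t;

      sigaction(SIGUSR1, &sa, NULL);
      fcntl(fd1, F_OFD_SETLK, &fl);   /* first description takes the lock */
      pthread_create(&t, NULL, lock_worker, &fd2);
      sleep(1);                       /* let the worker block in fcntl() */
      pthread_kill(t, SIGUSR1);       /* what the pool does on hard reboot */
      pthread_join(t, NULL);
      return 0;
  }

Note that close()-ing an fd from another thread is not, in general,
guaranteed to wake a thread that is already blocked in a syscall on that
fd, which may be why the series opts for signals instead.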

> 
> Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  tools/virtiofsd/fuse_virtio.c         |  90 ++++++-
>  tools/virtiofsd/meson.build           |   1 +
>  tools/virtiofsd/passthrough_seccomp.c |   1 +
>  tools/virtiofsd/tpool.c               | 331 ++++++++++++++++++++++++++
>  tools/virtiofsd/tpool.h               |  18 ++
>  5 files changed, 440 insertions(+), 1 deletion(-)
>  create mode 100644 tools/virtiofsd/tpool.c
>  create mode 100644 tools/virtiofsd/tpool.h
> 
> diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> index 3b720c5d4a..c67c2e0e7a 100644
> --- a/tools/virtiofsd/fuse_virtio.c
> +++ b/tools/virtiofsd/fuse_virtio.c
> @@ -20,6 +20,7 @@
>  #include "fuse_misc.h"
>  #include "fuse_opt.h"
>  #include "fuse_virtio.h"
> +#include "tpool.h"
>  
>  #include <sys/eventfd.h>
>  #include <sys/socket.h>
> @@ -612,6 +613,60 @@ out:
>      free(req);
>  }
>  
> +/*
> + * If the request is a locking request, use a custom locking thread pool.
> + */
> +static bool use_lock_tpool(gpointer data, gpointer user_data)
> +{
> +    struct fv_QueueInfo *qi = user_data;
> +    struct fuse_session *se = qi->virtio_dev->se;
> +    FVRequest *req = data;
> +    VuVirtqElement *elem = &req->elem;
> +    struct fuse_buf fbuf = {};
> +    struct fuse_in_header *inhp;
> +    struct fuse_lk_in *lkinp;
> +    size_t lk_req_len;
> +    /* The 'out' part of the elem is from qemu */
> +    unsigned int out_num = elem->out_num;
> +    struct iovec *out_sg = elem->out_sg;
> +    size_t out_len = iov_size(out_sg, out_num);
> +    bool use_custom_tpool = false;
> +
> +    /*
> +     * If notifications are not enabled, no point in using custom lock
> +     * thread pool.
> +     */
> +    if (!se->notify_enabled) {
> +        return false;
> +    }
> +
> +    assert(se->bufsize > sizeof(struct fuse_in_header));
> +    lk_req_len = sizeof(struct fuse_in_header) + sizeof(struct fuse_lk_in);
> +
> +    if (out_len < lk_req_len) {
> +        return false;
> +    }
> +
> +    fbuf.mem = g_malloc(se->bufsize);
> +    copy_from_iov(&fbuf, out_num, out_sg, lk_req_len);

This looks inefficient: for every FUSE request we now malloc se->bufsize
and then copy lk_req_len bytes, only to free the memory again.

Is it possible to keep lk_req_len bytes on the stack instead?
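
Something along these lines, perhaps (an untested sketch; iov_to_buf()
comes from qemu/iov.h, which this file already includes, and there is no
padding between the two members because fuse_in_header is a multiple of
8 bytes):

  struct {
      struct fuse_in_header inh;
      struct fuse_lk_in lkin;
  } hdr;

  if (iov_to_buf(out_sg, out_num, 0, &hdr, sizeof(hdr)) != sizeof(hdr)) {
      return false;
  }
  if (hdr.inh.opcode != FUSE_SETLKW) {
      return false;
  }
  /* ... inspect hdr.lkin instead of the heap copy ... */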

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 11/13] virtiofsd: Shutdown notification queue in the end
  2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
@ 2021-10-04 15:01     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 106+ messages in thread
From: Stefan Hajnoczi @ 2021-10-04 15:01 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

[-- Attachment #1: Type: text/plain, Size: 4192 bytes --]

On Thu, Sep 30, 2021 at 11:30:35AM -0400, Vivek Goyal wrote:
> So far we did not have the notion of cross-queue traffic. That is, we
> get a request on a queue and send the response back on the same queue.
> So if a request is being processed and a stop queue request comes in at
> the same time, we wait for all pending requests to finish, and then the
> queue is stopped and the associated data structures are cleaned up.
> 
> But with notification queue, now it is possible that we get a locking
> request on request queue and send the notification back on a different
> queue (notificaiton queue). This means, we need to make sure that

s/notificaiton/notification/

> notifiation queue has not already been shutdown or is not being

s/notifiation/notification/

> shutdown in parallel while we are trying to send a notification back.
> Otherwise bad things are bound to happen.
> 
> One way to solve this problem is to stop the notification queue last:
> first stop hiprio and all request queues. That means that by the time
> we are trying to stop the notification queue, we know no other request
> can be in progress that could try to send something on the notification
> queue.
> 
> But the problem is that currently we don't have any control over the
> order in which queues are stopped. If there was a notion of the whole
> device being stopped, then we could decide in what order queues should
> be stopped.
> 
> Stefan mentioned that there is a command to stop the whole device,
> VHOST_USER_SET_STATUS, but it is not implemented in libvhost-user yet.
> Also, we probably could not move away from the per-queue stop logic we
> have as of now.
> 
> As an alternative, he said that if we stop all queues when qidx 0 is
> being stopped, it should be fine, and we can solve the issue of
> notification queue shutdown order.
> 
> So in this patch I am shutting down all queues when queue 0 is being
> shut down, and I have changed the shutdown order in such a way that the
> notification queue is shut down last.
> 
> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  tools/virtiofsd/fuse_virtio.c | 27 ++++++++++++++++++++++++++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> index c67c2e0e7a..a87e88e286 100644
> --- a/tools/virtiofsd/fuse_virtio.c
> +++ b/tools/virtiofsd/fuse_virtio.c
> @@ -826,6 +826,11 @@ static void fv_queue_cleanup_thread(struct fv_VuDev *vud, int qidx)
>      assert(qidx < vud->nqueues);
>      ourqi = vud->qi[qidx];
>  
> +    /* Queue is already stopped */
> +    if (!ourqi) {
> +        return;
> +    }
> +
>      /* qidx == 1 is the notification queue if notifications are enabled */
>      if (!se->notify_enabled || qidx != 1) {
>          /* Kill the thread */
> @@ -847,14 +852,25 @@ static void fv_queue_cleanup_thread(struct fv_VuDev *vud, int qidx)
>  
>  static void stop_all_queues(struct fv_VuDev *vud)
>  {
> +    struct fuse_session *se = vud->se;
> +
>      for (int i = 0; i < vud->nqueues; i++) {
>          if (!vud->qi[i]) {
>              continue;
>          }
>  
> +        /* Shutdown notification queue in the end */
> +        if (se->notify_enabled && i == 1) {
> +            continue;
> +        }
>          fuse_log(FUSE_LOG_INFO, "%s: Stopping queue %d thread\n", __func__, i);
>          fv_queue_cleanup_thread(vud, i);
>      }
> +
> +    if (se->notify_enabled) {
> +        fuse_log(FUSE_LOG_INFO, "%s: Stopping queue %d thread\n", __func__, 1);
> +        fv_queue_cleanup_thread(vud, 1);
> +    }
>  }
>  
>  /* Callback from libvhost-user on start or stop of a queue */
> @@ -934,7 +950,16 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
>           * the queue thread doesn't block in virtio_send_msg().
>           */
>          vu_dispatch_unlock(vud);
> -        fv_queue_cleanup_thread(vud, qidx);
> +
> +        /*
> +         * If queue 0 is being shutdown, treat it as if device is being
> +         * shutdown and stop all queues.
> +         */

Please expand this comment so it's clear why we do this.
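
Perhaps something like (one possible wording):

  /*
   * If queue 0 is being shut down, treat it as if the whole device is
   * being shut down and stop all queues. Locking requests taken from a
   * request queue may send their reply on the notification queue, so
   * the notification queue must be stopped only after every request
   * queue has been quiesced; stop_all_queues() takes care of that
   * ordering.
   */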

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 12/13] virtiofsd: Implement blocking posix locks
  2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
@ 2021-10-04 15:07     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 106+ messages in thread
From: Stefan Hajnoczi @ 2021-10-04 15:07 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

[-- Attachment #1: Type: text/plain, Size: 2932 bytes --]

On Thu, Sep 30, 2021 at 11:30:36AM -0400, Vivek Goyal wrote:
> As of now we don't support fcntl(F_SETLKW), and if we see one, we
> return -EOPNOTSUPP.
> 
> Change that by accepting these requests and returning a reply
> immediately asking the caller to wait. Once the lock is available, send
> a notification to the waiter indicating that the lock is available.
> 
> In response to a lock request, we return the error value "1", which
> signals the client to queue the lock request internally; later the
> client will get a notification signalling that the lock was taken (or
> an error), and the fuse client should then wake up the guest process.
> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.c  | 37 ++++++++++++++++-
>  tools/virtiofsd/fuse_lowlevel.h  | 26 ++++++++++++
>  tools/virtiofsd/fuse_virtio.c    | 50 ++++++++++++++++++++---
>  tools/virtiofsd/passthrough_ll.c | 70 ++++++++++++++++++++++++++++----
>  4 files changed, 167 insertions(+), 16 deletions(-)
> 
> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> index e4679c73ab..2e7f4b786d 100644
> --- a/tools/virtiofsd/fuse_lowlevel.c
> +++ b/tools/virtiofsd/fuse_lowlevel.c
> @@ -179,8 +179,8 @@ int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
>          .unique = req->unique,
>          .error = error,
>      };
> -
> -    if (error <= -1000 || error > 0) {
> +    /* error = 1 has been used to signal client to wait for notificaiton */

s/notificaiton/notification/

> +    if (error <= -1000 || error > 1) {
>          fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n", error);
>          out.error = -ERANGE;
>      }
> @@ -290,6 +290,11 @@ int fuse_reply_err(fuse_req_t req, int err)
>      return send_reply(req, -err, NULL, 0);
>  }
>  
> +int fuse_reply_wait(fuse_req_t req)
> +{
> +    return send_reply(req, 1, NULL, 0);
> +}
> +
>  void fuse_reply_none(fuse_req_t req)
>  {
>      fuse_free_req(req);
> @@ -2165,6 +2170,34 @@ static void do_destroy(fuse_req_t req, fuse_ino_t nodeid,
>      send_reply_ok(req, NULL, 0);
>  }
>  
> +static int send_notify_iov(struct fuse_session *se, int notify_code,
> +                           struct iovec *iov, int count)
> +{
> +    struct fuse_out_header out;
> +    if (!se->got_init) {
> +        return -ENOTCONN;
> +    }
> +    out.unique = 0;
> +    out.error = notify_code;

Please fully initialize all fuse_out_header fields so it's obvious that
there is no accidental information leak from virtiofsd to the guest:

  struct fuse_out_header out = {
      .error = notify_code,
  };

The host must not expose uninitialized memory to the guest (just like
the kernel vs userspace). fuse_send_msg() initializes out.len later, but
to be on the safe side I think we should be explicit here.
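
For example, spelled out with all three fields (a sketch only;
fuse_send_msg() recomputes .len before the message goes out):

  struct fuse_out_header out = {
      .len    = 0,               /* recomputed by fuse_send_msg() */
      .error  = notify_code,
      .unique = 0,               /* notifications carry no request id */
  };

A designated initializer zeroes every field that is not named
explicitly, so nothing uninitialized can reach the guest even if
fuse_out_header grows new fields later.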


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 06/13] vhost-user-fs: Use helpers to create/cleanup virtqueue
  2021-10-04 13:54     ` [Virtio-fs] " Stefan Hajnoczi
@ 2021-10-04 19:58       ` Vivek Goyal
  -1 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-10-04 19:58 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

On Mon, Oct 04, 2021 at 02:54:17PM +0100, Stefan Hajnoczi wrote:
> On Thu, Sep 30, 2021 at 11:30:30AM -0400, Vivek Goyal wrote:
> > Add helpers to create/cleanup virtuqueues and use those helpers. I will
> 
> s/virtuqueues/virtqueues/
> 
> > need to reconfigure queues in later patches and using helpers will allow
> > reusing the code.
> > 
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  hw/virtio/vhost-user-fs.c | 87 +++++++++++++++++++++++----------------
> >  1 file changed, 52 insertions(+), 35 deletions(-)
> > 
> > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> > index c595957983..d1efbc5b18 100644
> > --- a/hw/virtio/vhost-user-fs.c
> > +++ b/hw/virtio/vhost-user-fs.c
> > @@ -139,6 +139,55 @@ static void vuf_set_status(VirtIODevice *vdev, uint8_t status)
> >      }
> >  }
> >  
> > +static void vuf_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> > +{
> > +    /*
> > +     * Not normally called; it's the daemon that handles the queue;
> > +     * however virtio's cleanup path can call this.
> > +     */
> > +}
> > +
> > +static void vuf_create_vqs(VirtIODevice *vdev)
> > +{
> > +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> > +    unsigned int i;
> > +
> > +    /* Hiprio queue */
> > +    fs->hiprio_vq = virtio_add_queue(vdev, fs->conf.queue_size,
> > +                                     vuf_handle_output);
> > +
> > +    /* Request queues */
> > +    fs->req_vqs = g_new(VirtQueue *, fs->conf.num_request_queues);
> > +    for (i = 0; i < fs->conf.num_request_queues; i++) {
> > +        fs->req_vqs[i] = virtio_add_queue(vdev, fs->conf.queue_size,
> > +                                          vuf_handle_output);
> > +    }
> > +
> > +    /* 1 high prio queue, plus the number configured */
> > +    fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
> > +    fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
> 
> These two lines prepare for vhost_dev_init(), so moving them here is
> debatable. If a caller is going to use this function again in the future
> then they need to be sure to also call vhost_dev_init(). For now it
> looks safe, so I guess it's okay.

Hmm..., I do call this function later from vuf_set_features() and
reconfigure the queues. I see that I don't call vhost_dev_init()
in that path. I am not even sure if I should be calling
vhost_dev_init() from inside vuf_set_features().

So the core requirement is that at the time of first creating the
device I have no idea whether the driver supports the notification
queue or not. So I create the device with a notification queue. But
later, if the driver (and possibly the vhost device) does not support
the notification queue, then we need to reconfigure the queues. What's
the correct way to do that?

Thanks
Vivek
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>




^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 08/13] virtiofsd: Create a notification queue
  2021-10-04 14:30     ` [Virtio-fs] " Stefan Hajnoczi
@ 2021-10-04 21:01       ` Vivek Goyal
  -1 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-10-04 21:01 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

On Mon, Oct 04, 2021 at 03:30:38PM +0100, Stefan Hajnoczi wrote:
> On Thu, Sep 30, 2021 at 11:30:32AM -0400, Vivek Goyal wrote:
> > Add a notification queue which will be used to send async notifications
> > for file lock availability.
> > 
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
> > ---
> >  hw/virtio/vhost-user-fs-pci.c     |  4 +-
> >  hw/virtio/vhost-user-fs.c         | 62 +++++++++++++++++++++++++--
> >  include/hw/virtio/vhost-user-fs.h |  2 +
> >  tools/virtiofsd/fuse_i.h          |  1 +
> >  tools/virtiofsd/fuse_virtio.c     | 70 +++++++++++++++++++++++--------
> >  5 files changed, 116 insertions(+), 23 deletions(-)
> > 
> > diff --git a/hw/virtio/vhost-user-fs-pci.c b/hw/virtio/vhost-user-fs-pci.c
> > index 2ed8492b3f..cdb9471088 100644
> > --- a/hw/virtio/vhost-user-fs-pci.c
> > +++ b/hw/virtio/vhost-user-fs-pci.c
> > @@ -41,8 +41,8 @@ static void vhost_user_fs_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
> >      DeviceState *vdev = DEVICE(&dev->vdev);
> >  
> >      if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
> > -        /* Also reserve config change and hiprio queue vectors */
> > -        vpci_dev->nvectors = dev->vdev.conf.num_request_queues + 2;
> > +        /* Also reserve config change, hiprio and notification queue vectors */
> > +        vpci_dev->nvectors = dev->vdev.conf.num_request_queues + 3;
> >      }
> >  
> >      qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
> > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> > index d1efbc5b18..6bafcf0243 100644
> > --- a/hw/virtio/vhost-user-fs.c
> > +++ b/hw/virtio/vhost-user-fs.c
> > @@ -31,6 +31,7 @@ static const int user_feature_bits[] = {
> >      VIRTIO_F_NOTIFY_ON_EMPTY,
> >      VIRTIO_F_RING_PACKED,
> >      VIRTIO_F_IOMMU_PLATFORM,
> > +    VIRTIO_FS_F_NOTIFICATION,
> >  
> >      VHOST_INVALID_FEATURE_BIT
> >  };
> > @@ -147,7 +148,7 @@ static void vuf_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> >       */
> >  }
> >  
> > -static void vuf_create_vqs(VirtIODevice *vdev)
> > +static void vuf_create_vqs(VirtIODevice *vdev, bool notification_vq)
> >  {
> >      VHostUserFS *fs = VHOST_USER_FS(vdev);
> >      unsigned int i;
> > @@ -155,6 +156,15 @@ static void vuf_create_vqs(VirtIODevice *vdev)
> >      /* Hiprio queue */
> >      fs->hiprio_vq = virtio_add_queue(vdev, fs->conf.queue_size,
> >                                       vuf_handle_output);
> > +    /*
> > +     * Notification queue. Feature negotiation happens later. So at this
> > +     * point of time we don't know if driver will use notification queue
> > +     * or not.
> > +     */
> > +    if (notification_vq) {
> > +        fs->notification_vq = virtio_add_queue(vdev, fs->conf.queue_size,
> > +                                               vuf_handle_output);
> > +    }
> >  
> >      /* Request queues */
> >      fs->req_vqs = g_new(VirtQueue *, fs->conf.num_request_queues);
> > @@ -163,8 +173,12 @@ static void vuf_create_vqs(VirtIODevice *vdev)
> >                                            vuf_handle_output);
> >      }
> >  
> > -    /* 1 high prio queue, plus the number configured */
> > -    fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
> > +    /* 1 high prio queue, 1 notification queue plus the number configured */
> > +    if (notification_vq) {
> > +        fs->vhost_dev.nvqs = 2 + fs->conf.num_request_queues;
> > +    } else {
> > +        fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
> > +    }
> >      fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
> >  }
> >  
> > @@ -176,6 +190,11 @@ static void vuf_cleanup_vqs(VirtIODevice *vdev)
> >      virtio_delete_queue(fs->hiprio_vq);
> >      fs->hiprio_vq = NULL;
> >  
> > +    if (fs->notification_vq) {
> > +        virtio_delete_queue(fs->notification_vq);
> > +    }
> > +    fs->notification_vq = NULL;
> > +
> >      for (i = 0; i < fs->conf.num_request_queues; i++) {
> >          virtio_delete_queue(fs->req_vqs[i]);
> >      }
> > @@ -194,9 +213,43 @@ static uint64_t vuf_get_features(VirtIODevice *vdev,
> >  {
> >      VHostUserFS *fs = VHOST_USER_FS(vdev);
> >  
> > +    virtio_add_feature(&features, VIRTIO_FS_F_NOTIFICATION);
> > +
> >      return vhost_get_features(&fs->vhost_dev, user_feature_bits, features);
> >  }
> >  
> > +static void vuf_set_features(VirtIODevice *vdev, uint64_t features)
> > +{
> > +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> > +
> > +    if (virtio_has_feature(features, VIRTIO_FS_F_NOTIFICATION)) {
> > +        fs->notify_enabled = true;
> > +        /*
> > +         * If guest first booted with no notification queue support and
> > +         * later rebooted with kernel which supports notification, we
> > +         * can end up here
> > +         */
> > +        if (!fs->notification_vq) {
> > +            vuf_cleanup_vqs(vdev);
> > +            vuf_create_vqs(vdev, true);
> > +        }
> 
> I would simplify things by unconditionally creating the notification vq
> for the device and letting the vhost-user device backend decide whether
> it wants to handle the vq or not.
> If the backend doesn't implement the
> vq then it also won't advertise VIRTIO_FS_F_NOTIFICATION so the guest
> driver won't submit virtqueue buffers.

I think I did not understand the idea. This code deals with the case
where both qemu and the vhost-user device can handle the notification
queue but the driver can't.

So if we first boot into a guest kernel which does not support the
notification queue, we will not have instantiated it. But if we later
reboot the guest into a newer kernel which can deal with notification
queues, we create the queue at that point.

IIUC, you are suggesting that we somehow keep the notification queue
instantiated even if the guest driver does not support notifications,
so that we do not have to go through the exercise of cleaning up the
queues and re-instantiating them?

But I think we can't keep the notification queue around if the driver
does not support it, because that changes the queue indexes: queue
index 1 belongs to a request queue if notifications are not enabled,
otherwise it belongs to the notification queue. So if I always
instantiate the notification queue, the guest and qemu/virtiofsd will
have a different understanding of which queue index belongs to which
queue; see the layout below.
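
Assuming a single request queue, the two layouts are:

      index | notifications off | notifications on
      ------+-------------------+-----------------
        0   | hiprio            | hiprio
        1   | request           | notification
        2   | (none)            | request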

I have probably misunderstood what you are suggesting. If you can
explain in a bit more detail, that will help.

Thanks
Vivek

> 
> I'm not 100% sure if that approach works. It should be tested with a
> virtiofsd that doesn't implement the notification vq, for example. But I
> think it's worth exploring that because the code will be simpler than
> worrying about whether notifications are enabled or disabled.
> 
> > +        return;
> > +    }
> > +
> > +    fs->notify_enabled = false;
> > +    if (!fs->notification_vq) {
> > +        return;
> > +    }
> > +    /*
> > +     * Driver does not support notification queue. Reconfigure queues
> > +     * and do not create notification queue.
> > +     */
> > +    vuf_cleanup_vqs(vdev);
> > +
> > +    /* Create queues again */
> > +    vuf_create_vqs(vdev, false);
> > +}
> > +
> >  static void vuf_guest_notifier_mask(VirtIODevice *vdev, int idx,
> >                                              bool mask)
> >  {
> > @@ -262,7 +315,7 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
> >      virtio_init(vdev, "vhost-user-fs", VIRTIO_ID_FS,
> >                  sizeof(struct virtio_fs_config));
> >  
> > -    vuf_create_vqs(vdev);
> > +    vuf_create_vqs(vdev, true);
> >      ret = vhost_dev_init(&fs->vhost_dev, &fs->vhost_user,
> >                           VHOST_BACKEND_TYPE_USER, 0, errp);
> >      if (ret < 0) {
> > @@ -327,6 +380,7 @@ static void vuf_class_init(ObjectClass *klass, void *data)
> >      vdc->realize = vuf_device_realize;
> >      vdc->unrealize = vuf_device_unrealize;
> >      vdc->get_features = vuf_get_features;
> > +    vdc->set_features = vuf_set_features;
> >      vdc->get_config = vuf_get_config;
> >      vdc->set_status = vuf_set_status;
> >      vdc->guest_notifier_mask = vuf_guest_notifier_mask;
> > diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
> > index 0d62834c25..95dc0dd402 100644
> > --- a/include/hw/virtio/vhost-user-fs.h
> > +++ b/include/hw/virtio/vhost-user-fs.h
> > @@ -39,7 +39,9 @@ struct VHostUserFS {
> >      VhostUserState vhost_user;
> >      VirtQueue **req_vqs;
> >      VirtQueue *hiprio_vq;
> > +    VirtQueue *notification_vq;
> >      int32_t bootindex;
> > +    bool notify_enabled;
> >  
> >      /*< public >*/
> >  };
> > diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
> > index 492e002181..4942d080da 100644
> > --- a/tools/virtiofsd/fuse_i.h
> > +++ b/tools/virtiofsd/fuse_i.h
> > @@ -73,6 +73,7 @@ struct fuse_session {
> >      int   vu_socketfd;
> >      struct fv_VuDev *virtio_dev;
> >      int thread_pool_size;
> > +    bool notify_enabled;
> >  };
> >  
> >  struct fuse_chan {
> > diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> > index baead08b28..f5b87a508a 100644
> > --- a/tools/virtiofsd/fuse_virtio.c
> > +++ b/tools/virtiofsd/fuse_virtio.c
> > @@ -14,6 +14,7 @@
> >  #include "qemu/osdep.h"
> >  #include "qemu/iov.h"
> >  #include "qapi/error.h"
> > +#include "standard-headers/linux/virtio_fs.h"
> >  #include "fuse_i.h"
> >  #include "standard-headers/linux/fuse.h"
> >  #include "fuse_misc.h"
> > @@ -85,12 +86,25 @@ struct fv_VuDev {
> >  /* Callback from libvhost-user */
> >  static uint64_t fv_get_features(VuDev *dev)
> >  {
> > -    return 1ULL << VIRTIO_F_VERSION_1;
> > +    uint64_t features;
> > +
> > +    features = 1ull << VIRTIO_F_VERSION_1 |
> > +               1ull << VIRTIO_FS_F_NOTIFICATION;
> > +
> > +    return features;
> >  }
> >  
> >  /* Callback from libvhost-user */
> >  static void fv_set_features(VuDev *dev, uint64_t features)
> >  {
> > +    struct fv_VuDev *vud = container_of(dev, struct fv_VuDev, dev);
> > +    struct fuse_session *se = vud->se;
> > +
> > +    if ((1ull << VIRTIO_FS_F_NOTIFICATION) & features) {
> > +        se->notify_enabled = true;
> > +    } else {
> > +        se->notify_enabled = false;
> > +    }
> >  }
> >  
> >  /*
> > @@ -719,22 +733,25 @@ static void fv_queue_cleanup_thread(struct fv_VuDev *vud, int qidx)
> >  {
> >      int ret;
> >      struct fv_QueueInfo *ourqi;
> > +    struct fuse_session *se = vud->se;
> >  
> >      assert(qidx < vud->nqueues);
> >      ourqi = vud->qi[qidx];
> >  
> > -    /* Kill the thread */
> > -    if (eventfd_write(ourqi->kill_fd, 1)) {
> > -        fuse_log(FUSE_LOG_ERR, "Eventfd_write for queue %d: %s\n",
> > -                 qidx, strerror(errno));
> > -    }
> > -    ret = pthread_join(ourqi->thread, NULL);
> > -    if (ret) {
> > -        fuse_log(FUSE_LOG_ERR, "%s: Failed to join thread idx %d err %d\n",
> > -                 __func__, qidx, ret);
> > +    /* qidx == 1 is the notification queue if notifications are enabled */
> > +    if (!se->notify_enabled || qidx != 1) {
> > +        /* Kill the thread */
> > +        if (eventfd_write(ourqi->kill_fd, 1)) {
> > +            fuse_log(FUSE_LOG_ERR, "Eventfd_read for queue: %m\n");
> > +        }
> > +        ret = pthread_join(ourqi->thread, NULL);
> > +        if (ret) {
> > +            fuse_log(FUSE_LOG_ERR, "%s: Failed to join thread idx %d err"
> > +                     " %d\n", __func__, qidx, ret);
> > +        }
> > +        close(ourqi->kill_fd);
> >      }
> >      pthread_mutex_destroy(&ourqi->vq_lock);
> > -    close(ourqi->kill_fd);
> >      ourqi->kick_fd = -1;
> >      g_free(vud->qi[qidx]);
> >      vud->qi[qidx] = NULL;
> > @@ -757,6 +774,9 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
> >  {
> >      struct fv_VuDev *vud = container_of(dev, struct fv_VuDev, dev);
> >      struct fv_QueueInfo *ourqi;
> > +    int valid_queues = 2; /* One hiprio queue and one request queue */
> > +    bool notification_q = false;
> > +    struct fuse_session *se = vud->se;
> >  
> >      fuse_log(FUSE_LOG_INFO, "%s: qidx=%d started=%d\n", __func__, qidx,
> >               started);
> > @@ -768,10 +788,19 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
> >       * well-behaved client in mind and may not protect against all types of
> >       * races yet.
> >       */
> > -    if (qidx > 1) {
> > -        fuse_log(FUSE_LOG_ERR,
> > -                 "%s: multiple request queues not yet implemented, please only "
> > -                 "configure 1 request queue\n",
> > +    if (se->notify_enabled) {
> > +        valid_queues++;
> > +        /*
> > +         * If notification queue is enabled, then qidx 1 is notificaiton queue.
> 
> s/notificaiton/notification/
> 
> > +         */
> > +        if (qidx == 1) {
> > +            notification_q = true;
> > +        }
> > +    }
> > +
> > +    if (qidx >= valid_queues) {
> > +        fuse_log(FUSE_LOG_ERR, "%s: multiple request queues not yet"
> > +                 "implemented, please only configure 1 request queue\n",
> >                   __func__);
> >          exit(EXIT_FAILURE);
> >      }
> > @@ -793,11 +822,18 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
> >              assert(vud->qi[qidx]->kick_fd == -1);
> >          }
> >          ourqi = vud->qi[qidx];
> > +        pthread_mutex_init(&ourqi->vq_lock, NULL);
> > +        /*
> > +         * For notification queue, we don't have to start a thread yet.
> > +         */
> > +        if (notification_q) {
> > +            return;
> > +        }
> > +
> >          ourqi->kick_fd = dev->vq[qidx].kick_fd;
> >  
> >          ourqi->kill_fd = eventfd(0, EFD_CLOEXEC | EFD_SEMAPHORE);
> >          assert(ourqi->kill_fd != -1);
> > -        pthread_mutex_init(&ourqi->vq_lock, NULL);
> >  
> >          if (pthread_create(&ourqi->thread, NULL, fv_queue_thread, ourqi)) {
> >              fuse_log(FUSE_LOG_ERR, "%s: Failed to create thread for queue %d\n",
> > @@ -1048,7 +1084,7 @@ int virtio_session_mount(struct fuse_session *se)
> >      se->vu_socketfd = data_sock;
> >      se->virtio_dev->se = se;
> >      pthread_rwlock_init(&se->virtio_dev->vu_dispatch_rwlock, NULL);
> > -    if (!vu_init(&se->virtio_dev->dev, 2, se->vu_socketfd, fv_panic, NULL,
> > +    if (!vu_init(&se->virtio_dev->dev, 3, se->vu_socketfd, fv_panic, NULL,
> 
> The guest driver can invoke fv_queue_set_started() with qidx=2 even when
> VIRTIO_FS_F_NOTIFICATION is off. Luckily the following check protects
> fv_queue_set_started():
> 
>   if (qidx >= valid_queues) {
>       fuse_log(FUSE_LOG_ERR, "%s: multiple request queues not yet"
>                "implemented, please only configure 1 request queue\n",
>                __func__);
>       exit(EXIT_FAILURE);
>   }
> 
> However, the error message suggests this is related to multiqueue. In
> fact, we'll need to keep this check even once multiqueue has been
> implemented. Maybe the error message should be tweaked or at least a
> comment needs to be added to the code so this check isn't accidentally
> removed once multiqueue is implemented.




^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 09/13] virtiofsd: Specify size of notification buffer using config space
  2021-10-04 14:33     ` [Virtio-fs] " Stefan Hajnoczi
@ 2021-10-04 21:10       ` Vivek Goyal
  -1 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-10-04 21:10 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

On Mon, Oct 04, 2021 at 03:33:47PM +0100, Stefan Hajnoczi wrote:
> On Thu, Sep 30, 2021 at 11:30:33AM -0400, Vivek Goyal wrote:
> > Daemon specifies size of notification buffer needed and that should be
> > done using config space.
> > 
> > Only ->notify_buf_size value of config space comes from daemon. Rest of
> > it is filled by qemu device emulation code.
> > 
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
> > ---
> >  hw/virtio/vhost-user-fs.c                  | 27 +++++++++++++++++++
> >  include/hw/virtio/vhost-user-fs.h          |  2 ++
> >  include/standard-headers/linux/virtio_fs.h |  2 ++
> >  tools/virtiofsd/fuse_virtio.c              | 31 ++++++++++++++++++++++
> >  4 files changed, 62 insertions(+)
> > 
> > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> > index 6bafcf0243..68a94708b4 100644
> > --- a/hw/virtio/vhost-user-fs.c
> > +++ b/hw/virtio/vhost-user-fs.c
> > @@ -36,15 +36,41 @@ static const int user_feature_bits[] = {
> >      VHOST_INVALID_FEATURE_BIT
> >  };
> >  
> > +static int vhost_user_fs_handle_config_change(struct vhost_dev *dev)
> > +{
> > +    return 0;
> > +}
> > +
> > +const VhostDevConfigOps fs_ops = {
> > +    .vhost_dev_config_notifier = vhost_user_fs_handle_config_change,
> > +};
> > +
> >  static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
> >  {
> >      VHostUserFS *fs = VHOST_USER_FS(vdev);
> >      struct virtio_fs_config fscfg = {};
> > +    Error *local_err = NULL;
> > +    int ret;
> > +
> > +    /*
> > +     * As of now we only get notification buffer size from device. And that's
> > +     * needed only if notification queue is enabled.
> > +     */
> > +    if (fs->notify_enabled) {
> > +        ret = vhost_dev_get_config(&fs->vhost_dev, (uint8_t *)&fs->fscfg,
> > +                                   sizeof(struct virtio_fs_config),
> > +                                   &local_err);
> > +        if (ret) {
> > +            error_report_err(local_err);
> > +            return;
> > +        }
> > +    }
> >  
> >      memcpy((char *)fscfg.tag, fs->conf.tag,
> >             MIN(strlen(fs->conf.tag) + 1, sizeof(fscfg.tag)));
> >  
> >      virtio_stl_p(vdev, &fscfg.num_request_queues, fs->conf.num_request_queues);
> > +    virtio_stl_p(vdev, &fscfg.notify_buf_size, fs->fscfg.notify_buf_size);
> >  
> >      memcpy(config, &fscfg, sizeof(fscfg));
> >  }
> > @@ -316,6 +342,7 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
> >                  sizeof(struct virtio_fs_config));
> >  
> >      vuf_create_vqs(vdev, true);
> > +    vhost_dev_set_config_notifier(&fs->vhost_dev, &fs_ops);
> >      ret = vhost_dev_init(&fs->vhost_dev, &fs->vhost_user,
> >                           VHOST_BACKEND_TYPE_USER, 0, errp);
> >      if (ret < 0) {
> > diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
> > index 95dc0dd402..3b114ee260 100644
> > --- a/include/hw/virtio/vhost-user-fs.h
> > +++ b/include/hw/virtio/vhost-user-fs.h
> > @@ -14,6 +14,7 @@
> >  #ifndef _QEMU_VHOST_USER_FS_H
> >  #define _QEMU_VHOST_USER_FS_H
> >  
> > +#include "standard-headers/linux/virtio_fs.h"
> >  #include "hw/virtio/virtio.h"
> >  #include "hw/virtio/vhost.h"
> >  #include "hw/virtio/vhost-user.h"
> > @@ -37,6 +38,7 @@ struct VHostUserFS {
> >      struct vhost_virtqueue *vhost_vqs;
> >      struct vhost_dev vhost_dev;
> >      VhostUserState vhost_user;
> > +    struct virtio_fs_config fscfg;
> >      VirtQueue **req_vqs;
> >      VirtQueue *hiprio_vq;
> >      VirtQueue *notification_vq;
> > diff --git a/include/standard-headers/linux/virtio_fs.h b/include/standard-headers/linux/virtio_fs.h
> > index b7f015186e..867d18acf6 100644
> > --- a/include/standard-headers/linux/virtio_fs.h
> > +++ b/include/standard-headers/linux/virtio_fs.h
> > @@ -17,6 +17,8 @@ struct virtio_fs_config {
> >  
> >  	/* Number of request queues */
> >  	uint32_t num_request_queues;
> > +	/* Size of notification buffer */
> > +	uint32_t notify_buf_size;
> >  } QEMU_PACKED;
> >  
> >  /* For the id field in virtio_pci_shm_cap */
> 
> Please put all the include/standard-headers/linux/ changes into a single
> commit that imports these changes from linux.git. Changes to this header
> shouldn't be hand-written, use scripts/update-linux-headers.sh instead.

Will do. These changes are not in the kernel yet, so I will use
update-linux-headers.sh once the changes are upstreamed. But agreed
that this change should be in a separate patch even for review
purposes (before it is merged in the kernel).
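
For reference, the import is normally run against a kernel tree that
already contains the uapi change, e.g.:

  $ scripts/update-linux-headers.sh /path/to/linux

and the result goes in as its own headers-update patch.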

> 
> > diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> > index f5b87a508a..3b720c5d4a 100644
> > --- a/tools/virtiofsd/fuse_virtio.c
> > +++ b/tools/virtiofsd/fuse_virtio.c
> > @@ -856,6 +856,35 @@ static bool fv_queue_order(VuDev *dev, int qidx)
> >      return false;
> >  }
> >  
> > +static uint64_t fv_get_protocol_features(VuDev *dev)
> > +{
> > +    return 1ull << VHOST_USER_PROTOCOL_F_CONFIG;
> > +}
> > +
> > +static int fv_get_config(VuDev *dev, uint8_t *config, uint32_t len)
> > +{
> > +    struct virtio_fs_config fscfg = {};
> > +    unsigned notify_size, roundto = 64;
> > +    union fuse_notify_union {
> > +        struct fuse_notify_poll_wakeup_out  wakeup_out;
> > +        struct fuse_notify_inval_inode_out  inode_out;
> > +        struct fuse_notify_inval_entry_out  entry_out;
> > +        struct fuse_notify_delete_out       delete_out;
> > +        struct fuse_notify_store_out        store_out;
> > +        struct fuse_notify_retrieve_out     retrieve_out;
> > +    };
> > +
> > +    notify_size = sizeof(struct fuse_out_header) +
> > +              sizeof(union fuse_notify_union);
> > +    notify_size = ((notify_size + roundto) / roundto) * roundto;
> 
> Why is the size rounded to 64 bytes?

Hmm.., I really can't remember why I did that. Maybe I thought it was
just nice to round it to 64 bytes. I can get rid of the rounding if it
is not making sense.
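
If the rounding stays, I should probably switch to the ROUND_UP()
macro from qemu/osdep.h anyway; the expression above overshoots by a
full 64 bytes whenever notify_size is already a multiple of 64:

  notify_size = ROUND_UP(notify_size, roundto);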

Vivek




^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 06/13] vhost-user-fs: Use helpers to create/cleanup virtqueue
  2021-10-04 19:58       ` [Virtio-fs] " Vivek Goyal
@ 2021-10-05  8:09         ` Stefan Hajnoczi
  -1 siblings, 0 replies; 106+ messages in thread
From: Stefan Hajnoczi @ 2021-10-05  8:09 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

On Mon, Oct 04, 2021 at 03:58:09PM -0400, Vivek Goyal wrote:
> On Mon, Oct 04, 2021 at 02:54:17PM +0100, Stefan Hajnoczi wrote:
> > On Thu, Sep 30, 2021 at 11:30:30AM -0400, Vivek Goyal wrote:
> > > Add helpers to create/cleanup virtuqueues and use those helpers. I will
> > 
> > s/virtuqueues/virtqueues/
> > 
> > > need to reconfigure queues in later patches and using helpers will allow
> > > reusing the code.
> > > 
> > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > > ---
> > >  hw/virtio/vhost-user-fs.c | 87 +++++++++++++++++++++++----------------
> > >  1 file changed, 52 insertions(+), 35 deletions(-)
> > > 
> > > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> > > index c595957983..d1efbc5b18 100644
> > > --- a/hw/virtio/vhost-user-fs.c
> > > +++ b/hw/virtio/vhost-user-fs.c
> > > @@ -139,6 +139,55 @@ static void vuf_set_status(VirtIODevice *vdev, uint8_t status)
> > >      }
> > >  }
> > >  
> > > +static void vuf_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> > > +{
> > > +    /*
> > > +     * Not normally called; it's the daemon that handles the queue;
> > > +     * however virtio's cleanup path can call this.
> > > +     */
> > > +}
> > > +
> > > +static void vuf_create_vqs(VirtIODevice *vdev)
> > > +{
> > > +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> > > +    unsigned int i;
> > > +
> > > +    /* Hiprio queue */
> > > +    fs->hiprio_vq = virtio_add_queue(vdev, fs->conf.queue_size,
> > > +                                     vuf_handle_output);
> > > +
> > > +    /* Request queues */
> > > +    fs->req_vqs = g_new(VirtQueue *, fs->conf.num_request_queues);
> > > +    for (i = 0; i < fs->conf.num_request_queues; i++) {
> > > +        fs->req_vqs[i] = virtio_add_queue(vdev, fs->conf.queue_size,
> > > +                                          vuf_handle_output);
> > > +    }
> > > +
> > > +    /* 1 high prio queue, plus the number configured */
> > > +    fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
> > > +    fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
> > 
> > These two lines prepare for vhost_dev_init(), so moving them here is
> > debatable. If a caller is going to use this function again in the future
> > then they need to be sure to also call vhost_dev_init(). For now it
> > looks safe, so I guess it's okay.
> 
> Hmm..., I do call this function later from vuf_set_features() and
> reconfigure the queues. I see that I don't call vhost_dev_init()
> in that path. I am not even sure if I should be calling
> vhost_dev_init() from inside vuf_set_features().
> 
> So the core requirement is that at the time of first creating the
> device I have no idea whether the driver supports the notification
> queue or not. So I do create the device with a notification queue.
> But later, if the driver (and possibly the vhost device) does not
> support the notification queue, then we need to reconfigure the
> queues. What's the correct way to do that?

Ah, I see. The simplest approach is to always allocate the maximum
number of virtqueues. QEMU's vhost-user-fs device shouldn't need to
worry about which virtqueues are actually in use. Let virtiofsd (the
vhost-user backend) worry about that.

I posted ideas about how to do that in a reply to another patch in this
series. I can't guarantee it will work, but I think it's worth
exploring.
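
For example, vuf_create_vqs() could unconditionally size things for
the full queue set (sketch only):

    /*
     * Sketch: always reserve the maximum queue set (1 hiprio queue,
     * 1 notification queue, plus the configured request queues),
     * regardless of what the driver ends up negotiating.
     */
    fs->vhost_dev.nvqs = 2 + fs->conf.num_request_queues;
    fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);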

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 08/13] virtiofsd: Create a notification queue
  2021-10-04 21:01       ` [Virtio-fs] " Vivek Goyal
@ 2021-10-05  8:14         ` Stefan Hajnoczi
  -1 siblings, 0 replies; 106+ messages in thread
From: Stefan Hajnoczi @ 2021-10-05  8:14 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

[-- Attachment #1: Type: text/plain, Size: 7318 bytes --]

On Mon, Oct 04, 2021 at 05:01:07PM -0400, Vivek Goyal wrote:
> On Mon, Oct 04, 2021 at 03:30:38PM +0100, Stefan Hajnoczi wrote:
> > On Thu, Sep 30, 2021 at 11:30:32AM -0400, Vivek Goyal wrote:
> > > Add a notification queue which will be used to send async notifications
> > > for file lock availability.
> > > 
> > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > > Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
> > > ---
> > >  hw/virtio/vhost-user-fs-pci.c     |  4 +-
> > >  hw/virtio/vhost-user-fs.c         | 62 +++++++++++++++++++++++++--
> > >  include/hw/virtio/vhost-user-fs.h |  2 +
> > >  tools/virtiofsd/fuse_i.h          |  1 +
> > >  tools/virtiofsd/fuse_virtio.c     | 70 +++++++++++++++++++++++--------
> > >  5 files changed, 116 insertions(+), 23 deletions(-)
> > > 
> > > diff --git a/hw/virtio/vhost-user-fs-pci.c b/hw/virtio/vhost-user-fs-pci.c
> > > index 2ed8492b3f..cdb9471088 100644
> > > --- a/hw/virtio/vhost-user-fs-pci.c
> > > +++ b/hw/virtio/vhost-user-fs-pci.c
> > > @@ -41,8 +41,8 @@ static void vhost_user_fs_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
> > >      DeviceState *vdev = DEVICE(&dev->vdev);
> > >  
> > >      if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
> > > -        /* Also reserve config change and hiprio queue vectors */
> > > -        vpci_dev->nvectors = dev->vdev.conf.num_request_queues + 2;
> > > +        /* Also reserve config change, hiprio and notification queue vectors */
> > > +        vpci_dev->nvectors = dev->vdev.conf.num_request_queues + 3;
> > >      }
> > >  
> > >      qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
> > > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> > > index d1efbc5b18..6bafcf0243 100644
> > > --- a/hw/virtio/vhost-user-fs.c
> > > +++ b/hw/virtio/vhost-user-fs.c
> > > @@ -31,6 +31,7 @@ static const int user_feature_bits[] = {
> > >      VIRTIO_F_NOTIFY_ON_EMPTY,
> > >      VIRTIO_F_RING_PACKED,
> > >      VIRTIO_F_IOMMU_PLATFORM,
> > > +    VIRTIO_FS_F_NOTIFICATION,
> > >  
> > >      VHOST_INVALID_FEATURE_BIT
> > >  };
> > > @@ -147,7 +148,7 @@ static void vuf_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> > >       */
> > >  }
> > >  
> > > -static void vuf_create_vqs(VirtIODevice *vdev)
> > > +static void vuf_create_vqs(VirtIODevice *vdev, bool notification_vq)
> > >  {
> > >      VHostUserFS *fs = VHOST_USER_FS(vdev);
> > >      unsigned int i;
> > > @@ -155,6 +156,15 @@ static void vuf_create_vqs(VirtIODevice *vdev)
> > >      /* Hiprio queue */
> > >      fs->hiprio_vq = virtio_add_queue(vdev, fs->conf.queue_size,
> > >                                       vuf_handle_output);
> > > +    /*
> > > +     * Notification queue. Feature negotiation happens later. So at this
> > > +     * point of time we don't know if driver will use notification queue
> > > +     * or not.
> > > +     */
> > > +    if (notification_vq) {
> > > +        fs->notification_vq = virtio_add_queue(vdev, fs->conf.queue_size,
> > > +                                               vuf_handle_output);
> > > +    }
> > >  
> > >      /* Request queues */
> > >      fs->req_vqs = g_new(VirtQueue *, fs->conf.num_request_queues);
> > > @@ -163,8 +173,12 @@ static void vuf_create_vqs(VirtIODevice *vdev)
> > >                                            vuf_handle_output);
> > >      }
> > >  
> > > -    /* 1 high prio queue, plus the number configured */
> > > -    fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
> > > +    /* 1 high prio queue, 1 notification queue plus the number configured */
> > > +    if (notification_vq) {
> > > +        fs->vhost_dev.nvqs = 2 + fs->conf.num_request_queues;
> > > +    } else {
> > > +        fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
> > > +    }
> > >      fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
> > >  }
> > >  
> > > @@ -176,6 +190,11 @@ static void vuf_cleanup_vqs(VirtIODevice *vdev)
> > >      virtio_delete_queue(fs->hiprio_vq);
> > >      fs->hiprio_vq = NULL;
> > >  
> > > +    if (fs->notification_vq) {
> > > +        virtio_delete_queue(fs->notification_vq);
> > > +    }
> > > +    fs->notification_vq = NULL;
> > > +
> > >      for (i = 0; i < fs->conf.num_request_queues; i++) {
> > >          virtio_delete_queue(fs->req_vqs[i]);
> > >      }
> > > @@ -194,9 +213,43 @@ static uint64_t vuf_get_features(VirtIODevice *vdev,
> > >  {
> > >      VHostUserFS *fs = VHOST_USER_FS(vdev);
> > >  
> > > +    virtio_add_feature(&features, VIRTIO_FS_F_NOTIFICATION);
> > > +
> > >      return vhost_get_features(&fs->vhost_dev, user_feature_bits, features);
> > >  }
> > >  
> > > +static void vuf_set_features(VirtIODevice *vdev, uint64_t features)
> > > +{
> > > +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> > > +
> > > +    if (virtio_has_feature(features, VIRTIO_FS_F_NOTIFICATION)) {
> > > +        fs->notify_enabled = true;
> > > +        /*
> > > +         * If guest first booted with no notification queue support and
> > > +         * later rebooted with kernel which supports notification, we
> > > +         * can end up here
> > > +         */
> > > +        if (!fs->notification_vq) {
> > > +            vuf_cleanup_vqs(vdev);
> > > +            vuf_create_vqs(vdev, true);
> > > +        }
> > 
> > I would simplify things by unconditionally creating the notification vq
> > for the device and letting the vhost-user device backend decide whether
> > it wants to handle the vq or not.
> > If the backend doesn't implement the
> > vq then it also won't advertise VIRTIO_FS_F_NOTIFICATION so the guest
> > driver won't submit virtqueue buffers.
> 
> I think I did not understand the idea. This code deals with the case
> where both qemu and the vhost-user device can deal with the
> notification queue, but the driver can't.
> 
> So if we first booted into a guest kernel which does not support the
> notification queue, then we will not have instantiated the
> notification queue. But later we reboot the guest into a newer kernel
> which can deal with notification queues, so we create it then.
> 
> IIUC, you are suggesting that we somehow keep the notification queue
> instantiated even if the guest driver does not support notifications,
> so that we will not have to get into the exercise of cleaning up
> queues and re-instantiating them?

Yes.

> But I think we can't keep the notification queue around if the driver
> does not support it, because that changes the queue indexes: queue
> index 1 will belong to a request queue if notifications are not
> enabled, otherwise it will belong to the notification queue. So if I
> always instantiate the notification queue, then the guest and
> qemu/virtiofsd will have a different understanding of which queue
> index belongs to which queue.

The meaning of the virtqueue doesn't matter. That only matters to
virtiofsd when processing virtqueues. Since QEMU's -device
vhost-user-fs doesn't process virtqueues there's no difference between
hipri, request, and notification virtqueues.

I'm not 100% sure that the vhost-user code is set up to work smoothly in
this fashion, but I think it should be possible to make this work and
the end result will be simpler.
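
To illustrate what I mean (hypothetical sketch, the helper name is
made up): QEMU would always create hiprio + notification + request
queues, and only virtiofsd would map queue indexes to roles based on
the negotiated features:

    /*
     * Hypothetical virtiofsd helper: queue 0 is always hiprio; queue 1
     * is the notification queue only if VIRTIO_FS_F_NOTIFICATION was
     * negotiated, otherwise request queues start at index 1.
     */
    static int fv_request_queue_base(struct fuse_session *se)
    {
        return se->notify_enabled ? 2 : 1;
    }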

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 12/13] virtiofsd: Implement blocking posix locks
  2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
@ 2021-10-05 12:22     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 106+ messages in thread
From: Stefan Hajnoczi @ 2021-10-05 12:22 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

[-- Attachment #1: Type: text/plain, Size: 12426 bytes --]

On Thu, Sep 30, 2021 at 11:30:36AM -0400, Vivek Goyal wrote:
> As of now we don't support fcntl(F_SETLKW) and if we see one, we return
> -EOPNOTSUPP.
> 
> Change that by accepting these requests and returning a reply
> immediately asking the caller to wait. Once the lock is available,
> send a notification to the waiter indicating that the lock is
> available.
> 
> In response to the lock request, we return the error value "1", which
> signals the client to queue the lock request internally; later the
> client will get a notification signalling that the lock was taken (or
> an error). The fuse client should then wake up the guest process.
> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.c  | 37 ++++++++++++++++-
>  tools/virtiofsd/fuse_lowlevel.h  | 26 ++++++++++++
>  tools/virtiofsd/fuse_virtio.c    | 50 ++++++++++++++++++++---
>  tools/virtiofsd/passthrough_ll.c | 70 ++++++++++++++++++++++++++++----
>  4 files changed, 167 insertions(+), 16 deletions(-)
> 
> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> index e4679c73ab..2e7f4b786d 100644
> --- a/tools/virtiofsd/fuse_lowlevel.c
> +++ b/tools/virtiofsd/fuse_lowlevel.c
> @@ -179,8 +179,8 @@ int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
>          .unique = req->unique,
>          .error = error,
>      };
> -
> -    if (error <= -1000 || error > 0) {
> +    /* error = 1 has been used to signal client to wait for notification */
> +    if (error <= -1000 || error > 1) {
>          fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n", error);
>          out.error = -ERANGE;
>      }
> @@ -290,6 +290,11 @@ int fuse_reply_err(fuse_req_t req, int err)
>      return send_reply(req, -err, NULL, 0);
>  }
>  
> +int fuse_reply_wait(fuse_req_t req)
> +{
> +    return send_reply(req, 1, NULL, 0);
> +}
> +
>  void fuse_reply_none(fuse_req_t req)
>  {
>      fuse_free_req(req);
> @@ -2165,6 +2170,34 @@ static void do_destroy(fuse_req_t req, fuse_ino_t nodeid,
>      send_reply_ok(req, NULL, 0);
>  }
>  
> +static int send_notify_iov(struct fuse_session *se, int notify_code,
> +                           struct iovec *iov, int count)
> +{
> +    struct fuse_out_header out;
> +    if (!se->got_init) {
> +        return -ENOTCONN;
> +    }
> +    out.unique = 0;
> +    out.error = notify_code;
> +    iov[0].iov_base = &out;
> +    iov[0].iov_len = sizeof(struct fuse_out_header);
> +    return fuse_send_msg(se, NULL, iov, count);
> +}
> +
> +int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t unique,
> +                  int32_t error)
> +{
> +    struct fuse_notify_lock_out outarg = {0};
> +    struct iovec iov[2];
> +
> +    outarg.unique = unique;
> +    outarg.error = -error;
> +
> +    iov[1].iov_base = &outarg;
> +    iov[1].iov_len = sizeof(outarg);
> +    return send_notify_iov(se, FUSE_NOTIFY_LOCK, iov, 2);
> +}
> +
>  int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
>                                 off_t offset, struct fuse_bufvec *bufv)
>  {
> diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
> index c55c0ca2fc..64624b48dc 100644
> --- a/tools/virtiofsd/fuse_lowlevel.h
> +++ b/tools/virtiofsd/fuse_lowlevel.h
> @@ -1251,6 +1251,22 @@ struct fuse_lowlevel_ops {
>   */
>  int fuse_reply_err(fuse_req_t req, int err);
>  
> +/**
> + * Ask caller to wait for lock.
> + *
> + * Possible requests:
> + *   setlkw
> + *
> + * If caller sends a blocking lock request (setlkw), then reply to caller
> + * that wait for lock to be available. Once lock is available caller will

I can't parse the first sentence.

s/that wait for lock to be available/that waiting for the lock is
necessary/?

> + * receive a notification with request's unique id. Notification will
> + * carry info whether lock was successfully obtained or not.
> + *
> + * @param req request handle
> + * @return zero for success, -errno for failure to send reply
> + */
> +int fuse_reply_wait(fuse_req_t req);
> +
>  /**
>   * Don't send reply
>   *
> @@ -1685,6 +1701,16 @@ int fuse_lowlevel_notify_delete(struct fuse_session *se, fuse_ino_t parent,
>  int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
>                                 off_t offset, struct fuse_bufvec *bufv);
>  
> +/**
> + * Notify event related to previous lock request
> + *
> + * @param se the session object
> + * @param unique the unique id of the request which requested setlkw
> + * @param error zero for success, -errno for the failure
> + */
> +int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t unique,
> +                              int32_t error);
> +
>  /*
>   * Utility functions
>   */
> diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> index a87e88e286..bb2d4456fc 100644
> --- a/tools/virtiofsd/fuse_virtio.c
> +++ b/tools/virtiofsd/fuse_virtio.c
> @@ -273,6 +273,23 @@ static void vq_send_element(struct fv_QueueInfo *qi, VuVirtqElement *elem,
>      vu_dispatch_unlock(qi->virtio_dev);
>  }
>  
> +/* Returns NULL if queue is empty */
> +static FVRequest *vq_pop_notify_elem(struct fv_QueueInfo *qi)
> +{
> +    struct fuse_session *se = qi->virtio_dev->se;
> +    VuDev *dev = &se->virtio_dev->dev;
> +    VuVirtq *q = vu_get_queue(dev, qi->qidx);
> +    FVRequest *req;
> +
> +    vu_dispatch_rdlock(qi->virtio_dev);
> +    pthread_mutex_lock(&qi->vq_lock);
> +    /* Pop an element from queue */
> +    req = vu_queue_pop(dev, q, sizeof(FVRequest));
> +    pthread_mutex_unlock(&qi->vq_lock);
> +    vu_dispatch_unlock(qi->virtio_dev);
> +    return req;
> +}
> +
>  /*
>   * Called back by ll whenever it wants to send a reply/message back
>   * The 1st element of the iov starts with the fuse_out_header
> @@ -281,9 +298,9 @@ static void vq_send_element(struct fv_QueueInfo *qi, VuVirtqElement *elem,
>  int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
>                      struct iovec *iov, int count)
>  {
> -    FVRequest *req = container_of(ch, FVRequest, ch);
> -    struct fv_QueueInfo *qi = ch->qi;
> -    VuVirtqElement *elem = &req->elem;
> +    FVRequest *req;
> +    struct fv_QueueInfo *qi;
> +    VuVirtqElement *elem;
>      int ret = 0;
>  
>      assert(count >= 1);
> @@ -294,8 +311,30 @@ int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
>  
>      size_t tosend_len = iov_size(iov, count);
>  
> -    /* unique == 0 is notification, which we don't support */
> -    assert(out->unique);
> +    /* unique == 0 is notification */
> +    if (!out->unique) {

Is a check needed in fuse_session_process_buf_int() to reject requests
that the driver submitted to the device with req.unique == 0? If we get
confused about the correct virtqueue to use in virtio_send_msg() then
there could be bugs.
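
Something along these lines early in fuse_session_process_buf_int()
might be enough (untested sketch; the exact variable names there will
differ):

    /*
     * Sketch: drop driver-submitted requests with unique == 0 so they
     * cannot be confused with notifications on the reply path.
     */
    if (in->unique == 0) {
        fuse_log(FUSE_LOG_ERR, "fuse: request with unique == 0\n");
        fuse_reply_none(req); /* free the request without replying */
        return;
    }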

> +        if (!se->notify_enabled) {
> +            return -EOPNOTSUPP;
> +        }
> +        /* If notifications are enabled, queue index 1 is notification queue */
> +        qi = se->virtio_dev->qi[1];
> +        req = vq_pop_notify_elem(qi);

Where is req freed?

> +        if (!req) {
> +            /*
> +             * TODO: Implement some sort of ring buffer and queue notifications
> +             * on that and send these later when notification queue has space
> +             * available.
> +             */
> +            return -ENOSPC;

This needs to be addressed before this patch series can be merged. The
notification vq is kicked by the guest driver when buffers are
replenished. The vq handler function can wake up waiting threads using a
condvar.
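
The shape I'm thinking of (sketch, assuming a mutex/condvar pair is
added to the notification queue's fv_QueueInfo):

    /*
     * In the notification queue's kick handler: the driver has
     * replenished buffers, so wake up threads waiting to send.
     */
    pthread_mutex_lock(&qi->notify_lock);
    pthread_cond_broadcast(&qi->notify_avail);
    pthread_mutex_unlock(&qi->notify_lock);

and on the sending side, block instead of returning -ENOSPC:

    pthread_mutex_lock(&qi->notify_lock);
    while (!(req = vq_pop_notify_elem(qi))) {
        pthread_cond_wait(&qi->notify_avail, &qi->notify_lock);
    }
    pthread_mutex_unlock(&qi->notify_lock);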

> +        }
> +        req->reply_sent = false;
> +    } else {
> +        assert(ch);
> +        req = container_of(ch, FVRequest, ch);
> +        qi = ch->qi;
> +    }
> +
> +    elem = &req->elem;
>      assert(!req->reply_sent);
>  
>      /* The 'in' part of the elem is to qemu */
> @@ -985,6 +1024,7 @@ static int fv_get_config(VuDev *dev, uint8_t *config, uint32_t len)
>          struct fuse_notify_delete_out       delete_out;
>          struct fuse_notify_store_out        store_out;
>          struct fuse_notify_retrieve_out     retrieve_out;
> +        struct fuse_notify_lock_out         lock_out;
>      };
>  
>      notify_size = sizeof(struct fuse_out_header) +
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 6928662e22..277f74762b 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -2131,13 +2131,35 @@ out:
>      }
>  }
>  
> +static void setlk_send_notification(struct fuse_session *se, uint64_t unique,
> +                                    int saverr)
> +{
> +    int ret;
> +
> +    do {
> +        ret = fuse_lowlevel_notify_lock(se, unique, saverr);
> +        /*
> +         * Retry sending notification if notification queue does not have
> +         * free descriptor yet, otherwise break out of loop. Either we
> +         * successfully sent notification or some other error occurred.
> +         */
> +        if (ret != -ENOSPC) {
> +            break;
> +        }
> +        usleep(10000);
> +    } while (1);

Please use the notification vq handler to wake up blocked threads
instead of usleep().

> +}
> +
>  static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
>                       struct flock *lock, int sleep)
>  {
>      struct lo_data *lo = lo_data(req);
>      struct lo_inode *inode;
>      struct lo_inode_plock *plock;
> -    int ret, saverr = 0;
> +    int ret, saverr = 0, ofd;
> +    uint64_t unique;
> +    struct fuse_session *se = req->se;
> +    bool blocking_lock = false;
>  
>      fuse_log(FUSE_LOG_DEBUG,
>               "lo_setlk(ino=%" PRIu64 ", flags=%d)"
> @@ -2151,11 +2173,6 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
>          return;
>      }
>  
> -    if (sleep) {
> -        fuse_reply_err(req, EOPNOTSUPP);
> -        return;
> -    }
> -
>      inode = lo_inode(req, ino);
>      if (!inode) {
>          fuse_reply_err(req, EBADF);
> @@ -2168,21 +2185,56 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
>  
>      if (!plock) {
>          saverr = ret;
> +        pthread_mutex_unlock(&inode->plock_mutex);
>          goto out;
>      }
>  
> +    /*
> +     * plock is now released when inode is going away. We already have
> +     * a reference on inode, so it is guaranteed that plock->fd is
> +     * still around even after dropping inode->plock_mutex lock
> +     */
> +    ofd = plock->fd;
> +    pthread_mutex_unlock(&inode->plock_mutex);
> +
> +    /*
> +     * If this lock request can block, request caller to wait for
> +     * notification. Do not access req after this. Once lock is
> +     * available, send a notification instead.
> +     */
> +    if (sleep && lock->l_type != F_UNLCK) {
> +        /*
> +         * If notification queue is not enabled, can't support async
> +         * locks.
> +         */
> +        if (!se->notify_enabled) {
> +            saverr = EOPNOTSUPP;
> +            goto out;
> +        }
> +        blocking_lock = true;
> +        unique = req->unique;
> +        fuse_reply_wait(req);
> +    }
> +
>      /* TODO: Is it alright to modify flock? */
>      lock->l_pid = 0;
> -    ret = fcntl(plock->fd, F_OFD_SETLK, lock);
> +    if (blocking_lock) {
> +        ret = fcntl(ofd, F_OFD_SETLKW, lock);

SETLKW can be interrupted by signals. Should we loop here when errno ==
EINTR?
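
e.g. (sketch):

    /* Sketch: restart the blocking lock request if interrupted by a signal */
    do {
        ret = fcntl(ofd, F_OFD_SETLKW, lock);
    } while (ret == -1 && errno == EINTR);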

> +    } else {
> +        ret = fcntl(ofd, F_OFD_SETLK, lock);
> +    }
>      if (ret == -1) {
>          saverr = errno;
>      }
>  
>  out:
> -    pthread_mutex_unlock(&inode->plock_mutex);
>      lo_inode_put(lo, &inode);
>  
> -    fuse_reply_err(req, saverr);
> +    if (!blocking_lock) {
> +        fuse_reply_err(req, saverr);
> +    } else {
> +        setlk_send_notification(se, unique, saverr);
> +    }
>  }
>  
>  static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,
> -- 
> 2.31.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 13/13] virtiofsd, seccomp: Add clock_nanosleep() to allow list
  2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
@ 2021-10-05 12:22     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 106+ messages in thread
From: Stefan Hajnoczi @ 2021-10-05 12:22 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

[-- Attachment #1: Type: text/plain, Size: 882 bytes --]

On Thu, Sep 30, 2021 at 11:30:37AM -0400, Vivek Goyal wrote:
> g_usleep() calls nanosleep(), and that now seems to use the
> clock_nanosleep() syscall. These patches make use of g_usleep(), so
> add clock_nanosleep() to the list of allowed syscalls.
> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  tools/virtiofsd/passthrough_seccomp.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
> index cd24b40b78..03080806c0 100644
> --- a/tools/virtiofsd/passthrough_seccomp.c
> +++ b/tools/virtiofsd/passthrough_seccomp.c
> @@ -117,6 +117,7 @@ static const int syscall_allowlist[] = {
>      SCMP_SYS(writev),
>      SCMP_SYS(umask),
>      SCMP_SYS(nanosleep),
> +    SCMP_SYS(clock_nanosleep),

This patch can be dropped once sleep has been replaced by a condvar.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 08/13] virtiofsd: Create a notification queue
  2021-10-05  8:14         ` [Virtio-fs] " Stefan Hajnoczi
@ 2021-10-05 12:31           ` Vivek Goyal
  -1 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-10-05 12:31 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

On Tue, Oct 05, 2021 at 09:14:14AM +0100, Stefan Hajnoczi wrote:
> On Mon, Oct 04, 2021 at 05:01:07PM -0400, Vivek Goyal wrote:
> > On Mon, Oct 04, 2021 at 03:30:38PM +0100, Stefan Hajnoczi wrote:
> > > On Thu, Sep 30, 2021 at 11:30:32AM -0400, Vivek Goyal wrote:
> > > > Add a notification queue which will be used to send async notifications
> > > > for file lock availability.
> > > > 
> > > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > > > Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
> > > > ---
> > > >  hw/virtio/vhost-user-fs-pci.c     |  4 +-
> > > >  hw/virtio/vhost-user-fs.c         | 62 +++++++++++++++++++++++++--
> > > >  include/hw/virtio/vhost-user-fs.h |  2 +
> > > >  tools/virtiofsd/fuse_i.h          |  1 +
> > > >  tools/virtiofsd/fuse_virtio.c     | 70 +++++++++++++++++++++++--------
> > > >  5 files changed, 116 insertions(+), 23 deletions(-)
> > > > 
> > > > diff --git a/hw/virtio/vhost-user-fs-pci.c b/hw/virtio/vhost-user-fs-pci.c
> > > > index 2ed8492b3f..cdb9471088 100644
> > > > --- a/hw/virtio/vhost-user-fs-pci.c
> > > > +++ b/hw/virtio/vhost-user-fs-pci.c
> > > > @@ -41,8 +41,8 @@ static void vhost_user_fs_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
> > > >      DeviceState *vdev = DEVICE(&dev->vdev);
> > > >  
> > > >      if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
> > > > -        /* Also reserve config change and hiprio queue vectors */
> > > > -        vpci_dev->nvectors = dev->vdev.conf.num_request_queues + 2;
> > > > +        /* Also reserve config change, hiprio and notification queue vectors */
> > > > +        vpci_dev->nvectors = dev->vdev.conf.num_request_queues + 3;
> > > >      }
> > > >  
> > > >      qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
> > > > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> > > > index d1efbc5b18..6bafcf0243 100644
> > > > --- a/hw/virtio/vhost-user-fs.c
> > > > +++ b/hw/virtio/vhost-user-fs.c
> > > > @@ -31,6 +31,7 @@ static const int user_feature_bits[] = {
> > > >      VIRTIO_F_NOTIFY_ON_EMPTY,
> > > >      VIRTIO_F_RING_PACKED,
> > > >      VIRTIO_F_IOMMU_PLATFORM,
> > > > +    VIRTIO_FS_F_NOTIFICATION,
> > > >  
> > > >      VHOST_INVALID_FEATURE_BIT
> > > >  };
> > > > @@ -147,7 +148,7 @@ static void vuf_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> > > >       */
> > > >  }
> > > >  
> > > > -static void vuf_create_vqs(VirtIODevice *vdev)
> > > > +static void vuf_create_vqs(VirtIODevice *vdev, bool notification_vq)
> > > >  {
> > > >      VHostUserFS *fs = VHOST_USER_FS(vdev);
> > > >      unsigned int i;
> > > > @@ -155,6 +156,15 @@ static void vuf_create_vqs(VirtIODevice *vdev)
> > > >      /* Hiprio queue */
> > > >      fs->hiprio_vq = virtio_add_queue(vdev, fs->conf.queue_size,
> > > >                                       vuf_handle_output);
> > > > +    /*
> > > > +     * Notification queue. Feature negotiation happens later. So at this
> > > > +     * point of time we don't know if driver will use notification queue
> > > > +     * or not.
> > > > +     */
> > > > +    if (notification_vq) {
> > > > +        fs->notification_vq = virtio_add_queue(vdev, fs->conf.queue_size,
> > > > +                                               vuf_handle_output);
> > > > +    }
> > > >  
> > > >      /* Request queues */
> > > >      fs->req_vqs = g_new(VirtQueue *, fs->conf.num_request_queues);
> > > > @@ -163,8 +173,12 @@ static void vuf_create_vqs(VirtIODevice *vdev)
> > > >                                            vuf_handle_output);
> > > >      }
> > > >  
> > > > -    /* 1 high prio queue, plus the number configured */
> > > > -    fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
> > > > +    /* 1 high prio queue, 1 notification queue plus the number configured */
> > > > +    if (notification_vq) {
> > > > +        fs->vhost_dev.nvqs = 2 + fs->conf.num_request_queues;
> > > > +    } else {
> > > > +        fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
> > > > +    }
> > > >      fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
> > > >  }
> > > >  
> > > > @@ -176,6 +190,11 @@ static void vuf_cleanup_vqs(VirtIODevice *vdev)
> > > >      virtio_delete_queue(fs->hiprio_vq);
> > > >      fs->hiprio_vq = NULL;
> > > >  
> > > > +    if (fs->notification_vq) {
> > > > +        virtio_delete_queue(fs->notification_vq);
> > > > +    }
> > > > +    fs->notification_vq = NULL;
> > > > +
> > > >      for (i = 0; i < fs->conf.num_request_queues; i++) {
> > > >          virtio_delete_queue(fs->req_vqs[i]);
> > > >      }
> > > > @@ -194,9 +213,43 @@ static uint64_t vuf_get_features(VirtIODevice *vdev,
> > > >  {
> > > >      VHostUserFS *fs = VHOST_USER_FS(vdev);
> > > >  
> > > > +    virtio_add_feature(&features, VIRTIO_FS_F_NOTIFICATION);
> > > > +
> > > >      return vhost_get_features(&fs->vhost_dev, user_feature_bits, features);
> > > >  }
> > > >  
> > > > +static void vuf_set_features(VirtIODevice *vdev, uint64_t features)
> > > > +{
> > > > +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> > > > +
> > > > +    if (virtio_has_feature(features, VIRTIO_FS_F_NOTIFICATION)) {
> > > > +        fs->notify_enabled = true;
> > > > +        /*
> > > > +         * If guest first booted with no notification queue support and
> > > > +         * later rebooted with kernel which supports notification, we
> > > > +         * can end up here
> > > > +         */
> > > > +        if (!fs->notification_vq) {
> > > > +            vuf_cleanup_vqs(vdev);
> > > > +            vuf_create_vqs(vdev, true);
> > > > +        }
> > > 
> > > I would simplify things by unconditionally creating the notification vq
> > > for the device and letting the vhost-user device backend decide whether
> > > it wants to handle the vq or not.
> > > If the backend doesn't implement the
> > > vq then it also won't advertise VIRTIO_FS_F_NOTIFICATION so the guest
> > > driver won't submit virtqueue buffers.
> > 
> > I think I did not understand the idea. This code deals with the case
> > where both qemu and the vhost-user device can deal with the
> > notification queue, but the driver can't.
> > 
> > So if we first booted into a guest kernel which does not support the
> > notification queue, then we will not have instantiated the
> > notification queue. But later we reboot the guest into a newer kernel
> > which can deal with notification queues, so we create it then.
> > 
> > IIUC, you are suggesting that we somehow keep the notification queue
> > instantiated even if the guest driver does not support notifications,
> > so that we will not have to get into the exercise of cleaning up
> > queues and re-instantiating them?
> 
> Yes.
> 
> > But I think we can't keep the notification queue around if the driver
> > does not support it, because that changes the queue indexes: queue
> > index 1 will belong to a request queue if notifications are not
> > enabled, otherwise it will belong to the notification queue. So if I
> > always instantiate the notification queue, then the guest and
> > qemu/virtiofsd will have a different understanding of which queue
> > index belongs to which queue.
> 
> The meaning of the virtqueue doesn't matter. That only matters to
> virtiofsd when processing virtqueues. Since QEMU's -device
> vhost-user-fs doesn't process virtqueues there's no difference between
> hipri, request, and notification virtqueues.

Ok, I will think more about it and look at the code to see if this
is feasible. The first question I have is that the vhost-user device
will have to know whether the driver supports notifications so that
it can adjust its internal view of the virtqueue mapping.
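
For illustration, a minimal sketch of what that adjusted view could look
like on the virtiofsd side, keyed off the negotiated
VIRTIO_FS_F_NOTIFICATION feature (the helper and enum names here are
hypothetical, not from this series; se->notify_enabled is the flag the
patches already use):

enum fv_queue_kind { FV_QUEUE_HIPRIO, FV_QUEUE_NOTIFY, FV_QUEUE_REQUEST };

/* Map a queue index to its role based on what the driver negotiated */
static enum fv_queue_kind fv_queue_role(struct fuse_session *se, int qidx)
{
    if (qidx == 0) {
        return FV_QUEUE_HIPRIO;    /* queue 0 is always hiprio */
    }
    if (se->notify_enabled && qidx == 1) {
        return FV_QUEUE_NOTIFY;    /* queue 1 only if negotiated */
    }
    return FV_QUEUE_REQUEST;       /* everything else is a request queue */
}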

BTW, complexity aside, is my current implementation of reconfiguring
queues broken?

Vivek

> 
> I'm not 100% sure that the vhost-user code is set up to work smoothly in
> this fashion, but I think it should be possible to make this work and
> the end result will be simpler.
> 
> Stefan




^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 08/13] virtiofsd: Create a notification queue
@ 2021-10-05 12:31           ` Vivek Goyal
  0 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-10-05 12:31 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: miklos, qemu-devel, virtio-fs

On Tue, Oct 05, 2021 at 09:14:14AM +0100, Stefan Hajnoczi wrote:
> On Mon, Oct 04, 2021 at 05:01:07PM -0400, Vivek Goyal wrote:
> > On Mon, Oct 04, 2021 at 03:30:38PM +0100, Stefan Hajnoczi wrote:
> > > On Thu, Sep 30, 2021 at 11:30:32AM -0400, Vivek Goyal wrote:
> > > > Add a notification queue which will be used to send async notifications
> > > > for file lock availability.
> > > > 
> > > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > > > Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
> > > > ---
> > > >  hw/virtio/vhost-user-fs-pci.c     |  4 +-
> > > >  hw/virtio/vhost-user-fs.c         | 62 +++++++++++++++++++++++++--
> > > >  include/hw/virtio/vhost-user-fs.h |  2 +
> > > >  tools/virtiofsd/fuse_i.h          |  1 +
> > > >  tools/virtiofsd/fuse_virtio.c     | 70 +++++++++++++++++++++++--------
> > > >  5 files changed, 116 insertions(+), 23 deletions(-)
> > > > 
> > > > diff --git a/hw/virtio/vhost-user-fs-pci.c b/hw/virtio/vhost-user-fs-pci.c
> > > > index 2ed8492b3f..cdb9471088 100644
> > > > --- a/hw/virtio/vhost-user-fs-pci.c
> > > > +++ b/hw/virtio/vhost-user-fs-pci.c
> > > > @@ -41,8 +41,8 @@ static void vhost_user_fs_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
> > > >      DeviceState *vdev = DEVICE(&dev->vdev);
> > > >  
> > > >      if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
> > > > -        /* Also reserve config change and hiprio queue vectors */
> > > > -        vpci_dev->nvectors = dev->vdev.conf.num_request_queues + 2;
> > > > +        /* Also reserve config change, hiprio and notification queue vectors */
> > > > +        vpci_dev->nvectors = dev->vdev.conf.num_request_queues + 3;
> > > >      }
> > > >  
> > > >      qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
> > > > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> > > > index d1efbc5b18..6bafcf0243 100644
> > > > --- a/hw/virtio/vhost-user-fs.c
> > > > +++ b/hw/virtio/vhost-user-fs.c
> > > > @@ -31,6 +31,7 @@ static const int user_feature_bits[] = {
> > > >      VIRTIO_F_NOTIFY_ON_EMPTY,
> > > >      VIRTIO_F_RING_PACKED,
> > > >      VIRTIO_F_IOMMU_PLATFORM,
> > > > +    VIRTIO_FS_F_NOTIFICATION,
> > > >  
> > > >      VHOST_INVALID_FEATURE_BIT
> > > >  };
> > > > @@ -147,7 +148,7 @@ static void vuf_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> > > >       */
> > > >  }
> > > >  
> > > > -static void vuf_create_vqs(VirtIODevice *vdev)
> > > > +static void vuf_create_vqs(VirtIODevice *vdev, bool notification_vq)
> > > >  {
> > > >      VHostUserFS *fs = VHOST_USER_FS(vdev);
> > > >      unsigned int i;
> > > > @@ -155,6 +156,15 @@ static void vuf_create_vqs(VirtIODevice *vdev)
> > > >      /* Hiprio queue */
> > > >      fs->hiprio_vq = virtio_add_queue(vdev, fs->conf.queue_size,
> > > >                                       vuf_handle_output);
> > > > +    /*
> > > > +     * Notification queue. Feature negotiation happens later. So at this
> > > > +     * point of time we don't know if driver will use notification queue
> > > > +     * or not.
> > > > +     */
> > > > +    if (notification_vq) {
> > > > +        fs->notification_vq = virtio_add_queue(vdev, fs->conf.queue_size,
> > > > +                                               vuf_handle_output);
> > > > +    }
> > > >  
> > > >      /* Request queues */
> > > >      fs->req_vqs = g_new(VirtQueue *, fs->conf.num_request_queues);
> > > > @@ -163,8 +173,12 @@ static void vuf_create_vqs(VirtIODevice *vdev)
> > > >                                            vuf_handle_output);
> > > >      }
> > > >  
> > > > -    /* 1 high prio queue, plus the number configured */
> > > > -    fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
> > > > +    /* 1 high prio queue, 1 notification queue plus the number configured */
> > > > +    if (notification_vq) {
> > > > +        fs->vhost_dev.nvqs = 2 + fs->conf.num_request_queues;
> > > > +    } else {
> > > > +        fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
> > > > +    }
> > > >      fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
> > > >  }
> > > >  
> > > > @@ -176,6 +190,11 @@ static void vuf_cleanup_vqs(VirtIODevice *vdev)
> > > >      virtio_delete_queue(fs->hiprio_vq);
> > > >      fs->hiprio_vq = NULL;
> > > >  
> > > > +    if (fs->notification_vq) {
> > > > +        virtio_delete_queue(fs->notification_vq);
> > > > +    }
> > > > +    fs->notification_vq = NULL;
> > > > +
> > > >      for (i = 0; i < fs->conf.num_request_queues; i++) {
> > > >          virtio_delete_queue(fs->req_vqs[i]);
> > > >      }
> > > > @@ -194,9 +213,43 @@ static uint64_t vuf_get_features(VirtIODevice *vdev,
> > > >  {
> > > >      VHostUserFS *fs = VHOST_USER_FS(vdev);
> > > >  
> > > > +    virtio_add_feature(&features, VIRTIO_FS_F_NOTIFICATION);
> > > > +
> > > >      return vhost_get_features(&fs->vhost_dev, user_feature_bits, features);
> > > >  }
> > > >  
> > > > +static void vuf_set_features(VirtIODevice *vdev, uint64_t features)
> > > > +{
> > > > +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> > > > +
> > > > +    if (virtio_has_feature(features, VIRTIO_FS_F_NOTIFICATION)) {
> > > > +        fs->notify_enabled = true;
> > > > +        /*
> > > > +         * If guest first booted with no notification queue support and
> > > > +         * later rebooted with kernel which supports notification, we
> > > > +         * can end up here
> > > > +         */
> > > > +        if (!fs->notification_vq) {
> > > > +            vuf_cleanup_vqs(vdev);
> > > > +            vuf_create_vqs(vdev, true);
> > > > +        }
> > > 
> > > I would simplify things by unconditionally creating the notification vq
> > > for the device and letting the vhost-user device backend decide whether
> > > it wants to handle the vq or not.
> > > If the backend doesn't implement the
> > > vq then it also won't advertise VIRTIO_FS_F_NOTIFICATION so the guest
> > > driver won't submit virtqueue buffers.
> > 
> > I think I did not understand the idea. This code deals with the case
> > where both qemu and the vhost-user device can handle the notification
> > queue, but the driver can't.
> > 
> > So if we first booted into a guest kernel which does not support the
> > notification queue, then we will not have instantiated the notification
> > queue. But if we later reboot the guest into a newer kernel which is
> > capable of dealing with notification queues, we create it then.
> > 
> > IIUC, you are suggesting that we somehow keep the notification queue
> > instantiated even if the guest driver does not support notifications,
> > so that we do not have to get into the exercise of cleaning up queues
> > and re-instantiating them?
> 
> Yes.
> 
> > But I think we can't keep the notification queue around if the driver
> > does not support it, because that changes the queue indexes: queue
> > index 1 belongs to a request queue if notifications are not enabled,
> > otherwise it belongs to the notification queue. So if I always
> > instantiate the notification queue, the guest and qemu/virtiofsd will
> > have different understandings of which queue index belongs to which
> > queue.
> 
> The meaning of the virtqueue doesn't matter. That only matters to
> virtiofsd when processing virtqueues. Since QEMU's -device
> vhost-user-fs doesn't process virtqueues there's no difference between
> hipri, request, and notification virtqueues.

Ok, I will think more about it and look at the code to see if this
is feasible. The first question I have is that the vhost-user device
will have to know whether the driver supports notifications so that
it can adjust its internal view of the virtqueue mapping.

BTW, complexity aside, is my current implementation of reconfiguring
queues broken?

Vivek

> 
> I'm not 100% sure that the vhost-user code is set up to work smoothly in
> this fashion, but I think it should be possible to make this work and
> the end result will be simpler.
> 
> Stefan



^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 10/13] virtiofsd: Custom threadpool for remote blocking posix locks requests
  2021-10-04 14:54     ` [Virtio-fs] " Stefan Hajnoczi
@ 2021-10-05 13:06       ` Vivek Goyal
  -1 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-10-05 13:06 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

On Mon, Oct 04, 2021 at 03:54:31PM +0100, Stefan Hajnoczi wrote:
> On Thu, Sep 30, 2021 at 11:30:34AM -0400, Vivek Goyal wrote:
> > Add a new custom threadpool using posix threads that specifically
> > service locking requests.
> > 
> > In the case of a fcntl(SETLKW) request, if the guest is waiting
> > for a lock or locks and issues a hard-reboot through SYSRQ then virtiofsd
> > unblocks the blocked threads by sending a signal to them and waking
> > them up.
> > 
> > The current threadpool (GThreadPool) is not adequate to service the
> > locking requests that result in a thread blocking. That is because
> > GLib does not provide an API to cancel the request while it is
> > serviced by a thread. In addition, a user might be running virtiofsd
> > without a threadpool (--thread-pool-size=0), in which case a locking
> > request that blocks will block the main virtqueue thread and prevent
> > it from servicing any other requests.
> > 
> > The only exception occurs when the lock is of type F_UNLCK. In this case
> > the request is serviced by the main virtqueue thread or a GThreadPool
> > thread to avoid a deadlock, when all the threads in the custom threadpool
> > are blocked.
> > 
> > Then virtiofsd proceeds to cleanup the state of the threads, release
> > them back to the system and re-initialize.
> 
> Is there another way to cancel SETLKW without resorting to a new thread
> pool? Since this only matters when shutting down or restarting, can we
> close all plock->fd file descriptors to kick the GThreadPool workers out
> of fcntl()?

I don't think that closing plock->fd will unblock fcntl().  

SYSCALL_DEFINE3(fcntl, unsigned int, fd, unsigned int, cmd, unsigned long, arg)
{
        struct fd f = fdget_raw(fd);
}

IIUC, fdget_raw() will take a reference on the associated "struct file",
and after that the rest of the code will work with that "struct file".

static int do_lock_file_wait(struct file *filp, unsigned int cmd,
                             struct file_lock *fl)
{
..
..
                error = wait_event_interruptible(fl->fl_wait,
                                        list_empty(&fl->fl_blocked_member));

..
..
}

And this should break upon receiving a signal. The man page says the
same thing.

       F_OFD_SETLKW (struct flock *)
              As for F_OFD_SETLK, but if a conflicting lock is held on the
              file, then wait for that lock to be released. If a signal is
              caught while waiting, then the call is interrupted and (after
              the signal handler has returned) returns immediately (with
              return value -1 and errno set to EINTR; see signal(7)).

It would be nice if we didn't have to implement our own custom threadpool
just for locking. It would have been better if the glib thread pool
provided some facility for this.
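
To make the EINTR behaviour concrete, here is a minimal, self-contained
sketch (not from this series): one open file description holds an OFD
lock, a second thread blocks in fcntl(F_OFD_SETLKW) on the same file,
and pthread_kill() wakes it up with EINTR, which is exactly what the
custom thread pool relies on:

#define _GNU_SOURCE          /* for F_OFD_SETLK/F_OFD_SETLKW */
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void on_sigusr1(int sig)
{
    (void)sig;               /* no SA_RESTART, so fcntl() returns EINTR */
}

static void *blocked_locker(void *arg)
{
    int fd = *(int *)arg;
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };

    if (fcntl(fd, F_OFD_SETLKW, &fl) == -1) {
        printf("F_OFD_SETLKW interrupted: %s\n", strerror(errno));
    }
    return NULL;
}

int main(void)
{
    struct sigaction sa = { .sa_handler = on_sigusr1 };
    pthread_t t;

    sigaction(SIGUSR1, &sa, NULL);

    /* Two opens: OFD locks conflict across open file descriptions */
    int fd1 = open("/tmp/lockfile", O_RDWR | O_CREAT, 0600);
    int fd2 = open("/tmp/lockfile", O_RDWR);

    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
    fcntl(fd1, F_OFD_SETLK, &fl);    /* take and hold the lock */

    pthread_create(&t, NULL, blocked_locker, &fd2);
    sleep(1);                        /* let the thread block on the lock */
    pthread_kill(t, SIGUSR1);        /* prints "Interrupted system call" */
    pthread_join(t, NULL);
    return 0;
}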

[..]
> > diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> > index 3b720c5d4a..c67c2e0e7a 100644
> > --- a/tools/virtiofsd/fuse_virtio.c
> > +++ b/tools/virtiofsd/fuse_virtio.c
> > @@ -20,6 +20,7 @@
> >  #include "fuse_misc.h"
> >  #include "fuse_opt.h"
> >  #include "fuse_virtio.h"
> > +#include "tpool.h"
> >  
> >  #include <sys/eventfd.h>
> >  #include <sys/socket.h>
> > @@ -612,6 +613,60 @@ out:
> >      free(req);
> >  }
> >  
> > +/*
> > + * If the request is a locking request, use a custom locking thread pool.
> > + */
> > +static bool use_lock_tpool(gpointer data, gpointer user_data)
> > +{
> > +    struct fv_QueueInfo *qi = user_data;
> > +    struct fuse_session *se = qi->virtio_dev->se;
> > +    FVRequest *req = data;
> > +    VuVirtqElement *elem = &req->elem;
> > +    struct fuse_buf fbuf = {};
> > +    struct fuse_in_header *inhp;
> > +    struct fuse_lk_in *lkinp;
> > +    size_t lk_req_len;
> > +    /* The 'out' part of the elem is from qemu */
> > +    unsigned int out_num = elem->out_num;
> > +    struct iovec *out_sg = elem->out_sg;
> > +    size_t out_len = iov_size(out_sg, out_num);
> > +    bool use_custom_tpool = false;
> > +
> > +    /*
> > +     * If notifications are not enabled, no point in using custom lock
> > +     * thread pool.
> > +     */
> > +    if (!se->notify_enabled) {
> > +        return false;
> > +    }
> > +
> > +    assert(se->bufsize > sizeof(struct fuse_in_header));
> > +    lk_req_len = sizeof(struct fuse_in_header) + sizeof(struct fuse_lk_in);
> > +
> > +    if (out_len < lk_req_len) {
> > +        return false;
> > +    }
> > +
> > +    fbuf.mem = g_malloc(se->bufsize);
> > +    copy_from_iov(&fbuf, out_num, out_sg, lk_req_len);
> 
> This looks inefficient: for every FUSE request we now malloc se->bufsize
> and then copy lk_req_len bytes, only to free the memory again.
> 
> Is it possible to keep lk_req_len bytes on the stack instead?

I guess it should be possible. se->bufsize is variable but lk_req_len
is known at compile time.

lk_req_len = sizeof(struct fuse_in_header) + sizeof(struct fuse_lk_in);

So we should be able to allocate this much space on the stack and point
fbuf.mem to it.

char buf[sizeof(struct fuse_in_header) + sizeof(struct fuse_lk_in)];
fbuf.mem = buf;
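
Spelled out a bit more, a sketch of how use_lock_tpool() could do the
peek from the stack (assuming copy_from_iov() as used in the patch above
and the standard header-then-payload FUSE request layout; a struct is
used instead of a raw char array so the header access stays aligned):

/*
 * fuse_in_header is a multiple of 8 bytes, so fuse_lk_in follows it
 * in this struct without padding, matching the wire layout.
 */
struct {
    struct fuse_in_header inh;
    struct fuse_lk_in lk;
} req_buf;
struct fuse_buf fbuf = { .mem = &req_buf, .size = sizeof(req_buf) };

copy_from_iov(&fbuf, out_num, out_sg, sizeof(req_buf));

if (req_buf.inh.opcode == FUSE_SETLKW) {
    /* F_UNLCK requests stay on the regular path to avoid deadlock */
    use_custom_tpool = (req_buf.lk.lk.type != F_UNLCK);
}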

Will give it a try.

Vivek



^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 10/13] virtiofsd: Custom threadpool for remote blocking posix locks requests
@ 2021-10-05 13:06       ` Vivek Goyal
  0 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-10-05 13:06 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: miklos, qemu-devel, virtio-fs

On Mon, Oct 04, 2021 at 03:54:31PM +0100, Stefan Hajnoczi wrote:
> On Thu, Sep 30, 2021 at 11:30:34AM -0400, Vivek Goyal wrote:
> > Add a new custom threadpool using posix threads that specifically
> > service locking requests.
> > 
> > In the case of a fcntl(SETLKW) request, if the guest is waiting
> > for a lock or locks and issues a hard-reboot through SYSRQ then virtiofsd
> > unblocks the blocked threads by sending a signal to them and waking
> > them up.
> > 
> > The current threadpool (GThreadPool) is not adequate to service the
> > locking requests that result in a thread blocking. That is because
> > GLib does not provide an API to cancel the request while it is
> > serviced by a thread. In addition, a user might be running virtiofsd
> > without a threadpool (--thread-pool-size=0), in which case a locking
> > request that blocks will block the main virtqueue thread and prevent
> > it from servicing any other requests.
> > 
> > The only exception occurs when the lock is of type F_UNLCK. In this case
> > the request is serviced by the main virtqueue thread or a GThreadPool
> > thread to avoid a deadlock, when all the threads in the custom threadpool
> > are blocked.
> > 
> > Then virtiofsd proceeds to cleanup the state of the threads, release
> > them back to the system and re-initialize.
> 
> Is there another way to cancel SETLKW without resorting to a new thread
> pool? Since this only matters when shutting down or restarting, can we
> close all plock->fd file descriptors to kick the GThreadPool workers out
> of fcntl()?

I don't think that closing plock->fd will unblock fcntl().  

SYSCALL_DEFINE3(fcntl, unsigned int, fd, unsigned int, cmd, unsigned long, arg)
{
        struct fd f = fdget_raw(fd);
}

IIUC, fdget_raw() will take a reference on the associated "struct file",
and after that the rest of the code will work with that "struct file".

static int do_lock_file_wait(struct file *filp, unsigned int cmd,
                             struct file_lock *fl)
{
..
..
                error = wait_event_interruptible(fl->fl_wait,
                                        list_empty(&fl->fl_blocked_member));

..
..
}

And this should break upon receiving a signal. The man page says the
same thing.

       F_OFD_SETLKW (struct flock *)
              As for F_OFD_SETLK, but if a conflicting lock is held on the
              file, then wait for that lock to be released. If a signal is
              caught while waiting, then the call is interrupted and (after
              the signal handler has returned) returns immediately (with
              return value -1 and errno set to EINTR; see signal(7)).

It would be nice if we didn't have to implement our own custom threadpool
just for locking. It would have been better if the glib thread pool
provided some facility for this.

[..]
> > diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> > index 3b720c5d4a..c67c2e0e7a 100644
> > --- a/tools/virtiofsd/fuse_virtio.c
> > +++ b/tools/virtiofsd/fuse_virtio.c
> > @@ -20,6 +20,7 @@
> >  #include "fuse_misc.h"
> >  #include "fuse_opt.h"
> >  #include "fuse_virtio.h"
> > +#include "tpool.h"
> >  
> >  #include <sys/eventfd.h>
> >  #include <sys/socket.h>
> > @@ -612,6 +613,60 @@ out:
> >      free(req);
> >  }
> >  
> > +/*
> > + * If the request is a locking request, use a custom locking thread pool.
> > + */
> > +static bool use_lock_tpool(gpointer data, gpointer user_data)
> > +{
> > +    struct fv_QueueInfo *qi = user_data;
> > +    struct fuse_session *se = qi->virtio_dev->se;
> > +    FVRequest *req = data;
> > +    VuVirtqElement *elem = &req->elem;
> > +    struct fuse_buf fbuf = {};
> > +    struct fuse_in_header *inhp;
> > +    struct fuse_lk_in *lkinp;
> > +    size_t lk_req_len;
> > +    /* The 'out' part of the elem is from qemu */
> > +    unsigned int out_num = elem->out_num;
> > +    struct iovec *out_sg = elem->out_sg;
> > +    size_t out_len = iov_size(out_sg, out_num);
> > +    bool use_custom_tpool = false;
> > +
> > +    /*
> > +     * If notifications are not enabled, no point in using custom lock
> > +     * thread pool.
> > +     */
> > +    if (!se->notify_enabled) {
> > +        return false;
> > +    }
> > +
> > +    assert(se->bufsize > sizeof(struct fuse_in_header));
> > +    lk_req_len = sizeof(struct fuse_in_header) + sizeof(struct fuse_lk_in);
> > +
> > +    if (out_len < lk_req_len) {
> > +        return false;
> > +    }
> > +
> > +    fbuf.mem = g_malloc(se->bufsize);
> > +    copy_from_iov(&fbuf, out_num, out_sg, lk_req_len);
> 
> This looks inefficient: for every FUSE request we now malloc se->bufsize
> and then copy lk_req_len bytes, only to free the memory again.
> 
> Is it possible to keep lk_req_len bytes on the stack instead?

I guess it should be possible. se->bufsize is variable but lk_req_len
is known at compile time.

lk_req_len = sizeof(struct fuse_in_header) + sizeof(struct fuse_lk_in);

So we should be able to allocate this much space on the stack and point
fbuf.mem to it.

char buf[sizeof(struct fuse_in_header) + sizeof(struct fuse_lk_in)];
fbuf.mem = buf;

Will give it a try.

Vivek


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 11/13] virtiofsd: Shutdown notification queue in the end
  2021-10-04 15:01     ` [Virtio-fs] " Stefan Hajnoczi
@ 2021-10-05 13:19       ` Vivek Goyal
  -1 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-10-05 13:19 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

On Mon, Oct 04, 2021 at 04:01:02PM +0100, Stefan Hajnoczi wrote:
> On Thu, Sep 30, 2021 at 11:30:35AM -0400, Vivek Goyal wrote:
> > So far we did not have the notion of cross queue traffic. That is, we
> > get request on a queue and send back response on same queue. So if a
> > request is being processed and at the same time a stop queue request
> > comes in, we wait for all pending requests to finish and then queue
> > is stopped and associated data structure cleaned.
> > 
> > But with notification queue, now it is possible that we get a locking
> > request on request queue and send the notification back on a different
> > queue (notificaiton queue). This means, we need to make sure that
> 
> s/notificaiton/notification/
> 
> > notifiation queue has not already been shutdown or is not being
> 
> s/notifiation/notification/

Will fix both.

[..]
> >  /* Callback from libvhost-user on start or stop of a queue */
> > @@ -934,7 +950,16 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
> >           * the queue thread doesn't block in virtio_send_msg().
> >           */
> >          vu_dispatch_unlock(vud);
> > -        fv_queue_cleanup_thread(vud, qidx);
> > +
> > +        /*
> > +         * If queue 0 is being shutdown, treat it as if device is being
> > +         * shutdown and stop all queues.
> > +         */
> 
> Please expand this comment so it's clear why we do this.

Ok, will do. I put the justification in the commit message but it is a
good idea to put it here as well.
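
For reference, a possible expanded comment, with the rationale lifted
from the commit message quoted above:

/*
 * If queue 0 is being shut down, treat it as if the whole device is
 * being shut down and stop all queues. With the notification queue we
 * now have cross-queue traffic: a request queue that is still draining
 * may need to send a notification on the notification queue, so the
 * notification queue must be shut down after all the others.
 */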

Vivek



^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 11/13] virtiofsd: Shutdown notification queue in the end
@ 2021-10-05 13:19       ` Vivek Goyal
  0 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-10-05 13:19 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: miklos, qemu-devel, virtio-fs

On Mon, Oct 04, 2021 at 04:01:02PM +0100, Stefan Hajnoczi wrote:
> On Thu, Sep 30, 2021 at 11:30:35AM -0400, Vivek Goyal wrote:
> > So far we did not have the notion of cross queue traffic. That is, we
> > get request on a queue and send back response on same queue. So if a
> > request is being processed and at the same time a stop queue request
> > comes in, we wait for all pending requests to finish and then queue
> > is stopped and associated data structure cleaned.
> > 
> > But with notification queue, now it is possible that we get a locking
> > request on request queue and send the notification back on a different
> > queue (notificaiton queue). This means, we need to make sure that
> 
> s/notificaiton/notification/
> 
> > notifiation queue has not already been shutdown or is not being
> 
> s/notifiation/notification/

Will fix both.

[..]
> >  /* Callback from libvhost-user on start or stop of a queue */
> > @@ -934,7 +950,16 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
> >           * the queue thread doesn't block in virtio_send_msg().
> >           */
> >          vu_dispatch_unlock(vud);
> > -        fv_queue_cleanup_thread(vud, qidx);
> > +
> > +        /*
> > +         * If queue 0 is being shutdown, treat it as if device is being
> > +         * shutdown and stop all queues.
> > +         */
> 
> Please expand this comment so it's clear why we do this.

Ok, will do. I put the justification in the commit message but it is a
good idea to put it here as well.

Vivek


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 12/13] virtiofsd: Implement blocking posix locks
  2021-10-04 15:07     ` [Virtio-fs] " Stefan Hajnoczi
@ 2021-10-05 13:26       ` Vivek Goyal
  -1 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-10-05 13:26 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

On Mon, Oct 04, 2021 at 04:07:04PM +0100, Stefan Hajnoczi wrote:
> On Thu, Sep 30, 2021 at 11:30:36AM -0400, Vivek Goyal wrote:
> > As of now we don't support fcntl(F_SETLKW) and if we see one, we return
> > -EOPNOTSUPP.
> > 
> > Change that by accepting these requests and returning a reply
> > immediately asking caller to wait. Once lock is available, send a
> > notification to the waiter indicating lock is available.
> > 
> > In response to lock request, we are returning error value as "1", which
> > signals to client to queue the lock request internally and later client
> > will get a notification which will signal lock is taken (or error). And
> > then fuse client should wake up the guest process.
> > 
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
> > ---
> >  tools/virtiofsd/fuse_lowlevel.c  | 37 ++++++++++++++++-
> >  tools/virtiofsd/fuse_lowlevel.h  | 26 ++++++++++++
> >  tools/virtiofsd/fuse_virtio.c    | 50 ++++++++++++++++++++---
> >  tools/virtiofsd/passthrough_ll.c | 70 ++++++++++++++++++++++++++++----
> >  4 files changed, 167 insertions(+), 16 deletions(-)
> > 
> > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > index e4679c73ab..2e7f4b786d 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.c
> > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > @@ -179,8 +179,8 @@ int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
> >          .unique = req->unique,
> >          .error = error,
> >      };
> > -
> > -    if (error <= -1000 || error > 0) {
> > +    /* error = 1 has been used to signal client to wait for notificaiton */
> 
> s/notificaiton/notification/

Will fix. I have made too many spelling mistakes. :-(

> 
> > +    if (error <= -1000 || error > 1) {
> >          fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n", error);
> >          out.error = -ERANGE;
> >      }
> > @@ -290,6 +290,11 @@ int fuse_reply_err(fuse_req_t req, int err)
> >      return send_reply(req, -err, NULL, 0);
> >  }
> >  
> > +int fuse_reply_wait(fuse_req_t req)
> > +{
> > +    return send_reply(req, 1, NULL, 0);
> > +}
> > +
> >  void fuse_reply_none(fuse_req_t req)
> >  {
> >      fuse_free_req(req);
> > @@ -2165,6 +2170,34 @@ static void do_destroy(fuse_req_t req, fuse_ino_t nodeid,
> >      send_reply_ok(req, NULL, 0);
> >  }
> >  
> > +static int send_notify_iov(struct fuse_session *se, int notify_code,
> > +                           struct iovec *iov, int count)
> > +{
> > +    struct fuse_out_header out;
> > +    if (!se->got_init) {
> > +        return -ENOTCONN;
> > +    }
> > +    out.unique = 0;
> > +    out.error = notify_code;
> 
> Please fully initialize all fuse_out_header fields so it's obvious that
> there is no accidental information leak from virtiofsd to the guest:
> 
>   struct fuse_out_header out = {
>       .error = notify_code,
>   };
> 
> The host must not expose uninitialized memory to the guest (just like
> the kernel vs userspace). fuse_send_msg() initializes out.len later, but
> to be on the safe side I think we should be explicit here.

Agreed. It's better to be explicit here and initialize fuse_out_header
fully. Will do.

Vivek



^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 12/13] virtiofsd: Implement blocking posix locks
@ 2021-10-05 13:26       ` Vivek Goyal
  0 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-10-05 13:26 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: miklos, qemu-devel, virtio-fs

On Mon, Oct 04, 2021 at 04:07:04PM +0100, Stefan Hajnoczi wrote:
> On Thu, Sep 30, 2021 at 11:30:36AM -0400, Vivek Goyal wrote:
> > As of now we don't support fcntl(F_SETLKW) and if we see one, we return
> > -EOPNOTSUPP.
> > 
> > Change that by accepting these requests and returning a reply
> > immediately asking caller to wait. Once lock is available, send a
> > notification to the waiter indicating lock is available.
> > 
> > In response to lock request, we are returning error value as "1", which
> > signals to client to queue the lock request internally and later client
> > will get a notification which will signal lock is taken (or error). And
> > then fuse client should wake up the guest process.
> > 
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
> > ---
> >  tools/virtiofsd/fuse_lowlevel.c  | 37 ++++++++++++++++-
> >  tools/virtiofsd/fuse_lowlevel.h  | 26 ++++++++++++
> >  tools/virtiofsd/fuse_virtio.c    | 50 ++++++++++++++++++++---
> >  tools/virtiofsd/passthrough_ll.c | 70 ++++++++++++++++++++++++++++----
> >  4 files changed, 167 insertions(+), 16 deletions(-)
> > 
> > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > index e4679c73ab..2e7f4b786d 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.c
> > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > @@ -179,8 +179,8 @@ int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
> >          .unique = req->unique,
> >          .error = error,
> >      };
> > -
> > -    if (error <= -1000 || error > 0) {
> > +    /* error = 1 has been used to signal client to wait for notificaiton */
> 
> s/notificaiton/notification/

Will fix. I have made too many spelling mistakes. :-(

> 
> > +    if (error <= -1000 || error > 1) {
> >          fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n", error);
> >          out.error = -ERANGE;
> >      }
> > @@ -290,6 +290,11 @@ int fuse_reply_err(fuse_req_t req, int err)
> >      return send_reply(req, -err, NULL, 0);
> >  }
> >  
> > +int fuse_reply_wait(fuse_req_t req)
> > +{
> > +    return send_reply(req, 1, NULL, 0);
> > +}
> > +
> >  void fuse_reply_none(fuse_req_t req)
> >  {
> >      fuse_free_req(req);
> > @@ -2165,6 +2170,34 @@ static void do_destroy(fuse_req_t req, fuse_ino_t nodeid,
> >      send_reply_ok(req, NULL, 0);
> >  }
> >  
> > +static int send_notify_iov(struct fuse_session *se, int notify_code,
> > +                           struct iovec *iov, int count)
> > +{
> > +    struct fuse_out_header out;
> > +    if (!se->got_init) {
> > +        return -ENOTCONN;
> > +    }
> > +    out.unique = 0;
> > +    out.error = notify_code;
> 
> Please fully initialize all fuse_out_header fields so it's obvious that
> there is no accidental information leak from virtiofsd to the guest:
> 
>   struct fuse_out_header out = {
>       .error = notify_code,
>   };
> 
> The host must not expose uninitialized memory to the guest (just like
> the kernel vs userspace). fuse_send_msg() initializes out.len later, but
> to be on the safe side I think we should be explicit here.

Agreed. It's better to be explicit here and initialize fuse_out_header
fully. Will do.

Vivek


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 07/13] virtiofsd: Release file locks using F_UNLCK
  2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
@ 2021-10-05 13:37     ` Christophe de Dinechin
  -1 siblings, 0 replies; 106+ messages in thread
From: Christophe de Dinechin @ 2021-10-05 13:37 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel, stefanha, miklos


On 2021-09-30 at 11:30 -04, Vivek Goyal <vgoyal@redhat.com> wrote...
> We are emulating posix locks for guest using open file description locks
> in virtiofsd. When any of the fd is closed in guest, we find associated
> OFD lock fd (if there is one) and close it to release all the locks.
>
> Assumption here is that there is no other thread using lo_inode_plock
> structure or plock->fd, hence it is safe to do so.
>
> But now we are about to introduce blocking variant of locks (SETLKW),
> and that means we might be waiting for a lock to become available and
> using plock->fd. And that means there are still users of plock
> structure.
>
> So release locks using fcntl(SETLK, F_UNLCK) instead of closing fd
> and plock will be freed later when lo_inode is being freed.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 38b2af8599..6928662e22 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -1557,9 +1557,6 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
>          lo_map_remove(&lo->ino_map, inode->fuse_ino);
>          g_hash_table_remove(lo->inodes, &inode->key);
>          if (lo->posix_lock) {
> -            if (g_hash_table_size(inode->posix_locks)) {
> -                fuse_log(FUSE_LOG_WARNING, "Hash table is not empty\n");
> -            }
>              g_hash_table_destroy(inode->posix_locks);
>              pthread_mutex_destroy(&inode->plock_mutex);
>          }
> @@ -2266,6 +2263,8 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
>      (void)ino;
>      struct lo_inode *inode;
>      struct lo_data *lo = lo_data(req);
> +    struct lo_inode_plock *plock;
> +    struct flock flock;
>
>      inode = lo_inode(req, ino);
>      if (!inode) {
> @@ -2282,8 +2281,22 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
>      /* An fd is going away. Cleanup associated posix locks */
>      if (lo->posix_lock) {
>          pthread_mutex_lock(&inode->plock_mutex);
> -        g_hash_table_remove(inode->posix_locks,

I'm curious why the g_hash_table_remove above is not in the 'if' below?

> +        plock = g_hash_table_lookup(inode->posix_locks,
>              GUINT_TO_POINTER(fi->lock_owner));
> +
> +        if (plock) {
> +            /*
> +             * An fd is being closed. For posix locks, this means
> +             * drop all the associated locks.
> +             */
> +            memset(&flock, 0, sizeof(struct flock));
> +            flock.l_type = F_UNLCK;
> +            flock.l_whence = SEEK_SET;
> +            /* Unlock whole file */
> +            flock.l_start = flock.l_len = 0;
> +            fcntl(plock->fd, F_OFD_SETLK, &flock);
> +        }
> +
>          pthread_mutex_unlock(&inode->plock_mutex);
>      }
>      res = close(dup(lo_fi_fd(req, fi)));


--
Cheers,
Christophe de Dinechin (IRC c3d)



^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 07/13] virtiofsd: Release file locks using F_UNLCK
@ 2021-10-05 13:37     ` Christophe de Dinechin
  0 siblings, 0 replies; 106+ messages in thread
From: Christophe de Dinechin @ 2021-10-05 13:37 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel, miklos


On 2021-09-30 at 11:30 -04, Vivek Goyal <vgoyal@redhat.com> wrote...
> We are emulating posix locks for guest using open file description locks
> in virtiofsd. When any of the fd is closed in guest, we find associated
> OFD lock fd (if there is one) and close it to release all the locks.
>
> Assumption here is that there is no other thread using lo_inode_plock
> structure or plock->fd, hence it is safe to do so.
>
> But now we are about to introduce blocking variant of locks (SETLKW),
> and that means we might be waiting for a lock to become available and
> using plock->fd. And that means there are still users of plock
> structure.
>
> So release locks using fcntl(SETLK, F_UNLCK) instead of closing fd
> and plock will be freed later when lo_inode is being freed.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 38b2af8599..6928662e22 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -1557,9 +1557,6 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
>          lo_map_remove(&lo->ino_map, inode->fuse_ino);
>          g_hash_table_remove(lo->inodes, &inode->key);
>          if (lo->posix_lock) {
> -            if (g_hash_table_size(inode->posix_locks)) {
> -                fuse_log(FUSE_LOG_WARNING, "Hash table is not empty\n");
> -            }
>              g_hash_table_destroy(inode->posix_locks);
>              pthread_mutex_destroy(&inode->plock_mutex);
>          }
> @@ -2266,6 +2263,8 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
>      (void)ino;
>      struct lo_inode *inode;
>      struct lo_data *lo = lo_data(req);
> +    struct lo_inode_plock *plock;
> +    struct flock flock;
>
>      inode = lo_inode(req, ino);
>      if (!inode) {
> @@ -2282,8 +2281,22 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
>      /* An fd is going away. Cleanup associated posix locks */
>      if (lo->posix_lock) {
>          pthread_mutex_lock(&inode->plock_mutex);
> -        g_hash_table_remove(inode->posix_locks,

I'm curious why the g_hash_table_remove above is not in the 'if' below?

> +        plock = g_hash_table_lookup(inode->posix_locks,
>              GUINT_TO_POINTER(fi->lock_owner));
> +
> +        if (plock) {
> +            /*
> +             * An fd is being closed. For posix locks, this means
> +             * drop all the associated locks.
> +             */
> +            memset(&flock, 0, sizeof(struct flock));
> +            flock.l_type = F_UNLCK;
> +            flock.l_whence = SEEK_SET;
> +            /* Unlock whole file */
> +            flock.l_start = flock.l_len = 0;
> +            fcntl(plock->fd, F_OFD_SETLK, &flock);
> +        }
> +
>          pthread_mutex_unlock(&inode->plock_mutex);
>      }
>      res = close(dup(lo_fi_fd(req, fi)));


--
Cheers,
Christophe de Dinechin (IRC c3d)


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 12/13] virtiofsd: Implement blocking posix locks
  2021-10-05 12:22     ` [Virtio-fs] " Stefan Hajnoczi
@ 2021-10-05 15:14       ` Vivek Goyal
  -1 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-10-05 15:14 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

On Tue, Oct 05, 2021 at 01:22:21PM +0100, Stefan Hajnoczi wrote:
> On Thu, Sep 30, 2021 at 11:30:36AM -0400, Vivek Goyal wrote:
> > As of now we don't support fcntl(F_SETLKW) and if we see one, we return
> > -EOPNOTSUPP.
> > 
> > Change that by accepting these requests and returning a reply
> > immediately asking caller to wait. Once lock is available, send a
> > notification to the waiter indicating lock is available.
> > 
> > In response to lock request, we are returning error value as "1", which
> > signals to client to queue the lock request internally and later client
> > will get a notification which will signal lock is taken (or error). And
> > then fuse client should wake up the guest process.
> > 
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
> > ---
> >  tools/virtiofsd/fuse_lowlevel.c  | 37 ++++++++++++++++-
> >  tools/virtiofsd/fuse_lowlevel.h  | 26 ++++++++++++
> >  tools/virtiofsd/fuse_virtio.c    | 50 ++++++++++++++++++++---
> >  tools/virtiofsd/passthrough_ll.c | 70 ++++++++++++++++++++++++++++----
> >  4 files changed, 167 insertions(+), 16 deletions(-)
> > 
> > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > index e4679c73ab..2e7f4b786d 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.c
> > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > @@ -179,8 +179,8 @@ int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
> >          .unique = req->unique,
> >          .error = error,
> >      };
> > -
> > -    if (error <= -1000 || error > 0) {
> > +    /* error = 1 has been used to signal client to wait for notificaiton */
> > +    if (error <= -1000 || error > 1) {
> >          fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n", error);
> >          out.error = -ERANGE;
> >      }
> > @@ -290,6 +290,11 @@ int fuse_reply_err(fuse_req_t req, int err)
> >      return send_reply(req, -err, NULL, 0);
> >  }
> >  
> > +int fuse_reply_wait(fuse_req_t req)
> > +{
> > +    return send_reply(req, 1, NULL, 0);
> > +}
> > +
> >  void fuse_reply_none(fuse_req_t req)
> >  {
> >      fuse_free_req(req);
> > @@ -2165,6 +2170,34 @@ static void do_destroy(fuse_req_t req, fuse_ino_t nodeid,
> >      send_reply_ok(req, NULL, 0);
> >  }
> >  
> > +static int send_notify_iov(struct fuse_session *se, int notify_code,
> > +                           struct iovec *iov, int count)
> > +{
> > +    struct fuse_out_header out;
> > +    if (!se->got_init) {
> > +        return -ENOTCONN;
> > +    }
> > +    out.unique = 0;
> > +    out.error = notify_code;
> > +    iov[0].iov_base = &out;
> > +    iov[0].iov_len = sizeof(struct fuse_out_header);
> > +    return fuse_send_msg(se, NULL, iov, count);
> > +}
> > +
> > +int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t unique,
> > +                  int32_t error)
> > +{
> > +    struct fuse_notify_lock_out outarg = {0};
> > +    struct iovec iov[2];
> > +
> > +    outarg.unique = unique;
> > +    outarg.error = -error;
> > +
> > +    iov[1].iov_base = &outarg;
> > +    iov[1].iov_len = sizeof(outarg);
> > +    return send_notify_iov(se, FUSE_NOTIFY_LOCK, iov, 2);
> > +}
> > +
> >  int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
> >                                 off_t offset, struct fuse_bufvec *bufv)
> >  {
> > diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
> > index c55c0ca2fc..64624b48dc 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.h
> > +++ b/tools/virtiofsd/fuse_lowlevel.h
> > @@ -1251,6 +1251,22 @@ struct fuse_lowlevel_ops {
> >   */
> >  int fuse_reply_err(fuse_req_t req, int err);
> >  
> > +/**
> > + * Ask caller to wait for lock.
> > + *
> > + * Possible requests:
> > + *   setlkw
> > + *
> > + * If caller sends a blocking lock request (setlkw), then reply to caller
> > + * that wait for lock to be available. Once lock is available caller will
> 
> I can't parse the first sentence.
> 
> s/that wait for lock to be available/that waiting for the lock is
> necessary/?

Ok, will change it.

> 
> > + * receive a notification with request's unique id. Notification will
> > + * carry info whether lock was successfully obtained or not.
> > + *
> > + * @param req request handle
> > + * @return zero for success, -errno for failure to send reply
> > + */
> > +int fuse_reply_wait(fuse_req_t req);
> > +
> >  /**
> >   * Don't send reply
> >   *
> > @@ -1685,6 +1701,16 @@ int fuse_lowlevel_notify_delete(struct fuse_session *se, fuse_ino_t parent,
> >  int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
> >                                 off_t offset, struct fuse_bufvec *bufv);
> >  
> > +/**
> > + * Notify event related to previous lock request
> > + *
> > + * @param se the session object
> > + * @param unique the unique id of the request which requested setlkw
> > + * @param error zero for success, -errno for the failure
> > + */
> > +int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t unique,
> > +                              int32_t error);
> > +
> >  /*
> >   * Utility functions
> >   */
> > diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> > index a87e88e286..bb2d4456fc 100644
> > --- a/tools/virtiofsd/fuse_virtio.c
> > +++ b/tools/virtiofsd/fuse_virtio.c
> > @@ -273,6 +273,23 @@ static void vq_send_element(struct fv_QueueInfo *qi, VuVirtqElement *elem,
> >      vu_dispatch_unlock(qi->virtio_dev);
> >  }
> >  
> > +/* Returns NULL if queue is empty */
> > +static FVRequest *vq_pop_notify_elem(struct fv_QueueInfo *qi)
> > +{
> > +    struct fuse_session *se = qi->virtio_dev->se;
> > +    VuDev *dev = &se->virtio_dev->dev;
> > +    VuVirtq *q = vu_get_queue(dev, qi->qidx);
> > +    FVRequest *req;
> > +
> > +    vu_dispatch_rdlock(qi->virtio_dev);
> > +    pthread_mutex_lock(&qi->vq_lock);
> > +    /* Pop an element from queue */
> > +    req = vu_queue_pop(dev, q, sizeof(FVRequest));
> > +    pthread_mutex_unlock(&qi->vq_lock);
> > +    vu_dispatch_unlock(qi->virtio_dev);
> > +    return req;
> > +}
> > +
> >  /*
> >   * Called back by ll whenever it wants to send a reply/message back
> >   * The 1st element of the iov starts with the fuse_out_header
> > @@ -281,9 +298,9 @@ static void vq_send_element(struct fv_QueueInfo *qi, VuVirtqElement *elem,
> >  int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
> >                      struct iovec *iov, int count)
> >  {
> > -    FVRequest *req = container_of(ch, FVRequest, ch);
> > -    struct fv_QueueInfo *qi = ch->qi;
> > -    VuVirtqElement *elem = &req->elem;
> > +    FVRequest *req;
> > +    struct fv_QueueInfo *qi;
> > +    VuVirtqElement *elem;
> >      int ret = 0;
> >  
> >      assert(count >= 1);
> > @@ -294,8 +311,30 @@ int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
> >  
> >      size_t tosend_len = iov_size(iov, count);
> >  
> > -    /* unique == 0 is notification, which we don't support */
> > -    assert(out->unique);
> > +    /* unique == 0 is notification */
> > +    if (!out->unique) {
> 
> Is a check needed in fuse_session_process_buf_int() to reject requests
> that the driver submitted to the device with req.unique == 0? If we get
> confused about the correct virtqueue to use in virtio_send_msg() then
> there could be bugs.

Ok. Should we abort/exit virtiofsd if fuse_session_process_buf_int()
gets a request with unique=0? If we try to reply to it instead, then
I will have to carve out a separate path which does not interpret
unique=0 as a notification.
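
If aborting is acceptable, a sketch of the rejection path (the exact
placement and error handling are assumptions, not from this series):

/*
 * Early in fuse_session_process_buf_int(), before dispatching; assumes
 * the parsed fuse_in_header is available as "in" at that point.
 */
if (in->unique == 0) {
    fuse_log(FUSE_LOG_ERR, "fuse: driver sent request with unique == 0\n");
    fuse_session_exit(se);   /* treat it as a fatal protocol violation */
    return;
}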

> 
> > +        if (!se->notify_enabled) {
> > +            return -EOPNOTSUPP;
> > +        }
> > +        /* If notifications are enabled, queue index 1 is notification queue */
> > +        qi = se->virtio_dev->qi[1];
> > +        req = vq_pop_notify_elem(qi);
> 
> Where is req freed?

I think we are not freeing req in case of notification. Good catch.
Will fix it.

> 
> > +        if (!req) {
> > +            /*
> > +             * TODO: Implement some sort of ring buffer and queue notifications
> > +             * on that and send these later when notification queue has space
> > +             * available.
> > +             */
> > +            return -ENOSPC;
> 
> This needs to be addressed before this patch series can be merged. The
> notification vq is kicked by the guest driver when buffers are
> replenished. The vq handler function can wake up waiting threads using a
> condvar.

I have taken care of this by polling in a loop (with a sleep in
between). Sleeping on a condition variable with a subsequent wakeup
would be more efficient, though.
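
Something along these lines, perhaps (a sketch only; the notify_lock
and notify_avail fields would be hypothetical additions to
fv_QueueInfo, and req is the FVRequest pointer already declared in
virtio_send_msg()):

/* Waiter side: block until the notification queue has a descriptor */
pthread_mutex_lock(&qi->notify_lock);
while (!(req = vq_pop_notify_elem(qi))) {
    pthread_cond_wait(&qi->notify_avail, &qi->notify_lock);
}
pthread_mutex_unlock(&qi->notify_lock);

/* Notification vq kick handler: driver replenished buffers, wake waiters */
pthread_mutex_lock(&qi->notify_lock);
pthread_cond_broadcast(&qi->notify_avail);
pthread_mutex_unlock(&qi->notify_lock);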

> 
> > +        }
> > +        req->reply_sent = false;
> > +    } else {
> > +        assert(ch);
> > +        req = container_of(ch, FVRequest, ch);
> > +        qi = ch->qi;
> > +    }
> > +
> > +    elem = &req->elem;
> >      assert(!req->reply_sent);
> >  
> >      /* The 'in' part of the elem is to qemu */
> > @@ -985,6 +1024,7 @@ static int fv_get_config(VuDev *dev, uint8_t *config, uint32_t len)
> >          struct fuse_notify_delete_out       delete_out;
> >          struct fuse_notify_store_out        store_out;
> >          struct fuse_notify_retrieve_out     retrieve_out;
> > +        struct fuse_notify_lock_out         lock_out;
> >      };
> >  
> >      notify_size = sizeof(struct fuse_out_header) +
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index 6928662e22..277f74762b 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -2131,13 +2131,35 @@ out:
> >      }
> >  }
> >  
> > +static void setlk_send_notification(struct fuse_session *se, uint64_t unique,
> > +                                    int saverr)
> > +{
> > +    int ret;
> > +
> > +    do {
> > +        ret = fuse_lowlevel_notify_lock(se, unique, saverr);
> > +        /*
> > +         * Retry sending notification if notification queue does not have
> > +         * free descriptor yet, otherwise break out of loop. Either we
> > +         * successfully sent notification or some other error occurred.
> > +         */
> > +        if (ret != -ENOSPC) {
> > +            break;
> > +        }
> > +        usleep(10000);
> > +    } while (1);
> 
> Please use the notification vq handler to wake up blocked threads
> instead of usleep().

Ok, I will look into it. This will be more code. The first thing I can
see is that I have not started a thread for the notification queue. It
looks like I will have to start one so that the thread can see queue
kicks, notice when qemu is going away, and wake up waiters.

> 
> > +}
> > +
> >  static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
> >                       struct flock *lock, int sleep)
> >  {
> >      struct lo_data *lo = lo_data(req);
> >      struct lo_inode *inode;
> >      struct lo_inode_plock *plock;
> > -    int ret, saverr = 0;
> > +    int ret, saverr = 0, ofd;
> > +    uint64_t unique;
> > +    struct fuse_session *se = req->se;
> > +    bool blocking_lock = false;
> >  
> >      fuse_log(FUSE_LOG_DEBUG,
> >               "lo_setlk(ino=%" PRIu64 ", flags=%d)"
> > @@ -2151,11 +2173,6 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
> >          return;
> >      }
> >  
> > -    if (sleep) {
> > -        fuse_reply_err(req, EOPNOTSUPP);
> > -        return;
> > -    }
> > -
> >      inode = lo_inode(req, ino);
> >      if (!inode) {
> >          fuse_reply_err(req, EBADF);
> > @@ -2168,21 +2185,56 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
> >  
> >      if (!plock) {
> >          saverr = ret;
> > +        pthread_mutex_unlock(&inode->plock_mutex);
> >          goto out;
> >      }
> >  
> > +    /*
> > +     * plock is now released when inode is going away. We already have
> > +     * a reference on inode, so it is guaranteed that plock->fd is
> > +     * still around even after dropping inode->plock_mutex lock
> > +     */
> > +    ofd = plock->fd;
> > +    pthread_mutex_unlock(&inode->plock_mutex);
> > +
> > +    /*
> > +     * If this lock request can block, request caller to wait for
> > +     * notification. Do not access req after this. Once lock is
> > +     * available, send a notification instead.
> > +     */
> > +    if (sleep && lock->l_type != F_UNLCK) {
> > +        /*
> > +         * If notification queue is not enabled, can't support async
> > +         * locks.
> > +         */
> > +        if (!se->notify_enabled) {
> > +            saverr = EOPNOTSUPP;
> > +            goto out;
> > +        }
> > +        blocking_lock = true;
> > +        unique = req->unique;
> > +        fuse_reply_wait(req);
> > +    }
> > +
> >      /* TODO: Is it alright to modify flock? */
> >      lock->l_pid = 0;
> > -    ret = fcntl(plock->fd, F_OFD_SETLK, lock);
> > +    if (blocking_lock) {
> > +        ret = fcntl(ofd, F_OFD_SETLKW, lock);
> 
> SETLKW can be interrupted by signals. Should we loop here when errno ==
> EINTR?

So there are two cases. In some cases we want to bail out because
qemu has force-rebooted the kernel, and we have sent a signal to this
thread so that it stops waiting.

The other case is that some external entity sends a signal to the
virtiofsd thread. In that case we rely on sending -EINTR to the
client and letting the client restart the syscall and send the
request again.

In the future we can probably keep track of state to decide whether
we want to return on -EINTR or call fcntl() again, if that helps.
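
For the second case the restart could also happen inside virtiofsd; a
sketch (stop_requested is a hypothetical per-thread shutdown flag set
before signalling the thread):

/* Retry F_OFD_SETLKW on EINTR unless the thread was told to stop */
do {
    ret = fcntl(ofd, F_OFD_SETLKW, lock);
} while (ret == -1 && errno == EINTR && !stop_requested);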

Thanks
Vivek

> 
> > +    } else {
> > +        ret = fcntl(ofd, F_OFD_SETLK, lock);
> > +    }
> >      if (ret == -1) {
> >          saverr = errno;
> >      }
> >  
> >  out:
> > -    pthread_mutex_unlock(&inode->plock_mutex);
> >      lo_inode_put(lo, &inode);
> >  
> > -    fuse_reply_err(req, saverr);
> > +    if (!blocking_lock) {
> > +        fuse_reply_err(req, saverr);
> > +    } else {
> > +        setlk_send_notification(se, unique, saverr);
> > +    }
> >  }
> >  
> >  static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,
> > -- 
> > 2.31.1
> > 




^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 12/13] virtiofsd: Implement blocking posix locks
@ 2021-10-05 15:14       ` Vivek Goyal
  0 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-10-05 15:14 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: miklos, qemu-devel, virtio-fs

On Tue, Oct 05, 2021 at 01:22:21PM +0100, Stefan Hajnoczi wrote:
> On Thu, Sep 30, 2021 at 11:30:36AM -0400, Vivek Goyal wrote:
> > As of now we don't support fcntl(F_SETLKW) and if we see one, we return
> > -EOPNOTSUPP.
> > 
> > Change that by accepting these requests and returning a reply
> > immediately asking caller to wait. Once lock is available, send a
> > notification to the waiter indicating lock is available.
> > 
> > In response to lock request, we are returning error value as "1", which
> > signals to client to queue the lock request internally and later client
> > will get a notification which will signal lock is taken (or error). And
> > then fuse client should wake up the guest process.
> > 
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
> > ---
> >  tools/virtiofsd/fuse_lowlevel.c  | 37 ++++++++++++++++-
> >  tools/virtiofsd/fuse_lowlevel.h  | 26 ++++++++++++
> >  tools/virtiofsd/fuse_virtio.c    | 50 ++++++++++++++++++++---
> >  tools/virtiofsd/passthrough_ll.c | 70 ++++++++++++++++++++++++++++----
> >  4 files changed, 167 insertions(+), 16 deletions(-)
> > 
> > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > index e4679c73ab..2e7f4b786d 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.c
> > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > @@ -179,8 +179,8 @@ int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
> >          .unique = req->unique,
> >          .error = error,
> >      };
> > -
> > -    if (error <= -1000 || error > 0) {
> > +    /* error = 1 has been used to signal client to wait for notificaiton */
> > +    if (error <= -1000 || error > 1) {
> >          fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n", error);
> >          out.error = -ERANGE;
> >      }
> > @@ -290,6 +290,11 @@ int fuse_reply_err(fuse_req_t req, int err)
> >      return send_reply(req, -err, NULL, 0);
> >  }
> >  
> > +int fuse_reply_wait(fuse_req_t req)
> > +{
> > +    return send_reply(req, 1, NULL, 0);
> > +}
> > +
> >  void fuse_reply_none(fuse_req_t req)
> >  {
> >      fuse_free_req(req);
> > @@ -2165,6 +2170,34 @@ static void do_destroy(fuse_req_t req, fuse_ino_t nodeid,
> >      send_reply_ok(req, NULL, 0);
> >  }
> >  
> > +static int send_notify_iov(struct fuse_session *se, int notify_code,
> > +                           struct iovec *iov, int count)
> > +{
> > +    struct fuse_out_header out;
> > +    if (!se->got_init) {
> > +        return -ENOTCONN;
> > +    }
> > +    out.unique = 0;
> > +    out.error = notify_code;
> > +    iov[0].iov_base = &out;
> > +    iov[0].iov_len = sizeof(struct fuse_out_header);
> > +    return fuse_send_msg(se, NULL, iov, count);
> > +}
> > +
> > +int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t unique,
> > +                  int32_t error)
> > +{
> > +    struct fuse_notify_lock_out outarg = {0};
> > +    struct iovec iov[2];
> > +
> > +    outarg.unique = unique;
> > +    outarg.error = -error;
> > +
> > +    iov[1].iov_base = &outarg;
> > +    iov[1].iov_len = sizeof(outarg);
> > +    return send_notify_iov(se, FUSE_NOTIFY_LOCK, iov, 2);
> > +}
> > +
> >  int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
> >                                 off_t offset, struct fuse_bufvec *bufv)
> >  {
> > diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
> > index c55c0ca2fc..64624b48dc 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.h
> > +++ b/tools/virtiofsd/fuse_lowlevel.h
> > @@ -1251,6 +1251,22 @@ struct fuse_lowlevel_ops {
> >   */
> >  int fuse_reply_err(fuse_req_t req, int err);
> >  
> > +/**
> > + * Ask caller to wait for lock.
> > + *
> > + * Possible requests:
> > + *   setlkw
> > + *
> > + * If caller sends a blocking lock request (setlkw), then reply to caller
> > + * that wait for lock to be available. Once lock is available caller will
> 
> I can't parse the first sentence.
> 
> s/that wait for lock to be available/that waiting for the lock is
> necessary/?

Ok, will change it.

> 
> > + * receive a notification with request's unique id. Notification will
> > + * carry info whether lock was successfully obtained or not.
> > + *
> > + * @param req request handle
> > + * @return zero for success, -errno for failure to send reply
> > + */
> > +int fuse_reply_wait(fuse_req_t req);
> > +
> >  /**
> >   * Don't send reply
> >   *
> > @@ -1685,6 +1701,16 @@ int fuse_lowlevel_notify_delete(struct fuse_session *se, fuse_ino_t parent,
> >  int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
> >                                 off_t offset, struct fuse_bufvec *bufv);
> >  
> > +/**
> > + * Notify event related to previous lock request
> > + *
> > + * @param se the session object
> > + * @param unique the unique id of the request which requested setlkw
> > + * @param error zero for success, -errno for the failure
> > + */
> > +int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t unique,
> > +                              int32_t error);
> > +
> >  /*
> >   * Utility functions
> >   */
> > diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> > index a87e88e286..bb2d4456fc 100644
> > --- a/tools/virtiofsd/fuse_virtio.c
> > +++ b/tools/virtiofsd/fuse_virtio.c
> > @@ -273,6 +273,23 @@ static void vq_send_element(struct fv_QueueInfo *qi, VuVirtqElement *elem,
> >      vu_dispatch_unlock(qi->virtio_dev);
> >  }
> >  
> > +/* Returns NULL if queue is empty */
> > +static FVRequest *vq_pop_notify_elem(struct fv_QueueInfo *qi)
> > +{
> > +    struct fuse_session *se = qi->virtio_dev->se;
> > +    VuDev *dev = &se->virtio_dev->dev;
> > +    VuVirtq *q = vu_get_queue(dev, qi->qidx);
> > +    FVRequest *req;
> > +
> > +    vu_dispatch_rdlock(qi->virtio_dev);
> > +    pthread_mutex_lock(&qi->vq_lock);
> > +    /* Pop an element from queue */
> > +    req = vu_queue_pop(dev, q, sizeof(FVRequest));
> > +    pthread_mutex_unlock(&qi->vq_lock);
> > +    vu_dispatch_unlock(qi->virtio_dev);
> > +    return req;
> > +}
> > +
> >  /*
> >   * Called back by ll whenever it wants to send a reply/message back
> >   * The 1st element of the iov starts with the fuse_out_header
> > @@ -281,9 +298,9 @@ static void vq_send_element(struct fv_QueueInfo *qi, VuVirtqElement *elem,
> >  int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
> >                      struct iovec *iov, int count)
> >  {
> > -    FVRequest *req = container_of(ch, FVRequest, ch);
> > -    struct fv_QueueInfo *qi = ch->qi;
> > -    VuVirtqElement *elem = &req->elem;
> > +    FVRequest *req;
> > +    struct fv_QueueInfo *qi;
> > +    VuVirtqElement *elem;
> >      int ret = 0;
> >  
> >      assert(count >= 1);
> > @@ -294,8 +311,30 @@ int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
> >  
> >      size_t tosend_len = iov_size(iov, count);
> >  
> > -    /* unique == 0 is notification, which we don't support */
> > -    assert(out->unique);
> > +    /* unique == 0 is notification */
> > +    if (!out->unique) {
> 
> Is a check needed in fuse_session_process_buf_int() to reject requests
> that the driver submitted to the device with req.unique == 0? If we get
> confused about the correct virtqueue to use in virtio_send_msg() then
> there could be bugs.

Ok. Should we abort/exit virtiofsd if fuse_session_process_buf_int()
gets a request with unique=0? If we try to reply to it instead, then
I will have to carve out a separate path which does not interpret
unique=0 as a notification.
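
(A minimal sketch of such a guard, assuming fuse_session_process_buf_int()
has the parsed header in "in" and the session in "se"; whether to exit or
just drop the request is the open question:)

    if (in->unique == 0) {
        /* unique == 0 is reserved for device->driver notifications */
        fuse_log(FUSE_LOG_ERR,
                 "fuse: driver submitted request with unique == 0\n");
        fuse_session_exit(se);
        return;
    }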

> 
> > +        if (!se->notify_enabled) {
> > +            return -EOPNOTSUPP;
> > +        }
> > +        /* If notifications are enabled, queue index 1 is notification queue */
> > +        qi = se->virtio_dev->qi[1];
> > +        req = vq_pop_notify_elem(qi);
> 
> Where is req freed?

I think we are not freeing req in case of notification. Good catch.
Will fix it.
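
(Presumably the fix is to free the popped element once the reply has
been pushed; a sketch of where it would go in virtio_send_msg():)

    /* ... after the reply has been pushed onto the vq ... */
    if (!out->unique) {
        /*
         * Notification elements are popped in this function and are
         * not tracked anywhere else; vu_queue_pop() returns a
         * malloc'd element, so free it here.
         */
        free(req);
    }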

> 
> > +        if (!req) {
> > +            /*
> > +             * TODO: Implement some sort of ring buffer and queue notifications
> > +             * on that and send these later when notification queue has space
> > +             * available.
> > +             */
> > +            return -ENOSPC;
> 
> This needs to be addressed before this patch series can be merged. The
> notification vq is kicked by the guest driver when buffers are
> replenished. The vq handler function can wake up waiting threads using a
> condvar.

I have taken care of this by polling in a loop (with a sleep
in between). That said, sleeping on a condition variable with a
subsequent wake-up would be more efficient.
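
(A minimal sketch of that condvar variant; notify_lock and notify_avail
would be hypothetical new fields of fv_QueueInfo, signalled from the
notification queue's kick handler:)

    pthread_mutex_lock(&qi->notify_lock);
    while ((req = vq_pop_notify_elem(qi)) == NULL) {
        /* woken when the driver replenishes notification buffers */
        pthread_cond_wait(&qi->notify_avail, &qi->notify_lock);
    }
    pthread_mutex_unlock(&qi->notify_lock);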

> 
> > +        }
> > +        req->reply_sent = false;
> > +    } else {
> > +        assert(ch);
> > +        req = container_of(ch, FVRequest, ch);
> > +        qi = ch->qi;
> > +    }
> > +
> > +    elem = &req->elem;
> >      assert(!req->reply_sent);
> >  
> >      /* The 'in' part of the elem is to qemu */
> > @@ -985,6 +1024,7 @@ static int fv_get_config(VuDev *dev, uint8_t *config, uint32_t len)
> >          struct fuse_notify_delete_out       delete_out;
> >          struct fuse_notify_store_out        store_out;
> >          struct fuse_notify_retrieve_out     retrieve_out;
> > +        struct fuse_notify_lock_out         lock_out;
> >      };
> >  
> >      notify_size = sizeof(struct fuse_out_header) +
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index 6928662e22..277f74762b 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -2131,13 +2131,35 @@ out:
> >      }
> >  }
> >  
> > +static void setlk_send_notification(struct fuse_session *se, uint64_t unique,
> > +                                    int saverr)
> > +{
> > +    int ret;
> > +
> > +    do {
> > +        ret = fuse_lowlevel_notify_lock(se, unique, saverr);
> > +        /*
> > +         * Retry sending notification if notification queue does not have
> > +         * free descriptor yet, otherwise break out of loop. Either we
> > +         * successfully sent notification or some other error occurred.
> > +         */
> > +        if (ret != -ENOSPC) {
> > +            break;
> > +        }
> > +        usleep(10000);
> > +    } while (1);
> 
> Please use the notification vq handler to wake up blocked threads
> instead of usleep().

Ok, I will look into it. This will be more code. The first thing I
can see is that I have not started a thread for the notification
queue. It looks like I will have to start one so that the thread can
see queue kicks, notice if qemu is going away, and wake up waiters.

> 
> > +}
> > +
> >  static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
> >                       struct flock *lock, int sleep)
> >  {
> >      struct lo_data *lo = lo_data(req);
> >      struct lo_inode *inode;
> >      struct lo_inode_plock *plock;
> > -    int ret, saverr = 0;
> > +    int ret, saverr = 0, ofd;
> > +    uint64_t unique;
> > +    struct fuse_session *se = req->se;
> > +    bool blocking_lock = false;
> >  
> >      fuse_log(FUSE_LOG_DEBUG,
> >               "lo_setlk(ino=%" PRIu64 ", flags=%d)"
> > @@ -2151,11 +2173,6 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
> >          return;
> >      }
> >  
> > -    if (sleep) {
> > -        fuse_reply_err(req, EOPNOTSUPP);
> > -        return;
> > -    }
> > -
> >      inode = lo_inode(req, ino);
> >      if (!inode) {
> >          fuse_reply_err(req, EBADF);
> > @@ -2168,21 +2185,56 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
> >  
> >      if (!plock) {
> >          saverr = ret;
> > +        pthread_mutex_unlock(&inode->plock_mutex);
> >          goto out;
> >      }
> >  
> > +    /*
> > +     * plock is now released when inode is going away. We already have
> > +     * a reference on inode, so it is guaranteed that plock->fd is
> > +     * still around even after dropping inode->plock_mutex lock
> > +     */
> > +    ofd = plock->fd;
> > +    pthread_mutex_unlock(&inode->plock_mutex);
> > +
> > +    /*
> > +     * If this lock request can block, request caller to wait for
> > +     * notification. Do not access req after this. Once lock is
> > +     * available, send a notification instead.
> > +     */
> > +    if (sleep && lock->l_type != F_UNLCK) {
> > +        /*
> > +         * If notification queue is not enabled, can't support async
> > +         * locks.
> > +         */
> > +        if (!se->notify_enabled) {
> > +            saverr = EOPNOTSUPP;
> > +            goto out;
> > +        }
> > +        blocking_lock = true;
> > +        unique = req->unique;
> > +        fuse_reply_wait(req);
> > +    }
> > +
> >      /* TODO: Is it alright to modify flock? */
> >      lock->l_pid = 0;
> > -    ret = fcntl(plock->fd, F_OFD_SETLK, lock);
> > +    if (blocking_lock) {
> > +        ret = fcntl(ofd, F_OFD_SETLKW, lock);
> 
> SETLKW can be interrupted by signals. Should we loop here when errno ==
> EINTR?

So there are two cases. In one case we want to bail out because
qemu has force-rebooted the kernel, and we have sent a signal to this
thread so that it stops waiting.

The other case is that some other external entity sends a signal to
a virtiofsd thread. In that case we rely on sending -EINTR to the
client and let the client restart the syscall and send the request
again.

In the future we can probably keep track of state indicating whether
we want to return on -EINTR or should call fcntl() again, if that
helps.
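
(A sketch of that, with a hypothetical per-thread "cancelled" flag that
the shutdown path sets before signalling the thread:)

    do {
        ret = fcntl(ofd, F_OFD_SETLKW, lock);
    } while (ret == -1 && errno == EINTR && !cancelled);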

Thanks
Vivek

> 
> > +    } else {
> > +        ret = fcntl(ofd, F_OFD_SETLK, lock);
> > +    }
> >      if (ret == -1) {
> >          saverr = errno;
> >      }
> >  
> >  out:
> > -    pthread_mutex_unlock(&inode->plock_mutex);
> >      lo_inode_put(lo, &inode);
> >  
> > -    fuse_reply_err(req, saverr);
> > +    if (!blocking_lock) {
> > +        fuse_reply_err(req, saverr);
> > +    } else {
> > +        setlk_send_notification(se, unique, saverr);
> > +    }
> >  }
> >  
> >  static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,
> > -- 
> > 2.31.1
> > 



^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 13/13] virtiofsd, seccomp: Add clock_nanosleep() to allow list
  2021-10-05 12:22     ` [Virtio-fs] " Stefan Hajnoczi
  (?)
@ 2021-10-05 15:16     ` Vivek Goyal
  2021-10-05 15:50       ` Stefan Hajnoczi
  -1 siblings, 1 reply; 106+ messages in thread
From: Vivek Goyal @ 2021-10-05 15:16 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-fs, qemu-devel, miklos

On Tue, Oct 05, 2021 at 01:22:58PM +0100, Stefan Hajnoczi wrote:
> On Thu, Sep 30, 2021 at 11:30:37AM -0400, Vivek Goyal wrote:
> > g_usleep() calls nanosleep() and that now seems to call clock_nanosleep()
> > syscall. Now these patches are making use of g_usleep(). So add
> > clock_nanosleep() to list of allowed syscalls.
> > 
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  tools/virtiofsd/passthrough_seccomp.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
> > index cd24b40b78..03080806c0 100644
> > --- a/tools/virtiofsd/passthrough_seccomp.c
> > +++ b/tools/virtiofsd/passthrough_seccomp.c
> > @@ -117,6 +117,7 @@ static const int syscall_allowlist[] = {
> >      SCMP_SYS(writev),
> >      SCMP_SYS(umask),
> >      SCMP_SYS(nanosleep),
> > +    SCMP_SYS(clock_nanosleep),
> 
> This patch can be dropped once sleep has been replaced by a condvar.

There is another sleep in do_pool_destroy() where we are waiting
for all current threads to exit.

do_pool_destroy() {
    g_usleep(10000);
}
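
(That wait could also become a condvar; a sketch with hypothetical
"lock", "idle" and "nr_threads" pool members, each exiting worker
signalling "idle" after decrementing the count:)

    pthread_mutex_lock(&pool->lock);
    while (pool->nr_threads > 0) {
        pthread_cond_wait(&pool->idle, &pool->lock);
    }
    pthread_mutex_unlock(&pool->lock);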

Vivek


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 07/13] virtiofsd: Release file locks using F_UNLCK
  2021-10-05 13:37     ` Christophe de Dinechin
@ 2021-10-05 15:38       ` Vivek Goyal
  -1 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-10-05 15:38 UTC (permalink / raw)
  To: Christophe de Dinechin; +Cc: virtio-fs, qemu-devel, stefanha, miklos

On Tue, Oct 05, 2021 at 03:37:17PM +0200, Christophe de Dinechin wrote:
> 
> On 2021-09-30 at 11:30 -04, Vivek Goyal <vgoyal@redhat.com> wrote...
> > We are emulating posix locks for guest using open file description locks
> > in virtiofsd. When any of the fd is closed in guest, we find associated
> > OFD lock fd (if there is one) and close it to release all the locks.
> >
> > Assumption here is that there is no other thread using lo_inode_plock
> > structure or plock->fd, hence it is safe to do so.
> >
> > But now we are about to introduce blocking variant of locks (SETLKW),
> > and that means we might be waiting to a lock to be available and
> > using plock->fd. And that means there are still users of plock
> > structure.
> >
> > So release locks using fcntl(SETLK, F_UNLCK) instead of closing fd
> > and plock will be freed later when lo_inode is being freed.
> >
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
> > ---
> >  tools/virtiofsd/passthrough_ll.c | 21 +++++++++++++++++----
> >  1 file changed, 17 insertions(+), 4 deletions(-)
> >
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index 38b2af8599..6928662e22 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -1557,9 +1557,6 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
> >          lo_map_remove(&lo->ino_map, inode->fuse_ino);
> >          g_hash_table_remove(lo->inodes, &inode->key);
> >          if (lo->posix_lock) {
> > -            if (g_hash_table_size(inode->posix_locks)) {
> > -                fuse_log(FUSE_LOG_WARNING, "Hash table is not empty\n");
> > -            }
> >              g_hash_table_destroy(inode->posix_locks);
> >              pthread_mutex_destroy(&inode->plock_mutex);
> >          }
> > @@ -2266,6 +2263,8 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
> >      (void)ino;
> >      struct lo_inode *inode;
> >      struct lo_data *lo = lo_data(req);
> > +    struct lo_inode_plock *plock;
> > +    struct flock flock;
> >
> >      inode = lo_inode(req, ino);
> >      if (!inode) {
> > @@ -2282,8 +2281,22 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
> >      /* An fd is going away. Cleanup associated posix locks */
> >      if (lo->posix_lock) {
> >          pthread_mutex_lock(&inode->plock_mutex);
> > -        g_hash_table_remove(inode->posix_locks,
> 
> I'm curious why the g_hash_table_remove above is not in the 'if' below?

Because now we are not removing plock from hash table when file is
closed. We leave it in place and it will be cleaned up when inode
is going away.

unref_inode() {
    g_hash_table_destroy(inode->posix_locks)
}

Now it is possible that some thread is waiting for a lock and
using plock->fd. So it probably is not a good idea to close(plock->fd)
and clean up plock yet. It could be racy too.

So instead we clean it up when the inode is going away, at which time
we are sure that no thread could be waiting on a lock on this
file/inode.

IOW, previously we were cleaning up plock and plock->fd in lo_flush()
and now that has been delayed to unref_inode().
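
(For completeness, a sketch of the value-destroy callback such a hash
table could be created with, so the OFD is closed exactly once when
g_hash_table_destroy() runs; the callback name is illustrative:)

    static void posix_locks_value_destroy(gpointer data)
    {
        struct lo_inode_plock *plock = data;

        /*
         * Closing the OFD releases any locks still held on it; by
         * this point no thread can be blocked waiting on plock->fd.
         */
        close(plock->fd);
        free(plock);
    }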

Thanks
Vivek

> 
> > +        plock = g_hash_table_lookup(inode->posix_locks,
> >              GUINT_TO_POINTER(fi->lock_owner));
> > +
> > +        if (plock) {
> > +            /*
> > +             * An fd is being closed. For posix locks, this means
> > +             * drop all the associated locks.
> > +             */
> > +            memset(&flock, 0, sizeof(struct flock));
> > +            flock.l_type = F_UNLCK;
> > +            flock.l_whence = SEEK_SET;
> > +            /* Unlock whole file */
> > +            flock.l_start = flock.l_len = 0;
> > +            fcntl(plock->fd, F_OFD_SETLK, &flock);
> > +        }
> > +
> >          pthread_mutex_unlock(&inode->plock_mutex);
> >      }
> >      res = close(dup(lo_fi_fd(req, fi)));
> 
> 
> --
> Cheers,
> Christophe de Dinechin (IRC c3d)
> 



^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 12/13] virtiofsd: Implement blocking posix locks
  2021-10-05 15:14       ` [Virtio-fs] " Vivek Goyal
@ 2021-10-05 15:49         ` Stefan Hajnoczi
  -1 siblings, 0 replies; 106+ messages in thread
From: Stefan Hajnoczi @ 2021-10-05 15:49 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

On Tue, Oct 05, 2021 at 11:14:19AM -0400, Vivek Goyal wrote:
> On Tue, Oct 05, 2021 at 01:22:21PM +0100, Stefan Hajnoczi wrote:
> > On Thu, Sep 30, 2021 at 11:30:36AM -0400, Vivek Goyal wrote:
> > > As of now we don't support fcntl(F_SETLKW) and if we see one, we return
> > > -EOPNOTSUPP.
> > > 
> > > Change that by accepting these requests and returning a reply
> > > immediately asking caller to wait. Once lock is available, send a
> > > notification to the waiter indicating lock is available.
> > > 
> > > In response to lock request, we are returning error value as "1", which
> > > signals to client to queue the lock request internally and later client
> > > will get a notification which will signal lock is taken (or error). And
> > > then fuse client should wake up the guest process.
> > > 
> > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > > Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
> > > ---
> > >  tools/virtiofsd/fuse_lowlevel.c  | 37 ++++++++++++++++-
> > >  tools/virtiofsd/fuse_lowlevel.h  | 26 ++++++++++++
> > >  tools/virtiofsd/fuse_virtio.c    | 50 ++++++++++++++++++++---
> > >  tools/virtiofsd/passthrough_ll.c | 70 ++++++++++++++++++++++++++++----
> > >  4 files changed, 167 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > > index e4679c73ab..2e7f4b786d 100644
> > > --- a/tools/virtiofsd/fuse_lowlevel.c
> > > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > > @@ -179,8 +179,8 @@ int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
> > >          .unique = req->unique,
> > >          .error = error,
> > >      };
> > > -
> > > -    if (error <= -1000 || error > 0) {
> > > +    /* error = 1 has been used to signal client to wait for notification */
> > > +    if (error <= -1000 || error > 1) {
> > >          fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n", error);
> > >          out.error = -ERANGE;
> > >      }
> > > @@ -290,6 +290,11 @@ int fuse_reply_err(fuse_req_t req, int err)
> > >      return send_reply(req, -err, NULL, 0);
> > >  }
> > >  
> > > +int fuse_reply_wait(fuse_req_t req)
> > > +{
> > > +    return send_reply(req, 1, NULL, 0);
> > > +}
> > > +
> > >  void fuse_reply_none(fuse_req_t req)
> > >  {
> > >      fuse_free_req(req);
> > > @@ -2165,6 +2170,34 @@ static void do_destroy(fuse_req_t req, fuse_ino_t nodeid,
> > >      send_reply_ok(req, NULL, 0);
> > >  }
> > >  
> > > +static int send_notify_iov(struct fuse_session *se, int notify_code,
> > > +                           struct iovec *iov, int count)
> > > +{
> > > +    struct fuse_out_header out;
> > > +    if (!se->got_init) {
> > > +        return -ENOTCONN;
> > > +    }
> > > +    out.unique = 0;
> > > +    out.error = notify_code;
> > > +    iov[0].iov_base = &out;
> > > +    iov[0].iov_len = sizeof(struct fuse_out_header);
> > > +    return fuse_send_msg(se, NULL, iov, count);
> > > +}
> > > +
> > > +int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t unique,
> > > +                  int32_t error)
> > > +{
> > > +    struct fuse_notify_lock_out outarg = {0};
> > > +    struct iovec iov[2];
> > > +
> > > +    outarg.unique = unique;
> > > +    outarg.error = -error;
> > > +
> > > +    iov[1].iov_base = &outarg;
> > > +    iov[1].iov_len = sizeof(outarg);
> > > +    return send_notify_iov(se, FUSE_NOTIFY_LOCK, iov, 2);
> > > +}
> > > +
> > >  int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
> > >                                 off_t offset, struct fuse_bufvec *bufv)
> > >  {
> > > diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
> > > index c55c0ca2fc..64624b48dc 100644
> > > --- a/tools/virtiofsd/fuse_lowlevel.h
> > > +++ b/tools/virtiofsd/fuse_lowlevel.h
> > > @@ -1251,6 +1251,22 @@ struct fuse_lowlevel_ops {
> > >   */
> > >  int fuse_reply_err(fuse_req_t req, int err);
> > >  
> > > +/**
> > > + * Ask caller to wait for lock.
> > > + *
> > > + * Possible requests:
> > > + *   setlkw
> > > + *
> > > + * If caller sends a blocking lock request (setlkw), then reply to caller
> > > + * that wait for lock to be available. Once lock is available caller will
> > 
> > I can't parse the first sentence.
> > 
> > s/that wait for lock to be available/that waiting for the lock is
> > necessary/?
> 
> Ok, will change it.
> 
> > 
> > > + * receive a notification with request's unique id. Notification will
> > > + * carry info whether lock was successfully obtained or not.
> > > + *
> > > + * @param req request handle
> > > + * @return zero for success, -errno for failure to send reply
> > > + */
> > > +int fuse_reply_wait(fuse_req_t req);
> > > +
> > >  /**
> > >   * Don't send reply
> > >   *
> > > @@ -1685,6 +1701,16 @@ int fuse_lowlevel_notify_delete(struct fuse_session *se, fuse_ino_t parent,
> > >  int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
> > >                                 off_t offset, struct fuse_bufvec *bufv);
> > >  
> > > +/**
> > > + * Notify event related to previous lock request
> > > + *
> > > + * @param se the session object
> > > + * @param unique the unique id of the request which requested setlkw
> > > + * @param error zero for success, -errno for the failure
> > > + */
> > > +int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t unique,
> > > +                              int32_t error);
> > > +
> > >  /*
> > >   * Utility functions
> > >   */
> > > diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> > > index a87e88e286..bb2d4456fc 100644
> > > --- a/tools/virtiofsd/fuse_virtio.c
> > > +++ b/tools/virtiofsd/fuse_virtio.c
> > > @@ -273,6 +273,23 @@ static void vq_send_element(struct fv_QueueInfo *qi, VuVirtqElement *elem,
> > >      vu_dispatch_unlock(qi->virtio_dev);
> > >  }
> > >  
> > > +/* Returns NULL if queue is empty */
> > > +static FVRequest *vq_pop_notify_elem(struct fv_QueueInfo *qi)
> > > +{
> > > +    struct fuse_session *se = qi->virtio_dev->se;
> > > +    VuDev *dev = &se->virtio_dev->dev;
> > > +    VuVirtq *q = vu_get_queue(dev, qi->qidx);
> > > +    FVRequest *req;
> > > +
> > > +    vu_dispatch_rdlock(qi->virtio_dev);
> > > +    pthread_mutex_lock(&qi->vq_lock);
> > > +    /* Pop an element from queue */
> > > +    req = vu_queue_pop(dev, q, sizeof(FVRequest));
> > > +    pthread_mutex_unlock(&qi->vq_lock);
> > > +    vu_dispatch_unlock(qi->virtio_dev);
> > > +    return req;
> > > +}
> > > +
> > >  /*
> > >   * Called back by ll whenever it wants to send a reply/message back
> > >   * The 1st element of the iov starts with the fuse_out_header
> > > @@ -281,9 +298,9 @@ static void vq_send_element(struct fv_QueueInfo *qi, VuVirtqElement *elem,
> > >  int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
> > >                      struct iovec *iov, int count)
> > >  {
> > > -    FVRequest *req = container_of(ch, FVRequest, ch);
> > > -    struct fv_QueueInfo *qi = ch->qi;
> > > -    VuVirtqElement *elem = &req->elem;
> > > +    FVRequest *req;
> > > +    struct fv_QueueInfo *qi;
> > > +    VuVirtqElement *elem;
> > >      int ret = 0;
> > >  
> > >      assert(count >= 1);
> > > @@ -294,8 +311,30 @@ int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
> > >  
> > >      size_t tosend_len = iov_size(iov, count);
> > >  
> > > -    /* unique == 0 is notification, which we don't support */
> > > -    assert(out->unique);
> > > +    /* unique == 0 is notification */
> > > +    if (!out->unique) {
> > 
> > Is a check needed in fuse_session_process_buf_int() to reject requests
> > that the driver submitted to the device with req.unique == 0? If we get
> > confused about the correct virtqueue to use in virtio_send_msg() then
> > there could be bugs.
> 
> Ok. Should we abort/exit virtiofsd if fuse_session_process_buf_int()
> gets a request with unique=0? If we try to reply to it instead, then
> I will have to carve out a separate path which does not interpret
> unique=0 as a notification.
> 
> > 
> > > +        if (!se->notify_enabled) {
> > > +            return -EOPNOTSUPP;
> > > +        }
> > > +        /* If notifications are enabled, queue index 1 is notification queue */
> > > +        qi = se->virtio_dev->qi[1];
> > > +        req = vq_pop_notify_elem(qi);
> > 
> > Where is req freed?
> 
> I think we are not freeing req in case of notification. Good catch.
> Will fix it.
> 
> > 
> > > +        if (!req) {
> > > +            /*
> > > +             * TODO: Implement some sort of ring buffer and queue notifications
> > > +             * on that and send these later when notification queue has space
> > > +             * available.
> > > +             */
> > > +            return -ENOSPC;
> > 
> > This needs to be addressed before this patch series can be merged. The
> > notification vq is kicked by the guest driver when buffers are
> > replenished. The vq handler function can wake up waiting threads using a
> > condvar.
> 
> I have taken care of this by polling in a loop (with a sleep
> in between). That said, sleeping on a condition variable with a
> subsequent wake-up would be more efficient.
> 
> > 
> > > +        }
> > > +        req->reply_sent = false;
> > > +    } else {
> > > +        assert(ch);
> > > +        req = container_of(ch, FVRequest, ch);
> > > +        qi = ch->qi;
> > > +    }
> > > +
> > > +    elem = &req->elem;
> > >      assert(!req->reply_sent);
> > >  
> > >      /* The 'in' part of the elem is to qemu */
> > > @@ -985,6 +1024,7 @@ static int fv_get_config(VuDev *dev, uint8_t *config, uint32_t len)
> > >          struct fuse_notify_delete_out       delete_out;
> > >          struct fuse_notify_store_out        store_out;
> > >          struct fuse_notify_retrieve_out     retrieve_out;
> > > +        struct fuse_notify_lock_out         lock_out;
> > >      };
> > >  
> > >      notify_size = sizeof(struct fuse_out_header) +
> > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > index 6928662e22..277f74762b 100644
> > > --- a/tools/virtiofsd/passthrough_ll.c
> > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > @@ -2131,13 +2131,35 @@ out:
> > >      }
> > >  }
> > >  
> > > +static void setlk_send_notification(struct fuse_session *se, uint64_t unique,
> > > +                                    int saverr)
> > > +{
> > > +    int ret;
> > > +
> > > +    do {
> > > +        ret = fuse_lowlevel_notify_lock(se, unique, saverr);
> > > +        /*
> > > +         * Retry sending notification if notification queue does not have
> > > +         * free descriptor yet, otherwise break out of loop. Either we
> > > +         * successfully sent notification or some other error occurred.
> > > +         */
> > > +        if (ret != -ENOSPC) {
> > > +            break;
> > > +        }
> > > +        usleep(10000);
> > > +    } while (1);
> > 
> > Please use the notification vq handler to wake up blocked threads
> > instead of usleep().
> 
> Ok, I will look into it. This will be more code. The first thing I
> can see is that I have not started a thread for the notification
> queue. It looks like I will have to start one so that the thread can
> see queue kicks, notice if qemu is going away, and wake up waiters.

If you think creating a thread just for the notification virtqueue is
too much, there's an alternative. Call vu_set_queue_handler() to
register a virtqueue handler callback that's invoked from the same event
loop as the vhost-user protocol thread.
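
A sketch of what that could look like (fv_queue_notify_handler and the
notify_avail condvar are hypothetical; vu_get_queue() and
vu_set_queue_handler() are the existing libvhost-user calls):

    static void fv_queue_notify_handler(VuDev *dev, int qidx)
    {
        struct fv_VuDev *vud = container_of(dev, struct fv_VuDev, dev);

        /*
         * The driver kicked the notification queue, i.e. buffers were
         * replenished: wake up senders blocked in virtio_send_msg().
         */
        pthread_cond_broadcast(&vud->qi[qidx]->notify_avail);
    }

    /* at setup time, for notification queue index 1: */
    vu_set_queue_handler(&vud->dev, vu_get_queue(&vud->dev, 1),
                         fv_queue_notify_handler);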

> > 
> > > +}
> > > +
> > >  static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
> > >                       struct flock *lock, int sleep)
> > >  {
> > >      struct lo_data *lo = lo_data(req);
> > >      struct lo_inode *inode;
> > >      struct lo_inode_plock *plock;
> > > -    int ret, saverr = 0;
> > > +    int ret, saverr = 0, ofd;
> > > +    uint64_t unique;
> > > +    struct fuse_session *se = req->se;
> > > +    bool blocking_lock = false;
> > >  
> > >      fuse_log(FUSE_LOG_DEBUG,
> > >               "lo_setlk(ino=%" PRIu64 ", flags=%d)"
> > > @@ -2151,11 +2173,6 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
> > >          return;
> > >      }
> > >  
> > > -    if (sleep) {
> > > -        fuse_reply_err(req, EOPNOTSUPP);
> > > -        return;
> > > -    }
> > > -
> > >      inode = lo_inode(req, ino);
> > >      if (!inode) {
> > >          fuse_reply_err(req, EBADF);
> > > @@ -2168,21 +2185,56 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
> > >  
> > >      if (!plock) {
> > >          saverr = ret;
> > > +        pthread_mutex_unlock(&inode->plock_mutex);
> > >          goto out;
> > >      }
> > >  
> > > +    /*
> > > +     * plock is now released when inode is going away. We already have
> > > +     * a reference on inode, so it is guaranteed that plock->fd is
> > > +     * still around even after dropping inode->plock_mutex lock
> > > +     */
> > > +    ofd = plock->fd;
> > > +    pthread_mutex_unlock(&inode->plock_mutex);
> > > +
> > > +    /*
> > > +     * If this lock request can block, request caller to wait for
> > > +     * notification. Do not access req after this. Once lock is
> > > +     * available, send a notification instead.
> > > +     */
> > > +    if (sleep && lock->l_type != F_UNLCK) {
> > > +        /*
> > > +         * If notification queue is not enabled, can't support async
> > > +         * locks.
> > > +         */
> > > +        if (!se->notify_enabled) {
> > > +            saverr = EOPNOTSUPP;
> > > +            goto out;
> > > +        }
> > > +        blocking_lock = true;
> > > +        unique = req->unique;
> > > +        fuse_reply_wait(req);
> > > +    }
> > > +
> > >      /* TODO: Is it alright to modify flock? */
> > >      lock->l_pid = 0;
> > > -    ret = fcntl(plock->fd, F_OFD_SETLK, lock);
> > > +    if (blocking_lock) {
> > > +        ret = fcntl(ofd, F_OFD_SETLKW, lock);
> > 
> > SETLKW can be interrupted by signals. Should we loop here when errno ==
> > EINTR?
> 
> So there are two cases. In one case we want to bail out because
> qemu has force-rebooted the kernel, and we have sent a signal to this
> thread so that it stops waiting.
> 
> The other case is that some other external entity sends a signal to
> a virtiofsd thread. In that case we rely on sending -EINTR to the
> client and let the client restart the syscall and send the request
> again.
> 
> In the future we can probably keep track of state indicating whether
> we want to return on -EINTR or should call fcntl() again, if that
> helps.

Returning EINTR to the client if there is a signal on the server is
strange. There is no signal on the client side and the client doesn't
care if virtiofsd was interrupted.

Bailing out to cancel a blocking operation is definitely a valid case
though.

Stefan

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 13/13] virtiofsd, seccomp: Add clock_nanosleep() to allow list
  2021-10-05 15:16     ` Vivek Goyal
@ 2021-10-05 15:50       ` Stefan Hajnoczi
  2021-10-05 17:28         ` Vivek Goyal
  0 siblings, 1 reply; 106+ messages in thread
From: Stefan Hajnoczi @ 2021-10-05 15:50 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel, miklos

On Tue, Oct 05, 2021 at 11:16:18AM -0400, Vivek Goyal wrote:
> On Tue, Oct 05, 2021 at 01:22:58PM +0100, Stefan Hajnoczi wrote:
> > On Thu, Sep 30, 2021 at 11:30:37AM -0400, Vivek Goyal wrote:
> > > g_usleep() calls nanosleep() and that now seems to call clock_nanosleep()
> > > syscall. Now these patches are making use of g_usleep(). So add
> > > clock_nanosleep() to list of allowed syscalls.
> > > 
> > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > > ---
> > >  tools/virtiofsd/passthrough_seccomp.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
> > > index cd24b40b78..03080806c0 100644
> > > --- a/tools/virtiofsd/passthrough_seccomp.c
> > > +++ b/tools/virtiofsd/passthrough_seccomp.c
> > > @@ -117,6 +117,7 @@ static const int syscall_allowlist[] = {
> > >      SCMP_SYS(writev),
> > >      SCMP_SYS(umask),
> > >      SCMP_SYS(nanosleep),
> > > +    SCMP_SYS(clock_nanosleep),
> > 
> > This patch can be dropped once sleep has been replaced by a condvar.
> 
> There is another sleep in do_pool_destroy() where we are waiting
> for all current threads to exit.
> 
> do_pool_destroy() {
>     g_usleep(10000);
> }

That won't be necessary if there's a way to avoid the thread pool :).
See my other reply about closing the OFD instead of using signals to
cancel blocking fcntl(2).

Stefan

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 13/13] virtiofsd, seccomp: Add clock_nanosleep() to allow list
  2021-10-05 15:50       ` Stefan Hajnoczi
@ 2021-10-05 17:28         ` Vivek Goyal
  2021-10-06 10:27           ` Stefan Hajnoczi
  0 siblings, 1 reply; 106+ messages in thread
From: Vivek Goyal @ 2021-10-05 17:28 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-fs, qemu-devel, miklos

On Tue, Oct 05, 2021 at 04:50:43PM +0100, Stefan Hajnoczi wrote:
> On Tue, Oct 05, 2021 at 11:16:18AM -0400, Vivek Goyal wrote:
> > On Tue, Oct 05, 2021 at 01:22:58PM +0100, Stefan Hajnoczi wrote:
> > > On Thu, Sep 30, 2021 at 11:30:37AM -0400, Vivek Goyal wrote:
> > > > g_usleep() calls nanosleep() and that now seems to call clock_nanosleep()
> > > > syscall. Now these patches are making use of g_usleep(). So add
> > > > clock_nanosleep() to list of allowed syscalls.
> > > > 
> > > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > > > ---
> > > >  tools/virtiofsd/passthrough_seccomp.c | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > > 
> > > > diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
> > > > index cd24b40b78..03080806c0 100644
> > > > --- a/tools/virtiofsd/passthrough_seccomp.c
> > > > +++ b/tools/virtiofsd/passthrough_seccomp.c
> > > > @@ -117,6 +117,7 @@ static const int syscall_allowlist[] = {
> > > >      SCMP_SYS(writev),
> > > >      SCMP_SYS(umask),
> > > >      SCMP_SYS(nanosleep),
> > > > +    SCMP_SYS(clock_nanosleep),
> > > 
> > > This patch can be dropped once sleep has been replaced by a condvar.
> > 
> > There is another sleep in do_pool_destroy() where we are waiting
> > for all current threads to exit.
> > 
> > do_pool_destroy() {
> >     g_usleep(10000);
> > }
> 
> That won't be necessary if there's a way to avoid the thread pool :).
> See my other reply about closing the OFD instead of using signals to
> cancel blocking fcntl(2).

Hi Stefan,

I responded to that email already. man fcntl does not say anything
about closing the fd unblocking the waiter with -EINTR, and I had a
quick look at the kernel code and did not find anything suggesting
that closing the fd unblocks current waiters.

So is this something you know works, or do you want me to try it and
see if it works?

Thanks
Vivek


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 10/13] virtiofsd: Custom threadpool for remote blocking posix locks requests
  2021-10-04 14:54     ` [Virtio-fs] " Stefan Hajnoczi
@ 2021-10-05 20:09       ` Vivek Goyal
  -1 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-10-05 20:09 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

On Mon, Oct 04, 2021 at 03:54:31PM +0100, Stefan Hajnoczi wrote:
> On Thu, Sep 30, 2021 at 11:30:34AM -0400, Vivek Goyal wrote:
> > Add a new custom threadpool using posix threads that specifically
> > service locking requests.
> > 
> > In the case of a fcntl(SETLKW) request, if the guest is waiting
> > for a lock or locks and issues a hard-reboot through SYSRQ then virtiofsd
> > unblocks the blocked threads by sending a signal to them and waking
> > them up.
> > 
> > The current threadpool (GThreadPool) is not adequate to service the
> > locking requests that result in a thread blocking. That is because
> > GLib does not provide an API to cancel the request while it is
> > serviced by a thread. In addition, a user might be running virtiofsd
> > without a threadpool (--thread-pool-size=0), thus a locking request
> > that blocks, will block the main virtqueue thread that services requests
> > from servicing any other requests.
> > 
> > The only exception occurs when the lock is of type F_UNLCK. In this case
> > the request is serviced by the main virtqueue thread or a GThreadPool
> > thread to avoid a deadlock, when all the threads in the custom threadpool
> > are blocked.
> > 
> > Then virtiofsd proceeds to cleanup the state of the threads, release
> > them back to the system and re-initialize.
> 
> Is there another way to cancel SETLKW without resorting to a new thread
> pool? Since this only matters when shutting down or restarting, can we
> close all plock->fd file descriptors to kick the GThreadPool workers out
> of fcntl()?

Ok, I tested this. If a thread is blocked on OFD lock and another
thread closes associated "fd", it does not unblock the thread
which is blocked on lock. So closing OFD can't be used for unblocking
a thread.

Even if it could be, it can't be a replacement for a thread pool
in general as we can't block main thread otherwise it can deadlock.
But we could have used another glib thread pool (instead of a
custom thread pool which can handle signals to unblock threads).

If you are curious, here is my test program.

https://github.com/rhvgoyal/misc/blob/master/virtiofs-tests/ofd-lock.c

Comments in there explain how to use it. It can block on an OFD
lock and one can send SIGUSR1 which will close fd.
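
The gist of it is something like this (a simplified, two-thread sketch
of the same idea, not the exact program; the linked test does the
close from a SIGUSR1 handler instead):

  #define _GNU_SOURCE         /* for F_OFD_SETLKW */
  #include <fcntl.h>
  #include <pthread.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  static int fd;

  static void *closer(void *arg)
  {
      sleep(2);
      close(fd);              /* close the OFD while main is blocked */
      printf("fd closed\n");
      return NULL;
  }

  int main(void)
  {
      struct flock fl;
      pthread_t t;

      fd = open("testfile", O_RDWR | O_CREAT, 0644);

      memset(&fl, 0, sizeof(fl));
      fl.l_type = F_WRLCK;    /* l_start == l_len == 0: whole file */
      fl.l_whence = SEEK_SET;

      pthread_create(&t, NULL, closer, NULL);

      /*
       * Run one instance to take the lock, then a second one. The
       * second instance blocks here and stays blocked even after
       * closer() has closed the fd; it only returns once the lock
       * owner goes away.
       */
      if (fcntl(fd, F_OFD_SETLKW, &fl) == -1)
          perror("fcntl(F_OFD_SETLKW)");
      else
          printf("fcntl() returned success\n");

      pthread_join(t, NULL);
      return 0;
  }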

Thanks
Vivek



^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 10/13] virtiofsd: Custom threadpool for remote blocking posix locks requests
@ 2021-10-05 20:09       ` Vivek Goyal
  0 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-10-05 20:09 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: miklos, qemu-devel, virtio-fs

On Mon, Oct 04, 2021 at 03:54:31PM +0100, Stefan Hajnoczi wrote:
> On Thu, Sep 30, 2021 at 11:30:34AM -0400, Vivek Goyal wrote:
> > Add a new custom threadpool using posix threads that specifically
> > service locking requests.
> > 
> > In the case of a fcntl(SETLKW) request, if the guest is waiting
> > for a lock or locks and issues a hard-reboot through SYSRQ then virtiofsd
> > unblocks the blocked threads by sending a signal to them and waking
> > them up.
> > 
> > The current threadpool (GThreadPool) is not adequate to service the
> > locking requests that result in a thread blocking. That is because
> > GLib does not provide an API to cancel the request while it is
> > serviced by a thread. In addition, a user might be running virtiofsd
> > without a threadpool (--thread-pool-size=0), thus a locking request
> > that blocks, will block the main virtqueue thread that services requests
> > from servicing any other requests.
> > 
> > The only exception occurs when the lock is of type F_UNLCK. In this case
> > the request is serviced by the main virtqueue thread or a GThreadPool
> > thread to avoid a deadlock, when all the threads in the custom threadpool
> > are blocked.
> > 
> > Then virtiofsd proceeds to cleanup the state of the threads, release
> > them back to the system and re-initialize.
> 
> Is there another way to cancel SETLKW without resorting to a new thread
> pool? Since this only matters when shutting down or restarting, can we
> close all plock->fd file descriptors to kick the GThreadPool workers out
> of fcntl()?

Ok, I tested this. If a thread is blocked on OFD lock and another
thread closes associated "fd", it does not unblock the thread
which is blocked on lock. So closing OFD can't be used for unblocking
a thread.

Even if it could be, it can't be a replacement for a thread pool
in general as we can't block main thread otherwise it can deadlock.
But we could have used another glib thread pool (instead of a
custom thread pool which can handle signals to unblock threads).

If you are curious, here is my test program.

https://github.com/rhvgoyal/misc/blob/master/virtiofs-tests/ofd-lock.c

Comments in there explain how to use it. It can block on an OFD
lock and one can send SIGUSR1 which will close fd.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 09/13] virtiofsd: Specify size of notification buffer using config space
  2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
@ 2021-10-06 10:05     ` Christophe de Dinechin
  -1 siblings, 0 replies; 106+ messages in thread
From: Christophe de Dinechin @ 2021-10-06 10:05 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel, stefanha, miklos


On 2021-09-30 at 11:30 -04, Vivek Goyal <vgoyal@redhat.com> wrote...
> Daemon specifies size of notification buffer needed and that should be
> done using config space.
>
> Only ->notify_buf_size value of config space comes from daemon. Rest of
> it is filled by qemu device emulation code.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
> ---
>  hw/virtio/vhost-user-fs.c                  | 27 +++++++++++++++++++
>  include/hw/virtio/vhost-user-fs.h          |  2 ++
>  include/standard-headers/linux/virtio_fs.h |  2 ++
>  tools/virtiofsd/fuse_virtio.c              | 31 ++++++++++++++++++++++
>  4 files changed, 62 insertions(+)
>
> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> index 6bafcf0243..68a94708b4 100644
> --- a/hw/virtio/vhost-user-fs.c
> +++ b/hw/virtio/vhost-user-fs.c
> @@ -36,15 +36,41 @@ static const int user_feature_bits[] = {
>      VHOST_INVALID_FEATURE_BIT
>  };
>
> +static int vhost_user_fs_handle_config_change(struct vhost_dev *dev)
> +{
> +    return 0;
> +}
> +
> +const VhostDevConfigOps fs_ops = {
> +    .vhost_dev_config_notifier = vhost_user_fs_handle_config_change,
> +};
> +
>  static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
>  {
>      VHostUserFS *fs = VHOST_USER_FS(vdev);
>      struct virtio_fs_config fscfg = {};
> +    Error *local_err = NULL;
> +    int ret;
> +
> +    /*
> +     * As of now we only get notification buffer size from device. And that's
> +     * needed only if notification queue is enabled.
> +     */
> +    if (fs->notify_enabled) {
> +        ret = vhost_dev_get_config(&fs->vhost_dev, (uint8_t *)&fs->fscfg,
> +                                   sizeof(struct virtio_fs_config),
> +                                   &local_err);
> +        if (ret) {
> +            error_report_err(local_err);
> +            return;
> +        }
> +    }

I was a bit puzzled by this form of error reporting from the config
callback. It looks like this is not a first; the same pattern exists in
vhost-user-input and vhost-user-gpu.

However, in those other cases there is a memset of the config data to
zero before returning, so the config is not left uninitialized. Only
vhost-user-blk follows a pattern similar to the code above. Apparently,
vhost_dev_get_config itself does not zero the config either.

Would it be worth adding the following to the error path?

    memset(config, 0, sizeof(fscfg));
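
I.e., in context, something like this (sketch):

    if (fs->notify_enabled) {
        ret = vhost_dev_get_config(&fs->vhost_dev, (uint8_t *)&fs->fscfg,
                                   sizeof(struct virtio_fs_config),
                                   &local_err);
        if (ret) {
            error_report_err(local_err);
            /* don't leak uninitialized data in *config to the caller */
            memset(config, 0, sizeof(fscfg));
            return;
        }
    }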

>
>      memcpy((char *)fscfg.tag, fs->conf.tag,
>             MIN(strlen(fs->conf.tag) + 1, sizeof(fscfg.tag)));
>
>      virtio_stl_p(vdev, &fscfg.num_request_queues, fs->conf.num_request_queues);
> +    virtio_stl_p(vdev, &fscfg.notify_buf_size, fs->fscfg.notify_buf_size);
>
>      memcpy(config, &fscfg, sizeof(fscfg));
>  }
> @@ -316,6 +342,7 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
>                  sizeof(struct virtio_fs_config));
>
>      vuf_create_vqs(vdev, true);
> +    vhost_dev_set_config_notifier(&fs->vhost_dev, &fs_ops);
>      ret = vhost_dev_init(&fs->vhost_dev, &fs->vhost_user,
>                           VHOST_BACKEND_TYPE_USER, 0, errp);
>      if (ret < 0) {
> diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
> index 95dc0dd402..3b114ee260 100644
> --- a/include/hw/virtio/vhost-user-fs.h
> +++ b/include/hw/virtio/vhost-user-fs.h
> @@ -14,6 +14,7 @@
>  #ifndef _QEMU_VHOST_USER_FS_H
>  #define _QEMU_VHOST_USER_FS_H
>
> +#include "standard-headers/linux/virtio_fs.h"
>  #include "hw/virtio/virtio.h"
>  #include "hw/virtio/vhost.h"
>  #include "hw/virtio/vhost-user.h"
> @@ -37,6 +38,7 @@ struct VHostUserFS {
>      struct vhost_virtqueue *vhost_vqs;
>      struct vhost_dev vhost_dev;
>      VhostUserState vhost_user;
> +    struct virtio_fs_config fscfg;
>      VirtQueue **req_vqs;
>      VirtQueue *hiprio_vq;
>      VirtQueue *notification_vq;
> diff --git a/include/standard-headers/linux/virtio_fs.h b/include/standard-headers/linux/virtio_fs.h
> index b7f015186e..867d18acf6 100644
> --- a/include/standard-headers/linux/virtio_fs.h
> +++ b/include/standard-headers/linux/virtio_fs.h
> @@ -17,6 +17,8 @@ struct virtio_fs_config {
>
>  	/* Number of request queues */
>  	uint32_t num_request_queues;
> +	/* Size of notification buffer */
> +	uint32_t notify_buf_size;
>  } QEMU_PACKED;
>
>  /* For the id field in virtio_pci_shm_cap */
> diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> index f5b87a508a..3b720c5d4a 100644
> --- a/tools/virtiofsd/fuse_virtio.c
> +++ b/tools/virtiofsd/fuse_virtio.c
> @@ -856,6 +856,35 @@ static bool fv_queue_order(VuDev *dev, int qidx)
>      return false;
>  }
>
> +static uint64_t fv_get_protocol_features(VuDev *dev)
> +{
> +    return 1ull << VHOST_USER_PROTOCOL_F_CONFIG;
> +}
> +
> +static int fv_get_config(VuDev *dev, uint8_t *config, uint32_t len)
> +{
> +    struct virtio_fs_config fscfg = {};
> +    unsigned notify_size, roundto = 64;
> +    union fuse_notify_union {
> +        struct fuse_notify_poll_wakeup_out  wakeup_out;
> +        struct fuse_notify_inval_inode_out  inode_out;
> +        struct fuse_notify_inval_entry_out  entry_out;
> +        struct fuse_notify_delete_out       delete_out;
> +        struct fuse_notify_store_out        store_out;
> +        struct fuse_notify_retrieve_out     retrieve_out;
> +    };
> +
> +    notify_size = sizeof(struct fuse_out_header) +
> +              sizeof(union fuse_notify_union);
> +    notify_size = ((notify_size + roundto) / roundto) * roundto;
> +
> +    fscfg.notify_buf_size = notify_size;
> +    memcpy(config, &fscfg, len);
> +    fuse_log(FUSE_LOG_DEBUG, "%s:Setting notify_buf_size=%d\n", __func__,
> +             fscfg.notify_buf_size);
> +    return 0;
> +}
> +
>  static const VuDevIface fv_iface = {
>      .get_features = fv_get_features,
>      .set_features = fv_set_features,
> @@ -864,6 +893,8 @@ static const VuDevIface fv_iface = {
>      .queue_set_started = fv_queue_set_started,
>
>      .queue_is_processed_in_order = fv_queue_order,
> +    .get_protocol_features = fv_get_protocol_features,
> +    .get_config = fv_get_config,
>  };
>
>  /*


--
Cheers,
Christophe de Dinechin (IRC c3d)



^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 09/13] virtiofsd: Specify size of notification buffer using config space
@ 2021-10-06 10:05     ` Christophe de Dinechin
  0 siblings, 0 replies; 106+ messages in thread
From: Christophe de Dinechin @ 2021-10-06 10:05 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel, miklos


On 2021-09-30 at 11:30 -04, Vivek Goyal <vgoyal@redhat.com> wrote...
> Daemon specifies size of notification buffer needed and that should be
> done using config space.
>
> Only ->notify_buf_size value of config space comes from daemon. Rest of
> it is filled by qemu device emulation code.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
> ---
>  hw/virtio/vhost-user-fs.c                  | 27 +++++++++++++++++++
>  include/hw/virtio/vhost-user-fs.h          |  2 ++
>  include/standard-headers/linux/virtio_fs.h |  2 ++
>  tools/virtiofsd/fuse_virtio.c              | 31 ++++++++++++++++++++++
>  4 files changed, 62 insertions(+)
>
> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> index 6bafcf0243..68a94708b4 100644
> --- a/hw/virtio/vhost-user-fs.c
> +++ b/hw/virtio/vhost-user-fs.c
> @@ -36,15 +36,41 @@ static const int user_feature_bits[] = {
>      VHOST_INVALID_FEATURE_BIT
>  };
>
> +static int vhost_user_fs_handle_config_change(struct vhost_dev *dev)
> +{
> +    return 0;
> +}
> +
> +const VhostDevConfigOps fs_ops = {
> +    .vhost_dev_config_notifier = vhost_user_fs_handle_config_change,
> +};
> +
>  static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
>  {
>      VHostUserFS *fs = VHOST_USER_FS(vdev);
>      struct virtio_fs_config fscfg = {};
> +    Error *local_err = NULL;
> +    int ret;
> +
> +    /*
> +     * As of now we only get notification buffer size from device. And that's
> +     * needed only if notification queue is enabled.
> +     */
> +    if (fs->notify_enabled) {
> +        ret = vhost_dev_get_config(&fs->vhost_dev, (uint8_t *)&fs->fscfg,
> +                                   sizeof(struct virtio_fs_config),
> +                                   &local_err);
> +        if (ret) {
> +            error_report_err(local_err);
> +            return;
> +        }
> +    }

I was a bit puzzled by this form of error reporting from the config
callback. It looks like this is not a first; the same pattern exists in
vhost-user-input and vhost-user-gpu.

However, in those other cases there is a memset of the config data to
zero before returning, so the config is not left uninitialized. Only
vhost-user-blk follows a pattern similar to the code above. Apparently,
vhost_dev_get_config itself does not zero the config either.

Would it be worth adding the following to the error path?

    memset(config, 0, sizeof(fscfg));

>
>      memcpy((char *)fscfg.tag, fs->conf.tag,
>             MIN(strlen(fs->conf.tag) + 1, sizeof(fscfg.tag)));
>
>      virtio_stl_p(vdev, &fscfg.num_request_queues, fs->conf.num_request_queues);
> +    virtio_stl_p(vdev, &fscfg.notify_buf_size, fs->fscfg.notify_buf_size);
>
>      memcpy(config, &fscfg, sizeof(fscfg));
>  }
> @@ -316,6 +342,7 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
>                  sizeof(struct virtio_fs_config));
>
>      vuf_create_vqs(vdev, true);
> +    vhost_dev_set_config_notifier(&fs->vhost_dev, &fs_ops);
>      ret = vhost_dev_init(&fs->vhost_dev, &fs->vhost_user,
>                           VHOST_BACKEND_TYPE_USER, 0, errp);
>      if (ret < 0) {
> diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
> index 95dc0dd402..3b114ee260 100644
> --- a/include/hw/virtio/vhost-user-fs.h
> +++ b/include/hw/virtio/vhost-user-fs.h
> @@ -14,6 +14,7 @@
>  #ifndef _QEMU_VHOST_USER_FS_H
>  #define _QEMU_VHOST_USER_FS_H
>
> +#include "standard-headers/linux/virtio_fs.h"
>  #include "hw/virtio/virtio.h"
>  #include "hw/virtio/vhost.h"
>  #include "hw/virtio/vhost-user.h"
> @@ -37,6 +38,7 @@ struct VHostUserFS {
>      struct vhost_virtqueue *vhost_vqs;
>      struct vhost_dev vhost_dev;
>      VhostUserState vhost_user;
> +    struct virtio_fs_config fscfg;
>      VirtQueue **req_vqs;
>      VirtQueue *hiprio_vq;
>      VirtQueue *notification_vq;
> diff --git a/include/standard-headers/linux/virtio_fs.h b/include/standard-headers/linux/virtio_fs.h
> index b7f015186e..867d18acf6 100644
> --- a/include/standard-headers/linux/virtio_fs.h
> +++ b/include/standard-headers/linux/virtio_fs.h
> @@ -17,6 +17,8 @@ struct virtio_fs_config {
>
>  	/* Number of request queues */
>  	uint32_t num_request_queues;
> +	/* Size of notification buffer */
> +	uint32_t notify_buf_size;
>  } QEMU_PACKED;
>
>  /* For the id field in virtio_pci_shm_cap */
> diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> index f5b87a508a..3b720c5d4a 100644
> --- a/tools/virtiofsd/fuse_virtio.c
> +++ b/tools/virtiofsd/fuse_virtio.c
> @@ -856,6 +856,35 @@ static bool fv_queue_order(VuDev *dev, int qidx)
>      return false;
>  }
>
> +static uint64_t fv_get_protocol_features(VuDev *dev)
> +{
> +    return 1ull << VHOST_USER_PROTOCOL_F_CONFIG;
> +}
> +
> +static int fv_get_config(VuDev *dev, uint8_t *config, uint32_t len)
> +{
> +    struct virtio_fs_config fscfg = {};
> +    unsigned notify_size, roundto = 64;
> +    union fuse_notify_union {
> +        struct fuse_notify_poll_wakeup_out  wakeup_out;
> +        struct fuse_notify_inval_inode_out  inode_out;
> +        struct fuse_notify_inval_entry_out  entry_out;
> +        struct fuse_notify_delete_out       delete_out;
> +        struct fuse_notify_store_out        store_out;
> +        struct fuse_notify_retrieve_out     retrieve_out;
> +    };
> +
> +    notify_size = sizeof(struct fuse_out_header) +
> +              sizeof(union fuse_notify_union);
> +    notify_size = ((notify_size + roundto) / roundto) * roundto;
> +
> +    fscfg.notify_buf_size = notify_size;
> +    memcpy(config, &fscfg, len);
> +    fuse_log(FUSE_LOG_DEBUG, "%s:Setting notify_buf_size=%d\n", __func__,
> +             fscfg.notify_buf_size);
> +    return 0;
> +}
> +
>  static const VuDevIface fv_iface = {
>      .get_features = fv_get_features,
>      .set_features = fv_set_features,
> @@ -864,6 +893,8 @@ static const VuDevIface fv_iface = {
>      .queue_set_started = fv_queue_set_started,
>
>      .queue_is_processed_in_order = fv_queue_order,
> +    .get_protocol_features = fv_get_protocol_features,
> +    .get_config = fv_get_config,
>  };
>
>  /*


--
Cheers,
Christophe de Dinechin (IRC c3d)


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 10/13] virtiofsd: Custom threadpool for remote blocking posix locks requests
  2021-10-05 20:09       ` [Virtio-fs] " Vivek Goyal
@ 2021-10-06 10:26         ` Stefan Hajnoczi
  -1 siblings, 0 replies; 106+ messages in thread
From: Stefan Hajnoczi @ 2021-10-06 10:26 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: miklos, qemu-devel, iangelak, dgilbert, virtio-fs, jaggel

[-- Attachment #1: Type: text/plain, Size: 3526 bytes --]

On Tue, Oct 05, 2021 at 04:09:35PM -0400, Vivek Goyal wrote:
> On Mon, Oct 04, 2021 at 03:54:31PM +0100, Stefan Hajnoczi wrote:
> > On Thu, Sep 30, 2021 at 11:30:34AM -0400, Vivek Goyal wrote:
> > > Add a new custom threadpool using posix threads that specifically
> > > service locking requests.
> > > 
> > > In the case of a fcntl(SETLKW) request, if the guest is waiting
> > > for a lock or locks and issues a hard-reboot through SYSRQ then virtiofsd
> > > unblocks the blocked threads by sending a signal to them and waking
> > > them up.
> > > 
> > > The current threadpool (GThreadPool) is not adequate to service the
> > > locking requests that result in a thread blocking. That is because
> > > GLib does not provide an API to cancel the request while it is
> > > serviced by a thread. In addition, a user might be running virtiofsd
> > > without a threadpool (--thread-pool-size=0), thus a locking request
> > > that blocks, will block the main virtqueue thread that services requests
> > > from servicing any other requests.
> > > 
> > > The only exception occurs when the lock is of type F_UNLCK. In this case
> > > the request is serviced by the main virtqueue thread or a GThreadPool
> > > thread to avoid a deadlock, when all the threads in the custom threadpool
> > > are blocked.
> > > 
> > > Then virtiofsd proceeds to cleanup the state of the threads, release
> > > them back to the system and re-initialize.
> > 
> > Is there another way to cancel SETLKW without resorting to a new thread
> > pool? Since this only matters when shutting down or restarting, can we
> > close all plock->fd file descriptors to kick the GThreadPool workers out
> > of fcntl()?
> 
> Ok, I tested this. If a thread is blocked on OFD lock and another
> thread closes associated "fd", it does not unblock the thread
> which is blocked on lock. So closing OFD can't be used for unblocking
> a thread.
> 
> Even if it could be, it can't be a replacement for a thread pool
> in general as we can't block main thread otherwise it can deadlock.
> But we could have used another glib thread pool (instead of a
> custom thread pool which can handle signals to unblock threads).
> 
> If you are curious, here is my test program.
> 
> https://github.com/rhvgoyal/misc/blob/master/virtiofs-tests/ofd-lock.c
> 
> Comments in there explain how to use it. It can block on an OFD
> lock and one can send SIGUSR1 which will close fd.

Thanks for investigating this! Too bad that the semantics of SETLKW are
not usable:

I ran two instances on my system so that the second instance blocks in
SETLKW and found the same thing. fcntl(fd, F_OFD_SETLKW, &flock) returns
success even though the other thread already closed the fd while the
main thread was blocked in fcntl().

Here is where it gets weird: lslocks(1) shows the OFD locks that are
acquired (process 1) and waiting (process 2). When process 1 terminates,
process 2 makes progress but lslocks(1) shows there are no OFD locks.

This suggests that when fcntl(2) returns success in process 2, the OFD
lock is immediately released by the kernel since the fd was already
closed beforehand. Process 2 would have no way of releasing the lock
since it already closed its fd. So the 0 return value does not really
mean success - there is no acquired OFD lock when fcntl(2) returns!

The problem is that fcntl(2) doesn't return early with -EBADFD or
similar when it is blocked, so we cannot use close(fd) to interrupt it :(.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 10/13] virtiofsd: Custom threadpool for remote blocking posix locks requests
@ 2021-10-06 10:26         ` Stefan Hajnoczi
  0 siblings, 0 replies; 106+ messages in thread
From: Stefan Hajnoczi @ 2021-10-06 10:26 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: miklos, qemu-devel, virtio-fs

[-- Attachment #1: Type: text/plain, Size: 3526 bytes --]

On Tue, Oct 05, 2021 at 04:09:35PM -0400, Vivek Goyal wrote:
> On Mon, Oct 04, 2021 at 03:54:31PM +0100, Stefan Hajnoczi wrote:
> > On Thu, Sep 30, 2021 at 11:30:34AM -0400, Vivek Goyal wrote:
> > > Add a new custom threadpool using posix threads that specifically
> > > service locking requests.
> > > 
> > > In the case of a fcntl(SETLKW) request, if the guest is waiting
> > > for a lock or locks and issues a hard-reboot through SYSRQ then virtiofsd
> > > unblocks the blocked threads by sending a signal to them and waking
> > > them up.
> > > 
> > > The current threadpool (GThreadPool) is not adequate to service the
> > > locking requests that result in a thread blocking. That is because
> > > GLib does not provide an API to cancel the request while it is
> > > serviced by a thread. In addition, a user might be running virtiofsd
> > > without a threadpool (--thread-pool-size=0), thus a locking request
> > > that blocks, will block the main virtqueue thread that services requests
> > > from servicing any other requests.
> > > 
> > > The only exception occurs when the lock is of type F_UNLCK. In this case
> > > the request is serviced by the main virtqueue thread or a GThreadPool
> > > thread to avoid a deadlock, when all the threads in the custom threadpool
> > > are blocked.
> > > 
> > > Then virtiofsd proceeds to cleanup the state of the threads, release
> > > them back to the system and re-initialize.
> > 
> > Is there another way to cancel SETLKW without resorting to a new thread
> > pool? Since this only matters when shutting down or restarting, can we
> > close all plock->fd file descriptors to kick the GThreadPool workers out
> > of fcntl()?
> 
> Ok, I tested this. If a thread is blocked on OFD lock and another
> thread closes associated "fd", it does not unblock the thread
> which is blocked on lock. So closing OFD can't be used for unblocking
> a thread.
> 
> Even if it could be, it can't be a replacement for a thread pool
> in general as we can't block main thread otherwise it can deadlock.
> But we could have used another glib thread pool (instead of a
> custom thread pool which can handle signals to unblock threads).
> 
> If you are curious, here is my test program.
> 
> https://github.com/rhvgoyal/misc/blob/master/virtiofs-tests/ofd-lock.c
> 
> Comments in there explain how to use it. It can block on an OFD
> lock and one can send SIGUSR1 which will close fd.

Thanks for investigating this! Too bad that the semantics of SETLKW are
not usable:

I ran two instances on my system so that the second instance blocks in
SETLKW and found the same thing. fcntl(fd, F_OFD_SETLKW, &flock) returns
success even though the other thread already closed the fd while the
main thread was blocked in fcntl().

Here is where it gets weird: lslocks(1) shows the OFD locks that are
acquired (process 1) and waiting (process 2). When process 1 terminates,
process 2 makes progress but lslocks(1) shows there are no OFD locks.

This suggests that when fcntl(2) returns success in process 2, the OFD
lock is immediately released by the kernel since the fd was already
closed beforehand. Process 2 would have no way of releasing the lock
since it already closed its fd. So the 0 return value does not really
mean success - there is no acquired OFD lock when fcntl(2) returns!

The problem is that fcntl(2) doesn't return early with -EBADFD or
similar when it is blocked, so we cannot use close(fd) to interrupt it :(.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 13/13] virtiofsd, seccomp: Add clock_nanosleep() to allow list
  2021-10-05 17:28         ` Vivek Goyal
@ 2021-10-06 10:27           ` Stefan Hajnoczi
  0 siblings, 0 replies; 106+ messages in thread
From: Stefan Hajnoczi @ 2021-10-06 10:27 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel, miklos

[-- Attachment #1: Type: text/plain, Size: 2097 bytes --]

On Tue, Oct 05, 2021 at 01:28:21PM -0400, Vivek Goyal wrote:
> On Tue, Oct 05, 2021 at 04:50:43PM +0100, Stefan Hajnoczi wrote:
> > On Tue, Oct 05, 2021 at 11:16:18AM -0400, Vivek Goyal wrote:
> > > On Tue, Oct 05, 2021 at 01:22:58PM +0100, Stefan Hajnoczi wrote:
> > > > On Thu, Sep 30, 2021 at 11:30:37AM -0400, Vivek Goyal wrote:
> > > > > g_usleep() calls nanosleep() and that now seems to call clock_nanosleep()
> > > > > syscall. Now these patches are making use of g_usleep(). So add
> > > > > clock_nanosleep() to list of allowed syscalls.
> > > > > 
> > > > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > > > > ---
> > > > >  tools/virtiofsd/passthrough_seccomp.c | 1 +
> > > > >  1 file changed, 1 insertion(+)
> > > > > 
> > > > > diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
> > > > > index cd24b40b78..03080806c0 100644
> > > > > --- a/tools/virtiofsd/passthrough_seccomp.c
> > > > > +++ b/tools/virtiofsd/passthrough_seccomp.c
> > > > > @@ -117,6 +117,7 @@ static const int syscall_allowlist[] = {
> > > > >      SCMP_SYS(writev),
> > > > >      SCMP_SYS(umask),
> > > > >      SCMP_SYS(nanosleep),
> > > > > +    SCMP_SYS(clock_nanosleep),
> > > > 
> > > > This patch can be dropped once sleep has been replaced by a condvar.
> > > 
> > > There is another sleep in do_pool_destroy() where we are waiting
> > > for all current threads to exit.
> > > 
> > > do_pool_destroy() {
> > >     g_usleep(10000);
> > > }
> > 
> > That won't be necessary if there's a way to avoid the thread pool :).
> > See my other reply about closing the OFD instead of using signals to
> > cancel blocking fcntl(2).
> 
> Hi Stefan,
> 
> I responded to that email already. man fcntl does not say anything
> about closing the fd unblocking the waiter with -EINTR, and I had a
> quick look at the kernel code and did not find anything suggesting
> that closing the fd will unblock current waiters.
> 
> So is this something you know works or you want me to try and see
> if it works?

Thanks for testing it!

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 06/13] vhost-user-fs: Use helpers to create/cleanup virtqueue
  2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
@ 2021-10-06 13:35     ` Christophe de Dinechin
  -1 siblings, 0 replies; 106+ messages in thread
From: Christophe de Dinechin @ 2021-10-06 13:35 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel, stefanha, miklos


On 2021-09-30 at 11:30 -04, Vivek Goyal <vgoyal@redhat.com> wrote...
> Add helpers to create/cleanup virtuqueues and use those helpers. I will

Typo, virtuqueues -> virtqueues

Also, while I'm nitpicking, virtqueue could be plural in commit description ;-)

> need to reconfigure queues in later patches and using helpers will allow
> reusing the code.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  hw/virtio/vhost-user-fs.c | 87 +++++++++++++++++++++++----------------
>  1 file changed, 52 insertions(+), 35 deletions(-)
>
> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> index c595957983..d1efbc5b18 100644
> --- a/hw/virtio/vhost-user-fs.c
> +++ b/hw/virtio/vhost-user-fs.c
> @@ -139,6 +139,55 @@ static void vuf_set_status(VirtIODevice *vdev, uint8_t status)
>      }
>  }
>
> +static void vuf_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> +{
> +    /*
> +     * Not normally called; it's the daemon that handles the queue;
> +     * however virtio's cleanup path can call this.
> +     */
> +}
> +
> +static void vuf_create_vqs(VirtIODevice *vdev)
> +{
> +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> +    unsigned int i;
> +
> +    /* Hiprio queue */
> +    fs->hiprio_vq = virtio_add_queue(vdev, fs->conf.queue_size,
> +                                     vuf_handle_output);
> +
> +    /* Request queues */
> +    fs->req_vqs = g_new(VirtQueue *, fs->conf.num_request_queues);
> +    for (i = 0; i < fs->conf.num_request_queues; i++) {
> +        fs->req_vqs[i] = virtio_add_queue(vdev, fs->conf.queue_size,
> +                                          vuf_handle_output);
> +    }
> +
> +    /* 1 high prio queue, plus the number configured */
> +    fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
> +    fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
> +}
> +
> +static void vuf_cleanup_vqs(VirtIODevice *vdev)
> +{
> +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> +    unsigned int i;
> +
> +    virtio_delete_queue(fs->hiprio_vq);
> +    fs->hiprio_vq = NULL;
> +
> +    for (i = 0; i < fs->conf.num_request_queues; i++) {
> +        virtio_delete_queue(fs->req_vqs[i]);
> +    }
> +
> +    g_free(fs->req_vqs);
> +    fs->req_vqs = NULL;
> +
> +    fs->vhost_dev.nvqs = 0;
> +    g_free(fs->vhost_dev.vqs);
> +    fs->vhost_dev.vqs = NULL;
> +}
> +
>  static uint64_t vuf_get_features(VirtIODevice *vdev,
>                                   uint64_t features,
>                                   Error **errp)
> @@ -148,14 +197,6 @@ static uint64_t vuf_get_features(VirtIODevice *vdev,
>      return vhost_get_features(&fs->vhost_dev, user_feature_bits, features);
>  }
>
> -static void vuf_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> -{
> -    /*
> -     * Not normally called; it's the daemon that handles the queue;
> -     * however virtio's cleanup path can call this.
> -     */
> -}
> -
>  static void vuf_guest_notifier_mask(VirtIODevice *vdev, int idx,
>                                              bool mask)
>  {
> @@ -175,7 +216,6 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
>  {
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>      VHostUserFS *fs = VHOST_USER_FS(dev);
> -    unsigned int i;
>      size_t len;
>      int ret;
>
> @@ -222,18 +262,7 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
>      virtio_init(vdev, "vhost-user-fs", VIRTIO_ID_FS,
>                  sizeof(struct virtio_fs_config));
>
> -    /* Hiprio queue */
> -    fs->hiprio_vq = virtio_add_queue(vdev, fs->conf.queue_size, vuf_handle_output);
> -
> -    /* Request queues */
> -    fs->req_vqs = g_new(VirtQueue *, fs->conf.num_request_queues);
> -    for (i = 0; i < fs->conf.num_request_queues; i++) {
> -        fs->req_vqs[i] = virtio_add_queue(vdev, fs->conf.queue_size, vuf_handle_output);
> -    }
> -
> -    /* 1 high prio queue, plus the number configured */
> -    fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
> -    fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
> +    vuf_create_vqs(vdev);
>      ret = vhost_dev_init(&fs->vhost_dev, &fs->vhost_user,
>                           VHOST_BACKEND_TYPE_USER, 0, errp);
>      if (ret < 0) {
> @@ -244,13 +273,8 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
>
>  err_virtio:
>      vhost_user_cleanup(&fs->vhost_user);
> -    virtio_delete_queue(fs->hiprio_vq);
> -    for (i = 0; i < fs->conf.num_request_queues; i++) {
> -        virtio_delete_queue(fs->req_vqs[i]);
> -    }
> -    g_free(fs->req_vqs);
> +    vuf_cleanup_vqs(vdev);
>      virtio_cleanup(vdev);
> -    g_free(fs->vhost_dev.vqs);
>      return;
>  }
>
> @@ -258,7 +282,6 @@ static void vuf_device_unrealize(DeviceState *dev)
>  {
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>      VHostUserFS *fs = VHOST_USER_FS(dev);
> -    int i;
>
>      /* This will stop vhost backend if appropriate. */
>      vuf_set_status(vdev, 0);
> @@ -267,14 +290,8 @@ static void vuf_device_unrealize(DeviceState *dev)
>
>      vhost_user_cleanup(&fs->vhost_user);
>
> -    virtio_delete_queue(fs->hiprio_vq);
> -    for (i = 0; i < fs->conf.num_request_queues; i++) {
> -        virtio_delete_queue(fs->req_vqs[i]);
> -    }
> -    g_free(fs->req_vqs);
> +    vuf_cleanup_vqs(vdev);
>      virtio_cleanup(vdev);
> -    g_free(fs->vhost_dev.vqs);
> -    fs->vhost_dev.vqs = NULL;
>  }
>
>  static const VMStateDescription vuf_vmstate = {


--
Cheers,
Christophe de Dinechin (IRC c3d)



^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 06/13] vhost-user-fs: Use helpers to create/cleanup virtqueue
@ 2021-10-06 13:35     ` Christophe de Dinechin
  0 siblings, 0 replies; 106+ messages in thread
From: Christophe de Dinechin @ 2021-10-06 13:35 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel, miklos


On 2021-09-30 at 11:30 -04, Vivek Goyal <vgoyal@redhat.com> wrote...
> Add helpers to create/cleanup virtuqueues and use those helpers. I will

Typo, virtuqueues -> virtqueues

Also, while I'm nitpicking, virtqueue could be plural in commit description ;-)

> need to reconfigure queues in later patches and using helpers will allow
> reusing the code.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  hw/virtio/vhost-user-fs.c | 87 +++++++++++++++++++++++----------------
>  1 file changed, 52 insertions(+), 35 deletions(-)
>
> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> index c595957983..d1efbc5b18 100644
> --- a/hw/virtio/vhost-user-fs.c
> +++ b/hw/virtio/vhost-user-fs.c
> @@ -139,6 +139,55 @@ static void vuf_set_status(VirtIODevice *vdev, uint8_t status)
>      }
>  }
>
> +static void vuf_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> +{
> +    /*
> +     * Not normally called; it's the daemon that handles the queue;
> +     * however virtio's cleanup path can call this.
> +     */
> +}
> +
> +static void vuf_create_vqs(VirtIODevice *vdev)
> +{
> +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> +    unsigned int i;
> +
> +    /* Hiprio queue */
> +    fs->hiprio_vq = virtio_add_queue(vdev, fs->conf.queue_size,
> +                                     vuf_handle_output);
> +
> +    /* Request queues */
> +    fs->req_vqs = g_new(VirtQueue *, fs->conf.num_request_queues);
> +    for (i = 0; i < fs->conf.num_request_queues; i++) {
> +        fs->req_vqs[i] = virtio_add_queue(vdev, fs->conf.queue_size,
> +                                          vuf_handle_output);
> +    }
> +
> +    /* 1 high prio queue, plus the number configured */
> +    fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
> +    fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
> +}
> +
> +static void vuf_cleanup_vqs(VirtIODevice *vdev)
> +{
> +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> +    unsigned int i;
> +
> +    virtio_delete_queue(fs->hiprio_vq);
> +    fs->hiprio_vq = NULL;
> +
> +    for (i = 0; i < fs->conf.num_request_queues; i++) {
> +        virtio_delete_queue(fs->req_vqs[i]);
> +    }
> +
> +    g_free(fs->req_vqs);
> +    fs->req_vqs = NULL;
> +
> +    fs->vhost_dev.nvqs = 0;
> +    g_free(fs->vhost_dev.vqs);
> +    fs->vhost_dev.vqs = NULL;
> +}
> +
>  static uint64_t vuf_get_features(VirtIODevice *vdev,
>                                   uint64_t features,
>                                   Error **errp)
> @@ -148,14 +197,6 @@ static uint64_t vuf_get_features(VirtIODevice *vdev,
>      return vhost_get_features(&fs->vhost_dev, user_feature_bits, features);
>  }
>
> -static void vuf_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> -{
> -    /*
> -     * Not normally called; it's the daemon that handles the queue;
> -     * however virtio's cleanup path can call this.
> -     */
> -}
> -
>  static void vuf_guest_notifier_mask(VirtIODevice *vdev, int idx,
>                                              bool mask)
>  {
> @@ -175,7 +216,6 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
>  {
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>      VHostUserFS *fs = VHOST_USER_FS(dev);
> -    unsigned int i;
>      size_t len;
>      int ret;
>
> @@ -222,18 +262,7 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
>      virtio_init(vdev, "vhost-user-fs", VIRTIO_ID_FS,
>                  sizeof(struct virtio_fs_config));
>
> -    /* Hiprio queue */
> -    fs->hiprio_vq = virtio_add_queue(vdev, fs->conf.queue_size, vuf_handle_output);
> -
> -    /* Request queues */
> -    fs->req_vqs = g_new(VirtQueue *, fs->conf.num_request_queues);
> -    for (i = 0; i < fs->conf.num_request_queues; i++) {
> -        fs->req_vqs[i] = virtio_add_queue(vdev, fs->conf.queue_size, vuf_handle_output);
> -    }
> -
> -    /* 1 high prio queue, plus the number configured */
> -    fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
> -    fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
> +    vuf_create_vqs(vdev);
>      ret = vhost_dev_init(&fs->vhost_dev, &fs->vhost_user,
>                           VHOST_BACKEND_TYPE_USER, 0, errp);
>      if (ret < 0) {
> @@ -244,13 +273,8 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
>
>  err_virtio:
>      vhost_user_cleanup(&fs->vhost_user);
> -    virtio_delete_queue(fs->hiprio_vq);
> -    for (i = 0; i < fs->conf.num_request_queues; i++) {
> -        virtio_delete_queue(fs->req_vqs[i]);
> -    }
> -    g_free(fs->req_vqs);
> +    vuf_cleanup_vqs(vdev);
>      virtio_cleanup(vdev);
> -    g_free(fs->vhost_dev.vqs);
>      return;
>  }
>
> @@ -258,7 +282,6 @@ static void vuf_device_unrealize(DeviceState *dev)
>  {
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>      VHostUserFS *fs = VHOST_USER_FS(dev);
> -    int i;
>
>      /* This will stop vhost backend if appropriate. */
>      vuf_set_status(vdev, 0);
> @@ -267,14 +290,8 @@ static void vuf_device_unrealize(DeviceState *dev)
>
>      vhost_user_cleanup(&fs->vhost_user);
>
> -    virtio_delete_queue(fs->hiprio_vq);
> -    for (i = 0; i < fs->conf.num_request_queues; i++) {
> -        virtio_delete_queue(fs->req_vqs[i]);
> -    }
> -    g_free(fs->req_vqs);
> +    vuf_cleanup_vqs(vdev);
>      virtio_cleanup(vdev);
> -    g_free(fs->vhost_dev.vqs);
> -    fs->vhost_dev.vqs = NULL;
>  }
>
>  static const VMStateDescription vuf_vmstate = {


--
Cheers,
Christophe de Dinechin (IRC c3d)


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 11/13] virtiofsd: Shutdown notification queue in the end
  2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
@ 2021-10-06 15:15     ` Christophe de Dinechin
  -1 siblings, 0 replies; 106+ messages in thread
From: Christophe de Dinechin @ 2021-10-06 15:15 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel, stefanha, miklos


On 2021-09-30 at 11:30 -04, Vivek Goyal <vgoyal@redhat.com> wrote...
> So far we did not have the notion of cross queue traffic. That is, we
> get request on a queue and send back response on same queue. So if a
> request be being processed and at the same time a stop queue request
> comes in, we wait for all pending requests to finish and then queue
> is stopped and associated data structure cleaned.
>
> But with notification queue, now it is possible that we get a locking
> request on request queue and send the notification back on a different
> queue (notificaiton queue). This means, we need to make sure that

typo: notification (I just saw Stefan noticed it too)

> notifiation queue has not already been shutdown or is not being

typo: notification ;-)

> shutdown in parallel while we are trying to send a notification back.
> Otherwise bad things are bound to happen.
>
> One way to solve this problem is that stop notification queue in the
> end. First stop hiprio and all request queues.

I do not understand that sentence. Maybe you meant to write "is to stop
notification queue in the end", but even so I don't understand if you mean
"in the end" (of what) or "last" (relative to other queues)? I guess you
meant last.

> That means by the
> time we are trying to stop notification queue, we know no other
> request can be in progress which can try to send something on
> notification queue.
>
> But problem is that currently we don't have any control on in what
> order queues should be stopped. If there was a notion of whole device
> being stopped, then we could decide in what order queues should be
> stopped.
>
> Stefan mentioned that there is a command to stop whole device
> VHOST_USER_SET_STATUS but it is not implemented in libvhost-user
> yet. Also we probably could not move away from per queue stop
> logic we have as of now.
>
> As an alternative, he said if we stop all queue when qidx 0 is
> being stopped, it should be fine and we can solve the issue of
> notification queue shutdown order.
>
> So in this patch I am shutting down all queues when queue 0
> is being shutdown. And also changed shutdown order in such a
> way that notification queue is shutdown last.

For my education: I assume there is no valid case where there are no
request queues and only the notification queue?

>
> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  tools/virtiofsd/fuse_virtio.c | 27 ++++++++++++++++++++++++++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> index c67c2e0e7a..a87e88e286 100644
> --- a/tools/virtiofsd/fuse_virtio.c
> +++ b/tools/virtiofsd/fuse_virtio.c
> @@ -826,6 +826,11 @@ static void fv_queue_cleanup_thread(struct fv_VuDev *vud, int qidx)
>      assert(qidx < vud->nqueues);
>      ourqi = vud->qi[qidx];
>
> +    /* Queue is already stopped */
> +    if (!ourqi) {
> +        return;
> +    }
> +
>      /* qidx == 1 is the notification queue if notifications are enabled */
>      if (!se->notify_enabled || qidx != 1) {
>          /* Kill the thread */
> @@ -847,14 +852,25 @@ static void fv_queue_cleanup_thread(struct fv_VuDev *vud, int qidx)
>
>  static void stop_all_queues(struct fv_VuDev *vud)
>  {
> +    struct fuse_session *se = vud->se;
> +
>      for (int i = 0; i < vud->nqueues; i++) {
>          if (!vud->qi[i]) {
>              continue;
>          }
>
> +        /* Shutdown notification queue in the end */
> +        if (se->notify_enabled && i == 1) {
> +            continue;
> +        }
>          fuse_log(FUSE_LOG_INFO, "%s: Stopping queue %d thread\n", __func__, i);
>          fv_queue_cleanup_thread(vud, i);
>      }
> +
> +    if (se->notify_enabled) {
> +        fuse_log(FUSE_LOG_INFO, "%s: Stopping queue %d thread\n", __func__, 1);
> +        fv_queue_cleanup_thread(vud, 1);
> +    }
>  }
>
>  /* Callback from libvhost-user on start or stop of a queue */
> @@ -934,7 +950,16 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
>           * the queue thread doesn't block in virtio_send_msg().
>           */
>          vu_dispatch_unlock(vud);
> -        fv_queue_cleanup_thread(vud, qidx);
> +
> +        /*
> +         * If queue 0 is being shutdown, treat it as if device is being
> +         * shutdown and stop all queues.
> +         */
> +        if (qidx == 0) {
> +            stop_all_queues(vud);
> +        } else {
> +            fv_queue_cleanup_thread(vud, qidx);
> +        }
>          vu_dispatch_wrlock(vud);
>      }
>  }

For my education: given that we dropped the write lock above, what prevents
queue 0 from being shut down on one thread while another thread cleans up
another queue? What makes it safe in that case? I think this is worth a
comment.

--
Cheers,
Christophe de Dinechin (IRC c3d)



^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 11/13] virtiofsd: Shutdown notification queue in the end
@ 2021-10-06 15:15     ` Christophe de Dinechin
  0 siblings, 0 replies; 106+ messages in thread
From: Christophe de Dinechin @ 2021-10-06 15:15 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel, miklos


On 2021-09-30 at 11:30 -04, Vivek Goyal <vgoyal@redhat.com> wrote...
> So far we did not have the notion of cross queue traffic. That is, we
> get request on a queue and send back response on same queue. So if a
> request be being processed and at the same time a stop queue request
> comes in, we wait for all pending requests to finish and then queue
> is stopped and associated data structure cleaned.
>
> But with notification queue, now it is possible that we get a locking
> request on request queue and send the notification back on a different
> queue (notificaiton queue). This means, we need to make sure that

typo: notification (I just saw Stefan noticed it too)

> notifiation queue has not already been shutdown or is not being

typo: notification ;-)

> shutdown in parallel while we are trying to send a notification back.
> Otherwise bad things are bound to happen.
>
> One way to solve this problem is that stop notification queue in the
> end. First stop hiprio and all request queues.

I do not understand that sentence. Maybe you meant to write "is to stop
notification queue in the end", but even so I don't understand if you mean
"in the end" (of what) or "last" (relative to other queues)? I guess you
meant last.

> That means by the
> time we are trying to stop notification queue, we know no other
> request can be in progress which can try to send something on
> notification queue.
>
> But problem is that currently we don't have any control on in what
> order queues should be stopped. If there was a notion of whole device
> being stopped, then we could decide in what order queues should be
> stopped.
>
> Stefan mentioned that there is a command to stop whole device
> VHOST_USER_SET_STATUS but it is not implemented in libvhost-user
> yet. Also we probably could not move away from per queue stop
> logic we have as of now.
>
> As an alternative, he said if we stop all queue when qidx 0 is
> being stopped, it should be fine and we can solve the issue of
> notification queue shutdown order.
>
> So in this patch I am shutting down all queues when queue 0
> is being shutdown. And also changed shutdown order in such a
> way that notification queue is shutdown last.

For my education: I assume there is no valid case where there are no
request queues and only the notification queue?

>
> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  tools/virtiofsd/fuse_virtio.c | 27 ++++++++++++++++++++++++++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> index c67c2e0e7a..a87e88e286 100644
> --- a/tools/virtiofsd/fuse_virtio.c
> +++ b/tools/virtiofsd/fuse_virtio.c
> @@ -826,6 +826,11 @@ static void fv_queue_cleanup_thread(struct fv_VuDev *vud, int qidx)
>      assert(qidx < vud->nqueues);
>      ourqi = vud->qi[qidx];
>
> +    /* Queue is already stopped */
> +    if (!ourqi) {
> +        return;
> +    }
> +
>      /* qidx == 1 is the notification queue if notifications are enabled */
>      if (!se->notify_enabled || qidx != 1) {
>          /* Kill the thread */
> @@ -847,14 +852,25 @@ static void fv_queue_cleanup_thread(struct fv_VuDev *vud, int qidx)
>
>  static void stop_all_queues(struct fv_VuDev *vud)
>  {
> +    struct fuse_session *se = vud->se;
> +
>      for (int i = 0; i < vud->nqueues; i++) {
>          if (!vud->qi[i]) {
>              continue;
>          }
>
> +        /* Shutdown notification queue in the end */
> +        if (se->notify_enabled && i == 1) {
> +            continue;
> +        }
>          fuse_log(FUSE_LOG_INFO, "%s: Stopping queue %d thread\n", __func__, i);
>          fv_queue_cleanup_thread(vud, i);
>      }
> +
> +    if (se->notify_enabled) {
> +        fuse_log(FUSE_LOG_INFO, "%s: Stopping queue %d thread\n", __func__, 1);
> +        fv_queue_cleanup_thread(vud, 1);
> +    }
>  }
>
>  /* Callback from libvhost-user on start or stop of a queue */
> @@ -934,7 +950,16 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
>           * the queue thread doesn't block in virtio_send_msg().
>           */
>          vu_dispatch_unlock(vud);
> -        fv_queue_cleanup_thread(vud, qidx);
> +
> +        /*
> +         * If queue 0 is being shutdown, treat it as if device is being
> +         * shutdown and stop all queues.
> +         */
> +        if (qidx == 0) {
> +            stop_all_queues(vud);
> +        } else {
> +            fv_queue_cleanup_thread(vud, qidx);
> +        }
>          vu_dispatch_wrlock(vud);
>      }
>  }

For my education: given that we dropped the write lock above, what prevents
queue 0 from being shut down on one thread while another thread cleans up
another queue? What makes it safe in that case? I think this is worth a
comment.

--
Cheers,
Christophe de Dinechin (IRC c3d)


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 12/13] virtiofsd: Implement blocking posix locks
  2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
@ 2021-10-06 15:34     ` Christophe de Dinechin
  -1 siblings, 0 replies; 106+ messages in thread
From: Christophe de Dinechin @ 2021-10-06 15:34 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel, stefanha, miklos


On 2021-09-30 at 11:30 -04, Vivek Goyal <vgoyal@redhat.com> wrote...
> As of now we don't support fcntl(F_SETLKW) and if we see one, we return
> -EOPNOTSUPP.
>
> Change that by accepting these requests and returning a reply
> immediately asking caller to wait. Once lock is available, send a
> notification to the waiter indicating lock is available.
>
> In response to lock request, we are returning error value as "1", which
> signals to client to queue the lock request internally and later client
> will get a notification which will signal lock is taken (or error). And
> then fuse client should wake up the guest process.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.c  | 37 ++++++++++++++++-
>  tools/virtiofsd/fuse_lowlevel.h  | 26 ++++++++++++
>  tools/virtiofsd/fuse_virtio.c    | 50 ++++++++++++++++++++---
>  tools/virtiofsd/passthrough_ll.c | 70 ++++++++++++++++++++++++++++----
>  4 files changed, 167 insertions(+), 16 deletions(-)
>
> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> index e4679c73ab..2e7f4b786d 100644
> --- a/tools/virtiofsd/fuse_lowlevel.c
> +++ b/tools/virtiofsd/fuse_lowlevel.c
> @@ -179,8 +179,8 @@ int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
>          .unique = req->unique,
>          .error = error,
>      };
> -
> -    if (error <= -1000 || error > 0) {
> +    /* error = 1 has been used to signal client to wait for notificaiton */
> +    if (error <= -1000 || error > 1) {

What about adding a #define for that special value 1?

(and while we are at it, the -1000 does not look too good either, that could
be a separate cleanup patch)
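
Something like this, say (the name is made up):

    /* ask the client to wait; completion arrives later as a notification */
    #define FUSE_REPLY_WAIT 1

so that both fuse_send_reply_iov_nofree() and fuse_reply_wait() can use
it instead of a bare 1.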

>          fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n", error);
>          out.error = -ERANGE;
>      }
> @@ -290,6 +290,11 @@ int fuse_reply_err(fuse_req_t req, int err)
>      return send_reply(req, -err, NULL, 0);
>  }
>
> +int fuse_reply_wait(fuse_req_t req)
> +{
> +    return send_reply(req, 1, NULL, 0);

... to be used here too.

> +}
> +
>  void fuse_reply_none(fuse_req_t req)
>  {
>      fuse_free_req(req);
> @@ -2165,6 +2170,34 @@ static void do_destroy(fuse_req_t req, fuse_ino_t nodeid,
>      send_reply_ok(req, NULL, 0);
>  }
>
> +static int send_notify_iov(struct fuse_session *se, int notify_code,
> +                           struct iovec *iov, int count)
> +{
> +    struct fuse_out_header out;
> +    if (!se->got_init) {
> +        return -ENOTCONN;
> +    }
> +    out.unique = 0;
> +    out.error = notify_code;
> +    iov[0].iov_base = &out;
> +    iov[0].iov_len = sizeof(struct fuse_out_header);
> +    return fuse_send_msg(se, NULL, iov, count);
> +}
> +
> +int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t unique,
> +                  int32_t error)
> +{
> +    struct fuse_notify_lock_out outarg = {0};
> +    struct iovec iov[2];
> +
> +    outarg.unique = unique;
> +    outarg.error = -error;
> +
> +    iov[1].iov_base = &outarg;
> +    iov[1].iov_len = sizeof(outarg);
> +    return send_notify_iov(se, FUSE_NOTIFY_LOCK, iov, 2);
> +}

This may be just me, but I find it odd that you fill iov[0] and iov[1] in
two separate functions, one of them being static and AFAICT only used once.
I understand that you are trying to split the notify logic from the lock
logic. But the logic is not fully isolated, e.g. the caller needs to know
to add one to the count, start filling at 1, etc.
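
For instance, the helper could own the whole iovec (rough sketch, meant
to be behaviourally equivalent to the two functions combined):

    static int send_notify(struct fuse_session *se, int notify_code,
                           void *payload, size_t payload_len)
    {
        struct fuse_out_header out = {
            .unique = 0,
            .error = notify_code,
        };
        struct iovec iov[2] = {
            { .iov_base = &out, .iov_len = sizeof(out) },
            { .iov_base = payload, .iov_len = payload_len },
        };

        if (!se->got_init) {
            return -ENOTCONN;
        }
        return fuse_send_msg(se, NULL, iov, 2);
    }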

Just a matter of taste, I guess ;-)

> +
>  int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
>                                 off_t offset, struct fuse_bufvec *bufv)
>  {
> diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
> index c55c0ca2fc..64624b48dc 100644
> --- a/tools/virtiofsd/fuse_lowlevel.h
> +++ b/tools/virtiofsd/fuse_lowlevel.h
> @@ -1251,6 +1251,22 @@ struct fuse_lowlevel_ops {
>   */
>  int fuse_reply_err(fuse_req_t req, int err);
>
> +/**
> + * Ask caller to wait for lock.
> + *
> + * Possible requests:
> + *   setlkw
> + *
> + * If caller sends a blocking lock request (setlkw), then reply to caller
> + * that wait for lock to be available. Once lock is available caller will
> + * receive a notification with request's unique id. Notification will
> + * carry info whether lock was successfully obtained or not.
> + *
> + * @param req request handle
> + * @return zero for success, -errno for failure to send reply
> + */
> +int fuse_reply_wait(fuse_req_t req);
> +
>  /**
>   * Don't send reply
>   *
> @@ -1685,6 +1701,16 @@ int fuse_lowlevel_notify_delete(struct fuse_session *se, fuse_ino_t parent,
>  int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
>                                 off_t offset, struct fuse_bufvec *bufv);
>
> +/**
> + * Notify event related to previous lock request
> + *
> + * @param se the session object
> + * @param unique the unique id of the request which requested setlkw
> + * @param error zero for success, -errno for the failure
> + */
> +int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t unique,
> +                              int32_t error);
> +
>  /*
>   * Utility functions
>   */
> diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> index a87e88e286..bb2d4456fc 100644
> --- a/tools/virtiofsd/fuse_virtio.c
> +++ b/tools/virtiofsd/fuse_virtio.c
> @@ -273,6 +273,23 @@ static void vq_send_element(struct fv_QueueInfo *qi, VuVirtqElement *elem,
>      vu_dispatch_unlock(qi->virtio_dev);
>  }
>
> +/* Returns NULL if queue is empty */
> +static FVRequest *vq_pop_notify_elem(struct fv_QueueInfo *qi)
> +{
> +    struct fuse_session *se = qi->virtio_dev->se;
> +    VuDev *dev = &se->virtio_dev->dev;
> +    VuVirtq *q = vu_get_queue(dev, qi->qidx);
> +    FVRequest *req;
> +
> +    vu_dispatch_rdlock(qi->virtio_dev);
> +    pthread_mutex_lock(&qi->vq_lock);
> +    /* Pop an element from queue */
> +    req = vu_queue_pop(dev, q, sizeof(FVRequest));
> +    pthread_mutex_unlock(&qi->vq_lock);
> +    vu_dispatch_unlock(qi->virtio_dev);
> +    return req;
> +}
> +
>  /*
>   * Called back by ll whenever it wants to send a reply/message back
>   * The 1st element of the iov starts with the fuse_out_header
> @@ -281,9 +298,9 @@ static void vq_send_element(struct fv_QueueInfo *qi, VuVirtqElement *elem,
>  int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
>                      struct iovec *iov, int count)
>  {
> -    FVRequest *req = container_of(ch, FVRequest, ch);
> -    struct fv_QueueInfo *qi = ch->qi;
> -    VuVirtqElement *elem = &req->elem;
> +    FVRequest *req;
> +    struct fv_QueueInfo *qi;
> +    VuVirtqElement *elem;
>      int ret = 0;
>
>      assert(count >= 1);
> @@ -294,8 +311,30 @@ int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
>
>      size_t tosend_len = iov_size(iov, count);
>
> -    /* unique == 0 is notification, which we don't support */
> -    assert(out->unique);
> +    /* unique == 0 is notification */
> +    if (!out->unique) {
> +        if (!se->notify_enabled) {
> +            return -EOPNOTSUPP;
> +        }
> +        /* If notifications are enabled, queue index 1 is notification queue */
> +        qi = se->virtio_dev->qi[1];
> +        req = vq_pop_notify_elem(qi);
> +        if (!req) {
> +            /*
> +             * TODO: Implement some sort of ring buffer and queue notifications
> +             * on that and send these later when notification queue has space
> +             * available.
> +             */

Maybe add a trace / message here to debug more easily if we hit that case?

> +            return -ENOSPC;
> +        }
> +        req->reply_sent = false;
> +    } else {
> +        assert(ch);
> +        req = container_of(ch, FVRequest, ch);
> +        qi = ch->qi;
> +    }
> +
> +    elem = &req->elem;
>      assert(!req->reply_sent);
>
>      /* The 'in' part of the elem is to qemu */
> @@ -985,6 +1024,7 @@ static int fv_get_config(VuDev *dev, uint8_t *config, uint32_t len)
>          struct fuse_notify_delete_out       delete_out;
>          struct fuse_notify_store_out        store_out;
>          struct fuse_notify_retrieve_out     retrieve_out;
> +        struct fuse_notify_lock_out         lock_out;
>      };
>
>      notify_size = sizeof(struct fuse_out_header) +
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 6928662e22..277f74762b 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -2131,13 +2131,35 @@ out:
>      }
>  }
>
> +static void setlk_send_notification(struct fuse_session *se, uint64_t unique,
> +                                    int saverr)
> +{
> +    int ret;
> +
> +    do {
> +        ret = fuse_lowlevel_notify_lock(se, unique, saverr);
> +        /*
> +         * Retry sending notification if notification queue does not have
> +         * free descriptor yet, otherwise break out of loop. Either we
> +         * successfully sent notification or some other error occurred.
> +         */
> +        if (ret != -ENOSPC) {
> +            break;
> +        }
> +        usleep(10000);
> +    } while (1);
> +}
> +
>  static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
>                       struct flock *lock, int sleep)
>  {
>      struct lo_data *lo = lo_data(req);
>      struct lo_inode *inode;
>      struct lo_inode_plock *plock;
> -    int ret, saverr = 0;
> +    int ret, saverr = 0, ofd;
> +    uint64_t unique;
> +    struct fuse_session *se = req->se;
> +    bool blocking_lock = false;
>
>      fuse_log(FUSE_LOG_DEBUG,
>               "lo_setlk(ino=%" PRIu64 ", flags=%d)"
> @@ -2151,11 +2173,6 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
>          return;
>      }
>
> -    if (sleep) {
> -        fuse_reply_err(req, EOPNOTSUPP);
> -        return;
> -    }
> -
>      inode = lo_inode(req, ino);
>      if (!inode) {
>          fuse_reply_err(req, EBADF);
> @@ -2168,21 +2185,56 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
>
>      if (!plock) {
>          saverr = ret;
> +        pthread_mutex_unlock(&inode->plock_mutex);
>          goto out;
>      }
>
> +    /*
> +     * plock is now released when inode is going away. We already have
> +     * a reference on inode, so it is guaranteed that plock->fd is
> +     * still around even after dropping inode->plock_mutex lock
> +     */
> +    ofd = plock->fd;
> +    pthread_mutex_unlock(&inode->plock_mutex);
> +
> +    /*
> +     * If this lock request can block, request caller to wait for
> +     * notification. Do not access req after this. Once lock is
> +     * available, send a notification instead.
> +     */
> +    if (sleep && lock->l_type != F_UNLCK) {
> +        /*
> +         * If notification queue is not enabled, can't support async
> +         * locks.
> +         */
> +        if (!se->notify_enabled) {
> +            saverr = EOPNOTSUPP;
> +            goto out;
> +        }
> +        blocking_lock = true;
> +        unique = req->unique;
> +        fuse_reply_wait(req);
> +    }
> +
>      /* TODO: Is it alright to modify flock? */
>      lock->l_pid = 0;
> -    ret = fcntl(plock->fd, F_OFD_SETLK, lock);
> +    if (blocking_lock) {
> +        ret = fcntl(ofd, F_OFD_SETLKW, lock);
> +    } else {
> +        ret = fcntl(ofd, F_OFD_SETLK, lock);
> +    }
>      if (ret == -1) {
>          saverr = errno;
>      }
>
>  out:
> -    pthread_mutex_unlock(&inode->plock_mutex);
>      lo_inode_put(lo, &inode);
>
> -    fuse_reply_err(req, saverr);
> +    if (!blocking_lock) {
> +        fuse_reply_err(req, saverr);
> +    } else {
> +        setlk_send_notification(se, unique, saverr);
> +    }
>  }
>
>  static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,


--
Cheers,
Christophe de Dinechin (IRC c3d)



^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 12/13] virtiofsd: Implement blocking posix locks
@ 2021-10-06 15:34     ` Christophe de Dinechin
  0 siblings, 0 replies; 106+ messages in thread
From: Christophe de Dinechin @ 2021-10-06 15:34 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel, miklos


On 2021-09-30 at 11:30 -04, Vivek Goyal <vgoyal@redhat.com> wrote...
> As of now we don't support fcntl(F_SETLKW) and if we see one, we return
> -EOPNOTSUPP.
>
> Change that by accepting these requests and returning a reply
> immediately asking caller to wait. Once lock is available, send a
> notification to the waiter indicating lock is available.
>
> In response to lock request, we are returning error value as "1", which
> signals to client to queue the lock request internally and later client
> will get a notification which will signal lock is taken (or error). And
> then fuse client should wake up the guest process.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.c  | 37 ++++++++++++++++-
>  tools/virtiofsd/fuse_lowlevel.h  | 26 ++++++++++++
>  tools/virtiofsd/fuse_virtio.c    | 50 ++++++++++++++++++++---
>  tools/virtiofsd/passthrough_ll.c | 70 ++++++++++++++++++++++++++++----
>  4 files changed, 167 insertions(+), 16 deletions(-)
>
> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> index e4679c73ab..2e7f4b786d 100644
> --- a/tools/virtiofsd/fuse_lowlevel.c
> +++ b/tools/virtiofsd/fuse_lowlevel.c
> @@ -179,8 +179,8 @@ int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
>          .unique = req->unique,
>          .error = error,
>      };
> -
> -    if (error <= -1000 || error > 0) {
> +    /* error = 1 has been used to signal client to wait for notification */
> +    if (error <= -1000 || error > 1) {

What about adding a #define for that special value 1?

(and while we are at it, the -1000 does not look too good either, that could
be a separate cleanup patch)

>          fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n", error);
>          out.error = -ERANGE;
>      }
> @@ -290,6 +290,11 @@ int fuse_reply_err(fuse_req_t req, int err)
>      return send_reply(req, -err, NULL, 0);
>  }
>
> +int fuse_reply_wait(fuse_req_t req)
> +{
> +    return send_reply(req, 1, NULL, 0);

... to be used here too.

> +}
> +
>  void fuse_reply_none(fuse_req_t req)
>  {
>      fuse_free_req(req);
> @@ -2165,6 +2170,34 @@ static void do_destroy(fuse_req_t req, fuse_ino_t nodeid,
>      send_reply_ok(req, NULL, 0);
>  }
>
> +static int send_notify_iov(struct fuse_session *se, int notify_code,
> +                           struct iovec *iov, int count)
> +{
> +    struct fuse_out_header out;
> +    if (!se->got_init) {
> +        return -ENOTCONN;
> +    }
> +    out.unique = 0;
> +    out.error = notify_code;
> +    iov[0].iov_base = &out;
> +    iov[0].iov_len = sizeof(struct fuse_out_header);
> +    return fuse_send_msg(se, NULL, iov, count);
> +}
> +
> +int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t unique,
> +                  int32_t error)
> +{
> +    struct fuse_notify_lock_out outarg = {0};
> +    struct iovec iov[2];
> +
> +    outarg.unique = unique;
> +    outarg.error = -error;
> +
> +    iov[1].iov_base = &outarg;
> +    iov[1].iov_len = sizeof(outarg);
> +    return send_notify_iov(se, FUSE_NOTIFY_LOCK, iov, 2);
> +}

This may be just me, but I find it odd that you fill iov[0] and iov[1] in
two separate functions, one of them being static and AFAICT only used once.
I understand that you are trying to split the notify logic from the lock.
But the logic is not fully isolated, e.g. the caller needs to know to add
one to the count, start filling at 1, etc.

Just a matter of taste, I guess ;-)

> +
>  int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
>                                 off_t offset, struct fuse_bufvec *bufv)
>  {
> diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
> index c55c0ca2fc..64624b48dc 100644
> --- a/tools/virtiofsd/fuse_lowlevel.h
> +++ b/tools/virtiofsd/fuse_lowlevel.h
> @@ -1251,6 +1251,22 @@ struct fuse_lowlevel_ops {
>   */
>  int fuse_reply_err(fuse_req_t req, int err);
>
> +/**
> + * Ask caller to wait for lock.
> + *
> + * Possible requests:
> + *   setlkw
> + *
> + * If caller sends a blocking lock request (setlkw), then reply to caller
> + * that wait for lock to be available. Once lock is available caller will
> + * receive a notification with request's unique id. Notification will
> + * carry info whether lock was successfully obtained or not.
> + *
> + * @param req request handle
> + * @return zero for success, -errno for failure to send reply
> + */
> +int fuse_reply_wait(fuse_req_t req);
> +
>  /**
>   * Don't send reply
>   *
> @@ -1685,6 +1701,16 @@ int fuse_lowlevel_notify_delete(struct fuse_session *se, fuse_ino_t parent,
>  int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
>                                 off_t offset, struct fuse_bufvec *bufv);
>
> +/**
> + * Notify event related to previous lock request
> + *
> + * @param se the session object
> + * @param unique the unique id of the request which requested setlkw
> + * @param error zero for success, -errno for the failure
> + */
> +int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t unique,
> +                              int32_t error);
> +
>  /*
>   * Utility functions
>   */
> diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> index a87e88e286..bb2d4456fc 100644
> --- a/tools/virtiofsd/fuse_virtio.c
> +++ b/tools/virtiofsd/fuse_virtio.c
> @@ -273,6 +273,23 @@ static void vq_send_element(struct fv_QueueInfo *qi, VuVirtqElement *elem,
>      vu_dispatch_unlock(qi->virtio_dev);
>  }
>
> +/* Returns NULL if queue is empty */
> +static FVRequest *vq_pop_notify_elem(struct fv_QueueInfo *qi)
> +{
> +    struct fuse_session *se = qi->virtio_dev->se;
> +    VuDev *dev = &se->virtio_dev->dev;
> +    VuVirtq *q = vu_get_queue(dev, qi->qidx);
> +    FVRequest *req;
> +
> +    vu_dispatch_rdlock(qi->virtio_dev);
> +    pthread_mutex_lock(&qi->vq_lock);
> +    /* Pop an element from queue */
> +    req = vu_queue_pop(dev, q, sizeof(FVRequest));
> +    pthread_mutex_unlock(&qi->vq_lock);
> +    vu_dispatch_unlock(qi->virtio_dev);
> +    return req;
> +}
> +
>  /*
>   * Called back by ll whenever it wants to send a reply/message back
>   * The 1st element of the iov starts with the fuse_out_header
> @@ -281,9 +298,9 @@ static void vq_send_element(struct fv_QueueInfo *qi, VuVirtqElement *elem,
>  int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
>                      struct iovec *iov, int count)
>  {
> -    FVRequest *req = container_of(ch, FVRequest, ch);
> -    struct fv_QueueInfo *qi = ch->qi;
> -    VuVirtqElement *elem = &req->elem;
> +    FVRequest *req;
> +    struct fv_QueueInfo *qi;
> +    VuVirtqElement *elem;
>      int ret = 0;
>
>      assert(count >= 1);
> @@ -294,8 +311,30 @@ int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
>
>      size_t tosend_len = iov_size(iov, count);
>
> -    /* unique == 0 is notification, which we don't support */
> -    assert(out->unique);
> +    /* unique == 0 is notification */
> +    if (!out->unique) {
> +        if (!se->notify_enabled) {
> +            return -EOPNOTSUPP;
> +        }
> +        /* If notifications are enabled, queue index 1 is notification queue */
> +        qi = se->virtio_dev->qi[1];
> +        req = vq_pop_notify_elem(qi);
> +        if (!req) {
> +            /*
> +             * TODO: Implement some sort of ring buffer and queue notifications
> +             * on that and send these later when notification queue has space
> +             * available.
> +             */

Maybe add a trace / message here to debug more easily if we hit that case?

> +            return -ENOSPC;
> +        }
> +        req->reply_sent = false;
> +    } else {
> +        assert(ch);
> +        req = container_of(ch, FVRequest, ch);
> +        qi = ch->qi;
> +    }
> +
> +    elem = &req->elem;
>      assert(!req->reply_sent);
>
>      /* The 'in' part of the elem is to qemu */
> @@ -985,6 +1024,7 @@ static int fv_get_config(VuDev *dev, uint8_t *config, uint32_t len)
>          struct fuse_notify_delete_out       delete_out;
>          struct fuse_notify_store_out        store_out;
>          struct fuse_notify_retrieve_out     retrieve_out;
> +        struct fuse_notify_lock_out         lock_out;
>      };
>
>      notify_size = sizeof(struct fuse_out_header) +
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 6928662e22..277f74762b 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -2131,13 +2131,35 @@ out:
>      }
>  }
>
> +static void setlk_send_notification(struct fuse_session *se, uint64_t unique,
> +                                    int saverr)
> +{
> +    int ret;
> +
> +    do {
> +        ret = fuse_lowlevel_notify_lock(se, unique, saverr);
> +        /*
> +         * Retry sending notification if notification queue does not have
> +         * free descriptor yet, otherwise break out of loop. Either we
> > +         * successfully sent notification or some other error occurred.
> +         */
> +        if (ret != -ENOSPC) {
> +            break;
> +        }
> +        usleep(10000);
> +    } while (1);
> +}
> +
>  static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
>                       struct flock *lock, int sleep)
>  {
>      struct lo_data *lo = lo_data(req);
>      struct lo_inode *inode;
>      struct lo_inode_plock *plock;
> -    int ret, saverr = 0;
> +    int ret, saverr = 0, ofd;
> +    uint64_t unique;
> +    struct fuse_session *se = req->se;
> +    bool blocking_lock = false;
>
>      fuse_log(FUSE_LOG_DEBUG,
>               "lo_setlk(ino=%" PRIu64 ", flags=%d)"
> @@ -2151,11 +2173,6 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
>          return;
>      }
>
> -    if (sleep) {
> -        fuse_reply_err(req, EOPNOTSUPP);
> -        return;
> -    }
> -
>      inode = lo_inode(req, ino);
>      if (!inode) {
>          fuse_reply_err(req, EBADF);
> @@ -2168,21 +2185,56 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
>
>      if (!plock) {
>          saverr = ret;
> +        pthread_mutex_unlock(&inode->plock_mutex);
>          goto out;
>      }
>
> +    /*
> +     * plock is now released when inode is going away. We already have
> +     * a reference on inode, so it is guaranteed that plock->fd is
> +     * still around even after dropping inode->plock_mutex lock
> +     */
> +    ofd = plock->fd;
> +    pthread_mutex_unlock(&inode->plock_mutex);
> +
> +    /*
> +     * If this lock request can block, request caller to wait for
> +     * notification. Do not access req after this. Once lock is
> +     * available, send a notification instead.
> +     */
> +    if (sleep && lock->l_type != F_UNLCK) {
> +        /*
> +         * If notification queue is not enabled, can't support async
> +         * locks.
> +         */
> +        if (!se->notify_enabled) {
> +            saverr = EOPNOTSUPP;
> +            goto out;
> +        }
> +        blocking_lock = true;
> +        unique = req->unique;
> +        fuse_reply_wait(req);
> +    }
> +
>      /* TODO: Is it alright to modify flock? */
>      lock->l_pid = 0;
> -    ret = fcntl(plock->fd, F_OFD_SETLK, lock);
> +    if (blocking_lock) {
> +        ret = fcntl(ofd, F_OFD_SETLKW, lock);
> +    } else {
> +        ret = fcntl(ofd, F_OFD_SETLK, lock);
> +    }
>      if (ret == -1) {
>          saverr = errno;
>      }
>
>  out:
> -    pthread_mutex_unlock(&inode->plock_mutex);
>      lo_inode_put(lo, &inode);
>
> -    fuse_reply_err(req, saverr);
> +    if (!blocking_lock) {
> +        fuse_reply_err(req, saverr);
> +    } else {
> +        setlk_send_notification(se, unique, saverr);
> +    }
>  }
>
>  static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,


--
Cheers,
Christophe de Dinechin (IRC c3d)


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 06/13] vhost-user-fs: Use helpers to create/cleanup virtqueue
  2021-10-06 13:35     ` Christophe de Dinechin
@ 2021-10-06 17:40       ` Vivek Goyal
  -1 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-10-06 17:40 UTC (permalink / raw)
  To: Christophe de Dinechin; +Cc: virtio-fs, qemu-devel, stefanha, miklos

On Wed, Oct 06, 2021 at 03:35:30PM +0200, Christophe de Dinechin wrote:
> 
> On 2021-09-30 at 11:30 -04, Vivek Goyal <vgoyal@redhat.com> wrote...
> > Add helpers to create/cleanup virtuqueues and use those helpers. I will
> 
> Typo, virtuqueues -> virtqueues
> 
> Also, while I'm nitpicking, virtqueue could be plural in commit description ;-)

Will do. Thanks. :-)

Vivek

> 
> > need to reconfigure queues in later patches and using helpers will allow
> > reusing the code.
> >
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  hw/virtio/vhost-user-fs.c | 87 +++++++++++++++++++++++----------------
> >  1 file changed, 52 insertions(+), 35 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> > index c595957983..d1efbc5b18 100644
> > --- a/hw/virtio/vhost-user-fs.c
> > +++ b/hw/virtio/vhost-user-fs.c
> > @@ -139,6 +139,55 @@ static void vuf_set_status(VirtIODevice *vdev, uint8_t status)
> >      }
> >  }
> >
> > +static void vuf_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> > +{
> > +    /*
> > +     * Not normally called; it's the daemon that handles the queue;
> > +     * however virtio's cleanup path can call this.
> > +     */
> > +}
> > +
> > +static void vuf_create_vqs(VirtIODevice *vdev)
> > +{
> > +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> > +    unsigned int i;
> > +
> > +    /* Hiprio queue */
> > +    fs->hiprio_vq = virtio_add_queue(vdev, fs->conf.queue_size,
> > +                                     vuf_handle_output);
> > +
> > +    /* Request queues */
> > +    fs->req_vqs = g_new(VirtQueue *, fs->conf.num_request_queues);
> > +    for (i = 0; i < fs->conf.num_request_queues; i++) {
> > +        fs->req_vqs[i] = virtio_add_queue(vdev, fs->conf.queue_size,
> > +                                          vuf_handle_output);
> > +    }
> > +
> > +    /* 1 high prio queue, plus the number configured */
> > +    fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
> > +    fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
> > +}
> > +
> > +static void vuf_cleanup_vqs(VirtIODevice *vdev)
> > +{
> > +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> > +    unsigned int i;
> > +
> > +    virtio_delete_queue(fs->hiprio_vq);
> > +    fs->hiprio_vq = NULL;
> > +
> > +    for (i = 0; i < fs->conf.num_request_queues; i++) {
> > +        virtio_delete_queue(fs->req_vqs[i]);
> > +    }
> > +
> > +    g_free(fs->req_vqs);
> > +    fs->req_vqs = NULL;
> > +
> > +    fs->vhost_dev.nvqs = 0;
> > +    g_free(fs->vhost_dev.vqs);
> > +    fs->vhost_dev.vqs = NULL;
> > +}
> > +
> >  static uint64_t vuf_get_features(VirtIODevice *vdev,
> >                                   uint64_t features,
> >                                   Error **errp)
> > @@ -148,14 +197,6 @@ static uint64_t vuf_get_features(VirtIODevice *vdev,
> >      return vhost_get_features(&fs->vhost_dev, user_feature_bits, features);
> >  }
> >
> > -static void vuf_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> > -{
> > -    /*
> > -     * Not normally called; it's the daemon that handles the queue;
> > -     * however virtio's cleanup path can call this.
> > -     */
> > -}
> > -
> >  static void vuf_guest_notifier_mask(VirtIODevice *vdev, int idx,
> >                                              bool mask)
> >  {
> > @@ -175,7 +216,6 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
> >  {
> >      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> >      VHostUserFS *fs = VHOST_USER_FS(dev);
> > -    unsigned int i;
> >      size_t len;
> >      int ret;
> >
> > @@ -222,18 +262,7 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
> >      virtio_init(vdev, "vhost-user-fs", VIRTIO_ID_FS,
> >                  sizeof(struct virtio_fs_config));
> >
> > -    /* Hiprio queue */
> > -    fs->hiprio_vq = virtio_add_queue(vdev, fs->conf.queue_size, vuf_handle_output);
> > -
> > -    /* Request queues */
> > -    fs->req_vqs = g_new(VirtQueue *, fs->conf.num_request_queues);
> > -    for (i = 0; i < fs->conf.num_request_queues; i++) {
> > -        fs->req_vqs[i] = virtio_add_queue(vdev, fs->conf.queue_size, vuf_handle_output);
> > -    }
> > -
> > -    /* 1 high prio queue, plus the number configured */
> > -    fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
> > -    fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
> > +    vuf_create_vqs(vdev);
> >      ret = vhost_dev_init(&fs->vhost_dev, &fs->vhost_user,
> >                           VHOST_BACKEND_TYPE_USER, 0, errp);
> >      if (ret < 0) {
> > @@ -244,13 +273,8 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
> >
> >  err_virtio:
> >      vhost_user_cleanup(&fs->vhost_user);
> > -    virtio_delete_queue(fs->hiprio_vq);
> > -    for (i = 0; i < fs->conf.num_request_queues; i++) {
> > -        virtio_delete_queue(fs->req_vqs[i]);
> > -    }
> > -    g_free(fs->req_vqs);
> > +    vuf_cleanup_vqs(vdev);
> >      virtio_cleanup(vdev);
> > -    g_free(fs->vhost_dev.vqs);
> >      return;
> >  }
> >
> > @@ -258,7 +282,6 @@ static void vuf_device_unrealize(DeviceState *dev)
> >  {
> >      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> >      VHostUserFS *fs = VHOST_USER_FS(dev);
> > -    int i;
> >
> >      /* This will stop vhost backend if appropriate. */
> >      vuf_set_status(vdev, 0);
> > @@ -267,14 +290,8 @@ static void vuf_device_unrealize(DeviceState *dev)
> >
> >      vhost_user_cleanup(&fs->vhost_user);
> >
> > -    virtio_delete_queue(fs->hiprio_vq);
> > -    for (i = 0; i < fs->conf.num_request_queues; i++) {
> > -        virtio_delete_queue(fs->req_vqs[i]);
> > -    }
> > -    g_free(fs->req_vqs);
> > +    vuf_cleanup_vqs(vdev);
> >      virtio_cleanup(vdev);
> > -    g_free(fs->vhost_dev.vqs);
> > -    fs->vhost_dev.vqs = NULL;
> >  }
> >
> >  static const VMStateDescription vuf_vmstate = {
> 
> 
> --
> Cheers,
> Christophe de Dinechin (IRC c3d)
> 



^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 06/13] vhost-user-fs: Use helpers to create/cleanup virtqueue
@ 2021-10-06 17:40       ` Vivek Goyal
  0 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-10-06 17:40 UTC (permalink / raw)
  To: Christophe de Dinechin; +Cc: virtio-fs, qemu-devel, miklos

On Wed, Oct 06, 2021 at 03:35:30PM +0200, Christophe de Dinechin wrote:
> 
> On 2021-09-30 at 11:30 -04, Vivek Goyal <vgoyal@redhat.com> wrote...
> > Add helpers to create/cleanup virtuqueues and use those helpers. I will
> 
> Typo, virtuqueues -> virtqueues
> 
> Also, while I'm nitpicking, virtqueue could be plural in commit description ;-)

Will do. Thanks. :-)

Vivek

> 
> > need to reconfigure queues in later patches and using helpers will allow
> > reusing the code.
> >
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  hw/virtio/vhost-user-fs.c | 87 +++++++++++++++++++++++----------------
> >  1 file changed, 52 insertions(+), 35 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> > index c595957983..d1efbc5b18 100644
> > --- a/hw/virtio/vhost-user-fs.c
> > +++ b/hw/virtio/vhost-user-fs.c
> > @@ -139,6 +139,55 @@ static void vuf_set_status(VirtIODevice *vdev, uint8_t status)
> >      }
> >  }
> >
> > +static void vuf_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> > +{
> > +    /*
> > +     * Not normally called; it's the daemon that handles the queue;
> > +     * however virtio's cleanup path can call this.
> > +     */
> > +}
> > +
> > +static void vuf_create_vqs(VirtIODevice *vdev)
> > +{
> > +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> > +    unsigned int i;
> > +
> > +    /* Hiprio queue */
> > +    fs->hiprio_vq = virtio_add_queue(vdev, fs->conf.queue_size,
> > +                                     vuf_handle_output);
> > +
> > +    /* Request queues */
> > +    fs->req_vqs = g_new(VirtQueue *, fs->conf.num_request_queues);
> > +    for (i = 0; i < fs->conf.num_request_queues; i++) {
> > +        fs->req_vqs[i] = virtio_add_queue(vdev, fs->conf.queue_size,
> > +                                          vuf_handle_output);
> > +    }
> > +
> > +    /* 1 high prio queue, plus the number configured */
> > +    fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
> > +    fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
> > +}
> > +
> > +static void vuf_cleanup_vqs(VirtIODevice *vdev)
> > +{
> > +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> > +    unsigned int i;
> > +
> > +    virtio_delete_queue(fs->hiprio_vq);
> > +    fs->hiprio_vq = NULL;
> > +
> > +    for (i = 0; i < fs->conf.num_request_queues; i++) {
> > +        virtio_delete_queue(fs->req_vqs[i]);
> > +    }
> > +
> > +    g_free(fs->req_vqs);
> > +    fs->req_vqs = NULL;
> > +
> > +    fs->vhost_dev.nvqs = 0;
> > +    g_free(fs->vhost_dev.vqs);
> > +    fs->vhost_dev.vqs = NULL;
> > +}
> > +
> >  static uint64_t vuf_get_features(VirtIODevice *vdev,
> >                                   uint64_t features,
> >                                   Error **errp)
> > @@ -148,14 +197,6 @@ static uint64_t vuf_get_features(VirtIODevice *vdev,
> >      return vhost_get_features(&fs->vhost_dev, user_feature_bits, features);
> >  }
> >
> > -static void vuf_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> > -{
> > -    /*
> > -     * Not normally called; it's the daemon that handles the queue;
> > -     * however virtio's cleanup path can call this.
> > -     */
> > -}
> > -
> >  static void vuf_guest_notifier_mask(VirtIODevice *vdev, int idx,
> >                                              bool mask)
> >  {
> > @@ -175,7 +216,6 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
> >  {
> >      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> >      VHostUserFS *fs = VHOST_USER_FS(dev);
> > -    unsigned int i;
> >      size_t len;
> >      int ret;
> >
> > @@ -222,18 +262,7 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
> >      virtio_init(vdev, "vhost-user-fs", VIRTIO_ID_FS,
> >                  sizeof(struct virtio_fs_config));
> >
> > -    /* Hiprio queue */
> > -    fs->hiprio_vq = virtio_add_queue(vdev, fs->conf.queue_size, vuf_handle_output);
> > -
> > -    /* Request queues */
> > -    fs->req_vqs = g_new(VirtQueue *, fs->conf.num_request_queues);
> > -    for (i = 0; i < fs->conf.num_request_queues; i++) {
> > -        fs->req_vqs[i] = virtio_add_queue(vdev, fs->conf.queue_size, vuf_handle_output);
> > -    }
> > -
> > -    /* 1 high prio queue, plus the number configured */
> > -    fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
> > -    fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
> > +    vuf_create_vqs(vdev);
> >      ret = vhost_dev_init(&fs->vhost_dev, &fs->vhost_user,
> >                           VHOST_BACKEND_TYPE_USER, 0, errp);
> >      if (ret < 0) {
> > @@ -244,13 +273,8 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
> >
> >  err_virtio:
> >      vhost_user_cleanup(&fs->vhost_user);
> > -    virtio_delete_queue(fs->hiprio_vq);
> > -    for (i = 0; i < fs->conf.num_request_queues; i++) {
> > -        virtio_delete_queue(fs->req_vqs[i]);
> > -    }
> > -    g_free(fs->req_vqs);
> > +    vuf_cleanup_vqs(vdev);
> >      virtio_cleanup(vdev);
> > -    g_free(fs->vhost_dev.vqs);
> >      return;
> >  }
> >
> > @@ -258,7 +282,6 @@ static void vuf_device_unrealize(DeviceState *dev)
> >  {
> >      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> >      VHostUserFS *fs = VHOST_USER_FS(dev);
> > -    int i;
> >
> >      /* This will stop vhost backend if appropriate. */
> >      vuf_set_status(vdev, 0);
> > @@ -267,14 +290,8 @@ static void vuf_device_unrealize(DeviceState *dev)
> >
> >      vhost_user_cleanup(&fs->vhost_user);
> >
> > -    virtio_delete_queue(fs->hiprio_vq);
> > -    for (i = 0; i < fs->conf.num_request_queues; i++) {
> > -        virtio_delete_queue(fs->req_vqs[i]);
> > -    }
> > -    g_free(fs->req_vqs);
> > +    vuf_cleanup_vqs(vdev);
> >      virtio_cleanup(vdev);
> > -    g_free(fs->vhost_dev.vqs);
> > -    fs->vhost_dev.vqs = NULL;
> >  }
> >
> >  static const VMStateDescription vuf_vmstate = {
> 
> 
> --
> Cheers,
> Christophe de Dinechin (IRC c3d)
> 


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 11/13] virtiofsd: Shutdown notification queue in the end
  2021-10-06 15:15     ` Christophe de Dinechin
@ 2021-10-06 17:58       ` Vivek Goyal
  -1 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-10-06 17:58 UTC (permalink / raw)
  To: Christophe de Dinechin; +Cc: virtio-fs, qemu-devel, stefanha, miklos

On Wed, Oct 06, 2021 at 05:15:57PM +0200, Christophe de Dinechin wrote:
> 
> On 2021-09-30 at 11:30 -04, Vivek Goyal <vgoyal@redhat.com> wrote...
> > So far we did not have the notion of cross queue traffic. That is, we
> > get a request on a queue and send back the response on the same queue. So
> > if a request is being processed and at the same time a stop queue request
> > comes in, we wait for all pending requests to finish and then queue
> > is stopped and associated data structure cleaned.
> >
> > But with notification queue, now it is possible that we get a locking
> > request on request queue and send the notification back on a different
> > queue (notificaiton queue). This means, we need to make sure that
> 
> typo: notification (I just saw Stefan noticed it too)
> 
> > notifiation queue has not already been shutdown or is not being
> 
> typo: notification ;-)
> 
> > shutdown in parallel while we are trying to send a notification back.
> > Otherwise bad things are bound to happen.
> >
> > One way to solve this problem is that stop notification queue in the
> > end. First stop hiprio and all request queues.
> 
> I do not understand that sentence. Maybe you meant to write "is to stop
> notification queue in the end", but even so I don't understand if you mean
> "in the end" (of what) or "last" (relative to other queues)? I guess you
> meant last.

I meant "is to stop notification queue last". Will fix it.

> 
> > That means by the
> > time we are trying to stop notification queue, we know no other
> > request can be in progress which can try to send something on
> > notification queue.
> >
> > But problem is that currently we don't have any control on in what
> > order queues should be stopped. If there was a notion of whole device
> > being stopped, then we could decide in what order queues should be
> > stopped.
> >
> > Stefan mentioned that there is a command to stop whole device
> > VHOST_USER_SET_STATUS but it is not implemented in libvhost-user
> > yet. Also we probably could not move away from per queue stop
> > logic we have as of now.
> >
> > As an alternative, he said if we stop all queue when qidx 0 is
> > being stopped, it should be fine and we can solve the issue of
> > notification queue shutdown order.
> >
> > So in this patch I am shutting down all queues when queue 0
> > is being shutdown. And also changed shutdown order in such a
> > way that notification queue is shutdown last.
> 
> For my education: I assume there is no valid case where there is no queue
> and only the notification queue?

Yes. A minimum of two queues have to be there: queue 0 is for hiprio
requests and queue 1 is for regular requests.
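
Roughly, the queue index layout looks like this (the indices with
notifications enabled are an inference from this series using qi[1] as
the notification queue):

    /*
     * notifications disabled:        notifications enabled:
     *   qidx 0: hiprio                 qidx 0: hiprio
     *   qidx 1..n: request queues      qidx 1: notifications
     *                                  qidx 2..n+1: request queues
     */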

> >
> > Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  tools/virtiofsd/fuse_virtio.c | 27 ++++++++++++++++++++++++++-
> >  1 file changed, 26 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> > index c67c2e0e7a..a87e88e286 100644
> > --- a/tools/virtiofsd/fuse_virtio.c
> > +++ b/tools/virtiofsd/fuse_virtio.c
> > @@ -826,6 +826,11 @@ static void fv_queue_cleanup_thread(struct fv_VuDev *vud, int qidx)
> >      assert(qidx < vud->nqueues);
> >      ourqi = vud->qi[qidx];
> >
> > +    /* Queue is already stopped */
> > +    if (!ourqi) {
> > +        return;
> > +    }
> > +
> >      /* qidx == 1 is the notification queue if notifications are enabled */
> >      if (!se->notify_enabled || qidx != 1) {
> >          /* Kill the thread */
> > @@ -847,14 +852,25 @@ static void fv_queue_cleanup_thread(struct fv_VuDev *vud, int qidx)
> >
> >  static void stop_all_queues(struct fv_VuDev *vud)
> >  {
> > +    struct fuse_session *se = vud->se;
> > +
> >      for (int i = 0; i < vud->nqueues; i++) {
> >          if (!vud->qi[i]) {
> >              continue;
> >          }
> >
> > +        /* Shutdown notification queue in the end */
> > +        if (se->notify_enabled && i == 1) {
> > +            continue;
> > +        }
> >          fuse_log(FUSE_LOG_INFO, "%s: Stopping queue %d thread\n", __func__, i);
> >          fv_queue_cleanup_thread(vud, i);
> >      }
> > +
> > +    if (se->notify_enabled) {
> > +        fuse_log(FUSE_LOG_INFO, "%s: Stopping queue %d thread\n", __func__, 1);
> > +        fv_queue_cleanup_thread(vud, 1);
> > +    }
> >  }
> >
> >  /* Callback from libvhost-user on start or stop of a queue */
> > @@ -934,7 +950,16 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
> >           * the queue thread doesn't block in virtio_send_msg().
> >           */
> >          vu_dispatch_unlock(vud);
> > -        fv_queue_cleanup_thread(vud, qidx);
> > +
> > +        /*
> > +         * If queue 0 is being shutdown, treat it as if device is being
> > +         * shutdown and stop all queues.
> > +         */
> > +        if (qidx == 0) {
> > +            stop_all_queues(vud);
> > +        } else {
> > +            fv_queue_cleanup_thread(vud, qidx);
> > +        }
> >          vu_dispatch_wrlock(vud);
> >      }
> >  }
> 
> For my education: given that we dropped the write lock above, what prevents
> queue 0 from being shutdown on one thread while another cleans up another
> queue. What makes it safe in that case? I think this is worth a comment.

I think only one queue shutdown message can progress at a time. These
are processed in virtio_loop(), which in turn calls
fv_queue_set_started(started = false).

So while one queue shutdown is in progress, virtio_loop() will go back
to reading the next message only after the current queue shutdown has
finished.
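
In pseudo-code the dispatch looks roughly like this (simplified, helper
names invented):

    while (!fuse_session_exited(se)) {
        /* Blocks until the next vhost-user message arrives */
        read_vhost_user_message(&msg);
        /*
         * May end up in fv_queue_set_started(vud, qidx, false), which
         * joins the queue thread before returning, so the next stop
         * message is not even read until this one has been handled.
         */
        dispatch_message(&msg);
    }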

Thanks
Vivek



^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 11/13] virtiofsd: Shutdown notification queue in the end
@ 2021-10-06 17:58       ` Vivek Goyal
  0 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-10-06 17:58 UTC (permalink / raw)
  To: Christophe de Dinechin; +Cc: virtio-fs, qemu-devel, miklos

On Wed, Oct 06, 2021 at 05:15:57PM +0200, Christophe de Dinechin wrote:
> 
> On 2021-09-30 at 11:30 -04, Vivek Goyal <vgoyal@redhat.com> wrote...
> > So far we did not have the notion of cross queue traffic. That is, we
> > get a request on a queue and send back the response on the same queue. So
> > if a request is being processed and at the same time a stop queue request
> > comes in, we wait for all pending requests to finish and then queue
> > is stopped and associated data structure cleaned.
> >
> > But with notification queue, now it is possible that we get a locking
> > request on request queue and send the notification back on a different
> > queue (notificaiton queue). This means, we need to make sure that
> 
> typo: notification (I just saw Stefan noticed it too)
> 
> > notifiation queue has not already been shutdown or is not being
> 
> typo: notification ;-)
> 
> > shutdown in parallel while we are trying to send a notification back.
> > Otherwise bad things are bound to happen.
> >
> > One way to solve this problem is that stop notification queue in the
> > end. First stop hiprio and all request queues.
> 
> I do not understand that sentence. Maybe you meant to write "is to stop
> notification queue in the end", but even so I don't understand if you mean
> "in the end" (of what) or "last" (relative to other queues)? I guess you
> meant last.

I meant "is to stop notification queue last". Will fix it.

> 
> > That means by the
> > time we are trying to stop notification queue, we know no other
> > request can be in progress which can try to send something on
> > notification queue.
> >
> > But problem is that currently we don't have any control on in what
> > order queues should be stopped. If there was a notion of whole device
> > being stopped, then we could decide in what order queues should be
> > stopped.
> >
> > Stefan mentioned that there is a command to stop whole device
> > VHOST_USER_SET_STATUS but it is not implemented in libvhost-user
> > yet. Also we probably could not move away from per queue stop
> > logic we have as of now.
> >
> > As an alternative, he said if we stop all queue when qidx 0 is
> > being stopped, it should be fine and we can solve the issue of
> > notification queue shutdown order.
> >
> > So in this patch I am shutting down all queues when queue 0
> > is being shutdown. And also changed shutdown order in such a
> > way that notification queue is shutdown last.
> 
> For my education: I assume there is no valid case where there is no queue
> and only the notification queue?

Yes. A minimum of two queues have to be there: queue 0 is for hiprio
requests and queue 1 is for regular requests.

> >
> > Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  tools/virtiofsd/fuse_virtio.c | 27 ++++++++++++++++++++++++++-
> >  1 file changed, 26 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> > index c67c2e0e7a..a87e88e286 100644
> > --- a/tools/virtiofsd/fuse_virtio.c
> > +++ b/tools/virtiofsd/fuse_virtio.c
> > @@ -826,6 +826,11 @@ static void fv_queue_cleanup_thread(struct fv_VuDev *vud, int qidx)
> >      assert(qidx < vud->nqueues);
> >      ourqi = vud->qi[qidx];
> >
> > +    /* Queue is already stopped */
> > +    if (!ourqi) {
> > +        return;
> > +    }
> > +
> >      /* qidx == 1 is the notification queue if notifications are enabled */
> >      if (!se->notify_enabled || qidx != 1) {
> >          /* Kill the thread */
> > @@ -847,14 +852,25 @@ static void fv_queue_cleanup_thread(struct fv_VuDev *vud, int qidx)
> >
> >  static void stop_all_queues(struct fv_VuDev *vud)
> >  {
> > +    struct fuse_session *se = vud->se;
> > +
> >      for (int i = 0; i < vud->nqueues; i++) {
> >          if (!vud->qi[i]) {
> >              continue;
> >          }
> >
> > +        /* Shutdown notification queue in the end */
> > +        if (se->notify_enabled && i == 1) {
> > +            continue;
> > +        }
> >          fuse_log(FUSE_LOG_INFO, "%s: Stopping queue %d thread\n", __func__, i);
> >          fv_queue_cleanup_thread(vud, i);
> >      }
> > +
> > +    if (se->notify_enabled) {
> > +        fuse_log(FUSE_LOG_INFO, "%s: Stopping queue %d thread\n", __func__, 1);
> > +        fv_queue_cleanup_thread(vud, 1);
> > +    }
> >  }
> >
> >  /* Callback from libvhost-user on start or stop of a queue */
> > @@ -934,7 +950,16 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
> >           * the queue thread doesn't block in virtio_send_msg().
> >           */
> >          vu_dispatch_unlock(vud);
> > -        fv_queue_cleanup_thread(vud, qidx);
> > +
> > +        /*
> > +         * If queue 0 is being shutdown, treat it as if device is being
> > +         * shutdown and stop all queues.
> > +         */
> > +        if (qidx == 0) {
> > +            stop_all_queues(vud);
> > +        } else {
> > +            fv_queue_cleanup_thread(vud, qidx);
> > +        }
> >          vu_dispatch_wrlock(vud);
> >      }
> >  }
> 
> For my education: given that we dropped the write lock above, what prevents
> queue 0 from being shutdown on one thread while another cleans up another
> queue. What makes it safe in that case? I think this is worth a comment.

I think only one queue shutdown message can progress at a time. These
are processed in virtio_loop(), which in turn calls
fv_queue_set_started(started = false).

So while one queue shutdown is in progress, virtio_loop() will go back
to reading the next message only after the current queue shutdown has
finished.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 12/13] virtiofsd: Implement blocking posix locks
  2021-10-06 15:34     ` Christophe de Dinechin
@ 2021-10-06 18:17       ` Vivek Goyal
  -1 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-10-06 18:17 UTC (permalink / raw)
  To: Christophe de Dinechin; +Cc: virtio-fs, qemu-devel, stefanha, miklos

On Wed, Oct 06, 2021 at 05:34:59PM +0200, Christophe de Dinechin wrote:
> 
> On 2021-09-30 at 11:30 -04, Vivek Goyal <vgoyal@redhat.com> wrote...
> > As of now we don't support fcntl(F_SETLKW) and if we see one, we return
> > -EOPNOTSUPP.
> >
> > Change that by accepting these requests and returning a reply
> > immediately asking caller to wait. Once lock is available, send a
> > notification to the waiter indicating lock is available.
> >
> > In response to lock request, we are returning error value as "1", which
> > signals to client to queue the lock request internally and later client
> > will get a notification which will signal lock is taken (or error). And
> > then fuse client should wake up the guest process.
> >
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
> > ---
> >  tools/virtiofsd/fuse_lowlevel.c  | 37 ++++++++++++++++-
> >  tools/virtiofsd/fuse_lowlevel.h  | 26 ++++++++++++
> >  tools/virtiofsd/fuse_virtio.c    | 50 ++++++++++++++++++++---
> >  tools/virtiofsd/passthrough_ll.c | 70 ++++++++++++++++++++++++++++----
> >  4 files changed, 167 insertions(+), 16 deletions(-)
> >
> > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > index e4679c73ab..2e7f4b786d 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.c
> > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > @@ -179,8 +179,8 @@ int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
> >          .unique = req->unique,
> >          .error = error,
> >      };
> > -
> > -    if (error <= -1000 || error > 0) {
> > +    /* error = 1 has been used to signal client to wait for notification */
> > +    if (error <= -1000 || error > 1) {
> 
> What about adding a #define for that special value 1?

Will do. Miklos wants that as well.

> 
> (and while we are at it, the -1000 does not look too good either, that could
> be a separate cleanup patch)

Hmm..., that's an unrelated cleanup. Maybe for some other day.

> 
> >          fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n", error);
> >          out.error = -ERANGE;
> >      }
> > @@ -290,6 +290,11 @@ int fuse_reply_err(fuse_req_t req, int err)
> >      return send_reply(req, -err, NULL, 0);
> >  }
> >
> > +int fuse_reply_wait(fuse_req_t req)
> > +{
> > +    return send_reply(req, 1, NULL, 0);
> 
> ... to be used here too.

Yes. Will use the new define here too.
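
Something along these lines (FUSE_REPLY_WAIT is just a placeholder name,
the respin may pick a different one):

    /* Placeholder name for the special "wait for notification" value */
    #define FUSE_REPLY_WAIT 1

    int fuse_reply_wait(fuse_req_t req)
    {
        return send_reply(req, FUSE_REPLY_WAIT, NULL, 0);
    }

and in fuse_send_reply_iov_nofree():

    if (error <= -1000 || error > FUSE_REPLY_WAIT) {
        fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n", error);
        out.error = -ERANGE;
    }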

> 
> > +}
> > +
> >  void fuse_reply_none(fuse_req_t req)
> >  {
> >      fuse_free_req(req);
> > @@ -2165,6 +2170,34 @@ static void do_destroy(fuse_req_t req, fuse_ino_t nodeid,
> >      send_reply_ok(req, NULL, 0);
> >  }
> >
> > +static int send_notify_iov(struct fuse_session *se, int notify_code,
> > +                           struct iovec *iov, int count)
> > +{
> > +    struct fuse_out_header out;
> > +    if (!se->got_init) {
> > +        return -ENOTCONN;
> > +    }
> > +    out.unique = 0;
> > +    out.error = notify_code;
> > +    iov[0].iov_base = &out;
> > +    iov[0].iov_len = sizeof(struct fuse_out_header);
> > +    return fuse_send_msg(se, NULL, iov, count);
> > +}
> > +
> > +int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t unique,
> > +                  int32_t error)
> > +{
> > +    struct fuse_notify_lock_out outarg = {0};
> > +    struct iovec iov[2];
> > +
> > +    outarg.unique = unique;
> > +    outarg.error = -error;
> > +
> > +    iov[1].iov_base = &outarg;
> > +    iov[1].iov_len = sizeof(outarg);
> > +    return send_notify_iov(se, FUSE_NOTIFY_LOCK, iov, 2);
> > +}
> 
> This may be just me, but I find it odd that you fill iov[0] and iov[1] in
> two separate functions, one of them being static and AFAICT only used once.
> I understand that you are trying to split the notify logic from the lock.
> But the logic is not fully isolated, e.g. the caller needs to know to add
> one to the count, start filling at 1, etc.
> 
> Just a matter of taste, I guess ;-)

I thought that multiple notification types can share the common code in
send_notify_iov(), because it fills the common fuse_out_header. So if in
the future I introduce another notification, say FUSE_NOTIFY_FOO, then I
can just define one function fuse_lowlevel_notify_foo() and it can also
use send_notify_iov(). I think that's the thought I had in mind.
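
For example, a hypothetical new notification could reuse it like this
(FUSE_NOTIFY_FOO and fuse_notify_foo_out are made up to show the
pattern):

    int fuse_lowlevel_notify_foo(struct fuse_session *se, uint64_t cookie)
    {
        /* Made-up example type; not part of this series */
        struct fuse_notify_foo_out outarg = { .cookie = cookie };
        struct iovec iov[2];

        /*
         * iov[0] is left for send_notify_iov() to fill with the common
         * fuse_out_header; each notification only supplies iov[1].
         */
        iov[1].iov_base = &outarg;
        iov[1].iov_len = sizeof(outarg);
        return send_notify_iov(se, FUSE_NOTIFY_FOO, iov, 2);
    }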


> 
> > +
> >  int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
> >                                 off_t offset, struct fuse_bufvec *bufv)
> >  {
> > diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
> > index c55c0ca2fc..64624b48dc 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.h
> > +++ b/tools/virtiofsd/fuse_lowlevel.h
> > @@ -1251,6 +1251,22 @@ struct fuse_lowlevel_ops {
> >   */
> >  int fuse_reply_err(fuse_req_t req, int err);
> >
> > +/**
> > + * Ask caller to wait for lock.
> > + *
> > + * Possible requests:
> > + *   setlkw
> > + *
> > + * If caller sends a blocking lock request (setlkw), then reply to caller
> > + * that wait for lock to be available. Once lock is available caller will
> > + * receive a notification with request's unique id. Notification will
> > + * carry info whether lock was successfully obtained or not.
> > + *
> > + * @param req request handle
> > + * @return zero for success, -errno for failure to send reply
> > + */
> > +int fuse_reply_wait(fuse_req_t req);
> > +
> >  /**
> >   * Don't send reply
> >   *
> > @@ -1685,6 +1701,16 @@ int fuse_lowlevel_notify_delete(struct fuse_session *se, fuse_ino_t parent,
> >  int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
> >                                 off_t offset, struct fuse_bufvec *bufv);
> >
> > +/**
> > + * Notify event related to previous lock request
> > + *
> > + * @param se the session object
> > + * @param unique the unique id of the request which requested setlkw
> > + * @param error zero for success, -errno for the failure
> > + */
> > +int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t unique,
> > +                              int32_t error);
> > +
> >  /*
> >   * Utility functions
> >   */
> > diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> > index a87e88e286..bb2d4456fc 100644
> > --- a/tools/virtiofsd/fuse_virtio.c
> > +++ b/tools/virtiofsd/fuse_virtio.c
> > @@ -273,6 +273,23 @@ static void vq_send_element(struct fv_QueueInfo *qi, VuVirtqElement *elem,
> >      vu_dispatch_unlock(qi->virtio_dev);
> >  }
> >
> > +/* Returns NULL if queue is empty */
> > +static FVRequest *vq_pop_notify_elem(struct fv_QueueInfo *qi)
> > +{
> > +    struct fuse_session *se = qi->virtio_dev->se;
> > +    VuDev *dev = &se->virtio_dev->dev;
> > +    VuVirtq *q = vu_get_queue(dev, qi->qidx);
> > +    FVRequest *req;
> > +
> > +    vu_dispatch_rdlock(qi->virtio_dev);
> > +    pthread_mutex_lock(&qi->vq_lock);
> > +    /* Pop an element from queue */
> > +    req = vu_queue_pop(dev, q, sizeof(FVRequest));
> > +    pthread_mutex_unlock(&qi->vq_lock);
> > +    vu_dispatch_unlock(qi->virtio_dev);
> > +    return req;
> > +}
> > +
> >  /*
> >   * Called back by ll whenever it wants to send a reply/message back
> >   * The 1st element of the iov starts with the fuse_out_header
> > @@ -281,9 +298,9 @@ static void vq_send_element(struct fv_QueueInfo *qi, VuVirtqElement *elem,
> >  int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
> >                      struct iovec *iov, int count)
> >  {
> > -    FVRequest *req = container_of(ch, FVRequest, ch);
> > -    struct fv_QueueInfo *qi = ch->qi;
> > -    VuVirtqElement *elem = &req->elem;
> > +    FVRequest *req;
> > +    struct fv_QueueInfo *qi;
> > +    VuVirtqElement *elem;
> >      int ret = 0;
> >
> >      assert(count >= 1);
> > @@ -294,8 +311,30 @@ int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
> >
> >      size_t tosend_len = iov_size(iov, count);
> >
> > -    /* unique == 0 is notification, which we don't support */
> > -    assert(out->unique);
> > +    /* unique == 0 is notification */
> > +    if (!out->unique) {
> > +        if (!se->notify_enabled) {
> > +            return -EOPNOTSUPP;
> > +        }
> > +        /* If notifications are enabled, queue index 1 is notification queue */
> > +        qi = se->virtio_dev->qi[1];
> > +        req = vq_pop_notify_elem(qi);
> > +        if (!req) {
> > +            /*
> > +             * TODO: Implement some sort of ring buffer and queue notifications
> > +             * on that and send these later when notification queue has space
> > +             * available.
> > +             */
> 
> Maybe add a trace / message here to debug more easily if we hit that case?

Maybe I could add a pr_debug() message. But now this code will probably
change. Stefan wants me to wait on a condition variable for descriptors
to become available (instead of returning -ENOSPC to the caller) and be
woken up when new descriptors are available (through the queue kick
path). In the new structure, a message might not be needed.
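
A rough sketch of that alternative (names invented; assumes the queue
kick handler signals the condition variable once the guest adds buffers):

    pthread_mutex_lock(&qi->notify_lock);
    while (!(req = vq_pop_notify_elem(qi))) {
        /* Woken from the queue kick path when descriptors show up */
        pthread_cond_wait(&qi->notify_avail, &qi->notify_lock);
    }
    pthread_mutex_unlock(&qi->notify_lock);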

Thanks
Vivek

> 
> > +            return -ENOSPC;
> > +        }
> > +        req->reply_sent = false;
> > +    } else {
> > +        assert(ch);
> > +        req = container_of(ch, FVRequest, ch);
> > +        qi = ch->qi;
> > +    }
> > +
> > +    elem = &req->elem;
> >      assert(!req->reply_sent);
> >
> >      /* The 'in' part of the elem is to qemu */
> > @@ -985,6 +1024,7 @@ static int fv_get_config(VuDev *dev, uint8_t *config, uint32_t len)
> >          struct fuse_notify_delete_out       delete_out;
> >          struct fuse_notify_store_out        store_out;
> >          struct fuse_notify_retrieve_out     retrieve_out;
> > +        struct fuse_notify_lock_out         lock_out;
> >      };
> >
> >      notify_size = sizeof(struct fuse_out_header) +
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index 6928662e22..277f74762b 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -2131,13 +2131,35 @@ out:
> >      }
> >  }
> >
> > +static void setlk_send_notification(struct fuse_session *se, uint64_t unique,
> > +                                    int saverr)
> > +{
> > +    int ret;
> > +
> > +    do {
> > +        ret = fuse_lowlevel_notify_lock(se, unique, saverr);
> > +        /*
> > +         * Retry sending notification if notification queue does not have
> > +         * free descriptor yet, otherwise break out of loop. Either we
> > +         * successfully sent notification or some other error occurred.
> > +         */
> > +        if (ret != -ENOSPC) {
> > +            break;
> > +        }
> > +        usleep(10000);
> > +    } while (1);
> > +}
> > +
> >  static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
> >                       struct flock *lock, int sleep)
> >  {
> >      struct lo_data *lo = lo_data(req);
> >      struct lo_inode *inode;
> >      struct lo_inode_plock *plock;
> > -    int ret, saverr = 0;
> > +    int ret, saverr = 0, ofd;
> > +    uint64_t unique;
> > +    struct fuse_session *se = req->se;
> > +    bool blocking_lock = false;
> >
> >      fuse_log(FUSE_LOG_DEBUG,
> >               "lo_setlk(ino=%" PRIu64 ", flags=%d)"
> > @@ -2151,11 +2173,6 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
> >          return;
> >      }
> >
> > -    if (sleep) {
> > -        fuse_reply_err(req, EOPNOTSUPP);
> > -        return;
> > -    }
> > -
> >      inode = lo_inode(req, ino);
> >      if (!inode) {
> >          fuse_reply_err(req, EBADF);
> > @@ -2168,21 +2185,56 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
> >
> >      if (!plock) {
> >          saverr = ret;
> > +        pthread_mutex_unlock(&inode->plock_mutex);
> >          goto out;
> >      }
> >
> > +    /*
> > +     * plock is now released when inode is going away. We already have
> > +     * a reference on inode, so it is guaranteed that plock->fd is
> > +     * still around even after dropping inode->plock_mutex lock
> > +     */
> > +    ofd = plock->fd;
> > +    pthread_mutex_unlock(&inode->plock_mutex);
> > +
> > +    /*
> > +     * If this lock request can block, request caller to wait for
> > +     * notification. Do not access req after this. Once lock is
> > +     * available, send a notification instead.
> > +     */
> > +    if (sleep && lock->l_type != F_UNLCK) {
> > +        /*
> > +         * If notification queue is not enabled, can't support async
> > +         * locks.
> > +         */
> > +        if (!se->notify_enabled) {
> > +            saverr = EOPNOTSUPP;
> > +            goto out;
> > +        }
> > +        blocking_lock = true;
> > +        unique = req->unique;
> > +        fuse_reply_wait(req);
> > +    }
> > +
> >      /* TODO: Is it alright to modify flock? */
> >      lock->l_pid = 0;
> > -    ret = fcntl(plock->fd, F_OFD_SETLK, lock);
> > +    if (blocking_lock) {
> > +        ret = fcntl(ofd, F_OFD_SETLKW, lock);
> > +    } else {
> > +        ret = fcntl(ofd, F_OFD_SETLK, lock);
> > +    }
> >      if (ret == -1) {
> >          saverr = errno;
> >      }
> >
> >  out:
> > -    pthread_mutex_unlock(&inode->plock_mutex);
> >      lo_inode_put(lo, &inode);
> >
> > -    fuse_reply_err(req, saverr);
> > +    if (!blocking_lock) {
> > +        fuse_reply_err(req, saverr);
> > +    } else {
> > +        setlk_send_notification(se, unique, saverr);
> > +    }
> >  }
> >
> >  static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,
> 
> 
> --
> Cheers,
> Christophe de Dinechin (IRC c3d)
> 



^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 12/13] virtiofsd: Implement blocking posix locks
@ 2021-10-06 18:17       ` Vivek Goyal
  0 siblings, 0 replies; 106+ messages in thread
From: Vivek Goyal @ 2021-10-06 18:17 UTC (permalink / raw)
  To: Christophe de Dinechin; +Cc: virtio-fs, qemu-devel, miklos

On Wed, Oct 06, 2021 at 05:34:59PM +0200, Christophe de Dinechin wrote:
> 
> On 2021-09-30 at 11:30 -04, Vivek Goyal <vgoyal@redhat.com> wrote...
> > As of now we don't support fcntl(F_SETLKW) and if we see one, we return
> > -EOPNOTSUPP.
> >
> > Change that by accepting these requests and returning a reply
> > immediately asking caller to wait. Once lock is available, send a
> > notification to the waiter indicating lock is available.
> >
> > In response to the lock request, we return error value "1", which
> > signals the client to queue the lock request internally; later the
> > client will get a notification indicating that the lock was taken
> > (or an error occurred). The fuse client should then wake up the
> > guest process.
> >
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > Signed-off-by: Ioannis Angelakopoulos <iangelak@redhat.com>
> > ---
> >  tools/virtiofsd/fuse_lowlevel.c  | 37 ++++++++++++++++-
> >  tools/virtiofsd/fuse_lowlevel.h  | 26 ++++++++++++
> >  tools/virtiofsd/fuse_virtio.c    | 50 ++++++++++++++++++++---
> >  tools/virtiofsd/passthrough_ll.c | 70 ++++++++++++++++++++++++++++----
> >  4 files changed, 167 insertions(+), 16 deletions(-)
> >
> > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > index e4679c73ab..2e7f4b786d 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.c
> > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > @@ -179,8 +179,8 @@ int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
> >          .unique = req->unique,
> >          .error = error,
> >      };
> > -
> > -    if (error <= -1000 || error > 0) {
> > +    /* error = 1 has been used to signal client to wait for notification */
> > +    if (error <= -1000 || error > 1) {
> 
> What about adding a #define for that special value 1?

Will do. Miklos wants that as well.
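
Something along these lines, perhaps (the name is just a placeholder,
not final):

    /*
     * Special "error" value in fuse_out_header: tells the client to
     * queue the request and wait for a lock notification.
     */
    #define FUSE_REPLY_WAIT    1

and the range check would then read:

    if (error <= -1000 || error > FUSE_REPLY_WAIT) {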

> 
> (and while we are at it, the -1000 does not look too good either, that could
> be a separate cleanup patch)

Hmm, that's an unrelated cleanup. Maybe for some other day.

> 
> >          fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n", error);
> >          out.error = -ERANGE;
> >      }
> > @@ -290,6 +290,11 @@ int fuse_reply_err(fuse_req_t req, int err)
> >      return send_reply(req, -err, NULL, 0);
> >  }
> >
> > +int fuse_reply_wait(fuse_req_t req)
> > +{
> > +    return send_reply(req, 1, NULL, 0);
> 
> ... to be used here too.

Yes. Will use the new define here too.
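
i.e. fuse_reply_wait() would become send_reply(req, FUSE_REPLY_WAIT,
NULL, 0), assuming the placeholder name from above.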

> 
> > +}
> > +
> >  void fuse_reply_none(fuse_req_t req)
> >  {
> >      fuse_free_req(req);
> > @@ -2165,6 +2170,34 @@ static void do_destroy(fuse_req_t req, fuse_ino_t nodeid,
> >      send_reply_ok(req, NULL, 0);
> >  }
> >
> > +static int send_notify_iov(struct fuse_session *se, int notify_code,
> > +                           struct iovec *iov, int count)
> > +{
> > +    struct fuse_out_header out;
> > +    if (!se->got_init) {
> > +        return -ENOTCONN;
> > +    }
> > +    out.unique = 0;
> > +    out.error = notify_code;
> > +    iov[0].iov_base = &out;
> > +    iov[0].iov_len = sizeof(struct fuse_out_header);
> > +    return fuse_send_msg(se, NULL, iov, count);
> > +}
> > +
> > +int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t unique,
> > +                  int32_t error)
> > +{
> > +    struct fuse_notify_lock_out outarg = {0};
> > +    struct iovec iov[2];
> > +
> > +    outarg.unique = unique;
> > +    outarg.error = -error;
> > +
> > +    iov[1].iov_base = &outarg;
> > +    iov[1].iov_len = sizeof(outarg);
> > +    return send_notify_iov(se, FUSE_NOTIFY_LOCK, iov, 2);
> > +}
> 
> This may be just me, but I find it odd that you fill iov[0] and iov[1] in
> two separate functions, one of them being static and AFAICT only used once.
> I understand that you are trying to split the notify logic from the lock.
> But the logic is not fully isolated, e.g. the caller needs to know to add
> one to the count, start filling at 1, etc.
> 
> Just a matter of taste, I guess ;-)

My thinking was that multiple notification types can share the common
code in send_notify_iov(), since every notification has to fill the
common fuse_out_header. So if in the future I introduce another
notification, say FUSE_NOTIFY_FOO, I can define a single function
fuse_lowlevel_notify_foo() that also uses send_notify_iov().
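
For instance, a hypothetical future notification could reuse it like
this (sketch only; FUSE_NOTIFY_FOO and fuse_notify_foo_out are made-up
names):

    struct fuse_notify_foo_out {
        uint64_t cookie;
    };

    int fuse_lowlevel_notify_foo(struct fuse_session *se, uint64_t cookie)
    {
        struct fuse_notify_foo_out outarg = {0};
        struct iovec iov[2];

        outarg.cookie = cookie;

        /* iov[0] (the common fuse_out_header) is filled by
         * send_notify_iov(); the wrapper only provides the payload */
        iov[1].iov_base = &outarg;
        iov[1].iov_len = sizeof(outarg);
        return send_notify_iov(se, FUSE_NOTIFY_FOO, iov, 2);
    }

Only the type-specific payload lives in the wrapper; the common header
handling stays in one place.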


> 
> > +
> >  int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
> >                                 off_t offset, struct fuse_bufvec *bufv)
> >  {
> > diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
> > index c55c0ca2fc..64624b48dc 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.h
> > +++ b/tools/virtiofsd/fuse_lowlevel.h
> > @@ -1251,6 +1251,22 @@ struct fuse_lowlevel_ops {
> >   */
> >  int fuse_reply_err(fuse_req_t req, int err);
> >
> > +/**
> > + * Ask caller to wait for lock.
> > + *
> > + * Possible requests:
> > + *   setlkw
> > + *
> > + * If the caller sends a blocking lock request (setlkw), reply to the
> > + * caller asking it to wait for the lock to become available. Once the
> > + * lock is available, the caller will receive a notification carrying
> > + * the request's unique id and info on whether the lock was obtained.
> > + *
> > + * @param req request handle
> > + * @return zero for success, -errno for failure to send reply
> > + */
> > +int fuse_reply_wait(fuse_req_t req);
> > +
> >  /**
> >   * Don't send reply
> >   *
> > @@ -1685,6 +1701,16 @@ int fuse_lowlevel_notify_delete(struct fuse_session *se, fuse_ino_t parent,
> >  int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
> >                                 off_t offset, struct fuse_bufvec *bufv);
> >
> > +/**
> > + * Notify event related to previous lock request
> > + *
> > + * @param se the session object
> > + * @param unique the unique id of the request which requested setlkw
> > + * @param error zero for success, -errno for the failure
> > + */
> > +int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t unique,
> > +                              int32_t error);
> > +
> >  /*
> >   * Utility functions
> >   */
> > diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> > index a87e88e286..bb2d4456fc 100644
> > --- a/tools/virtiofsd/fuse_virtio.c
> > +++ b/tools/virtiofsd/fuse_virtio.c
> > @@ -273,6 +273,23 @@ static void vq_send_element(struct fv_QueueInfo *qi, VuVirtqElement *elem,
> >      vu_dispatch_unlock(qi->virtio_dev);
> >  }
> >
> > +/* Returns NULL if queue is empty */
> > +static FVRequest *vq_pop_notify_elem(struct fv_QueueInfo *qi)
> > +{
> > +    struct fuse_session *se = qi->virtio_dev->se;
> > +    VuDev *dev = &se->virtio_dev->dev;
> > +    VuVirtq *q = vu_get_queue(dev, qi->qidx);
> > +    FVRequest *req;
> > +
> > +    vu_dispatch_rdlock(qi->virtio_dev);
> > +    pthread_mutex_lock(&qi->vq_lock);
> > +    /* Pop an element from queue */
> > +    req = vu_queue_pop(dev, q, sizeof(FVRequest));
> > +    pthread_mutex_unlock(&qi->vq_lock);
> > +    vu_dispatch_unlock(qi->virtio_dev);
> > +    return req;
> > +}
> > +
> >  /*
> >   * Called back by ll whenever it wants to send a reply/message back
> >   * The 1st element of the iov starts with the fuse_out_header
> > @@ -281,9 +298,9 @@ static void vq_send_element(struct fv_QueueInfo *qi, VuVirtqElement *elem,
> >  int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
> >                      struct iovec *iov, int count)
> >  {
> > -    FVRequest *req = container_of(ch, FVRequest, ch);
> > -    struct fv_QueueInfo *qi = ch->qi;
> > -    VuVirtqElement *elem = &req->elem;
> > +    FVRequest *req;
> > +    struct fv_QueueInfo *qi;
> > +    VuVirtqElement *elem;
> >      int ret = 0;
> >
> >      assert(count >= 1);
> > @@ -294,8 +311,30 @@ int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
> >
> >      size_t tosend_len = iov_size(iov, count);
> >
> > -    /* unique == 0 is notification, which we don't support */
> > -    assert(out->unique);
> > +    /* unique == 0 is notification */
> > +    if (!out->unique) {
> > +        if (!se->notify_enabled) {
> > +            return -EOPNOTSUPP;
> > +        }
> > +        /* If notifications are enabled, queue index 1 is notification queue */
> > +        qi = se->virtio_dev->qi[1];
> > +        req = vq_pop_notify_elem(qi);
> > +        if (!req) {
> > +            /*
> > +             * TODO: Implement some sort of ring buffer and queue notifications
> > +             * on that and send these later when notification queue has space
> > +             * available.
> > +             */
> 
> Maybe add a trace / message here to debug more easily if we hit that case?

Maybe I could add a pr_debug() message. But this code will probably
change now: Stefan wants me to wait on a condition variable for
descriptors to become available (instead of returning -ENOSPC to the
caller) and be woken up when new descriptors arrive (through the queue
kick path). In the new structure, a message might not be needed.
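
Roughly something like this (untested sketch; the notify_lock and
notify_avail names are invented):

    pthread_mutex_lock(&qi->notify_lock);
    while (!(req = vq_pop_notify_elem(qi))) {
        /* Woken from the queue kick handler once the guest makes
         * fresh descriptors available on the notification queue. */
        pthread_cond_wait(&qi->notify_avail, &qi->notify_lock);
    }
    pthread_mutex_unlock(&qi->notify_lock);

with the notification queue's kick handler doing a
pthread_cond_broadcast(&qi->notify_avail) when new buffers arrive.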

Thanks
Vivek

> 
> > +            return -ENOSPC;
> > +        }
> > +        req->reply_sent = false;
> > +    } else {
> > +        assert(ch);
> > +        req = container_of(ch, FVRequest, ch);
> > +        qi = ch->qi;
> > +    }
> > +
> > +    elem = &req->elem;
> >      assert(!req->reply_sent);
> >
> >      /* The 'in' part of the elem is to qemu */
> > @@ -985,6 +1024,7 @@ static int fv_get_config(VuDev *dev, uint8_t *config, uint32_t len)
> >          struct fuse_notify_delete_out       delete_out;
> >          struct fuse_notify_store_out        store_out;
> >          struct fuse_notify_retrieve_out     retrieve_out;
> > +        struct fuse_notify_lock_out         lock_out;
> >      };
> >
> >      notify_size = sizeof(struct fuse_out_header) +
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index 6928662e22..277f74762b 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -2131,13 +2131,35 @@ out:
> >      }
> >  }
> >
> > +static void setlk_send_notification(struct fuse_session *se, uint64_t unique,
> > +                                    int saverr)
> > +{
> > +    int ret;
> > +
> > +    do {
> > +        ret = fuse_lowlevel_notify_lock(se, unique, saverr);
> > +        /*
> > +         * Retry sending notification if notification queue does not have
> > +         * free descriptor yet, otherwise break out of loop. Either we
> > +         * successfully sent notification or some other error occurred.
> > +         */
> > +        if (ret != -ENOSPC) {
> > +            break;
> > +        }
> > +        usleep(10000);
> > +    } while (1);
> > +}
> > +
> >  static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
> >                       struct flock *lock, int sleep)
> >  {
> >      struct lo_data *lo = lo_data(req);
> >      struct lo_inode *inode;
> >      struct lo_inode_plock *plock;
> > -    int ret, saverr = 0;
> > +    int ret, saverr = 0, ofd;
> > +    uint64_t unique;
> > +    struct fuse_session *se = req->se;
> > +    bool blocking_lock = false;
> >
> >      fuse_log(FUSE_LOG_DEBUG,
> >               "lo_setlk(ino=%" PRIu64 ", flags=%d)"
> > @@ -2151,11 +2173,6 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
> >          return;
> >      }
> >
> > -    if (sleep) {
> > -        fuse_reply_err(req, EOPNOTSUPP);
> > -        return;
> > -    }
> > -
> >      inode = lo_inode(req, ino);
> >      if (!inode) {
> >          fuse_reply_err(req, EBADF);
> > @@ -2168,21 +2185,56 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
> >
> >      if (!plock) {
> >          saverr = ret;
> > +        pthread_mutex_unlock(&inode->plock_mutex);
> >          goto out;
> >      }
> >
> > +    /*
> > +     * plock is now released when inode is going away. We already have
> > +     * a reference on inode, so it is guaranteed that plock->fd is
> > +     * still around even after dropping inode->plock_mutex lock
> > +     */
> > +    ofd = plock->fd;
> > +    pthread_mutex_unlock(&inode->plock_mutex);
> > +
> > +    /*
> > +     * If this lock request can block, request caller to wait for
> > +     * notification. Do not access req after this. Once lock is
> > +     * available, send a notification instead.
> > +     */
> > +    if (sleep && lock->l_type != F_UNLCK) {
> > +        /*
> > +         * If notification queue is not enabled, can't support async
> > +         * locks.
> > +         */
> > +        if (!se->notify_enabled) {
> > +            saverr = EOPNOTSUPP;
> > +            goto out;
> > +        }
> > +        blocking_lock = true;
> > +        unique = req->unique;
> > +        fuse_reply_wait(req);
> > +    }
> > +
> >      /* TODO: Is it alright to modify flock? */
> >      lock->l_pid = 0;
> > -    ret = fcntl(plock->fd, F_OFD_SETLK, lock);
> > +    if (blocking_lock) {
> > +        ret = fcntl(ofd, F_OFD_SETLKW, lock);
> > +    } else {
> > +        ret = fcntl(ofd, F_OFD_SETLK, lock);
> > +    }
> >      if (ret == -1) {
> >          saverr = errno;
> >      }
> >
> >  out:
> > -    pthread_mutex_unlock(&inode->plock_mutex);
> >      lo_inode_put(lo, &inode);
> >
> > -    fuse_reply_err(req, saverr);
> > +    if (!blocking_lock) {
> > +        fuse_reply_err(req, saverr);
> > +    } else {
> > +        setlk_send_notification(se, unique, saverr);
> > +    }
> >  }
> >
> >  static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,
> 
> 
> --
> Cheers,
> Christophe de Dinechin (IRC c3d)
> 


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 00/13] virtiofsd: Support notification queue and
  2021-09-30 15:30 ` [Virtio-fs] " Vivek Goyal
@ 2021-10-25 18:00   ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 106+ messages in thread
From: Dr. David Alan Gilbert @ 2021-10-25 18:00 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: stefanha, miklos, qemu-devel, iangelak, virtio-fs, jaggel

* Vivek Goyal (vgoyal@redhat.com) wrote:
> Hi,
> 
> Here are the patches to support notification queue and blocking
> posix locks. One of the biggest changes since last time has been the
> creation of a custom thread pool for handling locking requests.
> Thanks to Ioannis for doing most of the work on custom thread
> pool.
> 
> I have posted corresponding kernel changes here.
> 
> https://lore.kernel.org/linux-fsdevel/20210930143850.1188628-1-vgoyal@redhat.com/T/#mb2d0fbfdb580ef33b6e812d0acbd16333b11f2cf

I'm queuing:
[PATCH 03/13] virtiofsd: Remove unused virtio_fs_config definition
[PATCH 04/13] virtiofsd: Add a helper to send element on virtqueue
[PATCH 05/13] virtiofsd: Add a helper to stop all queues

from this series; they're separate cleanups.

Dave

> Any feedback is welcome.
> 
> Thanks
> Vivek
> 
> Vivek Goyal (13):
>   virtio_fs.h: Add notification queue feature bit
>   virtiofsd: fuse.h header file changes for lock notification
>   virtiofsd: Remove unused virtio_fs_config definition
>   virtiofsd: Add a helper to send element on virtqueue
>   virtiofsd: Add a helper to stop all queues
>   vhost-user-fs: Use helpers to create/cleanup virtqueue
>   virtiofsd: Release file locks using F_UNLCK
>   virtiofsd: Create a notification queue
>   virtiofsd: Specify size of notification buffer using config space
>   virtiofsd: Custom threadpool for remote blocking posix locks requests
>   virtiofsd: Shutdown notification queue in the end
>   virtiofsd: Implement blocking posix locks
>   virtiofsd, seccomp: Add clock_nanosleep() to allow list
> 
>  hw/virtio/vhost-user-fs-pci.c              |   4 +-
>  hw/virtio/vhost-user-fs.c                  | 158 ++++++++--
>  include/hw/virtio/vhost-user-fs.h          |   4 +
>  include/standard-headers/linux/fuse.h      |  11 +-
>  include/standard-headers/linux/virtio_fs.h |   5 +
>  tools/virtiofsd/fuse_i.h                   |   1 +
>  tools/virtiofsd/fuse_lowlevel.c            |  37 ++-
>  tools/virtiofsd/fuse_lowlevel.h            |  26 ++
>  tools/virtiofsd/fuse_virtio.c              | 339 +++++++++++++++++----
>  tools/virtiofsd/meson.build                |   1 +
>  tools/virtiofsd/passthrough_ll.c           |  91 +++++-
>  tools/virtiofsd/passthrough_seccomp.c      |   2 +
>  tools/virtiofsd/tpool.c                    | 331 ++++++++++++++++++++
>  tools/virtiofsd/tpool.h                    |  18 ++
>  14 files changed, 915 insertions(+), 113 deletions(-)
>  create mode 100644 tools/virtiofsd/tpool.c
>  create mode 100644 tools/virtiofsd/tpool.h
> 
> -- 
> 2.31.1
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Virtio-fs] [PATCH 00/13] virtiofsd: Support notification queue and
@ 2021-10-25 18:00   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 106+ messages in thread
From: Dr. David Alan Gilbert @ 2021-10-25 18:00 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: miklos, qemu-devel, virtio-fs

* Vivek Goyal (vgoyal@redhat.com) wrote:
> Hi,
> 
> Here are the patches to support notification queue and blocking
> posix locks. One of the biggest changes since last time has been the
> creation of a custom thread pool for handling locking requests.
> Thanks to Ioannis for doing most of the work on custom thread
> pool.
> 
> I have posted corresponding kernel changes here.
> 
> https://lore.kernel.org/linux-fsdevel/20210930143850.1188628-1-vgoyal@redhat.com/T/#mb2d0fbfdb580ef33b6e812d0acbd16333b11f2cf

I'm queuing:
[PATCH 03/13] virtiofsd: Remove unused virtio_fs_config definition
[PATCH 04/13] virtiofsd: Add a helper to send element on virtqueue
[PATCH 05/13] virtiofsd: Add a helper to stop all queues

from this series; they're separate cleanups.

Dave

> Any feedback is welcome.
> 
> Thanks
> Vivek
> 
> Vivek Goyal (13):
>   virtio_fs.h: Add notification queue feature bit
>   virtiofsd: fuse.h header file changes for lock notification
>   virtiofsd: Remove unused virtio_fs_config definition
>   virtiofsd: Add a helper to send element on virtqueue
>   virtiofsd: Add a helper to stop all queues
>   vhost-user-fs: Use helpers to create/cleanup virtqueue
>   virtiofsd: Release file locks using F_UNLCK
>   virtiofsd: Create a notification queue
>   virtiofsd: Specify size of notification buffer using config space
>   virtiofsd: Custom threadpool for remote blocking posix locks requests
>   virtiofsd: Shutdown notification queue in the end
>   virtiofsd: Implement blocking posix locks
>   virtiofsd, seccomp: Add clock_nanosleep() to allow list
> 
>  hw/virtio/vhost-user-fs-pci.c              |   4 +-
>  hw/virtio/vhost-user-fs.c                  | 158 ++++++++--
>  include/hw/virtio/vhost-user-fs.h          |   4 +
>  include/standard-headers/linux/fuse.h      |  11 +-
>  include/standard-headers/linux/virtio_fs.h |   5 +
>  tools/virtiofsd/fuse_i.h                   |   1 +
>  tools/virtiofsd/fuse_lowlevel.c            |  37 ++-
>  tools/virtiofsd/fuse_lowlevel.h            |  26 ++
>  tools/virtiofsd/fuse_virtio.c              | 339 +++++++++++++++++----
>  tools/virtiofsd/meson.build                |   1 +
>  tools/virtiofsd/passthrough_ll.c           |  91 +++++-
>  tools/virtiofsd/passthrough_seccomp.c      |   2 +
>  tools/virtiofsd/tpool.c                    | 331 ++++++++++++++++++++
>  tools/virtiofsd/tpool.h                    |  18 ++
>  14 files changed, 915 insertions(+), 113 deletions(-)
>  create mode 100644 tools/virtiofsd/tpool.c
>  create mode 100644 tools/virtiofsd/tpool.h
> 
> -- 
> 2.31.1
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 106+ messages in thread

end of thread, other threads:[~2021-10-25 18:03 UTC | newest]

Thread overview: 106+ messages
2021-09-30 15:30 [PATCH 00/13] virtiofsd: Support notification queue and Vivek Goyal
2021-09-30 15:30 ` [Virtio-fs] " Vivek Goyal
2021-09-30 15:30 ` [PATCH 01/13] virtio_fs.h: Add notification queue feature bit Vivek Goyal
2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
2021-10-04 13:12   ` Stefan Hajnoczi
2021-10-04 13:12     ` [Virtio-fs] " Stefan Hajnoczi
2021-09-30 15:30 ` [PATCH 02/13] virtiofsd: fuse.h header file changes for lock notification Vivek Goyal
2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
2021-10-04 13:16   ` Stefan Hajnoczi
2021-10-04 13:16     ` [Virtio-fs] " Stefan Hajnoczi
2021-10-04 14:01     ` Vivek Goyal
2021-10-04 14:01       ` [Virtio-fs] " Vivek Goyal
2021-09-30 15:30 ` [PATCH 03/13] virtiofsd: Remove unused virtio_fs_config definition Vivek Goyal
2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
2021-10-04 13:17   ` Stefan Hajnoczi
2021-10-04 13:17     ` [Virtio-fs] " Stefan Hajnoczi
2021-09-30 15:30 ` [PATCH 04/13] virtiofsd: Add a helper to send element on virtqueue Vivek Goyal
2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
2021-10-04 13:19   ` Stefan Hajnoczi
2021-10-04 13:19     ` [Virtio-fs] " Stefan Hajnoczi
2021-09-30 15:30 ` [PATCH 05/13] virtiofsd: Add a helper to stop all queues Vivek Goyal
2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
2021-10-04 13:22   ` Stefan Hajnoczi
2021-10-04 13:22     ` [Virtio-fs] " Stefan Hajnoczi
2021-09-30 15:30 ` [PATCH 06/13] vhost-user-fs: Use helpers to create/cleanup virtqueue Vivek Goyal
2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
2021-10-04 13:54   ` Stefan Hajnoczi
2021-10-04 13:54     ` [Virtio-fs] " Stefan Hajnoczi
2021-10-04 19:58     ` Vivek Goyal
2021-10-04 19:58       ` [Virtio-fs] " Vivek Goyal
2021-10-05  8:09       ` Stefan Hajnoczi
2021-10-05  8:09         ` [Virtio-fs] " Stefan Hajnoczi
2021-10-06 13:35   ` Christophe de Dinechin
2021-10-06 13:35     ` Christophe de Dinechin
2021-10-06 17:40     ` Vivek Goyal
2021-10-06 17:40       ` Vivek Goyal
2021-09-30 15:30 ` [PATCH 07/13] virtiofsd: Release file locks using F_UNLCK Vivek Goyal
2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
2021-10-05 13:37   ` Christophe de Dinechin
2021-10-05 13:37     ` Christophe de Dinechin
2021-10-05 15:38     ` Vivek Goyal
2021-10-05 15:38       ` Vivek Goyal
2021-09-30 15:30 ` [PATCH 08/13] virtiofsd: Create a notification queue Vivek Goyal
2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
2021-10-04 14:30   ` Stefan Hajnoczi
2021-10-04 14:30     ` [Virtio-fs] " Stefan Hajnoczi
2021-10-04 21:01     ` Vivek Goyal
2021-10-04 21:01       ` [Virtio-fs] " Vivek Goyal
2021-10-05  8:14       ` Stefan Hajnoczi
2021-10-05  8:14         ` [Virtio-fs] " Stefan Hajnoczi
2021-10-05 12:31         ` Vivek Goyal
2021-10-05 12:31           ` [Virtio-fs] " Vivek Goyal
2021-09-30 15:30 ` [PATCH 09/13] virtiofsd: Specify size of notification buffer using config space Vivek Goyal
2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
2021-10-04 14:33   ` Stefan Hajnoczi
2021-10-04 14:33     ` [Virtio-fs] " Stefan Hajnoczi
2021-10-04 21:10     ` Vivek Goyal
2021-10-04 21:10       ` [Virtio-fs] " Vivek Goyal
2021-10-06 10:05   ` Christophe de Dinechin
2021-10-06 10:05     ` Christophe de Dinechin
2021-09-30 15:30 ` [PATCH 10/13] virtiofsd: Custom threadpool for remote blocking posix locks requests Vivek Goyal
2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
2021-10-04 14:54   ` Stefan Hajnoczi
2021-10-04 14:54     ` [Virtio-fs] " Stefan Hajnoczi
2021-10-05 13:06     ` Vivek Goyal
2021-10-05 13:06       ` [Virtio-fs] " Vivek Goyal
2021-10-05 20:09     ` Vivek Goyal
2021-10-05 20:09       ` [Virtio-fs] " Vivek Goyal
2021-10-06 10:26       ` Stefan Hajnoczi
2021-10-06 10:26         ` [Virtio-fs] " Stefan Hajnoczi
2021-09-30 15:30 ` [PATCH 11/13] virtiofsd: Shutdown notification queue in the end Vivek Goyal
2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
2021-10-04 15:01   ` Stefan Hajnoczi
2021-10-04 15:01     ` [Virtio-fs] " Stefan Hajnoczi
2021-10-05 13:19     ` Vivek Goyal
2021-10-05 13:19       ` [Virtio-fs] " Vivek Goyal
2021-10-06 15:15   ` Christophe de Dinechin
2021-10-06 15:15     ` Christophe de Dinechin
2021-10-06 17:58     ` Vivek Goyal
2021-10-06 17:58       ` Vivek Goyal
2021-09-30 15:30 ` [PATCH 12/13] virtiofsd: Implement blocking posix locks Vivek Goyal
2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
2021-10-04 15:07   ` Stefan Hajnoczi
2021-10-04 15:07     ` [Virtio-fs] " Stefan Hajnoczi
2021-10-05 13:26     ` Vivek Goyal
2021-10-05 13:26       ` [Virtio-fs] " Vivek Goyal
2021-10-05 12:22   ` Stefan Hajnoczi
2021-10-05 12:22     ` [Virtio-fs] " Stefan Hajnoczi
2021-10-05 15:14     ` Vivek Goyal
2021-10-05 15:14       ` [Virtio-fs] " Vivek Goyal
2021-10-05 15:49       ` Stefan Hajnoczi
2021-10-05 15:49         ` [Virtio-fs] " Stefan Hajnoczi
2021-10-06 15:34   ` Christophe de Dinechin
2021-10-06 15:34     ` Christophe de Dinechin
2021-10-06 18:17     ` Vivek Goyal
2021-10-06 18:17       ` Vivek Goyal
2021-09-30 15:30 ` [PATCH 13/13] virtiofsd, seccomp: Add clock_nanosleep() to allow list Vivek Goyal
2021-09-30 15:30   ` [Virtio-fs] " Vivek Goyal
2021-10-05 12:22   ` Stefan Hajnoczi
2021-10-05 12:22     ` [Virtio-fs] " Stefan Hajnoczi
2021-10-05 15:16     ` Vivek Goyal
2021-10-05 15:50       ` Stefan Hajnoczi
2021-10-05 17:28         ` Vivek Goyal
2021-10-06 10:27           ` Stefan Hajnoczi
2021-10-25 18:00 ` [PATCH 00/13] virtiofsd: Support notification queue and Dr. David Alan Gilbert
2021-10-25 18:00   ` [Virtio-fs] " Dr. David Alan Gilbert
