[PATCH 0/4] [RFC] virtiofsd, vhost-user-fs: Add support for notification queue

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/4] [RFC] virtiofsd, vhost-user-fs: Add support for notification queue
@ 2019-11-15 20:55 ` Vivek Goyal
  0 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-11-15 20:55 UTC (permalink / raw)
  To: virtio-fs, qemu-devel; +Cc: miklos, stefanha, vgoyal, dgilbert

Hi,

Here are RFC patches for adding a notification queue to virtio-fs to
send notifications from host to guest. It also has patches to support
remote posix locks which make use of this newly introduced notification
queue.

These patches apply on top of following.

https://gitlab.com/virtio-fs/qemu/tree/virtio-fs-dev

These change require changes in spec also. I have yet to create spec
changes and that's why this is still an RFC patch series.

Any feedback is appreciated.

Thanks
Vivek

Vivek Goyal (4):
  virtiofsd: Release file locks using F_UNLCK
  virtiofd: Create a notification queue
  virtiofsd: Specify size of notification buffer using config space
  virtiofsd: Implement blocking posix locks

 contrib/virtiofsd/fuse_i.h                 |   1 +
 contrib/virtiofsd/fuse_kernel.h            |   7 +
 contrib/virtiofsd/fuse_lowlevel.c          |  23 ++-
 contrib/virtiofsd/fuse_lowlevel.h          |  25 +++
 contrib/virtiofsd/fuse_virtio.c            | 226 +++++++++++++++++++--
 contrib/virtiofsd/passthrough_ll.c         |  79 +++++--
 hw/virtio/vhost-user-fs-pci.c              |   2 +-
 hw/virtio/vhost-user-fs.c                  |  63 +++++-
 include/hw/virtio/vhost-user-fs.h          |   3 +
 include/standard-headers/linux/virtio_fs.h |   5 +
 10 files changed, 386 insertions(+), 48 deletions(-)

-- 
2.20.1



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Virtio-fs] [PATCH 0/4] [RFC] virtiofsd, vhost-user-fs: Add support for notification queue
@ 2019-11-15 20:55 ` Vivek Goyal
  0 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-11-15 20:55 UTC (permalink / raw)
  To: virtio-fs, qemu-devel; +Cc: miklos, vgoyal

Hi,

Here are RFC patches for adding a notification queue to virtio-fs to
send notifications from host to guest. It also has patches to support
remote posix locks which make use of this newly introduced notification
queue.

These patches apply on top of following.

https://gitlab.com/virtio-fs/qemu/tree/virtio-fs-dev

These change require changes in spec also. I have yet to create spec
changes and that's why this is still an RFC patch series.

Any feedback is appreciated.

Thanks
Vivek

Vivek Goyal (4):
  virtiofsd: Release file locks using F_UNLCK
  virtiofd: Create a notification queue
  virtiofsd: Specify size of notification buffer using config space
  virtiofsd: Implement blocking posix locks

 contrib/virtiofsd/fuse_i.h                 |   1 +
 contrib/virtiofsd/fuse_kernel.h            |   7 +
 contrib/virtiofsd/fuse_lowlevel.c          |  23 ++-
 contrib/virtiofsd/fuse_lowlevel.h          |  25 +++
 contrib/virtiofsd/fuse_virtio.c            | 226 +++++++++++++++++++--
 contrib/virtiofsd/passthrough_ll.c         |  79 +++++--
 hw/virtio/vhost-user-fs-pci.c              |   2 +-
 hw/virtio/vhost-user-fs.c                  |  63 +++++-
 include/hw/virtio/vhost-user-fs.h          |   3 +
 include/standard-headers/linux/virtio_fs.h |   5 +
 10 files changed, 386 insertions(+), 48 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 1/4] virtiofsd: Release file locks using F_UNLCK
  2019-11-15 20:55 ` [Virtio-fs] " Vivek Goyal
@ 2019-11-15 20:55   ` Vivek Goyal
  -1 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-11-15 20:55 UTC (permalink / raw)
  To: virtio-fs, qemu-devel; +Cc: miklos, stefanha, vgoyal, dgilbert

We are emulating posix locks for guest using open file description locks
in virtiofsd. When any of the fd is closed in guest, we find associated
OFD lock fd (if there is one) and close it to release all the locks.

Assumption here is that there is no other thread using lo_inode_plock
structure or plock->fd, hence it is safe to do so.

But now we are about to introduce blocking variant of locks (SETLKW),
and that means we might be waiting to a lock to be available and
using plock->fd. And that means there are still users of plock structure.

So release locks using fcntl(SETLK, F_UNLCK) instead and plock will
be freed later.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 contrib/virtiofsd/passthrough_ll.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
index bc214df0c7..028e7da273 100644
--- a/contrib/virtiofsd/passthrough_ll.c
+++ b/contrib/virtiofsd/passthrough_ll.c
@@ -936,6 +936,14 @@ static void put_shared(struct lo_data *lo, struct lo_inode *inode)
 	}
 }
 
+static void release_plock(gpointer data)
+{
+	struct lo_inode_plock *plock = data;
+
+	close(plock->fd);
+	free(plock);
+}
+
 /* Increments nlookup and caller must release refcount using
  * lo_inode_put(&parent).
  */
@@ -994,7 +1002,8 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
 		inode->key.ino = e->attr.st_ino;
 		inode->key.dev = e->attr.st_dev;
 		pthread_mutex_init(&inode->plock_mutex, NULL);
-		inode->posix_locks = g_hash_table_new(g_direct_hash, g_direct_equal);
+		inode->posix_locks = g_hash_table_new_full(g_direct_hash,
+					g_direct_equal, NULL, release_plock);
 
 		get_shared(lo, inode);
 
@@ -1436,9 +1445,6 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
 	if (!inode->nlookup) {
 		lo_map_remove(&lo->ino_map, inode->fuse_ino);
                 g_hash_table_remove(lo->inodes, &inode->key);
-		if (g_hash_table_size(inode->posix_locks)) {
-			fuse_log(FUSE_LOG_WARNING, "Hash table is not empty\n");
-		}
 		g_hash_table_destroy(inode->posix_locks);
 		pthread_mutex_destroy(&inode->plock_mutex);
 
@@ -1868,6 +1874,7 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
 	plock->fd = fd;
 	g_hash_table_insert(inode->posix_locks,
 			    GUINT_TO_POINTER(plock->lock_owner), plock);
+	fuse_log(FUSE_LOG_DEBUG, "lookup_create_plock_ctx(): Inserted element in posix_locks hash table with value pointer %p\n", plock);
 	return plock;
 }
 
@@ -2046,6 +2053,7 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 	(void) ino;
 	struct lo_inode *inode;
 	struct lo_inode_plock *plock;
+	struct flock flock;
 
 	inode = lo_inode(req, ino);
 	if (!inode) {
@@ -2058,14 +2066,16 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 	plock = g_hash_table_lookup(inode->posix_locks,
 				    GUINT_TO_POINTER(fi->lock_owner));
 	if (plock) {
-		g_hash_table_remove(inode->posix_locks,
-				    GUINT_TO_POINTER(fi->lock_owner));
 		/*
-		 * We had used open() for locks and had only one fd. So
-		 * closing this fd should release all OFD locks.
+		 * An fd is being closed. For posix locks, this means
+		 * drop all the associated locks.
 		 */
-		close(plock->fd);
-		free(plock);
+		memset(&flock, 0, sizeof(struct flock));
+		flock.l_type = F_UNLCK;
+		flock.l_whence = SEEK_SET;
+		/* Unlock whole file */
+		flock.l_start = flock.l_len = 0;
+		fcntl(plock->fd, F_SETLK, &flock);
 	}
 	pthread_mutex_unlock(&inode->plock_mutex);
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Virtio-fs] [PATCH 1/4] virtiofsd: Release file locks using F_UNLCK
@ 2019-11-15 20:55   ` Vivek Goyal
  0 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-11-15 20:55 UTC (permalink / raw)
  To: virtio-fs, qemu-devel; +Cc: miklos, vgoyal

We are emulating posix locks for guest using open file description locks
in virtiofsd. When any of the fd is closed in guest, we find associated
OFD lock fd (if there is one) and close it to release all the locks.

Assumption here is that there is no other thread using lo_inode_plock
structure or plock->fd, hence it is safe to do so.

But now we are about to introduce blocking variant of locks (SETLKW),
and that means we might be waiting to a lock to be available and
using plock->fd. And that means there are still users of plock structure.

So release locks using fcntl(SETLK, F_UNLCK) instead and plock will
be freed later.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 contrib/virtiofsd/passthrough_ll.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
index bc214df0c7..028e7da273 100644
--- a/contrib/virtiofsd/passthrough_ll.c
+++ b/contrib/virtiofsd/passthrough_ll.c
@@ -936,6 +936,14 @@ static void put_shared(struct lo_data *lo, struct lo_inode *inode)
 	}
 }
 
+static void release_plock(gpointer data)
+{
+	struct lo_inode_plock *plock = data;
+
+	close(plock->fd);
+	free(plock);
+}
+
 /* Increments nlookup and caller must release refcount using
  * lo_inode_put(&parent).
  */
@@ -994,7 +1002,8 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
 		inode->key.ino = e->attr.st_ino;
 		inode->key.dev = e->attr.st_dev;
 		pthread_mutex_init(&inode->plock_mutex, NULL);
-		inode->posix_locks = g_hash_table_new(g_direct_hash, g_direct_equal);
+		inode->posix_locks = g_hash_table_new_full(g_direct_hash,
+					g_direct_equal, NULL, release_plock);
 
 		get_shared(lo, inode);
 
@@ -1436,9 +1445,6 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
 	if (!inode->nlookup) {
 		lo_map_remove(&lo->ino_map, inode->fuse_ino);
                 g_hash_table_remove(lo->inodes, &inode->key);
-		if (g_hash_table_size(inode->posix_locks)) {
-			fuse_log(FUSE_LOG_WARNING, "Hash table is not empty\n");
-		}
 		g_hash_table_destroy(inode->posix_locks);
 		pthread_mutex_destroy(&inode->plock_mutex);
 
@@ -1868,6 +1874,7 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
 	plock->fd = fd;
 	g_hash_table_insert(inode->posix_locks,
 			    GUINT_TO_POINTER(plock->lock_owner), plock);
+	fuse_log(FUSE_LOG_DEBUG, "lookup_create_plock_ctx(): Inserted element in posix_locks hash table with value pointer %p\n", plock);
 	return plock;
 }
 
@@ -2046,6 +2053,7 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 	(void) ino;
 	struct lo_inode *inode;
 	struct lo_inode_plock *plock;
+	struct flock flock;
 
 	inode = lo_inode(req, ino);
 	if (!inode) {
@@ -2058,14 +2066,16 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 	plock = g_hash_table_lookup(inode->posix_locks,
 				    GUINT_TO_POINTER(fi->lock_owner));
 	if (plock) {
-		g_hash_table_remove(inode->posix_locks,
-				    GUINT_TO_POINTER(fi->lock_owner));
 		/*
-		 * We had used open() for locks and had only one fd. So
-		 * closing this fd should release all OFD locks.
+		 * An fd is being closed. For posix locks, this means
+		 * drop all the associated locks.
 		 */
-		close(plock->fd);
-		free(plock);
+		memset(&flock, 0, sizeof(struct flock));
+		flock.l_type = F_UNLCK;
+		flock.l_whence = SEEK_SET;
+		/* Unlock whole file */
+		flock.l_start = flock.l_len = 0;
+		fcntl(plock->fd, F_SETLK, &flock);
 	}
 	pthread_mutex_unlock(&inode->plock_mutex);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 2/4] virtiofd: Create a notification queue
  2019-11-15 20:55 ` [Virtio-fs] " Vivek Goyal
@ 2019-11-15 20:55   ` Vivek Goyal
  -1 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-11-15 20:55 UTC (permalink / raw)
  To: virtio-fs, qemu-devel; +Cc: miklos, stefanha, vgoyal, dgilbert

Add a notification queue which will be used to send async notifications
for file lock availability.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 contrib/virtiofsd/fuse_i.h                 |   1 +
 contrib/virtiofsd/fuse_virtio.c            | 108 ++++++++++++++++++---
 hw/virtio/vhost-user-fs-pci.c              |   2 +-
 hw/virtio/vhost-user-fs.c                  |  37 +++++--
 include/hw/virtio/vhost-user-fs.h          |   1 +
 include/standard-headers/linux/virtio_fs.h |   3 +
 6 files changed, 130 insertions(+), 22 deletions(-)

diff --git a/contrib/virtiofsd/fuse_i.h b/contrib/virtiofsd/fuse_i.h
index 966b1a3baa..4eeae0bfeb 100644
--- a/contrib/virtiofsd/fuse_i.h
+++ b/contrib/virtiofsd/fuse_i.h
@@ -74,6 +74,7 @@ struct fuse_session {
 	char *vu_socket_lock;
 	struct fv_VuDev *virtio_dev;
 	int thread_pool_size;
+	bool notify_enabled;
 };
 
 struct fuse_chan {
diff --git a/contrib/virtiofsd/fuse_virtio.c b/contrib/virtiofsd/fuse_virtio.c
index 31c8542b6c..411114c9b3 100644
--- a/contrib/virtiofsd/fuse_virtio.c
+++ b/contrib/virtiofsd/fuse_virtio.c
@@ -14,6 +14,7 @@
 #include "qemu/osdep.h"
 #include "qemu/iov.h"
 #include "qapi/error.h"
+#include "standard-headers/linux/virtio_fs.h"
 #include "fuse_i.h"
 #include "fuse_kernel.h"
 #include "fuse_misc.h"
@@ -98,23 +99,31 @@ struct fv_VuDev {
      */
     size_t nqueues;
     struct fv_QueueInfo **qi;
-};
-
-/* From spec */
-struct virtio_fs_config {
-    char tag[36];
-    uint32_t num_queues;
+    /* True if notification queue is being used */
+    bool notify_enabled;
 };
 
 /* Callback from libvhost-user */
 static uint64_t fv_get_features(VuDev *dev)
 {
-    return 1ULL << VIRTIO_F_VERSION_1;
+    uint64_t features;
+
+    features = 1ull << VIRTIO_F_VERSION_1 |
+               1ull << VIRTIO_FS_F_NOTIFICATION;
+
+    return features;
 }
 
 /* Callback from libvhost-user */
 static void fv_set_features(VuDev *dev, uint64_t features)
 {
+    struct fv_VuDev *vud = container_of(dev, struct fv_VuDev, dev);
+    struct fuse_session *se = vud->se;
+
+    if ((1 << VIRTIO_FS_F_NOTIFICATION) & features) {
+        vud->notify_enabled = true;
+        se->notify_enabled = true;
+    }
 }
 
 /*
@@ -662,6 +671,65 @@ static void fv_queue_worker(gpointer data, gpointer user_data)
     free(req);
 }
 
+static void *fv_queue_notify_thread(void *opaque)
+{
+    struct fv_QueueInfo *qi = opaque;
+
+    fuse_log(FUSE_LOG_INFO, "%s: Start for queue %d kick_fd %d\n", __func__,
+             qi->qidx, qi->kick_fd);
+
+    while (1) {
+        struct pollfd pf[2];
+
+        pf[0].fd = qi->kick_fd;
+        pf[0].events = POLLIN;
+        pf[0].revents = 0;
+        pf[1].fd = qi->kill_fd;
+        pf[1].events = POLLIN;
+        pf[1].revents = 0;
+
+        fuse_log(FUSE_LOG_DEBUG, "%s: Waiting for Queue %d event\n", __func__,
+                 qi->qidx);
+        int poll_res = ppoll(pf, 2, NULL, NULL);
+
+        if (poll_res == -1) {
+            if (errno == EINTR) {
+                fuse_log(FUSE_LOG_INFO, "%s: ppoll interrupted, going around\n",
+                         __func__);
+                continue;
+            }
+            fuse_log(FUSE_LOG_ERR, "fv_queue_thread ppoll: %m\n");
+            break;
+        }
+        assert(poll_res >= 1);
+        if (pf[0].revents & (POLLERR | POLLHUP | POLLNVAL)) {
+            fuse_log(FUSE_LOG_ERR, "%s: Unexpected poll revents %x Queue %d\n",
+                     __func__, pf[0].revents, qi->qidx);
+             break;
+        }
+        if (pf[1].revents & (POLLERR | POLLHUP | POLLNVAL)) {
+            fuse_log(FUSE_LOG_ERR, "%s: Unexpected poll revents %x Queue %d"
+                     "killfd\n", __func__, pf[1].revents, qi->qidx);
+            break;
+        }
+        if (pf[1].revents) {
+            fuse_log(FUSE_LOG_INFO, "%s: kill event on queue %d - quitting\n",
+                     __func__, qi->qidx);
+            break;
+        }
+        assert(pf[0].revents & POLLIN);
+        fuse_log(FUSE_LOG_DEBUG, "%s: Got queue event on Queue %d\n", __func__,
+                 qi->qidx);
+
+        eventfd_t evalue;
+        if (eventfd_read(qi->kick_fd, &evalue)) {
+            fuse_log(FUSE_LOG_ERR, "Eventfd_read for queue: %m\n");
+            break;
+        }
+    }
+    return NULL;
+}
+
 /* Thread function for individual queues, created when a queue is 'started' */
 static void *fv_queue_thread(void *opaque)
 {
@@ -771,6 +839,8 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
 {
     struct fv_VuDev *vud = container_of(dev, struct fv_VuDev, dev);
     struct fv_QueueInfo *ourqi;
+    void * (*thread_func) (void *) = fv_queue_thread;
+    int valid_queues = 2; /* One hiprio queue and one request queue */
 
     fuse_log(FUSE_LOG_INFO, "%s: qidx=%d started=%d\n", __func__, qidx,
              started);
@@ -782,10 +852,12 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
      * well-behaved client in mind and may not protect against all types of
      * races yet.
      */
-    if (qidx > 1) {
-        fuse_log(FUSE_LOG_ERR,
-                 "%s: multiple request queues not yet implemented, please only "
-                 "configure 1 request queue\n",
+    if (vud->notify_enabled)
+        valid_queues++;
+
+    if (qidx >= valid_queues) {
+        fuse_log(FUSE_LOG_ERR, "%s: multiple request queues not yet"
+                 "implemented, please only configure 1 request queue\n",
                  __func__);
         exit(EXIT_FAILURE);
     }
@@ -813,9 +885,17 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
 
         ourqi->kill_fd = eventfd(0, EFD_CLOEXEC | EFD_SEMAPHORE);
         assert(ourqi->kill_fd != -1);
-        pthread_mutex_init(&ourqi->vq_lock, NULL);
+        /*
+         * First queue (idx = 0)  is hiprio queue. Second queue is
+         * notification queue (if enabled). And rest are request
+         * queues.
+         */
+        if (vud->notify_enabled && qidx == 1) {
+            thread_func = fv_queue_notify_thread;
+        }
 
-        if (pthread_create(&ourqi->thread, NULL, fv_queue_thread, ourqi)) {
+        pthread_mutex_init(&ourqi->vq_lock, NULL);
+        if (pthread_create(&ourqi->thread, NULL, thread_func, ourqi)) {
             fuse_log(FUSE_LOG_ERR, "%s: Failed to create thread for queue %d\n",
                      __func__, qidx);
             assert(0);
@@ -1040,7 +1120,7 @@ int virtio_session_mount(struct fuse_session *se)
     se->virtio_dev = calloc(sizeof(struct fv_VuDev), 1);
     se->virtio_dev->se = se;
     pthread_rwlock_init(&se->virtio_dev->vu_dispatch_rwlock, NULL);
-    vu_init(&se->virtio_dev->dev, 2, se->vu_socketfd, fv_panic, fv_set_watch,
+    vu_init(&se->virtio_dev->dev, 3, se->vu_socketfd, fv_panic, fv_set_watch,
             fv_remove_watch, &fv_iface);
 
     return 0;
diff --git a/hw/virtio/vhost-user-fs-pci.c b/hw/virtio/vhost-user-fs-pci.c
index 0f3c3c8711..95f9fe5c5c 100644
--- a/hw/virtio/vhost-user-fs-pci.c
+++ b/hw/virtio/vhost-user-fs-pci.c
@@ -44,7 +44,7 @@ static void vhost_user_fs_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
     uint64_t totalsize;
 
     if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
-        vpci_dev->nvectors = dev->vdev.conf.num_request_queues + 1;
+        vpci_dev->nvectors = dev->vdev.conf.num_request_queues + 2;
     }
 
     qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 455e97beea..5555fe9dbe 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -24,6 +24,10 @@
 #include "exec/address-spaces.h"
 #include "trace.h"
 
+static const int user_feature_bits[] = {
+    VIRTIO_FS_F_NOTIFICATION,
+};
+
 uint64_t vhost_user_fs_slave_map(struct vhost_dev *dev, VhostUserFSSlaveMsg *sm,
                                  int fd)
 {
@@ -378,12 +382,23 @@ static void vuf_set_status(VirtIODevice *vdev, uint8_t status)
     }
 }
 
-static uint64_t vuf_get_features(VirtIODevice *vdev,
-                                      uint64_t requested_features,
-                                      Error **errp)
+static uint64_t vuf_get_features(VirtIODevice *vdev, uint64_t features,
+                                 Error **errp)
 {
-    /* No feature bits used yet */
-    return requested_features;
+    VHostUserFS *fs = VHOST_USER_FS(vdev);
+
+    virtio_add_feature(&features, VIRTIO_FS_F_NOTIFICATION);
+
+    return vhost_get_features(&fs->vhost_dev, user_feature_bits, features);
+}
+
+static void vuf_set_features(VirtIODevice *vdev, uint64_t features)
+{
+    VHostUserFS *fs = VHOST_USER_FS(vdev);
+
+    if (virtio_has_feature(features, VIRTIO_FS_F_NOTIFICATION)) {
+        fs->notify_enabled = true;
+    }
 }
 
 static void vuf_handle_output(VirtIODevice *vdev, VirtQueue *vq)
@@ -515,13 +530,20 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
     /* Hiprio queue */
     virtio_add_queue(vdev, fs->conf.queue_size, vuf_handle_output);
 
+    /* Notification queue. Feature negotiation happens later. So at this
+     * point of time we don't know if driver will use notification queue
+     * or not.
+     */
+    virtio_add_queue(vdev, fs->conf.queue_size, vuf_handle_output);
+
     /* Request queues */
     for (i = 0; i < fs->conf.num_request_queues; i++) {
         virtio_add_queue(vdev, fs->conf.queue_size, vuf_handle_output);
     }
 
-    /* 1 high prio queue, plus the number configured */
-    fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
+    /* 1 high prio queue, 1 notification queue plus the number configured */
+    fs->vhost_dev.nvqs = 2 + fs->conf.num_request_queues;
+
     fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
     ret = vhost_dev_init(&fs->vhost_dev, &fs->vhost_user,
                          VHOST_BACKEND_TYPE_USER, 0);
@@ -584,6 +606,7 @@ static void vuf_class_init(ObjectClass *klass, void *data)
     vdc->realize = vuf_device_realize;
     vdc->unrealize = vuf_device_unrealize;
     vdc->get_features = vuf_get_features;
+    vdc->set_features = vuf_set_features;
     vdc->get_config = vuf_get_config;
     vdc->set_status = vuf_set_status;
     vdc->guest_notifier_mask = vuf_guest_notifier_mask;
diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
index 4e7be1f312..bd47e0da98 100644
--- a/include/hw/virtio/vhost-user-fs.h
+++ b/include/hw/virtio/vhost-user-fs.h
@@ -64,6 +64,7 @@ typedef struct {
     /* Metadata version table */
     size_t mdvt_size;
     MemoryRegion mdvt;
+    bool notify_enabled;
 } VHostUserFS;
 
 /* Callbacks from the vhost-user code for slave commands */
diff --git a/include/standard-headers/linux/virtio_fs.h b/include/standard-headers/linux/virtio_fs.h
index 310210b7b6..9ee95f584f 100644
--- a/include/standard-headers/linux/virtio_fs.h
+++ b/include/standard-headers/linux/virtio_fs.h
@@ -8,6 +8,9 @@
 #include "standard-headers/linux/virtio_config.h"
 #include "standard-headers/linux/virtio_types.h"
 
+/* Feature bits */
+#define VIRTIO_FS_F_NOTIFICATION	0	/* Notification queue supported */
+
 struct virtio_fs_config {
 	/* Filesystem name (UTF-8, not NUL-terminated, padded with NULs) */
 	uint8_t tag[36];
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Virtio-fs] [PATCH 2/4] virtiofd: Create a notification queue
@ 2019-11-15 20:55   ` Vivek Goyal
  0 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-11-15 20:55 UTC (permalink / raw)
  To: virtio-fs, qemu-devel; +Cc: miklos, vgoyal

Add a notification queue which will be used to send async notifications
for file lock availability.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 contrib/virtiofsd/fuse_i.h                 |   1 +
 contrib/virtiofsd/fuse_virtio.c            | 108 ++++++++++++++++++---
 hw/virtio/vhost-user-fs-pci.c              |   2 +-
 hw/virtio/vhost-user-fs.c                  |  37 +++++--
 include/hw/virtio/vhost-user-fs.h          |   1 +
 include/standard-headers/linux/virtio_fs.h |   3 +
 6 files changed, 130 insertions(+), 22 deletions(-)

diff --git a/contrib/virtiofsd/fuse_i.h b/contrib/virtiofsd/fuse_i.h
index 966b1a3baa..4eeae0bfeb 100644
--- a/contrib/virtiofsd/fuse_i.h
+++ b/contrib/virtiofsd/fuse_i.h
@@ -74,6 +74,7 @@ struct fuse_session {
 	char *vu_socket_lock;
 	struct fv_VuDev *virtio_dev;
 	int thread_pool_size;
+	bool notify_enabled;
 };
 
 struct fuse_chan {
diff --git a/contrib/virtiofsd/fuse_virtio.c b/contrib/virtiofsd/fuse_virtio.c
index 31c8542b6c..411114c9b3 100644
--- a/contrib/virtiofsd/fuse_virtio.c
+++ b/contrib/virtiofsd/fuse_virtio.c
@@ -14,6 +14,7 @@
 #include "qemu/osdep.h"
 #include "qemu/iov.h"
 #include "qapi/error.h"
+#include "standard-headers/linux/virtio_fs.h"
 #include "fuse_i.h"
 #include "fuse_kernel.h"
 #include "fuse_misc.h"
@@ -98,23 +99,31 @@ struct fv_VuDev {
      */
     size_t nqueues;
     struct fv_QueueInfo **qi;
-};
-
-/* From spec */
-struct virtio_fs_config {
-    char tag[36];
-    uint32_t num_queues;
+    /* True if notification queue is being used */
+    bool notify_enabled;
 };
 
 /* Callback from libvhost-user */
 static uint64_t fv_get_features(VuDev *dev)
 {
-    return 1ULL << VIRTIO_F_VERSION_1;
+    uint64_t features;
+
+    features = 1ull << VIRTIO_F_VERSION_1 |
+               1ull << VIRTIO_FS_F_NOTIFICATION;
+
+    return features;
 }
 
 /* Callback from libvhost-user */
 static void fv_set_features(VuDev *dev, uint64_t features)
 {
+    struct fv_VuDev *vud = container_of(dev, struct fv_VuDev, dev);
+    struct fuse_session *se = vud->se;
+
+    if ((1 << VIRTIO_FS_F_NOTIFICATION) & features) {
+        vud->notify_enabled = true;
+        se->notify_enabled = true;
+    }
 }
 
 /*
@@ -662,6 +671,65 @@ static void fv_queue_worker(gpointer data, gpointer user_data)
     free(req);
 }
 
+static void *fv_queue_notify_thread(void *opaque)
+{
+    struct fv_QueueInfo *qi = opaque;
+
+    fuse_log(FUSE_LOG_INFO, "%s: Start for queue %d kick_fd %d\n", __func__,
+             qi->qidx, qi->kick_fd);
+
+    while (1) {
+        struct pollfd pf[2];
+
+        pf[0].fd = qi->kick_fd;
+        pf[0].events = POLLIN;
+        pf[0].revents = 0;
+        pf[1].fd = qi->kill_fd;
+        pf[1].events = POLLIN;
+        pf[1].revents = 0;
+
+        fuse_log(FUSE_LOG_DEBUG, "%s: Waiting for Queue %d event\n", __func__,
+                 qi->qidx);
+        int poll_res = ppoll(pf, 2, NULL, NULL);
+
+        if (poll_res == -1) {
+            if (errno == EINTR) {
+                fuse_log(FUSE_LOG_INFO, "%s: ppoll interrupted, going around\n",
+                         __func__);
+                continue;
+            }
+            fuse_log(FUSE_LOG_ERR, "fv_queue_thread ppoll: %m\n");
+            break;
+        }
+        assert(poll_res >= 1);
+        if (pf[0].revents & (POLLERR | POLLHUP | POLLNVAL)) {
+            fuse_log(FUSE_LOG_ERR, "%s: Unexpected poll revents %x Queue %d\n",
+                     __func__, pf[0].revents, qi->qidx);
+             break;
+        }
+        if (pf[1].revents & (POLLERR | POLLHUP | POLLNVAL)) {
+            fuse_log(FUSE_LOG_ERR, "%s: Unexpected poll revents %x Queue %d"
+                     "killfd\n", __func__, pf[1].revents, qi->qidx);
+            break;
+        }
+        if (pf[1].revents) {
+            fuse_log(FUSE_LOG_INFO, "%s: kill event on queue %d - quitting\n",
+                     __func__, qi->qidx);
+            break;
+        }
+        assert(pf[0].revents & POLLIN);
+        fuse_log(FUSE_LOG_DEBUG, "%s: Got queue event on Queue %d\n", __func__,
+                 qi->qidx);
+
+        eventfd_t evalue;
+        if (eventfd_read(qi->kick_fd, &evalue)) {
+            fuse_log(FUSE_LOG_ERR, "Eventfd_read for queue: %m\n");
+            break;
+        }
+    }
+    return NULL;
+}
+
 /* Thread function for individual queues, created when a queue is 'started' */
 static void *fv_queue_thread(void *opaque)
 {
@@ -771,6 +839,8 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
 {
     struct fv_VuDev *vud = container_of(dev, struct fv_VuDev, dev);
     struct fv_QueueInfo *ourqi;
+    void * (*thread_func) (void *) = fv_queue_thread;
+    int valid_queues = 2; /* One hiprio queue and one request queue */
 
     fuse_log(FUSE_LOG_INFO, "%s: qidx=%d started=%d\n", __func__, qidx,
              started);
@@ -782,10 +852,12 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
      * well-behaved client in mind and may not protect against all types of
      * races yet.
      */
-    if (qidx > 1) {
-        fuse_log(FUSE_LOG_ERR,
-                 "%s: multiple request queues not yet implemented, please only "
-                 "configure 1 request queue\n",
+    if (vud->notify_enabled)
+        valid_queues++;
+
+    if (qidx >= valid_queues) {
+        fuse_log(FUSE_LOG_ERR, "%s: multiple request queues not yet"
+                 "implemented, please only configure 1 request queue\n",
                  __func__);
         exit(EXIT_FAILURE);
     }
@@ -813,9 +885,17 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
 
         ourqi->kill_fd = eventfd(0, EFD_CLOEXEC | EFD_SEMAPHORE);
         assert(ourqi->kill_fd != -1);
-        pthread_mutex_init(&ourqi->vq_lock, NULL);
+        /*
+         * First queue (idx = 0)  is hiprio queue. Second queue is
+         * notification queue (if enabled). And rest are request
+         * queues.
+         */
+        if (vud->notify_enabled && qidx == 1) {
+            thread_func = fv_queue_notify_thread;
+        }
 
-        if (pthread_create(&ourqi->thread, NULL, fv_queue_thread, ourqi)) {
+        pthread_mutex_init(&ourqi->vq_lock, NULL);
+        if (pthread_create(&ourqi->thread, NULL, thread_func, ourqi)) {
             fuse_log(FUSE_LOG_ERR, "%s: Failed to create thread for queue %d\n",
                      __func__, qidx);
             assert(0);
@@ -1040,7 +1120,7 @@ int virtio_session_mount(struct fuse_session *se)
     se->virtio_dev = calloc(sizeof(struct fv_VuDev), 1);
     se->virtio_dev->se = se;
     pthread_rwlock_init(&se->virtio_dev->vu_dispatch_rwlock, NULL);
-    vu_init(&se->virtio_dev->dev, 2, se->vu_socketfd, fv_panic, fv_set_watch,
+    vu_init(&se->virtio_dev->dev, 3, se->vu_socketfd, fv_panic, fv_set_watch,
             fv_remove_watch, &fv_iface);
 
     return 0;
diff --git a/hw/virtio/vhost-user-fs-pci.c b/hw/virtio/vhost-user-fs-pci.c
index 0f3c3c8711..95f9fe5c5c 100644
--- a/hw/virtio/vhost-user-fs-pci.c
+++ b/hw/virtio/vhost-user-fs-pci.c
@@ -44,7 +44,7 @@ static void vhost_user_fs_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
     uint64_t totalsize;
 
     if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
-        vpci_dev->nvectors = dev->vdev.conf.num_request_queues + 1;
+        vpci_dev->nvectors = dev->vdev.conf.num_request_queues + 2;
     }
 
     qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 455e97beea..5555fe9dbe 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -24,6 +24,10 @@
 #include "exec/address-spaces.h"
 #include "trace.h"
 
+static const int user_feature_bits[] = {
+    VIRTIO_FS_F_NOTIFICATION,
+};
+
 uint64_t vhost_user_fs_slave_map(struct vhost_dev *dev, VhostUserFSSlaveMsg *sm,
                                  int fd)
 {
@@ -378,12 +382,23 @@ static void vuf_set_status(VirtIODevice *vdev, uint8_t status)
     }
 }
 
-static uint64_t vuf_get_features(VirtIODevice *vdev,
-                                      uint64_t requested_features,
-                                      Error **errp)
+static uint64_t vuf_get_features(VirtIODevice *vdev, uint64_t features,
+                                 Error **errp)
 {
-    /* No feature bits used yet */
-    return requested_features;
+    VHostUserFS *fs = VHOST_USER_FS(vdev);
+
+    virtio_add_feature(&features, VIRTIO_FS_F_NOTIFICATION);
+
+    return vhost_get_features(&fs->vhost_dev, user_feature_bits, features);
+}
+
+static void vuf_set_features(VirtIODevice *vdev, uint64_t features)
+{
+    VHostUserFS *fs = VHOST_USER_FS(vdev);
+
+    if (virtio_has_feature(features, VIRTIO_FS_F_NOTIFICATION)) {
+        fs->notify_enabled = true;
+    }
 }
 
 static void vuf_handle_output(VirtIODevice *vdev, VirtQueue *vq)
@@ -515,13 +530,20 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
     /* Hiprio queue */
     virtio_add_queue(vdev, fs->conf.queue_size, vuf_handle_output);
 
+    /* Notification queue. Feature negotiation happens later. So at this
+     * point of time we don't know if driver will use notification queue
+     * or not.
+     */
+    virtio_add_queue(vdev, fs->conf.queue_size, vuf_handle_output);
+
     /* Request queues */
     for (i = 0; i < fs->conf.num_request_queues; i++) {
         virtio_add_queue(vdev, fs->conf.queue_size, vuf_handle_output);
     }
 
-    /* 1 high prio queue, plus the number configured */
-    fs->vhost_dev.nvqs = 1 + fs->conf.num_request_queues;
+    /* 1 high prio queue, 1 notification queue plus the number configured */
+    fs->vhost_dev.nvqs = 2 + fs->conf.num_request_queues;
+
     fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
     ret = vhost_dev_init(&fs->vhost_dev, &fs->vhost_user,
                          VHOST_BACKEND_TYPE_USER, 0);
@@ -584,6 +606,7 @@ static void vuf_class_init(ObjectClass *klass, void *data)
     vdc->realize = vuf_device_realize;
     vdc->unrealize = vuf_device_unrealize;
     vdc->get_features = vuf_get_features;
+    vdc->set_features = vuf_set_features;
     vdc->get_config = vuf_get_config;
     vdc->set_status = vuf_set_status;
     vdc->guest_notifier_mask = vuf_guest_notifier_mask;
diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
index 4e7be1f312..bd47e0da98 100644
--- a/include/hw/virtio/vhost-user-fs.h
+++ b/include/hw/virtio/vhost-user-fs.h
@@ -64,6 +64,7 @@ typedef struct {
     /* Metadata version table */
     size_t mdvt_size;
     MemoryRegion mdvt;
+    bool notify_enabled;
 } VHostUserFS;
 
 /* Callbacks from the vhost-user code for slave commands */
diff --git a/include/standard-headers/linux/virtio_fs.h b/include/standard-headers/linux/virtio_fs.h
index 310210b7b6..9ee95f584f 100644
--- a/include/standard-headers/linux/virtio_fs.h
+++ b/include/standard-headers/linux/virtio_fs.h
@@ -8,6 +8,9 @@
 #include "standard-headers/linux/virtio_config.h"
 #include "standard-headers/linux/virtio_types.h"
 
+/* Feature bits */
+#define VIRTIO_FS_F_NOTIFICATION	0	/* Notification queue supported */
+
 struct virtio_fs_config {
 	/* Filesystem name (UTF-8, not NUL-terminated, padded with NULs) */
 	uint8_t tag[36];
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 3/4] virtiofsd: Specify size of notification buffer using config space
  2019-11-15 20:55 ` [Virtio-fs] " Vivek Goyal
@ 2019-11-15 20:55   ` Vivek Goyal
  -1 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-11-15 20:55 UTC (permalink / raw)
  To: virtio-fs, qemu-devel; +Cc: miklos, stefanha, vgoyal, dgilbert

Daemon specifies size of notification buffer needed and that should be done
using config space.

Only ->notify_buf_size value of config space comes from daemon. Rest of
it is filled by qemu device emulation code.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 contrib/virtiofsd/fuse_virtio.c            | 26 +++++++++++++++++++++-
 hw/virtio/vhost-user-fs.c                  | 26 ++++++++++++++++++++++
 include/hw/virtio/vhost-user-fs.h          |  2 ++
 include/standard-headers/linux/virtio_fs.h |  2 ++
 4 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/contrib/virtiofsd/fuse_virtio.c b/contrib/virtiofsd/fuse_virtio.c
index 411114c9b3..982b6ad0bd 100644
--- a/contrib/virtiofsd/fuse_virtio.c
+++ b/contrib/virtiofsd/fuse_virtio.c
@@ -109,7 +109,8 @@ static uint64_t fv_get_features(VuDev *dev)
     uint64_t features;
 
     features = 1ull << VIRTIO_F_VERSION_1 |
-               1ull << VIRTIO_FS_F_NOTIFICATION;
+               1ull << VIRTIO_FS_F_NOTIFICATION |
+               1ull << VHOST_USER_F_PROTOCOL_FEATURES;
 
     return features;
 }
@@ -927,6 +928,27 @@ static bool fv_queue_order(VuDev *dev, int qidx)
     return false;
 }
 
+static uint64_t fv_get_protocol_features(VuDev *dev)
+{
+	return 1ull << VHOST_USER_PROTOCOL_F_CONFIG;
+}
+
+static int fv_get_config(VuDev *dev, uint8_t *config, uint32_t len)
+{
+	struct virtio_fs_config fscfg = {};
+
+	fuse_log(FUSE_LOG_DEBUG, "%s:Setting notify_buf_size=%d\n", __func__,
+                 sizeof(struct fuse_notify_lock_out));
+	/*
+	 * As of now only notification related to lock is supported. As more
+	 * notification types are supported, bump up the size accordingly.
+	 */
+	fscfg.notify_buf_size = sizeof(struct fuse_notify_lock_out);
+
+	memcpy(config, &fscfg, len);
+	return 0;
+}
+
 static const VuDevIface fv_iface = {
     .get_features = fv_get_features,
     .set_features = fv_set_features,
@@ -935,6 +957,8 @@ static const VuDevIface fv_iface = {
     .queue_set_started = fv_queue_set_started,
 
     .queue_is_processed_in_order = fv_queue_order,
+    .get_protocol_features = fv_get_protocol_features,
+    .get_config = fv_get_config,
 };
 
 /*
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 5555fe9dbe..8dd9b1496f 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -277,16 +277,40 @@ uint64_t vhost_user_fs_slave_io(struct vhost_dev *dev, VhostUserFSSlaveMsg *sm,
     return (uint64_t)done;
 }
 
+static int vhost_user_fs_handle_config_change(struct vhost_dev *dev)
+{
+    return 0;
+}
+
+const VhostDevConfigOps fs_ops = {
+    .vhost_dev_config_notifier = vhost_user_fs_handle_config_change,
+};
 
 static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
 {
     VHostUserFS *fs = VHOST_USER_FS(vdev);
     struct virtio_fs_config fscfg = {};
+    int ret;
+
+    /*
+     * As of now we only get notification buffer size from device. And that's
+     * needed only if notification queue is enabled.
+     */
+    if (fs->notify_enabled) {
+        ret = vhost_dev_get_config(&fs->vhost_dev, (uint8_t *)&fs->fscfg,
+                                   sizeof(struct virtio_fs_config));
+	if (ret < 0) {
+            error_report("vhost-user-fs: get device config space failed."
+                         " ret=%d\n", ret);
+            return;
+        }
+    }
 
     memcpy((char *)fscfg.tag, fs->conf.tag,
            MIN(strlen(fs->conf.tag) + 1, sizeof(fscfg.tag)));
 
     virtio_stl_p(vdev, &fscfg.num_request_queues, fs->conf.num_request_queues);
+    virtio_stl_p(vdev, &fscfg.notify_buf_size, fs->fscfg.notify_buf_size);
 
     memcpy(config, &fscfg, sizeof(fscfg));
 }
@@ -545,6 +569,8 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
     fs->vhost_dev.nvqs = 2 + fs->conf.num_request_queues;
 
     fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
+
+    vhost_dev_set_config_notifier(&fs->vhost_dev, &fs_ops);
     ret = vhost_dev_init(&fs->vhost_dev, &fs->vhost_user,
                          VHOST_BACKEND_TYPE_USER, 0);
     if (ret < 0) {
diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
index bd47e0da98..f667cc4b5a 100644
--- a/include/hw/virtio/vhost-user-fs.h
+++ b/include/hw/virtio/vhost-user-fs.h
@@ -14,6 +14,7 @@
 #ifndef _QEMU_VHOST_USER_FS_H
 #define _QEMU_VHOST_USER_FS_H
 
+#include "standard-headers/linux/virtio_fs.h"
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-user.h"
@@ -58,6 +59,7 @@ typedef struct {
     struct vhost_virtqueue *vhost_vqs;
     struct vhost_dev vhost_dev;
     VhostUserState vhost_user;
+    struct virtio_fs_config fscfg;
 
     /*< public >*/
     MemoryRegion cache;
diff --git a/include/standard-headers/linux/virtio_fs.h b/include/standard-headers/linux/virtio_fs.h
index 9ee95f584f..719216a262 100644
--- a/include/standard-headers/linux/virtio_fs.h
+++ b/include/standard-headers/linux/virtio_fs.h
@@ -17,6 +17,8 @@ struct virtio_fs_config {
 
 	/* Number of request queues */
 	uint32_t num_request_queues;
+	/* Size of notification buffer */
+	uint32_t notify_buf_size;
 } QEMU_PACKED;
 
 #define VIRTIO_FS_PCI_CACHE_BAR 2
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Virtio-fs] [PATCH 3/4] virtiofsd: Specify size of notification buffer using config space
@ 2019-11-15 20:55   ` Vivek Goyal
  0 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-11-15 20:55 UTC (permalink / raw)
  To: virtio-fs, qemu-devel; +Cc: miklos, vgoyal

Daemon specifies size of notification buffer needed and that should be done
using config space.

Only ->notify_buf_size value of config space comes from daemon. Rest of
it is filled by qemu device emulation code.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 contrib/virtiofsd/fuse_virtio.c            | 26 +++++++++++++++++++++-
 hw/virtio/vhost-user-fs.c                  | 26 ++++++++++++++++++++++
 include/hw/virtio/vhost-user-fs.h          |  2 ++
 include/standard-headers/linux/virtio_fs.h |  2 ++
 4 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/contrib/virtiofsd/fuse_virtio.c b/contrib/virtiofsd/fuse_virtio.c
index 411114c9b3..982b6ad0bd 100644
--- a/contrib/virtiofsd/fuse_virtio.c
+++ b/contrib/virtiofsd/fuse_virtio.c
@@ -109,7 +109,8 @@ static uint64_t fv_get_features(VuDev *dev)
     uint64_t features;
 
     features = 1ull << VIRTIO_F_VERSION_1 |
-               1ull << VIRTIO_FS_F_NOTIFICATION;
+               1ull << VIRTIO_FS_F_NOTIFICATION |
+               1ull << VHOST_USER_F_PROTOCOL_FEATURES;
 
     return features;
 }
@@ -927,6 +928,27 @@ static bool fv_queue_order(VuDev *dev, int qidx)
     return false;
 }
 
+static uint64_t fv_get_protocol_features(VuDev *dev)
+{
+	return 1ull << VHOST_USER_PROTOCOL_F_CONFIG;
+}
+
+static int fv_get_config(VuDev *dev, uint8_t *config, uint32_t len)
+{
+	struct virtio_fs_config fscfg = {};
+
+	fuse_log(FUSE_LOG_DEBUG, "%s:Setting notify_buf_size=%d\n", __func__,
+                 sizeof(struct fuse_notify_lock_out));
+	/*
+	 * As of now only notification related to lock is supported. As more
+	 * notification types are supported, bump up the size accordingly.
+	 */
+	fscfg.notify_buf_size = sizeof(struct fuse_notify_lock_out);
+
+	memcpy(config, &fscfg, len);
+	return 0;
+}
+
 static const VuDevIface fv_iface = {
     .get_features = fv_get_features,
     .set_features = fv_set_features,
@@ -935,6 +957,8 @@ static const VuDevIface fv_iface = {
     .queue_set_started = fv_queue_set_started,
 
     .queue_is_processed_in_order = fv_queue_order,
+    .get_protocol_features = fv_get_protocol_features,
+    .get_config = fv_get_config,
 };
 
 /*
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 5555fe9dbe..8dd9b1496f 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -277,16 +277,40 @@ uint64_t vhost_user_fs_slave_io(struct vhost_dev *dev, VhostUserFSSlaveMsg *sm,
     return (uint64_t)done;
 }
 
+static int vhost_user_fs_handle_config_change(struct vhost_dev *dev)
+{
+    return 0;
+}
+
+const VhostDevConfigOps fs_ops = {
+    .vhost_dev_config_notifier = vhost_user_fs_handle_config_change,
+};
 
 static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
 {
     VHostUserFS *fs = VHOST_USER_FS(vdev);
     struct virtio_fs_config fscfg = {};
+    int ret;
+
+    /*
+     * As of now we only get notification buffer size from device. And that's
+     * needed only if notification queue is enabled.
+     */
+    if (fs->notify_enabled) {
+        ret = vhost_dev_get_config(&fs->vhost_dev, (uint8_t *)&fs->fscfg,
+                                   sizeof(struct virtio_fs_config));
+	if (ret < 0) {
+            error_report("vhost-user-fs: get device config space failed."
+                         " ret=%d\n", ret);
+            return;
+        }
+    }
 
     memcpy((char *)fscfg.tag, fs->conf.tag,
            MIN(strlen(fs->conf.tag) + 1, sizeof(fscfg.tag)));
 
     virtio_stl_p(vdev, &fscfg.num_request_queues, fs->conf.num_request_queues);
+    virtio_stl_p(vdev, &fscfg.notify_buf_size, fs->fscfg.notify_buf_size);
 
     memcpy(config, &fscfg, sizeof(fscfg));
 }
@@ -545,6 +569,8 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
     fs->vhost_dev.nvqs = 2 + fs->conf.num_request_queues;
 
     fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
+
+    vhost_dev_set_config_notifier(&fs->vhost_dev, &fs_ops);
     ret = vhost_dev_init(&fs->vhost_dev, &fs->vhost_user,
                          VHOST_BACKEND_TYPE_USER, 0);
     if (ret < 0) {
diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
index bd47e0da98..f667cc4b5a 100644
--- a/include/hw/virtio/vhost-user-fs.h
+++ b/include/hw/virtio/vhost-user-fs.h
@@ -14,6 +14,7 @@
 #ifndef _QEMU_VHOST_USER_FS_H
 #define _QEMU_VHOST_USER_FS_H
 
+#include "standard-headers/linux/virtio_fs.h"
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-user.h"
@@ -58,6 +59,7 @@ typedef struct {
     struct vhost_virtqueue *vhost_vqs;
     struct vhost_dev vhost_dev;
     VhostUserState vhost_user;
+    struct virtio_fs_config fscfg;
 
     /*< public >*/
     MemoryRegion cache;
diff --git a/include/standard-headers/linux/virtio_fs.h b/include/standard-headers/linux/virtio_fs.h
index 9ee95f584f..719216a262 100644
--- a/include/standard-headers/linux/virtio_fs.h
+++ b/include/standard-headers/linux/virtio_fs.h
@@ -17,6 +17,8 @@ struct virtio_fs_config {
 
 	/* Number of request queues */
 	uint32_t num_request_queues;
+	/* Size of notification buffer */
+	uint32_t notify_buf_size;
 } QEMU_PACKED;
 
 #define VIRTIO_FS_PCI_CACHE_BAR 2
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 4/4] virtiofsd: Implement blocking posix locks
  2019-11-15 20:55 ` [Virtio-fs] " Vivek Goyal
@ 2019-11-15 20:55   ` Vivek Goyal
  -1 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-11-15 20:55 UTC (permalink / raw)
  To: virtio-fs, qemu-devel; +Cc: miklos, stefanha, vgoyal, dgilbert

As of now we don't support fcntl(F_SETLKW) and if we see one, we return
-EOPNOTSUPP.

Change that by accepting these requests and returning a reply immediately
asking caller to wait. Once lock is available, send a notification to
the waiter indicating lock is available.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 contrib/virtiofsd/fuse_kernel.h    |  7 +++
 contrib/virtiofsd/fuse_lowlevel.c  | 23 +++++++-
 contrib/virtiofsd/fuse_lowlevel.h  | 25 ++++++++
 contrib/virtiofsd/fuse_virtio.c    | 94 ++++++++++++++++++++++++++++--
 contrib/virtiofsd/passthrough_ll.c | 49 +++++++++++++---
 5 files changed, 182 insertions(+), 16 deletions(-)

diff --git a/contrib/virtiofsd/fuse_kernel.h b/contrib/virtiofsd/fuse_kernel.h
index 2bdc8b1c88..d4d65c5414 100644
--- a/contrib/virtiofsd/fuse_kernel.h
+++ b/contrib/virtiofsd/fuse_kernel.h
@@ -444,6 +444,7 @@ enum fuse_notify_code {
 	FUSE_NOTIFY_STORE = 4,
 	FUSE_NOTIFY_RETRIEVE = 5,
 	FUSE_NOTIFY_DELETE = 6,
+	FUSE_NOTIFY_LOCK = 7,
 	FUSE_NOTIFY_CODE_MAX,
 };
 
@@ -836,6 +837,12 @@ struct fuse_notify_retrieve_in {
 	uint64_t	dummy4;
 };
 
+struct fuse_notify_lock_out {
+	uint64_t	id;
+	int32_t		error;
+	int32_t		padding;
+};
+
 /* Device ioctls: */
 #define FUSE_DEV_IOC_CLONE	_IOR(229, 0, uint32_t)
 
diff --git a/contrib/virtiofsd/fuse_lowlevel.c b/contrib/virtiofsd/fuse_lowlevel.c
index d4a42d9804..f706e440bf 100644
--- a/contrib/virtiofsd/fuse_lowlevel.c
+++ b/contrib/virtiofsd/fuse_lowlevel.c
@@ -183,7 +183,8 @@ int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
 {
 	struct fuse_out_header out;
 
-	if (error <= -1000 || error > 0) {
+	/* error = 1 has been used to signal client to wait for notificaiton */
+	if (error <= -1000 || error > 1) {
 		fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n",	error);
 		error = -ERANGE;
 	}
@@ -291,6 +292,12 @@ int fuse_reply_err(fuse_req_t req, int err)
 	return send_reply(req, -err, NULL, 0);
 }
 
+int fuse_reply_wait(fuse_req_t req)
+{
+	/* TODO: This is a hack. Fix it */
+	return send_reply(req, 1, NULL, 0);
+}
+
 void fuse_reply_none(fuse_req_t req)
 {
 	fuse_free_req(req);
@@ -2207,6 +2214,20 @@ static int send_notify_iov(struct fuse_session *se, int notify_code,
 	return fuse_send_msg(se, NULL, iov, count);
 }
 
+int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t req_id,
+			      int32_t error)
+{
+	struct fuse_notify_lock_out outarg;
+	struct iovec iov[2];
+
+	outarg.id = req_id;
+	outarg.error = -error;
+
+	iov[1].iov_base = &outarg;
+	iov[1].iov_len = sizeof(outarg);
+	return send_notify_iov(se, FUSE_NOTIFY_LOCK, iov, 2);
+}
+
 int fuse_lowlevel_notify_poll(struct fuse_pollhandle *ph)
 {
 	if (ph != NULL) {
diff --git a/contrib/virtiofsd/fuse_lowlevel.h b/contrib/virtiofsd/fuse_lowlevel.h
index e664d2d12d..f0a94683b5 100644
--- a/contrib/virtiofsd/fuse_lowlevel.h
+++ b/contrib/virtiofsd/fuse_lowlevel.h
@@ -1251,6 +1251,22 @@ struct fuse_lowlevel_ops {
  */
 int fuse_reply_err(fuse_req_t req, int err);
 
+/**
+ * Ask caller to wait for lock.
+ *
+ * Possible requests:
+ *   setlkw
+ *
+ * If caller sends a blocking lock request (setlkw), then reply to caller
+ * that wait for lock to be available. Once lock is available caller will
+ * receive a notification with request's unique id. Notification will
+ * carry info whether lock was successfully obtained or not.
+ *
+ * @param req request handle
+ * @return zero for success, -errno for failure to send reply
+ */
+int fuse_reply_wait(fuse_req_t req);
+
 /**
  * Don't send reply
  *
@@ -1704,6 +1720,15 @@ int fuse_lowlevel_notify_delete(struct fuse_session *se,
 int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
 			       off_t offset, struct fuse_bufvec *bufv,
 			       enum fuse_buf_copy_flags flags);
+/**
+ * Notify event related to previous lock request
+ *
+ * @param se the session object
+ * @param req_id the id of the request which requested setlkw
+ * @param error zero for success, -errno for the failure
+ */
+int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t req_id,
+			      int32_t error);
 
 /* ----------------------------------------------------------- *
  * Utility functions					       *
diff --git a/contrib/virtiofsd/fuse_virtio.c b/contrib/virtiofsd/fuse_virtio.c
index 982b6ad0bd..98d27e7642 100644
--- a/contrib/virtiofsd/fuse_virtio.c
+++ b/contrib/virtiofsd/fuse_virtio.c
@@ -215,6 +215,81 @@ static void copy_iov(struct iovec *src_iov, int src_count,
     }
 }
 
+static int virtio_send_notify_msg(struct fuse_session *se, struct iovec *iov,
+				  int count)
+{
+    struct fv_QueueInfo *qi;
+    VuDev *dev = &se->virtio_dev->dev;
+    VuVirtq *q;
+    FVRequest *req;
+    VuVirtqElement *elem;
+    unsigned int in_num, bad_in_num = 0, bad_out_num = 0;
+    struct fuse_out_header *out = iov[0].iov_base;
+    size_t in_len, tosend_len = iov_size(iov, count);
+    struct iovec *in_sg;
+    int ret = 0;
+
+    /* Notifications have unique == 0 */
+    assert (!out->unique);
+
+    if (!se->notify_enabled)
+        return -EOPNOTSUPP;
+
+    /* If notifications are enabled, queue index 1 is notification queue */
+    qi = se->virtio_dev->qi[1];
+    q = vu_get_queue(dev, qi->qidx);
+
+    pthread_rwlock_rdlock(&qi->virtio_dev->vu_dispatch_rwlock);
+    pthread_mutex_lock(&qi->vq_lock);
+    /* Pop an element from queue */
+    req = vu_queue_pop(dev, q, sizeof(FVRequest), &bad_in_num, &bad_out_num);
+    if (!req) {
+        /* TODO: Implement some sort of ring buffer and queue notifications
+	 * on that and send these later when notification queue has space
+	 * available.
+	 */
+        return -ENOSPC;
+    }
+    pthread_mutex_unlock(&qi->vq_lock);
+    pthread_rwlock_unlock(&qi->virtio_dev->vu_dispatch_rwlock);
+
+    out->len = tosend_len;
+    elem = &req->elem;
+    in_num = elem->in_num;
+    in_sg = elem->in_sg;
+    in_len = iov_size(in_sg, in_num);
+    fuse_log(FUSE_LOG_DEBUG, "%s: elem %d: with %d in desc of length %zd\n",
+             __func__, elem->index, in_num,  in_len);
+
+    if (in_len < sizeof(struct fuse_out_header)) {
+        fuse_log(FUSE_LOG_ERR, "%s: elem %d too short for out_header\n",
+                 __func__, elem->index);
+        ret = -E2BIG;
+        goto out;
+    }
+
+    if (in_len < tosend_len) {
+        fuse_log(FUSE_LOG_ERR, "%s: elem %d too small for data len"
+                 " %zd\n", __func__, elem->index, tosend_len);
+        ret = -E2BIG;
+        goto out;
+    }
+
+    /* First copy the header data from iov->in_sg */
+    copy_iov(iov, count, in_sg, in_num, tosend_len);
+
+    /* TODO: Add bad_innum handling */
+    pthread_rwlock_rdlock(&qi->virtio_dev->vu_dispatch_rwlock);
+    pthread_mutex_lock(&qi->vq_lock);
+    vu_queue_push(dev, q, elem, tosend_len);
+    vu_queue_notify(dev, q);
+    pthread_mutex_unlock(&qi->vq_lock);
+    pthread_rwlock_unlock(&qi->virtio_dev->vu_dispatch_rwlock);
+out:
+    free(req);
+    return ret;
+}
+
 /*
  * Called back by ll whenever it wants to send a reply/message back
  * The 1st element of the iov starts with the fuse_out_header
@@ -223,11 +298,11 @@ static void copy_iov(struct iovec *src_iov, int src_count,
 int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
                     struct iovec *iov, int count)
 {
-    FVRequest *req = container_of(ch, FVRequest, ch);
-    struct fv_QueueInfo *qi = ch->qi;
+    FVRequest *req;
+    struct fv_QueueInfo *qi;
     VuDev *dev = &se->virtio_dev->dev;
-    VuVirtq *q = vu_get_queue(dev, qi->qidx);
-    VuVirtqElement *elem = &req->elem;
+    VuVirtq *q;
+    VuVirtqElement *elem;
     int ret = 0;
 
     assert(count >= 1);
@@ -238,8 +313,15 @@ int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
 
     size_t tosend_len = iov_size(iov, count);
 
-    /* unique == 0 is notification, which we don't support */
-    assert(out->unique);
+    /* unique == 0 is notification */
+    if (!out->unique)
+        return virtio_send_notify_msg(se, iov, count);
+
+    assert(ch);
+    req = container_of(ch, FVRequest, ch);
+    elem = &req->elem;
+    qi = ch->qi;
+    q = vu_get_queue(dev, qi->qidx);
     assert(!req->reply_sent);
 
     /* The 'in' part of the elem is to qemu */
diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
index 028e7da273..ed52953565 100644
--- a/contrib/virtiofsd/passthrough_ll.c
+++ b/contrib/virtiofsd/passthrough_ll.c
@@ -1925,7 +1925,10 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino,
 	struct lo_data *lo = lo_data(req);
 	struct lo_inode *inode;
 	struct lo_inode_plock *plock;
-	int ret, saverr = 0;
+	int ret, saverr = 0, ofd;
+	uint64_t unique;
+	struct fuse_session *se = req->se;
+	bool async_lock = false;
 
 	fuse_log(FUSE_LOG_DEBUG, "lo_setlk(ino=%" PRIu64 ", flags=%d)"
 		 " cmd=%d pid=%d owner=0x%lx sleep=%d l_whence=%d"
@@ -1933,11 +1936,6 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino,
 		 lock->l_type, lock->l_pid, fi->lock_owner, sleep,
 		 lock->l_whence, lock->l_start, lock->l_len);
 
-	if (sleep) {
-		fuse_reply_err(req, EOPNOTSUPP);
-		return;
-	}
-
 	inode = lo_inode(req, ino);
 	if (!inode) {
 		fuse_reply_err(req, EBADF);
@@ -1950,21 +1948,54 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino,
 
 	if (!plock) {
 		saverr = ret;
+		pthread_mutex_unlock(&inode->plock_mutex);
 		goto out;
 	}
 
+	/*
+	 * plock is now released when inode is going away. We already have
+	 * a reference on inode, so it is guaranteed that plock->fd is
+	 * still around even after dropping inode->plock_mutex lock
+	 */
+	ofd = plock->fd;
+	pthread_mutex_unlock(&inode->plock_mutex);
+
+	/*
+	 * If this lock request can block, request caller to wait for
+	 * notification. Do not access req after this. Once lock is
+	 * available, send a notification instead.
+	 */
+	if (sleep && lock->l_type != F_UNLCK) {
+		/*
+		 * If notification queue is not enabled, can't support async
+		 * locks.
+		 */
+		if (!se->notify_enabled) {
+			saverr = EOPNOTSUPP;
+			goto out;
+		}
+		async_lock = true;
+		unique = req->unique;
+		fuse_reply_wait(req);
+	}
 	/* TODO: Is it alright to modify flock? */
 	lock->l_pid = 0;
-	ret = fcntl(plock->fd, F_OFD_SETLK, lock);
+	if (async_lock)
+		ret = fcntl(ofd, F_OFD_SETLKW, lock);
+	else
+		ret = fcntl(ofd, F_OFD_SETLK, lock);
 	if (ret == -1) {
 		saverr = errno;
 	}
 
 out:
-	pthread_mutex_unlock(&inode->plock_mutex);
 	lo_inode_put(lo, &inode);
 
-	fuse_reply_err(req, saverr);
+	if (!async_lock)
+		fuse_reply_err(req, saverr);
+	else {
+		fuse_lowlevel_notify_lock(se, unique, saverr);
+	}
 }
 
 static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Virtio-fs] [PATCH 4/4] virtiofsd: Implement blocking posix locks
@ 2019-11-15 20:55   ` Vivek Goyal
  0 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-11-15 20:55 UTC (permalink / raw)
  To: virtio-fs, qemu-devel; +Cc: miklos, vgoyal

As of now we don't support fcntl(F_SETLKW) and if we see one, we return
-EOPNOTSUPP.

Change that by accepting these requests and returning a reply immediately
asking caller to wait. Once lock is available, send a notification to
the waiter indicating lock is available.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 contrib/virtiofsd/fuse_kernel.h    |  7 +++
 contrib/virtiofsd/fuse_lowlevel.c  | 23 +++++++-
 contrib/virtiofsd/fuse_lowlevel.h  | 25 ++++++++
 contrib/virtiofsd/fuse_virtio.c    | 94 ++++++++++++++++++++++++++++--
 contrib/virtiofsd/passthrough_ll.c | 49 +++++++++++++---
 5 files changed, 182 insertions(+), 16 deletions(-)

diff --git a/contrib/virtiofsd/fuse_kernel.h b/contrib/virtiofsd/fuse_kernel.h
index 2bdc8b1c88..d4d65c5414 100644
--- a/contrib/virtiofsd/fuse_kernel.h
+++ b/contrib/virtiofsd/fuse_kernel.h
@@ -444,6 +444,7 @@ enum fuse_notify_code {
 	FUSE_NOTIFY_STORE = 4,
 	FUSE_NOTIFY_RETRIEVE = 5,
 	FUSE_NOTIFY_DELETE = 6,
+	FUSE_NOTIFY_LOCK = 7,
 	FUSE_NOTIFY_CODE_MAX,
 };
 
@@ -836,6 +837,12 @@ struct fuse_notify_retrieve_in {
 	uint64_t	dummy4;
 };
 
+struct fuse_notify_lock_out {
+	uint64_t	id;
+	int32_t		error;
+	int32_t		padding;
+};
+
 /* Device ioctls: */
 #define FUSE_DEV_IOC_CLONE	_IOR(229, 0, uint32_t)
 
diff --git a/contrib/virtiofsd/fuse_lowlevel.c b/contrib/virtiofsd/fuse_lowlevel.c
index d4a42d9804..f706e440bf 100644
--- a/contrib/virtiofsd/fuse_lowlevel.c
+++ b/contrib/virtiofsd/fuse_lowlevel.c
@@ -183,7 +183,8 @@ int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
 {
 	struct fuse_out_header out;
 
-	if (error <= -1000 || error > 0) {
+	/* error = 1 has been used to signal client to wait for notificaiton */
+	if (error <= -1000 || error > 1) {
 		fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n",	error);
 		error = -ERANGE;
 	}
@@ -291,6 +292,12 @@ int fuse_reply_err(fuse_req_t req, int err)
 	return send_reply(req, -err, NULL, 0);
 }
 
+int fuse_reply_wait(fuse_req_t req)
+{
+	/* TODO: This is a hack. Fix it */
+	return send_reply(req, 1, NULL, 0);
+}
+
 void fuse_reply_none(fuse_req_t req)
 {
 	fuse_free_req(req);
@@ -2207,6 +2214,20 @@ static int send_notify_iov(struct fuse_session *se, int notify_code,
 	return fuse_send_msg(se, NULL, iov, count);
 }
 
+int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t req_id,
+			      int32_t error)
+{
+	struct fuse_notify_lock_out outarg;
+	struct iovec iov[2];
+
+	outarg.id = req_id;
+	outarg.error = -error;
+
+	iov[1].iov_base = &outarg;
+	iov[1].iov_len = sizeof(outarg);
+	return send_notify_iov(se, FUSE_NOTIFY_LOCK, iov, 2);
+}
+
 int fuse_lowlevel_notify_poll(struct fuse_pollhandle *ph)
 {
 	if (ph != NULL) {
diff --git a/contrib/virtiofsd/fuse_lowlevel.h b/contrib/virtiofsd/fuse_lowlevel.h
index e664d2d12d..f0a94683b5 100644
--- a/contrib/virtiofsd/fuse_lowlevel.h
+++ b/contrib/virtiofsd/fuse_lowlevel.h
@@ -1251,6 +1251,22 @@ struct fuse_lowlevel_ops {
  */
 int fuse_reply_err(fuse_req_t req, int err);
 
+/**
+ * Ask caller to wait for lock.
+ *
+ * Possible requests:
+ *   setlkw
+ *
+ * If caller sends a blocking lock request (setlkw), then reply to caller
+ * that wait for lock to be available. Once lock is available caller will
+ * receive a notification with request's unique id. Notification will
+ * carry info whether lock was successfully obtained or not.
+ *
+ * @param req request handle
+ * @return zero for success, -errno for failure to send reply
+ */
+int fuse_reply_wait(fuse_req_t req);
+
 /**
  * Don't send reply
  *
@@ -1704,6 +1720,15 @@ int fuse_lowlevel_notify_delete(struct fuse_session *se,
 int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
 			       off_t offset, struct fuse_bufvec *bufv,
 			       enum fuse_buf_copy_flags flags);
+/**
+ * Notify event related to previous lock request
+ *
+ * @param se the session object
+ * @param req_id the id of the request which requested setlkw
+ * @param error zero for success, -errno for the failure
+ */
+int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t req_id,
+			      int32_t error);
 
 /* ----------------------------------------------------------- *
  * Utility functions					       *
diff --git a/contrib/virtiofsd/fuse_virtio.c b/contrib/virtiofsd/fuse_virtio.c
index 982b6ad0bd..98d27e7642 100644
--- a/contrib/virtiofsd/fuse_virtio.c
+++ b/contrib/virtiofsd/fuse_virtio.c
@@ -215,6 +215,81 @@ static void copy_iov(struct iovec *src_iov, int src_count,
     }
 }
 
+static int virtio_send_notify_msg(struct fuse_session *se, struct iovec *iov,
+				  int count)
+{
+    struct fv_QueueInfo *qi;
+    VuDev *dev = &se->virtio_dev->dev;
+    VuVirtq *q;
+    FVRequest *req;
+    VuVirtqElement *elem;
+    unsigned int in_num, bad_in_num = 0, bad_out_num = 0;
+    struct fuse_out_header *out = iov[0].iov_base;
+    size_t in_len, tosend_len = iov_size(iov, count);
+    struct iovec *in_sg;
+    int ret = 0;
+
+    /* Notifications have unique == 0 */
+    assert (!out->unique);
+
+    if (!se->notify_enabled)
+        return -EOPNOTSUPP;
+
+    /* If notifications are enabled, queue index 1 is notification queue */
+    qi = se->virtio_dev->qi[1];
+    q = vu_get_queue(dev, qi->qidx);
+
+    pthread_rwlock_rdlock(&qi->virtio_dev->vu_dispatch_rwlock);
+    pthread_mutex_lock(&qi->vq_lock);
+    /* Pop an element from queue */
+    req = vu_queue_pop(dev, q, sizeof(FVRequest), &bad_in_num, &bad_out_num);
+    if (!req) {
+        /* TODO: Implement some sort of ring buffer and queue notifications
+	 * on that and send these later when notification queue has space
+	 * available.
+	 */
+        return -ENOSPC;
+    }
+    pthread_mutex_unlock(&qi->vq_lock);
+    pthread_rwlock_unlock(&qi->virtio_dev->vu_dispatch_rwlock);
+
+    out->len = tosend_len;
+    elem = &req->elem;
+    in_num = elem->in_num;
+    in_sg = elem->in_sg;
+    in_len = iov_size(in_sg, in_num);
+    fuse_log(FUSE_LOG_DEBUG, "%s: elem %d: with %d in desc of length %zd\n",
+             __func__, elem->index, in_num,  in_len);
+
+    if (in_len < sizeof(struct fuse_out_header)) {
+        fuse_log(FUSE_LOG_ERR, "%s: elem %d too short for out_header\n",
+                 __func__, elem->index);
+        ret = -E2BIG;
+        goto out;
+    }
+
+    if (in_len < tosend_len) {
+        fuse_log(FUSE_LOG_ERR, "%s: elem %d too small for data len"
+                 " %zd\n", __func__, elem->index, tosend_len);
+        ret = -E2BIG;
+        goto out;
+    }
+
+    /* First copy the header data from iov->in_sg */
+    copy_iov(iov, count, in_sg, in_num, tosend_len);
+
+    /* TODO: Add bad_innum handling */
+    pthread_rwlock_rdlock(&qi->virtio_dev->vu_dispatch_rwlock);
+    pthread_mutex_lock(&qi->vq_lock);
+    vu_queue_push(dev, q, elem, tosend_len);
+    vu_queue_notify(dev, q);
+    pthread_mutex_unlock(&qi->vq_lock);
+    pthread_rwlock_unlock(&qi->virtio_dev->vu_dispatch_rwlock);
+out:
+    free(req);
+    return ret;
+}
+
 /*
  * Called back by ll whenever it wants to send a reply/message back
  * The 1st element of the iov starts with the fuse_out_header
@@ -223,11 +298,11 @@ static void copy_iov(struct iovec *src_iov, int src_count,
 int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
                     struct iovec *iov, int count)
 {
-    FVRequest *req = container_of(ch, FVRequest, ch);
-    struct fv_QueueInfo *qi = ch->qi;
+    FVRequest *req;
+    struct fv_QueueInfo *qi;
     VuDev *dev = &se->virtio_dev->dev;
-    VuVirtq *q = vu_get_queue(dev, qi->qidx);
-    VuVirtqElement *elem = &req->elem;
+    VuVirtq *q;
+    VuVirtqElement *elem;
     int ret = 0;
 
     assert(count >= 1);
@@ -238,8 +313,15 @@ int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
 
     size_t tosend_len = iov_size(iov, count);
 
-    /* unique == 0 is notification, which we don't support */
-    assert(out->unique);
+    /* unique == 0 is notification */
+    if (!out->unique)
+        return virtio_send_notify_msg(se, iov, count);
+
+    assert(ch);
+    req = container_of(ch, FVRequest, ch);
+    elem = &req->elem;
+    qi = ch->qi;
+    q = vu_get_queue(dev, qi->qidx);
     assert(!req->reply_sent);
 
     /* The 'in' part of the elem is to qemu */
diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
index 028e7da273..ed52953565 100644
--- a/contrib/virtiofsd/passthrough_ll.c
+++ b/contrib/virtiofsd/passthrough_ll.c
@@ -1925,7 +1925,10 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino,
 	struct lo_data *lo = lo_data(req);
 	struct lo_inode *inode;
 	struct lo_inode_plock *plock;
-	int ret, saverr = 0;
+	int ret, saverr = 0, ofd;
+	uint64_t unique;
+	struct fuse_session *se = req->se;
+	bool async_lock = false;
 
 	fuse_log(FUSE_LOG_DEBUG, "lo_setlk(ino=%" PRIu64 ", flags=%d)"
 		 " cmd=%d pid=%d owner=0x%lx sleep=%d l_whence=%d"
@@ -1933,11 +1936,6 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino,
 		 lock->l_type, lock->l_pid, fi->lock_owner, sleep,
 		 lock->l_whence, lock->l_start, lock->l_len);
 
-	if (sleep) {
-		fuse_reply_err(req, EOPNOTSUPP);
-		return;
-	}
-
 	inode = lo_inode(req, ino);
 	if (!inode) {
 		fuse_reply_err(req, EBADF);
@@ -1950,21 +1948,54 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino,
 
 	if (!plock) {
 		saverr = ret;
+		pthread_mutex_unlock(&inode->plock_mutex);
 		goto out;
 	}
 
+	/*
+	 * plock is now released when inode is going away. We already have
+	 * a reference on inode, so it is guaranteed that plock->fd is
+	 * still around even after dropping inode->plock_mutex lock
+	 */
+	ofd = plock->fd;
+	pthread_mutex_unlock(&inode->plock_mutex);
+
+	/*
+	 * If this lock request can block, request caller to wait for
+	 * notification. Do not access req after this. Once lock is
+	 * available, send a notification instead.
+	 */
+	if (sleep && lock->l_type != F_UNLCK) {
+		/*
+		 * If notification queue is not enabled, can't support async
+		 * locks.
+		 */
+		if (!se->notify_enabled) {
+			saverr = EOPNOTSUPP;
+			goto out;
+		}
+		async_lock = true;
+		unique = req->unique;
+		fuse_reply_wait(req);
+	}
 	/* TODO: Is it alright to modify flock? */
 	lock->l_pid = 0;
-	ret = fcntl(plock->fd, F_OFD_SETLK, lock);
+	if (async_lock)
+		ret = fcntl(ofd, F_OFD_SETLKW, lock);
+	else
+		ret = fcntl(ofd, F_OFD_SETLK, lock);
 	if (ret == -1) {
 		saverr = errno;
 	}
 
 out:
-	pthread_mutex_unlock(&inode->plock_mutex);
 	lo_inode_put(lo, &inode);
 
-	fuse_reply_err(req, saverr);
+	if (!async_lock)
+		fuse_reply_err(req, saverr);
+	else {
+		fuse_lowlevel_notify_lock(se, unique, saverr);
+	}
 }
 
 static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/4] virtiofsd: Release file locks using F_UNLCK
  2019-11-15 20:55   ` [Virtio-fs] " Vivek Goyal
@ 2019-11-22 10:07     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 33+ messages in thread
From: Stefan Hajnoczi @ 2019-11-22 10:07 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, miklos, qemu-devel, dgilbert

[-- Attachment #1: Type: text/plain, Size: 841 bytes --]

On Fri, Nov 15, 2019 at 03:55:40PM -0500, Vivek Goyal wrote:
> diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
> index bc214df0c7..028e7da273 100644
> --- a/contrib/virtiofsd/passthrough_ll.c
> +++ b/contrib/virtiofsd/passthrough_ll.c
> @@ -936,6 +936,14 @@ static void put_shared(struct lo_data *lo, struct lo_inode *inode)
>  	}
>  }
>  
> +static void release_plock(gpointer data)

The name posix_locks_value_destroy() would be clearer because it matches
g_hash_table_new_full() terminology and the function cannot be confused
with a lock acquire/release operation.

This patch conflicts with the cleanups that are currently being made to
virtiofsd:
https://github.com/stefanha/qemu/commit/1e493175feca58a81a2d0cbdac93b92e5425d850#diff-ca2dea995d1e6cdb95c8a47c7cca51ceR773

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 484 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Virtio-fs] [PATCH 1/4] virtiofsd: Release file locks using F_UNLCK
@ 2019-11-22 10:07     ` Stefan Hajnoczi
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Hajnoczi @ 2019-11-22 10:07 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, miklos, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 841 bytes --]

On Fri, Nov 15, 2019 at 03:55:40PM -0500, Vivek Goyal wrote:
> diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
> index bc214df0c7..028e7da273 100644
> --- a/contrib/virtiofsd/passthrough_ll.c
> +++ b/contrib/virtiofsd/passthrough_ll.c
> @@ -936,6 +936,14 @@ static void put_shared(struct lo_data *lo, struct lo_inode *inode)
>  	}
>  }
>  
> +static void release_plock(gpointer data)

The name posix_locks_value_destroy() would be clearer because it matches
g_hash_table_new_full() terminology and the function cannot be confused
with a lock acquire/release operation.

This patch conflicts with the cleanups that are currently being made to
virtiofsd:
https://github.com/stefanha/qemu/commit/1e493175feca58a81a2d0cbdac93b92e5425d850#diff-ca2dea995d1e6cdb95c8a47c7cca51ceR773

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 484 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/4] virtiofd: Create a notification queue
  2019-11-15 20:55   ` [Virtio-fs] " Vivek Goyal
@ 2019-11-22 10:19     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 33+ messages in thread
From: Stefan Hajnoczi @ 2019-11-22 10:19 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, miklos, qemu-devel, dgilbert

[-- Attachment #1: Type: text/plain, Size: 4250 bytes --]

On Fri, Nov 15, 2019 at 03:55:41PM -0500, Vivek Goyal wrote:
>  /* Callback from libvhost-user */
>  static void fv_set_features(VuDev *dev, uint64_t features)
>  {
> +    struct fv_VuDev *vud = container_of(dev, struct fv_VuDev, dev);
> +    struct fuse_session *se = vud->se;
> +
> +    if ((1 << VIRTIO_FS_F_NOTIFICATION) & features) {

For consistency 1ull should be used.  That way the reader does not have
to check the bit position to verify that the bitmap isn't truncated at
32 bits.

> +        vud->notify_enabled = true;
> +        se->notify_enabled = true;

Only one copy of this field is needed.  vud has a pointer to se.

> +    }
>  }
>  
>  /*
> @@ -662,6 +671,65 @@ static void fv_queue_worker(gpointer data, gpointer user_data)
>      free(req);
>  }
>  
> +static void *fv_queue_notify_thread(void *opaque)
> +{
> +    struct fv_QueueInfo *qi = opaque;
> +
> +    fuse_log(FUSE_LOG_INFO, "%s: Start for queue %d kick_fd %d\n", __func__,
> +             qi->qidx, qi->kick_fd);
> +
> +    while (1) {
> +        struct pollfd pf[2];
> +
> +        pf[0].fd = qi->kick_fd;
> +        pf[0].events = POLLIN;
> +        pf[0].revents = 0;
> +        pf[1].fd = qi->kill_fd;
> +        pf[1].events = POLLIN;
> +        pf[1].revents = 0;
> +
> +        fuse_log(FUSE_LOG_DEBUG, "%s: Waiting for Queue %d event\n", __func__,
> +                 qi->qidx);
> +        int poll_res = ppoll(pf, 2, NULL, NULL);
> +
> +        if (poll_res == -1) {
> +            if (errno == EINTR) {
> +                fuse_log(FUSE_LOG_INFO, "%s: ppoll interrupted, going around\n",
> +                         __func__);
> +                continue;
> +            }
> +            fuse_log(FUSE_LOG_ERR, "fv_queue_thread ppoll: %m\n");
> +            break;
> +        }
> +        assert(poll_res >= 1);
> +        if (pf[0].revents & (POLLERR | POLLHUP | POLLNVAL)) {
> +            fuse_log(FUSE_LOG_ERR, "%s: Unexpected poll revents %x Queue %d\n",
> +                     __func__, pf[0].revents, qi->qidx);
> +             break;
> +        }
> +        if (pf[1].revents & (POLLERR | POLLHUP | POLLNVAL)) {
> +            fuse_log(FUSE_LOG_ERR, "%s: Unexpected poll revents %x Queue %d"
> +                     "killfd\n", __func__, pf[1].revents, qi->qidx);
> +            break;
> +        }
> +        if (pf[1].revents) {
> +            fuse_log(FUSE_LOG_INFO, "%s: kill event on queue %d - quitting\n",
> +                     __func__, qi->qidx);
> +            break;
> +        }
> +        assert(pf[0].revents & POLLIN);
> +        fuse_log(FUSE_LOG_DEBUG, "%s: Got queue event on Queue %d\n", __func__,
> +                 qi->qidx);
> +
> +        eventfd_t evalue;
> +        if (eventfd_read(qi->kick_fd, &evalue)) {
> +            fuse_log(FUSE_LOG_ERR, "Eventfd_read for queue: %m\n");
> +            break;
> +        }
> +    }
> +    return NULL;
> +}

It's difficult to review function without any actual functionality using
the virtqueue.  I'm not sure a thread is even needed since the device
only needs to get a buffer when it has a notification for the driver.
I'll have to wait for the following patches to see what happens here...

> @@ -378,12 +382,23 @@ static void vuf_set_status(VirtIODevice *vdev, uint8_t status)
>      }
>  }
>  
> -static uint64_t vuf_get_features(VirtIODevice *vdev,
> -                                      uint64_t requested_features,
> -                                      Error **errp)
> +static uint64_t vuf_get_features(VirtIODevice *vdev, uint64_t features,
> +                                 Error **errp)
>  {
> -    /* No feature bits used yet */
> -    return requested_features;
> +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> +
> +    virtio_add_feature(&features, VIRTIO_FS_F_NOTIFICATION);
> +
> +    return vhost_get_features(&fs->vhost_dev, user_feature_bits, features);
> +}
> +
> +static void vuf_set_features(VirtIODevice *vdev, uint64_t features)
> +{
> +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> +
> +    if (virtio_has_feature(features, VIRTIO_FS_F_NOTIFICATION)) {
> +        fs->notify_enabled = true;

This field is unused, please remove it.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Virtio-fs] [PATCH 2/4] virtiofd: Create a notification queue
@ 2019-11-22 10:19     ` Stefan Hajnoczi
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Hajnoczi @ 2019-11-22 10:19 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, miklos, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 4250 bytes --]

On Fri, Nov 15, 2019 at 03:55:41PM -0500, Vivek Goyal wrote:
>  /* Callback from libvhost-user */
>  static void fv_set_features(VuDev *dev, uint64_t features)
>  {
> +    struct fv_VuDev *vud = container_of(dev, struct fv_VuDev, dev);
> +    struct fuse_session *se = vud->se;
> +
> +    if ((1 << VIRTIO_FS_F_NOTIFICATION) & features) {

For consistency 1ull should be used.  That way the reader does not have
to check the bit position to verify that the bitmap isn't truncated at
32 bits.

> +        vud->notify_enabled = true;
> +        se->notify_enabled = true;

Only one copy of this field is needed.  vud has a pointer to se.

> +    }
>  }
>  
>  /*
> @@ -662,6 +671,65 @@ static void fv_queue_worker(gpointer data, gpointer user_data)
>      free(req);
>  }
>  
> +static void *fv_queue_notify_thread(void *opaque)
> +{
> +    struct fv_QueueInfo *qi = opaque;
> +
> +    fuse_log(FUSE_LOG_INFO, "%s: Start for queue %d kick_fd %d\n", __func__,
> +             qi->qidx, qi->kick_fd);
> +
> +    while (1) {
> +        struct pollfd pf[2];
> +
> +        pf[0].fd = qi->kick_fd;
> +        pf[0].events = POLLIN;
> +        pf[0].revents = 0;
> +        pf[1].fd = qi->kill_fd;
> +        pf[1].events = POLLIN;
> +        pf[1].revents = 0;
> +
> +        fuse_log(FUSE_LOG_DEBUG, "%s: Waiting for Queue %d event\n", __func__,
> +                 qi->qidx);
> +        int poll_res = ppoll(pf, 2, NULL, NULL);
> +
> +        if (poll_res == -1) {
> +            if (errno == EINTR) {
> +                fuse_log(FUSE_LOG_INFO, "%s: ppoll interrupted, going around\n",
> +                         __func__);
> +                continue;
> +            }
> +            fuse_log(FUSE_LOG_ERR, "fv_queue_thread ppoll: %m\n");
> +            break;
> +        }
> +        assert(poll_res >= 1);
> +        if (pf[0].revents & (POLLERR | POLLHUP | POLLNVAL)) {
> +            fuse_log(FUSE_LOG_ERR, "%s: Unexpected poll revents %x Queue %d\n",
> +                     __func__, pf[0].revents, qi->qidx);
> +             break;
> +        }
> +        if (pf[1].revents & (POLLERR | POLLHUP | POLLNVAL)) {
> +            fuse_log(FUSE_LOG_ERR, "%s: Unexpected poll revents %x Queue %d"
> +                     "killfd\n", __func__, pf[1].revents, qi->qidx);
> +            break;
> +        }
> +        if (pf[1].revents) {
> +            fuse_log(FUSE_LOG_INFO, "%s: kill event on queue %d - quitting\n",
> +                     __func__, qi->qidx);
> +            break;
> +        }
> +        assert(pf[0].revents & POLLIN);
> +        fuse_log(FUSE_LOG_DEBUG, "%s: Got queue event on Queue %d\n", __func__,
> +                 qi->qidx);
> +
> +        eventfd_t evalue;
> +        if (eventfd_read(qi->kick_fd, &evalue)) {
> +            fuse_log(FUSE_LOG_ERR, "Eventfd_read for queue: %m\n");
> +            break;
> +        }
> +    }
> +    return NULL;
> +}

It's difficult to review function without any actual functionality using
the virtqueue.  I'm not sure a thread is even needed since the device
only needs to get a buffer when it has a notification for the driver.
I'll have to wait for the following patches to see what happens here...

> @@ -378,12 +382,23 @@ static void vuf_set_status(VirtIODevice *vdev, uint8_t status)
>      }
>  }
>  
> -static uint64_t vuf_get_features(VirtIODevice *vdev,
> -                                      uint64_t requested_features,
> -                                      Error **errp)
> +static uint64_t vuf_get_features(VirtIODevice *vdev, uint64_t features,
> +                                 Error **errp)
>  {
> -    /* No feature bits used yet */
> -    return requested_features;
> +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> +
> +    virtio_add_feature(&features, VIRTIO_FS_F_NOTIFICATION);
> +
> +    return vhost_get_features(&fs->vhost_dev, user_feature_bits, features);
> +}
> +
> +static void vuf_set_features(VirtIODevice *vdev, uint64_t features)
> +{
> +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> +
> +    if (virtio_has_feature(features, VIRTIO_FS_F_NOTIFICATION)) {
> +        fs->notify_enabled = true;

This field is unused, please remove it.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/4] virtiofsd: Specify size of notification buffer using config space
  2019-11-15 20:55   ` [Virtio-fs] " Vivek Goyal
@ 2019-11-22 10:33     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 33+ messages in thread
From: Stefan Hajnoczi @ 2019-11-22 10:33 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, miklos, qemu-devel, dgilbert

[-- Attachment #1: Type: text/plain, Size: 4180 bytes --]

On Fri, Nov 15, 2019 at 03:55:42PM -0500, Vivek Goyal wrote:
> diff --git a/contrib/virtiofsd/fuse_virtio.c b/contrib/virtiofsd/fuse_virtio.c
> index 411114c9b3..982b6ad0bd 100644
> --- a/contrib/virtiofsd/fuse_virtio.c
> +++ b/contrib/virtiofsd/fuse_virtio.c
> @@ -109,7 +109,8 @@ static uint64_t fv_get_features(VuDev *dev)
>      uint64_t features;
>  
>      features = 1ull << VIRTIO_F_VERSION_1 |
> -               1ull << VIRTIO_FS_F_NOTIFICATION;
> +               1ull << VIRTIO_FS_F_NOTIFICATION |
> +               1ull << VHOST_USER_F_PROTOCOL_FEATURES;

This is not needed since VHOST_USER_F_PROTOCOL_FEATURES is already added
by vu_get_features_exec():

  vu_get_features_exec(VuDev *dev, VhostUserMsg *vmsg)
  {
      vmsg->payload.u64 =
          1ULL << VHOST_F_LOG_ALL |
          1ULL << VHOST_USER_F_PROTOCOL_FEATURES;

      if (dev->iface->get_features) {
          vmsg->payload.u64 |= dev->iface->get_features(dev);
      }

>  
>      return features;
>  }
> @@ -927,6 +928,27 @@ static bool fv_queue_order(VuDev *dev, int qidx)
>      return false;
>  }
>  
> +static uint64_t fv_get_protocol_features(VuDev *dev)
> +{
> +	return 1ull << VHOST_USER_PROTOCOL_F_CONFIG;
> +}

Please change vu_get_protocol_features_exec() in a separate patch so
that devices don't need this boilerplate .get_protocol_features() code:

  static bool
  vu_get_protocol_features_exec(VuDev *dev, VhostUserMsg *vmsg)
  {
      ...
 -    if (dev->iface->get_config && dev->iface->set_config) {
 +    if (dev->iface->get_config || dev->iface->set_config) {
          features |= 1ULL << VHOST_USER_PROTOCOL_F_CONFIG;
      }

> +
> +static int fv_get_config(VuDev *dev, uint8_t *config, uint32_t len)
> +{
> +	struct virtio_fs_config fscfg = {};
> +
> +	fuse_log(FUSE_LOG_DEBUG, "%s:Setting notify_buf_size=%d\n", __func__,
> +                 sizeof(struct fuse_notify_lock_out));
> +	/*
> +	 * As of now only notification related to lock is supported. As more
> +	 * notification types are supported, bump up the size accordingly.
> +	 */
> +	fscfg.notify_buf_size = sizeof(struct fuse_notify_lock_out);

Missing cpu_to_le32().

I'm not sure about specifying the size precisely down to the last byte
because any change to guest-visible aspects of the device (like VIRTIO
Configuration Space) are not compatible across live migration.  It will
be necessary to introduce a device version command-line option for live
migration compatibility so that existing guests can be migrated to a new
virtiofsd without the device changing underneath them.

How about rounding this up to 4 KB?

>  static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
>  {
>      VHostUserFS *fs = VHOST_USER_FS(vdev);
>      struct virtio_fs_config fscfg = {};
> +    int ret;
> +
> +    /*
> +     * As of now we only get notification buffer size from device. And that's
> +     * needed only if notification queue is enabled.
> +     */
> +    if (fs->notify_enabled) {
> +        ret = vhost_dev_get_config(&fs->vhost_dev, (uint8_t *)&fs->fscfg,
> +                                   sizeof(struct virtio_fs_config));
> +	if (ret < 0) {

Indentation.

> +            error_report("vhost-user-fs: get device config space failed."
> +                         " ret=%d\n", ret);
> +            return;
> +        }

Missing le32_to_cpu() for notify_buf_size.

> +    }
>  
>      memcpy((char *)fscfg.tag, fs->conf.tag,
>             MIN(strlen(fs->conf.tag) + 1, sizeof(fscfg.tag)));
>  
>      virtio_stl_p(vdev, &fscfg.num_request_queues, fs->conf.num_request_queues);
> +    virtio_stl_p(vdev, &fscfg.notify_buf_size, fs->fscfg.notify_buf_size);
>  
>      memcpy(config, &fscfg, sizeof(fscfg));
>  }
> @@ -545,6 +569,8 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
>      fs->vhost_dev.nvqs = 2 + fs->conf.num_request_queues;
>  
>      fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
> +
> +    vhost_dev_set_config_notifier(&fs->vhost_dev, &fs_ops);

Is this really needed since vhost_user_fs_handle_config_change() ignores
it?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Virtio-fs] [PATCH 3/4] virtiofsd: Specify size of notification buffer using config space
@ 2019-11-22 10:33     ` Stefan Hajnoczi
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Hajnoczi @ 2019-11-22 10:33 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, miklos, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 4180 bytes --]

On Fri, Nov 15, 2019 at 03:55:42PM -0500, Vivek Goyal wrote:
> diff --git a/contrib/virtiofsd/fuse_virtio.c b/contrib/virtiofsd/fuse_virtio.c
> index 411114c9b3..982b6ad0bd 100644
> --- a/contrib/virtiofsd/fuse_virtio.c
> +++ b/contrib/virtiofsd/fuse_virtio.c
> @@ -109,7 +109,8 @@ static uint64_t fv_get_features(VuDev *dev)
>      uint64_t features;
>  
>      features = 1ull << VIRTIO_F_VERSION_1 |
> -               1ull << VIRTIO_FS_F_NOTIFICATION;
> +               1ull << VIRTIO_FS_F_NOTIFICATION |
> +               1ull << VHOST_USER_F_PROTOCOL_FEATURES;

This is not needed since VHOST_USER_F_PROTOCOL_FEATURES is already added
by vu_get_features_exec():

  vu_get_features_exec(VuDev *dev, VhostUserMsg *vmsg)
  {
      vmsg->payload.u64 =
          1ULL << VHOST_F_LOG_ALL |
          1ULL << VHOST_USER_F_PROTOCOL_FEATURES;

      if (dev->iface->get_features) {
          vmsg->payload.u64 |= dev->iface->get_features(dev);
      }

>  
>      return features;
>  }
> @@ -927,6 +928,27 @@ static bool fv_queue_order(VuDev *dev, int qidx)
>      return false;
>  }
>  
> +static uint64_t fv_get_protocol_features(VuDev *dev)
> +{
> +	return 1ull << VHOST_USER_PROTOCOL_F_CONFIG;
> +}

Please change vu_get_protocol_features_exec() in a separate patch so
that devices don't need this boilerplate .get_protocol_features() code:

  static bool
  vu_get_protocol_features_exec(VuDev *dev, VhostUserMsg *vmsg)
  {
      ...
 -    if (dev->iface->get_config && dev->iface->set_config) {
 +    if (dev->iface->get_config || dev->iface->set_config) {
          features |= 1ULL << VHOST_USER_PROTOCOL_F_CONFIG;
      }

> +
> +static int fv_get_config(VuDev *dev, uint8_t *config, uint32_t len)
> +{
> +	struct virtio_fs_config fscfg = {};
> +
> +	fuse_log(FUSE_LOG_DEBUG, "%s:Setting notify_buf_size=%d\n", __func__,
> +                 sizeof(struct fuse_notify_lock_out));
> +	/*
> +	 * As of now only notification related to lock is supported. As more
> +	 * notification types are supported, bump up the size accordingly.
> +	 */
> +	fscfg.notify_buf_size = sizeof(struct fuse_notify_lock_out);

Missing cpu_to_le32().

I'm not sure about specifying the size precisely down to the last byte
because any change to guest-visible aspects of the device (like VIRTIO
Configuration Space) are not compatible across live migration.  It will
be necessary to introduce a device version command-line option for live
migration compatibility so that existing guests can be migrated to a new
virtiofsd without the device changing underneath them.

How about rounding this up to 4 KB?

>  static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
>  {
>      VHostUserFS *fs = VHOST_USER_FS(vdev);
>      struct virtio_fs_config fscfg = {};
> +    int ret;
> +
> +    /*
> +     * As of now we only get notification buffer size from device. And that's
> +     * needed only if notification queue is enabled.
> +     */
> +    if (fs->notify_enabled) {
> +        ret = vhost_dev_get_config(&fs->vhost_dev, (uint8_t *)&fs->fscfg,
> +                                   sizeof(struct virtio_fs_config));
> +	if (ret < 0) {

Indentation.

> +            error_report("vhost-user-fs: get device config space failed."
> +                         " ret=%d\n", ret);
> +            return;
> +        }

Missing le32_to_cpu() for notify_buf_size.

> +    }
>  
>      memcpy((char *)fscfg.tag, fs->conf.tag,
>             MIN(strlen(fs->conf.tag) + 1, sizeof(fscfg.tag)));
>  
>      virtio_stl_p(vdev, &fscfg.num_request_queues, fs->conf.num_request_queues);
> +    virtio_stl_p(vdev, &fscfg.notify_buf_size, fs->fscfg.notify_buf_size);
>  
>      memcpy(config, &fscfg, sizeof(fscfg));
>  }
> @@ -545,6 +569,8 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
>      fs->vhost_dev.nvqs = 2 + fs->conf.num_request_queues;
>  
>      fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
> +
> +    vhost_dev_set_config_notifier(&fs->vhost_dev, &fs_ops);

Is this really needed since vhost_user_fs_handle_config_change() ignores
it?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/4] virtiofsd: Implement blocking posix locks
  2019-11-15 20:55   ` [Virtio-fs] " Vivek Goyal
@ 2019-11-22 10:53     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 33+ messages in thread
From: Stefan Hajnoczi @ 2019-11-22 10:53 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, miklos, qemu-devel, dgilbert

[-- Attachment #1: Type: text/plain, Size: 2026 bytes --]

On Fri, Nov 15, 2019 at 03:55:43PM -0500, Vivek Goyal wrote:
> diff --git a/contrib/virtiofsd/fuse_lowlevel.c b/contrib/virtiofsd/fuse_lowlevel.c
> index d4a42d9804..f706e440bf 100644
> --- a/contrib/virtiofsd/fuse_lowlevel.c
> +++ b/contrib/virtiofsd/fuse_lowlevel.c
> @@ -183,7 +183,8 @@ int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
>  {
>  	struct fuse_out_header out;
>  
> -	if (error <= -1000 || error > 0) {
> +	/* error = 1 has been used to signal client to wait for notificaiton */
> +	if (error <= -1000 || error > 1) {
>  		fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n",	error);
>  		error = -ERANGE;
>  	}

What is this?

> +int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t req_id,
> +			      int32_t error)
> +{
> +	struct fuse_notify_lock_out outarg;

Missing = {} initialization to avoid information leaks to the guest.

> @@ -1704,6 +1720,15 @@ int fuse_lowlevel_notify_delete(struct fuse_session *se,
>  int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
>  			       off_t offset, struct fuse_bufvec *bufv,
>  			       enum fuse_buf_copy_flags flags);
> +/**
> + * Notify event related to previous lock request
> + *
> + * @param se the session object
> + * @param req_id the id of the request which requested setlkw

The rest of the code calls this id "unique":

  + * @param req_unique the unique id of the setlkw request

> +    /* Pop an element from queue */
> +    req = vu_queue_pop(dev, q, sizeof(FVRequest), &bad_in_num, &bad_out_num);
> +    if (!req) {
> +        /* TODO: Implement some sort of ring buffer and queue notifications
> +	 * on that and send these later when notification queue has space
> +	 * available.
> +	 */
> +        return -ENOSPC;

Ah, I thought the point of the notifications processing thread was
exactly this case.  It could wake any threads waiting for buffers.

This wakeup could be implemented with a condvar - no ring buffer
necessary.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Virtio-fs] [PATCH 4/4] virtiofsd: Implement blocking posix locks
@ 2019-11-22 10:53     ` Stefan Hajnoczi
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Hajnoczi @ 2019-11-22 10:53 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, miklos, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2026 bytes --]

On Fri, Nov 15, 2019 at 03:55:43PM -0500, Vivek Goyal wrote:
> diff --git a/contrib/virtiofsd/fuse_lowlevel.c b/contrib/virtiofsd/fuse_lowlevel.c
> index d4a42d9804..f706e440bf 100644
> --- a/contrib/virtiofsd/fuse_lowlevel.c
> +++ b/contrib/virtiofsd/fuse_lowlevel.c
> @@ -183,7 +183,8 @@ int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
>  {
>  	struct fuse_out_header out;
>  
> -	if (error <= -1000 || error > 0) {
> +	/* error = 1 has been used to signal client to wait for notificaiton */
> +	if (error <= -1000 || error > 1) {
>  		fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n",	error);
>  		error = -ERANGE;
>  	}

What is this?

> +int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t req_id,
> +			      int32_t error)
> +{
> +	struct fuse_notify_lock_out outarg;

Missing = {} initialization to avoid information leaks to the guest.

> @@ -1704,6 +1720,15 @@ int fuse_lowlevel_notify_delete(struct fuse_session *se,
>  int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
>  			       off_t offset, struct fuse_bufvec *bufv,
>  			       enum fuse_buf_copy_flags flags);
> +/**
> + * Notify event related to previous lock request
> + *
> + * @param se the session object
> + * @param req_id the id of the request which requested setlkw

The rest of the code calls this id "unique":

  + * @param req_unique the unique id of the setlkw request

> +    /* Pop an element from queue */
> +    req = vu_queue_pop(dev, q, sizeof(FVRequest), &bad_in_num, &bad_out_num);
> +    if (!req) {
> +        /* TODO: Implement some sort of ring buffer and queue notifications
> +	 * on that and send these later when notification queue has space
> +	 * available.
> +	 */
> +        return -ENOSPC;

Ah, I thought the point of the notifications processing thread was
exactly this case.  It could wake any threads waiting for buffers.

This wakeup could be implemented with a condvar - no ring buffer
necessary.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/4] virtiofsd: Release file locks using F_UNLCK
  2019-11-22 10:07     ` [Virtio-fs] " Stefan Hajnoczi
@ 2019-11-22 13:45       ` Vivek Goyal
  -1 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-11-22 13:45 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-fs, miklos, qemu-devel, dgilbert

On Fri, Nov 22, 2019 at 10:07:13AM +0000, Stefan Hajnoczi wrote:
> On Fri, Nov 15, 2019 at 03:55:40PM -0500, Vivek Goyal wrote:
> > diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
> > index bc214df0c7..028e7da273 100644
> > --- a/contrib/virtiofsd/passthrough_ll.c
> > +++ b/contrib/virtiofsd/passthrough_ll.c
> > @@ -936,6 +936,14 @@ static void put_shared(struct lo_data *lo, struct lo_inode *inode)
> >  	}
> >  }
> >  
> > +static void release_plock(gpointer data)
> 
> The name posix_locks_value_destroy() would be clearer because it matches
> g_hash_table_new_full() terminology and the function cannot be confused
> with a lock acquire/release operation.

Ok, will use this name.

> 
> This patch conflicts with the cleanups that are currently being made to
> virtiofsd:
> https://github.com/stefanha/qemu/commit/1e493175feca58a81a2d0cbdac93b92e5425d850#diff-ca2dea995d1e6cdb95c8a47c7cca51ceR773

Yes it will. I see you are removing element from hash table on lo_flush().
This works fine today but with waiting locks, we drop the
inode->plock_mutex lock and then wait for the lock and expect
"lo_inode_plock" to not go away.

So I don't think you can remove the element from hash table upon
lo_flush(). May be we can refcount lo_inode_plock structure and first
release all the locks using setlk(UNLCK) and then drop the reference. If
this is last refernce, it will be freed.

And waiting lock code, will obtain a refernce under inode->posix_locks
and then wait for lock outside the lock.

IOW, I will say don't do this optimization of lookup + remove because
it will not work with blocking locks.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Virtio-fs] [PATCH 1/4] virtiofsd: Release file locks using F_UNLCK
@ 2019-11-22 13:45       ` Vivek Goyal
  0 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-11-22 13:45 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-fs, miklos, qemu-devel

On Fri, Nov 22, 2019 at 10:07:13AM +0000, Stefan Hajnoczi wrote:
> On Fri, Nov 15, 2019 at 03:55:40PM -0500, Vivek Goyal wrote:
> > diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
> > index bc214df0c7..028e7da273 100644
> > --- a/contrib/virtiofsd/passthrough_ll.c
> > +++ b/contrib/virtiofsd/passthrough_ll.c
> > @@ -936,6 +936,14 @@ static void put_shared(struct lo_data *lo, struct lo_inode *inode)
> >  	}
> >  }
> >  
> > +static void release_plock(gpointer data)
> 
> The name posix_locks_value_destroy() would be clearer because it matches
> g_hash_table_new_full() terminology and the function cannot be confused
> with a lock acquire/release operation.

Ok, will use this name.

> 
> This patch conflicts with the cleanups that are currently being made to
> virtiofsd:
> https://github.com/stefanha/qemu/commit/1e493175feca58a81a2d0cbdac93b92e5425d850#diff-ca2dea995d1e6cdb95c8a47c7cca51ceR773

Yes it will. I see you are removing element from hash table on lo_flush().
This works fine today but with waiting locks, we drop the
inode->plock_mutex lock and then wait for the lock and expect
"lo_inode_plock" to not go away.

So I don't think you can remove the element from hash table upon
lo_flush(). May be we can refcount lo_inode_plock structure and first
release all the locks using setlk(UNLCK) and then drop the reference. If
this is last refernce, it will be freed.

And waiting lock code, will obtain a refernce under inode->posix_locks
and then wait for lock outside the lock.

IOW, I will say don't do this optimization of lookup + remove because
it will not work with blocking locks.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/4] virtiofd: Create a notification queue
  2019-11-22 10:19     ` [Virtio-fs] " Stefan Hajnoczi
@ 2019-11-22 14:47       ` Vivek Goyal
  -1 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-11-22 14:47 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-fs, miklos, qemu-devel, dgilbert

On Fri, Nov 22, 2019 at 10:19:03AM +0000, Stefan Hajnoczi wrote:
> On Fri, Nov 15, 2019 at 03:55:41PM -0500, Vivek Goyal wrote:
> >  /* Callback from libvhost-user */
> >  static void fv_set_features(VuDev *dev, uint64_t features)
> >  {
> > +    struct fv_VuDev *vud = container_of(dev, struct fv_VuDev, dev);
> > +    struct fuse_session *se = vud->se;
> > +
> > +    if ((1 << VIRTIO_FS_F_NOTIFICATION) & features) {
> 
> For consistency 1ull should be used.  That way the reader does not have
> to check the bit position to verify that the bitmap isn't truncated at
> 32 bits.

Ok, will do.

> 
> > +        vud->notify_enabled = true;
> > +        se->notify_enabled = true;
> 
> Only one copy of this field is needed.  vud has a pointer to se.

I need to access ->notify_enabled in passthrough_ll.c to determine if
notification queue is enabled or not. That determines if async locks are
supported or not.  And based on that either -EOPNOTSUPP is returned or
a response to wait is returned.

I did not see passthrough_ll.c accessing vud. I did see it having access
to session object though. So I created a copy there.

But I am open to suggestions on what's the best way to access this
information in passthrough_ll.c

> 
> > +    }
> >  }
> >  
> >  /*
> > @@ -662,6 +671,65 @@ static void fv_queue_worker(gpointer data, gpointer user_data)
> >      free(req);
> >  }
> >  
> > +static void *fv_queue_notify_thread(void *opaque)
> > +{
> > +    struct fv_QueueInfo *qi = opaque;
> > +
> > +    fuse_log(FUSE_LOG_INFO, "%s: Start for queue %d kick_fd %d\n", __func__,
> > +             qi->qidx, qi->kick_fd);
> > +
> > +    while (1) {
> > +        struct pollfd pf[2];
> > +
> > +        pf[0].fd = qi->kick_fd;
> > +        pf[0].events = POLLIN;
> > +        pf[0].revents = 0;
> > +        pf[1].fd = qi->kill_fd;
> > +        pf[1].events = POLLIN;
> > +        pf[1].revents = 0;
> > +
> > +        fuse_log(FUSE_LOG_DEBUG, "%s: Waiting for Queue %d event\n", __func__,
> > +                 qi->qidx);
> > +        int poll_res = ppoll(pf, 2, NULL, NULL);
> > +
> > +        if (poll_res == -1) {
> > +            if (errno == EINTR) {
> > +                fuse_log(FUSE_LOG_INFO, "%s: ppoll interrupted, going around\n",
> > +                         __func__);
> > +                continue;
> > +            }
> > +            fuse_log(FUSE_LOG_ERR, "fv_queue_thread ppoll: %m\n");
> > +            break;
> > +        }
> > +        assert(poll_res >= 1);
> > +        if (pf[0].revents & (POLLERR | POLLHUP | POLLNVAL)) {
> > +            fuse_log(FUSE_LOG_ERR, "%s: Unexpected poll revents %x Queue %d\n",
> > +                     __func__, pf[0].revents, qi->qidx);
> > +             break;
> > +        }
> > +        if (pf[1].revents & (POLLERR | POLLHUP | POLLNVAL)) {
> > +            fuse_log(FUSE_LOG_ERR, "%s: Unexpected poll revents %x Queue %d"
> > +                     "killfd\n", __func__, pf[1].revents, qi->qidx);
> > +            break;
> > +        }
> > +        if (pf[1].revents) {
> > +            fuse_log(FUSE_LOG_INFO, "%s: kill event on queue %d - quitting\n",
> > +                     __func__, qi->qidx);
> > +            break;
> > +        }
> > +        assert(pf[0].revents & POLLIN);
> > +        fuse_log(FUSE_LOG_DEBUG, "%s: Got queue event on Queue %d\n", __func__,
> > +                 qi->qidx);
> > +
> > +        eventfd_t evalue;
> > +        if (eventfd_read(qi->kick_fd, &evalue)) {
> > +            fuse_log(FUSE_LOG_ERR, "Eventfd_read for queue: %m\n");
> > +            break;
> > +        }
> > +    }
> > +    return NULL;
> > +}
> 
> It's difficult to review function without any actual functionality using
> the virtqueue.  I'm not sure a thread is even needed since the device
> only needs to get a buffer when it has a notification for the driver.
> I'll have to wait for the following patches to see what happens here...

This might very well be redundant. I am not sure. Can get rid of
this thread if not needed at all. So we don't need to monitor even
kill_fd and take any special action?

> 
> > @@ -378,12 +382,23 @@ static void vuf_set_status(VirtIODevice *vdev, uint8_t status)
> >      }
> >  }
> >  
> > -static uint64_t vuf_get_features(VirtIODevice *vdev,
> > -                                      uint64_t requested_features,
> > -                                      Error **errp)
> > +static uint64_t vuf_get_features(VirtIODevice *vdev, uint64_t features,
> > +                                 Error **errp)
> >  {
> > -    /* No feature bits used yet */
> > -    return requested_features;
> > +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> > +
> > +    virtio_add_feature(&features, VIRTIO_FS_F_NOTIFICATION);
> > +
> > +    return vhost_get_features(&fs->vhost_dev, user_feature_bits, features);
> > +}
> > +
> > +static void vuf_set_features(VirtIODevice *vdev, uint64_t features)
> > +{
> > +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> > +
> > +    if (virtio_has_feature(features, VIRTIO_FS_F_NOTIFICATION)) {
> > +        fs->notify_enabled = true;
> 
> This field is unused, please remove it.

vuf_get_config() uses it.

Thanks
Vivek



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Virtio-fs] [PATCH 2/4] virtiofd: Create a notification queue
@ 2019-11-22 14:47       ` Vivek Goyal
  0 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-11-22 14:47 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-fs, miklos, qemu-devel

On Fri, Nov 22, 2019 at 10:19:03AM +0000, Stefan Hajnoczi wrote:
> On Fri, Nov 15, 2019 at 03:55:41PM -0500, Vivek Goyal wrote:
> >  /* Callback from libvhost-user */
> >  static void fv_set_features(VuDev *dev, uint64_t features)
> >  {
> > +    struct fv_VuDev *vud = container_of(dev, struct fv_VuDev, dev);
> > +    struct fuse_session *se = vud->se;
> > +
> > +    if ((1 << VIRTIO_FS_F_NOTIFICATION) & features) {
> 
> For consistency 1ull should be used.  That way the reader does not have
> to check the bit position to verify that the bitmap isn't truncated at
> 32 bits.

Ok, will do.

> 
> > +        vud->notify_enabled = true;
> > +        se->notify_enabled = true;
> 
> Only one copy of this field is needed.  vud has a pointer to se.

I need to access ->notify_enabled in passthrough_ll.c to determine if
notification queue is enabled or not. That determines if async locks are
supported or not.  And based on that either -EOPNOTSUPP is returned or
a response to wait is returned.

I did not see passthrough_ll.c accessing vud. I did see it having access
to session object though. So I created a copy there.

But I am open to suggestions on what's the best way to access this
information in passthrough_ll.c

> 
> > +    }
> >  }
> >  
> >  /*
> > @@ -662,6 +671,65 @@ static void fv_queue_worker(gpointer data, gpointer user_data)
> >      free(req);
> >  }
> >  
> > +static void *fv_queue_notify_thread(void *opaque)
> > +{
> > +    struct fv_QueueInfo *qi = opaque;
> > +
> > +    fuse_log(FUSE_LOG_INFO, "%s: Start for queue %d kick_fd %d\n", __func__,
> > +             qi->qidx, qi->kick_fd);
> > +
> > +    while (1) {
> > +        struct pollfd pf[2];
> > +
> > +        pf[0].fd = qi->kick_fd;
> > +        pf[0].events = POLLIN;
> > +        pf[0].revents = 0;
> > +        pf[1].fd = qi->kill_fd;
> > +        pf[1].events = POLLIN;
> > +        pf[1].revents = 0;
> > +
> > +        fuse_log(FUSE_LOG_DEBUG, "%s: Waiting for Queue %d event\n", __func__,
> > +                 qi->qidx);
> > +        int poll_res = ppoll(pf, 2, NULL, NULL);
> > +
> > +        if (poll_res == -1) {
> > +            if (errno == EINTR) {
> > +                fuse_log(FUSE_LOG_INFO, "%s: ppoll interrupted, going around\n",
> > +                         __func__);
> > +                continue;
> > +            }
> > +            fuse_log(FUSE_LOG_ERR, "fv_queue_thread ppoll: %m\n");
> > +            break;
> > +        }
> > +        assert(poll_res >= 1);
> > +        if (pf[0].revents & (POLLERR | POLLHUP | POLLNVAL)) {
> > +            fuse_log(FUSE_LOG_ERR, "%s: Unexpected poll revents %x Queue %d\n",
> > +                     __func__, pf[0].revents, qi->qidx);
> > +             break;
> > +        }
> > +        if (pf[1].revents & (POLLERR | POLLHUP | POLLNVAL)) {
> > +            fuse_log(FUSE_LOG_ERR, "%s: Unexpected poll revents %x Queue %d"
> > +                     "killfd\n", __func__, pf[1].revents, qi->qidx);
> > +            break;
> > +        }
> > +        if (pf[1].revents) {
> > +            fuse_log(FUSE_LOG_INFO, "%s: kill event on queue %d - quitting\n",
> > +                     __func__, qi->qidx);
> > +            break;
> > +        }
> > +        assert(pf[0].revents & POLLIN);
> > +        fuse_log(FUSE_LOG_DEBUG, "%s: Got queue event on Queue %d\n", __func__,
> > +                 qi->qidx);
> > +
> > +        eventfd_t evalue;
> > +        if (eventfd_read(qi->kick_fd, &evalue)) {
> > +            fuse_log(FUSE_LOG_ERR, "Eventfd_read for queue: %m\n");
> > +            break;
> > +        }
> > +    }
> > +    return NULL;
> > +}
> 
> It's difficult to review function without any actual functionality using
> the virtqueue.  I'm not sure a thread is even needed since the device
> only needs to get a buffer when it has a notification for the driver.
> I'll have to wait for the following patches to see what happens here...

This might very well be redundant. I am not sure. Can get rid of
this thread if not needed at all. So we don't need to monitor even
kill_fd and take any special action?

> 
> > @@ -378,12 +382,23 @@ static void vuf_set_status(VirtIODevice *vdev, uint8_t status)
> >      }
> >  }
> >  
> > -static uint64_t vuf_get_features(VirtIODevice *vdev,
> > -                                      uint64_t requested_features,
> > -                                      Error **errp)
> > +static uint64_t vuf_get_features(VirtIODevice *vdev, uint64_t features,
> > +                                 Error **errp)
> >  {
> > -    /* No feature bits used yet */
> > -    return requested_features;
> > +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> > +
> > +    virtio_add_feature(&features, VIRTIO_FS_F_NOTIFICATION);
> > +
> > +    return vhost_get_features(&fs->vhost_dev, user_feature_bits, features);
> > +}
> > +
> > +static void vuf_set_features(VirtIODevice *vdev, uint64_t features)
> > +{
> > +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> > +
> > +    if (virtio_has_feature(features, VIRTIO_FS_F_NOTIFICATION)) {
> > +        fs->notify_enabled = true;
> 
> This field is unused, please remove it.

vuf_get_config() uses it.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/4] virtiofd: Create a notification queue
  2019-11-22 14:47       ` [Virtio-fs] " Vivek Goyal
@ 2019-11-22 17:29         ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 33+ messages in thread
From: Dr. David Alan Gilbert @ 2019-11-22 17:29 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, miklos

* Vivek Goyal (vgoyal@redhat.com) wrote:
> On Fri, Nov 22, 2019 at 10:19:03AM +0000, Stefan Hajnoczi wrote:
> > On Fri, Nov 15, 2019 at 03:55:41PM -0500, Vivek Goyal wrote:
> > >  /* Callback from libvhost-user */
> > >  static void fv_set_features(VuDev *dev, uint64_t features)
> > >  {
> > > +    struct fv_VuDev *vud = container_of(dev, struct fv_VuDev, dev);
> > > +    struct fuse_session *se = vud->se;
> > > +
> > > +    if ((1 << VIRTIO_FS_F_NOTIFICATION) & features) {
> > 
> > For consistency 1ull should be used.  That way the reader does not have
> > to check the bit position to verify that the bitmap isn't truncated at
> > 32 bits.
> 
> Ok, will do.
> 
> > 
> > > +        vud->notify_enabled = true;
> > > +        se->notify_enabled = true;
> > 
> > Only one copy of this field is needed.  vud has a pointer to se.
> 
> I need to access ->notify_enabled in passthrough_ll.c to determine if
> notification queue is enabled or not. That determines if async locks are
> supported or not.  And based on that either -EOPNOTSUPP is returned or
> a response to wait is returned.
> 
> I did not see passthrough_ll.c accessing vud. I did see it having access
> to session object though. So I created a copy there.
> 
> But I am open to suggestions on what's the best way to access this
> information in passthrough_ll.c
> 
> > 
> > > +    }
> > >  }
> > >  
> > >  /*
> > > @@ -662,6 +671,65 @@ static void fv_queue_worker(gpointer data, gpointer user_data)
> > >      free(req);
> > >  }
> > >  
> > > +static void *fv_queue_notify_thread(void *opaque)
> > > +{
> > > +    struct fv_QueueInfo *qi = opaque;
> > > +
> > > +    fuse_log(FUSE_LOG_INFO, "%s: Start for queue %d kick_fd %d\n", __func__,
> > > +             qi->qidx, qi->kick_fd);
> > > +
> > > +    while (1) {
> > > +        struct pollfd pf[2];
> > > +
> > > +        pf[0].fd = qi->kick_fd;
> > > +        pf[0].events = POLLIN;
> > > +        pf[0].revents = 0;
> > > +        pf[1].fd = qi->kill_fd;
> > > +        pf[1].events = POLLIN;
> > > +        pf[1].revents = 0;
> > > +
> > > +        fuse_log(FUSE_LOG_DEBUG, "%s: Waiting for Queue %d event\n", __func__,
> > > +                 qi->qidx);
> > > +        int poll_res = ppoll(pf, 2, NULL, NULL);
> > > +
> > > +        if (poll_res == -1) {
> > > +            if (errno == EINTR) {
> > > +                fuse_log(FUSE_LOG_INFO, "%s: ppoll interrupted, going around\n",
> > > +                         __func__);
> > > +                continue;
> > > +            }
> > > +            fuse_log(FUSE_LOG_ERR, "fv_queue_thread ppoll: %m\n");
> > > +            break;
> > > +        }
> > > +        assert(poll_res >= 1);
> > > +        if (pf[0].revents & (POLLERR | POLLHUP | POLLNVAL)) {
> > > +            fuse_log(FUSE_LOG_ERR, "%s: Unexpected poll revents %x Queue %d\n",
> > > +                     __func__, pf[0].revents, qi->qidx);
> > > +             break;
> > > +        }
> > > +        if (pf[1].revents & (POLLERR | POLLHUP | POLLNVAL)) {
> > > +            fuse_log(FUSE_LOG_ERR, "%s: Unexpected poll revents %x Queue %d"
> > > +                     "killfd\n", __func__, pf[1].revents, qi->qidx);
> > > +            break;
> > > +        }
> > > +        if (pf[1].revents) {
> > > +            fuse_log(FUSE_LOG_INFO, "%s: kill event on queue %d - quitting\n",
> > > +                     __func__, qi->qidx);
> > > +            break;
> > > +        }
> > > +        assert(pf[0].revents & POLLIN);
> > > +        fuse_log(FUSE_LOG_DEBUG, "%s: Got queue event on Queue %d\n", __func__,
> > > +                 qi->qidx);
> > > +
> > > +        eventfd_t evalue;
> > > +        if (eventfd_read(qi->kick_fd, &evalue)) {
> > > +            fuse_log(FUSE_LOG_ERR, "Eventfd_read for queue: %m\n");
> > > +            break;
> > > +        }
> > > +    }
> > > +    return NULL;
> > > +}
> > 
> > It's difficult to review function without any actual functionality using
> > the virtqueue.  I'm not sure a thread is even needed since the device
> > only needs to get a buffer when it has a notification for the driver.
> > I'll have to wait for the following patches to see what happens here...
> 
> This might very well be redundant. I am not sure. Can get rid of
> this thread if not needed at all. So we don't need to monitor even
> kill_fd and take any special action?

The kill_fd is internal to virtiofsd; it's only used as a way for the
main thread to cause the queue thread to exit;  if you've not got the
thread, you don't need the kill_fd.

Dave

> > 
> > > @@ -378,12 +382,23 @@ static void vuf_set_status(VirtIODevice *vdev, uint8_t status)
> > >      }
> > >  }
> > >  
> > > -static uint64_t vuf_get_features(VirtIODevice *vdev,
> > > -                                      uint64_t requested_features,
> > > -                                      Error **errp)
> > > +static uint64_t vuf_get_features(VirtIODevice *vdev, uint64_t features,
> > > +                                 Error **errp)
> > >  {
> > > -    /* No feature bits used yet */
> > > -    return requested_features;
> > > +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> > > +
> > > +    virtio_add_feature(&features, VIRTIO_FS_F_NOTIFICATION);
> > > +
> > > +    return vhost_get_features(&fs->vhost_dev, user_feature_bits, features);
> > > +}
> > > +
> > > +static void vuf_set_features(VirtIODevice *vdev, uint64_t features)
> > > +{
> > > +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> > > +
> > > +    if (virtio_has_feature(features, VIRTIO_FS_F_NOTIFICATION)) {
> > > +        fs->notify_enabled = true;
> > 
> > This field is unused, please remove it.
> 
> vuf_get_config() uses it.
> 
> Thanks
> Vivek
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Virtio-fs] [PATCH 2/4] virtiofd: Create a notification queue
@ 2019-11-22 17:29         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 33+ messages in thread
From: Dr. David Alan Gilbert @ 2019-11-22 17:29 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel, miklos

* Vivek Goyal (vgoyal@redhat.com) wrote:
> On Fri, Nov 22, 2019 at 10:19:03AM +0000, Stefan Hajnoczi wrote:
> > On Fri, Nov 15, 2019 at 03:55:41PM -0500, Vivek Goyal wrote:
> > >  /* Callback from libvhost-user */
> > >  static void fv_set_features(VuDev *dev, uint64_t features)
> > >  {
> > > +    struct fv_VuDev *vud = container_of(dev, struct fv_VuDev, dev);
> > > +    struct fuse_session *se = vud->se;
> > > +
> > > +    if ((1 << VIRTIO_FS_F_NOTIFICATION) & features) {
> > 
> > For consistency 1ull should be used.  That way the reader does not have
> > to check the bit position to verify that the bitmap isn't truncated at
> > 32 bits.
> 
> Ok, will do.
> 
> > 
> > > +        vud->notify_enabled = true;
> > > +        se->notify_enabled = true;
> > 
> > Only one copy of this field is needed.  vud has a pointer to se.
> 
> I need to access ->notify_enabled in passthrough_ll.c to determine if
> notification queue is enabled or not. That determines if async locks are
> supported or not.  And based on that either -EOPNOTSUPP is returned or
> a response to wait is returned.
> 
> I did not see passthrough_ll.c accessing vud. I did see it having access
> to session object though. So I created a copy there.
> 
> But I am open to suggestions on what's the best way to access this
> information in passthrough_ll.c
> 
> > 
> > > +    }
> > >  }
> > >  
> > >  /*
> > > @@ -662,6 +671,65 @@ static void fv_queue_worker(gpointer data, gpointer user_data)
> > >      free(req);
> > >  }
> > >  
> > > +static void *fv_queue_notify_thread(void *opaque)
> > > +{
> > > +    struct fv_QueueInfo *qi = opaque;
> > > +
> > > +    fuse_log(FUSE_LOG_INFO, "%s: Start for queue %d kick_fd %d\n", __func__,
> > > +             qi->qidx, qi->kick_fd);
> > > +
> > > +    while (1) {
> > > +        struct pollfd pf[2];
> > > +
> > > +        pf[0].fd = qi->kick_fd;
> > > +        pf[0].events = POLLIN;
> > > +        pf[0].revents = 0;
> > > +        pf[1].fd = qi->kill_fd;
> > > +        pf[1].events = POLLIN;
> > > +        pf[1].revents = 0;
> > > +
> > > +        fuse_log(FUSE_LOG_DEBUG, "%s: Waiting for Queue %d event\n", __func__,
> > > +                 qi->qidx);
> > > +        int poll_res = ppoll(pf, 2, NULL, NULL);
> > > +
> > > +        if (poll_res == -1) {
> > > +            if (errno == EINTR) {
> > > +                fuse_log(FUSE_LOG_INFO, "%s: ppoll interrupted, going around\n",
> > > +                         __func__);
> > > +                continue;
> > > +            }
> > > +            fuse_log(FUSE_LOG_ERR, "fv_queue_thread ppoll: %m\n");
> > > +            break;
> > > +        }
> > > +        assert(poll_res >= 1);
> > > +        if (pf[0].revents & (POLLERR | POLLHUP | POLLNVAL)) {
> > > +            fuse_log(FUSE_LOG_ERR, "%s: Unexpected poll revents %x Queue %d\n",
> > > +                     __func__, pf[0].revents, qi->qidx);
> > > +             break;
> > > +        }
> > > +        if (pf[1].revents & (POLLERR | POLLHUP | POLLNVAL)) {
> > > +            fuse_log(FUSE_LOG_ERR, "%s: Unexpected poll revents %x Queue %d"
> > > +                     "killfd\n", __func__, pf[1].revents, qi->qidx);
> > > +            break;
> > > +        }
> > > +        if (pf[1].revents) {
> > > +            fuse_log(FUSE_LOG_INFO, "%s: kill event on queue %d - quitting\n",
> > > +                     __func__, qi->qidx);
> > > +            break;
> > > +        }
> > > +        assert(pf[0].revents & POLLIN);
> > > +        fuse_log(FUSE_LOG_DEBUG, "%s: Got queue event on Queue %d\n", __func__,
> > > +                 qi->qidx);
> > > +
> > > +        eventfd_t evalue;
> > > +        if (eventfd_read(qi->kick_fd, &evalue)) {
> > > +            fuse_log(FUSE_LOG_ERR, "Eventfd_read for queue: %m\n");
> > > +            break;
> > > +        }
> > > +    }
> > > +    return NULL;
> > > +}
> > 
> > It's difficult to review function without any actual functionality using
> > the virtqueue.  I'm not sure a thread is even needed since the device
> > only needs to get a buffer when it has a notification for the driver.
> > I'll have to wait for the following patches to see what happens here...
> 
> This might very well be redundant. I am not sure. Can get rid of
> this thread if not needed at all. So we don't need to monitor even
> kill_fd and take any special action?

The kill_fd is internal to virtiofsd; it's only used as a way for the
main thread to cause the queue thread to exit;  if you've not got the
thread, you don't need the kill_fd.

Dave

> > 
> > > @@ -378,12 +382,23 @@ static void vuf_set_status(VirtIODevice *vdev, uint8_t status)
> > >      }
> > >  }
> > >  
> > > -static uint64_t vuf_get_features(VirtIODevice *vdev,
> > > -                                      uint64_t requested_features,
> > > -                                      Error **errp)
> > > +static uint64_t vuf_get_features(VirtIODevice *vdev, uint64_t features,
> > > +                                 Error **errp)
> > >  {
> > > -    /* No feature bits used yet */
> > > -    return requested_features;
> > > +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> > > +
> > > +    virtio_add_feature(&features, VIRTIO_FS_F_NOTIFICATION);
> > > +
> > > +    return vhost_get_features(&fs->vhost_dev, user_feature_bits, features);
> > > +}
> > > +
> > > +static void vuf_set_features(VirtIODevice *vdev, uint64_t features)
> > > +{
> > > +    VHostUserFS *fs = VHOST_USER_FS(vdev);
> > > +
> > > +    if (virtio_has_feature(features, VIRTIO_FS_F_NOTIFICATION)) {
> > > +        fs->notify_enabled = true;
> > 
> > This field is unused, please remove it.
> 
> vuf_get_config() uses it.
> 
> Thanks
> Vivek
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/4] virtiofsd: Implement blocking posix locks
  2019-11-15 20:55   ` [Virtio-fs] " Vivek Goyal
@ 2019-11-22 17:47     ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 33+ messages in thread
From: Dr. David Alan Gilbert @ 2019-11-22 17:47 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel, stefanha, miklos

* Vivek Goyal (vgoyal@redhat.com) wrote:
> As of now we don't support fcntl(F_SETLKW) and if we see one, we return
> -EOPNOTSUPP.
> 
> Change that by accepting these requests and returning a reply immediately
> asking caller to wait. Once lock is available, send a notification to
> the waiter indicating lock is available.
> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  contrib/virtiofsd/fuse_kernel.h    |  7 +++
>  contrib/virtiofsd/fuse_lowlevel.c  | 23 +++++++-
>  contrib/virtiofsd/fuse_lowlevel.h  | 25 ++++++++
>  contrib/virtiofsd/fuse_virtio.c    | 94 ++++++++++++++++++++++++++++--
>  contrib/virtiofsd/passthrough_ll.c | 49 +++++++++++++---
>  5 files changed, 182 insertions(+), 16 deletions(-)
> 
> diff --git a/contrib/virtiofsd/fuse_kernel.h b/contrib/virtiofsd/fuse_kernel.h
> index 2bdc8b1c88..d4d65c5414 100644
> --- a/contrib/virtiofsd/fuse_kernel.h
> +++ b/contrib/virtiofsd/fuse_kernel.h
> @@ -444,6 +444,7 @@ enum fuse_notify_code {
>  	FUSE_NOTIFY_STORE = 4,
>  	FUSE_NOTIFY_RETRIEVE = 5,
>  	FUSE_NOTIFY_DELETE = 6,
> +	FUSE_NOTIFY_LOCK = 7,
>  	FUSE_NOTIFY_CODE_MAX,
>  };
>  
> @@ -836,6 +837,12 @@ struct fuse_notify_retrieve_in {
>  	uint64_t	dummy4;
>  };
>  
> +struct fuse_notify_lock_out {
> +	uint64_t	id;
> +	int32_t		error;
> +	int32_t		padding;
> +};
> +
>  /* Device ioctls: */
>  #define FUSE_DEV_IOC_CLONE	_IOR(229, 0, uint32_t)
>  
> diff --git a/contrib/virtiofsd/fuse_lowlevel.c b/contrib/virtiofsd/fuse_lowlevel.c
> index d4a42d9804..f706e440bf 100644
> --- a/contrib/virtiofsd/fuse_lowlevel.c
> +++ b/contrib/virtiofsd/fuse_lowlevel.c
> @@ -183,7 +183,8 @@ int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
>  {
>  	struct fuse_out_header out;
>  
> -	if (error <= -1000 || error > 0) {
> +	/* error = 1 has been used to signal client to wait for notificaiton */
> +	if (error <= -1000 || error > 1) {
>  		fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n",	error);
>  		error = -ERANGE;
>  	}
> @@ -291,6 +292,12 @@ int fuse_reply_err(fuse_req_t req, int err)
>  	return send_reply(req, -err, NULL, 0);
>  }
>  
> +int fuse_reply_wait(fuse_req_t req)
> +{
> +	/* TODO: This is a hack. Fix it */
> +	return send_reply(req, 1, NULL, 0);
> +}
> +
>  void fuse_reply_none(fuse_req_t req)
>  {
>  	fuse_free_req(req);
> @@ -2207,6 +2214,20 @@ static int send_notify_iov(struct fuse_session *se, int notify_code,
>  	return fuse_send_msg(se, NULL, iov, count);
>  }
>  
> +int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t req_id,
> +			      int32_t error)
> +{
> +	struct fuse_notify_lock_out outarg;
> +	struct iovec iov[2];
> +
> +	outarg.id = req_id;
> +	outarg.error = -error;
> +
> +	iov[1].iov_base = &outarg;
> +	iov[1].iov_len = sizeof(outarg);
> +	return send_notify_iov(se, FUSE_NOTIFY_LOCK, iov, 2);
> +}
> +
>  int fuse_lowlevel_notify_poll(struct fuse_pollhandle *ph)
>  {
>  	if (ph != NULL) {
> diff --git a/contrib/virtiofsd/fuse_lowlevel.h b/contrib/virtiofsd/fuse_lowlevel.h
> index e664d2d12d..f0a94683b5 100644
> --- a/contrib/virtiofsd/fuse_lowlevel.h
> +++ b/contrib/virtiofsd/fuse_lowlevel.h
> @@ -1251,6 +1251,22 @@ struct fuse_lowlevel_ops {
>   */
>  int fuse_reply_err(fuse_req_t req, int err);
>  
> +/**
> + * Ask caller to wait for lock.
> + *
> + * Possible requests:
> + *   setlkw
> + *
> + * If caller sends a blocking lock request (setlkw), then reply to caller
> + * that wait for lock to be available. Once lock is available caller will
> + * receive a notification with request's unique id. Notification will
> + * carry info whether lock was successfully obtained or not.
> + *
> + * @param req request handle
> + * @return zero for success, -errno for failure to send reply
> + */
> +int fuse_reply_wait(fuse_req_t req);
> +
>  /**
>   * Don't send reply
>   *
> @@ -1704,6 +1720,15 @@ int fuse_lowlevel_notify_delete(struct fuse_session *se,
>  int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
>  			       off_t offset, struct fuse_bufvec *bufv,
>  			       enum fuse_buf_copy_flags flags);
> +/**
> + * Notify event related to previous lock request
> + *
> + * @param se the session object
> + * @param req_id the id of the request which requested setlkw
> + * @param error zero for success, -errno for the failure
> + */
> +int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t req_id,
> +			      int32_t error);
>  
>  /* ----------------------------------------------------------- *
>   * Utility functions					       *
> diff --git a/contrib/virtiofsd/fuse_virtio.c b/contrib/virtiofsd/fuse_virtio.c
> index 982b6ad0bd..98d27e7642 100644
> --- a/contrib/virtiofsd/fuse_virtio.c
> +++ b/contrib/virtiofsd/fuse_virtio.c
> @@ -215,6 +215,81 @@ static void copy_iov(struct iovec *src_iov, int src_count,
>      }
>  }
>  
> +static int virtio_send_notify_msg(struct fuse_session *se, struct iovec *iov,
> +				  int count)
> +{
> +    struct fv_QueueInfo *qi;
> +    VuDev *dev = &se->virtio_dev->dev;
> +    VuVirtq *q;
> +    FVRequest *req;
> +    VuVirtqElement *elem;
> +    unsigned int in_num, bad_in_num = 0, bad_out_num = 0;
> +    struct fuse_out_header *out = iov[0].iov_base;
> +    size_t in_len, tosend_len = iov_size(iov, count);
> +    struct iovec *in_sg;
> +    int ret = 0;
> +
> +    /* Notifications have unique == 0 */
> +    assert (!out->unique);
> +
> +    if (!se->notify_enabled)
> +        return -EOPNOTSUPP;
> +
> +    /* If notifications are enabled, queue index 1 is notification queue */
> +    qi = se->virtio_dev->qi[1];
> +    q = vu_get_queue(dev, qi->qidx);
> +
> +    pthread_rwlock_rdlock(&qi->virtio_dev->vu_dispatch_rwlock);
> +    pthread_mutex_lock(&qi->vq_lock);
> +    /* Pop an element from queue */
> +    req = vu_queue_pop(dev, q, sizeof(FVRequest), &bad_in_num, &bad_out_num);

You don't need bad_in_num/bad_out_num - just pass NULL for both; they're
only needed if you expect to read/write data that's not mappable (i.e.
in our direct write case).

> +    if (!req) {
> +        /* TODO: Implement some sort of ring buffer and queue notifications
> +	 * on that and send these later when notification queue has space
> +	 * available.
> +	 */
> +        return -ENOSPC;
> +    }
> +    pthread_mutex_unlock(&qi->vq_lock);
> +    pthread_rwlock_unlock(&qi->virtio_dev->vu_dispatch_rwlock);
> +
> +    out->len = tosend_len;
> +    elem = &req->elem;
> +    in_num = elem->in_num;
> +    in_sg = elem->in_sg;
> +    in_len = iov_size(in_sg, in_num);
> +    fuse_log(FUSE_LOG_DEBUG, "%s: elem %d: with %d in desc of length %zd\n",
> +             __func__, elem->index, in_num,  in_len);
> +
> +    if (in_len < sizeof(struct fuse_out_header)) {
> +        fuse_log(FUSE_LOG_ERR, "%s: elem %d too short for out_header\n",
> +                 __func__, elem->index);
> +        ret = -E2BIG;
> +        goto out;
> +    }
> +
> +    if (in_len < tosend_len) {
> +        fuse_log(FUSE_LOG_ERR, "%s: elem %d too small for data len"
> +                 " %zd\n", __func__, elem->index, tosend_len);
> +        ret = -E2BIG;
> +        goto out;
> +    }
> +
> +    /* First copy the header data from iov->in_sg */
> +    copy_iov(iov, count, in_sg, in_num, tosend_len);
> +
> +    /* TODO: Add bad_innum handling */
> +    pthread_rwlock_rdlock(&qi->virtio_dev->vu_dispatch_rwlock);
> +    pthread_mutex_lock(&qi->vq_lock);
> +    vu_queue_push(dev, q, elem, tosend_len);
> +    vu_queue_notify(dev, q);
> +    pthread_mutex_unlock(&qi->vq_lock);
> +    pthread_rwlock_unlock(&qi->virtio_dev->vu_dispatch_rwlock);
> +out:
> +    free(req);
> +    return ret;
> +}
> +
>  /*
>   * Called back by ll whenever it wants to send a reply/message back
>   * The 1st element of the iov starts with the fuse_out_header
> @@ -223,11 +298,11 @@ static void copy_iov(struct iovec *src_iov, int src_count,
>  int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
>                      struct iovec *iov, int count)
>  {
> -    FVRequest *req = container_of(ch, FVRequest, ch);
> -    struct fv_QueueInfo *qi = ch->qi;
> +    FVRequest *req;
> +    struct fv_QueueInfo *qi;
>      VuDev *dev = &se->virtio_dev->dev;
> -    VuVirtq *q = vu_get_queue(dev, qi->qidx);
> -    VuVirtqElement *elem = &req->elem;
> +    VuVirtq *q;
> +    VuVirtqElement *elem;
>      int ret = 0;
>  
>      assert(count >= 1);
> @@ -238,8 +313,15 @@ int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
>  
>      size_t tosend_len = iov_size(iov, count);
>  
> -    /* unique == 0 is notification, which we don't support */
> -    assert(out->unique);
> +    /* unique == 0 is notification */
> +    if (!out->unique)
> +        return virtio_send_notify_msg(se, iov, count);
> +
> +    assert(ch);
> +    req = container_of(ch, FVRequest, ch);
> +    elem = &req->elem;
> +    qi = ch->qi;
> +    q = vu_get_queue(dev, qi->qidx);
>      assert(!req->reply_sent);
>  
>      /* The 'in' part of the elem is to qemu */
> diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
> index 028e7da273..ed52953565 100644
> --- a/contrib/virtiofsd/passthrough_ll.c
> +++ b/contrib/virtiofsd/passthrough_ll.c
> @@ -1925,7 +1925,10 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino,
>  	struct lo_data *lo = lo_data(req);
>  	struct lo_inode *inode;
>  	struct lo_inode_plock *plock;
> -	int ret, saverr = 0;
> +	int ret, saverr = 0, ofd;
> +	uint64_t unique;
> +	struct fuse_session *se = req->se;
> +	bool async_lock = false;
>  
>  	fuse_log(FUSE_LOG_DEBUG, "lo_setlk(ino=%" PRIu64 ", flags=%d)"
>  		 " cmd=%d pid=%d owner=0x%lx sleep=%d l_whence=%d"
> @@ -1933,11 +1936,6 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino,
>  		 lock->l_type, lock->l_pid, fi->lock_owner, sleep,
>  		 lock->l_whence, lock->l_start, lock->l_len);
>  
> -	if (sleep) {
> -		fuse_reply_err(req, EOPNOTSUPP);
> -		return;
> -	}
> -
>  	inode = lo_inode(req, ino);
>  	if (!inode) {
>  		fuse_reply_err(req, EBADF);
> @@ -1950,21 +1948,54 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino,
>  
>  	if (!plock) {
>  		saverr = ret;
> +		pthread_mutex_unlock(&inode->plock_mutex);
>  		goto out;
>  	}
>  
> +	/*
> +	 * plock is now released when inode is going away. We already have
> +	 * a reference on inode, so it is guaranteed that plock->fd is
> +	 * still around even after dropping inode->plock_mutex lock
> +	 */
> +	ofd = plock->fd;
> +	pthread_mutex_unlock(&inode->plock_mutex);
> +
> +	/*
> +	 * If this lock request can block, request caller to wait for
> +	 * notification. Do not access req after this. Once lock is
> +	 * available, send a notification instead.
> +	 */
> +	if (sleep && lock->l_type != F_UNLCK) {
> +		/*
> +		 * If notification queue is not enabled, can't support async
> +		 * locks.
> +		 */
> +		if (!se->notify_enabled) {
> +			saverr = EOPNOTSUPP;
> +			goto out;
> +		}
> +		async_lock = true;
> +		unique = req->unique;
> +		fuse_reply_wait(req);
> +	}
>  	/* TODO: Is it alright to modify flock? */
>  	lock->l_pid = 0;
> -	ret = fcntl(plock->fd, F_OFD_SETLK, lock);
> +	if (async_lock)
> +		ret = fcntl(ofd, F_OFD_SETLKW, lock);
> +	else
> +		ret = fcntl(ofd, F_OFD_SETLK, lock);

What happens if the guest is rebooted after it's asked
for, but not been granted a lock?

Dave

>  	if (ret == -1) {
>  		saverr = errno;
>  	}
>  
>  out:
> -	pthread_mutex_unlock(&inode->plock_mutex);
>  	lo_inode_put(lo, &inode);
>  
> -	fuse_reply_err(req, saverr);
> +	if (!async_lock)
> +		fuse_reply_err(req, saverr);
> +	else {
> +		fuse_lowlevel_notify_lock(se, unique, saverr);
> +	}
>  }
>  
>  static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,
> -- 
> 2.20.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Virtio-fs] [PATCH 4/4] virtiofsd: Implement blocking posix locks
@ 2019-11-22 17:47     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 33+ messages in thread
From: Dr. David Alan Gilbert @ 2019-11-22 17:47 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel, miklos

* Vivek Goyal (vgoyal@redhat.com) wrote:
> As of now we don't support fcntl(F_SETLKW) and if we see one, we return
> -EOPNOTSUPP.
> 
> Change that by accepting these requests and returning a reply immediately
> asking caller to wait. Once lock is available, send a notification to
> the waiter indicating lock is available.
> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  contrib/virtiofsd/fuse_kernel.h    |  7 +++
>  contrib/virtiofsd/fuse_lowlevel.c  | 23 +++++++-
>  contrib/virtiofsd/fuse_lowlevel.h  | 25 ++++++++
>  contrib/virtiofsd/fuse_virtio.c    | 94 ++++++++++++++++++++++++++++--
>  contrib/virtiofsd/passthrough_ll.c | 49 +++++++++++++---
>  5 files changed, 182 insertions(+), 16 deletions(-)
> 
> diff --git a/contrib/virtiofsd/fuse_kernel.h b/contrib/virtiofsd/fuse_kernel.h
> index 2bdc8b1c88..d4d65c5414 100644
> --- a/contrib/virtiofsd/fuse_kernel.h
> +++ b/contrib/virtiofsd/fuse_kernel.h
> @@ -444,6 +444,7 @@ enum fuse_notify_code {
>  	FUSE_NOTIFY_STORE = 4,
>  	FUSE_NOTIFY_RETRIEVE = 5,
>  	FUSE_NOTIFY_DELETE = 6,
> +	FUSE_NOTIFY_LOCK = 7,
>  	FUSE_NOTIFY_CODE_MAX,
>  };
>  
> @@ -836,6 +837,12 @@ struct fuse_notify_retrieve_in {
>  	uint64_t	dummy4;
>  };
>  
> +struct fuse_notify_lock_out {
> +	uint64_t	id;
> +	int32_t		error;
> +	int32_t		padding;
> +};
> +
>  /* Device ioctls: */
>  #define FUSE_DEV_IOC_CLONE	_IOR(229, 0, uint32_t)
>  
> diff --git a/contrib/virtiofsd/fuse_lowlevel.c b/contrib/virtiofsd/fuse_lowlevel.c
> index d4a42d9804..f706e440bf 100644
> --- a/contrib/virtiofsd/fuse_lowlevel.c
> +++ b/contrib/virtiofsd/fuse_lowlevel.c
> @@ -183,7 +183,8 @@ int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
>  {
>  	struct fuse_out_header out;
>  
> -	if (error <= -1000 || error > 0) {
> +	/* error = 1 has been used to signal client to wait for notificaiton */
> +	if (error <= -1000 || error > 1) {
>  		fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n",	error);
>  		error = -ERANGE;
>  	}
> @@ -291,6 +292,12 @@ int fuse_reply_err(fuse_req_t req, int err)
>  	return send_reply(req, -err, NULL, 0);
>  }
>  
> +int fuse_reply_wait(fuse_req_t req)
> +{
> +	/* TODO: This is a hack. Fix it */
> +	return send_reply(req, 1, NULL, 0);
> +}
> +
>  void fuse_reply_none(fuse_req_t req)
>  {
>  	fuse_free_req(req);
> @@ -2207,6 +2214,20 @@ static int send_notify_iov(struct fuse_session *se, int notify_code,
>  	return fuse_send_msg(se, NULL, iov, count);
>  }
>  
> +int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t req_id,
> +			      int32_t error)
> +{
> +	struct fuse_notify_lock_out outarg;
> +	struct iovec iov[2];
> +
> +	outarg.id = req_id;
> +	outarg.error = -error;
> +
> +	iov[1].iov_base = &outarg;
> +	iov[1].iov_len = sizeof(outarg);
> +	return send_notify_iov(se, FUSE_NOTIFY_LOCK, iov, 2);
> +}
> +
>  int fuse_lowlevel_notify_poll(struct fuse_pollhandle *ph)
>  {
>  	if (ph != NULL) {
> diff --git a/contrib/virtiofsd/fuse_lowlevel.h b/contrib/virtiofsd/fuse_lowlevel.h
> index e664d2d12d..f0a94683b5 100644
> --- a/contrib/virtiofsd/fuse_lowlevel.h
> +++ b/contrib/virtiofsd/fuse_lowlevel.h
> @@ -1251,6 +1251,22 @@ struct fuse_lowlevel_ops {
>   */
>  int fuse_reply_err(fuse_req_t req, int err);
>  
> +/**
> + * Ask caller to wait for lock.
> + *
> + * Possible requests:
> + *   setlkw
> + *
> + * If caller sends a blocking lock request (setlkw), then reply to caller
> + * that wait for lock to be available. Once lock is available caller will
> + * receive a notification with request's unique id. Notification will
> + * carry info whether lock was successfully obtained or not.
> + *
> + * @param req request handle
> + * @return zero for success, -errno for failure to send reply
> + */
> +int fuse_reply_wait(fuse_req_t req);
> +
>  /**
>   * Don't send reply
>   *
> @@ -1704,6 +1720,15 @@ int fuse_lowlevel_notify_delete(struct fuse_session *se,
>  int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
>  			       off_t offset, struct fuse_bufvec *bufv,
>  			       enum fuse_buf_copy_flags flags);
> +/**
> + * Notify event related to previous lock request
> + *
> + * @param se the session object
> + * @param req_id the id of the request which requested setlkw
> + * @param error zero for success, -errno for the failure
> + */
> +int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t req_id,
> +			      int32_t error);
>  
>  /* ----------------------------------------------------------- *
>   * Utility functions					       *
> diff --git a/contrib/virtiofsd/fuse_virtio.c b/contrib/virtiofsd/fuse_virtio.c
> index 982b6ad0bd..98d27e7642 100644
> --- a/contrib/virtiofsd/fuse_virtio.c
> +++ b/contrib/virtiofsd/fuse_virtio.c
> @@ -215,6 +215,81 @@ static void copy_iov(struct iovec *src_iov, int src_count,
>      }
>  }
>  
> +static int virtio_send_notify_msg(struct fuse_session *se, struct iovec *iov,
> +				  int count)
> +{
> +    struct fv_QueueInfo *qi;
> +    VuDev *dev = &se->virtio_dev->dev;
> +    VuVirtq *q;
> +    FVRequest *req;
> +    VuVirtqElement *elem;
> +    unsigned int in_num, bad_in_num = 0, bad_out_num = 0;
> +    struct fuse_out_header *out = iov[0].iov_base;
> +    size_t in_len, tosend_len = iov_size(iov, count);
> +    struct iovec *in_sg;
> +    int ret = 0;
> +
> +    /* Notifications have unique == 0 */
> +    assert (!out->unique);
> +
> +    if (!se->notify_enabled)
> +        return -EOPNOTSUPP;
> +
> +    /* If notifications are enabled, queue index 1 is notification queue */
> +    qi = se->virtio_dev->qi[1];
> +    q = vu_get_queue(dev, qi->qidx);
> +
> +    pthread_rwlock_rdlock(&qi->virtio_dev->vu_dispatch_rwlock);
> +    pthread_mutex_lock(&qi->vq_lock);
> +    /* Pop an element from queue */
> +    req = vu_queue_pop(dev, q, sizeof(FVRequest), &bad_in_num, &bad_out_num);

You don't need bad_in_num/bad_out_num - just pass NULL for both; they're
only needed if you expect to read/write data that's not mappable (i.e.
in our direct write case).

> +    if (!req) {
> +        /* TODO: Implement some sort of ring buffer and queue notifications
> +	 * on that and send these later when notification queue has space
> +	 * available.
> +	 */
> +        return -ENOSPC;
> +    }
> +    pthread_mutex_unlock(&qi->vq_lock);
> +    pthread_rwlock_unlock(&qi->virtio_dev->vu_dispatch_rwlock);
> +
> +    out->len = tosend_len;
> +    elem = &req->elem;
> +    in_num = elem->in_num;
> +    in_sg = elem->in_sg;
> +    in_len = iov_size(in_sg, in_num);
> +    fuse_log(FUSE_LOG_DEBUG, "%s: elem %d: with %d in desc of length %zd\n",
> +             __func__, elem->index, in_num,  in_len);
> +
> +    if (in_len < sizeof(struct fuse_out_header)) {
> +        fuse_log(FUSE_LOG_ERR, "%s: elem %d too short for out_header\n",
> +                 __func__, elem->index);
> +        ret = -E2BIG;
> +        goto out;
> +    }
> +
> +    if (in_len < tosend_len) {
> +        fuse_log(FUSE_LOG_ERR, "%s: elem %d too small for data len"
> +                 " %zd\n", __func__, elem->index, tosend_len);
> +        ret = -E2BIG;
> +        goto out;
> +    }
> +
> +    /* First copy the header data from iov->in_sg */
> +    copy_iov(iov, count, in_sg, in_num, tosend_len);
> +
> +    /* TODO: Add bad_innum handling */
> +    pthread_rwlock_rdlock(&qi->virtio_dev->vu_dispatch_rwlock);
> +    pthread_mutex_lock(&qi->vq_lock);
> +    vu_queue_push(dev, q, elem, tosend_len);
> +    vu_queue_notify(dev, q);
> +    pthread_mutex_unlock(&qi->vq_lock);
> +    pthread_rwlock_unlock(&qi->virtio_dev->vu_dispatch_rwlock);
> +out:
> +    free(req);
> +    return ret;
> +}
> +
>  /*
>   * Called back by ll whenever it wants to send a reply/message back
>   * The 1st element of the iov starts with the fuse_out_header
> @@ -223,11 +298,11 @@ static void copy_iov(struct iovec *src_iov, int src_count,
>  int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
>                      struct iovec *iov, int count)
>  {
> -    FVRequest *req = container_of(ch, FVRequest, ch);
> -    struct fv_QueueInfo *qi = ch->qi;
> +    FVRequest *req;
> +    struct fv_QueueInfo *qi;
>      VuDev *dev = &se->virtio_dev->dev;
> -    VuVirtq *q = vu_get_queue(dev, qi->qidx);
> -    VuVirtqElement *elem = &req->elem;
> +    VuVirtq *q;
> +    VuVirtqElement *elem;
>      int ret = 0;
>  
>      assert(count >= 1);
> @@ -238,8 +313,15 @@ int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
>  
>      size_t tosend_len = iov_size(iov, count);
>  
> -    /* unique == 0 is notification, which we don't support */
> -    assert(out->unique);
> +    /* unique == 0 is notification */
> +    if (!out->unique)
> +        return virtio_send_notify_msg(se, iov, count);
> +
> +    assert(ch);
> +    req = container_of(ch, FVRequest, ch);
> +    elem = &req->elem;
> +    qi = ch->qi;
> +    q = vu_get_queue(dev, qi->qidx);
>      assert(!req->reply_sent);
>  
>      /* The 'in' part of the elem is to qemu */
> diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
> index 028e7da273..ed52953565 100644
> --- a/contrib/virtiofsd/passthrough_ll.c
> +++ b/contrib/virtiofsd/passthrough_ll.c
> @@ -1925,7 +1925,10 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino,
>  	struct lo_data *lo = lo_data(req);
>  	struct lo_inode *inode;
>  	struct lo_inode_plock *plock;
> -	int ret, saverr = 0;
> +	int ret, saverr = 0, ofd;
> +	uint64_t unique;
> +	struct fuse_session *se = req->se;
> +	bool async_lock = false;
>  
>  	fuse_log(FUSE_LOG_DEBUG, "lo_setlk(ino=%" PRIu64 ", flags=%d)"
>  		 " cmd=%d pid=%d owner=0x%lx sleep=%d l_whence=%d"
> @@ -1933,11 +1936,6 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino,
>  		 lock->l_type, lock->l_pid, fi->lock_owner, sleep,
>  		 lock->l_whence, lock->l_start, lock->l_len);
>  
> -	if (sleep) {
> -		fuse_reply_err(req, EOPNOTSUPP);
> -		return;
> -	}
> -
>  	inode = lo_inode(req, ino);
>  	if (!inode) {
>  		fuse_reply_err(req, EBADF);
> @@ -1950,21 +1948,54 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino,
>  
>  	if (!plock) {
>  		saverr = ret;
> +		pthread_mutex_unlock(&inode->plock_mutex);
>  		goto out;
>  	}
>  
> +	/*
> +	 * plock is now released when inode is going away. We already have
> +	 * a reference on inode, so it is guaranteed that plock->fd is
> +	 * still around even after dropping inode->plock_mutex lock
> +	 */
> +	ofd = plock->fd;
> +	pthread_mutex_unlock(&inode->plock_mutex);
> +
> +	/*
> +	 * If this lock request can block, request caller to wait for
> +	 * notification. Do not access req after this. Once lock is
> +	 * available, send a notification instead.
> +	 */
> +	if (sleep && lock->l_type != F_UNLCK) {
> +		/*
> +		 * If notification queue is not enabled, can't support async
> +		 * locks.
> +		 */
> +		if (!se->notify_enabled) {
> +			saverr = EOPNOTSUPP;
> +			goto out;
> +		}
> +		async_lock = true;
> +		unique = req->unique;
> +		fuse_reply_wait(req);
> +	}
>  	/* TODO: Is it alright to modify flock? */
>  	lock->l_pid = 0;
> -	ret = fcntl(plock->fd, F_OFD_SETLK, lock);
> +	if (async_lock)
> +		ret = fcntl(ofd, F_OFD_SETLKW, lock);
> +	else
> +		ret = fcntl(ofd, F_OFD_SETLK, lock);

What happens if the guest is rebooted after it's asked
for, but not been granted a lock?

Dave

>  	if (ret == -1) {
>  		saverr = errno;
>  	}
>  
>  out:
> -	pthread_mutex_unlock(&inode->plock_mutex);
>  	lo_inode_put(lo, &inode);
>  
> -	fuse_reply_err(req, saverr);
> +	if (!async_lock)
> +		fuse_reply_err(req, saverr);
> +	else {
> +		fuse_lowlevel_notify_lock(se, unique, saverr);
> +	}
>  }
>  
>  static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,
> -- 
> 2.20.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/4] virtiofsd: Specify size of notification buffer using config space
  2019-11-22 10:33     ` [Virtio-fs] " Stefan Hajnoczi
@ 2019-11-25 14:57       ` Vivek Goyal
  -1 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-11-25 14:57 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-fs, miklos, qemu-devel, dgilbert

On Fri, Nov 22, 2019 at 10:33:00AM +0000, Stefan Hajnoczi wrote:
> On Fri, Nov 15, 2019 at 03:55:42PM -0500, Vivek Goyal wrote:
> > diff --git a/contrib/virtiofsd/fuse_virtio.c b/contrib/virtiofsd/fuse_virtio.c
> > index 411114c9b3..982b6ad0bd 100644
> > --- a/contrib/virtiofsd/fuse_virtio.c
> > +++ b/contrib/virtiofsd/fuse_virtio.c
> > @@ -109,7 +109,8 @@ static uint64_t fv_get_features(VuDev *dev)
> >      uint64_t features;
> >  
> >      features = 1ull << VIRTIO_F_VERSION_1 |
> > -               1ull << VIRTIO_FS_F_NOTIFICATION;
> > +               1ull << VIRTIO_FS_F_NOTIFICATION |
> > +               1ull << VHOST_USER_F_PROTOCOL_FEATURES;
> 
> This is not needed since VHOST_USER_F_PROTOCOL_FEATURES is already added
> by vu_get_features_exec():

Will do.

> 
>   vu_get_features_exec(VuDev *dev, VhostUserMsg *vmsg)
>   {
>       vmsg->payload.u64 =
>           1ULL << VHOST_F_LOG_ALL |
>           1ULL << VHOST_USER_F_PROTOCOL_FEATURES;
> 
>       if (dev->iface->get_features) {
>           vmsg->payload.u64 |= dev->iface->get_features(dev);
>       }
> 
> >  
> >      return features;
> >  }
> > @@ -927,6 +928,27 @@ static bool fv_queue_order(VuDev *dev, int qidx)
> >      return false;
> >  }
> >  
> > +static uint64_t fv_get_protocol_features(VuDev *dev)
> > +{
> > +	return 1ull << VHOST_USER_PROTOCOL_F_CONFIG;
> > +}
> 
> Please change vu_get_protocol_features_exec() in a separate patch so
> that devices don't need this boilerplate .get_protocol_features() code:
> 
>   static bool
>   vu_get_protocol_features_exec(VuDev *dev, VhostUserMsg *vmsg)
>   {
>       ...
>  -    if (dev->iface->get_config && dev->iface->set_config) {
>  +    if (dev->iface->get_config || dev->iface->set_config) {
>           features |= 1ULL << VHOST_USER_PROTOCOL_F_CONFIG;

This seems more like a nice to have thing. Can we leave it for sometime
later.

>       }
> 
> > +
> > +static int fv_get_config(VuDev *dev, uint8_t *config, uint32_t len)
> > +{
> > +	struct virtio_fs_config fscfg = {};
> > +
> > +	fuse_log(FUSE_LOG_DEBUG, "%s:Setting notify_buf_size=%d\n", __func__,
> > +                 sizeof(struct fuse_notify_lock_out));
> > +	/*
> > +	 * As of now only notification related to lock is supported. As more
> > +	 * notification types are supported, bump up the size accordingly.
> > +	 */
> > +	fscfg.notify_buf_size = sizeof(struct fuse_notify_lock_out);
> 
> Missing cpu_to_le32().

Not sure. Deivce converts to le32 when guests asks for it. So there should
not be any need to do this conversion between vhost-user daemon and
device. I am assuming that both daemon and qemu are using same endianess
and if that's the case, first converting it to le32 and undoing this
operation on other end (if we are running on an architecture with big
endian), seems unnecessary and confusing.

static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
{
    ...
    ...
    virtio_stl_p(vdev, &fscfg.notify_buf_size, fs->fscfg.notify_buf_size);
}

> 
> I'm not sure about specifying the size precisely down to the last byte
> because any change to guest-visible aspects of the device (like VIRTIO
> Configuration Space) are not compatible across live migration.  It will
> be necessary to introduce a device version command-line option for live
> migration compatibility so that existing guests can be migrated to a new
> virtiofsd without the device changing underneath them.

I am not sure I understand this point. If we were to support live
migration, will we not have to reset the queue and regoniate with
device again on destination host.
> 
> How about rounding this up to 4 KB?

Not sure how will that help. Right now it feels just wasteful of memory.

> 
> >  static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
> >  {
> >      VHostUserFS *fs = VHOST_USER_FS(vdev);
> >      struct virtio_fs_config fscfg = {};
> > +    int ret;
> > +
> > +    /*
> > +     * As of now we only get notification buffer size from device. And that's
> > +     * needed only if notification queue is enabled.
> > +     */
> > +    if (fs->notify_enabled) {
> > +        ret = vhost_dev_get_config(&fs->vhost_dev, (uint8_t *)&fs->fscfg,
> > +                                   sizeof(struct virtio_fs_config));
> > +	if (ret < 0) {
> 
> Indentation.

Will fix.

> 
> > +            error_report("vhost-user-fs: get device config space failed."
> > +                         " ret=%d\n", ret);
> > +            return;
> > +        }
> 
> Missing le32_to_cpu() for notify_buf_size.

See above.

[..]
> > @@ -545,6 +569,8 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
> >      fs->vhost_dev.nvqs = 2 + fs->conf.num_request_queues;
> >  
> >      fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
> > +
> > +    vhost_dev_set_config_notifier(&fs->vhost_dev, &fs_ops);
> 
> Is this really needed since vhost_user_fs_handle_config_change() ignores
> it?

Initially I did not introduce it but code did not work. Looked little
closer and noticed following code in vhost_user_backend_init().

        if (!dev->config_ops || !dev->config_ops->vhost_dev_config_notifier) {
            /* Don't acknowledge CONFIG feature if device doesn't support it */
            dev->protocol_features &= ~(1ULL << VHOST_USER_PROTOCOL_F_CONFIG);

So if dev->config_ops->vhost_dev_config_notifier is not provided, 
feature VHOST_USER_PROTOCOL_F_CONFIG will be reset.

Its kind of odd that its a hard requirement. Anyway, that's the reason
I added it so that VHOST_USER_PROTOCOL_F_CONFIG continues to work.

Thanks
Vivek



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Virtio-fs] [PATCH 3/4] virtiofsd: Specify size of notification buffer using config space
@ 2019-11-25 14:57       ` Vivek Goyal
  0 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-11-25 14:57 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-fs, miklos, qemu-devel

On Fri, Nov 22, 2019 at 10:33:00AM +0000, Stefan Hajnoczi wrote:
> On Fri, Nov 15, 2019 at 03:55:42PM -0500, Vivek Goyal wrote:
> > diff --git a/contrib/virtiofsd/fuse_virtio.c b/contrib/virtiofsd/fuse_virtio.c
> > index 411114c9b3..982b6ad0bd 100644
> > --- a/contrib/virtiofsd/fuse_virtio.c
> > +++ b/contrib/virtiofsd/fuse_virtio.c
> > @@ -109,7 +109,8 @@ static uint64_t fv_get_features(VuDev *dev)
> >      uint64_t features;
> >  
> >      features = 1ull << VIRTIO_F_VERSION_1 |
> > -               1ull << VIRTIO_FS_F_NOTIFICATION;
> > +               1ull << VIRTIO_FS_F_NOTIFICATION |
> > +               1ull << VHOST_USER_F_PROTOCOL_FEATURES;
> 
> This is not needed since VHOST_USER_F_PROTOCOL_FEATURES is already added
> by vu_get_features_exec():

Will do.

> 
>   vu_get_features_exec(VuDev *dev, VhostUserMsg *vmsg)
>   {
>       vmsg->payload.u64 =
>           1ULL << VHOST_F_LOG_ALL |
>           1ULL << VHOST_USER_F_PROTOCOL_FEATURES;
> 
>       if (dev->iface->get_features) {
>           vmsg->payload.u64 |= dev->iface->get_features(dev);
>       }
> 
> >  
> >      return features;
> >  }
> > @@ -927,6 +928,27 @@ static bool fv_queue_order(VuDev *dev, int qidx)
> >      return false;
> >  }
> >  
> > +static uint64_t fv_get_protocol_features(VuDev *dev)
> > +{
> > +	return 1ull << VHOST_USER_PROTOCOL_F_CONFIG;
> > +}
> 
> Please change vu_get_protocol_features_exec() in a separate patch so
> that devices don't need this boilerplate .get_protocol_features() code:
> 
>   static bool
>   vu_get_protocol_features_exec(VuDev *dev, VhostUserMsg *vmsg)
>   {
>       ...
>  -    if (dev->iface->get_config && dev->iface->set_config) {
>  +    if (dev->iface->get_config || dev->iface->set_config) {
>           features |= 1ULL << VHOST_USER_PROTOCOL_F_CONFIG;

This seems more like a nice to have thing. Can we leave it for sometime
later.

>       }
> 
> > +
> > +static int fv_get_config(VuDev *dev, uint8_t *config, uint32_t len)
> > +{
> > +	struct virtio_fs_config fscfg = {};
> > +
> > +	fuse_log(FUSE_LOG_DEBUG, "%s:Setting notify_buf_size=%d\n", __func__,
> > +                 sizeof(struct fuse_notify_lock_out));
> > +	/*
> > +	 * As of now only notification related to lock is supported. As more
> > +	 * notification types are supported, bump up the size accordingly.
> > +	 */
> > +	fscfg.notify_buf_size = sizeof(struct fuse_notify_lock_out);
> 
> Missing cpu_to_le32().

Not sure. Deivce converts to le32 when guests asks for it. So there should
not be any need to do this conversion between vhost-user daemon and
device. I am assuming that both daemon and qemu are using same endianess
and if that's the case, first converting it to le32 and undoing this
operation on other end (if we are running on an architecture with big
endian), seems unnecessary and confusing.

static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
{
    ...
    ...
    virtio_stl_p(vdev, &fscfg.notify_buf_size, fs->fscfg.notify_buf_size);
}

> 
> I'm not sure about specifying the size precisely down to the last byte
> because any change to guest-visible aspects of the device (like VIRTIO
> Configuration Space) are not compatible across live migration.  It will
> be necessary to introduce a device version command-line option for live
> migration compatibility so that existing guests can be migrated to a new
> virtiofsd without the device changing underneath them.

I am not sure I understand this point. If we were to support live
migration, will we not have to reset the queue and regoniate with
device again on destination host.
> 
> How about rounding this up to 4 KB?

Not sure how will that help. Right now it feels just wasteful of memory.

> 
> >  static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
> >  {
> >      VHostUserFS *fs = VHOST_USER_FS(vdev);
> >      struct virtio_fs_config fscfg = {};
> > +    int ret;
> > +
> > +    /*
> > +     * As of now we only get notification buffer size from device. And that's
> > +     * needed only if notification queue is enabled.
> > +     */
> > +    if (fs->notify_enabled) {
> > +        ret = vhost_dev_get_config(&fs->vhost_dev, (uint8_t *)&fs->fscfg,
> > +                                   sizeof(struct virtio_fs_config));
> > +	if (ret < 0) {
> 
> Indentation.

Will fix.

> 
> > +            error_report("vhost-user-fs: get device config space failed."
> > +                         " ret=%d\n", ret);
> > +            return;
> > +        }
> 
> Missing le32_to_cpu() for notify_buf_size.

See above.

[..]
> > @@ -545,6 +569,8 @@ static void vuf_device_realize(DeviceState *dev, Error **errp)
> >      fs->vhost_dev.nvqs = 2 + fs->conf.num_request_queues;
> >  
> >      fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
> > +
> > +    vhost_dev_set_config_notifier(&fs->vhost_dev, &fs_ops);
> 
> Is this really needed since vhost_user_fs_handle_config_change() ignores
> it?

Initially I did not introduce it but code did not work. Looked little
closer and noticed following code in vhost_user_backend_init().

        if (!dev->config_ops || !dev->config_ops->vhost_dev_config_notifier) {
            /* Don't acknowledge CONFIG feature if device doesn't support it */
            dev->protocol_features &= ~(1ULL << VHOST_USER_PROTOCOL_F_CONFIG);

So if dev->config_ops->vhost_dev_config_notifier is not provided, 
feature VHOST_USER_PROTOCOL_F_CONFIG will be reset.

Its kind of odd that its a hard requirement. Anyway, that's the reason
I added it so that VHOST_USER_PROTOCOL_F_CONFIG continues to work.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Virtio-fs] [PATCH 4/4] virtiofsd: Implement blocking posix locks
  2019-11-22 10:53     ` [Virtio-fs] " Stefan Hajnoczi
  (?)
@ 2019-11-25 15:38     ` Vivek Goyal
  -1 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-11-25 15:38 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-fs, qemu-devel, miklos

On Fri, Nov 22, 2019 at 10:53:24AM +0000, Stefan Hajnoczi wrote:
> On Fri, Nov 15, 2019 at 03:55:43PM -0500, Vivek Goyal wrote:
> > diff --git a/contrib/virtiofsd/fuse_lowlevel.c b/contrib/virtiofsd/fuse_lowlevel.c
> > index d4a42d9804..f706e440bf 100644
> > --- a/contrib/virtiofsd/fuse_lowlevel.c
> > +++ b/contrib/virtiofsd/fuse_lowlevel.c
> > @@ -183,7 +183,8 @@ int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
> >  {
> >  	struct fuse_out_header out;
> >  
> > -	if (error <= -1000 || error > 0) {
> > +	/* error = 1 has been used to signal client to wait for notificaiton */
> > +	if (error <= -1000 || error > 1) {
> >  		fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n",	error);
> >  		error = -ERANGE;
> >  	}
> 
> What is this?

When a waiting lock request comes in, we need a way to reply back saying
wait for the notification. So I used value "1" for the
fuse_out_header->error field for this purpose. As of now, 0 is returned
for success and negative values for error code. So positive values seem
to be unused.

> 
> > +int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t req_id,
> > +			      int32_t error)
> > +{
> > +	struct fuse_notify_lock_out outarg;
> 
> Missing = {} initialization to avoid information leaks to the guest.

Will do.

> 
> > @@ -1704,6 +1720,15 @@ int fuse_lowlevel_notify_delete(struct fuse_session *se,
> >  int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
> >  			       off_t offset, struct fuse_bufvec *bufv,
> >  			       enum fuse_buf_copy_flags flags);
> > +/**
> > + * Notify event related to previous lock request
> > + *
> > + * @param se the session object
> > + * @param req_id the id of the request which requested setlkw
> 
> The rest of the code calls this id "unique":

Will change it.

> 
>   + * @param req_unique the unique id of the setlkw request
> 
> > +    /* Pop an element from queue */
> > +    req = vu_queue_pop(dev, q, sizeof(FVRequest), &bad_in_num, &bad_out_num);
> > +    if (!req) {
> > +        /* TODO: Implement some sort of ring buffer and queue notifications
> > +	 * on that and send these later when notification queue has space
> > +	 * available.
> > +	 */
> > +        return -ENOSPC;
> 
> Ah, I thought the point of the notifications processing thread was
> exactly this case.  It could wake any threads waiting for buffers.
> 
> This wakeup could be implemented with a condvar - no ring buffer
> necessary.

I was thinking that thread sending notification should not block. It can
just queue the notification reuqest and some other thread (including
notification thread could send it later). Number of pre-allocated buffers
could be of fixed and we will drop notifications if guest is not
responding. This will also take care of concerns w.r.t rogue guest
blocking filesystem code in daemon.

Anyway, this is a TODO item and not implemented yet. 

Thanks
Vivek


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Virtio-fs] [PATCH 4/4] virtiofsd: Implement blocking posix locks
  2019-11-22 17:47     ` [Virtio-fs] " Dr. David Alan Gilbert
  (?)
@ 2019-11-25 15:44     ` Vivek Goyal
  2019-11-26 13:02       ` Dr. David Alan Gilbert
  -1 siblings, 1 reply; 33+ messages in thread
From: Vivek Goyal @ 2019-11-25 15:44 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: virtio-fs, qemu-devel, miklos

On Fri, Nov 22, 2019 at 05:47:32PM +0000, Dr. David Alan Gilbert wrote:

[..]
> > +static int virtio_send_notify_msg(struct fuse_session *se, struct iovec *iov,
> > +				  int count)
> > +{
> > +    struct fv_QueueInfo *qi;
> > +    VuDev *dev = &se->virtio_dev->dev;
> > +    VuVirtq *q;
> > +    FVRequest *req;
> > +    VuVirtqElement *elem;
> > +    unsigned int in_num, bad_in_num = 0, bad_out_num = 0;
> > +    struct fuse_out_header *out = iov[0].iov_base;
> > +    size_t in_len, tosend_len = iov_size(iov, count);
> > +    struct iovec *in_sg;
> > +    int ret = 0;
> > +
> > +    /* Notifications have unique == 0 */
> > +    assert (!out->unique);
> > +
> > +    if (!se->notify_enabled)
> > +        return -EOPNOTSUPP;
> > +
> > +    /* If notifications are enabled, queue index 1 is notification queue */
> > +    qi = se->virtio_dev->qi[1];
> > +    q = vu_get_queue(dev, qi->qidx);
> > +
> > +    pthread_rwlock_rdlock(&qi->virtio_dev->vu_dispatch_rwlock);
> > +    pthread_mutex_lock(&qi->vq_lock);
> > +    /* Pop an element from queue */
> > +    req = vu_queue_pop(dev, q, sizeof(FVRequest), &bad_in_num, &bad_out_num);
> 
> You don't need bad_in_num/bad_out_num - just pass NULL for both; they're
> only needed if you expect to read/write data that's not mappable (i.e.
> in our direct write case).

Will do.

[..]
> > @@ -1950,21 +1948,54 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino,
> >  
> >  	if (!plock) {
> >  		saverr = ret;
> > +		pthread_mutex_unlock(&inode->plock_mutex);
> >  		goto out;
> >  	}
> >  
> > +	/*
> > +	 * plock is now released when inode is going away. We already have
> > +	 * a reference on inode, so it is guaranteed that plock->fd is
> > +	 * still around even after dropping inode->plock_mutex lock
> > +	 */
> > +	ofd = plock->fd;
> > +	pthread_mutex_unlock(&inode->plock_mutex);
> > +
> > +	/*
> > +	 * If this lock request can block, request caller to wait for
> > +	 * notification. Do not access req after this. Once lock is
> > +	 * available, send a notification instead.
> > +	 */
> > +	if (sleep && lock->l_type != F_UNLCK) {
> > +		/*
> > +		 * If notification queue is not enabled, can't support async
> > +		 * locks.
> > +		 */
> > +		if (!se->notify_enabled) {
> > +			saverr = EOPNOTSUPP;
> > +			goto out;
> > +		}
> > +		async_lock = true;
> > +		unique = req->unique;
> > +		fuse_reply_wait(req);
> > +	}
> >  	/* TODO: Is it alright to modify flock? */
> >  	lock->l_pid = 0;
> > -	ret = fcntl(plock->fd, F_OFD_SETLK, lock);
> > +	if (async_lock)
> > +		ret = fcntl(ofd, F_OFD_SETLKW, lock);
> > +	else
> > +		ret = fcntl(ofd, F_OFD_SETLK, lock);
> 
> What happens if the guest is rebooted after it's asked
> for, but not been granted a lock?

I think a regular reboot can't be done till a request is pending, because
virtio-fs can't be unmounted and unmount will wait for all pending
requests to finish.

Destroying qemu will destroy deamon too.

Are there any other reboot paths I have missed.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Virtio-fs] [PATCH 4/4] virtiofsd: Implement blocking posix locks
  2019-11-25 15:44     ` Vivek Goyal
@ 2019-11-26 13:02       ` Dr. David Alan Gilbert
  2019-11-27 19:08         ` Vivek Goyal
  0 siblings, 1 reply; 33+ messages in thread
From: Dr. David Alan Gilbert @ 2019-11-26 13:02 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel, miklos

* Vivek Goyal (vgoyal@redhat.com) wrote:
> On Fri, Nov 22, 2019 at 05:47:32PM +0000, Dr. David Alan Gilbert wrote:
> 
> [..]
> > > +static int virtio_send_notify_msg(struct fuse_session *se, struct iovec *iov,
> > > +				  int count)
> > > +{
> > > +    struct fv_QueueInfo *qi;
> > > +    VuDev *dev = &se->virtio_dev->dev;
> > > +    VuVirtq *q;
> > > +    FVRequest *req;
> > > +    VuVirtqElement *elem;
> > > +    unsigned int in_num, bad_in_num = 0, bad_out_num = 0;
> > > +    struct fuse_out_header *out = iov[0].iov_base;
> > > +    size_t in_len, tosend_len = iov_size(iov, count);
> > > +    struct iovec *in_sg;
> > > +    int ret = 0;
> > > +
> > > +    /* Notifications have unique == 0 */
> > > +    assert (!out->unique);
> > > +
> > > +    if (!se->notify_enabled)
> > > +        return -EOPNOTSUPP;
> > > +
> > > +    /* If notifications are enabled, queue index 1 is notification queue */
> > > +    qi = se->virtio_dev->qi[1];
> > > +    q = vu_get_queue(dev, qi->qidx);
> > > +
> > > +    pthread_rwlock_rdlock(&qi->virtio_dev->vu_dispatch_rwlock);
> > > +    pthread_mutex_lock(&qi->vq_lock);
> > > +    /* Pop an element from queue */
> > > +    req = vu_queue_pop(dev, q, sizeof(FVRequest), &bad_in_num, &bad_out_num);
> > 
> > You don't need bad_in_num/bad_out_num - just pass NULL for both; they're
> > only needed if you expect to read/write data that's not mappable (i.e.
> > in our direct write case).
> 
> Will do.
> 
> [..]
> > > @@ -1950,21 +1948,54 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino,
> > >  
> > >  	if (!plock) {
> > >  		saverr = ret;
> > > +		pthread_mutex_unlock(&inode->plock_mutex);
> > >  		goto out;
> > >  	}
> > >  
> > > +	/*
> > > +	 * plock is now released when inode is going away. We already have
> > > +	 * a reference on inode, so it is guaranteed that plock->fd is
> > > +	 * still around even after dropping inode->plock_mutex lock
> > > +	 */
> > > +	ofd = plock->fd;
> > > +	pthread_mutex_unlock(&inode->plock_mutex);
> > > +
> > > +	/*
> > > +	 * If this lock request can block, request caller to wait for
> > > +	 * notification. Do not access req after this. Once lock is
> > > +	 * available, send a notification instead.
> > > +	 */
> > > +	if (sleep && lock->l_type != F_UNLCK) {
> > > +		/*
> > > +		 * If notification queue is not enabled, can't support async
> > > +		 * locks.
> > > +		 */
> > > +		if (!se->notify_enabled) {
> > > +			saverr = EOPNOTSUPP;
> > > +			goto out;
> > > +		}
> > > +		async_lock = true;
> > > +		unique = req->unique;
> > > +		fuse_reply_wait(req);
> > > +	}
> > >  	/* TODO: Is it alright to modify flock? */
> > >  	lock->l_pid = 0;
> > > -	ret = fcntl(plock->fd, F_OFD_SETLK, lock);
> > > +	if (async_lock)
> > > +		ret = fcntl(ofd, F_OFD_SETLKW, lock);
> > > +	else
> > > +		ret = fcntl(ofd, F_OFD_SETLK, lock);
> > 
> > What happens if the guest is rebooted after it's asked
> > for, but not been granted a lock?
> 
> I think a regular reboot can't be done till a request is pending, because
> virtio-fs can't be unmounted and unmount will wait for all pending
> requests to finish.
> 
> Destroying qemu will destroy deamon too.
> 
> Are there any other reboot paths I have missed.

Yes, there are a few other ways the guest can reboot:
  a) A echo b > /proc/sysrq-trigger
  b) Telling qemu to do a reset

probably a few more as well; but they all end up with the daemon
still running over the same connection.   See
'virtiofsd: Handle hard reboot' where I handle that case where
a FUSE_INIT turns up unexpectedly.

Dave


> Thanks
> Vivek
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Virtio-fs] [PATCH 4/4] virtiofsd: Implement blocking posix locks
  2019-11-26 13:02       ` Dr. David Alan Gilbert
@ 2019-11-27 19:08         ` Vivek Goyal
  2019-12-09 11:06           ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 33+ messages in thread
From: Vivek Goyal @ 2019-11-27 19:08 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: virtio-fs, qemu-devel, miklos

On Tue, Nov 26, 2019 at 01:02:29PM +0000, Dr. David Alan Gilbert wrote:

[..]
> > > > @@ -1950,21 +1948,54 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino,
> > > >  
> > > >  	if (!plock) {
> > > >  		saverr = ret;
> > > > +		pthread_mutex_unlock(&inode->plock_mutex);
> > > >  		goto out;
> > > >  	}
> > > >  
> > > > +	/*
> > > > +	 * plock is now released when inode is going away. We already have
> > > > +	 * a reference on inode, so it is guaranteed that plock->fd is
> > > > +	 * still around even after dropping inode->plock_mutex lock
> > > > +	 */
> > > > +	ofd = plock->fd;
> > > > +	pthread_mutex_unlock(&inode->plock_mutex);
> > > > +
> > > > +	/*
> > > > +	 * If this lock request can block, request caller to wait for
> > > > +	 * notification. Do not access req after this. Once lock is
> > > > +	 * available, send a notification instead.
> > > > +	 */
> > > > +	if (sleep && lock->l_type != F_UNLCK) {
> > > > +		/*
> > > > +		 * If notification queue is not enabled, can't support async
> > > > +		 * locks.
> > > > +		 */
> > > > +		if (!se->notify_enabled) {
> > > > +			saverr = EOPNOTSUPP;
> > > > +			goto out;
> > > > +		}
> > > > +		async_lock = true;
> > > > +		unique = req->unique;
> > > > +		fuse_reply_wait(req);
> > > > +	}
> > > >  	/* TODO: Is it alright to modify flock? */
> > > >  	lock->l_pid = 0;
> > > > -	ret = fcntl(plock->fd, F_OFD_SETLK, lock);
> > > > +	if (async_lock)
> > > > +		ret = fcntl(ofd, F_OFD_SETLKW, lock);
> > > > +	else
> > > > +		ret = fcntl(ofd, F_OFD_SETLK, lock);
> > > 
> > > What happens if the guest is rebooted after it's asked
> > > for, but not been granted a lock?
> > 
> > I think a regular reboot can't be done till a request is pending, because
> > virtio-fs can't be unmounted and unmount will wait for all pending
> > requests to finish.
> > 
> > Destroying qemu will destroy deamon too.
> > 
> > Are there any other reboot paths I have missed.
> 
> Yes, there are a few other ways the guest can reboot:
>   a) A echo b > /proc/sysrq-trigger

I tried it. Both qemu and virtiofsd hang. virtiofsd wants to stop a 
queue. And that tries to stop thrad pool. But one of the threads in
thread pool is blocked on setlkw. So g_thread_pool_free() hangs.

I am not seeing any option in glib thread pool API to stop or send
signal to threads which are blocked.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Virtio-fs] [PATCH 4/4] virtiofsd: Implement blocking posix locks
  2019-11-27 19:08         ` Vivek Goyal
@ 2019-12-09 11:06           ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 33+ messages in thread
From: Dr. David Alan Gilbert @ 2019-12-09 11:06 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel, miklos

* Vivek Goyal (vgoyal@redhat.com) wrote:
> On Tue, Nov 26, 2019 at 01:02:29PM +0000, Dr. David Alan Gilbert wrote:
> 
> [..]
> > > > > @@ -1950,21 +1948,54 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino,
> > > > >  
> > > > >  	if (!plock) {
> > > > >  		saverr = ret;
> > > > > +		pthread_mutex_unlock(&inode->plock_mutex);
> > > > >  		goto out;
> > > > >  	}
> > > > >  
> > > > > +	/*
> > > > > +	 * plock is now released when inode is going away. We already have
> > > > > +	 * a reference on inode, so it is guaranteed that plock->fd is
> > > > > +	 * still around even after dropping inode->plock_mutex lock
> > > > > +	 */
> > > > > +	ofd = plock->fd;
> > > > > +	pthread_mutex_unlock(&inode->plock_mutex);
> > > > > +
> > > > > +	/*
> > > > > +	 * If this lock request can block, request caller to wait for
> > > > > +	 * notification. Do not access req after this. Once lock is
> > > > > +	 * available, send a notification instead.
> > > > > +	 */
> > > > > +	if (sleep && lock->l_type != F_UNLCK) {
> > > > > +		/*
> > > > > +		 * If notification queue is not enabled, can't support async
> > > > > +		 * locks.
> > > > > +		 */
> > > > > +		if (!se->notify_enabled) {
> > > > > +			saverr = EOPNOTSUPP;
> > > > > +			goto out;
> > > > > +		}
> > > > > +		async_lock = true;
> > > > > +		unique = req->unique;
> > > > > +		fuse_reply_wait(req);
> > > > > +	}
> > > > >  	/* TODO: Is it alright to modify flock? */
> > > > >  	lock->l_pid = 0;
> > > > > -	ret = fcntl(plock->fd, F_OFD_SETLK, lock);
> > > > > +	if (async_lock)
> > > > > +		ret = fcntl(ofd, F_OFD_SETLKW, lock);
> > > > > +	else
> > > > > +		ret = fcntl(ofd, F_OFD_SETLK, lock);
> > > > 
> > > > What happens if the guest is rebooted after it's asked
> > > > for, but not been granted a lock?
> > > 
> > > I think a regular reboot can't be done till a request is pending, because
> > > virtio-fs can't be unmounted and unmount will wait for all pending
> > > requests to finish.
> > > 
> > > Destroying qemu will destroy deamon too.
> > > 
> > > Are there any other reboot paths I have missed.
> > 
> > Yes, there are a few other ways the guest can reboot:
> >   a) A echo b > /proc/sysrq-trigger
> 
> I tried it. Both qemu and virtiofsd hang. virtiofsd wants to stop a 
> queue. And that tries to stop thrad pool. But one of the threads in
> thread pool is blocked on setlkw. So g_thread_pool_free() hangs.
> 
> I am not seeing any option in glib thread pool API to stop or send
> signal to threads which are blocked.

Is there a way to setup pthread_cancel ?  The upstream libfuse code
has somec ases where it enables cancellation very carefully around
something that might block, does it, then disables cancellation.

Dave

> Thanks
> Vivek
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2019-12-09 11:06 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-15 20:55 [PATCH 0/4] [RFC] virtiofsd, vhost-user-fs: Add support for notification queue Vivek Goyal
2019-11-15 20:55 ` [Virtio-fs] " Vivek Goyal
2019-11-15 20:55 ` [PATCH 1/4] virtiofsd: Release file locks using F_UNLCK Vivek Goyal
2019-11-15 20:55   ` [Virtio-fs] " Vivek Goyal
2019-11-22 10:07   ` Stefan Hajnoczi
2019-11-22 10:07     ` [Virtio-fs] " Stefan Hajnoczi
2019-11-22 13:45     ` Vivek Goyal
2019-11-22 13:45       ` [Virtio-fs] " Vivek Goyal
2019-11-15 20:55 ` [PATCH 2/4] virtiofd: Create a notification queue Vivek Goyal
2019-11-15 20:55   ` [Virtio-fs] " Vivek Goyal
2019-11-22 10:19   ` Stefan Hajnoczi
2019-11-22 10:19     ` [Virtio-fs] " Stefan Hajnoczi
2019-11-22 14:47     ` Vivek Goyal
2019-11-22 14:47       ` [Virtio-fs] " Vivek Goyal
2019-11-22 17:29       ` Dr. David Alan Gilbert
2019-11-22 17:29         ` [Virtio-fs] " Dr. David Alan Gilbert
2019-11-15 20:55 ` [PATCH 3/4] virtiofsd: Specify size of notification buffer using config space Vivek Goyal
2019-11-15 20:55   ` [Virtio-fs] " Vivek Goyal
2019-11-22 10:33   ` Stefan Hajnoczi
2019-11-22 10:33     ` [Virtio-fs] " Stefan Hajnoczi
2019-11-25 14:57     ` Vivek Goyal
2019-11-25 14:57       ` [Virtio-fs] " Vivek Goyal
2019-11-15 20:55 ` [PATCH 4/4] virtiofsd: Implement blocking posix locks Vivek Goyal
2019-11-15 20:55   ` [Virtio-fs] " Vivek Goyal
2019-11-22 10:53   ` Stefan Hajnoczi
2019-11-22 10:53     ` [Virtio-fs] " Stefan Hajnoczi
2019-11-25 15:38     ` Vivek Goyal
2019-11-22 17:47   ` Dr. David Alan Gilbert
2019-11-22 17:47     ` [Virtio-fs] " Dr. David Alan Gilbert
2019-11-25 15:44     ` Vivek Goyal
2019-11-26 13:02       ` Dr. David Alan Gilbert
2019-11-27 19:08         ` Vivek Goyal
2019-12-09 11:06           ` Dr. David Alan Gilbert

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.