linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 00/13] virtio-fs: shared file system for virtual machines
@ 2019-08-21 17:37 Vivek Goyal
  2019-08-21 17:37 ` [PATCH 01/13] fuse: delete dentry if timeout is zero Vivek Goyal
                   ` (14 more replies)
  0 siblings, 15 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-08-21 17:37 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, miklos; +Cc: virtio-fs, vgoyal, stefanha, dgilbert

Hi,

Here are the V3 patches for virtio-fs filesystem. This time I have
broken the patch series in two parts. This is first part which does
not contain DAX support. Second patch series will contain the patches
for DAX support.

I have also dropped RFC tag from first patch series as we believe its
in good enough shape that it should get a consideration for inclusion
upstream.

These patches apply on top of 5.3-rc5 kernel and are also available
here.

https://github.com/rhvgoyal/linux/commits/vivek-5.3-aug-21-2019

Patches for V1 and V2 were posted here.

https://lwn.net/ml/linux-fsdevel/20181210171318.16998-1-vgoyal@redhat.com/
http://lkml.iu.edu/hypermail/linux/kernel/1905.1/07232.html

More information about the project can be found here.

https://virtio-fs.gitlab.io

Changes from V2
===============
- Various bug fixes and performance improvements.

HOWTO
======
We have put instructions on how to use it here.

https://virtio-fs.gitlab.io/

Some Performance Numbers
========================
I have basically run bunch of fio jobs to get a sense of speed of
various operations. I wrote a simple wrapper script to run fio jobs
3 times and take their average and report it. These scripts are available
here.

https://github.com/rhvgoyal/virtiofs-tests

I set up a directory on ramfs on host and exported that directory inside
guest using virtio-9p and virtio-fs and ran tests inside guests. Ran
tests with cache=none both for virtio-9p and virtio-fs so that no caching
happens in guest. For virtio-fs, I ran an additional set of tests with
dax enabled. Dax is not part of first patch series but I included
results here because dax seems to get the maximum performance advantage
and its shows the real potential of virtio-fs.

Test Setup
-----------
- A fedora 28 host with 32G RAM, 2 sockets (6 cores per socket, 2
  threads per core)

- Using ramfs on host as backing store. 4 fio files of 2G each.

- Created a VM with 16 VCPUS and 8GB memory. An 8GB cache window (for dax
  mmap).

Test Results
------------
- Results in three configurations have been reported. 9p (cache=none),
  virtio-fs (cache=none) and virtio-fs (cache=none + dax).

  There are other caching modes as well but to me cache=none seemed most
  interesting for now because it does not cache anything in guest
  and provides strong coherence. Other modes which provide less strong
  coherence and hence are faster are yet to be benchmarked.

- Three fio ioengines psync, libaio and mmap have been used.

- I/O Workload of randread, radwrite, seqread and seqwrite have been run.

- Each file size is 2G. Block size 4K. iodepth=16 

- "multi" means same operation was done with 4 jobs and each job is
  operating on a file of size 2G. 

- Some results are "0 (KiB/s)". That means that particular operation is
  not supported in that configuration.

NAME                    I/O Operation           BW(Read/Write)

9p-cache-none           seqread-psync           27(MiB/s)
virtiofs-cache-none     seqread-psync           35(MiB/s)
virtiofs-dax-cache-none seqread-psync           245(MiB/s)

9p-cache-none           seqread-psync-multi     117(MiB/s)
virtiofs-cache-none     seqread-psync-multi     162(MiB/s)
virtiofs-dax-cache-none seqread-psync-multi     894(MiB/s)

9p-cache-none           seqread-mmap            24(MiB/s)
virtiofs-cache-none     seqread-mmap            0(KiB/s)
virtiofs-dax-cache-none seqread-mmap            168(MiB/s)

9p-cache-none           seqread-mmap-multi      115(MiB/s)
virtiofs-cache-none     seqread-mmap-multi      0(KiB/s)
virtiofs-dax-cache-none seqread-mmap-multi      614(MiB/s)

9p-cache-none           seqread-libaio          26(MiB/s)
virtiofs-cache-none     seqread-libaio          139(MiB/s)
virtiofs-dax-cache-none seqread-libaio          160(MiB/s)

9p-cache-none           seqread-libaio-multi    129(MiB/s)
virtiofs-cache-none     seqread-libaio-multi    142(MiB/s)
virtiofs-dax-cache-none seqread-libaio-multi    577(MiB/s)

9p-cache-none           randread-psync          29(MiB/s)
virtiofs-cache-none     randread-psync          34(MiB/s)
virtiofs-dax-cache-none randread-psync          256(MiB/s)

9p-cache-none           randread-psync-multi    139(MiB/s)
virtiofs-cache-none     randread-psync-multi    153(MiB/s)
virtiofs-dax-cache-none randread-psync-multi    245(MiB/s)

9p-cache-none           randread-mmap           22(MiB/s)
virtiofs-cache-none     randread-mmap           0(KiB/s)
virtiofs-dax-cache-none randread-mmap           162(MiB/s)

9p-cache-none           randread-mmap-multi     111(MiB/s)
virtiofs-cache-none     randread-mmap-multi     0(KiB/s)
virtiofs-dax-cache-none randread-mmap-multi     215(MiB/s)

9p-cache-none           randread-libaio         26(MiB/s)
virtiofs-cache-none     randread-libaio         135(MiB/s)
virtiofs-dax-cache-none randread-libaio         157(MiB/s)

9p-cache-none           randread-libaio-multi   133(MiB/s)
virtiofs-cache-none     randread-libaio-multi   245(MiB/s)
virtiofs-dax-cache-none randread-libaio-multi   163(MiB/s)

9p-cache-none           seqwrite-psync          28(MiB/s)
virtiofs-cache-none     seqwrite-psync          34(MiB/s)
virtiofs-dax-cache-none seqwrite-psync          203(MiB/s)

9p-cache-none           seqwrite-psync-multi    128(MiB/s)
virtiofs-cache-none     seqwrite-psync-multi    155(MiB/s)
virtiofs-dax-cache-none seqwrite-psync-multi    717(MiB/s)

9p-cache-none           seqwrite-mmap           0(KiB/s)
virtiofs-cache-none     seqwrite-mmap           0(KiB/s)
virtiofs-dax-cache-none seqwrite-mmap           165(MiB/s)

9p-cache-none           seqwrite-mmap-multi     0(KiB/s)
virtiofs-cache-none     seqwrite-mmap-multi     0(KiB/s)
virtiofs-dax-cache-none seqwrite-mmap-multi     511(MiB/s)

9p-cache-none           seqwrite-libaio         27(MiB/s)
virtiofs-cache-none     seqwrite-libaio         128(MiB/s)
virtiofs-dax-cache-none seqwrite-libaio         141(MiB/s)

9p-cache-none           seqwrite-libaio-multi   119(MiB/s)
virtiofs-cache-none     seqwrite-libaio-multi   242(MiB/s)
virtiofs-dax-cache-none seqwrite-libaio-multi   505(MiB/s)

9p-cache-none           randwrite-psync         27(MiB/s)
virtiofs-cache-none     randwrite-psync         34(MiB/s)
virtiofs-dax-cache-none randwrite-psync         189(MiB/s)

9p-cache-none           randwrite-psync-multi   137(MiB/s)
virtiofs-cache-none     randwrite-psync-multi   150(MiB/s)
virtiofs-dax-cache-none randwrite-psync-multi   233(MiB/s)

9p-cache-none           randwrite-mmap          0(KiB/s)
virtiofs-cache-none     randwrite-mmap          0(KiB/s)
virtiofs-dax-cache-none randwrite-mmap          120(MiB/s)

9p-cache-none           randwrite-mmap-multi    0(KiB/s)
virtiofs-cache-none     randwrite-mmap-multi    0(KiB/s)
virtiofs-dax-cache-none randwrite-mmap-multi    200(MiB/s)

9p-cache-none           randwrite-libaio        25(MiB/s)
virtiofs-cache-none     randwrite-libaio        124(MiB/s)
virtiofs-dax-cache-none randwrite-libaio        131(MiB/s)

9p-cache-none           randwrite-libaio-multi  125(MiB/s)
virtiofs-cache-none     randwrite-libaio-multi  241(MiB/s)
virtiofs-dax-cache-none randwrite-libaio-multi  163(MiB/s)

Conclusions
===========
- In general virtio-fs seems faster than virtio-9p. Using dax makes it
  really interesting.

Note:
  Right now dax window is 8G and max fio file size is 8G as well (4
  files of 2G each). That means everything fits into dax window and no
  reclaim is needed. Dax window reclaim logic is slower and if file
  size is bigger than dax window size, performance slows down.

Description from previous postings
==================================

Design Overview
===============
With the goal of designing something with better performance and local file
system semantics, a bunch of ideas were proposed.

- Use fuse protocol (instead of 9p) for communication between guest
  and host. Guest kernel will be fuse client and a fuse server will
  run on host to serve the requests.

- For data access inside guest, mmap portion of file in QEMU address
  space and guest accesses this memory using dax. That way guest page
  cache is bypassed and there is only one copy of data (on host). This
  will also enable mmap(MAP_SHARED) between guests.

- For metadata coherency, there is a shared memory region which contains
  version number associated with metadata and any guest changing metadata
  updates version number and other guests refresh metadata on next
  access. This is yet to be implemented.

How virtio-fs differs from existing approaches
==============================================
The unique idea behind virtio-fs is to take advantage of the co-location
of the virtual machine and hypervisor to avoid communication (vmexits).

DAX allows file contents to be accessed without communication with the
hypervisor. The shared memory region for metadata avoids communication in
the common case where metadata is unchanged.

By replacing expensive communication with cheaper shared memory accesses,
we expect to achieve better performance than approaches based on network
file system protocols. In addition, this also makes it easier to achieve
local file system semantics (coherency).

These techniques are not applicable to network file system protocols since
the communications channel is bypassed by taking advantage of shared memory
on a local machine. This is why we decided to build virtio-fs rather than
focus on 9P or NFS.

Caching Modes
=============
Like virtio-9p, different caching modes are supported which determine the
coherency level as well. The “cache=FOO” and “writeback” options control the
level of coherence between the guest and host filesystems.

- cache=none
  metadata, data and pathname lookup are not cached in guest. They are always
  fetched from host and any changes are immediately pushed to host.

- cache=always
  metadata, data and pathname lookup are cached in guest and never expire.

- cache=auto
  metadata and pathname lookup cache expires after a configured amount of time
  (default is 1 second). Data is cached while the file is open (close to open
  consistency).

- writeback/no_writeback
  These options control the writeback strategy.  If writeback is disabled,
  then normal writes will immediately be synchronized with the host fs. If
  writeback is enabled, then writes may be cached in the guest until the file
  is closed or an fsync(2) performed. This option has no effect on mmap-ed
  writes or writes going through the DAX mechanism.

Thanks
Vivek

Miklos Szeredi (2):
  fuse: delete dentry if timeout is zero
  fuse: Use default_file_splice_read for direct IO

Stefan Hajnoczi (6):
  fuse: export fuse_end_request()
  fuse: export fuse_len_args()
  fuse: export fuse_get_unique()
  fuse: extract fuse_fill_super_common()
  fuse: add fuse_iqueue_ops callbacks
  virtio_fs: add skeleton virtio_fs.ko module

Vivek Goyal (5):
  fuse: Export fuse_send_init_request()
  Export fuse_dequeue_forget() function
  fuse: Separate fuse device allocation and installation in fuse_conn
  virtio-fs: Do not provide abort interface in fusectl
  init/do_mounts.c: add virtio_fs root fs support

 fs/fuse/Kconfig                 |   11 +
 fs/fuse/Makefile                |    1 +
 fs/fuse/control.c               |    4 +-
 fs/fuse/cuse.c                  |    4 +-
 fs/fuse/dev.c                   |   89 ++-
 fs/fuse/dir.c                   |   26 +-
 fs/fuse/file.c                  |   15 +-
 fs/fuse/fuse_i.h                |  120 +++-
 fs/fuse/inode.c                 |  203 +++---
 fs/fuse/virtio_fs.c             | 1061 +++++++++++++++++++++++++++++++
 fs/splice.c                     |    3 +-
 include/linux/fs.h              |    2 +
 include/uapi/linux/virtio_fs.h  |   41 ++
 include/uapi/linux/virtio_ids.h |    1 +
 init/do_mounts.c                |   10 +
 15 files changed, 1462 insertions(+), 129 deletions(-)
 create mode 100644 fs/fuse/virtio_fs.c
 create mode 100644 include/uapi/linux/virtio_fs.h

-- 
2.20.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 01/13] fuse: delete dentry if timeout is zero
  2019-08-21 17:37 [PATCH v3 00/13] virtio-fs: shared file system for virtual machines Vivek Goyal
@ 2019-08-21 17:37 ` Vivek Goyal
  2019-08-21 17:37 ` [PATCH 02/13] fuse: Use default_file_splice_read for direct IO Vivek Goyal
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-08-21 17:37 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, miklos
  Cc: virtio-fs, vgoyal, stefanha, dgilbert, Miklos Szeredi

From: Miklos Szeredi <mszeredi@redhat.com>

Don't hold onto dentry in lru list if need to re-lookup it anyway at next
access.

More advanced version of this patch would periodically flush out dentries
from the lru which have gone stale.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/dir.c | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index dd0f64f7bc06..fd8636e67ae9 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -29,12 +29,26 @@ union fuse_dentry {
 	struct rcu_head rcu;
 };
 
-static inline void fuse_dentry_settime(struct dentry *entry, u64 time)
+static void fuse_dentry_settime(struct dentry *dentry, u64 time)
 {
-	((union fuse_dentry *) entry->d_fsdata)->time = time;
+	/*
+	 * Mess with DCACHE_OP_DELETE because dput() will be faster without it.
+	 *  Don't care about races, either way it's just an optimization
+	 */
+	if ((time && (dentry->d_flags & DCACHE_OP_DELETE)) ||
+	    (!time && !(dentry->d_flags & DCACHE_OP_DELETE))) {
+		spin_lock(&dentry->d_lock);
+		if (time)
+			dentry->d_flags &= ~DCACHE_OP_DELETE;
+		else
+			dentry->d_flags |= DCACHE_OP_DELETE;
+		spin_unlock(&dentry->d_lock);
+	}
+
+	((union fuse_dentry *) dentry->d_fsdata)->time = time;
 }
 
-static inline u64 fuse_dentry_time(struct dentry *entry)
+static inline u64 fuse_dentry_time(const struct dentry *entry)
 {
 	return ((union fuse_dentry *) entry->d_fsdata)->time;
 }
@@ -255,8 +269,14 @@ static void fuse_dentry_release(struct dentry *dentry)
 	kfree_rcu(fd, rcu);
 }
 
+static int fuse_dentry_delete(const struct dentry *dentry)
+{
+	return time_before64(fuse_dentry_time(dentry), get_jiffies_64());
+}
+
 const struct dentry_operations fuse_dentry_operations = {
 	.d_revalidate	= fuse_dentry_revalidate,
+	.d_delete	= fuse_dentry_delete,
 	.d_init		= fuse_dentry_init,
 	.d_release	= fuse_dentry_release,
 };
-- 
2.20.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 02/13] fuse: Use default_file_splice_read for direct IO
  2019-08-21 17:37 [PATCH v3 00/13] virtio-fs: shared file system for virtual machines Vivek Goyal
  2019-08-21 17:37 ` [PATCH 01/13] fuse: delete dentry if timeout is zero Vivek Goyal
@ 2019-08-21 17:37 ` Vivek Goyal
  2019-08-28  7:45   ` Miklos Szeredi
  2019-08-21 17:37 ` [PATCH 03/13] fuse: export fuse_end_request() Vivek Goyal
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 33+ messages in thread
From: Vivek Goyal @ 2019-08-21 17:37 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, miklos
  Cc: virtio-fs, vgoyal, stefanha, dgilbert, Miklos Szeredi

From: Miklos Szeredi <mszeredi@redhat.com>

---
 fs/fuse/file.c     | 15 ++++++++++++++-
 fs/splice.c        |  3 ++-
 include/linux/fs.h |  2 ++
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 5ae2828beb00..c45ffe6f1ecb 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2172,6 +2172,19 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
 	return 0;
 }
 
+static ssize_t fuse_file_splice_read(struct file *in, loff_t *ppos,
+				     struct pipe_inode_info *pipe, size_t len,
+				     unsigned int flags)
+{
+	struct fuse_file *ff = in->private_data;
+
+	if (ff->open_flags & FOPEN_DIRECT_IO)
+		return default_file_splice_read(in, ppos, pipe, len, flags);
+	else
+		return generic_file_splice_read(in, ppos, pipe, len, flags);
+
+}
+
 static int convert_fuse_file_lock(struct fuse_conn *fc,
 				  const struct fuse_file_lock *ffl,
 				  struct file_lock *fl)
@@ -3228,7 +3241,7 @@ static const struct file_operations fuse_file_operations = {
 	.fsync		= fuse_fsync,
 	.lock		= fuse_file_lock,
 	.flock		= fuse_file_flock,
-	.splice_read	= generic_file_splice_read,
+	.splice_read	= fuse_file_splice_read,
 	.splice_write	= iter_file_splice_write,
 	.unlocked_ioctl	= fuse_file_ioctl,
 	.compat_ioctl	= fuse_file_compat_ioctl,
diff --git a/fs/splice.c b/fs/splice.c
index 98412721f056..652f541d953d 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -362,7 +362,7 @@ static ssize_t kernel_readv(struct file *file, const struct kvec *vec,
 	return res;
 }
 
-static ssize_t default_file_splice_read(struct file *in, loff_t *ppos,
+ssize_t default_file_splice_read(struct file *in, loff_t *ppos,
 				 struct pipe_inode_info *pipe, size_t len,
 				 unsigned int flags)
 {
@@ -426,6 +426,7 @@ static ssize_t default_file_splice_read(struct file *in, loff_t *ppos,
 	iov_iter_advance(&to, copied);	/* truncates and discards */
 	return res;
 }
+EXPORT_SYMBOL(default_file_splice_read);
 
 /*
  * Send 'sd->len' bytes to socket from 'sd->file' at position 'sd->pos'
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 997a530ff4e9..15ae8f5dd24e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3062,6 +3062,8 @@ extern void block_sync_page(struct page *page);
 /* fs/splice.c */
 extern ssize_t generic_file_splice_read(struct file *, loff_t *,
 		struct pipe_inode_info *, size_t, unsigned int);
+extern ssize_t default_file_splice_read(struct file *, loff_t *,
+		struct pipe_inode_info *, size_t, unsigned int);
 extern ssize_t iter_file_splice_write(struct pipe_inode_info *,
 		struct file *, loff_t *, size_t, unsigned int);
 extern ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe,
-- 
2.20.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 03/13] fuse: export fuse_end_request()
  2019-08-21 17:37 [PATCH v3 00/13] virtio-fs: shared file system for virtual machines Vivek Goyal
  2019-08-21 17:37 ` [PATCH 01/13] fuse: delete dentry if timeout is zero Vivek Goyal
  2019-08-21 17:37 ` [PATCH 02/13] fuse: Use default_file_splice_read for direct IO Vivek Goyal
@ 2019-08-21 17:37 ` Vivek Goyal
  2019-08-21 17:37 ` [PATCH 04/13] fuse: export fuse_len_args() Vivek Goyal
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-08-21 17:37 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, miklos; +Cc: virtio-fs, vgoyal, stefanha, dgilbert

From: Stefan Hajnoczi <stefanha@redhat.com>

virtio-fs will need to complete requests from outside fs/fuse/dev.c.
Make the symbol visible.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 fs/fuse/dev.c    | 19 ++++++++++---------
 fs/fuse/fuse_i.h |  5 +++++
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index ea8237513dfa..34dd1436cec2 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -427,7 +427,7 @@ static void flush_bg_queue(struct fuse_conn *fc)
  * the 'end' callback is called if given, else the reference to the
  * request is released
  */
-static void request_end(struct fuse_conn *fc, struct fuse_req *req)
+void fuse_request_end(struct fuse_conn *fc, struct fuse_req *req)
 {
 	struct fuse_iqueue *fiq = &fc->iq;
 
@@ -480,6 +480,7 @@ static void request_end(struct fuse_conn *fc, struct fuse_req *req)
 put_request:
 	fuse_put_request(fc, req);
 }
+EXPORT_SYMBOL_GPL(fuse_request_end);
 
 static int queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req)
 {
@@ -567,12 +568,12 @@ static void __fuse_request_send(struct fuse_conn *fc, struct fuse_req *req)
 		req->in.h.unique = fuse_get_unique(fiq);
 		queue_request(fiq, req);
 		/* acquire extra reference, since request is still needed
-		   after request_end() */
+		   after fuse_request_end() */
 		__fuse_get_request(req);
 		spin_unlock(&fiq->waitq.lock);
 
 		request_wait_answer(fc, req);
-		/* Pairs with smp_wmb() in request_end() */
+		/* Pairs with smp_wmb() in fuse_request_end() */
 		smp_rmb();
 	}
 }
@@ -1302,7 +1303,7 @@ __releases(fiq->waitq.lock)
  * the pending list and copies request data to userspace buffer.  If
  * no reply is needed (FORGET) or request has been aborted or there
  * was an error during the copying then it's finished by calling
- * request_end().  Otherwise add it to the processing list, and set
+ * fuse_request_end().  Otherwise add it to the processing list, and set
  * the 'sent' flag.
  */
 static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
@@ -1362,7 +1363,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
 		/* SETXATTR is special, since it may contain too large data */
 		if (in->h.opcode == FUSE_SETXATTR)
 			req->out.h.error = -E2BIG;
-		request_end(fc, req);
+		fuse_request_end(fc, req);
 		goto restart;
 	}
 	spin_lock(&fpq->lock);
@@ -1405,7 +1406,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
 	if (!test_bit(FR_PRIVATE, &req->flags))
 		list_del_init(&req->list);
 	spin_unlock(&fpq->lock);
-	request_end(fc, req);
+	fuse_request_end(fc, req);
 	return err;
 
  err_unlock:
@@ -1913,7 +1914,7 @@ static int copy_out_args(struct fuse_copy_state *cs, struct fuse_out *out,
  * the write buffer.  The request is then searched on the processing
  * list by the unique ID found in the header.  If found, then remove
  * it from the list and copy the rest of the buffer to the request.
- * The request is finished by calling request_end()
+ * The request is finished by calling fuse_request_end().
  */
 static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
 				 struct fuse_copy_state *cs, size_t nbytes)
@@ -2000,7 +2001,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
 		list_del_init(&req->list);
 	spin_unlock(&fpq->lock);
 
-	request_end(fc, req);
+	fuse_request_end(fc, req);
 out:
 	return err ? err : nbytes;
 
@@ -2140,7 +2141,7 @@ static void end_requests(struct fuse_conn *fc, struct list_head *head)
 		req->out.h.error = -ECONNABORTED;
 		clear_bit(FR_SENT, &req->flags);
 		list_del_init(&req->list);
-		request_end(fc, req);
+		fuse_request_end(fc, req);
 	}
 }
 
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 24dbca777775..67521103d3b2 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -956,6 +956,11 @@ ssize_t fuse_simple_request(struct fuse_conn *fc, struct fuse_args *args);
 void fuse_request_send_background(struct fuse_conn *fc, struct fuse_req *req);
 bool fuse_request_queue_background(struct fuse_conn *fc, struct fuse_req *req);
 
+/**
+ * End a finished request
+ */
+void fuse_request_end(struct fuse_conn *fc, struct fuse_req *req);
+
 /* Abort all requests */
 void fuse_abort_conn(struct fuse_conn *fc);
 void fuse_wait_aborted(struct fuse_conn *fc);
-- 
2.20.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 04/13] fuse: export fuse_len_args()
  2019-08-21 17:37 [PATCH v3 00/13] virtio-fs: shared file system for virtual machines Vivek Goyal
                   ` (2 preceding siblings ...)
  2019-08-21 17:37 ` [PATCH 03/13] fuse: export fuse_end_request() Vivek Goyal
@ 2019-08-21 17:37 ` Vivek Goyal
  2019-08-21 17:37 ` [PATCH 05/13] fuse: Export fuse_send_init_request() Vivek Goyal
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-08-21 17:37 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, miklos; +Cc: virtio-fs, vgoyal, stefanha, dgilbert

From: Stefan Hajnoczi <stefanha@redhat.com>

virtio-fs will need to query the length of fuse_arg lists.  Make the
symbol visible.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 fs/fuse/dev.c    | 7 ++++---
 fs/fuse/fuse_i.h | 5 +++++
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 34dd1436cec2..d90dba54f0d4 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -350,7 +350,7 @@ void fuse_put_request(struct fuse_conn *fc, struct fuse_req *req)
 }
 EXPORT_SYMBOL_GPL(fuse_put_request);
 
-static unsigned len_args(unsigned numargs, struct fuse_arg *args)
+unsigned fuse_len_args(unsigned numargs, struct fuse_arg *args)
 {
 	unsigned nbytes = 0;
 	unsigned i;
@@ -360,6 +360,7 @@ static unsigned len_args(unsigned numargs, struct fuse_arg *args)
 
 	return nbytes;
 }
+EXPORT_SYMBOL_GPL(fuse_len_args);
 
 static u64 fuse_get_unique(struct fuse_iqueue *fiq)
 {
@@ -375,7 +376,7 @@ static unsigned int fuse_req_hash(u64 unique)
 static void queue_request(struct fuse_iqueue *fiq, struct fuse_req *req)
 {
 	req->in.h.len = sizeof(struct fuse_in_header) +
-		len_args(req->in.numargs, (struct fuse_arg *) req->in.args);
+		fuse_len_args(req->in.numargs, (struct fuse_arg *) req->in.args);
 	list_add_tail(&req->list, &fiq->pending);
 	wake_up_locked(&fiq->waitq);
 	kill_fasync(&fiq->fasync, SIGIO, POLL_IN);
@@ -1894,7 +1895,7 @@ static int copy_out_args(struct fuse_copy_state *cs, struct fuse_out *out,
 	if (out->h.error)
 		return nbytes != reqsize ? -EINVAL : 0;
 
-	reqsize += len_args(out->numargs, out->args);
+	reqsize += fuse_len_args(out->numargs, out->args);
 
 	if (reqsize < nbytes || (reqsize > nbytes && !out->argvar))
 		return -EINVAL;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 67521103d3b2..f41ce8f39006 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1098,4 +1098,9 @@ int fuse_set_acl(struct inode *inode, struct posix_acl *acl, int type);
 /* readdir.c */
 int fuse_readdir(struct file *file, struct dir_context *ctx);
 
+/**
+ * Return the number of bytes in an arguments list
+ */
+unsigned fuse_len_args(unsigned numargs, struct fuse_arg *args);
+
 #endif /* _FS_FUSE_I_H */
-- 
2.20.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 05/13] fuse: Export fuse_send_init_request()
  2019-08-21 17:37 [PATCH v3 00/13] virtio-fs: shared file system for virtual machines Vivek Goyal
                   ` (3 preceding siblings ...)
  2019-08-21 17:37 ` [PATCH 04/13] fuse: export fuse_len_args() Vivek Goyal
@ 2019-08-21 17:37 ` Vivek Goyal
  2019-08-21 17:37 ` [PATCH 06/13] fuse: export fuse_get_unique() Vivek Goyal
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-08-21 17:37 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, miklos; +Cc: virtio-fs, vgoyal, stefanha, dgilbert

This will be used by virtio-fs to send init request to fuse server after
initialization of virt queues.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/fuse/dev.c    | 1 +
 fs/fuse/fuse_i.h | 1 +
 fs/fuse/inode.c  | 3 ++-
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index d90dba54f0d4..14c8ea3d189c 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -139,6 +139,7 @@ void fuse_request_free(struct fuse_req *req)
 	fuse_req_pages_free(req);
 	kmem_cache_free(fuse_req_cachep, req);
 }
+EXPORT_SYMBOL_GPL(fuse_request_free);
 
 void __fuse_get_request(struct fuse_req *req)
 {
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index f41ce8f39006..49f83f00c79c 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -994,6 +994,7 @@ void fuse_conn_put(struct fuse_conn *fc);
 
 struct fuse_dev *fuse_dev_alloc(struct fuse_conn *fc);
 void fuse_dev_free(struct fuse_dev *fud);
+void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req);
 
 /**
  * Add connection to control filesystem
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 4bb885b0f032..14a4e915294c 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -955,7 +955,7 @@ static void process_init_reply(struct fuse_conn *fc, struct fuse_req *req)
 	wake_up_all(&fc->blocked_waitq);
 }
 
-static void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req)
+void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req)
 {
 	struct fuse_init_in *arg = &req->misc.init_in;
 
@@ -985,6 +985,7 @@ static void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req)
 	req->end = process_init_reply;
 	fuse_request_send_background(fc, req);
 }
+EXPORT_SYMBOL_GPL(fuse_send_init);
 
 static void fuse_free_conn(struct fuse_conn *fc)
 {
-- 
2.20.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 06/13] fuse: export fuse_get_unique()
  2019-08-21 17:37 [PATCH v3 00/13] virtio-fs: shared file system for virtual machines Vivek Goyal
                   ` (4 preceding siblings ...)
  2019-08-21 17:37 ` [PATCH 05/13] fuse: Export fuse_send_init_request() Vivek Goyal
@ 2019-08-21 17:37 ` Vivek Goyal
  2019-08-21 17:37 ` [PATCH 07/13] Export fuse_dequeue_forget() function Vivek Goyal
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-08-21 17:37 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, miklos; +Cc: virtio-fs, vgoyal, stefanha, dgilbert

From: Stefan Hajnoczi <stefanha@redhat.com>

virtio-fs will need unique IDs for FORGET requests from outside
fs/fuse/dev.c.  Make the symbol visible.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 fs/fuse/dev.c    | 3 ++-
 fs/fuse/fuse_i.h | 5 +++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 14c8ea3d189c..6215bc7d4255 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -363,11 +363,12 @@ unsigned fuse_len_args(unsigned numargs, struct fuse_arg *args)
 }
 EXPORT_SYMBOL_GPL(fuse_len_args);
 
-static u64 fuse_get_unique(struct fuse_iqueue *fiq)
+u64 fuse_get_unique(struct fuse_iqueue *fiq)
 {
 	fiq->reqctr += FUSE_REQ_ID_STEP;
 	return fiq->reqctr;
 }
+EXPORT_SYMBOL_GPL(fuse_get_unique);
 
 static unsigned int fuse_req_hash(u64 unique)
 {
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 49f83f00c79c..7da756e2859e 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1104,4 +1104,9 @@ int fuse_readdir(struct file *file, struct dir_context *ctx);
  */
 unsigned fuse_len_args(unsigned numargs, struct fuse_arg *args);
 
+/**
+ * Get the next unique ID for a request
+ */
+u64 fuse_get_unique(struct fuse_iqueue *fiq);
+
 #endif /* _FS_FUSE_I_H */
-- 
2.20.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 07/13] Export fuse_dequeue_forget() function
  2019-08-21 17:37 [PATCH v3 00/13] virtio-fs: shared file system for virtual machines Vivek Goyal
                   ` (5 preceding siblings ...)
  2019-08-21 17:37 ` [PATCH 06/13] fuse: export fuse_get_unique() Vivek Goyal
@ 2019-08-21 17:37 ` Vivek Goyal
  2019-08-21 17:37 ` [PATCH 08/13] fuse: extract fuse_fill_super_common() Vivek Goyal
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-08-21 17:37 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, miklos; +Cc: virtio-fs, vgoyal, stefanha, dgilbert

File systems like virtio-fs need to do not have to play directly with
forget list data structures. There is a helper function use that instead.

Rename dequeue_forget() to fuse_dequeue_forget() and export it so that
stacked filesystems can use it.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/fuse/dev.c    | 9 +++++----
 fs/fuse/fuse_i.h | 3 +++
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 6215bc7d4255..2bbfc1f77875 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1185,7 +1185,7 @@ __releases(fiq->waitq.lock)
 	return err ? err : reqsize;
 }
 
-static struct fuse_forget_link *dequeue_forget(struct fuse_iqueue *fiq,
+struct fuse_forget_link *fuse_dequeue_forget(struct fuse_iqueue *fiq,
 					       unsigned max,
 					       unsigned *countp)
 {
@@ -1206,6 +1206,7 @@ static struct fuse_forget_link *dequeue_forget(struct fuse_iqueue *fiq,
 
 	return head;
 }
+EXPORT_SYMBOL(fuse_dequeue_forget);
 
 static int fuse_read_single_forget(struct fuse_iqueue *fiq,
 				   struct fuse_copy_state *cs,
@@ -1213,7 +1214,7 @@ static int fuse_read_single_forget(struct fuse_iqueue *fiq,
 __releases(fiq->waitq.lock)
 {
 	int err;
-	struct fuse_forget_link *forget = dequeue_forget(fiq, 1, NULL);
+	struct fuse_forget_link *forget = fuse_dequeue_forget(fiq, 1, NULL);
 	struct fuse_forget_in arg = {
 		.nlookup = forget->forget_one.nlookup,
 	};
@@ -1261,7 +1262,7 @@ __releases(fiq->waitq.lock)
 	}
 
 	max_forgets = (nbytes - ih.len) / sizeof(struct fuse_forget_one);
-	head = dequeue_forget(fiq, max_forgets, &count);
+	head = fuse_dequeue_forget(fiq, max_forgets, &count);
 	spin_unlock(&fiq->waitq.lock);
 
 	arg.count = count;
@@ -2231,7 +2232,7 @@ void fuse_abort_conn(struct fuse_conn *fc)
 			clear_bit(FR_PENDING, &req->flags);
 		list_splice_tail_init(&fiq->pending, &to_end);
 		while (forget_pending(fiq))
-			kfree(dequeue_forget(fiq, 1, NULL));
+			kfree(fuse_dequeue_forget(fiq, 1, NULL));
 		wake_up_all_locked(&fiq->waitq);
 		spin_unlock(&fiq->waitq.lock);
 		kill_fasync(&fiq->fasync, SIGIO, POLL_IN);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 7da756e2859e..d4b27f048d81 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -820,6 +820,9 @@ void fuse_queue_forget(struct fuse_conn *fc, struct fuse_forget_link *forget,
 
 struct fuse_forget_link *fuse_alloc_forget(void);
 
+struct fuse_forget_link *fuse_dequeue_forget(struct fuse_iqueue *fiq,
+					     unsigned max, unsigned *countp);
+
 /* Used by READDIRPLUS */
 void fuse_force_forget(struct file *file, u64 nodeid);
 
-- 
2.20.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 08/13] fuse: extract fuse_fill_super_common()
  2019-08-21 17:37 [PATCH v3 00/13] virtio-fs: shared file system for virtual machines Vivek Goyal
                   ` (6 preceding siblings ...)
  2019-08-21 17:37 ` [PATCH 07/13] Export fuse_dequeue_forget() function Vivek Goyal
@ 2019-08-21 17:37 ` Vivek Goyal
  2019-08-21 17:37 ` [PATCH 09/13] fuse: add fuse_iqueue_ops callbacks Vivek Goyal
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-08-21 17:37 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, miklos
  Cc: virtio-fs, vgoyal, stefanha, dgilbert, Miklos Szeredi

From: Stefan Hajnoczi <stefanha@redhat.com>

fuse_fill_super() includes code to process the fd= option and link the
struct fuse_dev to the fd's struct file.  In virtio-fs there is no file
descriptor because /dev/fuse is not used.

This patch extracts fuse_fill_super_common() so that both classic fuse
and virtio-fs can share the code to initialize a mount.

parse_fuse_opt() is also extracted so that the fuse_fill_super_common()
caller has access to the mount options.  This allows classic fuse to
handle the fd= option outside fuse_fill_super_common().

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/fuse_i.h |  33 ++++++++++++
 fs/fuse/inode.c  | 137 ++++++++++++++++++++++++-----------------------
 2 files changed, 103 insertions(+), 67 deletions(-)

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index d4b27f048d81..062f929348cc 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -60,6 +60,25 @@ extern struct mutex fuse_mutex;
 extern unsigned max_user_bgreq;
 extern unsigned max_user_congthresh;
 
+/** Mount options */
+struct fuse_mount_data {
+	int fd;
+	unsigned rootmode;
+	kuid_t user_id;
+	kgid_t group_id;
+	unsigned fd_present:1;
+	unsigned rootmode_present:1;
+	unsigned user_id_present:1;
+	unsigned group_id_present:1;
+	unsigned default_permissions:1;
+	unsigned allow_other:1;
+	unsigned max_read;
+	unsigned blksize;
+
+	/* fuse_dev pointer to fill in, should contain NULL on entry */
+	void **fudptr;
+};
+
 /* One forget request */
 struct fuse_forget_link {
 	struct fuse_forget_one forget_one;
@@ -999,6 +1018,20 @@ struct fuse_dev *fuse_dev_alloc(struct fuse_conn *fc);
 void fuse_dev_free(struct fuse_dev *fud);
 void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req);
 
+/**
+ * Parse a mount options string
+ */
+int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev,
+				struct user_namespace *user_ns);
+
+/**
+ * Fill in superblock and initialize fuse connection
+ * @sb: partially-initialized superblock to fill in
+ * @mount_data: mount parameters
+ */
+int fuse_fill_super_common(struct super_block *sb,
+			   struct fuse_mount_data *mount_data);
+
 /**
  * Add connection to control filesystem
  */
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 14a4e915294c..9f21a0cb58b9 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -59,21 +59,6 @@ MODULE_PARM_DESC(max_user_congthresh,
 /** Congestion starts at 75% of maximum */
 #define FUSE_DEFAULT_CONGESTION_THRESHOLD (FUSE_DEFAULT_MAX_BACKGROUND * 3 / 4)
 
-struct fuse_mount_data {
-	int fd;
-	unsigned rootmode;
-	kuid_t user_id;
-	kgid_t group_id;
-	unsigned fd_present:1;
-	unsigned rootmode_present:1;
-	unsigned user_id_present:1;
-	unsigned group_id_present:1;
-	unsigned default_permissions:1;
-	unsigned allow_other:1;
-	unsigned max_read;
-	unsigned blksize;
-};
-
 struct fuse_forget_link *fuse_alloc_forget(void)
 {
 	return kzalloc(sizeof(struct fuse_forget_link), GFP_KERNEL);
@@ -477,7 +462,7 @@ static int fuse_match_uint(substring_t *s, unsigned int *res)
 	return err;
 }
 
-static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev,
+int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev,
 			  struct user_namespace *user_ns)
 {
 	char *p;
@@ -554,12 +539,13 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev,
 		}
 	}
 
-	if (!d->fd_present || !d->rootmode_present ||
-	    !d->user_id_present || !d->group_id_present)
+	if (!d->rootmode_present || !d->user_id_present ||
+	    !d->group_id_present)
 		return 0;
 
 	return 1;
 }
+EXPORT_SYMBOL_GPL(parse_fuse_opt);
 
 static int fuse_show_options(struct seq_file *m, struct dentry *root)
 {
@@ -1076,15 +1062,13 @@ void fuse_dev_free(struct fuse_dev *fud)
 }
 EXPORT_SYMBOL_GPL(fuse_dev_free);
 
-static int fuse_fill_super(struct super_block *sb, void *data, int silent)
+int fuse_fill_super_common(struct super_block *sb,
+			   struct fuse_mount_data *mount_data)
 {
 	struct fuse_dev *fud;
 	struct fuse_conn *fc;
 	struct inode *root;
-	struct fuse_mount_data d;
-	struct file *file;
 	struct dentry *root_dentry;
-	struct fuse_req *init_req;
 	int err;
 	int is_bdev = sb->s_bdev != NULL;
 
@@ -1094,13 +1078,10 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
 
 	sb->s_flags &= ~(SB_NOSEC | SB_I_VERSION);
 
-	if (!parse_fuse_opt(data, &d, is_bdev, sb->s_user_ns))
-		goto err;
-
 	if (is_bdev) {
 #ifdef CONFIG_BLOCK
 		err = -EINVAL;
-		if (!sb_set_blocksize(sb, d.blksize))
+		if (!sb_set_blocksize(sb, mount_data->blksize))
 			goto err;
 #endif
 	} else {
@@ -1117,19 +1098,6 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
 	if (sb->s_user_ns != &init_user_ns)
 		sb->s_iflags |= SB_I_UNTRUSTED_MOUNTER;
 
-	file = fget(d.fd);
-	err = -EINVAL;
-	if (!file)
-		goto err;
-
-	/*
-	 * Require mount to happen from the same user namespace which
-	 * opened /dev/fuse to prevent potential attacks.
-	 */
-	if (file->f_op != &fuse_dev_operations ||
-	    file->f_cred->user_ns != sb->s_user_ns)
-		goto err_fput;
-
 	/*
 	 * If we are not in the initial user namespace posix
 	 * acls must be translated.
@@ -1140,7 +1108,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
 	fc = kmalloc(sizeof(*fc), GFP_KERNEL);
 	err = -ENOMEM;
 	if (!fc)
-		goto err_fput;
+		goto err;
 
 	fuse_conn_init(fc, sb->s_user_ns);
 	fc->release = fuse_free_conn;
@@ -1160,17 +1128,17 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
 		fc->dont_mask = 1;
 	sb->s_flags |= SB_POSIXACL;
 
-	fc->default_permissions = d.default_permissions;
-	fc->allow_other = d.allow_other;
-	fc->user_id = d.user_id;
-	fc->group_id = d.group_id;
-	fc->max_read = max_t(unsigned, 4096, d.max_read);
+	fc->default_permissions = mount_data->default_permissions;
+	fc->allow_other = mount_data->allow_other;
+	fc->user_id = mount_data->user_id;
+	fc->group_id = mount_data->group_id;
+	fc->max_read = max_t(unsigned, 4096, mount_data->max_read);
 
 	/* Used by get_root_inode() */
 	sb->s_fs_info = fc;
 
 	err = -ENOMEM;
-	root = fuse_get_root_inode(sb, d.rootmode);
+	root = fuse_get_root_inode(sb, mount_data->rootmode);
 	sb->s_d_op = &fuse_root_dentry_operations;
 	root_dentry = d_make_root(root);
 	if (!root_dentry)
@@ -1178,20 +1146,15 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
 	/* Root dentry doesn't have .d_revalidate */
 	sb->s_d_op = &fuse_dentry_operations;
 
-	init_req = fuse_request_alloc(0);
-	if (!init_req)
-		goto err_put_root;
-	__set_bit(FR_BACKGROUND, &init_req->flags);
-
 	if (is_bdev) {
 		fc->destroy_req = fuse_request_alloc(0);
 		if (!fc->destroy_req)
-			goto err_free_init_req;
+			goto err_put_root;
 	}
 
 	mutex_lock(&fuse_mutex);
 	err = -EINVAL;
-	if (file->private_data)
+	if (*mount_data->fudptr)
 		goto err_unlock;
 
 	err = fuse_ctl_add_conn(fc);
@@ -1200,23 +1163,12 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
 
 	list_add_tail(&fc->entry, &fuse_conn_list);
 	sb->s_root = root_dentry;
-	file->private_data = fud;
+	*mount_data->fudptr = fud;
 	mutex_unlock(&fuse_mutex);
-	/*
-	 * atomic_dec_and_test() in fput() provides the necessary
-	 * memory barrier for file->private_data to be visible on all
-	 * CPUs after this
-	 */
-	fput(file);
-
-	fuse_send_init(fc, init_req);
-
 	return 0;
 
  err_unlock:
 	mutex_unlock(&fuse_mutex);
- err_free_init_req:
-	fuse_request_free(init_req);
  err_put_root:
 	dput(root_dentry);
  err_dev_free:
@@ -1224,11 +1176,62 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
  err_put_conn:
 	fuse_conn_put(fc);
 	sb->s_fs_info = NULL;
- err_fput:
-	fput(file);
  err:
 	return err;
 }
+EXPORT_SYMBOL_GPL(fuse_fill_super_common);
+
+static int fuse_fill_super(struct super_block *sb, void *data, int silent)
+{
+	struct fuse_mount_data d;
+	struct file *file;
+	int is_bdev = sb->s_bdev != NULL;
+	int err;
+	struct fuse_req *init_req;
+
+	err = -EINVAL;
+	if (!parse_fuse_opt(data, &d, is_bdev, sb->s_user_ns))
+		goto err;
+	if (!d.fd_present)
+		goto err;
+
+	file = fget(d.fd);
+	if (!file)
+		goto err;
+
+	/*
+	 * Require mount to happen from the same user namespace which
+	 * opened /dev/fuse to prevent potential attacks.
+	 */
+	if ((file->f_op != &fuse_dev_operations) ||
+	    (file->f_cred->user_ns != sb->s_user_ns))
+		goto err_fput;
+
+	init_req = fuse_request_alloc(0);
+	if (!init_req)
+		goto err_fput;
+	__set_bit(FR_BACKGROUND, &init_req->flags);
+
+	d.fudptr = &file->private_data;
+	err = fuse_fill_super_common(sb, &d);
+	if (err < 0)
+		goto err_free_init_req;
+	/*
+	 * atomic_dec_and_test() in fput() provides the necessary
+	 * memory barrier for file->private_data to be visible on all
+	 * CPUs after this
+	 */
+	fput(file);
+	fuse_send_init(get_fuse_conn_super(sb), init_req);
+	return 0;
+
+err_free_init_req:
+	fuse_request_free(init_req);
+err_fput:
+	fput(file);
+err:
+	return err;
+}
 
 static struct dentry *fuse_mount(struct file_system_type *fs_type,
 		       int flags, const char *dev_name,
-- 
2.20.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 09/13] fuse: add fuse_iqueue_ops callbacks
  2019-08-21 17:37 [PATCH v3 00/13] virtio-fs: shared file system for virtual machines Vivek Goyal
                   ` (7 preceding siblings ...)
  2019-08-21 17:37 ` [PATCH 08/13] fuse: extract fuse_fill_super_common() Vivek Goyal
@ 2019-08-21 17:37 ` Vivek Goyal
  2019-08-21 17:37 ` [PATCH 10/13] fuse: Separate fuse device allocation and installation in fuse_conn Vivek Goyal
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-08-21 17:37 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, miklos
  Cc: virtio-fs, vgoyal, stefanha, dgilbert, Miklos Szeredi

From: Stefan Hajnoczi <stefanha@redhat.com>

The /dev/fuse device uses fiq->waitq and fasync to signal that requests
are available.  These mechanisms do not apply to virtio-fs.  This patch
introduces callbacks so alternative behavior can be used.

Note that queue_interrupt() changes along these lines:

  spin_lock(&fiq->waitq.lock);
  wake_up_locked(&fiq->waitq);
+ kill_fasync(&fiq->fasync, SIGIO, POLL_IN);
  spin_unlock(&fiq->waitq.lock);
- kill_fasync(&fiq->fasync, SIGIO, POLL_IN);

Since queue_request() and queue_forget() also call kill_fasync() inside
the spinlock this should be safe.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/cuse.c   |  2 +-
 fs/fuse/dev.c    | 50 ++++++++++++++++++++++++++++++++----------------
 fs/fuse/fuse_i.h | 48 +++++++++++++++++++++++++++++++++++++++++++++-
 fs/fuse/inode.c  | 16 ++++++++++++----
 4 files changed, 94 insertions(+), 22 deletions(-)

diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index bab7a0db81dd..7bdab6521469 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -504,7 +504,7 @@ static int cuse_channel_open(struct inode *inode, struct file *file)
 	 * Limit the cuse channel to requests that can
 	 * be represented in file->f_cred->user_ns.
 	 */
-	fuse_conn_init(&cc->fc, file->f_cred->user_ns);
+	fuse_conn_init(&cc->fc, file->f_cred->user_ns, &fuse_dev_fiq_ops, NULL);
 
 	fud = fuse_dev_alloc(&cc->fc);
 	if (!fud) {
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 2bbfc1f77875..fa6794e96811 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -375,13 +375,33 @@ static unsigned int fuse_req_hash(u64 unique)
 	return hash_long(unique & ~FUSE_INT_REQ_BIT, FUSE_PQ_HASH_BITS);
 }
 
-static void queue_request(struct fuse_iqueue *fiq, struct fuse_req *req)
+/**
+ * A new request is available, wake fiq->waitq
+ */
+static void fuse_dev_wake_and_unlock(struct fuse_iqueue *fiq)
+__releases(fiq->waitq.lock)
 {
-	req->in.h.len = sizeof(struct fuse_in_header) +
-		fuse_len_args(req->in.numargs, (struct fuse_arg *) req->in.args);
-	list_add_tail(&req->list, &fiq->pending);
 	wake_up_locked(&fiq->waitq);
 	kill_fasync(&fiq->fasync, SIGIO, POLL_IN);
+	spin_unlock(&fiq->waitq.lock);
+}
+
+const struct fuse_iqueue_ops fuse_dev_fiq_ops = {
+	.wake_forget_and_unlock		= fuse_dev_wake_and_unlock,
+	.wake_interrupt_and_unlock	= fuse_dev_wake_and_unlock,
+	.wake_pending_and_unlock	= fuse_dev_wake_and_unlock,
+};
+EXPORT_SYMBOL_GPL(fuse_dev_fiq_ops);
+
+static void queue_request_and_unlock(struct fuse_iqueue *fiq,
+				     struct fuse_req *req)
+__releases(fiq->waitq.lock)
+{
+	req->in.h.len = sizeof(struct fuse_in_header) +
+		fuse_len_args(req->in.numargs,
+			      (struct fuse_arg *) req->in.args);
+	list_add_tail(&req->list, &fiq->pending);
+	fiq->ops->wake_pending_and_unlock(fiq);
 }
 
 void fuse_queue_forget(struct fuse_conn *fc, struct fuse_forget_link *forget,
@@ -396,12 +416,11 @@ void fuse_queue_forget(struct fuse_conn *fc, struct fuse_forget_link *forget,
 	if (fiq->connected) {
 		fiq->forget_list_tail->next = forget;
 		fiq->forget_list_tail = forget;
-		wake_up_locked(&fiq->waitq);
-		kill_fasync(&fiq->fasync, SIGIO, POLL_IN);
+		fiq->ops->wake_forget_and_unlock(fiq);
 	} else {
 		kfree(forget);
+		spin_unlock(&fiq->waitq.lock);
 	}
-	spin_unlock(&fiq->waitq.lock);
 }
 
 static void flush_bg_queue(struct fuse_conn *fc)
@@ -417,8 +436,7 @@ static void flush_bg_queue(struct fuse_conn *fc)
 		fc->active_background++;
 		spin_lock(&fiq->waitq.lock);
 		req->in.h.unique = fuse_get_unique(fiq);
-		queue_request(fiq, req);
-		spin_unlock(&fiq->waitq.lock);
+		queue_request_and_unlock(fiq, req);
 	}
 }
 
@@ -506,10 +524,10 @@ static int queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req)
 			spin_unlock(&fiq->waitq.lock);
 			return 0;
 		}
-		wake_up_locked(&fiq->waitq);
-		kill_fasync(&fiq->fasync, SIGIO, POLL_IN);
+		fiq->ops->wake_interrupt_and_unlock(fiq);
+	} else {
+		spin_unlock(&fiq->waitq.lock);
 	}
-	spin_unlock(&fiq->waitq.lock);
 	return 0;
 }
 
@@ -569,11 +587,10 @@ static void __fuse_request_send(struct fuse_conn *fc, struct fuse_req *req)
 		req->out.h.error = -ENOTCONN;
 	} else {
 		req->in.h.unique = fuse_get_unique(fiq);
-		queue_request(fiq, req);
 		/* acquire extra reference, since request is still needed
 		   after fuse_request_end() */
 		__fuse_get_request(req);
-		spin_unlock(&fiq->waitq.lock);
+		queue_request_and_unlock(fiq, req);
 
 		request_wait_answer(fc, req);
 		/* Pairs with smp_wmb() in fuse_request_end() */
@@ -706,10 +723,11 @@ static int fuse_request_send_notify_reply(struct fuse_conn *fc,
 	req->in.h.unique = unique;
 	spin_lock(&fiq->waitq.lock);
 	if (fiq->connected) {
-		queue_request(fiq, req);
+		queue_request_and_unlock(fiq, req);
 		err = 0;
+	} else {
+		spin_unlock(&fiq->waitq.lock);
 	}
-	spin_unlock(&fiq->waitq.lock);
 
 	return err;
 }
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 062f929348cc..70c1bde06293 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -75,6 +75,12 @@ struct fuse_mount_data {
 	unsigned max_read;
 	unsigned blksize;
 
+	/* fuse input queue operations */
+	const struct fuse_iqueue_ops *fiq_ops;
+
+	/* device-specific state for fuse_iqueue */
+	void *fiq_priv;
+
 	/* fuse_dev pointer to fill in, should contain NULL on entry */
 	void **fudptr;
 };
@@ -465,6 +471,39 @@ struct fuse_req {
 	struct file *stolen_file;
 };
 
+struct fuse_iqueue;
+
+/**
+ * Input queue callbacks
+ *
+ * Input queue signalling is device-specific.  For example, the /dev/fuse file
+ * uses fiq->waitq and fasync to wake processes that are waiting on queue
+ * readiness.  These callbacks allow other device types to respond to input
+ * queue activity.
+ */
+struct fuse_iqueue_ops {
+	/**
+	 * Signal that a forget has been queued
+	 */
+	void (*wake_forget_and_unlock)(struct fuse_iqueue *fiq)
+	__releases(fiq->waitq.lock);
+
+	/**
+	 * Signal that an INTERRUPT request has been queued
+	 */
+	void (*wake_interrupt_and_unlock)(struct fuse_iqueue *fiq)
+	__releases(fiq->waitq.lock);
+
+	/**
+	 * Signal that a request has been queued
+	 */
+	void (*wake_pending_and_unlock)(struct fuse_iqueue *fiq)
+	__releases(fiq->waitq.lock);
+};
+
+/** /dev/fuse input queue operations */
+extern const struct fuse_iqueue_ops fuse_dev_fiq_ops;
+
 struct fuse_iqueue {
 	/** Connection established */
 	unsigned connected;
@@ -490,6 +529,12 @@ struct fuse_iqueue {
 
 	/** O_ASYNC requests */
 	struct fasync_struct *fasync;
+
+	/** Device-specific callbacks */
+	const struct fuse_iqueue_ops *ops;
+
+	/** Device-specific state */
+	void *priv;
 };
 
 #define FUSE_PQ_HASH_BITS 8
@@ -1007,7 +1052,8 @@ struct fuse_conn *fuse_conn_get(struct fuse_conn *fc);
 /**
  * Initialize fuse_conn
  */
-void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns);
+void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns,
+		    const struct fuse_iqueue_ops *fiq_ops, void *fiq_priv);
 
 /**
  * Release reference to fuse_conn
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 9f21a0cb58b9..226af184a402 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -565,7 +565,9 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
 	return 0;
 }
 
-static void fuse_iqueue_init(struct fuse_iqueue *fiq)
+static void fuse_iqueue_init(struct fuse_iqueue *fiq,
+			     const struct fuse_iqueue_ops *ops,
+			     void *priv)
 {
 	memset(fiq, 0, sizeof(struct fuse_iqueue));
 	init_waitqueue_head(&fiq->waitq);
@@ -573,6 +575,8 @@ static void fuse_iqueue_init(struct fuse_iqueue *fiq)
 	INIT_LIST_HEAD(&fiq->interrupts);
 	fiq->forget_list_tail = &fiq->forget_list_head;
 	fiq->connected = 1;
+	fiq->ops = ops;
+	fiq->priv = priv;
 }
 
 static void fuse_pqueue_init(struct fuse_pqueue *fpq)
@@ -586,7 +590,8 @@ static void fuse_pqueue_init(struct fuse_pqueue *fpq)
 	fpq->connected = 1;
 }
 
-void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns)
+void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns,
+		    const struct fuse_iqueue_ops *fiq_ops, void *fiq_priv)
 {
 	memset(fc, 0, sizeof(*fc));
 	spin_lock_init(&fc->lock);
@@ -596,7 +601,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns)
 	atomic_set(&fc->dev_count, 1);
 	init_waitqueue_head(&fc->blocked_waitq);
 	init_waitqueue_head(&fc->reserved_req_waitq);
-	fuse_iqueue_init(&fc->iq);
+	fuse_iqueue_init(&fc->iq, fiq_ops, fiq_priv);
 	INIT_LIST_HEAD(&fc->bg_queue);
 	INIT_LIST_HEAD(&fc->entry);
 	INIT_LIST_HEAD(&fc->devices);
@@ -1110,7 +1115,8 @@ int fuse_fill_super_common(struct super_block *sb,
 	if (!fc)
 		goto err;
 
-	fuse_conn_init(fc, sb->s_user_ns);
+	fuse_conn_init(fc, sb->s_user_ns, mount_data->fiq_ops,
+		       mount_data->fiq_priv);
 	fc->release = fuse_free_conn;
 
 	fud = fuse_dev_alloc(fc);
@@ -1212,6 +1218,8 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
 		goto err_fput;
 	__set_bit(FR_BACKGROUND, &init_req->flags);
 
+	d.fiq_ops = &fuse_dev_fiq_ops;
+	d.fiq_priv = NULL;
 	d.fudptr = &file->private_data;
 	err = fuse_fill_super_common(sb, &d);
 	if (err < 0)
-- 
2.20.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 10/13] fuse: Separate fuse device allocation and installation in fuse_conn
  2019-08-21 17:37 [PATCH v3 00/13] virtio-fs: shared file system for virtual machines Vivek Goyal
                   ` (8 preceding siblings ...)
  2019-08-21 17:37 ` [PATCH 09/13] fuse: add fuse_iqueue_ops callbacks Vivek Goyal
@ 2019-08-21 17:37 ` Vivek Goyal
  2019-08-21 17:37 ` [PATCH 11/13] virtio_fs: add skeleton virtio_fs.ko module Vivek Goyal
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-08-21 17:37 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, miklos; +Cc: virtio-fs, vgoyal, stefanha, dgilbert

As of now fuse_dev_alloc() both allocates a fuse device and installs it
in fuse_conn list. fuse_dev_alloc() can fail if fuse_device allocation
fails.

virtio-fs needs to initialize multiple fuse devices (one per virtio
queue). It initializes one fuse device as part of call to
fuse_fill_super_common() and rest of the devices are allocated and
installed after that.

But, we can't affort to fail after calling fuse_fill_super_common() as
we don't have a way to undo all the actions done by fuse_fill_super_common().
So to avoid failures after the call to fuse_fill_super_common(),
pre-allocate all fuse devices early and install them into fuse connection
later.

This patch provides two separate helpers for fuse device allocation and
fuse device installation in fuse_conn.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/fuse/cuse.c   |  2 +-
 fs/fuse/dev.c    |  2 +-
 fs/fuse/fuse_i.h |  4 +++-
 fs/fuse/inode.c  | 25 ++++++++++++++++++++-----
 4 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index 7bdab6521469..04727540bdbb 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -506,7 +506,7 @@ static int cuse_channel_open(struct inode *inode, struct file *file)
 	 */
 	fuse_conn_init(&cc->fc, file->f_cred->user_ns, &fuse_dev_fiq_ops, NULL);
 
-	fud = fuse_dev_alloc(&cc->fc);
+	fud = fuse_dev_alloc_install(&cc->fc);
 	if (!fud) {
 		kfree(cc);
 		return -ENOMEM;
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index fa6794e96811..556e44fa6b20 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2319,7 +2319,7 @@ static int fuse_device_clone(struct fuse_conn *fc, struct file *new)
 	if (new->private_data)
 		return -EINVAL;
 
-	fud = fuse_dev_alloc(fc);
+	fud = fuse_dev_alloc_install(fc);
 	if (!fud)
 		return -ENOMEM;
 
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 70c1bde06293..605217552350 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1060,7 +1060,9 @@ void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns,
  */
 void fuse_conn_put(struct fuse_conn *fc);
 
-struct fuse_dev *fuse_dev_alloc(struct fuse_conn *fc);
+struct fuse_dev *fuse_dev_alloc_install(struct fuse_conn *fc);
+struct fuse_dev *fuse_dev_alloc(void);
+void fuse_dev_install(struct fuse_dev *fud, struct fuse_conn *fc);
 void fuse_dev_free(struct fuse_dev *fud);
 void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req);
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 226af184a402..0df885d6fa00 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1024,8 +1024,7 @@ static int fuse_bdi_init(struct fuse_conn *fc, struct super_block *sb)
 	return 0;
 }
 
-struct fuse_dev *fuse_dev_alloc(struct fuse_conn *fc)
-{
+struct fuse_dev *fuse_dev_alloc(void) {
 	struct fuse_dev *fud;
 	struct list_head *pq;
 
@@ -1040,16 +1039,32 @@ struct fuse_dev *fuse_dev_alloc(struct fuse_conn *fc)
 	}
 
 	fud->pq.processing = pq;
-	fud->fc = fuse_conn_get(fc);
 	fuse_pqueue_init(&fud->pq);
 
+	return fud;
+}
+EXPORT_SYMBOL_GPL(fuse_dev_alloc);
+
+void fuse_dev_install(struct fuse_dev *fud, struct fuse_conn *fc) {
+	fud->fc = fuse_conn_get(fc);
 	spin_lock(&fc->lock);
 	list_add_tail(&fud->entry, &fc->devices);
 	spin_unlock(&fc->lock);
+}
+EXPORT_SYMBOL_GPL(fuse_dev_install);
 
+struct fuse_dev *fuse_dev_alloc_install(struct fuse_conn *fc)
+{
+	struct fuse_dev *fud;
+
+	fud = fuse_dev_alloc();
+	if (!fud)
+		return NULL;
+
+	fuse_dev_install(fud, fc);
 	return fud;
 }
-EXPORT_SYMBOL_GPL(fuse_dev_alloc);
+EXPORT_SYMBOL_GPL(fuse_dev_alloc_install);
 
 void fuse_dev_free(struct fuse_dev *fud)
 {
@@ -1119,7 +1134,7 @@ int fuse_fill_super_common(struct super_block *sb,
 		       mount_data->fiq_priv);
 	fc->release = fuse_free_conn;
 
-	fud = fuse_dev_alloc(fc);
+	fud = fuse_dev_alloc_install(fc);
 	if (!fud)
 		goto err_put_conn;
 
-- 
2.20.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 11/13] virtio_fs: add skeleton virtio_fs.ko module
  2019-08-21 17:37 [PATCH v3 00/13] virtio-fs: shared file system for virtual machines Vivek Goyal
                   ` (9 preceding siblings ...)
  2019-08-21 17:37 ` [PATCH 10/13] fuse: Separate fuse device allocation and installation in fuse_conn Vivek Goyal
@ 2019-08-21 17:37 ` Vivek Goyal
  2019-08-21 17:37 ` [PATCH 12/13] virtio-fs: Do not provide abort interface in fusectl Vivek Goyal
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-08-21 17:37 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, miklos
  Cc: virtio-fs, vgoyal, stefanha, dgilbert, Miklos Szeredi

From: Stefan Hajnoczi <stefanha@redhat.com>

Add a basic file system module for virtio-fs.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/Kconfig                 |   11 +
 fs/fuse/Makefile                |    1 +
 fs/fuse/fuse_i.h                |   12 +
 fs/fuse/inode.c                 |   27 +-
 fs/fuse/virtio_fs.c             | 1060 +++++++++++++++++++++++++++++++
 include/uapi/linux/virtio_fs.h  |   41 ++
 include/uapi/linux/virtio_ids.h |    1 +
 7 files changed, 1142 insertions(+), 11 deletions(-)
 create mode 100644 fs/fuse/virtio_fs.c
 create mode 100644 include/uapi/linux/virtio_fs.h

diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
index 24fc5a5c1b97..0635cba19971 100644
--- a/fs/fuse/Kconfig
+++ b/fs/fuse/Kconfig
@@ -27,3 +27,14 @@ config CUSE
 
 	  If you want to develop or use a userspace character device
 	  based on CUSE, answer Y or M.
+
+config VIRTIO_FS
+	tristate "Virtio Filesystem"
+	depends on FUSE_FS
+	select VIRTIO
+	help
+	  The Virtio Filesystem allows guests to mount file systems from the
+          host.
+
+	  If you want to share files between guests or with the host, answer Y
+          or M.
diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
index 9485019c2a14..6419a2b3510d 100644
--- a/fs/fuse/Makefile
+++ b/fs/fuse/Makefile
@@ -5,5 +5,6 @@
 
 obj-$(CONFIG_FUSE_FS) += fuse.o
 obj-$(CONFIG_CUSE) += cuse.o
+obj-$(CONFIG_VIRTIO_FS) += virtio_fs.o
 
 fuse-objs := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 605217552350..73b23421b48e 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -72,6 +72,7 @@ struct fuse_mount_data {
 	unsigned group_id_present:1;
 	unsigned default_permissions:1;
 	unsigned allow_other:1;
+	unsigned destroy:1;
 	unsigned max_read;
 	unsigned blksize;
 
@@ -469,6 +470,9 @@ struct fuse_req {
 
 	/** Request is stolen from fuse_file->reserved_req */
 	struct file *stolen_file;
+
+	/** virtio-fs's physically contiguous buffer for in and out args */
+	void *argbuf;
 };
 
 struct fuse_iqueue;
@@ -1080,6 +1084,13 @@ int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev,
 int fuse_fill_super_common(struct super_block *sb,
 			   struct fuse_mount_data *mount_data);
 
+/**
+ * Disassociate fuse connection from superblock and kill the superblock
+ *
+ * Calls kill_anon_super(), use with do not use with bdev mounts.
+ */
+void fuse_kill_sb_anon(struct super_block *sb);
+
 /**
  * Add connection to control filesystem
  */
@@ -1192,5 +1203,6 @@ unsigned fuse_len_args(unsigned numargs, struct fuse_arg *args);
  * Get the next unique ID for a request
  */
 u64 fuse_get_unique(struct fuse_iqueue *fiq);
+void fuse_free_conn(struct fuse_conn *fc);
 
 #endif /* _FS_FUSE_I_H */
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 0df885d6fa00..fca81c40b2d7 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -978,11 +978,12 @@ void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req)
 }
 EXPORT_SYMBOL_GPL(fuse_send_init);
 
-static void fuse_free_conn(struct fuse_conn *fc)
+void fuse_free_conn(struct fuse_conn *fc)
 {
 	WARN_ON(!list_empty(&fc->devices));
 	kfree_rcu(fc, rcu);
 }
+EXPORT_SYMBOL_GPL(fuse_free_conn);
 
 static int fuse_bdi_init(struct fuse_conn *fc, struct super_block *sb)
 {
@@ -1125,14 +1126,16 @@ int fuse_fill_super_common(struct super_block *sb,
 	if (sb->s_user_ns != &init_user_ns)
 		sb->s_xattr = fuse_no_acl_xattr_handlers;
 
-	fc = kmalloc(sizeof(*fc), GFP_KERNEL);
-	err = -ENOMEM;
-	if (!fc)
-		goto err;
-
-	fuse_conn_init(fc, sb->s_user_ns, mount_data->fiq_ops,
-		       mount_data->fiq_priv);
-	fc->release = fuse_free_conn;
+	fc = get_fuse_conn_super(sb);
+	if (!fc) {
+		fc = kmalloc(sizeof(*fc), GFP_KERNEL);
+		err = -ENOMEM;
+		if (!fc)
+			goto err;
+		fuse_conn_init(fc, sb->s_user_ns, mount_data->fiq_ops,
+			       mount_data->fiq_priv);
+		fc->release = fuse_free_conn;
+	}
 
 	fud = fuse_dev_alloc_install(fc);
 	if (!fud)
@@ -1167,7 +1170,7 @@ int fuse_fill_super_common(struct super_block *sb,
 	/* Root dentry doesn't have .d_revalidate */
 	sb->s_d_op = &fuse_dentry_operations;
 
-	if (is_bdev) {
+	if (mount_data->destroy) {
 		fc->destroy_req = fuse_request_alloc(0);
 		if (!fc->destroy_req)
 			goto err_put_root;
@@ -1236,6 +1239,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
 	d.fiq_ops = &fuse_dev_fiq_ops;
 	d.fiq_priv = NULL;
 	d.fudptr = &file->private_data;
+	d.destroy = is_bdev;
 	err = fuse_fill_super_common(sb, &d);
 	if (err < 0)
 		goto err_free_init_req;
@@ -1279,11 +1283,12 @@ static void fuse_sb_destroy(struct super_block *sb)
 	}
 }
 
-static void fuse_kill_sb_anon(struct super_block *sb)
+void fuse_kill_sb_anon(struct super_block *sb)
 {
 	fuse_sb_destroy(sb);
 	kill_anon_super(sb);
 }
+EXPORT_SYMBOL_GPL(fuse_kill_sb_anon);
 
 static struct file_system_type fuse_fs_type = {
 	.owner		= THIS_MODULE,
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
new file mode 100644
index 000000000000..ce6b76598e74
--- /dev/null
+++ b/fs/fuse/virtio_fs.c
@@ -0,0 +1,1060 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio-fs: Virtio Filesystem
+ * Copyright (C) 2018 Red Hat, Inc.
+ */
+
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/virtio.h>
+#include <linux/virtio_fs.h>
+#include <linux/delay.h>
+#include "fuse_i.h"
+
+/* List of virtio-fs device instances and a lock for the list */
+static DEFINE_MUTEX(virtio_fs_mutex);
+static LIST_HEAD(virtio_fs_instances);
+
+enum {
+	VQ_HIPRIO,
+	VQ_REQUEST
+};
+
+/* Per-virtqueue state */
+struct virtio_fs_vq {
+	spinlock_t lock;
+	struct virtqueue *vq;     /* protected by ->lock */
+	struct work_struct done_work;
+	struct list_head queued_reqs;
+	struct delayed_work dispatch_work;
+	struct fuse_dev *fud;
+	bool connected;
+	long in_flight;
+	char name[24];
+} ____cacheline_aligned_in_smp;
+
+/* A virtio-fs device instance */
+struct virtio_fs {
+	struct list_head list;    /* on virtio_fs_instances */
+	char *tag;
+	struct virtio_fs_vq *vqs;
+	unsigned nvqs;            /* number of virtqueues */
+	unsigned num_queues;      /* number of request queues */
+};
+
+struct virtio_fs_forget {
+	struct fuse_in_header ih;
+	struct fuse_forget_in arg;
+	/* This request can be temporarily queued on virt queue */
+	struct list_head list;
+};
+
+static inline struct virtio_fs_vq *vq_to_fsvq(struct virtqueue *vq)
+{
+	struct virtio_fs *fs = vq->vdev->priv;
+
+	return &fs->vqs[vq->index];
+}
+
+static inline struct fuse_pqueue *vq_to_fpq(struct virtqueue *vq)
+{
+	return &vq_to_fsvq(vq)->fud->pq;
+}
+
+/* Add a new instance to the list or return -EEXIST if tag name exists*/
+static int virtio_fs_add_instance(struct virtio_fs *fs)
+{
+	struct virtio_fs *fs2;
+	bool duplicate = false;
+
+	mutex_lock(&virtio_fs_mutex);
+
+	list_for_each_entry(fs2, &virtio_fs_instances, list) {
+		if (strcmp(fs->tag, fs2->tag) == 0)
+			duplicate = true;
+	}
+
+	if (!duplicate)
+		list_add_tail(&fs->list, &virtio_fs_instances);
+
+	mutex_unlock(&virtio_fs_mutex);
+
+	if (duplicate)
+		return -EEXIST;
+	return 0;
+}
+
+/* Return the virtio_fs with a given tag, or NULL */
+static struct virtio_fs *virtio_fs_find_instance(const char *tag)
+{
+	struct virtio_fs *fs;
+
+	mutex_lock(&virtio_fs_mutex);
+
+	list_for_each_entry(fs, &virtio_fs_instances, list) {
+		if (strcmp(fs->tag, tag) == 0)
+			goto found;
+	}
+
+	fs = NULL; /* not found */
+
+found:
+	mutex_unlock(&virtio_fs_mutex);
+
+	return fs;
+}
+
+static void virtio_fs_free_devs(struct virtio_fs *fs)
+{
+	unsigned int i;
+
+	/* TODO lock */
+
+	for (i = 0; i < fs->nvqs; i++) {
+		struct virtio_fs_vq *fsvq = &fs->vqs[i];
+
+		if (!fsvq->fud)
+			continue;
+
+		flush_work(&fsvq->done_work);
+		flush_delayed_work(&fsvq->dispatch_work);
+
+		fuse_dev_free(fsvq->fud); /* TODO need to quiesce/end_requests/decrement dev_count */
+		fsvq->fud = NULL;
+	}
+}
+
+/* Read filesystem name from virtio config into fs->tag (must kfree()). */
+static int virtio_fs_read_tag(struct virtio_device *vdev, struct virtio_fs *fs)
+{
+	char tag_buf[sizeof_field(struct virtio_fs_config, tag)];
+	char *end;
+	size_t len;
+
+	virtio_cread_bytes(vdev, offsetof(struct virtio_fs_config, tag),
+			   &tag_buf, sizeof(tag_buf));
+	end = memchr(tag_buf, '\0', sizeof(tag_buf));
+	if (end == tag_buf)
+		return -EINVAL; /* empty tag */
+	if (!end)
+		end = &tag_buf[sizeof(tag_buf)];
+
+	len = end - tag_buf;
+	fs->tag = devm_kmalloc(&vdev->dev, len + 1, GFP_KERNEL);
+	if (!fs->tag)
+		return -ENOMEM;
+	memcpy(fs->tag, tag_buf, len);
+	fs->tag[len] = '\0';
+	return 0;
+}
+
+/* Work function for hiprio completion */
+static void virtio_fs_hiprio_done_work(struct work_struct *work)
+{
+	struct virtio_fs_vq *fsvq = container_of(work, struct virtio_fs_vq,
+						 done_work);
+	struct virtqueue *vq = fsvq->vq;
+
+	/* Free completed FUSE_FORGET requests */
+	spin_lock(&fsvq->lock);
+	do {
+		unsigned len;
+		void *req;
+
+		virtqueue_disable_cb(vq);
+
+		while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
+			kfree(req);
+			fsvq->in_flight--;
+		}
+	} while (!virtqueue_enable_cb(vq) && likely(!virtqueue_is_broken(vq)));
+	spin_unlock(&fsvq->lock);
+}
+
+static void virtio_fs_dummy_dispatch_work(struct work_struct *work)
+{
+	return;
+}
+
+static void virtio_fs_hiprio_dispatch_work(struct work_struct *work)
+{
+	struct virtio_fs_forget *forget;
+	struct virtio_fs_vq *fsvq = container_of(work, struct virtio_fs_vq,
+						 dispatch_work.work);
+	struct virtqueue *vq = fsvq->vq;
+	struct scatterlist sg;
+	struct scatterlist *sgs[] = {&sg};
+	bool notify;
+	int ret;
+
+	pr_debug("worker virtio_fs_hiprio_dispatch_work() called.\n");
+	while(1) {
+		spin_lock(&fsvq->lock);
+		forget = list_first_entry_or_null(&fsvq->queued_reqs,
+					struct virtio_fs_forget, list);
+		if (!forget) {
+			spin_unlock(&fsvq->lock);
+			return;
+		}
+
+		list_del(&forget->list);
+		if (!fsvq->connected) {
+			spin_unlock(&fsvq->lock);
+			kfree(forget);
+			continue;
+		}
+
+		sg_init_one(&sg, forget, sizeof(*forget));
+
+		/* Enqueue the request */
+		dev_dbg(&vq->vdev->dev, "%s\n", __func__);
+		ret = virtqueue_add_sgs(vq, sgs, 1, 0, forget, GFP_ATOMIC);
+		if (ret < 0) {
+			if (ret == -ENOMEM || ret == -ENOSPC) {
+				pr_debug("virtio-fs: Could not queue FORGET:"
+					 " err=%d. Will try later\n", ret);
+				list_add_tail(&forget->list,
+						&fsvq->queued_reqs);
+				schedule_delayed_work(&fsvq->dispatch_work,
+						msecs_to_jiffies(1));
+			} else {
+				pr_debug("virtio-fs: Could not queue FORGET:"
+					 " err=%d. Dropping it.\n", ret);
+				kfree(forget);
+			}
+			spin_unlock(&fsvq->lock);
+			return;
+		}
+
+		fsvq->in_flight++;
+		notify = virtqueue_kick_prepare(vq);
+		spin_unlock(&fsvq->lock);
+
+		if (notify)
+			virtqueue_notify(vq);
+		pr_debug("worker virtio_fs_hiprio_dispatch_work() dispatched one forget request.\n");
+	}
+}
+
+/* Allocate and copy args into req->argbuf */
+static int copy_args_to_argbuf(struct fuse_req *req)
+{
+	unsigned offset = 0;
+	unsigned num_in;
+	unsigned num_out;
+	unsigned len;
+	unsigned i;
+
+	num_in = req->in.numargs - req->in.argpages;
+	num_out = req->out.numargs - req->out.argpages;
+	len = fuse_len_args(num_in, (struct fuse_arg *)req->in.args) +
+	      fuse_len_args(num_out, req->out.args);
+
+	req->argbuf = kmalloc(len, GFP_ATOMIC);
+	if (!req->argbuf)
+		return -ENOMEM;
+
+	for (i = 0; i < num_in; i++) {
+		memcpy(req->argbuf + offset,
+		       req->in.args[i].value,
+		       req->in.args[i].size);
+		offset += req->in.args[i].size;
+	}
+
+	return 0;
+}
+
+/* Copy args out of and free req->argbuf */
+static void copy_args_from_argbuf(struct fuse_req *req)
+{
+	unsigned remaining;
+	unsigned offset;
+	unsigned num_in;
+	unsigned num_out;
+	unsigned i;
+
+	remaining = req->out.h.len - sizeof(req->out.h);
+	num_in = req->in.numargs - req->in.argpages;
+	num_out = req->out.numargs - req->out.argpages;
+	offset = fuse_len_args(num_in, (struct fuse_arg *)req->in.args);
+
+	for (i = 0; i < num_out; i++) {
+		unsigned argsize = req->out.args[i].size;
+
+		if (req->out.argvar &&
+		    i == req->out.numargs - 1 &&
+		    argsize > remaining) {
+			argsize = remaining;
+		}
+
+		memcpy(req->out.args[i].value, req->argbuf + offset, argsize);
+		offset += argsize;
+
+		if (i != req->out.numargs - 1)
+			remaining -= argsize;
+	}
+
+	/* Store the actual size of the variable-length arg */
+	if (req->out.argvar)
+		req->out.args[req->out.numargs - 1].size = remaining;
+
+	kfree(req->argbuf);
+	req->argbuf = NULL;
+}
+
+/* Work function for request completion */
+static void virtio_fs_requests_done_work(struct work_struct *work)
+{
+	struct virtio_fs_vq *fsvq = container_of(work, struct virtio_fs_vq,
+						 done_work);
+	struct fuse_pqueue *fpq = &fsvq->fud->pq;
+	struct fuse_conn *fc = fsvq->fud->fc;
+	struct virtqueue *vq = fsvq->vq;
+	struct fuse_req *req;
+	struct fuse_req *next;
+	LIST_HEAD(reqs);
+
+	/* Collect completed requests off the virtqueue */
+	spin_lock(&fsvq->lock);
+	do {
+		unsigned len;
+
+		virtqueue_disable_cb(vq);
+
+		while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
+			spin_lock(&fpq->lock);
+			list_move_tail(&req->list, &reqs);
+			spin_unlock(&fpq->lock);
+		}
+	} while (!virtqueue_enable_cb(vq) && likely(!virtqueue_is_broken(vq)));
+	spin_unlock(&fsvq->lock);
+
+	/* End requests */
+	list_for_each_entry_safe(req, next, &reqs, list) {
+		/* TODO check unique */
+		/* TODO fuse_len_args(out) against oh.len */
+
+		copy_args_from_argbuf(req);
+
+		/* TODO zeroing? */
+
+		spin_lock(&fpq->lock);
+		clear_bit(FR_SENT, &req->flags);
+		list_del_init(&req->list);
+		spin_unlock(&fpq->lock);
+
+		fuse_request_end(fc, req);
+	}
+}
+
+/* Virtqueue interrupt handler */
+static void virtio_fs_vq_done(struct virtqueue *vq)
+{
+	struct virtio_fs_vq *fsvq = vq_to_fsvq(vq);
+
+	dev_dbg(&vq->vdev->dev, "%s %s\n", __func__, fsvq->name);
+
+	schedule_work(&fsvq->done_work);
+}
+
+/* Initialize virtqueues */
+static int virtio_fs_setup_vqs(struct virtio_device *vdev,
+			       struct virtio_fs *fs)
+{
+	struct virtqueue **vqs;
+	vq_callback_t **callbacks;
+	const char **names;
+	unsigned i;
+	int ret;
+
+	virtio_cread(vdev, struct virtio_fs_config, num_queues,
+		     &fs->num_queues);
+	if (fs->num_queues == 0)
+		return -EINVAL;
+
+	fs->nvqs = 1 + fs->num_queues;
+
+	fs->vqs = devm_kcalloc(&vdev->dev, fs->nvqs,
+				sizeof(fs->vqs[VQ_HIPRIO]), GFP_KERNEL);
+	if (!fs->vqs)
+		return -ENOMEM;
+
+	vqs = kmalloc_array(fs->nvqs, sizeof(vqs[VQ_HIPRIO]), GFP_KERNEL);
+	callbacks = kmalloc_array(fs->nvqs, sizeof(callbacks[VQ_HIPRIO]),
+					GFP_KERNEL);
+	names = kmalloc_array(fs->nvqs, sizeof(names[VQ_HIPRIO]), GFP_KERNEL);
+	if (!vqs || !callbacks || !names) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	callbacks[VQ_HIPRIO] = virtio_fs_vq_done;
+	snprintf(fs->vqs[VQ_HIPRIO].name, sizeof(fs->vqs[VQ_HIPRIO].name),
+			"hiprio");
+	names[VQ_HIPRIO] = fs->vqs[VQ_HIPRIO].name;
+	INIT_WORK(&fs->vqs[VQ_HIPRIO].done_work, virtio_fs_hiprio_done_work);
+	INIT_LIST_HEAD(&fs->vqs[VQ_HIPRIO].queued_reqs);
+	INIT_DELAYED_WORK(&fs->vqs[VQ_HIPRIO].dispatch_work,
+			virtio_fs_hiprio_dispatch_work);
+	spin_lock_init(&fs->vqs[VQ_HIPRIO].lock);
+
+	/* Initialize the requests virtqueues */
+	for (i = VQ_REQUEST; i < fs->nvqs; i++) {
+		spin_lock_init(&fs->vqs[i].lock);
+		INIT_WORK(&fs->vqs[i].done_work, virtio_fs_requests_done_work);
+		INIT_DELAYED_WORK(&fs->vqs[i].dispatch_work,
+					virtio_fs_dummy_dispatch_work);
+		INIT_LIST_HEAD(&fs->vqs[i].queued_reqs);
+		snprintf(fs->vqs[i].name, sizeof(fs->vqs[i].name),
+			 "requests.%u", i - VQ_REQUEST);
+		callbacks[i] = virtio_fs_vq_done;
+		names[i] = fs->vqs[i].name;
+	}
+
+	ret = virtio_find_vqs(vdev, fs->nvqs, vqs, callbacks, names, NULL);
+	if (ret < 0)
+		goto out;
+
+	for (i = 0; i < fs->nvqs; i++) {
+		fs->vqs[i].vq = vqs[i];
+		fs->vqs[i].connected = true;
+	}
+out:
+	kfree(names);
+	kfree(callbacks);
+	kfree(vqs);
+	return ret;
+}
+
+/* Free virtqueues (device must already be reset) */
+static void virtio_fs_cleanup_vqs(struct virtio_device *vdev,
+				  struct virtio_fs *fs)
+{
+	vdev->config->del_vqs(vdev);
+}
+
+static int virtio_fs_probe(struct virtio_device *vdev)
+{
+	struct virtio_fs *fs;
+	int ret;
+
+	fs = devm_kzalloc(&vdev->dev, sizeof(*fs), GFP_KERNEL);
+	if (!fs)
+		return -ENOMEM;
+	vdev->priv = fs;
+
+	ret = virtio_fs_read_tag(vdev, fs);
+	if (ret < 0)
+		goto out;
+
+	ret = virtio_fs_setup_vqs(vdev, fs);
+	if (ret < 0)
+		goto out;
+
+	/* TODO vq affinity */
+	/* TODO populate notifications vq */
+
+	/* Bring the device online in case the filesystem is mounted and
+	 * requests need to be sent before we return.
+	 */
+	virtio_device_ready(vdev);
+
+	ret = virtio_fs_add_instance(fs);
+	if (ret < 0)
+		goto out_vqs;
+
+	return 0;
+
+out_vqs:
+	vdev->config->reset(vdev);
+	virtio_fs_cleanup_vqs(vdev, fs);
+
+out:
+	vdev->priv = NULL;
+	return ret;
+}
+
+static void virtio_fs_remove(struct virtio_device *vdev)
+{
+	struct virtio_fs *fs = vdev->priv;
+
+	virtio_fs_free_devs(fs);
+
+	vdev->config->reset(vdev);
+	virtio_fs_cleanup_vqs(vdev, fs);
+
+	mutex_lock(&virtio_fs_mutex);
+	list_del(&fs->list);
+	mutex_unlock(&virtio_fs_mutex);
+
+	vdev->priv = NULL;
+}
+
+#ifdef CONFIG_PM
+static int virtio_fs_freeze(struct virtio_device *vdev)
+{
+	return 0; /* TODO */
+}
+
+static int virtio_fs_restore(struct virtio_device *vdev)
+{
+	return 0; /* TODO */
+}
+#endif /* CONFIG_PM */
+
+const static struct virtio_device_id id_table[] = {
+	{ VIRTIO_ID_FS, VIRTIO_DEV_ANY_ID },
+	{},
+};
+
+const static unsigned int feature_table[] = {};
+
+static struct virtio_driver virtio_fs_driver = {
+	.driver.name		= KBUILD_MODNAME,
+	.driver.owner		= THIS_MODULE,
+	.id_table		= id_table,
+	.feature_table		= feature_table,
+	.feature_table_size	= ARRAY_SIZE(feature_table),
+	/* TODO validate config_get != NULL */
+	.probe			= virtio_fs_probe,
+	.remove			= virtio_fs_remove,
+#ifdef CONFIG_PM_SLEEP
+	.freeze			= virtio_fs_freeze,
+	.restore		= virtio_fs_restore,
+#endif
+};
+
+static void virtio_fs_wake_forget_and_unlock(struct fuse_iqueue *fiq)
+__releases(fiq->waitq.lock)
+{
+	struct fuse_forget_link *link;
+	struct virtio_fs_forget *forget;
+	struct scatterlist sg;
+	struct scatterlist *sgs[] = {&sg};
+	struct virtio_fs *fs;
+	struct virtqueue *vq;
+	struct virtio_fs_vq *fsvq;
+	bool notify;
+	u64 unique;
+	int ret;
+
+	link = fuse_dequeue_forget(fiq, 1, NULL);
+	unique = fuse_get_unique(fiq);
+
+	fs = fiq->priv;
+	fsvq = &fs->vqs[VQ_HIPRIO];
+	spin_unlock(&fiq->waitq.lock);
+
+	/* Allocate a buffer for the request */
+	forget = kmalloc(sizeof(*forget), GFP_ATOMIC);
+	if (!forget) {
+		pr_err("virtio-fs: dropped FORGET: kmalloc failed\n");
+		goto out; /* TODO avoid dropping it? */
+	}
+
+	forget->ih = (struct fuse_in_header){
+		.opcode = FUSE_FORGET,
+		.nodeid = link->forget_one.nodeid,
+		.unique = unique,
+		.len = sizeof(*forget),
+	};
+	forget->arg = (struct fuse_forget_in){
+		.nlookup = link->forget_one.nlookup,
+	};
+
+	sg_init_one(&sg, forget, sizeof(*forget));
+
+	/* Enqueue the request */
+	vq = fsvq->vq;
+	dev_dbg(&vq->vdev->dev, "%s\n", __func__);
+	spin_lock(&fsvq->lock);
+
+	ret = virtqueue_add_sgs(vq, sgs, 1, 0, forget, GFP_ATOMIC);
+	if (ret < 0) {
+		if (ret == -ENOMEM || ret == -ENOSPC) {
+			pr_debug("virtio-fs: Could not queue FORGET: err=%d."
+				 " Will try later.\n", ret);
+			list_add_tail(&forget->list, &fsvq->queued_reqs);
+			schedule_delayed_work(&fsvq->dispatch_work,
+					msecs_to_jiffies(1));
+		} else {
+			pr_debug("virtio-fs: Could not queue FORGET: err=%d."
+				 " Dropping it.\n", ret);
+			kfree(forget);
+		}
+		spin_unlock(&fsvq->lock);
+		goto out;
+	}
+
+	fsvq->in_flight++;
+	notify = virtqueue_kick_prepare(vq);
+
+	spin_unlock(&fsvq->lock);
+
+	if (notify)
+		virtqueue_notify(vq);
+out:
+	kfree(link);
+}
+
+static void virtio_fs_wake_interrupt_and_unlock(struct fuse_iqueue *fiq)
+__releases(fiq->waitq.lock)
+{
+	/* TODO */
+	spin_unlock(&fiq->waitq.lock);
+}
+
+/* Return the number of scatter-gather list elements required */
+static unsigned sg_count_fuse_req(struct fuse_req *req)
+{
+	unsigned total_sgs = 1 /* fuse_in_header */;
+
+	if (req->in.numargs - req->in.argpages)
+		total_sgs += 1;
+
+	if (req->in.argpages)
+		total_sgs += req->num_pages;
+
+	if (!test_bit(FR_ISREPLY, &req->flags))
+		return total_sgs;
+
+	total_sgs += 1 /* fuse_out_header */;
+
+	if (req->out.numargs - req->out.argpages)
+		total_sgs += 1;
+
+	if (req->out.argpages)
+		total_sgs += req->num_pages;
+
+	return total_sgs;
+}
+
+/* Add pages to scatter-gather list and return number of elements used */
+static unsigned sg_init_fuse_pages(struct scatterlist *sg,
+				   struct page **pages,
+				   struct fuse_page_desc *page_descs,
+				   unsigned num_pages)
+{
+	unsigned i;
+
+	for (i = 0; i < num_pages; i++) {
+		sg_init_table(&sg[i], 1);
+		sg_set_page(&sg[i], pages[i],
+			    page_descs[i].length,
+			    page_descs[i].offset);
+	}
+
+	return i;
+}
+
+/* Add args to scatter-gather list and return number of elements used */
+static unsigned sg_init_fuse_args(struct scatterlist *sg,
+				  struct fuse_req *req,
+				  struct fuse_arg *args,
+				  unsigned numargs,
+				  bool argpages,
+				  void *argbuf,
+				  unsigned *len_used)
+{
+	unsigned total_sgs = 0;
+	unsigned len;
+
+	len = fuse_len_args(numargs - argpages, args);
+	if (len)
+		sg_init_one(&sg[total_sgs++], argbuf, len);
+
+	if (argpages)
+		total_sgs += sg_init_fuse_pages(&sg[total_sgs],
+						req->pages,
+						req->page_descs,
+						req->num_pages);
+
+	if (len_used)
+		*len_used = len;
+
+	return total_sgs;
+}
+
+/* Add a request to a virtqueue and kick the device */
+static int virtio_fs_enqueue_req(struct virtqueue *vq, struct fuse_req *req)
+{
+	struct scatterlist *stack_sgs[6 /* requests need at least 4 elements */];
+	struct scatterlist stack_sg[ARRAY_SIZE(stack_sgs)];
+	struct scatterlist **sgs = stack_sgs;
+	struct scatterlist *sg = stack_sg;
+	struct virtio_fs_vq *fsvq;
+	unsigned argbuf_used = 0;
+	unsigned out_sgs = 0;
+	unsigned in_sgs = 0;
+	unsigned total_sgs;
+	unsigned i;
+	int ret;
+	bool notify;
+
+	/* Does the sglist fit on the stack? */
+	total_sgs = sg_count_fuse_req(req);
+	if (total_sgs > ARRAY_SIZE(stack_sgs)) {
+		sgs = kmalloc_array(total_sgs, sizeof(sgs[0]), GFP_ATOMIC);
+		sg = kmalloc_array(total_sgs, sizeof(sg[0]), GFP_ATOMIC);
+		if (!sgs || !sg) {
+			ret = -ENOMEM;
+			goto out;
+		}
+	}
+
+	/* Use a bounce buffer since stack args cannot be mapped */
+	ret = copy_args_to_argbuf(req);
+	if (ret < 0)
+		goto out;
+
+	/* Request elements */
+	sg_init_one(&sg[out_sgs++], &req->in.h, sizeof(req->in.h));
+	out_sgs += sg_init_fuse_args(&sg[out_sgs], req,
+				     (struct fuse_arg *)req->in.args,
+				     req->in.numargs, req->in.argpages,
+				     req->argbuf, &argbuf_used);
+
+	/* Reply elements */
+	if (test_bit(FR_ISREPLY, &req->flags)) {
+		sg_init_one(&sg[out_sgs + in_sgs++],
+			    &req->out.h, sizeof(req->out.h));
+		in_sgs += sg_init_fuse_args(&sg[out_sgs + in_sgs], req,
+					    req->out.args, req->out.numargs,
+					    req->out.argpages,
+					    req->argbuf + argbuf_used, NULL);
+	}
+
+	BUG_ON(out_sgs + in_sgs != total_sgs);
+
+	for (i = 0; i < total_sgs; i++)
+		sgs[i] = &sg[i];
+
+	fsvq = vq_to_fsvq(vq);
+	spin_lock(&fsvq->lock);
+
+	ret = virtqueue_add_sgs(vq, sgs, out_sgs, in_sgs, req, GFP_ATOMIC);
+	if (ret < 0) {
+		/* TODO handle full virtqueue */
+		spin_unlock(&fsvq->lock);
+		goto out;
+	}
+
+	notify = virtqueue_kick_prepare(vq);
+
+	spin_unlock(&fsvq->lock);
+
+	if (notify)
+		virtqueue_notify(vq);
+
+out:
+	if (ret < 0 && req->argbuf) {
+		kfree(req->argbuf);
+		req->argbuf = NULL;
+	}
+	if (sgs != stack_sgs) {
+		kfree(sgs);
+		kfree(sg);
+	}
+
+	return ret;
+}
+
+static void virtio_fs_wake_pending_and_unlock(struct fuse_iqueue *fiq)
+__releases(fiq->waitq.lock)
+{
+	unsigned queue_id = VQ_REQUEST; /* TODO multiqueue */
+	struct virtio_fs *fs;
+	struct fuse_conn *fc;
+	struct fuse_req *req;
+	struct fuse_pqueue *fpq;
+	int ret;
+
+	BUG_ON(list_empty(&fiq->pending));
+	req = list_last_entry(&fiq->pending, struct fuse_req, list);
+	clear_bit(FR_PENDING, &req->flags);
+	list_del_init(&req->list);
+	BUG_ON(!list_empty(&fiq->pending));
+	spin_unlock(&fiq->waitq.lock);
+
+	fs = fiq->priv;
+	fc = fs->vqs[queue_id].fud->fc;
+
+	dev_dbg(&fs->vqs[queue_id].vq->vdev->dev,
+		"%s: opcode %u unique %#llx nodeid %#llx in.len %u out.len %u\n",
+		__func__, req->in.h.opcode, req->in.h.unique, req->in.h.nodeid,
+		req->in.h.len, fuse_len_args(req->out.numargs, req->out.args));
+
+	fpq = &fs->vqs[queue_id].fud->pq;
+	spin_lock(&fpq->lock);
+	if (!fpq->connected) {
+		spin_unlock(&fpq->lock);
+		req->out.h.error = -ENODEV;
+		printk(KERN_ERR "%s: disconnected\n", __func__);
+		fuse_request_end(fc, req);
+		return;
+	}
+	list_add_tail(&req->list, fpq->processing);
+	spin_unlock(&fpq->lock);
+	set_bit(FR_SENT, &req->flags);
+	/* matches barrier in request_wait_answer() */
+	smp_mb__after_atomic();
+	/* TODO check for FR_INTERRUPTED? */
+
+retry:
+	ret = virtio_fs_enqueue_req(fs->vqs[queue_id].vq, req);
+	if (ret < 0) {
+		if (ret == -ENOMEM || ret == -ENOSPC) {
+			/* Virtqueue full. Retry submission */
+			usleep_range(20, 30);
+			goto retry;
+		}
+		req->out.h.error = ret;
+		printk(KERN_ERR "%s: virtio_fs_enqueue_req failed %d\n",
+			__func__, ret);
+		fuse_request_end(fc, req);
+		return;
+	}
+}
+
+static void virtio_fs_flush_hiprio_queue(struct virtio_fs_vq *fsvq)
+{
+	struct virtio_fs_forget *forget;
+
+	WARN_ON(fsvq->in_flight < 0);
+
+	/* Go through pending forget reuests and free them */
+	spin_lock(&fsvq->lock);
+	while(1) {
+		forget = list_first_entry_or_null(&fsvq->queued_reqs,
+					struct virtio_fs_forget, list);
+		if (!forget)
+			break;
+		list_del(&forget->list);
+		kfree(forget);
+	}
+
+	spin_unlock(&fsvq->lock);
+
+	/* Wait for in flight requests to finish.*/
+	while (1) {
+		spin_lock(&fsvq->lock);
+		if (!fsvq->in_flight) {
+			spin_unlock(&fsvq->lock);
+			break;
+		}
+		spin_unlock(&fsvq->lock);
+		usleep_range(1000, 2000);
+	}
+}
+
+const static struct fuse_iqueue_ops virtio_fs_fiq_ops = {
+	.wake_forget_and_unlock		= virtio_fs_wake_forget_and_unlock,
+	.wake_interrupt_and_unlock	= virtio_fs_wake_interrupt_and_unlock,
+	.wake_pending_and_unlock	= virtio_fs_wake_pending_and_unlock,
+};
+
+static int virtio_fs_fill_super(struct super_block *sb, char *opts,
+				struct fuse_mount_data *d)
+{
+	struct fuse_conn *fc = get_fuse_conn_super(sb);
+	struct virtio_fs *fs = fc->iq.priv;
+	unsigned int i;
+	int err;
+	struct fuse_req *init_req;
+
+	/* TODO lock */
+	if (fs->vqs[VQ_REQUEST].fud) {
+		printk(KERN_ERR "virtio-fs: device already in use\n");
+		err = -EBUSY;
+		goto err;
+	}
+
+	err = -ENOMEM;
+	/* Allocate fuse_dev for hiprio and notification queues */
+	for (i = 0; i < VQ_REQUEST; i++) {
+		struct virtio_fs_vq *fsvq = &fs->vqs[i];
+
+		fsvq->fud = fuse_dev_alloc();
+		if (!fsvq->fud)
+			goto err_free_fuse_devs;
+	}
+
+	init_req = fuse_request_alloc(0);
+	if (!init_req)
+		goto err_free_fuse_devs;
+	__set_bit(FR_BACKGROUND, &init_req->flags);
+
+	d->fudptr = (void **)&fs->vqs[VQ_REQUEST].fud;
+	d->destroy = true; /* Send destroy request on unmount */
+	err = fuse_fill_super_common(sb, d);
+	if (err < 0)
+		goto err_free_init_req;
+
+	fc = fs->vqs[VQ_REQUEST].fud->fc;
+
+	/* TODO take fuse_mutex around this loop? */
+	for (i = 0; i < fs->nvqs; i++) {
+		struct virtio_fs_vq *fsvq = &fs->vqs[i];
+
+		if (i == VQ_REQUEST)
+			continue; /* already initialized */
+		fuse_dev_install(fsvq->fud, fc);
+		atomic_inc(&fc->dev_count);
+	}
+
+	fuse_send_init(fc, init_req);
+	return 0;
+
+err_free_init_req:
+	fuse_request_free(init_req);
+err_free_fuse_devs:
+	for (i = 0; i < fs->nvqs; i++) {
+		struct virtio_fs_vq *fsvq = &fs->vqs[i];
+		fuse_dev_free(fsvq->fud);
+	}
+err:
+	return err;
+}
+
+static void virtio_kill_sb(struct super_block *sb)
+{
+	struct fuse_conn *fc = get_fuse_conn_super(sb);
+	struct virtio_fs *vfs;
+	struct virtio_fs_vq *fsvq;
+
+	/* If mount failed, we can still be called without any fc */
+	if (!fc)
+		return fuse_kill_sb_anon(sb);
+
+	vfs = fc->iq.priv;
+	fsvq = &vfs->vqs[VQ_HIPRIO];
+
+	/* Stop forget queue. Soon destroy will be sent */
+	spin_lock(&fsvq->lock);
+	fsvq->connected = false;
+	spin_unlock(&fsvq->lock);
+	virtio_fs_flush_hiprio_queue(fsvq);
+
+	fuse_kill_sb_anon(sb);
+	virtio_fs_free_devs(vfs);
+}
+
+static int virtio_fs_test_super(struct super_block *s, void *data)
+{
+	struct fuse_conn *fc = data;
+
+	return fc->iq.priv == get_fuse_conn_super(s)->iq.priv;
+
+}
+
+static int virtio_fs_set_super(struct super_block *s, void *data)
+{
+	int err;
+
+	err = get_anon_bdev(&s->s_dev);
+	if (!err)
+		s->s_fs_info = fuse_conn_get(data);
+
+	return err;
+}
+
+static struct dentry *virtio_fs_mount(struct file_system_type *fs_type,
+				      int flags, const char *dev_name,
+				      void *opts)
+{
+	struct virtio_fs *fs;
+	struct super_block *s;
+	struct fuse_conn *fc;
+	int err;
+	struct fuse_mount_data d;
+
+	fs = virtio_fs_find_instance(dev_name);
+	if (!fs) {
+		pr_info("virtio-fs: tag <%s> not found\n", dev_name);
+		return ERR_PTR(-EINVAL);
+	}
+
+	if (!parse_fuse_opt(opts, &d, 0, current_user_ns()))
+		return ERR_PTR(-EINVAL);
+
+	if (d.fd_present) {
+		printk(KERN_ERR "virtio-fs: fd option cannot be used\n");
+		return ERR_PTR(-EINVAL);
+	}
+
+	fc = kzalloc(sizeof(struct fuse_conn), GFP_KERNEL);
+	if (!fc)
+		return ERR_PTR(-ENOMEM);
+	fuse_conn_init(fc, get_user_ns(current_user_ns()), &virtio_fs_fiq_ops,
+		       fs);
+	fc->release = fuse_free_conn;
+
+	s = sget(fs_type, virtio_fs_test_super, virtio_fs_set_super, flags, fc);
+	err = PTR_ERR(s);
+	if (IS_ERR(s))
+		goto err_free_fc;
+
+	err = -EIO;
+	if (WARN_ON(fc->user_ns != s->s_user_ns))
+		goto deactivate;
+
+	if (s->s_root) {
+		err = -EBUSY;
+		if ((flags ^ s->s_flags) & SB_RDONLY)
+			goto deactivate;
+	} else {
+		err = virtio_fs_fill_super(s, opts, &d);
+		if (err)
+			goto deactivate;
+
+		s->s_flags |= SB_ACTIVE;
+	}
+	fuse_conn_put(fc);
+
+	return dget(s->s_root);
+
+deactivate:
+        deactivate_locked_super(s);
+
+err_free_fc:
+	fuse_conn_put(fc);
+	return ERR_PTR(err);
+}
+
+static struct file_system_type virtio_fs_type = {
+	.owner		= THIS_MODULE,
+	.name		= KBUILD_MODNAME,
+	.mount		= virtio_fs_mount,
+	.kill_sb	= virtio_kill_sb,
+};
+
+static int __init virtio_fs_init(void)
+{
+	int ret;
+
+	ret = register_virtio_driver(&virtio_fs_driver);
+	if (ret < 0)
+		return ret;
+
+	ret = register_filesystem(&virtio_fs_type);
+	if (ret < 0) {
+		unregister_virtio_driver(&virtio_fs_driver);
+		return ret;
+	}
+
+	return 0;
+}
+module_init(virtio_fs_init);
+
+static void __exit virtio_fs_exit(void)
+{
+	unregister_filesystem(&virtio_fs_type);
+	unregister_virtio_driver(&virtio_fs_driver);
+}
+module_exit(virtio_fs_exit);
+
+MODULE_AUTHOR("Stefan Hajnoczi <stefanha@redhat.com>");
+MODULE_DESCRIPTION("Virtio Filesystem");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_FS(KBUILD_MODNAME);
+MODULE_DEVICE_TABLE(virtio, id_table);
diff --git a/include/uapi/linux/virtio_fs.h b/include/uapi/linux/virtio_fs.h
new file mode 100644
index 000000000000..48f3590dcfbe
--- /dev/null
+++ b/include/uapi/linux/virtio_fs.h
@@ -0,0 +1,41 @@
+#ifndef _UAPI_LINUX_VIRTIO_FS_H
+#define _UAPI_LINUX_VIRTIO_FS_H
+/* This header is BSD licensed so anyone can use the definitions to implement
+ * compatible drivers/servers.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of IBM nor the names of its contributors
+ *    may be used to endorse or promote products derived from this software
+ *    without specific prior written permission.
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE. */
+#include <linux/types.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_types.h>
+
+struct virtio_fs_config {
+	/* Filesystem name (UTF-8, not NUL-terminated, padded with NULs) */
+	__u8 tag[36];
+
+	/* Number of request queues */
+	__u32 num_queues;
+} __attribute__((packed));
+
+#endif /* _UAPI_LINUX_VIRTIO_FS_H */
diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
index 348fd0176f75..585e07b27333 100644
--- a/include/uapi/linux/virtio_ids.h
+++ b/include/uapi/linux/virtio_ids.h
@@ -44,6 +44,7 @@
 #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
 #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
 #define VIRTIO_ID_IOMMU        23 /* virtio IOMMU */
+#define VIRTIO_ID_FS           26 /* virtio filesystem */
 #define VIRTIO_ID_PMEM         27 /* virtio pmem */
 
 #endif /* _LINUX_VIRTIO_IDS_H */
-- 
2.20.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 12/13] virtio-fs: Do not provide abort interface in fusectl
  2019-08-21 17:37 [PATCH v3 00/13] virtio-fs: shared file system for virtual machines Vivek Goyal
                   ` (10 preceding siblings ...)
  2019-08-21 17:37 ` [PATCH 11/13] virtio_fs: add skeleton virtio_fs.ko module Vivek Goyal
@ 2019-08-21 17:37 ` Vivek Goyal
  2019-08-21 17:37 ` [PATCH 13/13] init/do_mounts.c: add virtio_fs root fs support Vivek Goyal
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-08-21 17:37 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, miklos; +Cc: virtio-fs, vgoyal, stefanha, dgilbert

virtio-fs does not support aborting requests which are being processed. That
is requests which have been sent to fuse daemon on host.

So do not provide "abort" interface for virtio-fs in fusectl.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/fuse/control.c   | 4 ++--
 fs/fuse/fuse_i.h    | 4 ++++
 fs/fuse/inode.c     | 1 +
 fs/fuse/virtio_fs.c | 1 +
 4 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/control.c b/fs/fuse/control.c
index c23f6f243ad4..c4681efc5ece 100644
--- a/fs/fuse/control.c
+++ b/fs/fuse/control.c
@@ -279,8 +279,8 @@ int fuse_ctl_add_conn(struct fuse_conn *fc)
 
 	if (!fuse_ctl_add_dentry(parent, fc, "waiting", S_IFREG | 0400, 1,
 				 NULL, &fuse_ctl_waiting_ops) ||
-	    !fuse_ctl_add_dentry(parent, fc, "abort", S_IFREG | 0200, 1,
-				 NULL, &fuse_ctl_abort_ops) ||
+	    (!fc->no_abort && !fuse_ctl_add_dentry(parent, fc, "abort",
+			S_IFREG | 0200, 1, NULL, &fuse_ctl_abort_ops)) ||
 	    !fuse_ctl_add_dentry(parent, fc, "max_background", S_IFREG | 0600,
 				 1, NULL, &fuse_conn_max_background_ops) ||
 	    !fuse_ctl_add_dentry(parent, fc, "congestion_threshold",
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 73b23421b48e..25a6da6ee8c3 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -73,6 +73,7 @@ struct fuse_mount_data {
 	unsigned default_permissions:1;
 	unsigned allow_other:1;
 	unsigned destroy:1;
+	unsigned no_abort:1;
 	unsigned max_read;
 	unsigned blksize;
 
@@ -789,6 +790,9 @@ struct fuse_conn {
 	/** Does the filesystem support copy_file_range? */
 	unsigned no_copy_file_range:1;
 
+	/** Do not create abort file in fuse control fs */
+	unsigned no_abort:1;
+
 	/** The number of requests waiting for completion */
 	atomic_t num_waiting;
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index fca81c40b2d7..16bcf0f95979 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1157,6 +1157,7 @@ int fuse_fill_super_common(struct super_block *sb,
 	fc->user_id = mount_data->user_id;
 	fc->group_id = mount_data->group_id;
 	fc->max_read = max_t(unsigned, 4096, mount_data->max_read);
+	fc->no_abort = mount_data->no_abort;
 
 	/* Used by get_root_inode() */
 	sb->s_fs_info = fc;
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index ce6b76598e74..ce1de9acde84 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -886,6 +886,7 @@ static int virtio_fs_fill_super(struct super_block *sb, char *opts,
 
 	d->fudptr = (void **)&fs->vqs[VQ_REQUEST].fud;
 	d->destroy = true; /* Send destroy request on unmount */
+	d->no_abort = 1;
 	err = fuse_fill_super_common(sb, d);
 	if (err < 0)
 		goto err_free_init_req;
-- 
2.20.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 13/13] init/do_mounts.c: add virtio_fs root fs support
  2019-08-21 17:37 [PATCH v3 00/13] virtio-fs: shared file system for virtual machines Vivek Goyal
                   ` (11 preceding siblings ...)
  2019-08-21 17:37 ` [PATCH 12/13] virtio-fs: Do not provide abort interface in fusectl Vivek Goyal
@ 2019-08-21 17:37 ` Vivek Goyal
  2019-08-29  9:28 ` [PATCH v3 00/13] virtio-fs: shared file system for virtual machines Miklos Szeredi
  2019-09-03  8:05 ` Miklos Szeredi
  14 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-08-21 17:37 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, miklos; +Cc: virtio-fs, vgoyal, stefanha, dgilbert

It is useful to mount the root file system via virtio_fs.  During
testing a monolithic kernel is more convenient than an initramfs but
we'll need to teach the kernel how to boot directly from virtio_fs.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 init/do_mounts.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/init/do_mounts.c b/init/do_mounts.c
index 53cb37b66227..d7e7bb83f85b 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -554,6 +554,16 @@ void __init mount_root(void)
 			change_floppy("root floppy");
 	}
 #endif
+#ifdef CONFIG_VIRTIO_FS
+	if (root_fs_names && !strcmp(root_fs_names, "virtio_fs")) {
+		if (!do_mount_root(root_device_name, "virtio_fs",
+				   root_mountflags, root_mount_data))
+			return;
+
+		panic("VFS: Unable to mount root fs \"%s\" from virtio_fs",
+		      root_device_name);
+	}
+#endif
 #ifdef CONFIG_BLOCK
 	{
 		int err = create_dev("/dev/root", ROOT_DEV);
-- 
2.20.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/13] fuse: Use default_file_splice_read for direct IO
  2019-08-21 17:37 ` [PATCH 02/13] fuse: Use default_file_splice_read for direct IO Vivek Goyal
@ 2019-08-28  7:45   ` Miklos Szeredi
  2019-08-28 12:27     ` Vivek Goyal
  0 siblings, 1 reply; 33+ messages in thread
From: Miklos Szeredi @ 2019-08-28  7:45 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-fsdevel, linux-kernel, virtio-fs, Stefan Hajnoczi,
	Dr. David Alan Gilbert, Miklos Szeredi

On Wed, Aug 21, 2019 at 7:38 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> From: Miklos Szeredi <mszeredi@redhat.com>

Nice patch, except I have no idea why I did this.  Splice with
FOPEN_DIRECT_IO  seems to work fine without it.

Anyway, I'll just drop this, unless someone has an idea why it is
actually needed.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/13] fuse: Use default_file_splice_read for direct IO
  2019-08-28  7:45   ` Miklos Szeredi
@ 2019-08-28 12:27     ` Vivek Goyal
  0 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-08-28 12:27 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, linux-kernel, virtio-fs, Stefan Hajnoczi,
	Dr. David Alan Gilbert, Miklos Szeredi

On Wed, Aug 28, 2019 at 09:45:28AM +0200, Miklos Szeredi wrote:
> On Wed, Aug 21, 2019 at 7:38 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > From: Miklos Szeredi <mszeredi@redhat.com>
> 
> Nice patch, except I have no idea why I did this.  Splice with
> FOPEN_DIRECT_IO  seems to work fine without it.

I don't know either. I took it because it was there in one of the branches
you were doing development on.

> 
> Anyway, I'll just drop this, unless someone has an idea why it is
> actually needed.

Sure, drop this patch. If need be, we can dig it up later.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 00/13] virtio-fs: shared file system for virtual machines
  2019-08-21 17:37 [PATCH v3 00/13] virtio-fs: shared file system for virtual machines Vivek Goyal
                   ` (12 preceding siblings ...)
  2019-08-21 17:37 ` [PATCH 13/13] init/do_mounts.c: add virtio_fs root fs support Vivek Goyal
@ 2019-08-29  9:28 ` Miklos Szeredi
  2019-08-29 11:58   ` Vivek Goyal
                     ` (3 more replies)
  2019-09-03  8:05 ` Miklos Szeredi
  14 siblings, 4 replies; 33+ messages in thread
From: Miklos Szeredi @ 2019-08-29  9:28 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-fsdevel, linux-kernel, virtio-fs, Stefan Hajnoczi,
	Dr. David Alan Gilbert

On Wed, Aug 21, 2019 at 7:38 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> Hi,
>
> Here are the V3 patches for virtio-fs filesystem. This time I have
> broken the patch series in two parts. This is first part which does
> not contain DAX support. Second patch series will contain the patches
> for DAX support.
>
> I have also dropped RFC tag from first patch series as we believe its
> in good enough shape that it should get a consideration for inclusion
> upstream.

Pushed out to

  git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git#for-next

Major changes compared to patchset:

 - renamed to "virtiofs".  Filesystem names don't usually have
underscore before "fs" postfix.

 - removed option parsing completely.  Virtiofs config is fixed to "-o
rootmode=040000,user_id=0,group_id=0,allow_other,default_permissions".
Does this sound reasonable?

There are miscellaneous changes, so needs to be thoroughly tested.

I think we also need something in
"Documentation/filesystems/virtiofs.rst" which describes the design
(how  request gets to userspace and back) and how to set up the
server, etc...  Stefan, Vivek can you do something like that?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 00/13] virtio-fs: shared file system for virtual machines
  2019-08-29  9:28 ` [PATCH v3 00/13] virtio-fs: shared file system for virtual machines Miklos Szeredi
@ 2019-08-29 11:58   ` Vivek Goyal
  2019-08-29 12:35   ` Stefan Hajnoczi
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2019-08-29 11:58 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, linux-kernel, virtio-fs, Stefan Hajnoczi,
	Dr. David Alan Gilbert

On Thu, Aug 29, 2019 at 11:28:27AM +0200, Miklos Szeredi wrote:
> On Wed, Aug 21, 2019 at 7:38 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > Hi,
> >
> > Here are the V3 patches for virtio-fs filesystem. This time I have
> > broken the patch series in two parts. This is first part which does
> > not contain DAX support. Second patch series will contain the patches
> > for DAX support.
> >
> > I have also dropped RFC tag from first patch series as we believe its
> > in good enough shape that it should get a consideration for inclusion
> > upstream.
> 
> Pushed out to
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git#for-next
> 

Awesome.

> Major changes compared to patchset:
> 
>  - renamed to "virtiofs".  Filesystem names don't usually have
> underscore before "fs" postfix.
> 

Sound good to me.

>  - removed option parsing completely.  Virtiofs config is fixed to "-o
> rootmode=040000,user_id=0,group_id=0,allow_other,default_permissions".
> Does this sound reasonable?

These are the options we are using now and looks like they make lot of
sense for virtiofs. I guess if somebody needs a different configuration
later we can introduce option parsing and override these defaults.

> 
> There are miscellaneous changes, so needs to be thoroughly tested.

I will test these.

> 
> I think we also need something in
> "Documentation/filesystems/virtiofs.rst" which describes the design
> (how  request gets to userspace and back) and how to set up the
> server, etc...  Stefan, Vivek can you do something like that?

Sure, I will write up something and take Stefan's input as well.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 00/13] virtio-fs: shared file system for virtual machines
  2019-08-29  9:28 ` [PATCH v3 00/13] virtio-fs: shared file system for virtual machines Miklos Szeredi
  2019-08-29 11:58   ` Vivek Goyal
@ 2019-08-29 12:35   ` Stefan Hajnoczi
  2019-08-29 13:29   ` Vivek Goyal
  2019-08-29 16:01   ` Vivek Goyal
  3 siblings, 0 replies; 33+ messages in thread
From: Stefan Hajnoczi @ 2019-08-29 12:35 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Vivek Goyal, linux-fsdevel, linux-kernel, virtio-fs,
	Dr. David Alan Gilbert

[-- Attachment #1: Type: text/plain, Size: 782 bytes --]

On Thu, Aug 29, 2019 at 11:28:27AM +0200, Miklos Szeredi wrote:
> On Wed, Aug 21, 2019 at 7:38 PM Vivek Goyal <vgoyal@redhat.com> wrote:

Thanks!

>  - removed option parsing completely.  Virtiofs config is fixed to "-o
> rootmode=040000,user_id=0,group_id=0,allow_other,default_permissions".
> Does this sound reasonable?

That simplifies things for users and I've never seen a need to change
these options in virtio-fs.  If we need the control for some reason in
the future we can add options back in again.

> I think we also need something in
> "Documentation/filesystems/virtiofs.rst" which describes the design
> (how  request gets to userspace and back) and how to set up the
> server, etc...  Stefan, Vivek can you do something like that?

Sure.  I'll send a patch.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 00/13] virtio-fs: shared file system for virtual machines
  2019-08-29  9:28 ` [PATCH v3 00/13] virtio-fs: shared file system for virtual machines Miklos Szeredi
  2019-08-29 11:58   ` Vivek Goyal
  2019-08-29 12:35   ` Stefan Hajnoczi
@ 2019-08-29 13:29   ` Vivek Goyal
  2019-08-29 13:41     ` Miklos Szeredi
  2019-08-29 16:01   ` Vivek Goyal
  3 siblings, 1 reply; 33+ messages in thread
From: Vivek Goyal @ 2019-08-29 13:29 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, linux-kernel, virtio-fs, Stefan Hajnoczi,
	Dr. David Alan Gilbert

On Thu, Aug 29, 2019 at 11:28:27AM +0200, Miklos Szeredi wrote:
> On Wed, Aug 21, 2019 at 7:38 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > Hi,
> >
> > Here are the V3 patches for virtio-fs filesystem. This time I have
> > broken the patch series in two parts. This is first part which does
> > not contain DAX support. Second patch series will contain the patches
> > for DAX support.
> >
> > I have also dropped RFC tag from first patch series as we believe its
> > in good enough shape that it should get a consideration for inclusion
> > upstream.
> 
> Pushed out to
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git#for-next
> 

Hi Miklos,

Compilation of virtio-fs as module fails. While it works if compiled as
non-module.

fs/fuse/virtio_fs.c: In function ‘copy_args_to_argbuf’:
fs/fuse/virtio_fs.c:255:5: error: ‘struct fuse_req’ has no member named ‘argbuf’
  req->argbuf = kmalloc(len, GFP_ATOMIC);

It can't find req->argbuf. 

I noticed that you have put ifdef around argbuf.

#ifdef CONFIG_VIRTIO_FS
        /** virtio-fs's physically contiguous buffer for in and out args */
        void *argbuf;
#endif

It should have worked. Not sure why it is not working.

Vivek

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 00/13] virtio-fs: shared file system for virtual machines
  2019-08-29 13:29   ` Vivek Goyal
@ 2019-08-29 13:41     ` Miklos Szeredi
  2019-08-29 14:31       ` Vivek Goyal
  0 siblings, 1 reply; 33+ messages in thread
From: Miklos Szeredi @ 2019-08-29 13:41 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-fsdevel, linux-kernel, virtio-fs, Stefan Hajnoczi,
	Dr. David Alan Gilbert

On Thu, Aug 29, 2019 at 3:29 PM Vivek Goyal <vgoyal@redhat.com> wrote:

> #ifdef CONFIG_VIRTIO_FS
>         /** virtio-fs's physically contiguous buffer for in and out args */
>         void *argbuf;
> #endif
>
> It should have worked. Not sure why it is not working.

Needs to be changed to

#if IS_ENABLED(CONFIG_VIRTIO_FS)

Pushed out fixed version.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 00/13] virtio-fs: shared file system for virtual machines
  2019-08-29 13:41     ` Miklos Szeredi
@ 2019-08-29 14:31       ` Vivek Goyal
  2019-08-29 14:47         ` Miklos Szeredi
  0 siblings, 1 reply; 33+ messages in thread
From: Vivek Goyal @ 2019-08-29 14:31 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, linux-kernel, virtio-fs, Stefan Hajnoczi,
	Dr. David Alan Gilbert

On Thu, Aug 29, 2019 at 03:41:26PM +0200, Miklos Szeredi wrote:
> On Thu, Aug 29, 2019 at 3:29 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> 
> > #ifdef CONFIG_VIRTIO_FS
> >         /** virtio-fs's physically contiguous buffer for in and out args */
> >         void *argbuf;
> > #endif
> >
> > It should have worked. Not sure why it is not working.
> 
> Needs to be changed to
> 
> #if IS_ENABLED(CONFIG_VIRTIO_FS)
> 
> Pushed out fixed version.

Cool. That works. Faced another compilation error with "make allmodconfig"
config file.

  HDRINST usr/include/linux/virtio_fs.h
error: include/uapi/linux/virtio_fs.h: missing "WITH Linux-syscall-note" for SPDX-License-Identifier
make[1]: *** [scripts/Makefile.headersinst:66: usr/include/linux/virtio_fs.h] Error 1

Looks like include/uapi/linux/virtio_fs.h needs following.

-/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */

Vivek

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 00/13] virtio-fs: shared file system for virtual machines
  2019-08-29 14:31       ` Vivek Goyal
@ 2019-08-29 14:47         ` Miklos Szeredi
  0 siblings, 0 replies; 33+ messages in thread
From: Miklos Szeredi @ 2019-08-29 14:47 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-fsdevel, linux-kernel, virtio-fs, Stefan Hajnoczi,
	Dr. David Alan Gilbert

On Thu, Aug 29, 2019 at 4:31 PM Vivek Goyal <vgoyal@redhat.com> wrote:

> Cool. That works. Faced another compilation error with "make allmodconfig"
> config file.
>
>   HDRINST usr/include/linux/virtio_fs.h
> error: include/uapi/linux/virtio_fs.h: missing "WITH Linux-syscall-note" for SPDX-License-Identifier
> make[1]: *** [scripts/Makefile.headersinst:66: usr/include/linux/virtio_fs.h] Error 1
>
> Looks like include/uapi/linux/virtio_fs.h needs following.
>
> -/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
> +/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */

Fixed and pushed.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 00/13] virtio-fs: shared file system for virtual machines
  2019-08-29  9:28 ` [PATCH v3 00/13] virtio-fs: shared file system for virtual machines Miklos Szeredi
                     ` (2 preceding siblings ...)
  2019-08-29 13:29   ` Vivek Goyal
@ 2019-08-29 16:01   ` Vivek Goyal
  2019-08-31  5:46     ` Miklos Szeredi
  3 siblings, 1 reply; 33+ messages in thread
From: Vivek Goyal @ 2019-08-29 16:01 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, linux-kernel, virtio-fs, Stefan Hajnoczi,
	Dr. David Alan Gilbert

On Thu, Aug 29, 2019 at 11:28:27AM +0200, Miklos Szeredi wrote:

[..]
> There are miscellaneous changes, so needs to be thoroughly tested.

Hi Miklos,

First round of tests passed. Ran pjdfstests, blogbench and bunch of fio
jobs and everyting looks good.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 00/13] virtio-fs: shared file system for virtual machines
  2019-08-29 16:01   ` Vivek Goyal
@ 2019-08-31  5:46     ` Miklos Szeredi
  0 siblings, 0 replies; 33+ messages in thread
From: Miklos Szeredi @ 2019-08-31  5:46 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-fsdevel, linux-kernel, virtio-fs, Stefan Hajnoczi,
	Dr. David Alan Gilbert

On Thu, Aug 29, 2019 at 6:01 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Thu, Aug 29, 2019 at 11:28:27AM +0200, Miklos Szeredi wrote:
>
> [..]
> > There are miscellaneous changes, so needs to be thoroughly tested.
>
> Hi Miklos,
>
> First round of tests passed. Ran pjdfstests, blogbench and bunch of fio
> jobs and everyting looks good.

fsx-linux with cache=always revealed a couple of bugs in the argpages
case.  Pushed fixes for those too.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 00/13] virtio-fs: shared file system for virtual machines
  2019-08-21 17:37 [PATCH v3 00/13] virtio-fs: shared file system for virtual machines Vivek Goyal
                   ` (13 preceding siblings ...)
  2019-08-29  9:28 ` [PATCH v3 00/13] virtio-fs: shared file system for virtual machines Miklos Szeredi
@ 2019-09-03  8:05 ` Miklos Szeredi
  2019-09-03  8:31   ` Michael S. Tsirkin
  14 siblings, 1 reply; 33+ messages in thread
From: Miklos Szeredi @ 2019-09-03  8:05 UTC (permalink / raw)
  To: Vivek Goyal, Michael S. Tsirkin, Jason Wang
  Cc: virtualization, linux-fsdevel, linux-kernel, virtio-fs,
	Stefan Hajnoczi, Dr. David Alan Gilbert

[Cc:  virtualization@lists.linux-foundation.org, "Michael S. Tsirkin"
<mst@redhat.com>, Jason Wang <jasowang@redhat.com>]

It'd be nice to have an ACK for this from the virtio maintainers.

Thanks,
Miklos

On Wed, Aug 21, 2019 at 7:38 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> Hi,
>
> Here are the V3 patches for virtio-fs filesystem. This time I have
> broken the patch series in two parts. This is first part which does
> not contain DAX support. Second patch series will contain the patches
> for DAX support.
>
> I have also dropped RFC tag from first patch series as we believe its
> in good enough shape that it should get a consideration for inclusion
> upstream.
>
> These patches apply on top of 5.3-rc5 kernel and are also available
> here.
>
> https://github.com/rhvgoyal/linux/commits/vivek-5.3-aug-21-2019
>
> Patches for V1 and V2 were posted here.
>
> https://lwn.net/ml/linux-fsdevel/20181210171318.16998-1-vgoyal@redhat.com/
> http://lkml.iu.edu/hypermail/linux/kernel/1905.1/07232.html
>
> More information about the project can be found here.
>
> https://virtio-fs.gitlab.io
>
> Changes from V2
> ===============
> - Various bug fixes and performance improvements.
>
> HOWTO
> ======
> We have put instructions on how to use it here.
>
> https://virtio-fs.gitlab.io/
>
> Some Performance Numbers
> ========================
> I have basically run bunch of fio jobs to get a sense of speed of
> various operations. I wrote a simple wrapper script to run fio jobs
> 3 times and take their average and report it. These scripts are available
> here.
>
> https://github.com/rhvgoyal/virtiofs-tests
>
> I set up a directory on ramfs on host and exported that directory inside
> guest using virtio-9p and virtio-fs and ran tests inside guests. Ran
> tests with cache=none both for virtio-9p and virtio-fs so that no caching
> happens in guest. For virtio-fs, I ran an additional set of tests with
> dax enabled. Dax is not part of first patch series but I included
> results here because dax seems to get the maximum performance advantage
> and its shows the real potential of virtio-fs.
>
> Test Setup
> -----------
> - A fedora 28 host with 32G RAM, 2 sockets (6 cores per socket, 2
>   threads per core)
>
> - Using ramfs on host as backing store. 4 fio files of 2G each.
>
> - Created a VM with 16 VCPUS and 8GB memory. An 8GB cache window (for dax
>   mmap).
>
> Test Results
> ------------
> - Results in three configurations have been reported. 9p (cache=none),
>   virtio-fs (cache=none) and virtio-fs (cache=none + dax).
>
>   There are other caching modes as well but to me cache=none seemed most
>   interesting for now because it does not cache anything in guest
>   and provides strong coherence. Other modes which provide less strong
>   coherence and hence are faster are yet to be benchmarked.
>
> - Three fio ioengines psync, libaio and mmap have been used.
>
> - I/O Workload of randread, radwrite, seqread and seqwrite have been run.
>
> - Each file size is 2G. Block size 4K. iodepth=16
>
> - "multi" means same operation was done with 4 jobs and each job is
>   operating on a file of size 2G.
>
> - Some results are "0 (KiB/s)". That means that particular operation is
>   not supported in that configuration.
>
> NAME                    I/O Operation           BW(Read/Write)
>
> 9p-cache-none           seqread-psync           27(MiB/s)
> virtiofs-cache-none     seqread-psync           35(MiB/s)
> virtiofs-dax-cache-none seqread-psync           245(MiB/s)
>
> 9p-cache-none           seqread-psync-multi     117(MiB/s)
> virtiofs-cache-none     seqread-psync-multi     162(MiB/s)
> virtiofs-dax-cache-none seqread-psync-multi     894(MiB/s)
>
> 9p-cache-none           seqread-mmap            24(MiB/s)
> virtiofs-cache-none     seqread-mmap            0(KiB/s)
> virtiofs-dax-cache-none seqread-mmap            168(MiB/s)
>
> 9p-cache-none           seqread-mmap-multi      115(MiB/s)
> virtiofs-cache-none     seqread-mmap-multi      0(KiB/s)
> virtiofs-dax-cache-none seqread-mmap-multi      614(MiB/s)
>
> 9p-cache-none           seqread-libaio          26(MiB/s)
> virtiofs-cache-none     seqread-libaio          139(MiB/s)
> virtiofs-dax-cache-none seqread-libaio          160(MiB/s)
>
> 9p-cache-none           seqread-libaio-multi    129(MiB/s)
> virtiofs-cache-none     seqread-libaio-multi    142(MiB/s)
> virtiofs-dax-cache-none seqread-libaio-multi    577(MiB/s)
>
> 9p-cache-none           randread-psync          29(MiB/s)
> virtiofs-cache-none     randread-psync          34(MiB/s)
> virtiofs-dax-cache-none randread-psync          256(MiB/s)
>
> 9p-cache-none           randread-psync-multi    139(MiB/s)
> virtiofs-cache-none     randread-psync-multi    153(MiB/s)
> virtiofs-dax-cache-none randread-psync-multi    245(MiB/s)
>
> 9p-cache-none           randread-mmap           22(MiB/s)
> virtiofs-cache-none     randread-mmap           0(KiB/s)
> virtiofs-dax-cache-none randread-mmap           162(MiB/s)
>
> 9p-cache-none           randread-mmap-multi     111(MiB/s)
> virtiofs-cache-none     randread-mmap-multi     0(KiB/s)
> virtiofs-dax-cache-none randread-mmap-multi     215(MiB/s)
>
> 9p-cache-none           randread-libaio         26(MiB/s)
> virtiofs-cache-none     randread-libaio         135(MiB/s)
> virtiofs-dax-cache-none randread-libaio         157(MiB/s)
>
> 9p-cache-none           randread-libaio-multi   133(MiB/s)
> virtiofs-cache-none     randread-libaio-multi   245(MiB/s)
> virtiofs-dax-cache-none randread-libaio-multi   163(MiB/s)
>
> 9p-cache-none           seqwrite-psync          28(MiB/s)
> virtiofs-cache-none     seqwrite-psync          34(MiB/s)
> virtiofs-dax-cache-none seqwrite-psync          203(MiB/s)
>
> 9p-cache-none           seqwrite-psync-multi    128(MiB/s)
> virtiofs-cache-none     seqwrite-psync-multi    155(MiB/s)
> virtiofs-dax-cache-none seqwrite-psync-multi    717(MiB/s)
>
> 9p-cache-none           seqwrite-mmap           0(KiB/s)
> virtiofs-cache-none     seqwrite-mmap           0(KiB/s)
> virtiofs-dax-cache-none seqwrite-mmap           165(MiB/s)
>
> 9p-cache-none           seqwrite-mmap-multi     0(KiB/s)
> virtiofs-cache-none     seqwrite-mmap-multi     0(KiB/s)
> virtiofs-dax-cache-none seqwrite-mmap-multi     511(MiB/s)
>
> 9p-cache-none           seqwrite-libaio         27(MiB/s)
> virtiofs-cache-none     seqwrite-libaio         128(MiB/s)
> virtiofs-dax-cache-none seqwrite-libaio         141(MiB/s)
>
> 9p-cache-none           seqwrite-libaio-multi   119(MiB/s)
> virtiofs-cache-none     seqwrite-libaio-multi   242(MiB/s)
> virtiofs-dax-cache-none seqwrite-libaio-multi   505(MiB/s)
>
> 9p-cache-none           randwrite-psync         27(MiB/s)
> virtiofs-cache-none     randwrite-psync         34(MiB/s)
> virtiofs-dax-cache-none randwrite-psync         189(MiB/s)
>
> 9p-cache-none           randwrite-psync-multi   137(MiB/s)
> virtiofs-cache-none     randwrite-psync-multi   150(MiB/s)
> virtiofs-dax-cache-none randwrite-psync-multi   233(MiB/s)
>
> 9p-cache-none           randwrite-mmap          0(KiB/s)
> virtiofs-cache-none     randwrite-mmap          0(KiB/s)
> virtiofs-dax-cache-none randwrite-mmap          120(MiB/s)
>
> 9p-cache-none           randwrite-mmap-multi    0(KiB/s)
> virtiofs-cache-none     randwrite-mmap-multi    0(KiB/s)
> virtiofs-dax-cache-none randwrite-mmap-multi    200(MiB/s)
>
> 9p-cache-none           randwrite-libaio        25(MiB/s)
> virtiofs-cache-none     randwrite-libaio        124(MiB/s)
> virtiofs-dax-cache-none randwrite-libaio        131(MiB/s)
>
> 9p-cache-none           randwrite-libaio-multi  125(MiB/s)
> virtiofs-cache-none     randwrite-libaio-multi  241(MiB/s)
> virtiofs-dax-cache-none randwrite-libaio-multi  163(MiB/s)
>
> Conclusions
> ===========
> - In general virtio-fs seems faster than virtio-9p. Using dax makes it
>   really interesting.
>
> Note:
>   Right now dax window is 8G and max fio file size is 8G as well (4
>   files of 2G each). That means everything fits into dax window and no
>   reclaim is needed. Dax window reclaim logic is slower and if file
>   size is bigger than dax window size, performance slows down.
>
> Description from previous postings
> ==================================
>
> Design Overview
> ===============
> With the goal of designing something with better performance and local file
> system semantics, a bunch of ideas were proposed.
>
> - Use fuse protocol (instead of 9p) for communication between guest
>   and host. Guest kernel will be fuse client and a fuse server will
>   run on host to serve the requests.
>
> - For data access inside guest, mmap portion of file in QEMU address
>   space and guest accesses this memory using dax. That way guest page
>   cache is bypassed and there is only one copy of data (on host). This
>   will also enable mmap(MAP_SHARED) between guests.
>
> - For metadata coherency, there is a shared memory region which contains
>   version number associated with metadata and any guest changing metadata
>   updates version number and other guests refresh metadata on next
>   access. This is yet to be implemented.
>
> How virtio-fs differs from existing approaches
> ==============================================
> The unique idea behind virtio-fs is to take advantage of the co-location
> of the virtual machine and hypervisor to avoid communication (vmexits).
>
> DAX allows file contents to be accessed without communication with the
> hypervisor. The shared memory region for metadata avoids communication in
> the common case where metadata is unchanged.
>
> By replacing expensive communication with cheaper shared memory accesses,
> we expect to achieve better performance than approaches based on network
> file system protocols. In addition, this also makes it easier to achieve
> local file system semantics (coherency).
>
> These techniques are not applicable to network file system protocols since
> the communications channel is bypassed by taking advantage of shared memory
> on a local machine. This is why we decided to build virtio-fs rather than
> focus on 9P or NFS.
>
> Caching Modes
> =============
> Like virtio-9p, different caching modes are supported which determine the
> coherency level as well. The “cache=FOO” and “writeback” options control the
> level of coherence between the guest and host filesystems.
>
> - cache=none
>   metadata, data and pathname lookup are not cached in guest. They are always
>   fetched from host and any changes are immediately pushed to host.
>
> - cache=always
>   metadata, data and pathname lookup are cached in guest and never expire.
>
> - cache=auto
>   metadata and pathname lookup cache expires after a configured amount of time
>   (default is 1 second). Data is cached while the file is open (close to open
>   consistency).
>
> - writeback/no_writeback
>   These options control the writeback strategy.  If writeback is disabled,
>   then normal writes will immediately be synchronized with the host fs. If
>   writeback is enabled, then writes may be cached in the guest until the file
>   is closed or an fsync(2) performed. This option has no effect on mmap-ed
>   writes or writes going through the DAX mechanism.
>
> Thanks
> Vivek
>
> Miklos Szeredi (2):
>   fuse: delete dentry if timeout is zero
>   fuse: Use default_file_splice_read for direct IO
>
> Stefan Hajnoczi (6):
>   fuse: export fuse_end_request()
>   fuse: export fuse_len_args()
>   fuse: export fuse_get_unique()
>   fuse: extract fuse_fill_super_common()
>   fuse: add fuse_iqueue_ops callbacks
>   virtio_fs: add skeleton virtio_fs.ko module
>
> Vivek Goyal (5):
>   fuse: Export fuse_send_init_request()
>   Export fuse_dequeue_forget() function
>   fuse: Separate fuse device allocation and installation in fuse_conn
>   virtio-fs: Do not provide abort interface in fusectl
>   init/do_mounts.c: add virtio_fs root fs support
>
>  fs/fuse/Kconfig                 |   11 +
>  fs/fuse/Makefile                |    1 +
>  fs/fuse/control.c               |    4 +-
>  fs/fuse/cuse.c                  |    4 +-
>  fs/fuse/dev.c                   |   89 ++-
>  fs/fuse/dir.c                   |   26 +-
>  fs/fuse/file.c                  |   15 +-
>  fs/fuse/fuse_i.h                |  120 +++-
>  fs/fuse/inode.c                 |  203 +++---
>  fs/fuse/virtio_fs.c             | 1061 +++++++++++++++++++++++++++++++
>  fs/splice.c                     |    3 +-
>  include/linux/fs.h              |    2 +
>  include/uapi/linux/virtio_fs.h  |   41 ++
>  include/uapi/linux/virtio_ids.h |    1 +
>  init/do_mounts.c                |   10 +
>  15 files changed, 1462 insertions(+), 129 deletions(-)
>  create mode 100644 fs/fuse/virtio_fs.c
>  create mode 100644 include/uapi/linux/virtio_fs.h
>
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 00/13] virtio-fs: shared file system for virtual machines
  2019-09-03  8:05 ` Miklos Szeredi
@ 2019-09-03  8:31   ` Michael S. Tsirkin
  2019-09-03  9:17     ` Miklos Szeredi
  2019-09-03 14:07     ` Vivek Goyal
  0 siblings, 2 replies; 33+ messages in thread
From: Michael S. Tsirkin @ 2019-09-03  8:31 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Vivek Goyal, Jason Wang, virtualization, linux-fsdevel,
	linux-kernel, virtio-fs, Stefan Hajnoczi, Dr. David Alan Gilbert

On Tue, Sep 03, 2019 at 10:05:02AM +0200, Miklos Szeredi wrote:
> [Cc:  virtualization@lists.linux-foundation.org, "Michael S. Tsirkin"
> <mst@redhat.com>, Jason Wang <jasowang@redhat.com>]
> 
> It'd be nice to have an ACK for this from the virtio maintainers.
> 
> Thanks,
> Miklos

Can the patches themselves be posted to the relevant list(s) please?
If possible, please also include "v3" in all patches so they are
easier to find.

I poked at
https://lwn.net/ml/linux-kernel/20190821173742.24574-1-vgoyal@redhat.com/
specifically
https://lwn.net/ml/linux-kernel/20190821173742.24574-12-vgoyal@redhat.com/
and things like:
+	/* TODO lock */
give me pause.

Cleanup generally seems broken to me - what pauses the FS

What about the rest of TODOs in that file?

use of usleep is hacky - can't we do better e.g. with a
completion?

Some typos - e.g. "reuests".


> 
> On Wed, Aug 21, 2019 at 7:38 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > Hi,
> >
> > Here are the V3 patches for virtio-fs filesystem. This time I have
> > broken the patch series in two parts. This is first part which does
> > not contain DAX support. Second patch series will contain the patches
> > for DAX support.
> >
> > I have also dropped RFC tag from first patch series as we believe its
> > in good enough shape that it should get a consideration for inclusion
> > upstream.
> >
> > These patches apply on top of 5.3-rc5 kernel and are also available
> > here.
> >
> > https://github.com/rhvgoyal/linux/commits/vivek-5.3-aug-21-2019
> >
> > Patches for V1 and V2 were posted here.
> >
> > https://lwn.net/ml/linux-fsdevel/20181210171318.16998-1-vgoyal@redhat.com/
> > http://lkml.iu.edu/hypermail/linux/kernel/1905.1/07232.html
> >
> > More information about the project can be found here.
> >
> > https://virtio-fs.gitlab.io
> >
> > Changes from V2
> > ===============
> > - Various bug fixes and performance improvements.
> >
> > HOWTO
> > ======
> > We have put instructions on how to use it here.
> >
> > https://virtio-fs.gitlab.io/
> >
> > Some Performance Numbers
> > ========================
> > I have basically run bunch of fio jobs to get a sense of speed of
> > various operations. I wrote a simple wrapper script to run fio jobs
> > 3 times and take their average and report it. These scripts are available
> > here.
> >
> > https://github.com/rhvgoyal/virtiofs-tests
> >
> > I set up a directory on ramfs on host and exported that directory inside
> > guest using virtio-9p and virtio-fs and ran tests inside guests. Ran
> > tests with cache=none both for virtio-9p and virtio-fs so that no caching
> > happens in guest. For virtio-fs, I ran an additional set of tests with
> > dax enabled. Dax is not part of first patch series but I included
> > results here because dax seems to get the maximum performance advantage
> > and its shows the real potential of virtio-fs.
> >
> > Test Setup
> > -----------
> > - A fedora 28 host with 32G RAM, 2 sockets (6 cores per socket, 2
> >   threads per core)
> >
> > - Using ramfs on host as backing store. 4 fio files of 2G each.
> >
> > - Created a VM with 16 VCPUS and 8GB memory. An 8GB cache window (for dax
> >   mmap).
> >
> > Test Results
> > ------------
> > - Results in three configurations have been reported. 9p (cache=none),
> >   virtio-fs (cache=none) and virtio-fs (cache=none + dax).
> >
> >   There are other caching modes as well but to me cache=none seemed most
> >   interesting for now because it does not cache anything in guest
> >   and provides strong coherence. Other modes which provide less strong
> >   coherence and hence are faster are yet to be benchmarked.
> >
> > - Three fio ioengines psync, libaio and mmap have been used.
> >
> > - I/O Workload of randread, radwrite, seqread and seqwrite have been run.
> >
> > - Each file size is 2G. Block size 4K. iodepth=16
> >
> > - "multi" means same operation was done with 4 jobs and each job is
> >   operating on a file of size 2G.
> >
> > - Some results are "0 (KiB/s)". That means that particular operation is
> >   not supported in that configuration.
> >
> > NAME                    I/O Operation           BW(Read/Write)
> >
> > 9p-cache-none           seqread-psync           27(MiB/s)
> > virtiofs-cache-none     seqread-psync           35(MiB/s)
> > virtiofs-dax-cache-none seqread-psync           245(MiB/s)
> >
> > 9p-cache-none           seqread-psync-multi     117(MiB/s)
> > virtiofs-cache-none     seqread-psync-multi     162(MiB/s)
> > virtiofs-dax-cache-none seqread-psync-multi     894(MiB/s)
> >
> > 9p-cache-none           seqread-mmap            24(MiB/s)
> > virtiofs-cache-none     seqread-mmap            0(KiB/s)
> > virtiofs-dax-cache-none seqread-mmap            168(MiB/s)
> >
> > 9p-cache-none           seqread-mmap-multi      115(MiB/s)
> > virtiofs-cache-none     seqread-mmap-multi      0(KiB/s)
> > virtiofs-dax-cache-none seqread-mmap-multi      614(MiB/s)
> >
> > 9p-cache-none           seqread-libaio          26(MiB/s)
> > virtiofs-cache-none     seqread-libaio          139(MiB/s)
> > virtiofs-dax-cache-none seqread-libaio          160(MiB/s)
> >
> > 9p-cache-none           seqread-libaio-multi    129(MiB/s)
> > virtiofs-cache-none     seqread-libaio-multi    142(MiB/s)
> > virtiofs-dax-cache-none seqread-libaio-multi    577(MiB/s)
> >
> > 9p-cache-none           randread-psync          29(MiB/s)
> > virtiofs-cache-none     randread-psync          34(MiB/s)
> > virtiofs-dax-cache-none randread-psync          256(MiB/s)
> >
> > 9p-cache-none           randread-psync-multi    139(MiB/s)
> > virtiofs-cache-none     randread-psync-multi    153(MiB/s)
> > virtiofs-dax-cache-none randread-psync-multi    245(MiB/s)
> >
> > 9p-cache-none           randread-mmap           22(MiB/s)
> > virtiofs-cache-none     randread-mmap           0(KiB/s)
> > virtiofs-dax-cache-none randread-mmap           162(MiB/s)
> >
> > 9p-cache-none           randread-mmap-multi     111(MiB/s)
> > virtiofs-cache-none     randread-mmap-multi     0(KiB/s)
> > virtiofs-dax-cache-none randread-mmap-multi     215(MiB/s)
> >
> > 9p-cache-none           randread-libaio         26(MiB/s)
> > virtiofs-cache-none     randread-libaio         135(MiB/s)
> > virtiofs-dax-cache-none randread-libaio         157(MiB/s)
> >
> > 9p-cache-none           randread-libaio-multi   133(MiB/s)
> > virtiofs-cache-none     randread-libaio-multi   245(MiB/s)
> > virtiofs-dax-cache-none randread-libaio-multi   163(MiB/s)
> >
> > 9p-cache-none           seqwrite-psync          28(MiB/s)
> > virtiofs-cache-none     seqwrite-psync          34(MiB/s)
> > virtiofs-dax-cache-none seqwrite-psync          203(MiB/s)
> >
> > 9p-cache-none           seqwrite-psync-multi    128(MiB/s)
> > virtiofs-cache-none     seqwrite-psync-multi    155(MiB/s)
> > virtiofs-dax-cache-none seqwrite-psync-multi    717(MiB/s)
> >
> > 9p-cache-none           seqwrite-mmap           0(KiB/s)
> > virtiofs-cache-none     seqwrite-mmap           0(KiB/s)
> > virtiofs-dax-cache-none seqwrite-mmap           165(MiB/s)
> >
> > 9p-cache-none           seqwrite-mmap-multi     0(KiB/s)
> > virtiofs-cache-none     seqwrite-mmap-multi     0(KiB/s)
> > virtiofs-dax-cache-none seqwrite-mmap-multi     511(MiB/s)
> >
> > 9p-cache-none           seqwrite-libaio         27(MiB/s)
> > virtiofs-cache-none     seqwrite-libaio         128(MiB/s)
> > virtiofs-dax-cache-none seqwrite-libaio         141(MiB/s)
> >
> > 9p-cache-none           seqwrite-libaio-multi   119(MiB/s)
> > virtiofs-cache-none     seqwrite-libaio-multi   242(MiB/s)
> > virtiofs-dax-cache-none seqwrite-libaio-multi   505(MiB/s)
> >
> > 9p-cache-none           randwrite-psync         27(MiB/s)
> > virtiofs-cache-none     randwrite-psync         34(MiB/s)
> > virtiofs-dax-cache-none randwrite-psync         189(MiB/s)
> >
> > 9p-cache-none           randwrite-psync-multi   137(MiB/s)
> > virtiofs-cache-none     randwrite-psync-multi   150(MiB/s)
> > virtiofs-dax-cache-none randwrite-psync-multi   233(MiB/s)
> >
> > 9p-cache-none           randwrite-mmap          0(KiB/s)
> > virtiofs-cache-none     randwrite-mmap          0(KiB/s)
> > virtiofs-dax-cache-none randwrite-mmap          120(MiB/s)
> >
> > 9p-cache-none           randwrite-mmap-multi    0(KiB/s)
> > virtiofs-cache-none     randwrite-mmap-multi    0(KiB/s)
> > virtiofs-dax-cache-none randwrite-mmap-multi    200(MiB/s)
> >
> > 9p-cache-none           randwrite-libaio        25(MiB/s)
> > virtiofs-cache-none     randwrite-libaio        124(MiB/s)
> > virtiofs-dax-cache-none randwrite-libaio        131(MiB/s)
> >
> > 9p-cache-none           randwrite-libaio-multi  125(MiB/s)
> > virtiofs-cache-none     randwrite-libaio-multi  241(MiB/s)
> > virtiofs-dax-cache-none randwrite-libaio-multi  163(MiB/s)
> >
> > Conclusions
> > ===========
> > - In general virtio-fs seems faster than virtio-9p. Using dax makes it
> >   really interesting.
> >
> > Note:
> >   Right now dax window is 8G and max fio file size is 8G as well (4
> >   files of 2G each). That means everything fits into dax window and no
> >   reclaim is needed. Dax window reclaim logic is slower and if file
> >   size is bigger than dax window size, performance slows down.
> >
> > Description from previous postings
> > ==================================
> >
> > Design Overview
> > ===============
> > With the goal of designing something with better performance and local file
> > system semantics, a bunch of ideas were proposed.
> >
> > - Use fuse protocol (instead of 9p) for communication between guest
> >   and host. Guest kernel will be fuse client and a fuse server will
> >   run on host to serve the requests.
> >
> > - For data access inside guest, mmap portion of file in QEMU address
> >   space and guest accesses this memory using dax. That way guest page
> >   cache is bypassed and there is only one copy of data (on host). This
> >   will also enable mmap(MAP_SHARED) between guests.
> >
> > - For metadata coherency, there is a shared memory region which contains
> >   version number associated with metadata and any guest changing metadata
> >   updates version number and other guests refresh metadata on next
> >   access. This is yet to be implemented.
> >
> > How virtio-fs differs from existing approaches
> > ==============================================
> > The unique idea behind virtio-fs is to take advantage of the co-location
> > of the virtual machine and hypervisor to avoid communication (vmexits).
> >
> > DAX allows file contents to be accessed without communication with the
> > hypervisor. The shared memory region for metadata avoids communication in
> > the common case where metadata is unchanged.
> >
> > By replacing expensive communication with cheaper shared memory accesses,
> > we expect to achieve better performance than approaches based on network
> > file system protocols. In addition, this also makes it easier to achieve
> > local file system semantics (coherency).
> >
> > These techniques are not applicable to network file system protocols since
> > the communications channel is bypassed by taking advantage of shared memory
> > on a local machine. This is why we decided to build virtio-fs rather than
> > focus on 9P or NFS.
> >
> > Caching Modes
> > =============
> > Like virtio-9p, different caching modes are supported which determine the
> > coherency level as well. The “cache=FOO” and “writeback” options control the
> > level of coherence between the guest and host filesystems.
> >
> > - cache=none
> >   metadata, data and pathname lookup are not cached in guest. They are always
> >   fetched from host and any changes are immediately pushed to host.
> >
> > - cache=always
> >   metadata, data and pathname lookup are cached in guest and never expire.
> >
> > - cache=auto
> >   metadata and pathname lookup cache expires after a configured amount of time
> >   (default is 1 second). Data is cached while the file is open (close to open
> >   consistency).
> >
> > - writeback/no_writeback
> >   These options control the writeback strategy.  If writeback is disabled,
> >   then normal writes will immediately be synchronized with the host fs. If
> >   writeback is enabled, then writes may be cached in the guest until the file
> >   is closed or an fsync(2) performed. This option has no effect on mmap-ed
> >   writes or writes going through the DAX mechanism.
> >
> > Thanks
> > Vivek
> >
> > Miklos Szeredi (2):
> >   fuse: delete dentry if timeout is zero
> >   fuse: Use default_file_splice_read for direct IO
> >
> > Stefan Hajnoczi (6):
> >   fuse: export fuse_end_request()
> >   fuse: export fuse_len_args()
> >   fuse: export fuse_get_unique()
> >   fuse: extract fuse_fill_super_common()
> >   fuse: add fuse_iqueue_ops callbacks
> >   virtio_fs: add skeleton virtio_fs.ko module
> >
> > Vivek Goyal (5):
> >   fuse: Export fuse_send_init_request()
> >   Export fuse_dequeue_forget() function
> >   fuse: Separate fuse device allocation and installation in fuse_conn
> >   virtio-fs: Do not provide abort interface in fusectl
> >   init/do_mounts.c: add virtio_fs root fs support
> >
> >  fs/fuse/Kconfig                 |   11 +
> >  fs/fuse/Makefile                |    1 +
> >  fs/fuse/control.c               |    4 +-
> >  fs/fuse/cuse.c                  |    4 +-
> >  fs/fuse/dev.c                   |   89 ++-
> >  fs/fuse/dir.c                   |   26 +-
> >  fs/fuse/file.c                  |   15 +-
> >  fs/fuse/fuse_i.h                |  120 +++-
> >  fs/fuse/inode.c                 |  203 +++---
> >  fs/fuse/virtio_fs.c             | 1061 +++++++++++++++++++++++++++++++
> >  fs/splice.c                     |    3 +-
> >  include/linux/fs.h              |    2 +
> >  include/uapi/linux/virtio_fs.h  |   41 ++
> >  include/uapi/linux/virtio_ids.h |    1 +
> >  init/do_mounts.c                |   10 +
> >  15 files changed, 1462 insertions(+), 129 deletions(-)
> >  create mode 100644 fs/fuse/virtio_fs.c
> >  create mode 100644 include/uapi/linux/virtio_fs.h

Don't the new files need a MAINTAINERS entry?
I think we want virtualization@lists.linux-foundation.org to be
copied.


> >
> > --
> > 2.20.1
> >

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 00/13] virtio-fs: shared file system for virtual machines
  2019-09-03  8:31   ` Michael S. Tsirkin
@ 2019-09-03  9:17     ` Miklos Szeredi
  2019-09-04 15:54       ` Stefan Hajnoczi
  2019-09-03 14:07     ` Vivek Goyal
  1 sibling, 1 reply; 33+ messages in thread
From: Miklos Szeredi @ 2019-09-03  9:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Vivek Goyal, Jason Wang, virtualization, linux-fsdevel,
	linux-kernel, virtio-fs, Stefan Hajnoczi, Dr. David Alan Gilbert

On Tue, Sep 3, 2019 at 10:31 AM Michael S. Tsirkin <mst@redhat.com> wrote:

> Can the patches themselves be posted to the relevant list(s) please?
> If possible, please also include "v3" in all patches so they are
> easier to find.

I'll post a v4.

> I poked at
> https://lwn.net/ml/linux-kernel/20190821173742.24574-1-vgoyal@redhat.com/
> specifically
> https://lwn.net/ml/linux-kernel/20190821173742.24574-12-vgoyal@redhat.com/
> and things like:
> +       /* TODO lock */
> give me pause.
>
> Cleanup generally seems broken to me - what pauses the FS
>
> What about the rest of TODOs in that file?

AFAICS, some of those are QOI issues, and some are long term
performance items.  I see no blockers or things that would pose a
security concern.

That said, it would be nice to get rid of them at some point.

> use of usleep is hacky - can't we do better e.g. with a
> completion?

Yes.  I have a different implementation, but was planning to leave
this one in until the dust settles.

> Some typos - e.g. "reuests".

Fixed.

> > >  fs/fuse/Kconfig                 |   11 +
> > >  fs/fuse/Makefile                |    1 +
> > >  fs/fuse/control.c               |    4 +-
> > >  fs/fuse/cuse.c                  |    4 +-
> > >  fs/fuse/dev.c                   |   89 ++-
> > >  fs/fuse/dir.c                   |   26 +-
> > >  fs/fuse/file.c                  |   15 +-
> > >  fs/fuse/fuse_i.h                |  120 +++-
> > >  fs/fuse/inode.c                 |  203 +++---
> > >  fs/fuse/virtio_fs.c             | 1061 +++++++++++++++++++++++++++++++
> > >  fs/splice.c                     |    3 +-
> > >  include/linux/fs.h              |    2 +
> > >  include/uapi/linux/virtio_fs.h  |   41 ++
> > >  include/uapi/linux/virtio_ids.h |    1 +
> > >  init/do_mounts.c                |   10 +
> > >  15 files changed, 1462 insertions(+), 129 deletions(-)
> > >  create mode 100644 fs/fuse/virtio_fs.c
> > >  create mode 100644 include/uapi/linux/virtio_fs.h
>
> Don't the new files need a MAINTAINERS entry?
> I think we want virtualization@lists.linux-foundation.org to be
> copied.

Yep.

Stefan, do you want to formally maintain this file?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 00/13] virtio-fs: shared file system for virtual machines
  2019-09-03  8:31   ` Michael S. Tsirkin
  2019-09-03  9:17     ` Miklos Szeredi
@ 2019-09-03 14:07     ` Vivek Goyal
  2019-09-03 14:12       ` Michael S. Tsirkin
  1 sibling, 1 reply; 33+ messages in thread
From: Vivek Goyal @ 2019-09-03 14:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Miklos Szeredi, Jason Wang, virtualization, linux-fsdevel,
	linux-kernel, virtio-fs, Stefan Hajnoczi, Dr. David Alan Gilbert

On Tue, Sep 03, 2019 at 04:31:38AM -0400, Michael S. Tsirkin wrote:

[..]
> +	/* TODO lock */
> give me pause.
> 
> Cleanup generally seems broken to me - what pauses the FS

I am looking into device removal aspect of it now. Thinking of adding
a reference count to virtiofs device and possibly also a bit flag to
indicate if device is still alive. That way, we should be able to cleanup
device more gracefully.

> 
> What about the rest of TODOs in that file?

I will also take a closer look at TODOs now. Better device cleanup path
might get rid of some of them. Some of them might not be valid anymore.

> 
> use of usleep is hacky - can't we do better e.g. with a
> completion?

Agreed.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 00/13] virtio-fs: shared file system for virtual machines
  2019-09-03 14:07     ` Vivek Goyal
@ 2019-09-03 14:12       ` Michael S. Tsirkin
  2019-09-03 14:18         ` Vivek Goyal
  0 siblings, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2019-09-03 14:12 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Miklos Szeredi, Jason Wang, virtualization, linux-fsdevel,
	linux-kernel, virtio-fs, Stefan Hajnoczi, Dr. David Alan Gilbert

On Tue, Sep 03, 2019 at 10:07:52AM -0400, Vivek Goyal wrote:
> On Tue, Sep 03, 2019 at 04:31:38AM -0400, Michael S. Tsirkin wrote:
> 
> [..]
> > +	/* TODO lock */
> > give me pause.
> > 
> > Cleanup generally seems broken to me - what pauses the FS
> 
> I am looking into device removal aspect of it now. Thinking of adding
> a reference count to virtiofs device and possibly also a bit flag to
> indicate if device is still alive. That way, we should be able to cleanup
> device more gracefully.

Generally, the way to cleanup things is to first disconnect device from
linux so linux won't send new requests, wait for old ones to finish.




> > 
> > What about the rest of TODOs in that file?
> 
> I will also take a closer look at TODOs now. Better device cleanup path
> might get rid of some of them. Some of them might not be valid anymore.
> 
> > 
> > use of usleep is hacky - can't we do better e.g. with a
> > completion?
> 
> Agreed.
> 
> Thanks
> Vivek

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 00/13] virtio-fs: shared file system for virtual machines
  2019-09-03 14:12       ` Michael S. Tsirkin
@ 2019-09-03 14:18         ` Vivek Goyal
  2019-09-03 17:15           ` Michael S. Tsirkin
  0 siblings, 1 reply; 33+ messages in thread
From: Vivek Goyal @ 2019-09-03 14:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Miklos Szeredi, Jason Wang, virtualization, linux-fsdevel,
	linux-kernel, virtio-fs, Stefan Hajnoczi, Dr. David Alan Gilbert

On Tue, Sep 03, 2019 at 10:12:16AM -0400, Michael S. Tsirkin wrote:
> On Tue, Sep 03, 2019 at 10:07:52AM -0400, Vivek Goyal wrote:
> > On Tue, Sep 03, 2019 at 04:31:38AM -0400, Michael S. Tsirkin wrote:
> > 
> > [..]
> > > +	/* TODO lock */
> > > give me pause.
> > > 
> > > Cleanup generally seems broken to me - what pauses the FS
> > 
> > I am looking into device removal aspect of it now. Thinking of adding
> > a reference count to virtiofs device and possibly also a bit flag to
> > indicate if device is still alive. That way, we should be able to cleanup
> > device more gracefully.
> 
> Generally, the way to cleanup things is to first disconnect device from
> linux so linux won't send new requests, wait for old ones to finish.

I was thinking of following.

- Set a flag on device to indicate device is dead and not queue new
  requests. Device removal call can set this flag.

- Return errors when fs code tries to queue new request.

- Drop device creation reference in device removal path. If device is
  mounted at the time of removal, that reference will still be active
  and device state will not be cleaned up in kernel yet.

- User unmounts the fs, and that will drop last reference to device and
  will lead to cleanup of in kernel state of the device.

Does that sound reasonable.

Vivek

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 00/13] virtio-fs: shared file system for virtual machines
  2019-09-03 14:18         ` Vivek Goyal
@ 2019-09-03 17:15           ` Michael S. Tsirkin
  0 siblings, 0 replies; 33+ messages in thread
From: Michael S. Tsirkin @ 2019-09-03 17:15 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Miklos Szeredi, Jason Wang, virtualization, linux-fsdevel,
	linux-kernel, virtio-fs, Stefan Hajnoczi, Dr. David Alan Gilbert

On Tue, Sep 03, 2019 at 10:18:51AM -0400, Vivek Goyal wrote:
> On Tue, Sep 03, 2019 at 10:12:16AM -0400, Michael S. Tsirkin wrote:
> > On Tue, Sep 03, 2019 at 10:07:52AM -0400, Vivek Goyal wrote:
> > > On Tue, Sep 03, 2019 at 04:31:38AM -0400, Michael S. Tsirkin wrote:
> > > 
> > > [..]
> > > > +	/* TODO lock */
> > > > give me pause.
> > > > 
> > > > Cleanup generally seems broken to me - what pauses the FS
> > > 
> > > I am looking into device removal aspect of it now. Thinking of adding
> > > a reference count to virtiofs device and possibly also a bit flag to
> > > indicate if device is still alive. That way, we should be able to cleanup
> > > device more gracefully.
> > 
> > Generally, the way to cleanup things is to first disconnect device from
> > linux so linux won't send new requests, wait for old ones to finish.
> 
> I was thinking of following.
> 
> - Set a flag on device to indicate device is dead and not queue new
>   requests. Device removal call can set this flag.
> 
> - Return errors when fs code tries to queue new request.
> 
> - Drop device creation reference in device removal path. If device is
>   mounted at the time of removal, that reference will still be active
>   and device state will not be cleaned up in kernel yet.
> 
> - User unmounts the fs, and that will drop last reference to device and
>   will lead to cleanup of in kernel state of the device.
> 
> Does that sound reasonable.
> 
> Vivek

Just we aware of the fact that virtio device, all vqs etc
will be gone by the time remove returns.


-- 
MST

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 00/13] virtio-fs: shared file system for virtual machines
  2019-09-03  9:17     ` Miklos Szeredi
@ 2019-09-04 15:54       ` Stefan Hajnoczi
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Hajnoczi @ 2019-09-04 15:54 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Michael S. Tsirkin, Vivek Goyal, Jason Wang, virtualization,
	linux-fsdevel, linux-kernel, virtio-fs, Dr. David Alan Gilbert

[-- Attachment #1: Type: text/plain, Size: 1530 bytes --]

On Tue, Sep 03, 2019 at 11:17:35AM +0200, Miklos Szeredi wrote:
> On Tue, Sep 3, 2019 at 10:31 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >  fs/fuse/Kconfig                 |   11 +
> > > >  fs/fuse/Makefile                |    1 +
> > > >  fs/fuse/control.c               |    4 +-
> > > >  fs/fuse/cuse.c                  |    4 +-
> > > >  fs/fuse/dev.c                   |   89 ++-
> > > >  fs/fuse/dir.c                   |   26 +-
> > > >  fs/fuse/file.c                  |   15 +-
> > > >  fs/fuse/fuse_i.h                |  120 +++-
> > > >  fs/fuse/inode.c                 |  203 +++---
> > > >  fs/fuse/virtio_fs.c             | 1061 +++++++++++++++++++++++++++++++
> > > >  fs/splice.c                     |    3 +-
> > > >  include/linux/fs.h              |    2 +
> > > >  include/uapi/linux/virtio_fs.h  |   41 ++
> > > >  include/uapi/linux/virtio_ids.h |    1 +
> > > >  init/do_mounts.c                |   10 +
> > > >  15 files changed, 1462 insertions(+), 129 deletions(-)
> > > >  create mode 100644 fs/fuse/virtio_fs.c
> > > >  create mode 100644 include/uapi/linux/virtio_fs.h
> >
> > Don't the new files need a MAINTAINERS entry?
> > I think we want virtualization@lists.linux-foundation.org to be
> > copied.
> 
> Yep.
> 
> Stefan, do you want to formally maintain this file?

Vivek has been doing most of the kernel work lately and I would suggest
that he acts as maintainer.

But I'm happy to be added if you want two people or if Vivek is
unwilling.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2019-09-04 15:54 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-21 17:37 [PATCH v3 00/13] virtio-fs: shared file system for virtual machines Vivek Goyal
2019-08-21 17:37 ` [PATCH 01/13] fuse: delete dentry if timeout is zero Vivek Goyal
2019-08-21 17:37 ` [PATCH 02/13] fuse: Use default_file_splice_read for direct IO Vivek Goyal
2019-08-28  7:45   ` Miklos Szeredi
2019-08-28 12:27     ` Vivek Goyal
2019-08-21 17:37 ` [PATCH 03/13] fuse: export fuse_end_request() Vivek Goyal
2019-08-21 17:37 ` [PATCH 04/13] fuse: export fuse_len_args() Vivek Goyal
2019-08-21 17:37 ` [PATCH 05/13] fuse: Export fuse_send_init_request() Vivek Goyal
2019-08-21 17:37 ` [PATCH 06/13] fuse: export fuse_get_unique() Vivek Goyal
2019-08-21 17:37 ` [PATCH 07/13] Export fuse_dequeue_forget() function Vivek Goyal
2019-08-21 17:37 ` [PATCH 08/13] fuse: extract fuse_fill_super_common() Vivek Goyal
2019-08-21 17:37 ` [PATCH 09/13] fuse: add fuse_iqueue_ops callbacks Vivek Goyal
2019-08-21 17:37 ` [PATCH 10/13] fuse: Separate fuse device allocation and installation in fuse_conn Vivek Goyal
2019-08-21 17:37 ` [PATCH 11/13] virtio_fs: add skeleton virtio_fs.ko module Vivek Goyal
2019-08-21 17:37 ` [PATCH 12/13] virtio-fs: Do not provide abort interface in fusectl Vivek Goyal
2019-08-21 17:37 ` [PATCH 13/13] init/do_mounts.c: add virtio_fs root fs support Vivek Goyal
2019-08-29  9:28 ` [PATCH v3 00/13] virtio-fs: shared file system for virtual machines Miklos Szeredi
2019-08-29 11:58   ` Vivek Goyal
2019-08-29 12:35   ` Stefan Hajnoczi
2019-08-29 13:29   ` Vivek Goyal
2019-08-29 13:41     ` Miklos Szeredi
2019-08-29 14:31       ` Vivek Goyal
2019-08-29 14:47         ` Miklos Szeredi
2019-08-29 16:01   ` Vivek Goyal
2019-08-31  5:46     ` Miklos Szeredi
2019-09-03  8:05 ` Miklos Szeredi
2019-09-03  8:31   ` Michael S. Tsirkin
2019-09-03  9:17     ` Miklos Szeredi
2019-09-04 15:54       ` Stefan Hajnoczi
2019-09-03 14:07     ` Vivek Goyal
2019-09-03 14:12       ` Michael S. Tsirkin
2019-09-03 14:18         ` Vivek Goyal
2019-09-03 17:15           ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).