* [PATCH 000/104] virtiofs daemon [all]
@ 2019-12-12 16:37 Dr. David Alan Gilbert (git)
  2019-12-12 16:37 ` [PATCH 001/104] virtiofsd: Pull in upstream headers Dr. David Alan Gilbert (git)
                   ` (106 more replies)
  0 siblings, 107 replies; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Hi,
  This is a full set for virtiofsd - a daemon
that implements the user space side of virtiofs.

Unlike my previous post, this is a full set rather
than split up into base/security etc.

The set pulls in a big chunk of the upstream libfuse library
(unmodified so that it's easy to check it really is upstream),
chops all the stuff out we don't need and then adds the
new transport we need.

I've formatted everything into qemu's code style - using indent
and the clang tools for the files I've imported.  I've cleared all
reviewed-by's given how much I've rebased and tweaked it.

We can't just link with libfuse, since we have to make ABI incompatible
changes for the new transport and it's quite invasive; the library is
designed to be the basis for multiple filesystems, but all on the same
transport.

Running this daemon is typically done with:

   ./virtiofsd -o vhost_user_socket=/path/socket -o source=/path/to/fs

connected to a qemu that's then started with:
   -chardev socket,id=char0,path=/path/socket -device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=myfs

and then in the guest mount with:
   mount -t virtiofs myfs /mnt
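
As a rough sketch only (not part of this series), the ordering above can be scripted: the daemon creates the vhost-user socket, so it has to be running before qemu connects to it. The helper names below (virtiofsd_args, qemu_fs_args, wait_for_socket) are made up for illustration, and the paths are the same placeholders used above.

```shell
#!/bin/sh
# Illustrative sketch; SOCK, SHARE and TAG default to the placeholder
# values from the cover letter and should be overridden for real use.
SOCK=${SOCK:-/path/socket}
SHARE=${SHARE:-/path/to/fs}
TAG=${TAG:-myfs}

# Options for the daemon, exactly as given in the cover letter.
virtiofsd_args() {
    printf '%s\n' "-o vhost_user_socket=$SOCK -o source=$SHARE"
}

# qemu options for the vhost-user-fs device, as given in the cover letter.
qemu_fs_args() {
    printf '%s\n' "-chardev socket,id=char0,path=$SOCK -device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=$TAG"
}

# Poll once a second until the socket exists; give up after $1 tries
# (default 10). Returns 0 if the socket appeared, 1 otherwise.
wait_for_socket() {
    tries=${1:-10}
    while [ "$tries" -gt 0 ]; do
        [ -S "$SOCK" ] && return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}
```

With these helpers, something like `./virtiofsd $(virtiofsd_args) &` followed by `wait_for_socket 10 && qemu-system-x86_64 ... $(qemu_fs_args)` would start the two sides in order; the `...` stands for the rest of the VM configuration. The unquoted `$(...)` relies on the paths containing no whitespace.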

Our development branch is: https://gitlab.com/virtio-fs/qemu/tree/virtio-fs-dev

This code is going into tools/virtiofsd based on the previous long
discussion.
It relies on Paolo's 'build: rename CONFIG_LIBCAP to
CONFIG_LIBCAP_NG' patch.

Breakdown:
  Patches 1..12: Importing upstream libfuse
         13..31: Basic wiring to get vhost transport
         32..56: Security changes
         57..  : Individual fixes

Thank you to all those who have contributed code.

Dave

Dr. David Alan Gilbert (34):
  virtiofsd: Pull in upstream headers
  virtiofsd: Pull in kernel's fuse.h
  virtiofsd: Add auxiliary .c's
  virtiofsd: Add fuse_lowlevel.c
  virtiofsd: Add passthrough_ll
  virtiofsd: Trim down imported files
  virtiofsd: Format imported files to qemu style
  virtiofsd: Fix fuse_daemonize ignored return values
  virtiofsd: Fix common header and define for QEMU builds
  virtiofsd: Trim out compatibility code
  virtiofsd: Add options for virtio
  virtiofsd: Open vhost connection instead of mounting
  virtiofsd: Start wiring up vhost-user
  virtiofsd: Add main virtio loop
  virtiofsd: get/set features callbacks
  virtiofsd: Start queue threads
  virtiofsd: Poll kick_fd for queue
  virtiofsd: Start reading commands from queue
  virtiofsd: Send replies to messages
  virtiofsd: Keep track of replies
  virtiofsd: Add Makefile wiring for virtiofsd contrib
  virtiofsd: Fast path for virtio read
  virtiofs: Add maintainers entry
  virtiofsd: Plumb fuse_bufvec through to do_write_buf
  virtiofsd: Pass write iov's all the way through
  virtiofsd: cap-ng helpers
  virtiofsd: Handle reinit
  virtiofsd: Handle hard reboot
  virtiofsd: Kill threads when queues are stopped
  vhost-user: Print unexpected slave message types
  contrib/libvhost-user: Protect slave fd with mutex
  virtiofsd: Clean up inodes on destroy
  libvhost-user: Fix some memtable remap cases
  virtiofsd: Convert lo_destroy to take the lo->mutex lock itself

Eric Ren (1):
  virtiofsd: fix incorrect error handling in lo_do_lookup

Eryu Guan (2):
  virtiofsd: print log only when priority is high enough
  virtiofsd: convert more fprintf and perror to use fuse log infra

Jiufei Xue (1):
  virtiofsd: support nanosecond resolution for file timestamp

Liu Bo (6):
  virtiofsd: fix error handling in main()
  virtiofsd: cleanup allocated resource in se
  virtiofsd: fix memory leak on lo.source
  virtiofsd: add helper for lo_data cleanup
  virtiofsd: enable PARALLEL_DIROPS during INIT
  Virtiofsd: fix memory leak on fuse queueinfo

Masayoshi Mizuma (3):
  virtiofsd: Add ID to the log with FUSE_LOG_DEBUG level
  virtiofsd: Add timestamp to the log with FUSE_LOG_DEBUG level
  virtiofsd: Prevent multiply running with same vhost_user_socket

Miklos Szeredi (10):
  virtiofsd: passthrough_ll: add fallback for racy ops
  virtiofsd: passthrough_ll: add renameat2 support
  virtiofsd: passthrough_ll: disable readdirplus on cache=never
  virtiofsd: passthrough_ll: control readdirplus
  virtiofsd: rename unref_inode() to unref_inode_lolocked()
  virtiofsd: fail when parent inode isn't known in lo_do_lookup()
  virtiofsd: extract root inode init into setup_root()
  virtiofsd: passthrough_ll: fix refcounting on remove/rename
  virtiofsd: passthrough_ll: clean up cache related options
  virtiofsd: passthrough_ll: use hashtable

Misono Tomohiro (1):
  virtiofsd: Fix data corruption with O_APPEND write in writeback mode

Peng Tao (1):
  virtiofsd: do not always set FUSE_FLOCK_LOCKS

Stefan Hajnoczi (37):
  virtiofsd: remove mountpoint dummy argument
  virtiofsd: remove unused notify reply support
  virtiofsd: add -o source=PATH to help output
  virtiofsd: add --fd=FDNUM fd passing option
  virtiofsd: make -f (foreground) the default
  virtiofsd: add vhost-user.json file
  virtiofsd: add --print-capabilities option
  virtiofsd: passthrough_ll: add lo_map for ino/fh indirection
  virtiofsd: passthrough_ll: add ino_map to hide lo_inode pointers
  virtiofsd: passthrough_ll: add dirp_map to hide lo_dirp pointers
  virtiofsd: passthrough_ll: add fd_map to hide file descriptors
  virtiofsd: validate path components
  virtiofsd: add fuse_mbuf_iter API
  virtiofsd: validate input buffer sizes in do_write_buf()
  virtiofsd: check input buffer size in fuse_lowlevel.c ops
  virtiofsd: prevent ".." escape in lo_do_lookup()
  virtiofsd: prevent ".." escape in lo_do_readdir()
  virtiofsd: use /proc/self/fd/ O_PATH file descriptor
  virtiofsd: sandbox mount namespace
  virtiofsd: move to an empty network namespace
  virtiofsd: move to a new pid namespace
  virtiofsd: add seccomp whitelist
  virtiofsd: set maximum RLIMIT_NOFILE limit
  virtiofsd: fix libfuse information leaks
  virtiofsd: add security guide document
  virtiofsd: add --syslog command-line option
  virtiofsd: use fuse_lowlevel_is_virtio() in fuse_session_destroy()
  virtiofsd: prevent fv_queue_thread() vs virtio_loop() races
  virtiofsd: make lo_release() atomic
  virtiofsd: prevent races with lo_dirp_put()
  virtiofsd: rename inode->refcount to inode->nlookup
  virtiofsd: add man page
  virtiofsd: introduce inode refcount to prevent use-after-free
  virtiofsd: process requests in a thread pool
  virtiofsd: prevent FUSE_INIT/FUSE_DESTROY races
  virtiofsd: fix lo_destroy() resource leaks
  virtiofsd: add --thread-pool-size=NUM option

Vivek Goyal (6):
  virtiofsd: Make fsync work even if only inode is passed in
  virtiofsd: passthrough_ll: create new files in caller's context
  virtiofsd: Parse flag FUSE_WRITE_KILL_PRIV
  virtiofsd: Drop CAP_FSETID if client asked for it
  virtiofsd: Support remote posix locks
  virtiofsd: Reset O_DIRECT flag during file open

piaojun (2):
  virtiofsd: add definition of fuse_buf_writev()
  virtiofsd: use fuse_buf_writev to replace fuse_buf_write for better
    performance

 .gitignore                                |    1 +
 MAINTAINERS                               |    8 +
 Makefile                                  |   20 +
 Makefile.objs                             |    1 +
 configure                                 |   16 +
 contrib/libvhost-user/libvhost-user.c     |   57 +-
 contrib/libvhost-user/libvhost-user.h     |    6 +
 docs/interop/vhost-user.json              |    4 +-
 hw/virtio/vhost-user.c                    |    2 +-
 include/standard-headers/linux/fuse.h     |  891 ++++++
 scripts/update-linux-headers.sh           |    1 +
 tools/virtiofsd/50-qemu-virtiofsd.json.in |    5 +
 tools/virtiofsd/Makefile.objs             |   13 +
 tools/virtiofsd/buffer.c                  |  422 +++
 tools/virtiofsd/fuse.h                    | 1287 +++++++++
 tools/virtiofsd/fuse_common.h             |  884 ++++++
 tools/virtiofsd/fuse_i.h                  |  134 +
 tools/virtiofsd/fuse_log.c                |   44 +
 tools/virtiofsd/fuse_log.h                |   74 +
 tools/virtiofsd/fuse_loop_mt.c            |   56 +
 tools/virtiofsd/fuse_lowlevel.c           | 2782 +++++++++++++++++++
 tools/virtiofsd/fuse_lowlevel.h           | 2043 ++++++++++++++
 tools/virtiofsd/fuse_misc.h               |   60 +
 tools/virtiofsd/fuse_opt.c                |  449 +++
 tools/virtiofsd/fuse_opt.h                |  272 ++
 tools/virtiofsd/fuse_signals.c            |   98 +
 tools/virtiofsd/fuse_virtio.c             |  968 +++++++
 tools/virtiofsd/fuse_virtio.h             |   33 +
 tools/virtiofsd/helper.c                  |  333 +++
 tools/virtiofsd/passthrough_helpers.h     |   51 +
 tools/virtiofsd/passthrough_ll.c          | 2998 +++++++++++++++++++++
 tools/virtiofsd/seccomp.c                 |  155 ++
 tools/virtiofsd/seccomp.h                 |   16 +
 tools/virtiofsd/security.rst              |  118 +
 tools/virtiofsd/virtiofsd.texi            |   85 +
 35 files changed, 14373 insertions(+), 14 deletions(-)
 create mode 100644 include/standard-headers/linux/fuse.h
 create mode 100644 tools/virtiofsd/50-qemu-virtiofsd.json.in
 create mode 100644 tools/virtiofsd/Makefile.objs
 create mode 100644 tools/virtiofsd/buffer.c
 create mode 100644 tools/virtiofsd/fuse.h
 create mode 100644 tools/virtiofsd/fuse_common.h
 create mode 100644 tools/virtiofsd/fuse_i.h
 create mode 100644 tools/virtiofsd/fuse_log.c
 create mode 100644 tools/virtiofsd/fuse_log.h
 create mode 100644 tools/virtiofsd/fuse_loop_mt.c
 create mode 100644 tools/virtiofsd/fuse_lowlevel.c
 create mode 100644 tools/virtiofsd/fuse_lowlevel.h
 create mode 100644 tools/virtiofsd/fuse_misc.h
 create mode 100644 tools/virtiofsd/fuse_opt.c
 create mode 100644 tools/virtiofsd/fuse_opt.h
 create mode 100644 tools/virtiofsd/fuse_signals.c
 create mode 100644 tools/virtiofsd/fuse_virtio.c
 create mode 100644 tools/virtiofsd/fuse_virtio.h
 create mode 100644 tools/virtiofsd/helper.c
 create mode 100644 tools/virtiofsd/passthrough_helpers.h
 create mode 100644 tools/virtiofsd/passthrough_ll.c
 create mode 100644 tools/virtiofsd/seccomp.c
 create mode 100644 tools/virtiofsd/seccomp.h
 create mode 100644 tools/virtiofsd/security.rst
 create mode 100644 tools/virtiofsd/virtiofsd.texi

-- 
2.23.0




* [PATCH 001/104] virtiofsd: Pull in upstream headers
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-03 11:54   ` Daniel P. Berrangé
  2020-01-15 17:38   ` Philippe Mathieu-Daudé
  2019-12-12 16:37 ` [PATCH 002/104] virtiofsd: Pull in kernel's fuse.h Dr. David Alan Gilbert (git)
                   ` (105 subsequent siblings)
  106 siblings, 2 replies; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Pull in headers from libfuse's upstream fuse-3.8.0

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/fuse.h                | 1275 +++++++++++++++
 tools/virtiofsd/fuse_common.h         |  823 ++++++++++
 tools/virtiofsd/fuse_i.h              |  139 ++
 tools/virtiofsd/fuse_log.h            |   82 +
 tools/virtiofsd/fuse_lowlevel.h       | 2089 +++++++++++++++++++++++++
 tools/virtiofsd/fuse_misc.h           |   59 +
 tools/virtiofsd/fuse_opt.h            |  271 ++++
 tools/virtiofsd/passthrough_helpers.h |   76 +
 8 files changed, 4814 insertions(+)
 create mode 100644 tools/virtiofsd/fuse.h
 create mode 100644 tools/virtiofsd/fuse_common.h
 create mode 100644 tools/virtiofsd/fuse_i.h
 create mode 100644 tools/virtiofsd/fuse_log.h
 create mode 100644 tools/virtiofsd/fuse_lowlevel.h
 create mode 100644 tools/virtiofsd/fuse_misc.h
 create mode 100644 tools/virtiofsd/fuse_opt.h
 create mode 100644 tools/virtiofsd/passthrough_helpers.h

diff --git a/tools/virtiofsd/fuse.h b/tools/virtiofsd/fuse.h
new file mode 100644
index 0000000000..883f6e59fb
--- /dev/null
+++ b/tools/virtiofsd/fuse.h
@@ -0,0 +1,1275 @@
+/*
+  FUSE: Filesystem in Userspace
+  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
+
+  This program can be distributed under the terms of the GNU LGPLv2.
+  See the file COPYING.LIB.
+*/
+
+#ifndef FUSE_H_
+#define FUSE_H_
+
+/** @file
+ *
+ * This file defines the library interface of FUSE
+ *
+ * IMPORTANT: you should define FUSE_USE_VERSION before including this header.
+ */
+
+#include "fuse_common.h"
+
+#include <fcntl.h>
+#include <time.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/statvfs.h>
+#include <sys/uio.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* ----------------------------------------------------------- *
+ * Basic FUSE API					       *
+ * ----------------------------------------------------------- */
+
+/** Handle for a FUSE filesystem */
+struct fuse;
+
+/**
+ * Readdir flags, passed to ->readdir()
+ */
+enum fuse_readdir_flags {
+	/**
+	 * "Plus" mode.
+	 *
+	 * The kernel wants to prefill the inode cache during readdir.  The
+	 * filesystem may honour this by filling in the attributes and setting
+	 * FUSE_FILL_DIR_PLUS for the filler function.  The filesystem may also
+	 * just ignore this flag completely.
+	 */
+	FUSE_READDIR_PLUS = (1 << 0),
+};
+
+enum fuse_fill_dir_flags {
+	/**
+	 * "Plus" mode: all file attributes are valid
+	 *
+	 * The attributes are used by the kernel to prefill the inode cache
+	 * during a readdir.
+	 *
+	 * It is okay to set FUSE_FILL_DIR_PLUS if FUSE_READDIR_PLUS is not set
+	 * and vice versa.
+	 */
+	FUSE_FILL_DIR_PLUS = (1 << 1),
+};
+
+/** Function to add an entry in a readdir() operation
+ *
+ * The *off* parameter can be any non-zero value that enables the
+ * filesystem to identify the current point in the directory
+ * stream. It does not need to be the actual physical position. A
+ * value of zero is reserved to indicate that seeking in directories
+ * is not supported.
+ * 
+ * @param buf the buffer passed to the readdir() operation
+ * @param name the file name of the directory entry
+ * @param stat file attributes, can be NULL
+ * @param off offset of the next entry or zero
+ * @param flags fill flags
+ * @return 1 if buffer is full, zero otherwise
+ */
+typedef int (*fuse_fill_dir_t) (void *buf, const char *name,
+				const struct stat *stbuf, off_t off,
+				enum fuse_fill_dir_flags flags);
+/**
+ * Configuration of the high-level API
+ *
+ * This structure is initialized from the arguments passed to
+ * fuse_new(), and then passed to the file system's init() handler
+ * which should ensure that the configuration is compatible with the
+ * file system implementation.
+ */
+struct fuse_config {
+	/**
+	 * If `set_gid` is non-zero, the st_gid attribute of each file
+	 * is overwritten with the value of `gid`.
+	 */
+	int set_gid;
+	unsigned int gid;
+
+	/**
+	 * If `set_uid` is non-zero, the st_uid attribute of each file
+	 * is overwritten with the value of `uid`.
+	 */
+	int set_uid;
+	unsigned int uid;
+
+	/**
+	 * If `set_mode` is non-zero, any permission bits set in
+	 * `umask` are unset in the st_mode attribute of each file.
+	 */
+	int set_mode;
+	unsigned int umask;
+
+	/**
+	 * The timeout in seconds for which name lookups will be
+	 * cached.
+	 */
+	double entry_timeout;
+
+	/**
+	 * The timeout in seconds for which a negative lookup will be
+	 * cached. This means that if the file did not exist (lookup
+	 * returned ENOENT), the lookup will only be redone after the
+	 * timeout, and the file/directory will be assumed to not
+	 * exist until then. A value of zero means that negative
+	 * lookups are not cached.
+	 */
+	double negative_timeout;
+
+	/**
+	 * The timeout in seconds for which file/directory attributes
+	 * (as returned by e.g. the `getattr` handler) are cached.
+	 */
+	double attr_timeout;
+
+	/**
+	 * Allow requests to be interrupted
+	 */
+	int intr;
+
+	/**
+	 * Specify which signal number to send to the filesystem when
+	 * a request is interrupted.  The default is hardcoded to
+	 * USR1.
+	 */
+	int intr_signal;
+
+	/**
+	 * Normally, FUSE assigns inodes to paths only for as long as
+	 * the kernel is aware of them. With this option inodes are
+	 * instead remembered for at least this many seconds.  This
+	 * will require more memory, but may be necessary when using
+	 * applications that make use of inode numbers.
+	 *
+	 * A number of -1 means that inodes will be remembered for the
+	 * entire life-time of the file-system process.
+	 */
+	int remember;
+
+	/**
+	 * The default behavior is that if an open file is deleted,
+	 * the file is renamed to a hidden file (.fuse_hiddenXXX), and
+	 * only removed when the file is finally released.  This
+	 * relieves the filesystem implementation of having to deal
+	 * with this problem. This option disables the hiding
+	 * behavior, and files are removed immediately in an unlink
+	 * operation (or in a rename operation which overwrites an
+	 * existing file).
+	 *
+	 * It is recommended that you not use the hard_remove
+	 * option. When hard_remove is set, the following libc
+	 * functions fail on unlinked files (returning errno of
+	 * ENOENT): read(2), write(2), fsync(2), close(2), f*xattr(2),
+	 * ftruncate(2), fstat(2), fchmod(2), fchown(2)
+	 */
+	int hard_remove;
+
+	/**
+	 * Honor the st_ino field in the functions getattr() and
+	 * fill_dir(). This value is used to fill in the st_ino field
+	 * in the stat(2), lstat(2), fstat(2) functions and the d_ino
+	 * field in the readdir(2) function. The filesystem does not
+	 * have to guarantee uniqueness, however some applications
+	 * rely on this value being unique for the whole filesystem.
+	 *
+	 * Note that this does *not* affect the inode that libfuse 
+	 * and the kernel use internally (also called the "nodeid").
+	 */
+	int use_ino;
+
+	/**
+	 * If use_ino option is not given, still try to fill in the
+	 * d_ino field in readdir(2). If the name was previously
+	 * looked up, and is still in the cache, the inode number
+	 * found there will be used.  Otherwise it will be set to -1.
+	 * If use_ino option is given, this option is ignored.
+	 */
+	int readdir_ino;
+
+	/**
+	 * This option disables the use of page cache (file content cache)
+	 * in the kernel for this filesystem. This has several effects:
+	 *
+	 * 1. Each read(2) or write(2) system call will initiate one
+	 *    or more read or write operations, data will not be
+	 *    cached in the kernel.
+	 *
+	 * 2. The return value of the read() and write() system calls
+	 *    will correspond to the return values of the read and
+	 *    write operations. This is useful for example if the
+	 *    file size is not known in advance (before reading it).
+	 *
+	 * Internally, enabling this option causes fuse to set the
+	 * `direct_io` field of `struct fuse_file_info` - overwriting
+	 * any value that was put there by the file system.
+	 */
+	int direct_io;
+
+	/**
+	 * This option disables flushing the cache of the file
+	 * contents on every open(2).  This should only be enabled on
+	 * filesystems where the file data is never changed
+	 * externally (not through the mounted FUSE filesystem).  Thus
+	 * it is not suitable for network filesystems and other
+	 * intermediate filesystems.
+	 *
+	 * NOTE: if this option is not specified (and neither
+	 * direct_io) data is still cached after the open(2), so a
+	 * read(2) system call will not always initiate a read
+	 * operation.
+	 *
+	 * Internally, enabling this option causes fuse to set the
+	 * `keep_cache` field of `struct fuse_file_info` - overwriting
+	 * any value that was put there by the file system.
+	 */
+	int kernel_cache;
+
+	/**
+	 * This option is an alternative to `kernel_cache`. Instead of
+	 * unconditionally keeping cached data, the cached data is
+	 * invalidated on open(2) if the modification time or the
+	 * size of the file has changed since it was last opened.
+	 */
+	int auto_cache;
+
+	/**
+	 * The timeout in seconds for which file attributes are cached
+	 * for the purpose of checking if auto_cache should flush the
+	 * file data on open.
+	 */
+	int ac_attr_timeout_set;
+	double ac_attr_timeout;
+
+	/**
+	 * If this option is given the file-system handlers for the
+	 * following operations will not receive path information:
+	 * read, write, flush, release, fsync, readdir, releasedir,
+	 * fsyncdir, lock, ioctl and poll.
+	 *
+	 * For the truncate, getattr, chmod, chown and utimens
+	 * operations the path will be provided only if the struct
+	 * fuse_file_info argument is NULL.
+	 */
+	int nullpath_ok;
+
+	/**
+	 * The remaining options are used by libfuse internally and
+	 * should not be touched.
+	 */
+	int show_help;
+	char *modules;
+	int debug;
+};
+
+
+/**
+ * The file system operations:
+ *
+ * Most of these should work very similarly to the well known UNIX
+ * file system operations.  A major exception is that instead of
+ * returning an error in 'errno', the operation should return the
+ * negated error value (-errno) directly.
+ *
+ * All methods are optional, but some are essential for a useful
+ * filesystem (e.g. getattr).  Open, flush, release, fsync, opendir,
+ * releasedir, fsyncdir, access, create, truncate, lock, init and
+ * destroy are special purpose methods, without which a full featured
+ * filesystem can still be implemented.
+ *
+ * In general, all methods are expected to perform any necessary
+ * permission checking. However, a filesystem may delegate this task
+ * to the kernel by passing the `default_permissions` mount option to
+ * `fuse_new()`. In this case, methods will only be called if
+ * the kernel's permission check has succeeded.
+ *
+ * Almost all operations take a path which can be of any length.
+ */
+struct fuse_operations {
+	/** Get file attributes.
+	 *
+	 * Similar to stat().  The 'st_dev' and 'st_blksize' fields are
+	 * ignored. The 'st_ino' field is ignored except if the 'use_ino'
+	 * mount option is given. In that case it is passed to userspace,
+	 * but libfuse and the kernel will still assign a different
+	 * inode for internal use (called the "nodeid").
+	 *
+	 * `fi` will always be NULL if the file is not currently open, but
+	 * may also be NULL if the file is open.
+	 */
+	int (*getattr) (const char *, struct stat *, struct fuse_file_info *fi);
+
+	/** Read the target of a symbolic link
+	 *
+	 * The buffer should be filled with a null terminated string.  The
+	 * buffer size argument includes the space for the terminating
+	 * null character.	If the linkname is too long to fit in the
+	 * buffer, it should be truncated.	The return value should be 0
+	 * for success.
+	 */
+	int (*readlink) (const char *, char *, size_t);
+
+	/** Create a file node
+	 *
+	 * This is called for creation of all non-directory, non-symlink
+	 * nodes.  If the filesystem defines a create() method, then for
+	 * regular files that will be called instead.
+	 */
+	int (*mknod) (const char *, mode_t, dev_t);
+
+	/** Create a directory
+	 *
+	 * Note that the mode argument may not have the type specification
+	 * bits set, i.e. S_ISDIR(mode) can be false.  To obtain the
+	 * correct directory type bits use  mode|S_IFDIR
+	 * */
+	int (*mkdir) (const char *, mode_t);
+
+	/** Remove a file */
+	int (*unlink) (const char *);
+
+	/** Remove a directory */
+	int (*rmdir) (const char *);
+
+	/** Create a symbolic link */
+	int (*symlink) (const char *, const char *);
+
+	/** Rename a file
+	 *
+	 * *flags* may be `RENAME_EXCHANGE` or `RENAME_NOREPLACE`. If
+	 * RENAME_NOREPLACE is specified, the filesystem must not
+	 * overwrite *newname* if it exists and return an error
+	 * instead. If `RENAME_EXCHANGE` is specified, the filesystem
+	 * must atomically exchange the two files, i.e. both must
+	 * exist and neither may be deleted.
+	 */
+	int (*rename) (const char *, const char *, unsigned int flags);
+
+	/** Create a hard link to a file */
+	int (*link) (const char *, const char *);
+
+	/** Change the permission bits of a file
+	 *
+	 * `fi` will always be NULL if the file is not currently open, but
+	 * may also be NULL if the file is open.
+	 */
+	int (*chmod) (const char *, mode_t, struct fuse_file_info *fi);
+
+	/** Change the owner and group of a file
+	 *
+	 * `fi` will always be NULL if the file is not currently open, but
+	 * may also be NULL if the file is open.
+	 *
+	 * Unless FUSE_CAP_HANDLE_KILLPRIV is disabled, this method is
+	 * expected to reset the setuid and setgid bits.
+	 */
+	int (*chown) (const char *, uid_t, gid_t, struct fuse_file_info *fi);
+
+	/** Change the size of a file
+	 *
+	 * `fi` will always be NULL if the file is not currently open, but
+	 * may also be NULL if the file is open.
+	 *
+	 * Unless FUSE_CAP_HANDLE_KILLPRIV is disabled, this method is
+	 * expected to reset the setuid and setgid bits.
+	 */
+	int (*truncate) (const char *, off_t, struct fuse_file_info *fi);
+
+	/** Open a file
+	 *
+	 * Open flags are available in fi->flags. The following rules
+	 * apply.
+	 *
+	 *  - Creation (O_CREAT, O_EXCL, O_NOCTTY) flags will be
+	 *    filtered out / handled by the kernel.
+	 *
+	 *  - Access modes (O_RDONLY, O_WRONLY, O_RDWR, O_EXEC, O_SEARCH)
+	 *    should be used by the filesystem to check if the operation is
+	 *    permitted.  If the ``-o default_permissions`` mount option is
+	 *    given, this check is already done by the kernel before calling
+	 *    open() and may thus be omitted by the filesystem.
+	 *
+	 *  - When writeback caching is enabled, the kernel may send
+	 *    read requests even for files opened with O_WRONLY. The
+	 *    filesystem should be prepared to handle this.
+	 *
+	 *  - When writeback caching is disabled, the filesystem is
+	 *    expected to properly handle the O_APPEND flag and ensure
+	 *    that each write is appending to the end of the file.
+	 *
+	 *  - When writeback caching is enabled, the kernel will
+	 *    handle O_APPEND. However, unless all changes to the file
+	 *    come through the kernel this will not work reliably. The
+	 *    filesystem should thus either ignore the O_APPEND flag
+	 *    (and let the kernel handle it), or return an error
+	 *    (indicating that reliable O_APPEND is not available).
+	 *
+	 * Filesystem may store an arbitrary file handle (pointer,
+	 * index, etc) in fi->fh, and use this in all other file
+	 * operations (read, write, flush, release, fsync).
+	 *
+	 * Filesystem may also implement stateless file I/O and not store
+	 * anything in fi->fh.
+	 *
+	 * There are also some flags (direct_io, keep_cache) which the
+	 * filesystem may set in fi, to change the way the file is opened.
+	 * See fuse_file_info structure in <fuse_common.h> for more details.
+	 *
+	 * If this request is answered with an error code of ENOSYS
+	 * and FUSE_CAP_NO_OPEN_SUPPORT is set in
+	 * `fuse_conn_info.capable`, this is treated as success and
+	 * future calls to open will also succeed without being sent
+	 * to the filesystem process.
+	 *
+	 */
+	int (*open) (const char *, struct fuse_file_info *);
+
+	/** Read data from an open file
+	 *
+	 * Read should return exactly the number of bytes requested except
+	 * on EOF or error, otherwise the rest of the data will be
+	 * substituted with zeroes.	 An exception to this is when the
+	 * 'direct_io' mount option is specified, in which case the return
+	 * value of the read system call will reflect the return value of
+	 * this operation.
+	 */
+	int (*read) (const char *, char *, size_t, off_t,
+		     struct fuse_file_info *);
+
+	/** Write data to an open file
+	 *
+	 * Write should return exactly the number of bytes requested
+	 * except on error.	 An exception to this is when the 'direct_io'
+	 * mount option is specified (see read operation).
+	 *
+	 * Unless FUSE_CAP_HANDLE_KILLPRIV is disabled, this method is
+	 * expected to reset the setuid and setgid bits.
+	 */
+	int (*write) (const char *, const char *, size_t, off_t,
+		      struct fuse_file_info *);
+
+	/** Get file system statistics
+	 *
+	 * The 'f_favail', 'f_fsid' and 'f_flag' fields are ignored
+	 */
+	int (*statfs) (const char *, struct statvfs *);
+
+	/** Possibly flush cached data
+	 *
+	 * BIG NOTE: This is not equivalent to fsync().  It's not a
+	 * request to sync dirty data.
+	 *
+	 * Flush is called on each close() of a file descriptor, as opposed to
+	 * release which is called on the close of the last file descriptor for
+	 * a file.  Under Linux, errors returned by flush() will be passed to 
+	 * userspace as errors from close(), so flush() is a good place to write
+	 * back any cached dirty data. However, many applications ignore errors 
+	 * on close(), and on non-Linux systems, close() may succeed even if flush()
+	 * returns an error. For these reasons, filesystems should not assume
+	 * that errors returned by flush will ever be noticed or even
+	 * delivered.
+	 *
+	 * NOTE: The flush() method may be called more than once for each
+	 * open().  This happens if more than one file descriptor refers to an
+	 * open file handle, e.g. due to dup(), dup2() or fork() calls.  It is
+	 * not possible to determine if a flush is final, so each flush should
+	 * be treated equally.  Multiple write-flush sequences are relatively
+	 * rare, so this shouldn't be a problem.
+	 *
+	 * Filesystems shouldn't assume that flush will be called at any
+	 * particular point.  It may be called more times than expected, or not
+	 * at all.
+	 *
+	 * [close]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/close.html
+	 */
+	int (*flush) (const char *, struct fuse_file_info *);
+
+	/** Release an open file
+	 *
+	 * Release is called when there are no more references to an open
+	 * file: all file descriptors are closed and all memory mappings
+	 * are unmapped.
+	 *
+	 * For every open() call there will be exactly one release() call
+	 * with the same flags and file handle.  It is possible to
+	 * have a file opened more than once, in which case only the last
+	 * release will mean that no more reads/writes will happen on the
+	 * file.  The return value of release is ignored.
+	 */
+	int (*release) (const char *, struct fuse_file_info *);
+
+	/** Synchronize file contents
+	 *
+	 * If the datasync parameter is non-zero, then only the user data
+	 * should be flushed, not the meta data.
+	 */
+	int (*fsync) (const char *, int, struct fuse_file_info *);
+
+	/** Set extended attributes */
+	int (*setxattr) (const char *, const char *, const char *, size_t, int);
+
+	/** Get extended attributes */
+	int (*getxattr) (const char *, const char *, char *, size_t);
+
+	/** List extended attributes */
+	int (*listxattr) (const char *, char *, size_t);
+
+	/** Remove extended attributes */
+	int (*removexattr) (const char *, const char *);
+
+	/** Open directory
+	 *
+	 * Unless the 'default_permissions' mount option is given,
+	 * this method should check if opendir is permitted for this
+	 * directory. Optionally opendir may also return an arbitrary
+	 * filehandle in the fuse_file_info structure, which will be
+	 * passed to readdir, releasedir and fsyncdir.
+	 */
+	int (*opendir) (const char *, struct fuse_file_info *);
+
+	/** Read directory
+	 *
+	 * The filesystem may choose between two modes of operation:
+	 *
+	 * 1) The readdir implementation ignores the offset parameter, and
+	 * passes zero to the filler function's offset.  The filler
+	 * function will not return '1' (unless an error happens), so the
+	 * whole directory is read in a single readdir operation.
+	 *
+	 * 2) The readdir implementation keeps track of the offsets of the
+	 * directory entries.  It uses the offset parameter and always
+	 * passes non-zero offset to the filler function.  When the buffer
+	 * is full (or an error happens) the filler function will return
+	 * '1'.
+	 */
+	int (*readdir) (const char *, void *, fuse_fill_dir_t, off_t,
+			struct fuse_file_info *, enum fuse_readdir_flags);
+
+	/** Release directory
+	 */
+	int (*releasedir) (const char *, struct fuse_file_info *);
+
+	/** Synchronize directory contents
+	 *
+	 * If the datasync parameter is non-zero, then only the user data
+	 * should be flushed, not the meta data.
+	 */
+	int (*fsyncdir) (const char *, int, struct fuse_file_info *);
+
+	/**
+	 * Initialize filesystem
+	 *
+	 * The return value will be passed in the `private_data` field of
+	 * `struct fuse_context` to all file operations, and as a
+	 * parameter to the destroy() method. It overrides the initial
+	 * value provided to fuse_main() / fuse_new().
+	 */
+	void *(*init) (struct fuse_conn_info *conn,
+		       struct fuse_config *cfg);
+
+	/**
+	 * Clean up filesystem
+	 *
+	 * Called on filesystem exit.
+	 */
+	void (*destroy) (void *private_data);
+
+	/**
+	 * Check file access permissions
+	 *
+	 * This will be called for the access() system call.  If the
+	 * 'default_permissions' mount option is given, this method is not
+	 * called.
+	 *
+	 * This method is not called under Linux kernel versions 2.4.x
+	 */
+	int (*access) (const char *, int);
+
+	/**
+	 * Create and open a file
+	 *
+	 * If the file does not exist, first create it with the specified
+	 * mode, and then open it.
+	 *
+	 * If this method is not implemented or under Linux kernel
+	 * versions earlier than 2.6.15, the mknod() and open() methods
+	 * will be called instead.
+	 */
+	int (*create) (const char *, mode_t, struct fuse_file_info *);
+
+	/**
+	 * Perform POSIX file locking operation
+	 *
+	 * The cmd argument will be either F_GETLK, F_SETLK or F_SETLKW.
+	 *
+	 * For the meaning of fields in 'struct flock' see the man page
+	 * for fcntl(2).  The l_whence field will always be set to
+	 * SEEK_SET.
+	 *
+	 * For checking lock ownership, the 'fuse_file_info->owner'
+	 * argument must be used.
+	 *
+	 * For F_GETLK operation, the library will first check currently
+	 * held locks, and if a conflicting lock is found it will return
+	 * information without calling this method.  This ensures that
+	 * for local locks the l_pid field is correctly filled in.  The
+	 * results may not be accurate in case of race conditions and in
+	 * the presence of hard links, but it's unlikely that an
+	 * application would rely on accurate GETLK results in these
+	 * cases.  If a conflicting lock is not found, this method will be
+	 * called, and the filesystem may fill out l_pid with a meaningful
+	 * value, or it may leave this field zero.
+	 *
+	 * For F_SETLK and F_SETLKW the l_pid field will be set to the pid
+	 * of the process performing the locking operation.
+	 *
+	 * Note: if this method is not implemented, the kernel will still
+	 * allow file locking to work locally.  Hence it is only
+	 * interesting for network filesystems and similar.
+	 */
+	int (*lock) (const char *, struct fuse_file_info *, int cmd,
+		     struct flock *);
+
+	/**
+	 * Change the access and modification times of a file with
+	 * nanosecond resolution
+	 *
+	 * This supersedes the old utime() interface.  New applications
+	 * should use this.
+	 *
+	 * `fi` will always be NULL if the file is not currently open, but
+	 * may also be NULL if the file is open.
+	 *
+	 * See the utimensat(2) man page for details.
+	 */
+	 int (*utimens) (const char *, const struct timespec tv[2],
+			 struct fuse_file_info *fi);
+
+	/**
+	 * Map block index within file to block index within device
+	 *
+	 * Note: This makes sense only for block device backed filesystems
+	 * mounted with the 'blkdev' option
+	 */
+	int (*bmap) (const char *, size_t blocksize, uint64_t *idx);
+
+	/**
+	 * Ioctl
+	 *
+	 * flags will have FUSE_IOCTL_COMPAT set for 32bit ioctls in
+	 * 64bit environment.  The size and direction of data is
+	 * determined by _IOC_*() decoding of cmd.  For _IOC_NONE,
+	 * data will be NULL, for _IOC_WRITE data is out area, for
+	 * _IOC_READ in area and if both are set in/out area.  In all
+	 * non-NULL cases, the area is of _IOC_SIZE(cmd) bytes.
+	 *
+	 * If flags has FUSE_IOCTL_DIR then the fuse_file_info refers to a
+	 * directory file handle.
+	 *
+	 * Note: the unsigned long request submitted by the application
+	 * is truncated to 32 bits.
+	 */
+	int (*ioctl) (const char *, unsigned int cmd, void *arg,
+		      struct fuse_file_info *, unsigned int flags, void *data);
+
+	/**
+	 * Poll for IO readiness events
+	 *
+	 * Note: If ph is non-NULL, the client should notify
+	 * when IO readiness events occur by calling
+	 * fuse_notify_poll() with the specified ph.
+	 *
+	 * Regardless of the number of times poll with a non-NULL ph
+	 * is received, a single notification is enough to clear all.
+	 * Notifying more times incurs overhead but doesn't harm
+	 * correctness.
+	 *
+	 * The callee is responsible for destroying ph with
+	 * fuse_pollhandle_destroy() when no longer in use.
+	 */
+	int (*poll) (const char *, struct fuse_file_info *,
+		     struct fuse_pollhandle *ph, unsigned *reventsp);
+
+	/** Write contents of buffer to an open file
+	 *
+	 * Similar to the write() method, but data is supplied in a
+	 * generic buffer.  Use fuse_buf_copy() to transfer data to
+	 * the destination.
+	 *
+	 * Unless FUSE_CAP_HANDLE_KILLPRIV is disabled, this method is
+	 * expected to reset the setuid and setgid bits.
+	 */
+	int (*write_buf) (const char *, struct fuse_bufvec *buf, off_t off,
+			  struct fuse_file_info *);
+
+	/** Store data from an open file in a buffer
+	 *
+	 * Similar to the read() method, but data is stored and
+	 * returned in a generic buffer.
+	 *
+	 * No actual copying of data has to take place; the source
+	 * file descriptor may simply be stored in the buffer for
+	 * later data transfer.
+	 *
+	 * The buffer must be allocated dynamically and stored at the
+	 * location pointed to by bufp.  If the buffer contains memory
+	 * regions, they too must be allocated using malloc().  The
+	 * allocated memory will be freed by the caller.
+	 */
+	int (*read_buf) (const char *, struct fuse_bufvec **bufp,
+			 size_t size, off_t off, struct fuse_file_info *);
+	/**
+	 * Perform BSD file locking operation
+	 *
+	 * The op argument will be either LOCK_SH, LOCK_EX or LOCK_UN
+	 *
+	 * Nonblocking requests will be indicated by ORing LOCK_NB to
+	 * the above operations
+	 *
+	 * For more information see the flock(2) manual page.
+	 *
+	 * Additionally fi->owner will be set to a value unique to
+	 * this open file.  This same value will be supplied to
+	 * ->release() when the file is released.
+	 *
+	 * Note: if this method is not implemented, the kernel will still
+	 * allow file locking to work locally.  Hence it is only
+	 * interesting for network filesystems and similar.
+	 */
+	int (*flock) (const char *, struct fuse_file_info *, int op);
+
+	/**
+	 * Allocates space for an open file
+	 *
+	 * This function ensures that the required space is allocated for the
+	 * specified file.  If this function returns success then any subsequent
+	 * write request to the specified range is guaranteed not to fail because
+	 * of lack of space on the file system media.
+	 */
+	int (*fallocate) (const char *, int, off_t, off_t,
+			  struct fuse_file_info *);
+
+	/**
+	 * Copy a range of data from one file to another
+	 *
+	 * Performs an optimized copy between two file descriptors without the
+	 * additional cost of transferring data through the FUSE kernel module
+	 * to user space (glibc) and then back into the FUSE filesystem again.
+	 *
+	 * In case this method is not implemented, glibc falls back to reading
+	 * data from the source and writing to the destination, effectively
+	 * doing an inefficient copy of the data.
+	 */
+	ssize_t (*copy_file_range) (const char *path_in,
+				    struct fuse_file_info *fi_in,
+				    off_t offset_in, const char *path_out,
+				    struct fuse_file_info *fi_out,
+				    off_t offset_out, size_t size, int flags);
+
+	/**
+	 * Find next data or hole after the specified offset
+	 */
+	off_t (*lseek) (const char *, off_t off, int whence, struct fuse_file_info *);
+};
+
+/** Extra context that may be needed by some filesystems
+ *
+ * The uid, gid and pid fields are not filled in case of a writepage
+ * operation.
+ */
+struct fuse_context {
+	/** Pointer to the fuse object */
+	struct fuse *fuse;
+
+	/** User ID of the calling process */
+	uid_t uid;
+
+	/** Group ID of the calling process */
+	gid_t gid;
+
+	/** Process ID of the calling thread */
+	pid_t pid;
+
+	/** Private filesystem data */
+	void *private_data;
+
+	/** Umask of the calling process */
+	mode_t umask;
+};
+
+/**
+ * Main function of FUSE.
+ *
+ * This is for the lazy.  This is all that has to be called from the
+ * main() function.
+ *
+ * This function does the following:
+ *   - parses command line options, and handles --help and
+ *     --version
+ *   - installs signal handlers for INT, HUP, TERM and PIPE
+ *   - registers an exit handler to unmount the filesystem on program exit
+ *   - creates a fuse handle
+ *   - registers the operations
+ *   - calls either the single-threaded or the multi-threaded event loop
+ *
+ * Most file systems will have to parse some file-system specific
+ * arguments before calling this function. It is recommended to do
+ * this with fuse_opt_parse() and a processing function that passes
+ * through any unknown options (this can also be achieved by just
+ * passing NULL as the processing function). That way, the remaining
+ * options can be passed directly to fuse_main().
+ *
+ * fuse_main() accepts all options that can be passed to
+ * fuse_parse_cmdline(), fuse_new(), or fuse_session_new().
+ *
+ * Option parsing skips argv[0], which is assumed to contain the
+ * program name. This element must always be present and is used to
+ * construct a basic ``usage: `` message for the --help
+ * output. argv[0] may also be set to the empty string. In this case
+ * the usage message is suppressed. This can be used by file systems
+ * to print their own usage line first. See hello.c for an example of
+ * how to do this.
+ *
+ * Note: this is currently implemented as a macro.
+ *
+ * The following error codes may be returned from fuse_main():
+ *   1: Invalid option arguments
+ *   2: No mount point specified
+ *   3: FUSE setup failed
+ *   4: Mounting failed
+ *   5: Failed to daemonize (detach from session)
+ *   6: Failed to set up signal handlers
+ *   7: An error occurred during the life of the file system
+ *
+ * @param argc the argument counter passed to the main() function
+ * @param argv the argument vector passed to the main() function
+ * @param op the file system operations
+ * @param private_data Initial value for the `private_data`
+ *            field of `struct fuse_context`. May be overridden by the
+ *            `struct fuse_operations.init` handler.
+ * @return 0 on success, nonzero on failure
+ *
+ * Example usage, see hello.c
+ */
+/*
+  int fuse_main(int argc, char *argv[], const struct fuse_operations *op,
+  void *private_data);
+*/
+#define fuse_main(argc, argv, op, private_data)				\
+	fuse_main_real(argc, argv, op, sizeof(*(op)), private_data)
+
+/* ----------------------------------------------------------- *
+ * More detailed API					       *
+ * ----------------------------------------------------------- */
+
+/**
+ * Print available options (high- and low-level) to stdout.  This is
+ * not an exhaustive list, but includes only those options that may be
+ * of interest to an end-user of a file system.
+ *
+ * The function looks at the argument vector only to determine if
+ * there are additional modules to be loaded (module=foo option),
+ * and attempts to call their help functions as well.
+ *
+ * @param args the argument vector.
+ */
+void fuse_lib_help(struct fuse_args *args);
+
+/**
+ * Create a new FUSE filesystem.
+ *
+ * This function accepts most file-system independent mount options
+ * (like context, nodev, ro - see mount(8)), as well as the
+ * FUSE-specific mount options from mount.fuse(8).
+ *
+ * If the --help option is specified, the function writes a help text
+ * to stdout and returns NULL.
+ *
+ * Option parsing skips argv[0], which is assumed to contain the
+ * program name. This element must always be present and is used to
+ * construct a basic ``usage: `` message for the --help output. If
+ * argv[0] is set to the empty string, no usage message is included in
+ * the --help output.
+ *
+ * If an unknown option is passed in, an error message is written to
+ * stderr and the function returns NULL.
+ *
+ * @param args argument vector
+ * @param op the filesystem operations
+ * @param op_size the size of the fuse_operations structure
+ * @param private_data Initial value for the `private_data`
+ *            field of `struct fuse_context`. May be overridden by the
+ *            `struct fuse_operations.init` handler.
+ * @return the created FUSE handle
+ */
+#if FUSE_USE_VERSION == 30
+struct fuse *fuse_new_30(struct fuse_args *args, const struct fuse_operations *op,
+			 size_t op_size, void *private_data);
+#define fuse_new(args, op, size, data) fuse_new_30(args, op, size, data)
+#else
+struct fuse *fuse_new(struct fuse_args *args, const struct fuse_operations *op,
+		      size_t op_size, void *private_data);
+#endif
+
+/**
+ * Mount a FUSE file system.
+ *
+ * @param mountpoint the mount point path
+ * @param f the FUSE handle
+ *
+ * @return 0 on success, -1 on failure.
+ **/
+int fuse_mount(struct fuse *f, const char *mountpoint);
+
+/**
+ * Unmount a FUSE file system.
+ *
+ * See fuse_session_unmount() for additional information.
+ *
+ * @param f the FUSE handle
+ **/
+void fuse_unmount(struct fuse *f);
+
+/**
+ * Destroy the FUSE handle.
+ *
+ * NOTE: This function does not unmount the filesystem.  If this is
+ * needed, call fuse_unmount() before calling this function.
+ *
+ * @param f the FUSE handle
+ */
+void fuse_destroy(struct fuse *f);
+
+/**
+ * FUSE event loop.
+ *
+ * Requests from the kernel are processed, and the appropriate
+ * operations are called.
+ *
+ * For a description of the return value and the conditions when the
+ * event loop exits, refer to the documentation of
+ * fuse_session_loop().
+ *
+ * @param f the FUSE handle
+ * @return see fuse_session_loop()
+ *
+ * See also: fuse_loop_mt()
+ */
+int fuse_loop(struct fuse *f);
+
+/**
+ * Flag session as terminated
+ *
+ * This function will cause any running event loops to exit on
+ * the next opportunity.
+ *
+ * @param f the FUSE handle
+ */
+void fuse_exit(struct fuse *f);
+
+/**
+ * FUSE event loop with multiple threads
+ *
+ * Requests from the kernel are processed, and the appropriate
+ * operations are called.  Requests are processed in parallel by
+ * distributing them between multiple threads.
+ *
+ * For a description of the return value and the conditions when the
+ * event loop exits, refer to the documentation of
+ * fuse_session_loop().
+ *
+ * Note: using fuse_loop() instead of fuse_loop_mt() means you are running in
+ * single-threaded mode, and that you will not have to worry about reentrancy,
+ * though you will have to worry about recursive lookups. In single-threaded
+ * mode, FUSE will wait for one callback to return before calling another.
+ *
+ * Enabling multiple threads, by using fuse_loop_mt(), will cause FUSE to make
+ * multiple simultaneous calls into the various callback functions given by your
+ * fuse_operations record.
+ *
+ * If you are using multiple threads, you can enjoy all the parallel execution
+ * and interactive response benefits of threads, and you get to enjoy all the
+ * benefits of race conditions and locking bugs, too. Ensure that any code used
+ * in the callback function of fuse_operations is also thread-safe.
+ *
+ * @param f the FUSE handle
+ * @param config loop configuration
+ * @return see fuse_session_loop()
+ *
+ * See also: fuse_loop()
+ */
+#if FUSE_USE_VERSION < 32
+int fuse_loop_mt_31(struct fuse *f, int clone_fd);
+#define fuse_loop_mt(f, clone_fd) fuse_loop_mt_31(f, clone_fd)
+#else
+int fuse_loop_mt(struct fuse *f, struct fuse_loop_config *config);
+#endif
+
+/**
+ * Get the current context
+ *
+ * The context is only valid for the duration of a filesystem
+ * operation, and thus must not be stored and used later.
+ *
+ * @return the context
+ */
+struct fuse_context *fuse_get_context(void);
+
+/**
+ * Get the current supplementary group IDs for the current request
+ *
+ * Similar to the getgroups(2) system call, except the return value is
+ * always the total number of group IDs, even if it is larger than the
+ * specified size.
+ *
+ * The current fuse kernel module in linux (as of 2.6.30) doesn't pass
+ * the group list to userspace, hence this function needs to parse
+ * "/proc/$TID/task/$TID/status" to get the group IDs.
+ *
+ * This feature may not be supported on all operating systems.  In
+ * such a case this function will return -ENOSYS.
+ *
+ * @param size size of given array
+ * @param list array of group IDs to be filled in
+ * @return the total number of supplementary group IDs or -errno on failure
+ */
+int fuse_getgroups(int size, gid_t list[]);
+
+/**
+ * Check if the current request has already been interrupted
+ *
+ * @return 1 if the request has been interrupted, 0 otherwise
+ */
+int fuse_interrupted(void);
+
+/**
+ * Invalidates cache for the given path.
+ *
+ * This calls fuse_lowlevel_notify_inval_inode internally.
+ *
+ * @return 0 on successful invalidation, negative error value otherwise.
+ *         This routine may return -ENOENT to indicate that there was
+ *         no entry to be invalidated, e.g., because the path has not
+ *         been seen before or has been forgotten; this should not be
+ *         considered to be an error.
+ */
+int fuse_invalidate_path(struct fuse *f, const char *path);
+
+/**
+ * The real main function
+ *
+ * Do not call this directly, use fuse_main()
+ */
+int fuse_main_real(int argc, char *argv[], const struct fuse_operations *op,
+		   size_t op_size, void *private_data);
+
+/**
+ * Start the cleanup thread when using option "remember".
+ *
+ * This is done automatically by fuse_loop_mt()
+ * @param fuse struct fuse pointer for fuse instance
+ * @return 0 on success and -1 on error
+ */
+int fuse_start_cleanup_thread(struct fuse *fuse);
+
+/**
+ * Stop the cleanup thread when using option "remember".
+ *
+ * This is done automatically by fuse_loop_mt()
+ * @param fuse struct fuse pointer for fuse instance
+ */
+void fuse_stop_cleanup_thread(struct fuse *fuse);
+
+/**
+ * Iterate over cache removing stale entries
+ * use in conjunction with "-oremember"
+ *
+ * NOTE: This is already done for the standard sessions
+ *
+ * @param fuse struct fuse pointer for fuse instance
+ * @return the number of seconds until the next cleanup
+ */
+int fuse_clean_cache(struct fuse *fuse);
+
+/*
+ * Stacking API
+ */
+
+/**
+ * Fuse filesystem object
+ *
+ * This opaque object represents a filesystem layer
+ */
+struct fuse_fs;
+
+/*
+ * These functions call the relevant filesystem operation, and return
+ * the result.
+ *
+ * If the operation is not defined, they return -ENOSYS, with the
+ * exception of fuse_fs_open, fuse_fs_release, fuse_fs_opendir,
+ * fuse_fs_releasedir and fuse_fs_statfs, which return 0.
+ */
+
+int fuse_fs_getattr(struct fuse_fs *fs, const char *path, struct stat *buf,
+		    struct fuse_file_info *fi);
+int fuse_fs_rename(struct fuse_fs *fs, const char *oldpath,
+		   const char *newpath, unsigned int flags);
+int fuse_fs_unlink(struct fuse_fs *fs, const char *path);
+int fuse_fs_rmdir(struct fuse_fs *fs, const char *path);
+int fuse_fs_symlink(struct fuse_fs *fs, const char *linkname,
+		    const char *path);
+int fuse_fs_link(struct fuse_fs *fs, const char *oldpath, const char *newpath);
+int fuse_fs_release(struct fuse_fs *fs, const char *path,
+		    struct fuse_file_info *fi);
+int fuse_fs_open(struct fuse_fs *fs, const char *path,
+		 struct fuse_file_info *fi);
+int fuse_fs_read(struct fuse_fs *fs, const char *path, char *buf, size_t size,
+		 off_t off, struct fuse_file_info *fi);
+int fuse_fs_read_buf(struct fuse_fs *fs, const char *path,
+		     struct fuse_bufvec **bufp, size_t size, off_t off,
+		     struct fuse_file_info *fi);
+int fuse_fs_write(struct fuse_fs *fs, const char *path, const char *buf,
+		  size_t size, off_t off, struct fuse_file_info *fi);
+int fuse_fs_write_buf(struct fuse_fs *fs, const char *path,
+		      struct fuse_bufvec *buf, off_t off,
+		      struct fuse_file_info *fi);
+int fuse_fs_fsync(struct fuse_fs *fs, const char *path, int datasync,
+		  struct fuse_file_info *fi);
+int fuse_fs_flush(struct fuse_fs *fs, const char *path,
+		  struct fuse_file_info *fi);
+int fuse_fs_statfs(struct fuse_fs *fs, const char *path, struct statvfs *buf);
+int fuse_fs_opendir(struct fuse_fs *fs, const char *path,
+		    struct fuse_file_info *fi);
+int fuse_fs_readdir(struct fuse_fs *fs, const char *path, void *buf,
+		    fuse_fill_dir_t filler, off_t off,
+		    struct fuse_file_info *fi, enum fuse_readdir_flags flags);
+int fuse_fs_fsyncdir(struct fuse_fs *fs, const char *path, int datasync,
+		     struct fuse_file_info *fi);
+int fuse_fs_releasedir(struct fuse_fs *fs, const char *path,
+		       struct fuse_file_info *fi);
+int fuse_fs_create(struct fuse_fs *fs, const char *path, mode_t mode,
+		   struct fuse_file_info *fi);
+int fuse_fs_lock(struct fuse_fs *fs, const char *path,
+		 struct fuse_file_info *fi, int cmd, struct flock *lock);
+int fuse_fs_flock(struct fuse_fs *fs, const char *path,
+		  struct fuse_file_info *fi, int op);
+int fuse_fs_chmod(struct fuse_fs *fs, const char *path, mode_t mode,
+		  struct fuse_file_info *fi);
+int fuse_fs_chown(struct fuse_fs *fs, const char *path, uid_t uid, gid_t gid,
+		  struct fuse_file_info *fi);
+int fuse_fs_truncate(struct fuse_fs *fs, const char *path, off_t size,
+		     struct fuse_file_info *fi);
+int fuse_fs_utimens(struct fuse_fs *fs, const char *path,
+		    const struct timespec tv[2], struct fuse_file_info *fi);
+int fuse_fs_access(struct fuse_fs *fs, const char *path, int mask);
+int fuse_fs_readlink(struct fuse_fs *fs, const char *path, char *buf,
+		     size_t len);
+int fuse_fs_mknod(struct fuse_fs *fs, const char *path, mode_t mode,
+		  dev_t rdev);
+int fuse_fs_mkdir(struct fuse_fs *fs, const char *path, mode_t mode);
+int fuse_fs_setxattr(struct fuse_fs *fs, const char *path, const char *name,
+		     const char *value, size_t size, int flags);
+int fuse_fs_getxattr(struct fuse_fs *fs, const char *path, const char *name,
+		     char *value, size_t size);
+int fuse_fs_listxattr(struct fuse_fs *fs, const char *path, char *list,
+		      size_t size);
+int fuse_fs_removexattr(struct fuse_fs *fs, const char *path,
+			const char *name);
+int fuse_fs_bmap(struct fuse_fs *fs, const char *path, size_t blocksize,
+		 uint64_t *idx);
+int fuse_fs_ioctl(struct fuse_fs *fs, const char *path, unsigned int cmd,
+		  void *arg, struct fuse_file_info *fi, unsigned int flags,
+		  void *data);
+int fuse_fs_poll(struct fuse_fs *fs, const char *path,
+		 struct fuse_file_info *fi, struct fuse_pollhandle *ph,
+		 unsigned *reventsp);
+int fuse_fs_fallocate(struct fuse_fs *fs, const char *path, int mode,
+		 off_t offset, off_t length, struct fuse_file_info *fi);
+ssize_t fuse_fs_copy_file_range(struct fuse_fs *fs, const char *path_in,
+				struct fuse_file_info *fi_in, off_t off_in,
+				const char *path_out,
+				struct fuse_file_info *fi_out, off_t off_out,
+				size_t len, int flags);
+off_t fuse_fs_lseek(struct fuse_fs *fs, const char *path, off_t off, int whence,
+		    struct fuse_file_info *fi);
+void fuse_fs_init(struct fuse_fs *fs, struct fuse_conn_info *conn,
+		struct fuse_config *cfg);
+void fuse_fs_destroy(struct fuse_fs *fs);
+
+int fuse_notify_poll(struct fuse_pollhandle *ph);
+
+/**
+ * Create a new fuse filesystem object
+ *
+ * This is usually called from the factory of a fuse module to create
+ * a new instance of a filesystem.
+ *
+ * @param op the filesystem operations
+ * @param op_size the size of the fuse_operations structure
+ * @param private_data Initial value for the `private_data`
+ *            field of `struct fuse_context`. May be overridden by the
+ *            `struct fuse_operations.init` handler.
+ * @return a new filesystem object
+ */
+struct fuse_fs *fuse_fs_new(const struct fuse_operations *op, size_t op_size,
+			    void *private_data);
+
+/**
+ * Factory for creating filesystem objects
+ *
+ * The function may use and remove options from 'args' that belong
+ * to this module.
+ *
+ * For now the 'fs' vector always contains exactly one filesystem.
+ * This is the filesystem which will be below the newly created
+ * filesystem in the stack.
+ *
+ * @param args the command line arguments
+ * @param fs NULL terminated filesystem object vector
+ * @return the new filesystem object
+ */
+typedef struct fuse_fs *(*fuse_module_factory_t)(struct fuse_args *args,
+						 struct fuse_fs *fs[]);
+/**
+ * Register filesystem module
+ *
+ * If the "-omodules=*name_*:..." option is present, filesystem
+ * objects are created and pushed onto the stack with the *factory_*
+ * function.
+ *
+ * @param name_ the name of this filesystem module
+ * @param factory_ the factory function for this filesystem module
+ */
+#define FUSE_REGISTER_MODULE(name_, factory_) \
+	fuse_module_factory_t fuse_module_ ## name_ ## _factory = factory_
+
+/** Get session from fuse object */
+struct fuse_session *fuse_get_session(struct fuse *f);
+
+/**
+ * Open a FUSE file descriptor and set up the mount for the given
+ * mountpoint and flags.
+ *
+ * @param mountpoint reference to the mount in the file system
+ * @param options mount options
+ * @return the FUSE file descriptor or -1 upon error
+ */
+int fuse_open_channel(const char *mountpoint, const char *options);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* FUSE_H_ */
diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
new file mode 100644
index 0000000000..2d686b2ac4
--- /dev/null
+++ b/tools/virtiofsd/fuse_common.h
@@ -0,0 +1,823 @@
+/*  FUSE: Filesystem in Userspace
+  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
+
+  This program can be distributed under the terms of the GNU LGPLv2.
+  See the file COPYING.LIB.
+*/
+
+/** @file */
+
+#if !defined(FUSE_H_) && !defined(FUSE_LOWLEVEL_H_)
+#error "Never include <fuse_common.h> directly; use <fuse.h> or <fuse_lowlevel.h> instead."
+#endif
+
+#ifndef FUSE_COMMON_H_
+#define FUSE_COMMON_H_
+
+#include "fuse_opt.h"
+#include "fuse_log.h"
+#include <stdint.h>
+#include <sys/types.h>
+
+/** Major version of FUSE library interface */
+#define FUSE_MAJOR_VERSION 3
+
+/** Minor version of FUSE library interface */
+#define FUSE_MINOR_VERSION 2
+
+#define FUSE_MAKE_VERSION(maj, min)  ((maj) * 10 + (min))
+#define FUSE_VERSION FUSE_MAKE_VERSION(FUSE_MAJOR_VERSION, FUSE_MINOR_VERSION)
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Information about an open file.
+ *
+ * File Handles are created by the open, opendir, and create methods and closed
+ * by the release and releasedir methods.  Multiple file handles may be
+ * concurrently open for the same file.  Generally, a client will create one
+ * file handle per file descriptor, though in some cases multiple file
+ * descriptors can share a single file handle.
+ */
+struct fuse_file_info {
+	/** Open flags.  Available in open() and release() */
+	int flags;
+
+	/** In case of a write operation indicates if this was caused
+	    by a delayed write from the page cache. If so, then the
+	    context's pid, uid, and gid fields will not be valid, and
+	    the *fh* value may not match the *fh* value that would
+	    have been sent with the corresponding individual write
+	    requests if write caching had been disabled. */
+	unsigned int writepage : 1;
+
+	/** Can be filled in by open, to use direct I/O on this file. */
+	unsigned int direct_io : 1;
+
+	/** Can be filled in by open. It signals the kernel that any
+	    currently cached file data (i.e., data that the filesystem
+	    provided the last time the file was open) need not be
+	    invalidated. Has no effect when set in other contexts (in
+	    particular it does nothing when set by opendir()). */
+	unsigned int keep_cache : 1;
+
+	/** Indicates a flush operation.  Set in flush operation, and
+	    may also be set in highlevel lock operation and lowlevel
+	    release operation. */
+	unsigned int flush : 1;
+
+	/** Can be filled in by open, to indicate that the file is not
+	    seekable. */
+	unsigned int nonseekable : 1;
+
+	/** Indicates that flock locks for this file should be
+	   released.  If set, lock_owner shall contain a valid value.
+	   May only be set in ->release(). */
+	unsigned int flock_release : 1;
+
+	/** Can be filled in by opendir. It signals the kernel to
+	    enable caching of entries returned by readdir().  Has no
+	    effect when set in other contexts (in particular it does
+	    nothing when set by open()). */
+	unsigned int cache_readdir : 1;
+
+	/** Padding.  Reserved for future use. */
+	unsigned int padding : 25;
+	unsigned int padding2 : 32;
+
+	/** File handle id.  May be filled in by filesystem in create,
+	 * open, and opendir().  Available in most other file operations on the
+	 * same file handle. */
+	uint64_t fh;
+
+	/** Lock owner id.  Available in locking operations and flush */
+	uint64_t lock_owner;
+
+	/** Requested poll events.  Available in ->poll.  Only set on kernels
+	    which support it.  If unsupported, this field is set to zero. */
+	uint32_t poll_events;
+};
+
+/**
+ * Configuration parameters passed to fuse_session_loop_mt() and
+ * fuse_loop_mt().
+ */
+struct fuse_loop_config {
+	/**
+	 * whether to use separate device fds for each thread
+	 * (may increase performance)
+	 */
+	int clone_fd;
+
+	/**
+	 * The maximum number of available worker threads before they
+	 * start to get deleted when they become idle. If not
+	 * specified, the default is 10.
+	 *
+	 * Adjusting this has performance implications; a very small number
+	 * of threads in the pool will cause a lot of thread creation and
+	 * deletion overhead and performance may suffer. When set to 0, a new
+	 * thread will be created to service every operation.
+	 */
+	unsigned int max_idle_threads;
+};
+
+/**************************************************************************
+ * Capability bits for 'fuse_conn_info.capable' and 'fuse_conn_info.want' *
+ **************************************************************************/
+
+/**
+ * Indicates that the filesystem supports asynchronous read requests.
+ *
+ * If this capability is not requested/available, the kernel will
+ * ensure that there is at most one pending read request per
+ * file-handle at any time, and will attempt to order read requests by
+ * increasing offset.
+ *
+ * This feature is enabled by default when supported by the kernel.
+ */
+#define FUSE_CAP_ASYNC_READ		(1 << 0)
+
+/**
+ * Indicates that the filesystem supports "remote" locking.
+ *
+ * This feature is enabled by default when supported by the kernel,
+ * and if getlk() and setlk() handlers are implemented.
+ */
+#define FUSE_CAP_POSIX_LOCKS		(1 << 1)
+
+/**
+ * Indicates that the filesystem supports the O_TRUNC open flag.  If
+ * disabled, and an application specifies O_TRUNC, fuse first calls
+ * truncate() and then open() with O_TRUNC filtered out.
+ *
+ * This feature is enabled by default when supported by the kernel.
+ */
+#define FUSE_CAP_ATOMIC_O_TRUNC		(1 << 3)
+
+/**
+ * Indicates that the filesystem supports lookups of "." and "..".
+ *
+ * This feature is disabled by default.
+ */
+#define FUSE_CAP_EXPORT_SUPPORT		(1 << 4)
+
+/**
+ * Indicates that the kernel should not apply the umask to the
+ * file mode on create operations.
+ *
+ * This feature is disabled by default.
+ */
+#define FUSE_CAP_DONT_MASK		(1 << 6)
+
+/**
+ * Indicates that libfuse should try to use splice() when writing to
+ * the fuse device. This may improve performance.
+ *
+ * This feature is disabled by default.
+ */
+#define FUSE_CAP_SPLICE_WRITE		(1 << 7)
+
+/**
+ * Indicates that libfuse should try to move pages instead of copying when
+ * writing to / reading from the fuse device. This may improve performance.
+ *
+ * This feature is disabled by default.
+ */
+#define FUSE_CAP_SPLICE_MOVE		(1 << 8)
+
+/**
+ * Indicates that libfuse should try to use splice() when reading from
+ * the fuse device. This may improve performance.
+ *
+ * This feature is enabled by default when supported by the kernel and
+ * if the filesystem implements a write_buf() handler.
+ */
+#define FUSE_CAP_SPLICE_READ		(1 << 9)
+
+/**
+ * If set, the calls to flock(2) will be emulated using POSIX locks and must
+ * then be handled by the filesystem's setlock() handler.
+ *
+ * If not set, flock(2) calls will be handled by the FUSE kernel module
+ * internally (so any access that does not go through the kernel cannot be taken
+ * into account).
+ *
+ * This feature is enabled by default when supported by the kernel and
+ * if the filesystem implements a flock() handler.
+ */
+#define FUSE_CAP_FLOCK_LOCKS		(1 << 10)
+
+/**
+ * Indicates that the filesystem supports ioctl's on directories.
+ *
+ * This feature is enabled by default when supported by the kernel.
+ */
+#define FUSE_CAP_IOCTL_DIR		(1 << 11)
+
+/**
+ * Traditionally, while a file is open the FUSE kernel module only
+ * asks the filesystem for an update of the file's attributes when a
+ * client attempts to read beyond EOF. This is unsuitable for
+ * e.g. network filesystems, where the file contents may change
+ * without the kernel knowing about it.
+ *
+ * If this flag is set, FUSE will check the validity of the attributes
+ * on every read. If the attributes are no longer valid (i.e., if the
+ * *attr_timeout* passed to fuse_reply_attr() or set in `struct
+ * fuse_entry_param` has passed), it will first issue a `getattr`
+ * request. If the new mtime differs from the previous value, any
+ * cached file *contents* will be invalidated as well.
+ *
+ * This flag should always be set when available. If all file changes
+ * go through the kernel, *attr_timeout* should be set to a very large
+ * number to avoid unnecessary getattr() calls.
+ *
+ * This feature is enabled by default when supported by the kernel.
+ */
+#define FUSE_CAP_AUTO_INVAL_DATA	(1 << 12)
+
+/**
+ * Indicates that the filesystem supports readdirplus.
+ *
+ * This feature is enabled by default when supported by the kernel and if the
+ * filesystem implements a readdirplus() handler.
+ */
+#define FUSE_CAP_READDIRPLUS		(1 << 13)
+
+/**
+ * Indicates that the filesystem supports adaptive readdirplus.
+ *
+ * If FUSE_CAP_READDIRPLUS is not set, this flag has no effect.
+ *
+ * If FUSE_CAP_READDIRPLUS is set and this flag is not set, the kernel
+ * will always issue readdirplus() requests to retrieve directory
+ * contents.
+ *
+ * If FUSE_CAP_READDIRPLUS is set and this flag is set, the kernel
+ * will issue both readdir() and readdirplus() requests, depending on
+ * how much information is expected to be required.
+ *
+ * As of Linux 4.20, the algorithm is as follows: when userspace
+ * starts to read directory entries, issue a READDIRPLUS request to
+ * the filesystem. If any entry attributes have been looked up by the
+ * time userspace requests the next batch of entries continue with
+ * READDIRPLUS, otherwise switch to plain READDIR.  This will result
+ * in e.g. plain "ls" triggering READDIRPLUS first then READDIR after
+ * that because it doesn't do lookups.  "ls -l" should result in all
+ * READDIRPLUS, except if dentries are already cached.
+ *
+ * This feature is enabled by default when supported by the kernel and
+ * if the filesystem implements both a readdirplus() and a readdir()
+ * handler.
+ */
+#define FUSE_CAP_READDIRPLUS_AUTO	(1 << 14)
+
+/**
+ * Indicates that the filesystem supports asynchronous direct I/O submission.
+ *
+ * If this capability is not requested/available, the kernel will ensure that
+ * there is at most one pending read and one pending write request per direct
+ * I/O file-handle at any time.
+ *
+ * This feature is enabled by default when supported by the kernel.
+ */
+#define FUSE_CAP_ASYNC_DIO		(1 << 15)
+
+/**
+ * Indicates that writeback caching should be enabled. This means that
+ * individual write requests may be buffered and merged in the kernel
+ * before they are sent to the filesystem.
+ *
+ * This feature is disabled by default.
+ */
+#define FUSE_CAP_WRITEBACK_CACHE	(1 << 16)
+
+/**
+ * Indicates support for zero-message opens. If this flag is set in
+ * the `capable` field of the `fuse_conn_info` structure, then the
+ * filesystem may return `ENOSYS` from the open() handler to indicate
+ * success. Further attempts to open files will be handled in the
+ * kernel. (If this flag is not set, returning ENOSYS will be treated
+ * as an error and signaled to the caller).
+ *
+ * Setting (or unsetting) this flag in the `want` field has *no
+ * effect*.
+ */
+#define FUSE_CAP_NO_OPEN_SUPPORT	(1 << 17)
+
+/**
+ * Indicates support for parallel directory operations. If this flag
+ * is unset, the FUSE kernel module will ensure that lookup() and
+ * readdir() requests are never issued concurrently for the same
+ * directory.
+ *
+ * This feature is enabled by default when supported by the kernel.
+ */
+#define FUSE_CAP_PARALLEL_DIROPS        (1 << 18)
+
+/**
+ * Indicates support for POSIX ACLs.
+ *
+ * If this feature is enabled, the kernel will cache and have
+ * responsibility for enforcing ACLs. ACLs will be stored as xattrs and
+ * passed to userspace, which is responsible for updating the ACLs in
+ * the filesystem, keeping the file mode in sync with the ACL, and
+ * ensuring inheritance of default ACLs when new filesystem nodes are
+ * created. Note that this requires that the file system is able to
+ * parse and interpret the xattr representation of ACLs.
+ *
+ * Enabling this feature implicitly turns on the
+ * ``default_permissions`` mount option (even if it was not passed to
+ * mount(2)).
+ *
+ * This feature is disabled by default.
+ */
+#define FUSE_CAP_POSIX_ACL              (1 << 19)
+
+/**
+ * Indicates that the filesystem is responsible for unsetting
+ * setuid and setgid bits when a file is written, truncated, or
+ * its owner is changed.
+ *
+ * This feature is enabled by default when supported by the kernel.
+ */
+#define FUSE_CAP_HANDLE_KILLPRIV         (1 << 20)
+
+/**
+ * Indicates support for zero-message opendirs. If this flag is set in
+ * the `capable` field of the `fuse_conn_info` structure, then the filesystem
+ * may return `ENOSYS` from the opendir() handler to indicate success. Further
+ * opendir and releasedir messages will be handled in the kernel. (If this
+ * flag is not set, returning ENOSYS will be treated as an error and signalled
+ * to the caller.)
+ *
+ * Setting (or unsetting) this flag in the `want` field has *no effect*.
+ */
+#define FUSE_CAP_NO_OPENDIR_SUPPORT    (1 << 24)
+
+/**
+ * Ioctl flags
+ *
+ * FUSE_IOCTL_COMPAT: 32bit compat ioctl on 64bit machine
+ * FUSE_IOCTL_UNRESTRICTED: not restricted to well-formed ioctls, retry allowed
+ * FUSE_IOCTL_RETRY: retry with new iovecs
+ * FUSE_IOCTL_DIR: is a directory
+ *
+ * FUSE_IOCTL_MAX_IOV: maximum of in_iovecs + out_iovecs
+ */
+#define FUSE_IOCTL_COMPAT	(1 << 0)
+#define FUSE_IOCTL_UNRESTRICTED	(1 << 1)
+#define FUSE_IOCTL_RETRY	(1 << 2)
+#define FUSE_IOCTL_DIR		(1 << 4)
+
+#define FUSE_IOCTL_MAX_IOV	256
+
+/**
+ * Connection information, passed to the ->init() method
+ *
+ * Some of the elements are read-write; these can be changed to
+ * indicate the value requested by the filesystem.  The requested
+ * value must usually be smaller than the indicated value.
+ */
+struct fuse_conn_info {
+	/**
+	 * Major version of the protocol (read-only)
+	 */
+	unsigned proto_major;
+
+	/**
+	 * Minor version of the protocol (read-only)
+	 */
+	unsigned proto_minor;
+
+	/**
+	 * Maximum size of the write buffer
+	 */
+	unsigned max_write;
+
+	/**
+	 * Maximum size of read requests. A value of zero indicates no
+	 * limit. However, even if the filesystem does not specify a
+	 * limit, the maximum size of read requests will still be
+	 * limited by the kernel.
+	 *
+	 * NOTE: For the time being, the maximum size of read requests
+	 * must be set both here *and* passed to fuse_session_new()
+	 * using the ``-o max_read=<n>`` mount option. At some point
+	 * in the future, specifying the mount option will no longer
+	 * be necessary.
+	 */
+	unsigned max_read;
+
+	/**
+	 * Maximum readahead
+	 */
+	unsigned max_readahead;
+
+	/**
+	 * Capability flags that the kernel supports (read-only)
+	 */
+	unsigned capable;
+
+	/**
+	 * Capability flags that the filesystem wants to enable.
+	 *
+	 * libfuse attempts to initialize this field with
+	 * reasonable default values before calling the init() handler.
+	 */
+	unsigned want;
+
+	/**
+	 * Maximum number of pending "background" requests. A
+	 * background request is any type of request for which the
+	 * total number is not limited by other means. As of kernel
+	 * 4.8, only two types of requests fall into this category:
+	 *
+	 *   1. Read-ahead requests
+	 *   2. Asynchronous direct I/O requests
+	 *
+	 * Read-ahead requests are generated (if max_readahead is
+	 * non-zero) by the kernel to preemptively fill its caches
+	 * when it anticipates that userspace will soon read more
+	 * data.
+	 *
+	 * Asynchronous direct I/O requests are generated if
+	 * FUSE_CAP_ASYNC_DIO is enabled and userspace submits a large
+	 * direct I/O request. In this case the kernel will internally
+	 * split it up into multiple smaller requests and submit them
+	 * to the filesystem concurrently.
+	 *
+	 * Note that the following requests are *not* background
+	 * requests: writeback requests (limited by the kernel's
+	 * flusher algorithm), regular (i.e., synchronous and
+	 * buffered) userspace read/write requests (limited to one per
+	 * thread), asynchronous read requests (Linux's io_submit(2)
+	 * call actually blocks, so these are also limited to one per
+	 * thread).
+	 */
+	unsigned max_background;
+
+	/**
+	 * Kernel congestion threshold parameter. If the number of pending
+	 * background requests exceeds this number, the FUSE kernel module will
+	 * mark the filesystem as "congested". This instructs the kernel to
+	 * expect that queued requests will take some time to complete, and to
+	 * adjust its algorithms accordingly (e.g. by putting a waiting thread
+	 * to sleep instead of using a busy-loop).
+	 */
+	unsigned congestion_threshold;
+
+	/**
+	 * When FUSE_CAP_WRITEBACK_CACHE is enabled, the kernel is responsible
+	 * for updating mtime and ctime when write requests are received. The
+	 * updated values are passed to the filesystem with setattr() requests.
+	 * However, if the filesystem does not support the full resolution of
+	 * the kernel timestamps (nanoseconds), the mtime and ctime values used
+	 * by kernel and filesystem will differ (and result in an apparent
+	 * change of times after a cache flush).
+	 *
+	 * To prevent this problem, this variable can be used to inform the
+	 * kernel about the timestamp granularity supported by the file-system.
+	 * The value should be a power of 10.  The default is 1, i.e. full
+	 * nano-second resolution. Filesystems supporting only second resolution
+	 * should set this to 1000000000.
+	 */
+	unsigned time_gran;
+
+	/**
+	 * For future use.
+	 */
+	unsigned reserved[22];
+};
+
+struct fuse_session;
+struct fuse_pollhandle;
+struct fuse_conn_info_opts;
+
+/**
+ * This function parses several command-line options that can be used
+ * to override elements of struct fuse_conn_info. The pointer returned
+ * by this function should be passed to the
+ * fuse_apply_conn_info_opts() method by the file system's init()
+ * handler.
+ *
+ * Before using this function, think twice whether you really want these
+ * parameters to be adjustable from the command line. In most cases,
+ * they should be determined by the file system internally.
+ *
+ * The following options are recognized:
+ *
+ *   -o max_write=N         sets conn->max_write
+ *   -o max_readahead=N     sets conn->max_readahead
+ *   -o max_background=N    sets conn->max_background
+ *   -o congestion_threshold=N  sets conn->congestion_threshold
+ *   -o async_read          sets FUSE_CAP_ASYNC_READ in conn->want
+ *   -o sync_read           unsets FUSE_CAP_ASYNC_READ in conn->want
+ *   -o atomic_o_trunc      sets FUSE_CAP_ATOMIC_O_TRUNC in conn->want
+ *   -o no_remote_lock      Equivalent to -o no_remote_flock,no_remote_posix_lock
+ *   -o no_remote_flock     Unsets FUSE_CAP_FLOCK_LOCKS in conn->want
+ *   -o no_remote_posix_lock  Unsets FUSE_CAP_POSIX_LOCKS in conn->want
+ *   -o [no_]splice_write     (un-)sets FUSE_CAP_SPLICE_WRITE in conn->want
+ *   -o [no_]splice_move      (un-)sets FUSE_CAP_SPLICE_MOVE in conn->want
+ *   -o [no_]splice_read      (un-)sets FUSE_CAP_SPLICE_READ in conn->want
+ *   -o [no_]auto_inval_data  (un-)sets FUSE_CAP_AUTO_INVAL_DATA in conn->want
+ *   -o readdirplus=no        unsets FUSE_CAP_READDIRPLUS in conn->want
+ *   -o readdirplus=yes       sets FUSE_CAP_READDIRPLUS and unsets
+ *                            FUSE_CAP_READDIRPLUS_AUTO in conn->want
+ *   -o readdirplus=auto      sets FUSE_CAP_READDIRPLUS and
+ *                            FUSE_CAP_READDIRPLUS_AUTO in conn->want
+ *   -o [no_]async_dio        (un-)sets FUSE_CAP_ASYNC_DIO in conn->want
+ *   -o [no_]writeback_cache  (un-)sets FUSE_CAP_WRITEBACK_CACHE in conn->want
+ *   -o time_gran=N           sets conn->time_gran
+ *
+ * Known options will be removed from *args*, unknown options will be
+ * passed through unchanged.
+ *
+ * @param args argument vector (input+output)
+ * @return parsed options
+ **/
+struct fuse_conn_info_opts* fuse_parse_conn_info_opts(struct fuse_args *args);
+
+/**
+ * This function applies the (parsed) parameters in *opts* to the
+ * *conn* pointer. It may modify the following fields: want,
+ * max_write, max_readahead, congestion_threshold, max_background,
+ * time_gran. A field is only set (or unset) if the corresponding
+ * option has been explicitly set.
+ */
+void fuse_apply_conn_info_opts(struct fuse_conn_info_opts *opts,
+			  struct fuse_conn_info *conn);
+
+/**
+ * Go into the background
+ *
+ * @param foreground if true, stay in the foreground
+ * @return 0 on success, -1 on failure
+ */
+int fuse_daemonize(int foreground);
+
+/**
+ * Get the version of the library
+ *
+ * @return the version
+ */
+int fuse_version(void);
+
+/**
+ * Get the full package version string of the library
+ *
+ * @return the package version
+ */
+const char *fuse_pkgversion(void);
+
+/**
+ * Destroy poll handle
+ *
+ * @param ph the poll handle
+ */
+void fuse_pollhandle_destroy(struct fuse_pollhandle *ph);
+
+/* ----------------------------------------------------------- *
+ * Data buffer						       *
+ * ----------------------------------------------------------- */
+
+/**
+ * Buffer flags
+ */
+enum fuse_buf_flags {
+	/**
+	 * Buffer contains a file descriptor
+	 *
+	 * If this flag is set, the .fd field is valid, otherwise the
+	 * .mem field is valid.
+	 */
+	FUSE_BUF_IS_FD		= (1 << 1),
+
+	/**
+	 * Seek on the file descriptor
+	 *
+	 * If this flag is set then the .pos field is valid and is
+	 * used to seek to the given offset before performing the
+	 * operation on the file descriptor.
+	 */
+	FUSE_BUF_FD_SEEK	= (1 << 2),
+
+	/**
+	 * Retry operation on file descriptor
+	 *
+	 * If this flag is set then retry operation on file descriptor
+	 * until .size bytes have been copied or an error or EOF is
+	 * detected.
+	 */
+	FUSE_BUF_FD_RETRY	= (1 << 3),
+};
+
+/**
+ * Buffer copy flags
+ */
+enum fuse_buf_copy_flags {
+	/**
+	 * Don't use splice(2)
+	 *
+	 * Always fall back to using read and write instead of
+	 * splice(2) to copy data from one file descriptor to another.
+	 *
+	 * If this flag is not set, then only fall back if splice is
+	 * unavailable.
+	 */
+	FUSE_BUF_NO_SPLICE	= (1 << 1),
+
+	/**
+	 * Force splice
+	 *
+	 * Always use splice(2) to copy data from one file descriptor
+	 * to another.  If splice is not available, return -EINVAL.
+	 */
+	FUSE_BUF_FORCE_SPLICE	= (1 << 2),
+
+	/**
+	 * Try to move data with splice.
+	 *
+	 * If splice is used, try to move pages from the source to the
+	 * destination instead of copying.  See documentation of
+	 * SPLICE_F_MOVE in splice(2) man page.
+	 */
+	FUSE_BUF_SPLICE_MOVE	= (1 << 3),
+
+	/**
+	 * Don't block on the pipe when copying data with splice
+	 *
+	 * Makes the operations on the pipe non-blocking (if the pipe
+	 * is full or empty).  See SPLICE_F_NONBLOCK in the splice(2)
+	 * man page.
+	 */
+	FUSE_BUF_SPLICE_NONBLOCK= (1 << 4),
+};
+
+/**
+ * Single data buffer
+ *
+ * Generic data buffer for I/O, extended attributes, etc...  Data may
+ * be supplied as a memory pointer or as a file descriptor
+ */
+struct fuse_buf {
+	/**
+	 * Size of data in bytes
+	 */
+	size_t size;
+
+	/**
+	 * Buffer flags
+	 */
+	enum fuse_buf_flags flags;
+
+	/**
+	 * Memory pointer
+	 *
+	 * Used unless FUSE_BUF_IS_FD flag is set.
+	 */
+	void *mem;
+
+	/**
+	 * File descriptor
+	 *
+	 * Used if FUSE_BUF_IS_FD flag is set.
+	 */
+	int fd;
+
+	/**
+	 * File position
+	 *
+	 * Used if FUSE_BUF_FD_SEEK flag is set.
+	 */
+	off_t pos;
+};
+
+/**
+ * Data buffer vector
+ *
+ * An array of data buffers, each containing a memory pointer or a
+ * file descriptor.
+ *
+ * Allocate dynamically to add more than one buffer.
+ */
+struct fuse_bufvec {
+	/**
+	 * Number of buffers in the array
+	 */
+	size_t count;
+
+	/**
+	 * Index of current buffer within the array
+	 */
+	size_t idx;
+
+	/**
+	 * Current offset within the current buffer
+	 */
+	size_t off;
+
+	/**
+	 * Array of buffers
+	 */
+	struct fuse_buf buf[1];
+};
+
+/* Initialize bufvec with a single buffer of given size */
+#define FUSE_BUFVEC_INIT(size__)				\
+	((struct fuse_bufvec) {					\
+		/* .count= */ 1,				\
+		/* .idx =  */ 0,				\
+		/* .off =  */ 0,				\
+		/* .buf =  */ { /* [0] = */ {			\
+			/* .size =  */ (size__),		\
+			/* .flags = */ (enum fuse_buf_flags) 0,	\
+			/* .mem =   */ NULL,			\
+			/* .fd =    */ -1,			\
+			/* .pos =   */ 0,			\
+		} }						\
+	} )
+
+/**
+ * Get total size of data in a fuse buffer vector
+ *
+ * @param bufv buffer vector
+ * @return size of data
+ */
+size_t fuse_buf_size(const struct fuse_bufvec *bufv);
+
+/**
+ * Copy data from one buffer vector to another
+ *
+ * @param dst destination buffer vector
+ * @param src source buffer vector
+ * @param flags flags controlling the copy
+ * @return actual number of bytes copied or -errno on error
+ */
+ssize_t fuse_buf_copy(struct fuse_bufvec *dst, struct fuse_bufvec *src,
+		      enum fuse_buf_copy_flags flags);
+
+/* ----------------------------------------------------------- *
+ * Signal handling					       *
+ * ----------------------------------------------------------- */
+
+/**
+ * Exit session on HUP, TERM and INT signals and ignore PIPE signal
+ *
+ * Stores session in a global variable.	 May only be called once per
+ * process until fuse_remove_signal_handlers() is called.
+ *
+ * Once a HUP, TERM or INT signal arrives, the signal handler calls
+ * fuse_session_exit().
+ *
+ * @param se the session to exit
+ * @return 0 on success, -1 on failure
+ *
+ * See also:
+ * fuse_remove_signal_handlers()
+ */
+int fuse_set_signal_handlers(struct fuse_session *se);
+
+/**
+ * Restore default signal handlers
+ *
+ * Resets global session.  After this fuse_set_signal_handlers() may
+ * be called again.
+ *
+ * @param se the same session as given in fuse_set_signal_handlers()
+ *
+ * See also:
+ * fuse_set_signal_handlers()
+ */
+void fuse_remove_signal_handlers(struct fuse_session *se);
+
+/* ----------------------------------------------------------- *
+ * Compatibility stuff					       *
+ * ----------------------------------------------------------- */
+
+#if !defined(FUSE_USE_VERSION) || FUSE_USE_VERSION < 30
+#  error only API version 30 or greater is supported
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+
+/*
+ * This interface uses 64 bit off_t.
+ *
+ * On 32bit systems please add -D_FILE_OFFSET_BITS=64 to your compile flags!
+ */
+
+#if defined(__GNUC__) && (__GNUC__ > 4 || __GNUC__ == 4 && __GNUC_MINOR__ >= 6) && !defined __cplusplus
+_Static_assert(sizeof(off_t) == 8, "fuse: off_t must be 64bit");
+#else
+struct _fuse_off_t_must_be_64bit_dummy_struct \
+	{ unsigned _fuse_off_t_must_be_64bit:((sizeof(off_t) == 8) ? 1 : -1); };
+#endif
+
+#endif /* FUSE_COMMON_H_ */
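
Reviewer aid (not part of the patch): the capability comments in fuse_common.h above describe a negotiation between the read-only `capable` field and the filesystem-controlled `want` field. The sketch below shows one way an init() handler might request a capability only when the kernel advertises it. `struct demo_conn_info`, `demo_request_cap()` and `demo_init()` are hypothetical stand-ins so the example compiles without the imported headers; only the two bit values are copied from the header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit values copied from fuse_common.h in this patch. */
#define DEMO_CAP_WRITEBACK_CACHE (1u << 16)
#define DEMO_CAP_POSIX_ACL       (1u << 19)

/* Hypothetical stand-in for the two fields of struct fuse_conn_info
 * that matter for capability negotiation. */
struct demo_conn_info {
    unsigned capable; /* what the kernel supports (read-only) */
    unsigned want;    /* what the filesystem asks to enable */
};

/* Request a capability only if it is advertised in `capable`.
 * Returns true when the bit ended up set in `want`. */
static bool demo_request_cap(struct demo_conn_info *conn, unsigned cap)
{
    assert(conn != NULL);
    if ((conn->capable & cap) != cap) {
        conn->want &= ~cap; /* never ask for something unsupported */
        return false;
    }
    conn->want |= cap;
    return true;
}

/* Shaped like the body of an init() handler: opt in to two features
 * that the header documents as disabled by default. */
static void demo_init(struct demo_conn_info *conn)
{
    demo_request_cap(conn, DEMO_CAP_WRITEBACK_CACHE);
    demo_request_cap(conn, DEMO_CAP_POSIX_ACL);
}
```

In a real handler the same check would presumably be applied to `conn->capable` and `conn->want` of `struct fuse_conn_info` directly; the helper only makes the "supported before wanted" rule explicit.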
diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
new file mode 100644
index 0000000000..d38b630ac5
--- /dev/null
+++ b/tools/virtiofsd/fuse_i.h
@@ -0,0 +1,139 @@
+/*
+  FUSE: Filesystem in Userspace
+  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
+
+  This program can be distributed under the terms of the GNU LGPLv2.
+  See the file COPYING.LIB
+*/
+
+#include "fuse.h"
+#include "fuse_lowlevel.h"
+
+struct mount_opts;
+
+struct fuse_req {
+	struct fuse_session *se;
+	uint64_t unique;
+	int ctr;
+	pthread_mutex_t lock;
+	struct fuse_ctx ctx;
+	struct fuse_chan *ch;
+	int interrupted;
+	unsigned int ioctl_64bit : 1;
+	union {
+		struct {
+			uint64_t unique;
+		} i;
+		struct {
+			fuse_interrupt_func_t func;
+			void *data;
+		} ni;
+	} u;
+	struct fuse_req *next;
+	struct fuse_req *prev;
+};
+
+struct fuse_notify_req {
+	uint64_t unique;
+	void (*reply)(struct fuse_notify_req *, fuse_req_t, fuse_ino_t,
+		      const void *, const struct fuse_buf *);
+	struct fuse_notify_req *next;
+	struct fuse_notify_req *prev;
+};
+
+struct fuse_session {
+	char *mountpoint;
+	volatile int exited;
+	int fd;
+	struct mount_opts *mo;
+	int debug;
+	int deny_others;
+	struct fuse_lowlevel_ops op;
+	int got_init;
+	struct cuse_data *cuse_data;
+	void *userdata;
+	uid_t owner;
+	struct fuse_conn_info conn;
+	struct fuse_req list;
+	struct fuse_req interrupts;
+	pthread_mutex_t lock;
+	int got_destroy;
+	pthread_key_t pipe_key;
+	int broken_splice_nonblock;
+	uint64_t notify_ctr;
+	struct fuse_notify_req notify_list;
+	size_t bufsize;
+	int error;
+};
+
+struct fuse_chan {
+	pthread_mutex_t lock;
+	int ctr;
+	int fd;
+};
+
+/**
+ * Filesystem module
+ *
+ * Filesystem modules are registered with the FUSE_REGISTER_MODULE()
+ * macro.
+ *
+ */
+struct fuse_module {
+	char *name;
+	fuse_module_factory_t factory;
+	struct fuse_module *next;
+	struct fusemod_so *so;
+	int ctr;
+};
+
+/* ----------------------------------------------------------- *
+ * Channel interface (when using -o clone_fd)		       *
+ * ----------------------------------------------------------- */
+
+/**
+ * Obtain counted reference to the channel
+ *
+ * @param ch the channel
+ * @return the channel
+ */
+struct fuse_chan *fuse_chan_get(struct fuse_chan *ch);
+
+/**
+ * Drop counted reference to a channel
+ *
+ * @param ch the channel
+ */
+void fuse_chan_put(struct fuse_chan *ch);
+
+struct mount_opts *parse_mount_opts(struct fuse_args *args);
+void destroy_mount_opts(struct mount_opts *mo);
+void fuse_mount_version(void);
+unsigned get_max_read(struct mount_opts *o);
+void fuse_kern_unmount(const char *mountpoint, int fd);
+int fuse_kern_mount(const char *mountpoint, struct mount_opts *mo);
+
+int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
+			       int count);
+void fuse_free_req(fuse_req_t req);
+
+void cuse_lowlevel_init(fuse_req_t req, fuse_ino_t nodeid, const void *inarg);
+
+int fuse_start_thread(pthread_t *thread_id, void *(*func)(void *), void *arg);
+
+int fuse_session_receive_buf_int(struct fuse_session *se, struct fuse_buf *buf,
+				 struct fuse_chan *ch);
+void fuse_session_process_buf_int(struct fuse_session *se,
+				  const struct fuse_buf *buf, struct fuse_chan *ch);
+
+struct fuse *fuse_new_31(struct fuse_args *args, const struct fuse_operations *op,
+		      size_t op_size, void *private_data);
+int fuse_loop_mt_32(struct fuse *f, struct fuse_loop_config *config);
+int fuse_session_loop_mt_32(struct fuse_session *se, struct fuse_loop_config *config);
+
+#define FUSE_MAX_MAX_PAGES 256
+#define FUSE_DEFAULT_MAX_PAGES_PER_REQ 32
+
+/* room needed in buffer to accommodate header */
+#define FUSE_BUFFER_HEADER_SIZE 0x1000
+
diff --git a/tools/virtiofsd/fuse_log.h b/tools/virtiofsd/fuse_log.h
new file mode 100644
index 0000000000..5e112e0f53
--- /dev/null
+++ b/tools/virtiofsd/fuse_log.h
@@ -0,0 +1,82 @@
+/*
+  FUSE: Filesystem in Userspace
+  Copyright (C) 2019  Red Hat, Inc.
+
+  This program can be distributed under the terms of the GNU LGPLv2.
+  See the file COPYING.LIB.
+*/
+
+#ifndef FUSE_LOG_H_
+#define FUSE_LOG_H_
+
+/** @file
+ *
+ * This file defines the logging interface of FUSE
+ */
+
+#include <stdarg.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Log severity level
+ *
+ * These levels correspond to syslog(2) log levels since they are widely used.
+ */
+enum fuse_log_level {
+	FUSE_LOG_EMERG,
+	FUSE_LOG_ALERT,
+	FUSE_LOG_CRIT,
+	FUSE_LOG_ERR,
+	FUSE_LOG_WARNING,
+	FUSE_LOG_NOTICE,
+	FUSE_LOG_INFO,
+	FUSE_LOG_DEBUG
+};
+
+/**
+ * Log message handler function.
+ *
+ * This function must be thread-safe.  It may be called from any libfuse
+ * function, including fuse_parse_cmdline() and other functions invoked before
+ * a FUSE filesystem is created.
+ *
+ * Install a custom log message handler function using fuse_set_log_func().
+ *
+ * @param level log severity level
+ * @param fmt sprintf-style format string including newline
+ * @param ap format string arguments
+ */
+typedef void (*fuse_log_func_t)(enum fuse_log_level level,
+				const char *fmt, va_list ap);
+
+/**
+ * Install a custom log handler function.
+ *
+ * Log messages are emitted by libfuse functions to report errors and debug
+ * information.  Messages are printed to stderr by default but this can be
+ * overridden by installing a custom log message handler function.
+ *
+ * The log message handler function is global and affects all FUSE filesystems
+ * created within this process.
+ *
+ * @param func a custom log message handler function or NULL to revert to
+ *             the default
+ */
+void fuse_set_log_func(fuse_log_func_t func);
+
+/**
+ * Emit a log message
+ *
+ * @param level severity level (FUSE_LOG_ERR, FUSE_LOG_DEBUG, etc)
+ * @param fmt sprintf-style format string including newline
+ */
+void fuse_log(enum fuse_log_level level, const char *fmt, ...);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* FUSE_LOG_H_ */
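
Reviewer aid (not part of the patch): fuse_log.h above declares a handler type that takes a severity level, a format string and a `va_list`. The sketch below shows what a filtering handler of that shape could look like. All `demo_*` names are hypothetical stand-ins that mirror the declarations in the header so the example compiles on its own; `demo_log()` only imitates the calling convention of fuse_log() and is not its implementation. The capture buffer is thread-local because the header requires handlers to be thread-safe.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Stand-in copy of enum fuse_log_level: lower value = more severe. */
enum demo_log_level {
    DEMO_LOG_EMERG, DEMO_LOG_ALERT, DEMO_LOG_CRIT, DEMO_LOG_ERR,
    DEMO_LOG_WARNING, DEMO_LOG_NOTICE, DEMO_LOG_INFO, DEMO_LOG_DEBUG
};

/* Same shape as fuse_log_func_t. */
typedef void (*demo_log_func_t)(enum demo_log_level level,
                                const char *fmt, va_list ap);

/* Per-thread buffer holding the last message that passed the filter. */
static _Thread_local char demo_last_msg[256];

/* Messages less severe than this threshold are dropped. */
static const enum demo_log_level demo_threshold = DEMO_LOG_WARNING;

/* Example handler: filter by severity, then format into the buffer. */
static void demo_capture_handler(enum demo_log_level level,
                                 const char *fmt, va_list ap)
{
    if (level > demo_threshold) {
        return;
    }
    vsnprintf(demo_last_msg, sizeof(demo_last_msg), fmt, ap);
}

/* Hypothetical dispatcher imitating how a variadic log call hands its
 * arguments to the installed handler. */
static demo_log_func_t demo_handler = demo_capture_handler;

static void demo_log(enum demo_log_level level, const char *fmt, ...)
{
    va_list ap;

    assert(fmt != NULL);
    va_start(ap, fmt);
    demo_handler(level, fmt, ap);
    va_end(ap);
}
```

With the real header, a handler like `demo_capture_handler` would be installed through fuse_set_log_func(); passing NULL there reverts to the default stderr output, as the comment above states.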
diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
new file mode 100644
index 0000000000..18c6363f07
--- /dev/null
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -0,0 +1,2089 @@
+/*
+  FUSE: Filesystem in Userspace
+  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
+
+  This program can be distributed under the terms of the GNU LGPLv2.
+  See the file COPYING.LIB.
+*/
+
+#ifndef FUSE_LOWLEVEL_H_
+#define FUSE_LOWLEVEL_H_
+
+/** @file
+ *
+ * Low level API
+ *
+ * IMPORTANT: you should define FUSE_USE_VERSION before including this
+ * header.  To use the newest API define it to 31 (recommended for any
+ * new application).
+ */
+
+#ifndef FUSE_USE_VERSION
+#error FUSE_USE_VERSION not defined
+#endif
+
+#include "fuse_common.h"
+
+#include <utime.h>
+#include <fcntl.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/statvfs.h>
+#include <sys/uio.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* ----------------------------------------------------------- *
+ * Miscellaneous definitions				       *
+ * ----------------------------------------------------------- */
+
+/** The node ID of the root inode */
+#define FUSE_ROOT_ID 1
+
+/** Inode number type */
+typedef uint64_t fuse_ino_t;
+
+/** Request pointer type */
+typedef struct fuse_req *fuse_req_t;
+
+/**
+ * Session
+ *
+ * This provides hooks for processing requests, and exiting
+ */
+struct fuse_session;
+
+/** Directory entry parameters supplied to fuse_reply_entry() */
+struct fuse_entry_param {
+	/** Unique inode number
+	 *
+	 * In lookup, zero means negative entry (from version 2.5)
+	 * Returning ENOENT also means negative entry, but by setting zero
+	 * ino the kernel may cache negative entries for entry_timeout
+	 * seconds.
+	 */
+	fuse_ino_t ino;
+
+	/** Generation number for this entry.
+	 *
+	 * If the file system will be exported over NFS, the
+	 * ino/generation pairs need to be unique over the file
+	 * system's lifetime (rather than just the mount time). So if
+	 * the file system reuses an inode after it has been deleted,
+	 * it must assign a new, previously unused generation number
+	 * to the inode at the same time.
+	 *
+	 */
+	uint64_t generation;
+
+	/** Inode attributes.
+	 *
+	 * Even if attr_timeout == 0, attr must be correct. For example,
+	 * for open(), FUSE uses attr.st_size from lookup() to determine
+	 * how many bytes to request. If this value is not correct,
+	 * incorrect data will be returned.
+	 */
+	struct stat attr;
+
+	/** Validity timeout (in seconds) for inode attributes. If
+	    attributes only change as a result of requests that come
+	    through the kernel, this should be set to a very large
+	    value. */
+	double attr_timeout;
+
+	/** Validity timeout (in seconds) for the name. If directory
+	    entries are changed/deleted only as a result of requests
+	    that come through the kernel, this should be set to a very
+	    large value. */
+	double entry_timeout;
+};
+
+/**
+ * Additional context associated with requests.
+ *
+ * Note that the reported client uid, gid and pid may be zero in some
+ * situations. For example, if the FUSE file system is running in a
+ * PID or user namespace but then accessed from outside the namespace,
+ * there is no valid uid/pid/gid that could be reported.
+ */
+struct fuse_ctx {
+	/** User ID of the calling process */
+	uid_t uid;
+
+	/** Group ID of the calling process */
+	gid_t gid;
+
+	/** Thread ID of the calling process */
+	pid_t pid;
+
+	/** Umask of the calling process */
+	mode_t umask;
+};
+
+struct fuse_forget_data {
+	fuse_ino_t ino;
+	uint64_t nlookup;
+};
+
+/* 'to_set' flags in setattr */
+#define FUSE_SET_ATTR_MODE	(1 << 0)
+#define FUSE_SET_ATTR_UID	(1 << 1)
+#define FUSE_SET_ATTR_GID	(1 << 2)
+#define FUSE_SET_ATTR_SIZE	(1 << 3)
+#define FUSE_SET_ATTR_ATIME	(1 << 4)
+#define FUSE_SET_ATTR_MTIME	(1 << 5)
+#define FUSE_SET_ATTR_ATIME_NOW	(1 << 7)
+#define FUSE_SET_ATTR_MTIME_NOW	(1 << 8)
+#define FUSE_SET_ATTR_CTIME	(1 << 10)
+
+/* ----------------------------------------------------------- *
+ * Request methods and replies				       *
+ * ----------------------------------------------------------- */
+
+/**
+ * Low level filesystem operations
+ *
+ * Most of the methods (with the exception of init and destroy)
+ * receive a request handle (fuse_req_t) as their first argument.
+ * This handle must be passed to one of the specified reply functions.
+ *
+ * This may be done inside the method invocation, or after the call
+ * has returned.  The request handle is valid until one of the reply
+ * functions is called.
+ *
+ * Other pointer arguments (name, fuse_file_info, etc) are not valid
+ * after the call has returned, so if they are needed later, their
+ * contents have to be copied.
+ *
+ * In general, all methods are expected to perform any necessary
+ * permission checking. However, a filesystem may delegate this task
+ * to the kernel by passing the `default_permissions` mount option to
+ * `fuse_session_new()`. In this case, methods will only be called if
+ * the kernel's permission check has succeeded.
+ *
+ * The filesystem sometimes needs to handle a return value of -ENOENT
+ * from the reply function, which means that the request was
+ * interrupted and the reply discarded.  For example, if
+ * fuse_reply_open() returns -ENOENT, the release method for
+ * this file will not be called.
+ */
+struct fuse_lowlevel_ops {
+	/**
+	 * Initialize filesystem
+	 *
+	 * This function is called when libfuse establishes
+	 * communication with the FUSE kernel module. The file system
+	 * should use this function to inspect and/or modify the
+	 * connection parameters provided in the `conn` structure.
+	 *
+	 * Note that some parameters may be overwritten by options
+	 * passed to fuse_session_new() which take precedence over the
+	 * values set in this handler.
+	 *
+	 * There's no reply to this function
+	 *
+	 * @param userdata the user data passed to fuse_session_new()
+	 */
+	void (*init) (void *userdata, struct fuse_conn_info *conn);
+
+	/**
+	 * Clean up filesystem.
+	 *
+	 * Called on filesystem exit. When this method is called, the
+	 * connection to the kernel may be gone already, so that e.g. calls
+	 * to fuse_lowlevel_notify_* will fail.
+	 *
+	 * There's no reply to this function
+	 *
+	 * @param userdata the user data passed to fuse_session_new()
+	 */
+	void (*destroy) (void *userdata);
+
+	/**
+	 * Look up a directory entry by name and get its attributes.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_entry
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param parent inode number of the parent directory
+	 * @param name the name to look up
+	 */
+	void (*lookup) (fuse_req_t req, fuse_ino_t parent, const char *name);
+
+	/**
+	 * Forget about an inode
+	 *
+	 * This function is called when the kernel removes an inode
+	 * from its internal caches.
+	 *
+	 * The inode's lookup count increases by one for every call to
+	 * fuse_reply_entry and fuse_reply_create. The nlookup parameter
+	 * indicates by how much the lookup count should be decreased.
+	 *
+	 * Inodes with a non-zero lookup count may receive requests from
+	 * the kernel even after calls to unlink, rmdir or (when
+	 * overwriting an existing file) rename. Filesystems must handle
+	 * such requests properly and it is recommended to defer removal
+	 * of the inode until the lookup count reaches zero. Calls to
+	 * unlink, rmdir or rename will be followed closely by forget
+	 * unless the file or directory is open, in which case the
+	 * kernel issues forget only after the release or releasedir
+	 * calls.
+	 *
+	 * Note that if a file system will be exported over NFS the
+	 * inode's lifetime must extend even beyond forget. See the
+	 * generation field in struct fuse_entry_param above.
+	 *
+	 * On unmount the lookup count for all inodes implicitly drops
+	 * to zero. It is not guaranteed that the file system will
+	 * receive corresponding forget messages for the affected
+	 * inodes.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_none
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param nlookup the number of lookups to forget
+	 */
+	void (*forget) (fuse_req_t req, fuse_ino_t ino, uint64_t nlookup);
+
+	/**
+	 * Get file attributes.
+	 *
+	 * If writeback caching is enabled, the kernel may have a
+	 * better idea of a file's length than the FUSE file system
+	 * (e.g. if there has been a write that extended the file size,
+	 * but that has not yet been passed to the filesystem).
+	 *
+	 * In this case, the st_size value provided by the file system
+	 * will be ignored.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_attr
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param fi for future use, currently always NULL
+	 */
+	void (*getattr) (fuse_req_t req, fuse_ino_t ino,
+			 struct fuse_file_info *fi);
+
+	/**
+	 * Set file attributes
+	 *
+	 * In the 'attr' argument only members indicated by the 'to_set'
+	 * bitmask contain valid values.  Other members contain undefined
+	 * values.
+	 *
+	 * Unless FUSE_CAP_HANDLE_KILLPRIV is disabled, this method is
+	 * expected to reset the setuid and setgid bits if the file
+	 * size or owner is being changed.
+	 *
+	 * If the setattr was invoked from the ftruncate() system call
+	 * under Linux kernel versions 2.6.15 or later, the fi->fh will
+	 * contain the value set by the open method or will be undefined
+	 * if the open method didn't set any value.  Otherwise (not
+	 * ftruncate call, or kernel version earlier than 2.6.15) the fi
+	 * parameter will be NULL.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_attr
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param attr the attributes
+	 * @param to_set bit mask of attributes which should be set
+	 * @param fi file information, or NULL
+	 */
+	void (*setattr) (fuse_req_t req, fuse_ino_t ino, struct stat *attr,
+			 int to_set, struct fuse_file_info *fi);
+
+	/**
+	 * Read symbolic link
+	 *
+	 * Valid replies:
+	 *   fuse_reply_readlink
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 */
+	void (*readlink) (fuse_req_t req, fuse_ino_t ino);
+
+	/**
+	 * Create file node
+	 *
+	 * Create a regular file, character device, block device, fifo or
+	 * socket node.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_entry
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param parent inode number of the parent directory
+	 * @param name to create
+	 * @param mode file type and mode with which to create the new file
+	 * @param rdev the device number (only valid if created file is a device)
+	 */
+	void (*mknod) (fuse_req_t req, fuse_ino_t parent, const char *name,
+		       mode_t mode, dev_t rdev);
+
+	/**
+	 * Create a directory
+	 *
+	 * Valid replies:
+	 *   fuse_reply_entry
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param parent inode number of the parent directory
+	 * @param name to create
+	 * @param mode with which to create the new file
+	 */
+	void (*mkdir) (fuse_req_t req, fuse_ino_t parent, const char *name,
+		       mode_t mode);
+
+	/**
+	 * Remove a file
+	 *
+	 * If the file's inode's lookup count is non-zero, the file
+	 * system is expected to postpone any removal of the inode
+	 * until the lookup count reaches zero (see description of the
+	 * forget function).
+	 *
+	 * Valid replies:
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param parent inode number of the parent directory
+	 * @param name to remove
+	 */
+	void (*unlink) (fuse_req_t req, fuse_ino_t parent, const char *name);
+
+	/**
+	 * Remove a directory
+	 *
+	 * If the directory's inode's lookup count is non-zero, the
+	 * file system is expected to postpone any removal of the
+	 * inode until the lookup count reaches zero (see description
+	 * of the forget function).
+	 *
+	 * Valid replies:
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param parent inode number of the parent directory
+	 * @param name to remove
+	 */
+	void (*rmdir) (fuse_req_t req, fuse_ino_t parent, const char *name);
+
+	/**
+	 * Create a symbolic link
+	 *
+	 * Valid replies:
+	 *   fuse_reply_entry
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param link the contents of the symbolic link
+	 * @param parent inode number of the parent directory
+	 * @param name to create
+	 */
+	void (*symlink) (fuse_req_t req, const char *link, fuse_ino_t parent,
+			 const char *name);
+
+	/** Rename a file
+	 *
+	 * If the target exists it should be atomically replaced. If
+	 * the target's inode's lookup count is non-zero, the file
+	 * system is expected to postpone any removal of the inode
+	 * until the lookup count reaches zero (see description of the
+	 * forget function).
+	 *
+	 * If this request is answered with an error code of ENOSYS, this is
+	 * treated as a permanent failure with error code EINVAL, i.e. all
+	 * future rename requests will fail with EINVAL without being
+	 * sent to the filesystem process.
+	 *
+	 * *flags* may be `RENAME_EXCHANGE` or `RENAME_NOREPLACE`. If
+	 * RENAME_NOREPLACE is specified, the filesystem must not
+	 * overwrite *newname* if it exists and return an error
+	 * instead. If `RENAME_EXCHANGE` is specified, the filesystem
+	 * must atomically exchange the two files, i.e. both must
+	 * exist and neither may be deleted.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param parent inode number of the old parent directory
+	 * @param name old name
+	 * @param newparent inode number of the new parent directory
+	 * @param newname new name
+	 */
+	void (*rename) (fuse_req_t req, fuse_ino_t parent, const char *name,
+			fuse_ino_t newparent, const char *newname,
+			unsigned int flags);
+
+	/**
+	 * Create a hard link
+	 *
+	 * Valid replies:
+	 *   fuse_reply_entry
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the old inode number
+	 * @param newparent inode number of the new parent directory
+	 * @param newname new name to create
+	 */
+	void (*link) (fuse_req_t req, fuse_ino_t ino, fuse_ino_t newparent,
+		      const char *newname);
+
+	/**
+	 * Open a file
+	 *
+	 * Open flags are available in fi->flags. The following rules
+	 * apply.
+	 *
+	 *  - Creation (O_CREAT, O_EXCL, O_NOCTTY) flags will be
+	 *    filtered out / handled by the kernel.
+	 *
+	 *  - Access modes (O_RDONLY, O_WRONLY, O_RDWR) should be used
+	 *    by the filesystem to check if the operation is
+	 *    permitted.  If the ``-o default_permissions`` mount
+	 *    option is given, this check is already done by the
+	 *    kernel before calling open() and may thus be omitted by
+	 *    the filesystem.
+	 *
+	 *  - When writeback caching is enabled, the kernel may send
+	 *    read requests even for files opened with O_WRONLY. The
+	 *    filesystem should be prepared to handle this.
+	 *
+	 *  - When writeback caching is disabled, the filesystem is
+	 *    expected to properly handle the O_APPEND flag and ensure
+	 *    that each write is appending to the end of the file.
+	 *
+	 *  - When writeback caching is enabled, the kernel will
+	 *    handle O_APPEND. However, unless all changes to the file
+	 *    come through the kernel this will not work reliably. The
+	 *    filesystem should thus either ignore the O_APPEND flag
+	 *    (and let the kernel handle it), or return an error
+	 *    (indicating that reliable O_APPEND is not available).
+	 *
+	 * Filesystem may store an arbitrary file handle (pointer,
+	 * index, etc) in fi->fh, and use this in all other file
+	 * operations (read, write, flush, release, fsync).
+	 *
+	 * Filesystem may also implement stateless file I/O and not store
+	 * anything in fi->fh.
+	 *
+	 * There are also some flags (direct_io, keep_cache) which the
+	 * filesystem may set in fi, to change the way the file is opened.
+	 * See fuse_file_info structure in <fuse_common.h> for more details.
+	 *
+	 * If this request is answered with an error code of ENOSYS
+	 * and FUSE_CAP_NO_OPEN_SUPPORT is set in
+	 * `fuse_conn_info.capable`, this is treated as success and
+	 * future calls to open and release will also succeed without being
+	 * sent to the filesystem process.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_open
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param fi file information
+	 */
+	void (*open) (fuse_req_t req, fuse_ino_t ino,
+		      struct fuse_file_info *fi);
+
+	/**
+	 * Read data
+	 *
+	 * Read should send exactly the number of bytes requested except
+	 * on EOF or error, otherwise the rest of the data will be
+	 * substituted with zeroes.  An exception to this is when the file
+	 * has been opened in 'direct_io' mode, in which case the return
+	 * value of the read system call will reflect the return value of
+	 * this operation.
+	 *
+	 * fi->fh will contain the value set by the open method, or will
+	 * be undefined if the open method didn't set any value.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_buf
+	 *   fuse_reply_iov
+	 *   fuse_reply_data
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param size number of bytes to read
+	 * @param off offset to read from
+	 * @param fi file information
+	 */
+	void (*read) (fuse_req_t req, fuse_ino_t ino, size_t size, off_t off,
+		      struct fuse_file_info *fi);
+
+	/**
+	 * Write data
+	 *
+	 * Write should return exactly the number of bytes requested
+	 * except on error.  An exception to this is when the file has
+	 * been opened in 'direct_io' mode, in which case the return value
+	 * of the write system call will reflect the return value of this
+	 * operation.
+	 *
+	 * Unless FUSE_CAP_HANDLE_KILLPRIV is disabled, this method is
+	 * expected to reset the setuid and setgid bits.
+	 *
+	 * fi->fh will contain the value set by the open method, or will
+	 * be undefined if the open method didn't set any value.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_write
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param buf data to write
+	 * @param size number of bytes to write
+	 * @param off offset to write to
+	 * @param fi file information
+	 */
+	void (*write) (fuse_req_t req, fuse_ino_t ino, const char *buf,
+		       size_t size, off_t off, struct fuse_file_info *fi);
+
+	/**
+	 * Flush method
+	 *
+	 * This is called on each close() of the opened file.
+	 *
+	 * Since file descriptors can be duplicated (dup, dup2, fork), for
+	 * one open call there may be many flush calls.
+	 *
+	 * Filesystems shouldn't assume that flush will always be called
+	 * after some writes, or that it will be called at all.
+	 *
+	 * fi->fh will contain the value set by the open method, or will
+	 * be undefined if the open method didn't set any value.
+	 *
+	 * NOTE: the name of the method is misleading, since (unlike
+	 * fsync) the filesystem is not forced to flush pending writes.
+	 * One reason to flush data is if the filesystem wants to return
+	 * write errors during close.  However, such use is non-portable
+	 * because POSIX does not require [close] to wait for delayed I/O to
+	 * complete.
+	 *
+	 * If the filesystem supports file locking operations (setlk,
+	 * getlk) it should remove all locks belonging to 'fi->owner'.
+	 *
+	 * If this request is answered with an error code of ENOSYS,
+	 * this is treated as success and future calls to flush() will
+	 * succeed automatically without being sent to the filesystem
+	 * process.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param fi file information
+	 *
+	 * [close]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/close.html
+	 */
+	void (*flush) (fuse_req_t req, fuse_ino_t ino,
+		       struct fuse_file_info *fi);
+
+	/**
+	 * Release an open file
+	 *
+	 * Release is called when there are no more references to an open
+	 * file: all file descriptors are closed and all memory mappings
+	 * are unmapped.
+	 *
+	 * For every open call there will be exactly one release call (unless
+	 * the filesystem is force-unmounted).
+	 *
+	 * The filesystem may reply with an error, but error values are
+	 * not returned to close() or munmap() which triggered the
+	 * release.
+	 *
+	 * fi->fh will contain the value set by the open method, or will
+	 * be undefined if the open method didn't set any value.
+	 * fi->flags will contain the same flags as for open.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param fi file information
+	 */
+	void (*release) (fuse_req_t req, fuse_ino_t ino,
+			 struct fuse_file_info *fi);
+
+	/**
+	 * Synchronize file contents
+	 *
+	 * If the datasync parameter is non-zero, then only the user data
+	 * should be flushed, not the meta data.
+	 *
+	 * If this request is answered with an error code of ENOSYS,
+	 * this is treated as success and future calls to fsync() will
+	 * succeed automatically without being sent to the filesystem
+	 * process.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param datasync flag indicating if only data should be flushed
+	 * @param fi file information
+	 */
+	void (*fsync) (fuse_req_t req, fuse_ino_t ino, int datasync,
+		       struct fuse_file_info *fi);
+
+	/**
+	 * Open a directory
+	 *
+	 * Filesystem may store an arbitrary file handle (pointer, index,
+	 * etc) in fi->fh, and use this in all other directory
+	 * stream operations (readdir, releasedir, fsyncdir).
+	 *
+	 * If this request is answered with an error code of ENOSYS and
+	 * FUSE_CAP_NO_OPENDIR_SUPPORT is set in `fuse_conn_info.capable`,
+	 * this is treated as success and future calls to opendir and
+	 * releasedir will also succeed without being sent to the filesystem
+	 * process. In addition, the kernel will cache readdir results
+	 * as if opendir returned FOPEN_KEEP_CACHE | FOPEN_CACHE_DIR.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_open
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param fi file information
+	 */
+	void (*opendir) (fuse_req_t req, fuse_ino_t ino,
+			 struct fuse_file_info *fi);
+
+	/**
+	 * Read directory
+	 *
+	 * Send a buffer filled using fuse_add_direntry(), with size not
+	 * exceeding the requested size.  Send an empty buffer on end of
+	 * stream.
+	 *
+	 * fi->fh will contain the value set by the opendir method, or
+	 * will be undefined if the opendir method didn't set any value.
+	 *
+	 * Returning a directory entry from readdir() does not affect
+	 * its lookup count.
+	 *
+	 * If off_t is non-zero, then it will correspond to one of the off_t
+	 * values that was previously returned by readdir() for the same
+	 * directory handle. In this case, readdir() should skip over entries
+	 * coming before the position defined by the off_t value. If entries
+	 * are added or removed while the directory handle is open, the filesystem
+	 * may still include the entries that have been removed, and may not
+	 * report the entries that have been created. However, addition or
+	 * removal of entries must never cause readdir() to skip over unrelated
+	 * entries or to report them more than once. This means
+	 * that off_t can not be a simple index that enumerates the entries
+	 * that have been returned but must contain sufficient information to
+	 * uniquely determine the next directory entry to return even when the
+	 * set of entries is changing.
+	 *
+	 * The function does not have to report the '.' and '..'
+	 * entries, but is allowed to do so. Note that, if readdir does
+	 * not return '.' or '..', they will not be implicitly returned,
+	 * and this behavior is observable by the caller.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_buf
+	 *   fuse_reply_data
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param size maximum number of bytes to send
+	 * @param off offset to continue reading the directory stream
+	 * @param fi file information
+	 */
+	void (*readdir) (fuse_req_t req, fuse_ino_t ino, size_t size, off_t off,
+			 struct fuse_file_info *fi);
+
+	/**
+	 * Release an open directory
+	 *
+	 * For every opendir call there will be exactly one releasedir
+	 * call (unless the filesystem is force-unmounted).
+	 *
+	 * fi->fh will contain the value set by the opendir method, or
+	 * will be undefined if the opendir method didn't set any value.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param fi file information
+	 */
+	void (*releasedir) (fuse_req_t req, fuse_ino_t ino,
+			    struct fuse_file_info *fi);
+
+	/**
+	 * Synchronize directory contents
+	 *
+	 * If the datasync parameter is non-zero, then only the directory
+	 * contents should be flushed, not the meta data.
+	 *
+	 * fi->fh will contain the value set by the opendir method, or
+	 * will be undefined if the opendir method didn't set any value.
+	 *
+	 * If this request is answered with an error code of ENOSYS,
+	 * this is treated as success and future calls to fsyncdir() will
+	 * succeed automatically without being sent to the filesystem
+	 * process.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param datasync flag indicating if only data should be flushed
+	 * @param fi file information
+	 */
+	void (*fsyncdir) (fuse_req_t req, fuse_ino_t ino, int datasync,
+			  struct fuse_file_info *fi);
+
+	/**
+	 * Get file system statistics
+	 *
+	 * Valid replies:
+	 *   fuse_reply_statfs
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number, zero means "undefined"
+	 */
+	void (*statfs) (fuse_req_t req, fuse_ino_t ino);
+
+	/**
+	 * Set an extended attribute
+	 *
+	 * If this request is answered with an error code of ENOSYS, this is
+	 * treated as a permanent failure with error code EOPNOTSUPP, i.e. all
+	 * future setxattr() requests will fail with EOPNOTSUPP without being
+	 * sent to the filesystem process.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_err
+	 */
+	void (*setxattr) (fuse_req_t req, fuse_ino_t ino, const char *name,
+			  const char *value, size_t size, int flags);
+
+	/**
+	 * Get an extended attribute
+	 *
+	 * If size is zero, the size of the value should be sent with
+	 * fuse_reply_xattr.
+	 *
+	 * If the size is non-zero, and the value fits in the buffer, the
+	 * value should be sent with fuse_reply_buf.
+	 *
+	 * If the size is too small for the value, the ERANGE error should
+	 * be sent.
+	 *
+	 * If this request is answered with an error code of ENOSYS, this is
+	 * treated as a permanent failure with error code EOPNOTSUPP, i.e. all
+	 * future getxattr() requests will fail with EOPNOTSUPP without being
+	 * sent to the filesystem process.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_buf
+	 *   fuse_reply_data
+	 *   fuse_reply_xattr
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param name of the extended attribute
+	 * @param size maximum size of the value to send
+	 */
+	void (*getxattr) (fuse_req_t req, fuse_ino_t ino, const char *name,
+			  size_t size);
+
+	/**
+	 * List extended attribute names
+	 *
+	 * If size is zero, the total size of the attribute list should be
+	 * sent with fuse_reply_xattr.
+	 *
+	 * If the size is non-zero, and the null character separated
+	 * attribute list fits in the buffer, the list should be sent with
+	 * fuse_reply_buf.
+	 *
+	 * If the size is too small for the list, the ERANGE error should
+	 * be sent.
+	 *
+	 * If this request is answered with an error code of ENOSYS, this is
+	 * treated as a permanent failure with error code EOPNOTSUPP, i.e. all
+	 * future listxattr() requests will fail with EOPNOTSUPP without being
+	 * sent to the filesystem process.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_buf
+	 *   fuse_reply_data
+	 *   fuse_reply_xattr
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param size maximum size of the list to send
+	 */
+	void (*listxattr) (fuse_req_t req, fuse_ino_t ino, size_t size);
+
+	/**
+	 * Remove an extended attribute
+	 *
+	 * If this request is answered with an error code of ENOSYS, this is
+	 * treated as a permanent failure with error code EOPNOTSUPP, i.e. all
+	 * future removexattr() requests will fail with EOPNOTSUPP without being
+	 * sent to the filesystem process.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param name of the extended attribute
+	 */
+	void (*removexattr) (fuse_req_t req, fuse_ino_t ino, const char *name);
+
+	/**
+	 * Check file access permissions
+	 *
+	 * This will be called for the access() and chdir() system
+	 * calls.  If the 'default_permissions' mount option is given,
+	 * this method is not called.
+	 *
+	 * This method is not called under Linux kernel versions 2.4.x
+	 *
+	 * If this request is answered with an error code of ENOSYS, this is
+	 * treated as a permanent success, i.e. this and all future access()
+	 * requests will succeed without being sent to the filesystem process.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param mask requested access mode
+	 */
+	void (*access) (fuse_req_t req, fuse_ino_t ino, int mask);
+
+	/**
+	 * Create and open a file
+	 *
+	 * If the file does not exist, first create it with the specified
+	 * mode, and then open it.
+	 *
+	 * See the description of the open handler for more
+	 * information.
+	 *
+	 * If this method is not implemented or under Linux kernel
+	 * versions earlier than 2.6.15, the mknod() and open() methods
+	 * will be called instead.
+	 *
+	 * If this request is answered with an error code of ENOSYS, the handler
+	 * is treated as not implemented (i.e., for this and future requests the
+	 * mknod() and open() handlers will be called instead).
+	 *
+	 * Valid replies:
+	 *   fuse_reply_create
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param parent inode number of the parent directory
+	 * @param name to create
+	 * @param mode file type and mode with which to create the new file
+	 * @param fi file information
+	 */
+	void (*create) (fuse_req_t req, fuse_ino_t parent, const char *name,
+			mode_t mode, struct fuse_file_info *fi);
+
+	/**
+	 * Test for a POSIX file lock
+	 *
+	 * Valid replies:
+	 *   fuse_reply_lock
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param fi file information
+	 * @param lock the region/type to test
+	 */
+	void (*getlk) (fuse_req_t req, fuse_ino_t ino,
+		       struct fuse_file_info *fi, struct flock *lock);
+
+	/**
+	 * Acquire, modify or release a POSIX file lock
+	 *
+	 * For POSIX threads (NPTL) there's a 1-1 relation between pid and
+	 * owner, but otherwise this is not always the case.  For checking
+	 * lock ownership, 'fi->owner' must be used.  The l_pid field in
+	 * 'struct flock' should only be used to fill in this field in
+	 * getlk().
+	 *
+	 * Note: if the locking methods are not implemented, the kernel
+	 * will still allow file locking to work locally.  Hence these are
+	 * only interesting for network filesystems and similar.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param fi file information
+	 * @param lock the region/type to set
+	 * @param sleep locking operation may sleep
+	 */
+	void (*setlk) (fuse_req_t req, fuse_ino_t ino,
+		       struct fuse_file_info *fi,
+		       struct flock *lock, int sleep);
+
+	/**
+	 * Map block index within file to block index within device
+	 *
+	 * Note: This makes sense only for block device backed filesystems
+	 * mounted with the 'blkdev' option
+	 *
+	 * If this request is answered with an error code of ENOSYS, this is
+	 * treated as a permanent failure, i.e. all future bmap() requests will
+	 * fail with the same error code without being sent to the filesystem
+	 * process.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_bmap
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param blocksize unit of block index
+	 * @param idx block index within file
+	 */
+	void (*bmap) (fuse_req_t req, fuse_ino_t ino, size_t blocksize,
+		      uint64_t idx);
+
+	/**
+	 * Ioctl
+	 *
+	 * Note: For unrestricted ioctls (not allowed for FUSE
+	 * servers), data in and out areas can be discovered by giving
+	 * iovs and setting FUSE_IOCTL_RETRY in *flags*.  For
+	 * restricted ioctls, the kernel prepares the in/out data area
+	 * according to the information encoded in cmd.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_ioctl_retry
+	 *   fuse_reply_ioctl
+	 *   fuse_reply_ioctl_iov
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param cmd ioctl command
+	 * @param arg ioctl argument
+	 * @param fi file information
+	 * @param flags for FUSE_IOCTL_* flags
+	 * @param in_buf data fetched from the caller
+	 * @param in_bufsz number of fetched bytes
+	 * @param out_bufsz maximum size of output data
+	 *
+	 * Note: the unsigned long request submitted by the application
+	 * is truncated to 32 bits.
+	 */
+	void (*ioctl) (fuse_req_t req, fuse_ino_t ino, unsigned int cmd,
+		       void *arg, struct fuse_file_info *fi, unsigned flags,
+		       const void *in_buf, size_t in_bufsz, size_t out_bufsz);
+
+	/**
+	 * Poll for IO readiness
+	 *
+	 * Note: If ph is non-NULL, the client should notify
+	 * when IO readiness events occur by calling
+	 * fuse_lowlevel_notify_poll() with the specified ph.
+	 *
+	 * Regardless of the number of times poll with a non-NULL ph
+	 * is received, a single notification is enough to clear all.
+	 * Notifying more times incurs overhead but doesn't harm
+	 * correctness.
+	 *
+	 * The callee is responsible for destroying ph with
+	 * fuse_pollhandle_destroy() when no longer in use.
+	 *
+	 * If this request is answered with an error code of ENOSYS, this is
+	 * treated as success (with a kernel-defined default poll-mask) and
+	 * future calls to poll() will succeed the same way without being sent
+	 * to the filesystem process.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_poll
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param fi file information
+	 * @param ph poll handle to be used for notification
+	 */
+	void (*poll) (fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
+		      struct fuse_pollhandle *ph);
+
+	/**
+	 * Write data made available in a buffer
+	 *
+	 * This is a more generic version of the ->write() method.  If
+	 * FUSE_CAP_SPLICE_READ is set in fuse_conn_info.want and the
+	 * kernel supports splicing from the fuse device, then the
+	 * data will be made available in a pipe for supporting zero
+	 * copy data transfer.
+	 *
+	 * bufv->count is guaranteed to be one (and thus bufv->idx is
+	 * always zero). The write_buf handler must ensure that
+	 * bufv->off is correctly updated (reflecting the number of
+	 * bytes read from bufv->buf[0]).
+	 *
+	 * Unless FUSE_CAP_HANDLE_KILLPRIV is disabled, this method is
+	 * expected to reset the setuid and setgid bits.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_write
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param bufv buffer containing the data
+	 * @param off offset to write to
+	 * @param fi file information
+	 */
+	void (*write_buf) (fuse_req_t req, fuse_ino_t ino,
+			   struct fuse_bufvec *bufv, off_t off,
+			   struct fuse_file_info *fi);
+
+	/**
+	 * Callback function for the retrieve request
+	 *
+	 * Valid replies:
+	 *   fuse_reply_none
+	 *
+	 * @param req request handle
+	 * @param cookie user data supplied to fuse_lowlevel_notify_retrieve()
+	 * @param ino the inode number supplied to fuse_lowlevel_notify_retrieve()
+	 * @param offset the offset supplied to fuse_lowlevel_notify_retrieve()
+	 * @param bufv the buffer containing the returned data
+	 */
+	void (*retrieve_reply) (fuse_req_t req, void *cookie, fuse_ino_t ino,
+				off_t offset, struct fuse_bufvec *bufv);
+
+	/**
+	 * Forget about multiple inodes
+	 *
+	 * See description of the forget function for more
+	 * information.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_none
+	 *
+	 * @param req request handle
+	 */
+	void (*forget_multi) (fuse_req_t req, size_t count,
+			      struct fuse_forget_data *forgets);
+
+	/**
+	 * Acquire, modify or release a BSD file lock
+	 *
+	 * Note: if the locking methods are not implemented, the kernel
+	 * will still allow file locking to work locally.  Hence these are
+	 * only interesting for network filesystems and similar.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param fi file information
+	 * @param op the locking operation, see flock(2)
+	 */
+	void (*flock) (fuse_req_t req, fuse_ino_t ino,
+		       struct fuse_file_info *fi, int op);
+
+	/**
+	 * Allocate requested space. If this function returns success then
+	 * subsequent writes to the specified range shall not fail due to the lack
+	 * of free space on the file system storage media.
+	 *
+	 * If this request is answered with an error code of ENOSYS, this is
+	 * treated as a permanent failure with error code EOPNOTSUPP, i.e. all
+	 * future fallocate() requests will fail with EOPNOTSUPP without being
+	 * sent to the filesystem process.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param offset starting point for allocated region
+	 * @param length size of allocated region
+	 * @param mode determines the operation to be performed on the given range,
+	 *             see fallocate(2)
+	 */
+	void (*fallocate) (fuse_req_t req, fuse_ino_t ino, int mode,
+		       off_t offset, off_t length, struct fuse_file_info *fi);
+
+	/**
+	 * Read directory with attributes
+	 *
+	 * Send a buffer filled using fuse_add_direntry_plus(), with size not
+	 * exceeding the requested size.  Send an empty buffer on end of
+	 * stream.
+	 *
+	 * fi->fh will contain the value set by the opendir method, or
+	 * will be undefined if the opendir method didn't set any value.
+	 *
+	 * In contrast to readdir() (which does not affect the lookup counts),
+	 * the lookup count of every entry returned by readdirplus(), except "."
+	 * and "..", is incremented by one.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_buf
+	 *   fuse_reply_data
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param size maximum number of bytes to send
+	 * @param off offset to continue reading the directory stream
+	 * @param fi file information
+	 */
+	void (*readdirplus) (fuse_req_t req, fuse_ino_t ino, size_t size, off_t off,
+			 struct fuse_file_info *fi);
+
+	/**
+	 * Copy a range of data from one file to another
+	 *
+	 * Performs an optimized copy between two file descriptors without the
+	 * additional cost of transferring data through the FUSE kernel module
+	 * to user space (glibc) and then back into the FUSE filesystem again.
+	 *
+	 * In case this method is not implemented, glibc falls back to reading
+	 * data from the source and writing to the destination. Effectively
+	 * doing an inefficient copy of the data.
+	 *
+	 * If this request is answered with an error code of ENOSYS, this is
+	 * treated as a permanent failure with error code EOPNOTSUPP, i.e. all
+	 * future copy_file_range() requests will fail with EOPNOTSUPP without
+	 * being sent to the filesystem process.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_write
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino_in the inode number of the source file
+	 * @param off_in starting point from where the data should be read
+	 * @param fi_in file information of the source file
+	 * @param ino_out the inode number of the destination file
+	 * @param off_out starting point where the data should be written
+	 * @param fi_out file information of the destination file
+	 * @param len maximum size of the data to copy
+	 * @param flags passed along with the copy_file_range() syscall
+	 */
+	void (*copy_file_range) (fuse_req_t req, fuse_ino_t ino_in,
+				 off_t off_in, struct fuse_file_info *fi_in,
+				 fuse_ino_t ino_out, off_t off_out,
+				 struct fuse_file_info *fi_out, size_t len,
+				 int flags);
+
+	/**
+	 * Find next data or hole after the specified offset
+	 *
+	 * If this request is answered with an error code of ENOSYS, this is
+	 * treated as a permanent failure, i.e. all future lseek() requests will
+	 * fail with the same error code without being sent to the filesystem
+	 * process.
+	 *
+	 * Valid replies:
+	 *   fuse_reply_lseek
+	 *   fuse_reply_err
+	 *
+	 * @param req request handle
+	 * @param ino the inode number
+	 * @param off offset to start search from
+	 * @param whence either SEEK_DATA or SEEK_HOLE
+	 * @param fi file information
+	 */
+	void (*lseek) (fuse_req_t req, fuse_ino_t ino, off_t off, int whence,
+		       struct fuse_file_info *fi);
+};
+
+/**
+ * Reply with an error code or success.
+ *
+ * Possible requests:
+ *   all except forget
+ *
+ * Wherever possible, error codes should be chosen from the list of
+ * documented error conditions in the corresponding system call's
+ * manpage.
+ *
+ * An error code of ENOSYS is sometimes treated specially. This is
+ * indicated in the documentation of the affected handler functions.
+ *
+ * The following requests may be answered with a zero error code:
+ * unlink, rmdir, rename, flush, release, fsync, fsyncdir, setxattr,
+ * removexattr, setlk.
+ *
+ * @param req request handle
+ * @param err the positive error value, or zero for success
+ * @return zero for success, -errno for failure to send reply
+ */
+int fuse_reply_err(fuse_req_t req, int err);
+
+/**
+ * Don't send reply
+ *
+ * Possible requests:
+ *   forget
+ *   forget_multi
+ *   retrieve_reply
+ *
+ * @param req request handle
+ */
+void fuse_reply_none(fuse_req_t req);
+
+/**
+ * Reply with a directory entry
+ *
+ * Possible requests:
+ *   lookup, mknod, mkdir, symlink, link
+ *
+ * Side effects:
+ *   increments the lookup count on success
+ *
+ * @param req request handle
+ * @param e the entry parameters
+ * @return zero for success, -errno for failure to send reply
+ */
+int fuse_reply_entry(fuse_req_t req, const struct fuse_entry_param *e);
+
+/**
+ * Reply with a directory entry and open parameters
+ *
+ * currently the following members of 'fi' are used:
+ *   fh, direct_io, keep_cache
+ *
+ * Possible requests:
+ *   create
+ *
+ * Side effects:
+ *   increments the lookup count on success
+ *
+ * @param req request handle
+ * @param e the entry parameters
+ * @param fi file information
+ * @return zero for success, -errno for failure to send reply
+ */
+int fuse_reply_create(fuse_req_t req, const struct fuse_entry_param *e,
+		      const struct fuse_file_info *fi);
+
+/**
+ * Reply with attributes
+ *
+ * Possible requests:
+ *   getattr, setattr
+ *
+ * @param req request handle
+ * @param attr the attributes
+ * @param attr_timeout	validity timeout (in seconds) for the attributes
+ * @return zero for success, -errno for failure to send reply
+ */
+int fuse_reply_attr(fuse_req_t req, const struct stat *attr,
+		    double attr_timeout);
+
+/**
+ * Reply with the contents of a symbolic link
+ *
+ * Possible requests:
+ *   readlink
+ *
+ * @param req request handle
+ * @param link symbolic link contents
+ * @return zero for success, -errno for failure to send reply
+ */
+int fuse_reply_readlink(fuse_req_t req, const char *link);
+
+/**
+ * Reply with open parameters
+ *
+ * currently the following members of 'fi' are used:
+ *   fh, direct_io, keep_cache
+ *
+ * Possible requests:
+ *   open, opendir
+ *
+ * @param req request handle
+ * @param fi file information
+ * @return zero for success, -errno for failure to send reply
+ */
+int fuse_reply_open(fuse_req_t req, const struct fuse_file_info *fi);
+
+/**
+ * Reply with number of bytes written
+ *
+ * Possible requests:
+ *   write
+ *
+ * @param req request handle
+ * @param count the number of bytes written
+ * @return zero for success, -errno for failure to send reply
+ */
+int fuse_reply_write(fuse_req_t req, size_t count);
+
+/**
+ * Reply with data
+ *
+ * Possible requests:
+ *   read, readdir, getxattr, listxattr
+ *
+ * @param req request handle
+ * @param buf buffer containing data
+ * @param size the size of data in bytes
+ * @return zero for success, -errno for failure to send reply
+ */
+int fuse_reply_buf(fuse_req_t req, const char *buf, size_t size);
+
+/**
+ * Reply with data copied/moved from buffer(s)
+ *
+ * Zero copy data transfer ("splicing") will be used under
+ * the following circumstances:
+ *
+ * 1. FUSE_CAP_SPLICE_WRITE is set in fuse_conn_info.want, and
+ * 2. the kernel supports splicing from the fuse device
+ *    (FUSE_CAP_SPLICE_WRITE is set in fuse_conn_info.capable), and
+ * 3. *flags* does not contain FUSE_BUF_NO_SPLICE, and
+ * 4. The amount of data that is provided in file-descriptor backed
+ *    buffers (i.e., buffers for which bufv[n].flags == FUSE_BUF_FD)
+ *    is at least twice the page size.
+ *
+ * In order for SPLICE_F_MOVE to be used, the following additional
+ * conditions have to be fulfilled:
+ *
+ * 1. FUSE_CAP_SPLICE_MOVE is set in fuse_conn_info.want, and
+ * 2. the kernel supports it (i.e., FUSE_CAP_SPLICE_MOVE is set in
+ *    fuse_conn_info.capable), and
+ * 3. *flags* contains FUSE_BUF_SPLICE_MOVE
+ *
+ * Note that, if splice is used, the data is actually spliced twice:
+ * once into a temporary pipe (to prepend header data), and then again
+ * into the kernel. If some of the provided buffers are memory-backed,
+ * the data in them is copied in step one and spliced in step two.
+ *
+ * The FUSE_BUF_SPLICE_FORCE_SPLICE and FUSE_BUF_SPLICE_NONBLOCK flags
+ * are silently ignored.
+ *
+ * Possible requests:
+ *   read, readdir, getxattr, listxattr
+ *
+ * Side effects:
+ *   when used to return data from a readdirplus() (but not readdir())
+ *   call, increments the lookup count of each returned entry by one
+ *   on success.
+ *
+ * @param req request handle
+ * @param bufv buffer vector
+ * @param flags flags controlling the copy
+ * @return zero for success, -errno for failure to send reply
+ */
+int fuse_reply_data(fuse_req_t req, struct fuse_bufvec *bufv,
+		    enum fuse_buf_copy_flags flags);
+
+/**
+ * Reply with data vector
+ *
+ * Possible requests:
+ *   read, readdir, getxattr, listxattr
+ *
+ * @param req request handle
+ * @param iov the vector containing the data
+ * @param count the number of entries in the vector
+ * @return zero for success, -errno for failure to send reply
+ */
+int fuse_reply_iov(fuse_req_t req, const struct iovec *iov, int count);
+
+/**
+ * Reply with filesystem statistics
+ *
+ * Possible requests:
+ *   statfs
+ *
+ * @param req request handle
+ * @param stbuf filesystem statistics
+ * @return zero for success, -errno for failure to send reply
+ */
+int fuse_reply_statfs(fuse_req_t req, const struct statvfs *stbuf);
+
+/**
+ * Reply with needed buffer size
+ *
+ * Possible requests:
+ *   getxattr, listxattr
+ *
+ * @param req request handle
+ * @param count the buffer size needed in bytes
+ * @return zero for success, -errno for failure to send reply
+ */
+int fuse_reply_xattr(fuse_req_t req, size_t count);
+
+/**
+ * Reply with file lock information
+ *
+ * Possible requests:
+ *   getlk
+ *
+ * @param req request handle
+ * @param lock the lock information
+ * @return zero for success, -errno for failure to send reply
+ */
+int fuse_reply_lock(fuse_req_t req, const struct flock *lock);
+
+/**
+ * Reply with block index
+ *
+ * Possible requests:
+ *   bmap
+ *
+ * @param req request handle
+ * @param idx block index within device
+ * @return zero for success, -errno for failure to send reply
+ */
+int fuse_reply_bmap(fuse_req_t req, uint64_t idx);
+
+/* ----------------------------------------------------------- *
+ * Filling a buffer in readdir				       *
+ * ----------------------------------------------------------- */
+
+/**
+ * Add a directory entry to the buffer
+ *
+ * Buffer needs to be large enough to hold the entry.  If it's not,
+ * then the entry is not filled in but the size of the entry is still
+ * returned.  The caller can check this by comparing the bufsize
+ * parameter with the returned entry size.  If the entry size is
+ * larger than the buffer size, the operation failed.
+ *
+ * From the 'stbuf' argument the st_ino field and bits 12-15 of the
+ * st_mode field are used.  The other fields are ignored.
+ *
+ * *off* should be any non-zero value that the filesystem can use to
+ * identify the current point in the directory stream. It does not
+ * need to be the actual physical position. A value of zero is
+ * reserved to mean "from the beginning", and should therefore never
+ * be used (the first call to fuse_add_direntry should be passed the
+ * offset of the second directory entry).
+ *
+ * @param req request handle
+ * @param buf the point where the new entry will be added to the buffer
+ * @param bufsize remaining size of the buffer
+ * @param name the name of the entry
+ * @param stbuf the file attributes
+ * @param off the offset of the next entry
+ * @return the space needed for the entry
+ */
+size_t fuse_add_direntry(fuse_req_t req, char *buf, size_t bufsize,
+			 const char *name, const struct stat *stbuf,
+			 off_t off);
+
+/**
+ * Add a directory entry to the buffer with the attributes
+ *
+ * See documentation of `fuse_add_direntry()` for more details.
+ *
+ * @param req request handle
+ * @param buf the point where the new entry will be added to the buffer
+ * @param bufsize remaining size of the buffer
+ * @param name the name of the entry
+ * @param e the directory entry
+ * @param off the offset of the next entry
+ * @return the space needed for the entry
+ */
+size_t fuse_add_direntry_plus(fuse_req_t req, char *buf, size_t bufsize,
+			      const char *name,
+			      const struct fuse_entry_param *e, off_t off);
+
+/**
+ * Reply to ask for data fetch and output buffer preparation.  ioctl
+ * will be retried with the specified input data fetched and output
+ * buffer prepared.
+ *
+ * Possible requests:
+ *   ioctl
+ *
+ * @param req request handle
+ * @param in_iov iovec specifying data to fetch from the caller
+ * @param in_count number of entries in in_iov
+ * @param out_iov iovec specifying addresses to write output to
+ * @param out_count number of entries in out_iov
+ * @return zero for success, -errno for failure to send reply
+ */
+int fuse_reply_ioctl_retry(fuse_req_t req,
+			   const struct iovec *in_iov, size_t in_count,
+			   const struct iovec *out_iov, size_t out_count);
+
+/**
+ * Reply to finish ioctl
+ *
+ * Possible requests:
+ *   ioctl
+ *
+ * @param req request handle
+ * @param result result to be passed to the caller
+ * @param buf buffer containing output data
+ * @param size length of output data
+ */
+int fuse_reply_ioctl(fuse_req_t req, int result, const void *buf, size_t size);
+
+/**
+ * Reply to finish ioctl with iov buffer
+ *
+ * Possible requests:
+ *   ioctl
+ *
+ * @param req request handle
+ * @param result result to be passed to the caller
+ * @param iov the vector containing the data
+ * @param count the number of entries in the vector
+ */
+int fuse_reply_ioctl_iov(fuse_req_t req, int result, const struct iovec *iov,
+			 int count);
+
+/**
+ * Reply with poll result event mask
+ *
+ * @param req request handle
+ * @param revents poll result event mask
+ */
+int fuse_reply_poll(fuse_req_t req, unsigned revents);
+
+/**
+ * Reply with offset
+ *
+ * Possible requests:
+ *   lseek
+ *
+ * @param req request handle
+ * @param off offset of next data or hole
+ * @return zero for success, -errno for failure to send reply
+ */
+int fuse_reply_lseek(fuse_req_t req, off_t off);
+
+/* ----------------------------------------------------------- *
+ * Notification						       *
+ * ----------------------------------------------------------- */
+
+/**
+ * Notify IO readiness event
+ *
+ * For more information, please read comment for poll operation.
+ *
+ * @param ph poll handle to notify IO readiness event for
+ */
+int fuse_lowlevel_notify_poll(struct fuse_pollhandle *ph);
+
+/**
+ * Notify to invalidate cache for an inode.
+ *
+ * Added in FUSE protocol version 7.12. If the kernel does not support
+ * this (or a newer) version, the function will return -ENOSYS and do
+ * nothing.
+ *
+ * If the filesystem has writeback caching enabled, invalidating an
+ * inode will first trigger a writeback of all dirty pages. The call
+ * will block until all writeback requests have completed and the
+ * inode has been invalidated. It will, however, not wait for
+ * completion of pending writeback requests that have been issued
+ * before.
+ *
+ * If there are no dirty pages, this function will never block.
+ *
+ * @param se the session object
+ * @param ino the inode number
+ * @param off the offset in the inode where to start invalidating
+ *            or negative to invalidate attributes only
+ * @param len the amount of cache to invalidate or 0 for all
+ * @return zero for success, -errno for failure
+ */
+int fuse_lowlevel_notify_inval_inode(struct fuse_session *se, fuse_ino_t ino,
+				     off_t off, off_t len);
+
+/**
+ * Notify to invalidate parent attributes and the dentry matching
+ * parent/name
+ *
+ * To avoid a deadlock this function must not be called in the
+ * execution path of a related filesystem operation or within any code
+ * that could hold a lock that could be needed to execute such an
+ * operation. As of kernel 4.18, a "related operation" is a lookup(),
+ * symlink(), mknod(), mkdir(), unlink(), rename(), link() or create()
+ * request for the parent, and a setattr(), unlink(), rmdir(),
+ * rename(), setxattr(), removexattr(), readdir() or readdirplus()
+ * request for the inode itself.
+ *
+ * When called correctly, this function will never block.
+ *
+ * Added in FUSE protocol version 7.12. If the kernel does not support
+ * this (or a newer) version, the function will return -ENOSYS and do
+ * nothing.
+ *
+ * @param se the session object
+ * @param parent inode number
+ * @param name file name
+ * @param namelen strlen() of file name
+ * @return zero for success, -errno for failure
+ */
+int fuse_lowlevel_notify_inval_entry(struct fuse_session *se, fuse_ino_t parent,
+				     const char *name, size_t namelen);
+
+/**
+ * This function behaves like fuse_lowlevel_notify_inval_entry() with
+ * the following additional effect (at least as of Linux kernel 4.8):
+ *
+ * If the provided *child* inode matches the inode that is currently
+ * associated with the cached dentry, and if there are any inotify
+ * watches registered for the dentry, then the watchers are informed
+ * that the dentry has been deleted.
+ *
+ * To avoid a deadlock this function must not be called while
+ * executing a related filesystem operation or while holding a lock
+ * that could be needed to execute such an operation (see the
+ * description of fuse_lowlevel_notify_inval_entry() for more
+ * details).
+ *
+ * When called correctly, this function will never block.
+ *
+ * Added in FUSE protocol version 7.18. If the kernel does not support
+ * this (or a newer) version, the function will return -ENOSYS and do
+ * nothing.
+ *
+ * @param se the session object
+ * @param parent inode number
+ * @param child inode number
+ * @param name file name
+ * @param namelen strlen() of file name
+ * @return zero for success, -errno for failure
+ */
+int fuse_lowlevel_notify_delete(struct fuse_session *se,
+				fuse_ino_t parent, fuse_ino_t child,
+				const char *name, size_t namelen);
+
+/**
+ * Store data to the kernel buffers
+ *
+ * Synchronously store data in the kernel buffers belonging to the
+ * given inode.  The stored data is marked up-to-date (no read will be
+ * performed against it, unless it's invalidated or evicted from the
+ * cache).
+ *
+ * If the stored data overflows the current file size, then the size
+ * is extended, similarly to a write(2) on the filesystem.
+ *
+ * If this function returns an error, then the store wasn't fully
+ * completed, but it may have been partially completed.
+ *
+ * Added in FUSE protocol version 7.15. If the kernel does not support
+ * this (or a newer) version, the function will return -ENOSYS and do
+ * nothing.
+ *
+ * @param se the session object
+ * @param ino the inode number
+ * @param offset the starting offset into the file to store to
+ * @param bufv buffer vector
+ * @param flags flags controlling the copy
+ * @return zero for success, -errno for failure
+ */
+int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
+			       off_t offset, struct fuse_bufvec *bufv,
+			       enum fuse_buf_copy_flags flags);
+/**
+ * Retrieve data from the kernel buffers
+ *
+ * Retrieve data in the kernel buffers belonging to the given inode.
+ * If successful then the retrieve_reply() method will be called with
+ * the returned data.
+ *
+ * Only present pages are returned in the retrieve reply.  Retrieving
+ * stops when it finds a non-present page and only data prior to that
+ * is returned.
+ *
+ * If this function returns an error, then the retrieve will not be
+ * completed and no reply will be sent.
+ *
+ * This function doesn't change the dirty state of pages in the kernel
+ * buffer.  For dirty pages the write() method will be called
+ * regardless of having been retrieved previously.
+ *
+ * Added in FUSE protocol version 7.15. If the kernel does not support
+ * this (or a newer) version, the function will return -ENOSYS and do
+ * nothing.
+ *
+ * @param se the session object
+ * @param ino the inode number
+ * @param size the number of bytes to retrieve
+ * @param offset the starting offset into the file to retrieve from
+ * @param cookie user data to supply to the reply callback
+ * @return zero for success, -errno for failure
+ */
+int fuse_lowlevel_notify_retrieve(struct fuse_session *se, fuse_ino_t ino,
+				  size_t size, off_t offset, void *cookie);
+
+
+/* ----------------------------------------------------------- *
+ * Utility functions					       *
+ * ----------------------------------------------------------- */
+
+/**
+ * Get the userdata from the request
+ *
+ * @param req request handle
+ * @return the user data passed to fuse_session_new()
+ */
+void *fuse_req_userdata(fuse_req_t req);
+
+/**
+ * Get the context from the request
+ *
+ * The pointer returned by this function will only be valid for the
+ * request's lifetime.
+ *
+ * @param req request handle
+ * @return the context structure
+ */
+const struct fuse_ctx *fuse_req_ctx(fuse_req_t req);
+
+/**
+ * Get the current supplementary group IDs for the specified request
+ *
+ * Similar to the getgroups(2) system call, except the return value is
+ * always the total number of group IDs, even if it is larger than the
+ * specified size.
+ *
+ * The current fuse kernel module in Linux (as of 2.6.30) doesn't pass
+ * the group list to userspace, hence this function needs to parse
+ * "/proc/$TID/task/$TID/status" to get the group IDs.
+ *
+ * This feature may not be supported on all operating systems.  In
+ * such a case this function will return -ENOSYS.
+ *
+ * @param req request handle
+ * @param size size of given array
+ * @param list array of group IDs to be filled in
+ * @return the total number of supplementary group IDs or -errno on failure
+ */
+int fuse_req_getgroups(fuse_req_t req, int size, gid_t list[]);
+
+/**
+ * Callback function for an interrupt
+ *
+ * @param req interrupted request
+ * @param data user data
+ */
+typedef void (*fuse_interrupt_func_t)(fuse_req_t req, void *data);
+
+/**
+ * Register/unregister callback for an interrupt
+ *
+ * If an interrupt has already happened, then the callback function is
+ * called from within this function, hence it's not possible for
+ * interrupts to be lost.
+ *
+ * @param req request handle
+ * @param func the callback function or NULL for unregister
+ * @param data user data passed to the callback function
+ */
+void fuse_req_interrupt_func(fuse_req_t req, fuse_interrupt_func_t func,
+			     void *data);
+
+/**
+ * Check if a request has already been interrupted
+ *
+ * @param req request handle
+ * @return 1 if the request has been interrupted, 0 otherwise
+ */
+int fuse_req_interrupted(fuse_req_t req);
+
+
+/* ----------------------------------------------------------- *
+ * Inquiry functions                                           *
+ * ----------------------------------------------------------- */
+
+/**
+ * Print low-level version information to stdout.
+ */
+void fuse_lowlevel_version(void);
+
+/**
+ * Print available low-level options to stdout. This is not an
+ * exhaustive list, but includes only those options that may be of
+ * interest to an end-user of a file system.
+ */
+void fuse_lowlevel_help(void);
+
+/**
+ * Print available options for `fuse_parse_cmdline()`.
+ */
+void fuse_cmdline_help(void);
+
+/* ----------------------------------------------------------- *
+ * Filesystem setup & teardown                                 *
+ * ----------------------------------------------------------- */
+
+struct fuse_cmdline_opts {
+	int singlethread;
+	int foreground;
+	int debug;
+	int nodefault_subtype;
+	char *mountpoint;
+	int show_version;
+	int show_help;
+	int clone_fd;
+	unsigned int max_idle_threads;
+};
+
+/**
+ * Utility function to parse common options for simple file systems
+ * using the low-level API. A help text that describes the available
+ * options can be printed with `fuse_cmdline_help`. A single
+ * non-option argument is treated as the mountpoint. Multiple
+ * non-option arguments will result in an error.
+ *
+ * If neither -o subtype= nor -o fsname= options are given, a new
+ * subtype option will be added and set to the basename of the program
+ * (the fsname will remain unset, and then defaults to "fuse").
+ *
+ * Known options will be removed from *args*, unknown options will
+ * remain.
+ *
+ * @param args argument vector (input+output)
+ * @param opts output argument for parsed options
+ * @return 0 on success, -1 on failure
+ */
+int fuse_parse_cmdline(struct fuse_args *args,
+		       struct fuse_cmdline_opts *opts);
+
+/**
+ * Create a low level session.
+ *
+ * Returns a session structure suitable for passing to
+ * fuse_session_mount() and fuse_session_loop().
+ *
+ * This function accepts most file-system independent mount options
+ * (like context, nodev, ro - see mount(8)), as well as the general
+ * fuse mount options listed in mount.fuse(8) (e.g. -o allow_root and
+ * -o default_permissions, but not ``-o use_ino``).  Instead of `-o
+ * debug`, debugging may also be enabled with `-d` or `--debug`.
+ *
+ * If not all options are known, an error message is written to stderr
+ * and the function returns NULL.
+ *
+ * Option parsing skips argv[0], which is assumed to contain the
+ * program name. To prevent accidentally passing an option in
+ * argv[0], this element must always be present (even if no options
+ * are specified). It may be set to the empty string ('\0') if no
+ * reasonable value can be provided.
+ *
+ * @param args argument vector
+ * @param op the (low-level) filesystem operations
+ * @param op_size sizeof(struct fuse_lowlevel_ops)
+ * @param userdata user data
+ *
+ * @return the fuse session on success, NULL on failure
+ **/
+struct fuse_session *fuse_session_new(struct fuse_args *args,
+				      const struct fuse_lowlevel_ops *op,
+				      size_t op_size, void *userdata);
+
+/**
+ * Mount a FUSE file system.
+ *
+ * @param mountpoint the mount point path
+ * @param se session object
+ *
+ * @return 0 on success, -1 on failure.
+ **/
+int fuse_session_mount(struct fuse_session *se, const char *mountpoint);
+
+/**
+ * Enter a single threaded, blocking event loop.
+ *
+ * When the event loop terminates because the connection to the FUSE
+ * kernel module has been closed, this function returns zero. This
+ * happens when the filesystem is unmounted regularly (by the
+ * filesystem owner or root running the umount(8) or fusermount(1)
+ * command), or if the connection is explicitly severed by writing ``1``
+ * to the ``abort`` file in ``/sys/fs/fuse/connections/NNN``. The only
+ * way to distinguish between these two conditions is to check if the
+ * filesystem is still mounted after the session loop returns.
+ *
+ * When some error occurs during request processing, the function
+ * returns a negated errno(3) value.
+ *
+ * If the loop has been terminated because of a signal handler
+ * installed by fuse_set_signal_handlers(), this function returns the
+ * (positive) signal value that triggered the exit.
+ *
+ * @param se the session
+ * @return 0, -errno, or a signal value
+ */
+int fuse_session_loop(struct fuse_session *se);
+
+/**
+ * Enter a multi-threaded event loop.
+ *
+ * For a description of the return value and the conditions when the
+ * event loop exits, refer to the documentation of
+ * fuse_session_loop().
+ *
+ * @param se the session
+ * @param config session loop configuration
+ * @return see fuse_session_loop()
+ */
+#if FUSE_USE_VERSION < 32
+int fuse_session_loop_mt_31(struct fuse_session *se, int clone_fd);
+#define fuse_session_loop_mt(se, clone_fd) fuse_session_loop_mt_31(se, clone_fd)
+#else
+int fuse_session_loop_mt(struct fuse_session *se, struct fuse_loop_config *config);
+#endif
+
+/**
+ * Flag a session as terminated.
+ *
+ * This function is invoked by the POSIX signal handlers, when
+ * registered using fuse_set_signal_handlers(). It will cause any
+ * running event loops to terminate on the next opportunity.
+ *
+ * @param se the session
+ */
+void fuse_session_exit(struct fuse_session *se);
+
+/**
+ * Reset the terminated flag of a session
+ *
+ * @param se the session
+ */
+void fuse_session_reset(struct fuse_session *se);
+
+/**
+ * Query the terminated flag of a session
+ *
+ * @param se the session
+ * @return 1 if exited, 0 if not exited
+ */
+int fuse_session_exited(struct fuse_session *se);
+
+/**
+ * Ensure that file system is unmounted.
+ *
+ * In regular operation, the file system is typically unmounted by the
+ * user calling umount(8) or fusermount(1), which then terminates the
+ * FUSE session loop. However, the session loop may also terminate as
+ * a result of an explicit call to fuse_session_exit() (e.g. by a
+ * signal handler installed by fuse_set_signal_handlers()). In this
+ * case the filesystem remains mounted, but any attempt to access it
+ * will block (while the filesystem process is still running) or give
+ * an ESHUTDOWN error (after the filesystem process has terminated).
+ *
+ * If the communication channel with the FUSE kernel module is still
+ * open (i.e., if the session loop was terminated by an explicit call
+ * to fuse_session_exit()), this function will close it and unmount
+ * the filesystem. If the communication channel has been closed by the
+ * kernel, this method will do (almost) nothing.
+ *
+ * NOTE: The above semantics mean that if the connection to the kernel
+ * is terminated via the ``/sys/fs/fuse/connections/NNN/abort`` file,
+ * this method will *not* unmount the filesystem.
+ *
+ * @param se the session
+ */
+void fuse_session_unmount(struct fuse_session *se);
+
+/**
+ * Destroy a session
+ *
+ * @param se the session
+ */
+void fuse_session_destroy(struct fuse_session *se);
+
+/* ----------------------------------------------------------- *
+ * Custom event loop support                                   *
+ * ----------------------------------------------------------- */
+
+/**
+ * Return file descriptor for communication with kernel.
+ *
+ * The file descriptor can be used to integrate FUSE with a custom event
+ * loop. Whenever data is available for reading on the provided fd,
+ * the event loop should call `fuse_session_receive_buf` followed by
+ * `fuse_session_process_buf` to process the request.
+ *
+ * The returned file descriptor is valid until `fuse_session_unmount`
+ * is called.
+ *
+ * @param se the session
+ * @return a file descriptor
+ */
+int fuse_session_fd(struct fuse_session *se);
+
+/**
+ * Process a raw request supplied in a generic buffer
+ *
+ * The fuse_buf may contain a memory buffer or a pipe file descriptor.
+ *
+ * @param se the session
+ * @param buf the fuse_buf containing the request
+ */
+void fuse_session_process_buf(struct fuse_session *se,
+			      const struct fuse_buf *buf);
+
+/**
+ * Read a raw request from the kernel into the supplied buffer.
+ *
+ * Depending on file system options, system capabilities, and request
+ * size the request is either read into a memory buffer or spliced
+ * into a temporary pipe.
+ *
+ * @param se the session
+ * @param buf the fuse_buf to store the request in
+ * @return the actual size of the raw request, or -errno on error
+ */
+int fuse_session_receive_buf(struct fuse_session *se, struct fuse_buf *buf);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* FUSE_LOWLEVEL_H_ */
diff --git a/tools/virtiofsd/fuse_misc.h b/tools/virtiofsd/fuse_misc.h
new file mode 100644
index 0000000000..2f6663ed7d
--- /dev/null
+++ b/tools/virtiofsd/fuse_misc.h
@@ -0,0 +1,59 @@
+/*
+  FUSE: Filesystem in Userspace
+  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
+
+  This program can be distributed under the terms of the GNU LGPLv2.
+  See the file COPYING.LIB
+*/
+
+#include <pthread.h>
+
+/*
+  Versioned symbols cannot be used in some cases because they
+    - confuse the dynamic linker in uClibc
+    - are not supported on Mac OS X (in the Mach-O binary format)
+*/
+#if (!defined(__UCLIBC__) && !defined(__APPLE__))
+#define FUSE_SYMVER(x) __asm__(x)
+#else
+#define FUSE_SYMVER(x)
+#endif
+
+#ifndef USE_UCLIBC
+#define fuse_mutex_init(mut) pthread_mutex_init(mut, NULL)
+#else
+/* Is this hack still needed? */
+static inline void fuse_mutex_init(pthread_mutex_t *mut)
+{
+	pthread_mutexattr_t attr;
+	pthread_mutexattr_init(&attr);
+	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
+	pthread_mutex_init(mut, &attr);
+	pthread_mutexattr_destroy(&attr);
+}
+#endif
+
+#ifdef HAVE_STRUCT_STAT_ST_ATIM
+/* Linux */
+#define ST_ATIM_NSEC(stbuf) ((stbuf)->st_atim.tv_nsec)
+#define ST_CTIM_NSEC(stbuf) ((stbuf)->st_ctim.tv_nsec)
+#define ST_MTIM_NSEC(stbuf) ((stbuf)->st_mtim.tv_nsec)
+#define ST_ATIM_NSEC_SET(stbuf, val) (stbuf)->st_atim.tv_nsec = (val)
+#define ST_CTIM_NSEC_SET(stbuf, val) (stbuf)->st_ctim.tv_nsec = (val)
+#define ST_MTIM_NSEC_SET(stbuf, val) (stbuf)->st_mtim.tv_nsec = (val)
+#elif defined(HAVE_STRUCT_STAT_ST_ATIMESPEC)
+/* FreeBSD */
+#define ST_ATIM_NSEC(stbuf) ((stbuf)->st_atimespec.tv_nsec)
+#define ST_CTIM_NSEC(stbuf) ((stbuf)->st_ctimespec.tv_nsec)
+#define ST_MTIM_NSEC(stbuf) ((stbuf)->st_mtimespec.tv_nsec)
+#define ST_ATIM_NSEC_SET(stbuf, val) (stbuf)->st_atimespec.tv_nsec = (val)
+#define ST_CTIM_NSEC_SET(stbuf, val) (stbuf)->st_ctimespec.tv_nsec = (val)
+#define ST_MTIM_NSEC_SET(stbuf, val) (stbuf)->st_mtimespec.tv_nsec = (val)
+#else
+#define ST_ATIM_NSEC(stbuf) 0
+#define ST_CTIM_NSEC(stbuf) 0
+#define ST_MTIM_NSEC(stbuf) 0
+#define ST_ATIM_NSEC_SET(stbuf, val) do { } while (0)
+#define ST_CTIM_NSEC_SET(stbuf, val) do { } while (0)
+#define ST_MTIM_NSEC_SET(stbuf, val) do { } while (0)
+#endif
diff --git a/tools/virtiofsd/fuse_opt.h b/tools/virtiofsd/fuse_opt.h
new file mode 100644
index 0000000000..d8573e74fd
--- /dev/null
+++ b/tools/virtiofsd/fuse_opt.h
@@ -0,0 +1,271 @@
+/*
+  FUSE: Filesystem in Userspace
+  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
+
+  This program can be distributed under the terms of the GNU LGPLv2.
+  See the file COPYING.LIB.
+*/
+
+#ifndef FUSE_OPT_H_
+#define FUSE_OPT_H_
+
+/** @file
+ *
+ * This file defines the option parsing interface of FUSE
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Option description
+ *
+ * This structure describes a single option, and the action associated
+ * with it, in case it matches.
+ *
+ * More than one such match may occur, in which case the action for
+ * each match is executed.
+ *
+ * There are three possible actions in case of a match:
+ *
+ * i) An integer (int or unsigned) variable determined by 'offset' is
+ *    set to 'value'
+ *
+ * ii) The processing function is called, with 'value' as the key
+ *
+ * iii) An integer (any) or string (char *) variable determined by
+ *    'offset' is set to the value of an option parameter
+ *
+ * 'offset' should normally be either set to
+ *
+ *  - 'offsetof(struct foo, member)'  actions i) and iii)
+ *
+ *  - -1			      action ii)
+ *
+ * The 'offsetof()' macro is defined in the <stddef.h> header.
+ *
+ * The template determines which options match, and also has an
+ * effect on the action.  Normally the action is either i) or ii), but
+ * if a format is present in the template, then action iii) is
+ * performed.
+ *
+ * The types of templates are:
+ *
+ * 1) "-x", "-foo", "--foo", "--foo-bar", etc.	These match only
+ *   themselves.  Invalid values are "--" and anything beginning
+ *   with "-o"
+ *
+ * 2) "foo", "foo-bar", etc.  These match "-ofoo", "-ofoo-bar" or
+ *    the relevant option in a comma separated option list
+ *
+ * 3) "bar=", "--foo=", etc.  These are variations of 1) and 2)
+ *    which have a parameter
+ *
+ * 4) "bar=%s", "--foo=%lu", etc.  Same matching as above but perform
+ *    action iii).
+ *
+ * 5) "-x ", etc.  Matches either "-xparam" or "-x param" as
+ *    two separate arguments
+ *
+ * 6) "-x %s", etc.  Combination of 4) and 5)
+ *
+ * If the format is "%s", memory is allocated for the string unlike with
+ * scanf().  The previous value (if non-NULL) stored at this location is
+ * freed.
+ */
+struct fuse_opt {
+	/** Matching template and optional parameter formatting */
+	const char *templ;
+
+	/**
+	 * Offset of variable within 'data' parameter of fuse_opt_parse()
+	 * or -1
+	 */
+	unsigned long offset;
+
+	/**
+	 * Value to set the variable to, or to be passed as 'key' to the
+	 * processing function.	 Ignored if template has a format
+	 */
+	int value;
+};
+
+/**
+ * Key option.	In case of a match, the processing function will be
+ * called with the specified key.
+ */
+#define FUSE_OPT_KEY(templ, key) { templ, -1U, key }
+
+/**
+ * Last option.	 An array of 'struct fuse_opt' must end with a NULL
+ * template value
+ */
+#define FUSE_OPT_END { NULL, 0, 0 }
+
+/**
+ * Argument list
+ */
+struct fuse_args {
+	/** Argument count */
+	int argc;
+
+	/** Argument vector.  NULL terminated */
+	char **argv;
+
+	/** Is 'argv' allocated? */
+	int allocated;
+};
+
+/**
+ * Initializer for 'struct fuse_args'
+ */
+#define FUSE_ARGS_INIT(argc, argv) { argc, argv, 0 }
+
+/**
+ * Key value passed to the processing function if an option did not
+ * match any template
+ */
+#define FUSE_OPT_KEY_OPT     -1
+
+/**
+ * Key value passed to the processing function for all non-options
+ *
+ * Non-options are the arguments beginning with a character other than
+ * '-' or all arguments after the special '--' option
+ */
+#define FUSE_OPT_KEY_NONOPT  -2
+
+/**
+ * Special key value for options to keep
+ *
+ * Argument is not passed to processing function, but behaves as if the
+ * processing function returned 1
+ */
+#define FUSE_OPT_KEY_KEEP -3
+
+/**
+ * Special key value for options to discard
+ *
+ * Argument is not passed to processing function, but behaves as if the
+ * processing function returned zero
+ */
+#define FUSE_OPT_KEY_DISCARD -4
+
+/**
+ * Processing function
+ *
+ * This function is called if
+ *    - option did not match any 'struct fuse_opt'
+ *    - argument is a non-option
+ *    - option did match and offset was set to -1
+ *
+ * The 'arg' parameter will always contain the whole argument or
+ * option, including the parameter if one exists.  A two-argument option
+ * ("-x foo") is always converted to single argument option of the
+ * form "-xfoo" before this function is called.
+ *
+ * Options of the form '-ofoo' are passed to this function without the
+ * '-o' prefix.
+ *
+ * The return value of this function determines whether this argument
+ * is to be inserted into the output argument vector, or discarded.
+ *
+ * @param data is the user data passed to the fuse_opt_parse() function
+ * @param arg is the whole argument or option
+ * @param key determines why the processing function was called
+ * @param outargs the current output argument list
+ * @return -1 on error, 0 if arg is to be discarded, 1 if arg should be kept
+ */
+typedef int (*fuse_opt_proc_t)(void *data, const char *arg, int key,
+			       struct fuse_args *outargs);
+
+/**
+ * Option parsing function
+ *
+ * If 'args' was returned from a previous call to fuse_opt_parse() or
+ * it was constructed from
+ *
+ * A NULL 'args' is equivalent to an empty argument vector
+ *
+ * A NULL 'opts' is equivalent to an 'opts' array containing a single
+ * end marker
+ *
+ * A NULL 'proc' is equivalent to a processing function always
+ * returning '1'
+ *
+ * @param args is the input and output argument list
+ * @param data is the user data
+ * @param opts is the option description array
+ * @param proc is the processing function
+ * @return -1 on error, 0 on success
+ */
+int fuse_opt_parse(struct fuse_args *args, void *data,
+		   const struct fuse_opt opts[], fuse_opt_proc_t proc);
+
+/**
+ * Add an option to a comma separated option list
+ *
+ * @param opts is a pointer to an option list, may point to a NULL value
+ * @param opt is the option to add
+ * @return -1 on allocation error, 0 on success
+ */
+int fuse_opt_add_opt(char **opts, const char *opt);
+
+/**
+ * Add an option, escaping commas, to a comma separated option list
+ *
+ * @param opts is a pointer to an option list, may point to a NULL value
+ * @param opt is the option to add
+ * @return -1 on allocation error, 0 on success
+ */
+int fuse_opt_add_opt_escaped(char **opts, const char *opt);
+
+/**
+ * Add an argument to a NULL terminated argument vector
+ *
+ * @param args is the structure containing the current argument list
+ * @param arg is the new argument to add
+ * @return -1 on allocation error, 0 on success
+ */
+int fuse_opt_add_arg(struct fuse_args *args, const char *arg);
+
+/**
+ * Add an argument at the specified position in a NULL terminated
+ * argument vector
+ *
+ * Adds the argument to the N-th position.  This is useful for adding
+ * options at the beginning of the array which must not come after the
+ * special '--' option.
+ *
+ * @param args is the structure containing the current argument list
+ * @param pos is the position at which to add the argument
+ * @param arg is the new argument to add
+ * @return -1 on allocation error, 0 on success
+ */
+int fuse_opt_insert_arg(struct fuse_args *args, int pos, const char *arg);
+
+/**
+ * Free the contents of argument list
+ *
+ * The structure itself is not freed
+ *
+ * @param args is the structure containing the argument list
+ */
+void fuse_opt_free_args(struct fuse_args *args);
+
+
+/**
+ * Check if an option matches
+ *
+ * @param opts is the option description array
+ * @param opt is the option to match
+ * @return 1 if a match is found, 0 if not
+ */
+int fuse_opt_match(const struct fuse_opt opts[], const char *opt);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* FUSE_OPT_H_ */
diff --git a/tools/virtiofsd/passthrough_helpers.h b/tools/virtiofsd/passthrough_helpers.h
new file mode 100644
index 0000000000..6b77c33600
--- /dev/null
+++ b/tools/virtiofsd/passthrough_helpers.h
@@ -0,0 +1,76 @@
+/*
+ * FUSE: Filesystem in Userspace
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE
+ */
+
+/*
+ * Creates files on the underlying file system in response to a FUSE_MKNOD
+ * operation
+ */
+static int mknod_wrapper(int dirfd, const char *path, const char *link,
+	int mode, dev_t rdev)
+{
+	int res;
+
+	if (S_ISREG(mode)) {
+		res = openat(dirfd, path, O_CREAT | O_EXCL | O_WRONLY, mode);
+		if (res >= 0)
+			res = close(res);
+	} else if (S_ISDIR(mode)) {
+		res = mkdirat(dirfd, path, mode);
+	} else if (S_ISLNK(mode) && link != NULL) {
+		res = symlinkat(link, dirfd, path);
+	} else if (S_ISFIFO(mode)) {
+		res = mkfifoat(dirfd, path, mode);
+#ifdef __FreeBSD__
+	} else if (S_ISSOCK(mode)) {
+		struct sockaddr_un su;
+		int fd;
+
+		if (strlen(path) >= sizeof(su.sun_path)) {
+			errno = ENAMETOOLONG;
+			return -1;
+		}
+		fd = socket(AF_UNIX, SOCK_STREAM, 0);
+		if (fd >= 0) {
+			/*
+			 * We must bind the socket to the underlying file
+			 * system to create the socket file, even though
+			 * we'll never listen on this socket.
+			 */
+			su.sun_family = AF_UNIX;
+			strncpy(su.sun_path, path, sizeof(su.sun_path));
+			res = bindat(dirfd, fd, (struct sockaddr*)&su,
+				sizeof(su));
+			if (res == 0)
+				close(fd);
+		} else {
+			res = -1;
+		}
+#endif
+	} else {
+		res = mknodat(dirfd, path, mode, rdev);
+	}
+
+	return res;
+}
-- 
2.23.0
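
Note on fuse_opt.h: the 'offset' field of struct fuse_opt is easiest to
follow with a concrete table.  The following is only a minimal sketch of
action i) (set an int at 'offset' to 'value').  struct demo_conf,
demo_opts and demo_apply_flag() are hypothetical names made up for this
illustration; demo_apply_flag() is NOT the libfuse parser, it only does an
exact string match and does not handle "-o" lists or formats.

```c
#include <stddef.h>   /* offsetof */
#include <string.h>   /* strcmp */

/* Same member layout as struct fuse_opt in fuse_opt.h */
struct fuse_opt {
	const char *templ;
	unsigned long offset;
	int value;
};

/* Hypothetical configuration structure, for illustration only */
struct demo_conf {
	int debug;
	int readonly;
};

/* Option table in the documented style; a NULL template ends it */
static const struct fuse_opt demo_opts[] = {
	{ "-d", offsetof(struct demo_conf, debug), 1 },
	{ "ro", offsetof(struct demo_conf, readonly), 1 },
	{ NULL, 0, 0 },
};

/*
 * Hypothetical helper (assumption, not libfuse code): compare 'arg'
 * against every template and, for each exact match, store 'value' in
 * the int found at 'offset' bytes into 'data'.  Returns the number of
 * templates that matched.
 */
static int demo_apply_flag(void *data, const struct fuse_opt opts[],
			   const char *arg)
{
	int matches = 0;

	for (; opts->templ != NULL; opts++) {
		if (strcmp(opts->templ, arg) == 0) {
			*(int *)((char *)data + opts->offset) = opts->value;
			matches++;
		}
	}
	return matches;
}
```

With a zeroed struct demo_conf, applying "ro" sets only the 'readonly'
member; an argument that matches no template leaves the structure alone.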




* [PATCH 002/104] virtiofsd: Pull in kernel's fuse.h
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
  2019-12-12 16:37 ` [PATCH 001/104] virtiofsd: Pull in upstream headers Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-03 11:56   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 003/104] virtiofsd: Add auxiliary .c's Dr. David Alan Gilbert (git)
                   ` (104 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Update scripts/update-linux-headers.sh to add fuse.h and
use it to pull in fuse.h from the kernel (v5.5-rc1).

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
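
Note: the "Version negotiation" comment in the imported fuse.h can be
read as the small decision function below.  This is only a sketch under
assumptions: struct demo_version and demo_negotiate() are hypothetical
names, and in the "kernel major is older" case the minor is simply taken
from the kernel, where the header allows any supported minor.

```c
#include <stdint.h>

/* Values as defined in the imported fuse.h */
#define FUSE_KERNEL_VERSION 7
#define FUSE_KERNEL_MINOR_VERSION 31

/* Hypothetical result type, for illustration only */
struct demo_version {
	uint32_t major;
	uint32_t minor;
	int wait_for_new_init;	/* 1: kernel is expected to resend INIT */
};

/*
 * Hypothetical helper: given the version the kernel sent in INIT,
 * return the version the daemon side would settle on.
 */
static struct demo_version demo_negotiate(uint32_t kmajor, uint32_t kminor)
{
	struct demo_version v = {
		FUSE_KERNEL_VERSION, FUSE_KERNEL_MINOR_VERSION, 0
	};

	if (kmajor == FUSE_KERNEL_VERSION) {
		/* Majors match: use the smaller of the two minors */
		if (kminor < v.minor)
			v.minor = kminor;
	} else if (kmajor > FUSE_KERNEL_VERSION) {
		/* Kernel is newer: reply with our major, await a new INIT */
		v.wait_for_new_init = 1;
	} else {
		/* Kernel is older: fall back to the kernel's major */
		v.major = kmajor;
		v.minor = kminor;	/* assumption, see note above */
	}
	return v;
}
```

For example, a kernel announcing 7.27 yields 7.27, while a kernel
announcing 7.40 yields 7.31.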
 include/standard-headers/linux/fuse.h | 891 ++++++++++++++++++++++++++
 scripts/update-linux-headers.sh       |   1 +
 2 files changed, 892 insertions(+)
 create mode 100644 include/standard-headers/linux/fuse.h
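
Note: two properties stated in fuse.h comments can be checked
mechanically: structures are padded to a 64-bit boundary, and the
*_BSWAP_RESERVED opcodes are the byte-swapped forms of CUSE_INIT and
FUSE_INIT.  A minimal sketch follows; struct fuse_attr is copied from the
header, while demo_bswap32() is a hypothetical helper written only for
this check.

```c
#include <stdint.h>

/* Copied from the imported fuse.h */
struct fuse_attr {
	uint64_t	ino;
	uint64_t	size;
	uint64_t	blocks;
	uint64_t	atime;
	uint64_t	mtime;
	uint64_t	ctime;
	uint32_t	atimensec;
	uint32_t	mtimensec;
	uint32_t	ctimensec;
	uint32_t	mode;
	uint32_t	nlink;
	uint32_t	uid;
	uint32_t	gid;
	uint32_t	rdev;
	uint32_t	blksize;
	uint32_t	padding;
};

/* Six 64-bit fields plus ten 32-bit fields: 88 bytes, a multiple of 8 */
_Static_assert(sizeof(struct fuse_attr) == 88,
	       "fuse_attr has an unexpected size");
_Static_assert(sizeof(struct fuse_attr) % 8 == 0,
	       "fuse_attr is not padded to a 64-bit boundary");

/* Hypothetical helper: reverse the byte order of a 32-bit value */
static uint32_t demo_bswap32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) << 8)  |
	       ((x & 0x00ff0000u) >> 8)  |
	       ((x & 0xff000000u) >> 24);
}
```

Swapping FUSE_INIT (26) gives 436207616 and swapping CUSE_INIT (4096)
gives 1048576, the two reserved values in enum fuse_opcode, which is what
lets a receiver notice a peer with the opposite byte order.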

diff --git a/include/standard-headers/linux/fuse.h b/include/standard-headers/linux/fuse.h
new file mode 100644
index 0000000000..f4df0a40f6
--- /dev/null
+++ b/include/standard-headers/linux/fuse.h
@@ -0,0 +1,891 @@
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/*
+    This file defines the kernel interface of FUSE
+    Copyright (C) 2001-2008  Miklos Szeredi <miklos@szeredi.hu>
+
+    This program can be distributed under the terms of the GNU GPL.
+    See the file COPYING.
+
+    This -- and only this -- header file may also be distributed under
+    the terms of the BSD Licence as follows:
+
+    Copyright (C) 2001-2007 Miklos Szeredi. All rights reserved.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+    1. Redistributions of source code must retain the above copyright
+       notice, this list of conditions and the following disclaimer.
+    2. Redistributions in binary form must reproduce the above copyright
+       notice, this list of conditions and the following disclaimer in the
+       documentation and/or other materials provided with the distribution.
+
+    THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+    ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+    IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+    ARE DISCLAIMED.  IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
+    FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+    DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+    OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+    HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+    LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+    OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+    SUCH DAMAGE.
+*/
+
+/*
+ * This file defines the kernel interface of FUSE
+ *
+ * Protocol changelog:
+ *
+ * 7.1:
+ *  - add the following messages:
+ *      FUSE_SETATTR, FUSE_SYMLINK, FUSE_MKNOD, FUSE_MKDIR, FUSE_UNLINK,
+ *      FUSE_RMDIR, FUSE_RENAME, FUSE_LINK, FUSE_OPEN, FUSE_READ, FUSE_WRITE,
+ *      FUSE_RELEASE, FUSE_FSYNC, FUSE_FLUSH, FUSE_SETXATTR, FUSE_GETXATTR,
+ *      FUSE_LISTXATTR, FUSE_REMOVEXATTR, FUSE_OPENDIR, FUSE_READDIR,
+ *      FUSE_RELEASEDIR
+ *  - add padding to messages to accommodate 32-bit servers on 64-bit kernels
+ *
+ * 7.2:
+ *  - add FOPEN_DIRECT_IO and FOPEN_KEEP_CACHE flags
+ *  - add FUSE_FSYNCDIR message
+ *
+ * 7.3:
+ *  - add FUSE_ACCESS message
+ *  - add FUSE_CREATE message
+ *  - add filehandle to fuse_setattr_in
+ *
+ * 7.4:
+ *  - add frsize to fuse_kstatfs
+ *  - clean up request size limit checking
+ *
+ * 7.5:
+ *  - add flags and max_write to fuse_init_out
+ *
+ * 7.6:
+ *  - add max_readahead to fuse_init_in and fuse_init_out
+ *
+ * 7.7:
+ *  - add FUSE_INTERRUPT message
+ *  - add POSIX file lock support
+ *
+ * 7.8:
+ *  - add lock_owner and flags fields to fuse_release_in
+ *  - add FUSE_BMAP message
+ *  - add FUSE_DESTROY message
+ *
+ * 7.9:
+ *  - new fuse_getattr_in input argument of GETATTR
+ *  - add lk_flags in fuse_lk_in
+ *  - add lock_owner field to fuse_setattr_in, fuse_read_in and fuse_write_in
+ *  - add blksize field to fuse_attr
+ *  - add file flags field to fuse_read_in and fuse_write_in
+ *  - Add ATIME_NOW and MTIME_NOW flags to fuse_setattr_in
+ *
+ * 7.10
+ *  - add nonseekable open flag
+ *
+ * 7.11
+ *  - add IOCTL message
+ *  - add unsolicited notification support
+ *  - add POLL message and NOTIFY_POLL notification
+ *
+ * 7.12
+ *  - add umask flag to input argument of create, mknod and mkdir
+ *  - add notification messages for invalidation of inodes and
+ *    directory entries
+ *
+ * 7.13
+ *  - make max number of background requests and congestion threshold
+ *    tunables
+ *
+ * 7.14
+ *  - add splice support to fuse device
+ *
+ * 7.15
+ *  - add store notify
+ *  - add retrieve notify
+ *
+ * 7.16
+ *  - add BATCH_FORGET request
+ *  - FUSE_IOCTL_UNRESTRICTED shall now return with array of 'struct
+ *    fuse_ioctl_iovec' instead of ambiguous 'struct iovec'
+ *  - add FUSE_IOCTL_32BIT flag
+ *
+ * 7.17
+ *  - add FUSE_FLOCK_LOCKS and FUSE_RELEASE_FLOCK_UNLOCK
+ *
+ * 7.18
+ *  - add FUSE_IOCTL_DIR flag
+ *  - add FUSE_NOTIFY_DELETE
+ *
+ * 7.19
+ *  - add FUSE_FALLOCATE
+ *
+ * 7.20
+ *  - add FUSE_AUTO_INVAL_DATA
+ *
+ * 7.21
+ *  - add FUSE_READDIRPLUS
+ *  - send the requested events in POLL request
+ *
+ * 7.22
+ *  - add FUSE_ASYNC_DIO
+ *
+ * 7.23
+ *  - add FUSE_WRITEBACK_CACHE
+ *  - add time_gran to fuse_init_out
+ *  - add reserved space to fuse_init_out
+ *  - add FATTR_CTIME
+ *  - add ctime and ctimensec to fuse_setattr_in
+ *  - add FUSE_RENAME2 request
+ *  - add FUSE_NO_OPEN_SUPPORT flag
+ *
+ *  7.24
+ *  - add FUSE_LSEEK for SEEK_HOLE and SEEK_DATA support
+ *
+ *  7.25
+ *  - add FUSE_PARALLEL_DIROPS
+ *
+ *  7.26
+ *  - add FUSE_HANDLE_KILLPRIV
+ *  - add FUSE_POSIX_ACL
+ *
+ *  7.27
+ *  - add FUSE_ABORT_ERROR
+ *
+ *  7.28
+ *  - add FUSE_COPY_FILE_RANGE
+ *  - add FOPEN_CACHE_DIR
+ *  - add FUSE_MAX_PAGES, add max_pages to init_out
+ *  - add FUSE_CACHE_SYMLINKS
+ *
+ *  7.29
+ *  - add FUSE_NO_OPENDIR_SUPPORT flag
+ *
+ *  7.30
+ *  - add FUSE_EXPLICIT_INVAL_DATA
+ *  - add FUSE_IOCTL_COMPAT_X32
+ *
+ *  7.31
+ *  - add FUSE_WRITE_KILL_PRIV flag
+ *  - add FUSE_SETUPMAPPING and FUSE_REMOVEMAPPING
+ *  - add map_alignment to fuse_init_out, add FUSE_MAP_ALIGNMENT flag
+ */
+
+#ifndef _LINUX_FUSE_H
+#define _LINUX_FUSE_H
+
+#include <stdint.h>
+
+/*
+ * Version negotiation:
+ *
+ * Both the kernel and userspace send the version they support in the
+ * INIT request and reply respectively.
+ *
+ * If the major versions match then both shall use the smallest
+ * of the two minor versions for communication.
+ *
+ * If the kernel supports a larger major version, then userspace shall
+ * reply with the major version it supports, ignore the rest of the
+ * INIT message and expect a new INIT message from the kernel with a
+ * matching major version.
+ *
+ * If the library supports a larger major version, then it shall fall
+ * back to the major protocol version sent by the kernel for
+ * communication and reply with that major version (and an arbitrary
+ * supported minor version).
+ */
+
+/** Version number of this interface */
+#define FUSE_KERNEL_VERSION 7
+
+/** Minor version number of this interface */
+#define FUSE_KERNEL_MINOR_VERSION 31
+
+/** The node ID of the root inode */
+#define FUSE_ROOT_ID 1
+
+/* Make sure all structures are padded to 64bit boundary, so 32bit
+   userspace works under 64bit kernels */
+
+struct fuse_attr {
+	uint64_t	ino;
+	uint64_t	size;
+	uint64_t	blocks;
+	uint64_t	atime;
+	uint64_t	mtime;
+	uint64_t	ctime;
+	uint32_t	atimensec;
+	uint32_t	mtimensec;
+	uint32_t	ctimensec;
+	uint32_t	mode;
+	uint32_t	nlink;
+	uint32_t	uid;
+	uint32_t	gid;
+	uint32_t	rdev;
+	uint32_t	blksize;
+	uint32_t	padding;
+};
+
+struct fuse_kstatfs {
+	uint64_t	blocks;
+	uint64_t	bfree;
+	uint64_t	bavail;
+	uint64_t	files;
+	uint64_t	ffree;
+	uint32_t	bsize;
+	uint32_t	namelen;
+	uint32_t	frsize;
+	uint32_t	padding;
+	uint32_t	spare[6];
+};
+
+struct fuse_file_lock {
+	uint64_t	start;
+	uint64_t	end;
+	uint32_t	type;
+	uint32_t	pid; /* tgid */
+};
+
+/**
+ * Bitmasks for fuse_setattr_in.valid
+ */
+#define FATTR_MODE	(1 << 0)
+#define FATTR_UID	(1 << 1)
+#define FATTR_GID	(1 << 2)
+#define FATTR_SIZE	(1 << 3)
+#define FATTR_ATIME	(1 << 4)
+#define FATTR_MTIME	(1 << 5)
+#define FATTR_FH	(1 << 6)
+#define FATTR_ATIME_NOW	(1 << 7)
+#define FATTR_MTIME_NOW	(1 << 8)
+#define FATTR_LOCKOWNER	(1 << 9)
+#define FATTR_CTIME	(1 << 10)
+
+/**
+ * Flags returned by the OPEN request
+ *
+ * FOPEN_DIRECT_IO: bypass page cache for this open file
+ * FOPEN_KEEP_CACHE: don't invalidate the data cache on open
+ * FOPEN_NONSEEKABLE: the file is not seekable
+ * FOPEN_CACHE_DIR: allow caching this directory
+ * FOPEN_STREAM: the file is stream-like (no file position at all)
+ */
+#define FOPEN_DIRECT_IO		(1 << 0)
+#define FOPEN_KEEP_CACHE	(1 << 1)
+#define FOPEN_NONSEEKABLE	(1 << 2)
+#define FOPEN_CACHE_DIR		(1 << 3)
+#define FOPEN_STREAM		(1 << 4)
+
+/**
+ * INIT request/reply flags
+ *
+ * FUSE_ASYNC_READ: asynchronous read requests
+ * FUSE_POSIX_LOCKS: remote locking for POSIX file locks
+ * FUSE_FILE_OPS: kernel sends file handle for fstat, etc... (not yet supported)
+ * FUSE_ATOMIC_O_TRUNC: handles the O_TRUNC open flag in the filesystem
+ * FUSE_EXPORT_SUPPORT: filesystem handles lookups of "." and ".."
+ * FUSE_BIG_WRITES: filesystem can handle write size larger than 4kB
+ * FUSE_DONT_MASK: don't apply umask to file mode on create operations
+ * FUSE_SPLICE_WRITE: kernel supports splice write on the device
+ * FUSE_SPLICE_MOVE: kernel supports splice move on the device
+ * FUSE_SPLICE_READ: kernel supports splice read on the device
+ * FUSE_FLOCK_LOCKS: remote locking for BSD style file locks
+ * FUSE_HAS_IOCTL_DIR: kernel supports ioctl on directories
+ * FUSE_AUTO_INVAL_DATA: automatically invalidate cached pages
+ * FUSE_DO_READDIRPLUS: do READDIRPLUS (READDIR+LOOKUP in one)
+ * FUSE_READDIRPLUS_AUTO: adaptive readdirplus
+ * FUSE_ASYNC_DIO: asynchronous direct I/O submission
+ * FUSE_WRITEBACK_CACHE: use writeback cache for buffered writes
+ * FUSE_NO_OPEN_SUPPORT: kernel supports zero-message opens
+ * FUSE_PARALLEL_DIROPS: allow parallel lookups and readdir
+ * FUSE_HANDLE_KILLPRIV: fs handles killing suid/sgid/cap on write/chown/trunc
+ * FUSE_POSIX_ACL: filesystem supports posix acls
+ * FUSE_ABORT_ERROR: reading the device after abort returns ECONNABORTED
+ * FUSE_MAX_PAGES: init_out.max_pages contains the max number of req pages
+ * FUSE_CACHE_SYMLINKS: cache READLINK responses
+ * FUSE_NO_OPENDIR_SUPPORT: kernel supports zero-message opendir
+ * FUSE_EXPLICIT_INVAL_DATA: only invalidate cached pages on explicit request
+ * FUSE_MAP_ALIGNMENT: map_alignment field is valid
+ */
+#define FUSE_ASYNC_READ		(1 << 0)
+#define FUSE_POSIX_LOCKS	(1 << 1)
+#define FUSE_FILE_OPS		(1 << 2)
+#define FUSE_ATOMIC_O_TRUNC	(1 << 3)
+#define FUSE_EXPORT_SUPPORT	(1 << 4)
+#define FUSE_BIG_WRITES		(1 << 5)
+#define FUSE_DONT_MASK		(1 << 6)
+#define FUSE_SPLICE_WRITE	(1 << 7)
+#define FUSE_SPLICE_MOVE	(1 << 8)
+#define FUSE_SPLICE_READ	(1 << 9)
+#define FUSE_FLOCK_LOCKS	(1 << 10)
+#define FUSE_HAS_IOCTL_DIR	(1 << 11)
+#define FUSE_AUTO_INVAL_DATA	(1 << 12)
+#define FUSE_DO_READDIRPLUS	(1 << 13)
+#define FUSE_READDIRPLUS_AUTO	(1 << 14)
+#define FUSE_ASYNC_DIO		(1 << 15)
+#define FUSE_WRITEBACK_CACHE	(1 << 16)
+#define FUSE_NO_OPEN_SUPPORT	(1 << 17)
+#define FUSE_PARALLEL_DIROPS    (1 << 18)
+#define FUSE_HANDLE_KILLPRIV	(1 << 19)
+#define FUSE_POSIX_ACL		(1 << 20)
+#define FUSE_ABORT_ERROR	(1 << 21)
+#define FUSE_MAX_PAGES		(1 << 22)
+#define FUSE_CACHE_SYMLINKS	(1 << 23)
+#define FUSE_NO_OPENDIR_SUPPORT (1 << 24)
+#define FUSE_EXPLICIT_INVAL_DATA (1 << 25)
+#define FUSE_MAP_ALIGNMENT	(1 << 26)
+
+/**
+ * CUSE INIT request/reply flags
+ *
+ * CUSE_UNRESTRICTED_IOCTL:  use unrestricted ioctl
+ */
+#define CUSE_UNRESTRICTED_IOCTL	(1 << 0)
+
+/**
+ * Release flags
+ */
+#define FUSE_RELEASE_FLUSH	(1 << 0)
+#define FUSE_RELEASE_FLOCK_UNLOCK	(1 << 1)
+
+/**
+ * Getattr flags
+ */
+#define FUSE_GETATTR_FH		(1 << 0)
+
+/**
+ * Lock flags
+ */
+#define FUSE_LK_FLOCK		(1 << 0)
+
+/**
+ * WRITE flags
+ *
+ * FUSE_WRITE_CACHE: delayed write from page cache, file handle is guessed
+ * FUSE_WRITE_LOCKOWNER: lock_owner field is valid
+ * FUSE_WRITE_KILL_PRIV: kill suid and sgid bits
+ */
+#define FUSE_WRITE_CACHE	(1 << 0)
+#define FUSE_WRITE_LOCKOWNER	(1 << 1)
+#define FUSE_WRITE_KILL_PRIV	(1 << 2)
+
+/**
+ * Read flags
+ */
+#define FUSE_READ_LOCKOWNER	(1 << 1)
+
+/**
+ * Ioctl flags
+ *
+ * FUSE_IOCTL_COMPAT: 32bit compat ioctl on 64bit machine
+ * FUSE_IOCTL_UNRESTRICTED: not restricted to well-formed ioctls, retry allowed
+ * FUSE_IOCTL_RETRY: retry with new iovecs
+ * FUSE_IOCTL_32BIT: 32bit ioctl
+ * FUSE_IOCTL_DIR: is a directory
+ * FUSE_IOCTL_COMPAT_X32: x32 compat ioctl on 64bit machine (64bit time_t)
+ *
+ * FUSE_IOCTL_MAX_IOV: maximum of in_iovecs + out_iovecs
+ */
+#define FUSE_IOCTL_COMPAT	(1 << 0)
+#define FUSE_IOCTL_UNRESTRICTED	(1 << 1)
+#define FUSE_IOCTL_RETRY	(1 << 2)
+#define FUSE_IOCTL_32BIT	(1 << 3)
+#define FUSE_IOCTL_DIR		(1 << 4)
+#define FUSE_IOCTL_COMPAT_X32	(1 << 5)
+
+#define FUSE_IOCTL_MAX_IOV	256
+
+/**
+ * Poll flags
+ *
+ * FUSE_POLL_SCHEDULE_NOTIFY: request poll notify
+ */
+#define FUSE_POLL_SCHEDULE_NOTIFY (1 << 0)
+
+/**
+ * Fsync flags
+ *
+ * FUSE_FSYNC_FDATASYNC: Sync data only, not metadata
+ */
+#define FUSE_FSYNC_FDATASYNC	(1 << 0)
+
+enum fuse_opcode {
+	FUSE_LOOKUP		= 1,
+	FUSE_FORGET		= 2,  /* no reply */
+	FUSE_GETATTR		= 3,
+	FUSE_SETATTR		= 4,
+	FUSE_READLINK		= 5,
+	FUSE_SYMLINK		= 6,
+	FUSE_MKNOD		= 8,
+	FUSE_MKDIR		= 9,
+	FUSE_UNLINK		= 10,
+	FUSE_RMDIR		= 11,
+	FUSE_RENAME		= 12,
+	FUSE_LINK		= 13,
+	FUSE_OPEN		= 14,
+	FUSE_READ		= 15,
+	FUSE_WRITE		= 16,
+	FUSE_STATFS		= 17,
+	FUSE_RELEASE		= 18,
+	FUSE_FSYNC		= 20,
+	FUSE_SETXATTR		= 21,
+	FUSE_GETXATTR		= 22,
+	FUSE_LISTXATTR		= 23,
+	FUSE_REMOVEXATTR	= 24,
+	FUSE_FLUSH		= 25,
+	FUSE_INIT		= 26,
+	FUSE_OPENDIR		= 27,
+	FUSE_READDIR		= 28,
+	FUSE_RELEASEDIR		= 29,
+	FUSE_FSYNCDIR		= 30,
+	FUSE_GETLK		= 31,
+	FUSE_SETLK		= 32,
+	FUSE_SETLKW		= 33,
+	FUSE_ACCESS		= 34,
+	FUSE_CREATE		= 35,
+	FUSE_INTERRUPT		= 36,
+	FUSE_BMAP		= 37,
+	FUSE_DESTROY		= 38,
+	FUSE_IOCTL		= 39,
+	FUSE_POLL		= 40,
+	FUSE_NOTIFY_REPLY	= 41,
+	FUSE_BATCH_FORGET	= 42,
+	FUSE_FALLOCATE		= 43,
+	FUSE_READDIRPLUS	= 44,
+	FUSE_RENAME2		= 45,
+	FUSE_LSEEK		= 46,
+	FUSE_COPY_FILE_RANGE	= 47,
+	FUSE_SETUPMAPPING	= 48,
+	FUSE_REMOVEMAPPING	= 49,
+
+	/* CUSE specific operations */
+	CUSE_INIT		= 4096,
+
+	/* Reserved opcodes: helpful to detect structure endian-ness */
+	CUSE_INIT_BSWAP_RESERVED	= 1048576,	/* CUSE_INIT << 8 */
+	FUSE_INIT_BSWAP_RESERVED	= 436207616,	/* FUSE_INIT << 24 */
+};
+
+enum fuse_notify_code {
+	FUSE_NOTIFY_POLL   = 1,
+	FUSE_NOTIFY_INVAL_INODE = 2,
+	FUSE_NOTIFY_INVAL_ENTRY = 3,
+	FUSE_NOTIFY_STORE = 4,
+	FUSE_NOTIFY_RETRIEVE = 5,
+	FUSE_NOTIFY_DELETE = 6,
+	FUSE_NOTIFY_CODE_MAX,
+};
+
+/* The read buffer is required to be at least 8k, but may be much larger */
+#define FUSE_MIN_READ_BUFFER 8192
+
+#define FUSE_COMPAT_ENTRY_OUT_SIZE 120
+
+struct fuse_entry_out {
+	uint64_t	nodeid;		/* Inode ID */
+	uint64_t	generation;	/* Inode generation: nodeid:gen must
+					   be unique for the fs's lifetime */
+	uint64_t	entry_valid;	/* Cache timeout for the name */
+	uint64_t	attr_valid;	/* Cache timeout for the attributes */
+	uint32_t	entry_valid_nsec;
+	uint32_t	attr_valid_nsec;
+	struct fuse_attr attr;
+};
+
+struct fuse_forget_in {
+	uint64_t	nlookup;
+};
+
+struct fuse_forget_one {
+	uint64_t	nodeid;
+	uint64_t	nlookup;
+};
+
+struct fuse_batch_forget_in {
+	uint32_t	count;
+	uint32_t	dummy;
+};
+
+struct fuse_getattr_in {
+	uint32_t	getattr_flags;
+	uint32_t	dummy;
+	uint64_t	fh;
+};
+
+#define FUSE_COMPAT_ATTR_OUT_SIZE 96
+
+struct fuse_attr_out {
+	uint64_t	attr_valid;	/* Cache timeout for the attributes */
+	uint32_t	attr_valid_nsec;
+	uint32_t	dummy;
+	struct fuse_attr attr;
+};
+
+#define FUSE_COMPAT_MKNOD_IN_SIZE 8
+
+struct fuse_mknod_in {
+	uint32_t	mode;
+	uint32_t	rdev;
+	uint32_t	umask;
+	uint32_t	padding;
+};
+
+struct fuse_mkdir_in {
+	uint32_t	mode;
+	uint32_t	umask;
+};
+
+struct fuse_rename_in {
+	uint64_t	newdir;
+};
+
+struct fuse_rename2_in {
+	uint64_t	newdir;
+	uint32_t	flags;
+	uint32_t	padding;
+};
+
+struct fuse_link_in {
+	uint64_t	oldnodeid;
+};
+
+struct fuse_setattr_in {
+	uint32_t	valid;
+	uint32_t	padding;
+	uint64_t	fh;
+	uint64_t	size;
+	uint64_t	lock_owner;
+	uint64_t	atime;
+	uint64_t	mtime;
+	uint64_t	ctime;
+	uint32_t	atimensec;
+	uint32_t	mtimensec;
+	uint32_t	ctimensec;
+	uint32_t	mode;
+	uint32_t	unused4;
+	uint32_t	uid;
+	uint32_t	gid;
+	uint32_t	unused5;
+};
+
+struct fuse_open_in {
+	uint32_t	flags;
+	uint32_t	unused;
+};
+
+struct fuse_create_in {
+	uint32_t	flags;
+	uint32_t	mode;
+	uint32_t	umask;
+	uint32_t	padding;
+};
+
+struct fuse_open_out {
+	uint64_t	fh;
+	uint32_t	open_flags;
+	uint32_t	padding;
+};
+
+struct fuse_release_in {
+	uint64_t	fh;
+	uint32_t	flags;
+	uint32_t	release_flags;
+	uint64_t	lock_owner;
+};
+
+struct fuse_flush_in {
+	uint64_t	fh;
+	uint32_t	unused;
+	uint32_t	padding;
+	uint64_t	lock_owner;
+};
+
+struct fuse_read_in {
+	uint64_t	fh;
+	uint64_t	offset;
+	uint32_t	size;
+	uint32_t	read_flags;
+	uint64_t	lock_owner;
+	uint32_t	flags;
+	uint32_t	padding;
+};
+
+#define FUSE_COMPAT_WRITE_IN_SIZE 24
+
+struct fuse_write_in {
+	uint64_t	fh;
+	uint64_t	offset;
+	uint32_t	size;
+	uint32_t	write_flags;
+	uint64_t	lock_owner;
+	uint32_t	flags;
+	uint32_t	padding;
+};
+
+struct fuse_write_out {
+	uint32_t	size;
+	uint32_t	padding;
+};
+
+#define FUSE_COMPAT_STATFS_SIZE 48
+
+struct fuse_statfs_out {
+	struct fuse_kstatfs st;
+};
+
+struct fuse_fsync_in {
+	uint64_t	fh;
+	uint32_t	fsync_flags;
+	uint32_t	padding;
+};
+
+struct fuse_setxattr_in {
+	uint32_t	size;
+	uint32_t	flags;
+};
+
+struct fuse_getxattr_in {
+	uint32_t	size;
+	uint32_t	padding;
+};
+
+struct fuse_getxattr_out {
+	uint32_t	size;
+	uint32_t	padding;
+};
+
+struct fuse_lk_in {
+	uint64_t	fh;
+	uint64_t	owner;
+	struct fuse_file_lock lk;
+	uint32_t	lk_flags;
+	uint32_t	padding;
+};
+
+struct fuse_lk_out {
+	struct fuse_file_lock lk;
+};
+
+struct fuse_access_in {
+	uint32_t	mask;
+	uint32_t	padding;
+};
+
+struct fuse_init_in {
+	uint32_t	major;
+	uint32_t	minor;
+	uint32_t	max_readahead;
+	uint32_t	flags;
+};
+
+#define FUSE_COMPAT_INIT_OUT_SIZE 8
+#define FUSE_COMPAT_22_INIT_OUT_SIZE 24
+
+struct fuse_init_out {
+	uint32_t	major;
+	uint32_t	minor;
+	uint32_t	max_readahead;
+	uint32_t	flags;
+	uint16_t	max_background;
+	uint16_t	congestion_threshold;
+	uint32_t	max_write;
+	uint32_t	time_gran;
+	uint16_t	max_pages;
+	uint16_t	map_alignment;
+	uint32_t	unused[8];
+};
+
+#define CUSE_INIT_INFO_MAX 4096
+
+struct cuse_init_in {
+	uint32_t	major;
+	uint32_t	minor;
+	uint32_t	unused;
+	uint32_t	flags;
+};
+
+struct cuse_init_out {
+	uint32_t	major;
+	uint32_t	minor;
+	uint32_t	unused;
+	uint32_t	flags;
+	uint32_t	max_read;
+	uint32_t	max_write;
+	uint32_t	dev_major;		/* chardev major */
+	uint32_t	dev_minor;		/* chardev minor */
+	uint32_t	spare[10];
+};
+
+struct fuse_interrupt_in {
+	uint64_t	unique;
+};
+
+struct fuse_bmap_in {
+	uint64_t	block;
+	uint32_t	blocksize;
+	uint32_t	padding;
+};
+
+struct fuse_bmap_out {
+	uint64_t	block;
+};
+
+struct fuse_ioctl_in {
+	uint64_t	fh;
+	uint32_t	flags;
+	uint32_t	cmd;
+	uint64_t	arg;
+	uint32_t	in_size;
+	uint32_t	out_size;
+};
+
+struct fuse_ioctl_iovec {
+	uint64_t	base;
+	uint64_t	len;
+};
+
+struct fuse_ioctl_out {
+	int32_t		result;
+	uint32_t	flags;
+	uint32_t	in_iovs;
+	uint32_t	out_iovs;
+};
+
+struct fuse_poll_in {
+	uint64_t	fh;
+	uint64_t	kh;
+	uint32_t	flags;
+	uint32_t	events;
+};
+
+struct fuse_poll_out {
+	uint32_t	revents;
+	uint32_t	padding;
+};
+
+struct fuse_notify_poll_wakeup_out {
+	uint64_t	kh;
+};
+
+struct fuse_fallocate_in {
+	uint64_t	fh;
+	uint64_t	offset;
+	uint64_t	length;
+	uint32_t	mode;
+	uint32_t	padding;
+};
+
+struct fuse_in_header {
+	uint32_t	len;
+	uint32_t	opcode;
+	uint64_t	unique;
+	uint64_t	nodeid;
+	uint32_t	uid;
+	uint32_t	gid;
+	uint32_t	pid;
+	uint32_t	padding;
+};
+
+struct fuse_out_header {
+	uint32_t	len;
+	int32_t		error;
+	uint64_t	unique;
+};
+
+struct fuse_dirent {
+	uint64_t	ino;
+	uint64_t	off;
+	uint32_t	namelen;
+	uint32_t	type;
+	char name[];
+};
+
+#define FUSE_NAME_OFFSET offsetof(struct fuse_dirent, name)
+#define FUSE_DIRENT_ALIGN(x) \
+	(((x) + sizeof(uint64_t) - 1) & ~(sizeof(uint64_t) - 1))
+#define FUSE_DIRENT_SIZE(d) \
+	FUSE_DIRENT_ALIGN(FUSE_NAME_OFFSET + (d)->namelen)
+
+struct fuse_direntplus {
+	struct fuse_entry_out entry_out;
+	struct fuse_dirent dirent;
+};
+
+#define FUSE_NAME_OFFSET_DIRENTPLUS \
+	offsetof(struct fuse_direntplus, dirent.name)
+#define FUSE_DIRENTPLUS_SIZE(d) \
+	FUSE_DIRENT_ALIGN(FUSE_NAME_OFFSET_DIRENTPLUS + (d)->dirent.namelen)
+
+struct fuse_notify_inval_inode_out {
+	uint64_t	ino;
+	int64_t		off;
+	int64_t		len;
+};
+
+struct fuse_notify_inval_entry_out {
+	uint64_t	parent;
+	uint32_t	namelen;
+	uint32_t	padding;
+};
+
+struct fuse_notify_delete_out {
+	uint64_t	parent;
+	uint64_t	child;
+	uint32_t	namelen;
+	uint32_t	padding;
+};
+
+struct fuse_notify_store_out {
+	uint64_t	nodeid;
+	uint64_t	offset;
+	uint32_t	size;
+	uint32_t	padding;
+};
+
+struct fuse_notify_retrieve_out {
+	uint64_t	notify_unique;
+	uint64_t	nodeid;
+	uint64_t	offset;
+	uint32_t	size;
+	uint32_t	padding;
+};
+
+/* Matches the size of fuse_write_in */
+struct fuse_notify_retrieve_in {
+	uint64_t	dummy1;
+	uint64_t	offset;
+	uint32_t	size;
+	uint32_t	dummy2;
+	uint64_t	dummy3;
+	uint64_t	dummy4;
+};
+
+/* Device ioctls: */
+#define FUSE_DEV_IOC_CLONE	_IOR(229, 0, uint32_t)
+
+struct fuse_lseek_in {
+	uint64_t	fh;
+	uint64_t	offset;
+	uint32_t	whence;
+	uint32_t	padding;
+};
+
+struct fuse_lseek_out {
+	uint64_t	offset;
+};
+
+struct fuse_copy_file_range_in {
+	uint64_t	fh_in;
+	uint64_t	off_in;
+	uint64_t	nodeid_out;
+	uint64_t	fh_out;
+	uint64_t	off_out;
+	uint64_t	len;
+	uint64_t	flags;
+};
+
+#endif /* _LINUX_FUSE_H */
diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index f76d77363b..29c27f4681 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -186,6 +186,7 @@ rm -rf "$output/include/standard-headers/linux"
 mkdir -p "$output/include/standard-headers/linux"
 for i in "$tmpdir"/include/linux/*virtio*.h \
          "$tmpdir/include/linux/qemu_fw_cfg.h" \
+         "$tmpdir/include/linux/fuse.h" \
          "$tmpdir/include/linux/input.h" \
          "$tmpdir/include/linux/input-event-codes.h" \
          "$tmpdir/include/linux/pci_regs.h" \
-- 
2.23.0
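
A worked example of the directory-record sizing macros near the end of
fuse.h above, as a minimal sketch. The struct and the two macros are copied
from the header; dirent_record_size() is a hypothetical helper added only for
illustration. It assumes the usual layout where the fixed part of
struct fuse_dirent is 24 bytes (8 + 8 + 4 + 4).

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as struct fuse_dirent in the header above. */
struct fuse_dirent {
	uint64_t	ino;
	uint64_t	off;
	uint32_t	namelen;
	uint32_t	type;
	char name[];
};

#define FUSE_NAME_OFFSET offsetof(struct fuse_dirent, name)
#define FUSE_DIRENT_ALIGN(x) \
	(((x) + sizeof(uint64_t) - 1) & ~(sizeof(uint64_t) - 1))

/*
 * Hypothetical helper (not in fuse.h): bytes occupied by one record
 * whose name is namelen bytes long, rounded up to a multiple of 8.
 */
static size_t dirent_record_size(size_t namelen)
{
	return FUSE_DIRENT_ALIGN(FUSE_NAME_OFFSET + namelen);
}
```

With these definitions a 1-byte name needs 24 + 1 = 25 bytes, which rounds up
to 32, and a 9-byte name needs 33 bytes, which rounds up to 40. The name is
not NUL-terminated in the record; namelen alone says where it ends.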




* [PATCH 003/104] virtiofsd: Add auxiliary .c's
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
  2019-12-12 16:37 ` [PATCH 001/104] virtiofsd: Pull in upstream headers Dr. David Alan Gilbert (git)
  2019-12-12 16:37 ` [PATCH 002/104] virtiofsd: Pull in kernel's fuse.h Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-03 11:57   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 004/104] virtiofsd: Add fuse_lowlevel.c Dr. David Alan Gilbert (git)
                   ` (103 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add most of the auxiliary (non-main) .c files we need from upstream
libfuse (fuse-3.8.0).

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
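Note on the "-o" option syntax handled by fuse_opt.c in this patch: option
groups are split on commas, and a backslash escapes the next character, with
a three-digit octal form (first digit 0-3) also accepted. The sketch below
illustrates that decoding under simplifying assumptions. split_opt_group()
is a hypothetical helper, not a libfuse function, and unlike the imported
code it collects the decoded options into an array instead of handing each
one to the option matcher as soon as it is complete.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): split a writable option
 * string on unescaped commas, decoding "\x" and "\ooo" escapes in place.
 * Pointers to the decoded options are stored in out[]. Returns the
 * number of options found, or -1 if there are more than max.
 */
static int split_opt_group(char *opts, char **out, int max)
{
	char *s = opts;      /* read cursor */
	char *d = opts;      /* write cursor, never ahead of s */
	char *start = opts;  /* start of the option being built */
	int n = 0;

	for (;;) {
		if (*s == ',' || *s == '\0') {
			int end = (*s == '\0');

			if (n == max)
				return -1;
			*d++ = '\0';
			out[n++] = start;
			if (end)
				return n;
			start = d;
			s++;
		} else if (s[0] == '\\' && s[1] != '\0') {
			s++;
			if (s[0] >= '0' && s[0] <= '3' &&
			    s[1] >= '0' && s[1] <= '7' &&
			    s[2] >= '0' && s[2] <= '7') {
				/* Three octal digits encode one byte. */
				*d++ = (char)((s[0] - '0') * 0100 +
					      (s[1] - '0') * 0010 +
					      (s[2] - '0'));
				s += 3;
			} else {
				/* Any other escaped character is taken literally. */
				*d++ = *s++;
			}
		} else {
			*d++ = *s++;
		}
	}
}
```

For example, the string ro,source=/a\,b,tag=x\054y decodes to three options:
"ro", "source=/a,b" and "tag=x,y" (octal 054 is a comma). This is why a
source path containing a comma has to be escaped on the command line.
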
 tools/virtiofsd/buffer.c       | 321 ++++++++++++++++++++++++
 tools/virtiofsd/fuse_log.c     |  40 +++
 tools/virtiofsd/fuse_loop_mt.c | 362 +++++++++++++++++++++++++++
 tools/virtiofsd/fuse_opt.c     | 423 +++++++++++++++++++++++++++++++
 tools/virtiofsd/fuse_signals.c |  91 +++++++
 tools/virtiofsd/helper.c       | 440 +++++++++++++++++++++++++++++++++
 6 files changed, 1677 insertions(+)
 create mode 100644 tools/virtiofsd/buffer.c
 create mode 100644 tools/virtiofsd/fuse_log.c
 create mode 100644 tools/virtiofsd/fuse_loop_mt.c
 create mode 100644 tools/virtiofsd/fuse_opt.c
 create mode 100644 tools/virtiofsd/fuse_signals.c
 create mode 100644 tools/virtiofsd/helper.c
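
Note on the worker bookkeeping in fuse_loop_mt.c: workers are kept on a
circular doubly-linked list whose sentinel is mt.main, so the list is empty
exactly when the sentinel points at itself. The sketch below shows the same
technique in isolation. struct node and the list_* names are hypothetical
stand-ins for struct fuse_worker, list_add_worker() and list_del_worker().

```c
/* Hypothetical stand-in for struct fuse_worker: only the list links. */
struct node {
	struct node *prev;
	struct node *next;
};

/* An empty list is a sentinel that points at itself. */
static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

/* Insert n just before next; passing the sentinel appends at the tail. */
static void list_add_before(struct node *n, struct node *next)
{
	struct node *prev = next->prev;

	n->next = next;
	n->prev = prev;
	prev->next = n;
	next->prev = n;
}

/* Unlink n by joining its neighbours; n's own pointers are left stale. */
static void list_del(struct node *n)
{
	struct node *prev = n->prev;
	struct node *next = n->next;

	prev->next = next;
	next->prev = prev;
}

/* The list is empty again once the sentinel points back at itself. */
static int list_empty(const struct node *head)
{
	return head->next == head;
}
```

Because the sentinel is always present, neither insertion nor removal needs a
NULL check. The shutdown path in the patch relies on the same property when
it joins workers until mt.main.next is &mt.main again.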

diff --git a/tools/virtiofsd/buffer.c b/tools/virtiofsd/buffer.c
new file mode 100644
index 0000000000..5ab9b87455
--- /dev/null
+++ b/tools/virtiofsd/buffer.c
@@ -0,0 +1,321 @@
+/*
+  FUSE: Filesystem in Userspace
+  Copyright (C) 2010  Miklos Szeredi <miklos@szeredi.hu>
+
+  Functions for dealing with `struct fuse_buf` and `struct
+  fuse_bufvec`.
+
+  This program can be distributed under the terms of the GNU LGPLv2.
+  See the file COPYING.LIB
+*/
+
+#define _GNU_SOURCE
+
+#include "config.h"
+#include "fuse_i.h"
+#include "fuse_lowlevel.h"
+#include <string.h>
+#include <unistd.h>
+#include <errno.h>
+#include <assert.h>
+
+size_t fuse_buf_size(const struct fuse_bufvec *bufv)
+{
+	size_t i;
+	size_t size = 0;
+
+	for (i = 0; i < bufv->count; i++) {
+		if (bufv->buf[i].size == SIZE_MAX)
+			size = SIZE_MAX;
+		else
+			size += bufv->buf[i].size;
+	}
+
+	return size;
+}
+
+static size_t min_size(size_t s1, size_t s2)
+{
+	return s1 < s2 ? s1 : s2;
+}
+
+static ssize_t fuse_buf_write(const struct fuse_buf *dst, size_t dst_off,
+			      const struct fuse_buf *src, size_t src_off,
+			      size_t len)
+{
+	ssize_t res = 0;
+	size_t copied = 0;
+
+	while (len) {
+		if (dst->flags & FUSE_BUF_FD_SEEK) {
+			res = pwrite(dst->fd, (char *)src->mem + src_off, len,
+				     dst->pos + dst_off);
+		} else {
+			res = write(dst->fd, (char *)src->mem + src_off, len);
+		}
+		if (res == -1) {
+			if (!copied)
+				return -errno;
+			break;
+		}
+		if (res == 0)
+			break;
+
+		copied += res;
+		if (!(dst->flags & FUSE_BUF_FD_RETRY))
+			break;
+
+		src_off += res;
+		dst_off += res;
+		len -= res;
+	}
+
+	return copied;
+}
+
+static ssize_t fuse_buf_read(const struct fuse_buf *dst, size_t dst_off,
+			     const struct fuse_buf *src, size_t src_off,
+			     size_t len)
+{
+	ssize_t res = 0;
+	size_t copied = 0;
+
+	while (len) {
+		if (src->flags & FUSE_BUF_FD_SEEK) {
+			res = pread(src->fd, (char *)dst->mem + dst_off, len,
+				     src->pos + src_off);
+		} else {
+			res = read(src->fd, (char *)dst->mem + dst_off, len);
+		}
+		if (res == -1) {
+			if (!copied)
+				return -errno;
+			break;
+		}
+		if (res == 0)
+			break;
+
+		copied += res;
+		if (!(src->flags & FUSE_BUF_FD_RETRY))
+			break;
+
+		dst_off += res;
+		src_off += res;
+		len -= res;
+	}
+
+	return copied;
+}
+
+static ssize_t fuse_buf_fd_to_fd(const struct fuse_buf *dst, size_t dst_off,
+				 const struct fuse_buf *src, size_t src_off,
+				 size_t len)
+{
+	char buf[4096];
+	struct fuse_buf tmp = {
+		.size = sizeof(buf),
+		.flags = 0,
+	};
+	ssize_t res;
+	size_t copied = 0;
+
+	tmp.mem = buf;
+
+	while (len) {
+		size_t this_len = min_size(tmp.size, len);
+		size_t read_len;
+
+		res = fuse_buf_read(&tmp, 0, src, src_off, this_len);
+		if (res < 0) {
+			if (!copied)
+				return res;
+			break;
+		}
+		if (res == 0)
+			break;
+
+		read_len = res;
+		res = fuse_buf_write(dst, dst_off, &tmp, 0, read_len);
+		if (res < 0) {
+			if (!copied)
+				return res;
+			break;
+		}
+		if (res == 0)
+			break;
+
+		copied += res;
+
+		if (res < this_len)
+			break;
+
+		dst_off += res;
+		src_off += res;
+		len -= res;
+	}
+
+	return copied;
+}
+
+#ifdef HAVE_SPLICE
+static ssize_t fuse_buf_splice(const struct fuse_buf *dst, size_t dst_off,
+			       const struct fuse_buf *src, size_t src_off,
+			       size_t len, enum fuse_buf_copy_flags flags)
+{
+	int splice_flags = 0;
+	off_t *srcpos = NULL;
+	off_t *dstpos = NULL;
+	off_t srcpos_val;
+	off_t dstpos_val;
+	ssize_t res;
+	size_t copied = 0;
+
+	if (flags & FUSE_BUF_SPLICE_MOVE)
+		splice_flags |= SPLICE_F_MOVE;
+	if (flags & FUSE_BUF_SPLICE_NONBLOCK)
+		splice_flags |= SPLICE_F_NONBLOCK;
+
+	if (src->flags & FUSE_BUF_FD_SEEK) {
+		srcpos_val = src->pos + src_off;
+		srcpos = &srcpos_val;
+	}
+	if (dst->flags & FUSE_BUF_FD_SEEK) {
+		dstpos_val = dst->pos + dst_off;
+		dstpos = &dstpos_val;
+	}
+
+	while (len) {
+		res = splice(src->fd, srcpos, dst->fd, dstpos, len,
+			     splice_flags);
+		if (res == -1) {
+			if (copied)
+				break;
+
+			if (errno != EINVAL || (flags & FUSE_BUF_FORCE_SPLICE))
+				return -errno;
+
+			/* Maybe splice is not supported for this combination */
+			return fuse_buf_fd_to_fd(dst, dst_off, src, src_off,
+						 len);
+		}
+		if (res == 0)
+			break;
+
+		copied += res;
+		if (!(src->flags & FUSE_BUF_FD_RETRY) &&
+		    !(dst->flags & FUSE_BUF_FD_RETRY)) {
+			break;
+		}
+
+		len -= res;
+	}
+
+	return copied;
+}
+#else
+static ssize_t fuse_buf_splice(const struct fuse_buf *dst, size_t dst_off,
+			       const struct fuse_buf *src, size_t src_off,
+			       size_t len, enum fuse_buf_copy_flags flags)
+{
+	(void) flags;
+
+	return fuse_buf_fd_to_fd(dst, dst_off, src, src_off, len);
+}
+#endif
+
+
+static ssize_t fuse_buf_copy_one(const struct fuse_buf *dst, size_t dst_off,
+				 const struct fuse_buf *src, size_t src_off,
+				 size_t len, enum fuse_buf_copy_flags flags)
+{
+	int src_is_fd = src->flags & FUSE_BUF_IS_FD;
+	int dst_is_fd = dst->flags & FUSE_BUF_IS_FD;
+
+	if (!src_is_fd && !dst_is_fd) {
+		char *dstmem = (char *)dst->mem + dst_off;
+		char *srcmem = (char *)src->mem + src_off;
+
+		if (dstmem != srcmem) {
+			if (dstmem + len <= srcmem || srcmem + len <= dstmem)
+				memcpy(dstmem, srcmem, len);
+			else
+				memmove(dstmem, srcmem, len);
+		}
+
+		return len;
+	} else if (!src_is_fd) {
+		return fuse_buf_write(dst, dst_off, src, src_off, len);
+	} else if (!dst_is_fd) {
+		return fuse_buf_read(dst, dst_off, src, src_off, len);
+	} else if (flags & FUSE_BUF_NO_SPLICE) {
+		return fuse_buf_fd_to_fd(dst, dst_off, src, src_off, len);
+	} else {
+		return fuse_buf_splice(dst, dst_off, src, src_off, len, flags);
+	}
+}
+
+static const struct fuse_buf *fuse_bufvec_current(struct fuse_bufvec *bufv)
+{
+	if (bufv->idx < bufv->count)
+		return &bufv->buf[bufv->idx];
+	else
+		return NULL;
+}
+
+static int fuse_bufvec_advance(struct fuse_bufvec *bufv, size_t len)
+{
+	const struct fuse_buf *buf = fuse_bufvec_current(bufv);
+
+	bufv->off += len;
+	assert(bufv->off <= buf->size);
+	if (bufv->off == buf->size) {
+		assert(bufv->idx < bufv->count);
+		bufv->idx++;
+		if (bufv->idx == bufv->count)
+			return 0;
+		bufv->off = 0;
+	}
+	return 1;
+}
+
+ssize_t fuse_buf_copy(struct fuse_bufvec *dstv, struct fuse_bufvec *srcv,
+		      enum fuse_buf_copy_flags flags)
+{
+	size_t copied = 0;
+
+	if (dstv == srcv)
+		return fuse_buf_size(dstv);
+
+	for (;;) {
+		const struct fuse_buf *src = fuse_bufvec_current(srcv);
+		const struct fuse_buf *dst = fuse_bufvec_current(dstv);
+		size_t src_len;
+		size_t dst_len;
+		size_t len;
+		ssize_t res;
+
+		if (src == NULL || dst == NULL)
+			break;
+
+		src_len = src->size - srcv->off;
+		dst_len = dst->size - dstv->off;
+		len = min_size(src_len, dst_len);
+
+		res = fuse_buf_copy_one(dst, dstv->off, src, srcv->off, len, flags);
+		if (res < 0) {
+			if (!copied)
+				return res;
+			break;
+		}
+		copied += res;
+
+		if (!fuse_bufvec_advance(srcv, res) ||
+		    !fuse_bufvec_advance(dstv, res))
+			break;
+
+		if (res < len)
+			break;
+	}
+
+	return copied;
+}
diff --git a/tools/virtiofsd/fuse_log.c b/tools/virtiofsd/fuse_log.c
new file mode 100644
index 0000000000..0d268ab014
--- /dev/null
+++ b/tools/virtiofsd/fuse_log.c
@@ -0,0 +1,40 @@
+/*
+  FUSE: Filesystem in Userspace
+  Copyright (C) 2019  Red Hat, Inc.
+
+  Logging API.
+
+  This program can be distributed under the terms of the GNU LGPLv2.
+  See the file COPYING.LIB
+*/
+
+#include "fuse_log.h"
+
+#include <stdarg.h>
+#include <stdio.h>
+
+static void default_log_func(
+		__attribute__(( unused )) enum fuse_log_level level,
+		const char *fmt, va_list ap)
+{
+	vfprintf(stderr, fmt, ap);
+}
+
+static fuse_log_func_t log_func = default_log_func;
+
+void fuse_set_log_func(fuse_log_func_t func)
+{
+	if (!func)
+		func = default_log_func;
+
+	log_func = func;
+}
+
+void fuse_log(enum fuse_log_level level, const char *fmt, ...)
+{
+	va_list ap;
+
+	va_start(ap, fmt);
+	log_func(level, fmt, ap);
+	va_end(ap);
+}
diff --git a/tools/virtiofsd/fuse_loop_mt.c b/tools/virtiofsd/fuse_loop_mt.c
new file mode 100644
index 0000000000..445e9a0ab0
--- /dev/null
+++ b/tools/virtiofsd/fuse_loop_mt.c
@@ -0,0 +1,362 @@
+/*
+  FUSE: Filesystem in Userspace
+  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
+
+  Implementation of the multi-threaded FUSE session loop.
+
+  This program can be distributed under the terms of the GNU LGPLv2.
+  See the file COPYING.LIB.
+*/
+
+#include "config.h"
+#include "fuse_lowlevel.h"
+#include "fuse_misc.h"
+#include "fuse_kernel.h"
+#include "fuse_i.h"
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <signal.h>
+#include <semaphore.h>
+#include <errno.h>
+#include <sys/time.h>
+#include <sys/ioctl.h>
+#include <assert.h>
+
+/* Environment var controlling the thread stack size */
+#define ENVNAME_THREAD_STACK "FUSE_THREAD_STACK"
+
+struct fuse_worker {
+	struct fuse_worker *prev;
+	struct fuse_worker *next;
+	pthread_t thread_id;
+	size_t bufsize;
+
+	// We need to include fuse_buf so that we can properly free
+	// it when a thread is terminated by pthread_cancel().
+	struct fuse_buf fbuf;
+	struct fuse_chan *ch;
+	struct fuse_mt *mt;
+};
+
+struct fuse_mt {
+	pthread_mutex_t lock;
+	int numworker;
+	int numavail;
+	struct fuse_session *se;
+	struct fuse_worker main;
+	sem_t finish;
+	int exit;
+	int error;
+	int clone_fd;
+	int max_idle;
+};
+
+static struct fuse_chan *fuse_chan_new(int fd)
+{
+	struct fuse_chan *ch = (struct fuse_chan *) malloc(sizeof(*ch));
+	if (ch == NULL) {
+		fuse_log(FUSE_LOG_ERR, "fuse: failed to allocate channel\n");
+		return NULL;
+	}
+
+	memset(ch, 0, sizeof(*ch));
+	ch->fd = fd;
+	ch->ctr = 1;
+	fuse_mutex_init(&ch->lock);
+
+	return ch;
+}
+
+struct fuse_chan *fuse_chan_get(struct fuse_chan *ch)
+{
+	assert(ch->ctr > 0);
+	pthread_mutex_lock(&ch->lock);
+	ch->ctr++;
+	pthread_mutex_unlock(&ch->lock);
+
+	return ch;
+}
+
+void fuse_chan_put(struct fuse_chan *ch)
+{
+	if (ch == NULL)
+		return;
+	pthread_mutex_lock(&ch->lock);
+	ch->ctr--;
+	if (!ch->ctr) {
+		pthread_mutex_unlock(&ch->lock);
+		close(ch->fd);
+		pthread_mutex_destroy(&ch->lock);
+		free(ch);
+	} else
+		pthread_mutex_unlock(&ch->lock);
+}
+
+static void list_add_worker(struct fuse_worker *w, struct fuse_worker *next)
+{
+	struct fuse_worker *prev = next->prev;
+	w->next = next;
+	w->prev = prev;
+	prev->next = w;
+	next->prev = w;
+}
+
+static void list_del_worker(struct fuse_worker *w)
+{
+	struct fuse_worker *prev = w->prev;
+	struct fuse_worker *next = w->next;
+	prev->next = next;
+	next->prev = prev;
+}
+
+static int fuse_loop_start_thread(struct fuse_mt *mt);
+
+static void *fuse_do_work(void *data)
+{
+	struct fuse_worker *w = (struct fuse_worker *) data;
+	struct fuse_mt *mt = w->mt;
+
+	while (!fuse_session_exited(mt->se)) {
+		int isforget = 0;
+		int res;
+
+		pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
+		res = fuse_session_receive_buf_int(mt->se, &w->fbuf, w->ch);
+		pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
+		if (res == -EINTR)
+			continue;
+		if (res <= 0) {
+			if (res < 0) {
+				fuse_session_exit(mt->se);
+				mt->error = res;
+			}
+			break;
+		}
+
+		pthread_mutex_lock(&mt->lock);
+		if (mt->exit) {
+			pthread_mutex_unlock(&mt->lock);
+			return NULL;
+		}
+
+		/*
+		 * This disgusting hack is needed so that zillions of threads
+		 * are not created on a burst of FORGET messages
+		 */
+		if (!(w->fbuf.flags & FUSE_BUF_IS_FD)) {
+			struct fuse_in_header *in = w->fbuf.mem;
+
+			if (in->opcode == FUSE_FORGET ||
+			    in->opcode == FUSE_BATCH_FORGET)
+				isforget = 1;
+		}
+
+		if (!isforget)
+			mt->numavail--;
+		if (mt->numavail == 0)
+			fuse_loop_start_thread(mt);
+		pthread_mutex_unlock(&mt->lock);
+
+		fuse_session_process_buf_int(mt->se, &w->fbuf, w->ch);
+
+		pthread_mutex_lock(&mt->lock);
+		if (!isforget)
+			mt->numavail++;
+		if (mt->numavail > mt->max_idle) {
+			if (mt->exit) {
+				pthread_mutex_unlock(&mt->lock);
+				return NULL;
+			}
+			list_del_worker(w);
+			mt->numavail--;
+			mt->numworker--;
+			pthread_mutex_unlock(&mt->lock);
+
+			pthread_detach(w->thread_id);
+			free(w->fbuf.mem);
+			fuse_chan_put(w->ch);
+			free(w);
+			return NULL;
+		}
+		pthread_mutex_unlock(&mt->lock);
+	}
+
+	sem_post(&mt->finish);
+
+	return NULL;
+}
+
+int fuse_start_thread(pthread_t *thread_id, void *(*func)(void *), void *arg)
+{
+	sigset_t oldset;
+	sigset_t newset;
+	int res;
+	pthread_attr_t attr;
+	char *stack_size;
+
+	/* Override default stack size */
+	pthread_attr_init(&attr);
+	stack_size = getenv(ENVNAME_THREAD_STACK);
+	if (stack_size && pthread_attr_setstacksize(&attr, atoi(stack_size)))
+		fuse_log(FUSE_LOG_ERR, "fuse: invalid stack size: %s\n", stack_size);
+
+	/* Disallow signal reception in worker threads */
+	sigemptyset(&newset);
+	sigaddset(&newset, SIGTERM);
+	sigaddset(&newset, SIGINT);
+	sigaddset(&newset, SIGHUP);
+	sigaddset(&newset, SIGQUIT);
+	pthread_sigmask(SIG_BLOCK, &newset, &oldset);
+	res = pthread_create(thread_id, &attr, func, arg);
+	pthread_sigmask(SIG_SETMASK, &oldset, NULL);
+	pthread_attr_destroy(&attr);
+	if (res != 0) {
+		fuse_log(FUSE_LOG_ERR, "fuse: error creating thread: %s\n",
+			strerror(res));
+		return -1;
+	}
+
+	return 0;
+}
+
+static struct fuse_chan *fuse_clone_chan(struct fuse_mt *mt)
+{
+	int res;
+	int clonefd;
+	uint32_t masterfd;
+	struct fuse_chan *newch;
+	const char *devname = "/dev/fuse";
+
+#ifndef O_CLOEXEC
+#define O_CLOEXEC 0
+#endif
+	clonefd = open(devname, O_RDWR | O_CLOEXEC);
+	if (clonefd == -1) {
+		fuse_log(FUSE_LOG_ERR, "fuse: failed to open %s: %s\n", devname,
+			strerror(errno));
+		return NULL;
+	}
+	fcntl(clonefd, F_SETFD, FD_CLOEXEC);
+
+	masterfd = mt->se->fd;
+	res = ioctl(clonefd, FUSE_DEV_IOC_CLONE, &masterfd);
+	if (res == -1) {
+		fuse_log(FUSE_LOG_ERR, "fuse: failed to clone device fd: %s\n",
+			strerror(errno));
+		close(clonefd);
+		return NULL;
+	}
+	newch = fuse_chan_new(clonefd);
+	if (newch == NULL)
+		close(clonefd);
+
+	return newch;
+}
+
+static int fuse_loop_start_thread(struct fuse_mt *mt)
+{
+	int res;
+
+	struct fuse_worker *w = malloc(sizeof(struct fuse_worker));
+	if (!w) {
+		fuse_log(FUSE_LOG_ERR, "fuse: failed to allocate worker structure\n");
+		return -1;
+	}
+	memset(w, 0, sizeof(struct fuse_worker));
+	w->fbuf.mem = NULL;
+	w->mt = mt;
+
+	w->ch = NULL;
+	if (mt->clone_fd) {
+		w->ch = fuse_clone_chan(mt);
+		if(!w->ch) {
+			/* Don't attempt this again */
+			fuse_log(FUSE_LOG_ERR, "fuse: trying to continue "
+				"without -o clone_fd.\n");
+			mt->clone_fd = 0;
+		}
+	}
+
+	res = fuse_start_thread(&w->thread_id, fuse_do_work, w);
+	if (res == -1) {
+		fuse_chan_put(w->ch);
+		free(w);
+		return -1;
+	}
+	list_add_worker(w, &mt->main);
+	mt->numavail ++;
+	mt->numworker ++;
+
+	return 0;
+}
+
+static void fuse_join_worker(struct fuse_mt *mt, struct fuse_worker *w)
+{
+	pthread_join(w->thread_id, NULL);
+	pthread_mutex_lock(&mt->lock);
+	list_del_worker(w);
+	pthread_mutex_unlock(&mt->lock);
+	free(w->fbuf.mem);
+	fuse_chan_put(w->ch);
+	free(w);
+}
+
+FUSE_SYMVER(".symver fuse_session_loop_mt_32,fuse_session_loop_mt@@FUSE_3.2");
+int fuse_session_loop_mt_32(struct fuse_session *se, struct fuse_loop_config *config)
+{
+	int err;
+	struct fuse_mt mt;
+	struct fuse_worker *w;
+
+	memset(&mt, 0, sizeof(struct fuse_mt));
+	mt.se = se;
+	mt.clone_fd = config->clone_fd;
+	mt.error = 0;
+	mt.numworker = 0;
+	mt.numavail = 0;
+	mt.max_idle = config->max_idle_threads;
+	mt.main.thread_id = pthread_self();
+	mt.main.prev = mt.main.next = &mt.main;
+	sem_init(&mt.finish, 0, 0);
+	fuse_mutex_init(&mt.lock);
+
+	pthread_mutex_lock(&mt.lock);
+	err = fuse_loop_start_thread(&mt);
+	pthread_mutex_unlock(&mt.lock);
+	if (!err) {
+		/* sem_wait() is interruptible */
+		while (!fuse_session_exited(se))
+			sem_wait(&mt.finish);
+
+		pthread_mutex_lock(&mt.lock);
+		for (w = mt.main.next; w != &mt.main; w = w->next)
+			pthread_cancel(w->thread_id);
+		mt.exit = 1;
+		pthread_mutex_unlock(&mt.lock);
+
+		while (mt.main.next != &mt.main)
+			fuse_join_worker(&mt, mt.main.next);
+
+		err = mt.error;
+	}
+
+	pthread_mutex_destroy(&mt.lock);
+	sem_destroy(&mt.finish);
+	if(se->error != 0)
+		err = se->error;
+	fuse_session_reset(se);
+	return err;
+}
+
+int fuse_session_loop_mt_31(struct fuse_session *se, int clone_fd);
+FUSE_SYMVER(".symver fuse_session_loop_mt_31,fuse_session_loop_mt@FUSE_3.0");
+int fuse_session_loop_mt_31(struct fuse_session *se, int clone_fd)
+{
+	struct fuse_loop_config config;
+	config.clone_fd = clone_fd;
+	config.max_idle_threads = 10;
+	return fuse_session_loop_mt_32(se, &config);
+}
diff --git a/tools/virtiofsd/fuse_opt.c b/tools/virtiofsd/fuse_opt.c
new file mode 100644
index 0000000000..93066b926e
--- /dev/null
+++ b/tools/virtiofsd/fuse_opt.c
@@ -0,0 +1,423 @@
+/*
+  FUSE: Filesystem in Userspace
+  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
+
+  Implementation of option parsing routines (dealing with `struct
+  fuse_args`).
+
+  This program can be distributed under the terms of the GNU LGPLv2.
+  See the file COPYING.LIB
+*/
+
+#include "config.h"
+#include "fuse_i.h"
+#include "fuse_opt.h"
+#include "fuse_misc.h"
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <assert.h>
+
+struct fuse_opt_context {
+	void *data;
+	const struct fuse_opt *opt;
+	fuse_opt_proc_t proc;
+	int argctr;
+	int argc;
+	char **argv;
+	struct fuse_args outargs;
+	char *opts;
+	int nonopt;
+};
+
+void fuse_opt_free_args(struct fuse_args *args)
+{
+	if (args) {
+		if (args->argv && args->allocated) {
+			int i;
+			for (i = 0; i < args->argc; i++)
+				free(args->argv[i]);
+			free(args->argv);
+		}
+		args->argc = 0;
+		args->argv = NULL;
+		args->allocated = 0;
+	}
+}
+
+static int alloc_failed(void)
+{
+	fuse_log(FUSE_LOG_ERR, "fuse: memory allocation failed\n");
+	return -1;
+}
+
+int fuse_opt_add_arg(struct fuse_args *args, const char *arg)
+{
+	char **newargv;
+	char *newarg;
+
+	assert(!args->argv || args->allocated);
+
+	newarg = strdup(arg);
+	if (!newarg)
+		return alloc_failed();
+
+	newargv = realloc(args->argv, (args->argc + 2) * sizeof(char *));
+	if (!newargv) {
+		free(newarg);
+		return alloc_failed();
+	}
+
+	args->argv = newargv;
+	args->allocated = 1;
+	args->argv[args->argc++] = newarg;
+	args->argv[args->argc] = NULL;
+	return 0;
+}
+
+static int fuse_opt_insert_arg_common(struct fuse_args *args, int pos,
+				      const char *arg)
+{
+	assert(pos <= args->argc);
+	if (fuse_opt_add_arg(args, arg) == -1)
+		return -1;
+
+	if (pos != args->argc - 1) {
+		char *newarg = args->argv[args->argc - 1];
+		memmove(&args->argv[pos + 1], &args->argv[pos],
+			sizeof(char *) * (args->argc - pos - 1));
+		args->argv[pos] = newarg;
+	}
+	return 0;
+}
+
+int fuse_opt_insert_arg(struct fuse_args *args, int pos, const char *arg)
+{
+	return fuse_opt_insert_arg_common(args, pos, arg);
+}
+
+static int next_arg(struct fuse_opt_context *ctx, const char *opt)
+{
+	if (ctx->argctr + 1 >= ctx->argc) {
+		fuse_log(FUSE_LOG_ERR, "fuse: missing argument after `%s'\n", opt);
+		return -1;
+	}
+	ctx->argctr++;
+	return 0;
+}
+
+static int add_arg(struct fuse_opt_context *ctx, const char *arg)
+{
+	return fuse_opt_add_arg(&ctx->outargs, arg);
+}
+
+static int add_opt_common(char **opts, const char *opt, int esc)
+{
+	unsigned oldlen = *opts ? strlen(*opts) : 0;
+	char *d = realloc(*opts, oldlen + 1 + strlen(opt) * 2 + 1);
+
+	if (!d)
+		return alloc_failed();
+
+	*opts = d;
+	if (oldlen) {
+		d += oldlen;
+		*d++ = ',';
+	}
+
+	for (; *opt; opt++) {
+		if (esc && (*opt == ',' || *opt == '\\'))
+			*d++ = '\\';
+		*d++ = *opt;
+	}
+	*d = '\0';
+
+	return 0;
+}
+
+int fuse_opt_add_opt(char **opts, const char *opt)
+{
+	return add_opt_common(opts, opt, 0);
+}
+
+int fuse_opt_add_opt_escaped(char **opts, const char *opt)
+{
+	return add_opt_common(opts, opt, 1);
+}
+
+static int add_opt(struct fuse_opt_context *ctx, const char *opt)
+{
+	return add_opt_common(&ctx->opts, opt, 1);
+}
+
+static int call_proc(struct fuse_opt_context *ctx, const char *arg, int key,
+		     int iso)
+{
+	if (key == FUSE_OPT_KEY_DISCARD)
+		return 0;
+
+	if (key != FUSE_OPT_KEY_KEEP && ctx->proc) {
+		int res = ctx->proc(ctx->data, arg, key, &ctx->outargs);
+		if (res == -1 || !res)
+			return res;
+	}
+	if (iso)
+		return add_opt(ctx, arg);
+	else
+		return add_arg(ctx, arg);
+}
+
+static int match_template(const char *t, const char *arg, unsigned *sepp)
+{
+	int arglen = strlen(arg);
+	const char *sep = strchr(t, '=');
+	sep = sep ? sep : strchr(t, ' ');
+	if (sep && (!sep[1] || sep[1] == '%')) {
+		int tlen = sep - t;
+		if (sep[0] == '=')
+			tlen ++;
+		if (arglen >= tlen && strncmp(arg, t, tlen) == 0) {
+			*sepp = sep - t;
+			return 1;
+		}
+	}
+	if (strcmp(t, arg) == 0) {
+		*sepp = 0;
+		return 1;
+	}
+	return 0;
+}
+
+static const struct fuse_opt *find_opt(const struct fuse_opt *opt,
+				       const char *arg, unsigned *sepp)
+{
+	for (; opt && opt->templ; opt++)
+		if (match_template(opt->templ, arg, sepp))
+			return opt;
+	return NULL;
+}
+
+int fuse_opt_match(const struct fuse_opt *opts, const char *opt)
+{
+	unsigned dummy;
+	return find_opt(opts, opt, &dummy) ? 1 : 0;
+}
+
+static int process_opt_param(void *var, const char *format, const char *param,
+			     const char *arg)
+{
+	assert(format[0] == '%');
+	if (format[1] == 's') {
+		char **s = var;
+		char *copy = strdup(param);
+		if (!copy)
+			return alloc_failed();
+
+		free(*s);
+		*s = copy;
+	} else {
+		if (sscanf(param, format, var) != 1) {
+			fuse_log(FUSE_LOG_ERR, "fuse: invalid parameter in option `%s'\n", arg);
+			return -1;
+		}
+	}
+	return 0;
+}
+
+static int process_opt(struct fuse_opt_context *ctx,
+		       const struct fuse_opt *opt, unsigned sep,
+		       const char *arg, int iso)
+{
+	if (opt->offset == -1U) {
+		if (call_proc(ctx, arg, opt->value, iso) == -1)
+			return -1;
+	} else {
+		void *var = (char *)ctx->data + opt->offset;
+		if (sep && opt->templ[sep + 1]) {
+			const char *param = arg + sep;
+			if (opt->templ[sep] == '=')
+				param ++;
+			if (process_opt_param(var, opt->templ + sep + 1,
+					      param, arg) == -1)
+				return -1;
+		} else
+			*(int *)var = opt->value;
+	}
+	return 0;
+}
+
+static int process_opt_sep_arg(struct fuse_opt_context *ctx,
+			       const struct fuse_opt *opt, unsigned sep,
+			       const char *arg, int iso)
+{
+	int res;
+	char *newarg;
+	char *param;
+
+	if (next_arg(ctx, arg) == -1)
+		return -1;
+
+	param = ctx->argv[ctx->argctr];
+	newarg = malloc(sep + strlen(param) + 1);
+	if (!newarg)
+		return alloc_failed();
+
+	memcpy(newarg, arg, sep);
+	strcpy(newarg + sep, param);
+	res = process_opt(ctx, opt, sep, newarg, iso);
+	free(newarg);
+
+	return res;
+}
+
+static int process_gopt(struct fuse_opt_context *ctx, const char *arg, int iso)
+{
+	unsigned sep;
+	const struct fuse_opt *opt = find_opt(ctx->opt, arg, &sep);
+	if (opt) {
+		for (; opt; opt = find_opt(opt + 1, arg, &sep)) {
+			int res;
+			if (sep && opt->templ[sep] == ' ' && !arg[sep])
+				res = process_opt_sep_arg(ctx, opt, sep, arg,
+							  iso);
+			else
+				res = process_opt(ctx, opt, sep, arg, iso);
+			if (res == -1)
+				return -1;
+		}
+		return 0;
+	} else
+		return call_proc(ctx, arg, FUSE_OPT_KEY_OPT, iso);
+}
+
+static int process_real_option_group(struct fuse_opt_context *ctx, char *opts)
+{
+	char *s = opts;
+	char *d = s;
+	int end = 0;
+
+	while (!end) {
+		if (*s == '\0')
+			end = 1;
+		if (*s == ',' || end) {
+			int res;
+
+			*d = '\0';
+			res = process_gopt(ctx, opts, 1);
+			if (res == -1)
+				return -1;
+			d = opts;
+		} else {
+			if (s[0] == '\\' && s[1] != '\0') {
+				s++;
+				if (s[0] >= '0' && s[0] <= '3' &&
+				    s[1] >= '0' && s[1] <= '7' &&
+				    s[2] >= '0' && s[2] <= '7') {
+					*d++ = (s[0] - '0') * 0100 +
+						(s[1] - '0') * 0010 +
+						(s[2] - '0');
+					s += 2;
+				} else {
+					*d++ = *s;
+				}
+			} else {
+				*d++ = *s;
+			}
+		}
+		s++;
+	}
+
+	return 0;
+}
+
+static int process_option_group(struct fuse_opt_context *ctx, const char *opts)
+{
+	int res;
+	char *copy = strdup(opts);
+
+	if (!copy) {
+		fuse_log(FUSE_LOG_ERR, "fuse: memory allocation failed\n");
+		return -1;
+	}
+	res = process_real_option_group(ctx, copy);
+	free(copy);
+	return res;
+}
+
+static int process_one(struct fuse_opt_context *ctx, const char *arg)
+{
+	if (ctx->nonopt || arg[0] != '-')
+		return call_proc(ctx, arg, FUSE_OPT_KEY_NONOPT, 0);
+	else if (arg[1] == 'o') {
+		if (arg[2])
+			return process_option_group(ctx, arg + 2);
+		else {
+			if (next_arg(ctx, arg) == -1)
+				return -1;
+
+			return process_option_group(ctx,
+						    ctx->argv[ctx->argctr]);
+		}
+	} else if (arg[1] == '-' && !arg[2]) {
+		if (add_arg(ctx, arg) == -1)
+			return -1;
+		ctx->nonopt = ctx->outargs.argc;
+		return 0;
+	} else
+		return process_gopt(ctx, arg, 0);
+}
+
+static int opt_parse(struct fuse_opt_context *ctx)
+{
+	if (ctx->argc) {
+		if (add_arg(ctx, ctx->argv[0]) == -1)
+			return -1;
+	}
+
+	for (ctx->argctr = 1; ctx->argctr < ctx->argc; ctx->argctr++)
+		if (process_one(ctx, ctx->argv[ctx->argctr]) == -1)
+			return -1;
+
+	if (ctx->opts) {
+		if (fuse_opt_insert_arg(&ctx->outargs, 1, "-o") == -1 ||
+		    fuse_opt_insert_arg(&ctx->outargs, 2, ctx->opts) == -1)
+			return -1;
+	}
+
+	/* If option separator ("--") is the last argument, remove it */
+	if (ctx->nonopt && ctx->nonopt == ctx->outargs.argc &&
+	    strcmp(ctx->outargs.argv[ctx->outargs.argc - 1], "--") == 0) {
+		free(ctx->outargs.argv[ctx->outargs.argc - 1]);
+		ctx->outargs.argv[--ctx->outargs.argc] = NULL;
+	}
+
+	return 0;
+}
+
+int fuse_opt_parse(struct fuse_args *args, void *data,
+		   const struct fuse_opt opts[], fuse_opt_proc_t proc)
+{
+	int res;
+	struct fuse_opt_context ctx = {
+		.data = data,
+		.opt = opts,
+		.proc = proc,
+	};
+
+	if (!args || !args->argv || !args->argc)
+		return 0;
+
+	ctx.argc = args->argc;
+	ctx.argv = args->argv;
+
+	res = opt_parse(&ctx);
+	if (res != -1) {
+		struct fuse_args tmp = *args;
+		*args = ctx.outargs;
+		ctx.outargs = tmp;
+	}
+	free(ctx.opts);
+	fuse_opt_free_args(&ctx.outargs);
+	return res;
+}
diff --git a/tools/virtiofsd/fuse_signals.c b/tools/virtiofsd/fuse_signals.c
new file mode 100644
index 0000000000..4271947bd4
--- /dev/null
+++ b/tools/virtiofsd/fuse_signals.c
@@ -0,0 +1,91 @@
+/*
+  FUSE: Filesystem in Userspace
+  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
+
+  Utility functions for setting signal handlers.
+
+  This program can be distributed under the terms of the GNU LGPLv2.
+  See the file COPYING.LIB
+*/
+
+#include "config.h"
+#include "fuse_lowlevel.h"
+#include "fuse_i.h"
+
+#include <stdio.h>
+#include <string.h>
+#include <signal.h>
+#include <stdlib.h>
+
+static struct fuse_session *fuse_instance;
+
+static void exit_handler(int sig)
+{
+	if (fuse_instance) {
+		fuse_session_exit(fuse_instance);
+		if(sig <= 0) {
+			fuse_log(FUSE_LOG_ERR, "assertion error: signal value <= 0\n");
+			abort();
+		}
+		fuse_instance->error = sig;
+	}
+}
+
+static void do_nothing(int sig)
+{
+	(void) sig;
+}
+
+static int set_one_signal_handler(int sig, void (*handler)(int), int remove)
+{
+	struct sigaction sa;
+	struct sigaction old_sa;
+
+	memset(&sa, 0, sizeof(struct sigaction));
+	sa.sa_handler = remove ? SIG_DFL : handler;
+	sigemptyset(&(sa.sa_mask));
+	sa.sa_flags = 0;
+
+	if (sigaction(sig, NULL, &old_sa) == -1) {
+		perror("fuse: cannot get old signal handler");
+		return -1;
+	}
+
+	if (old_sa.sa_handler == (remove ? handler : SIG_DFL) &&
+	    sigaction(sig, &sa, NULL) == -1) {
+		perror("fuse: cannot set signal handler");
+		return -1;
+	}
+	return 0;
+}
+
+int fuse_set_signal_handlers(struct fuse_session *se)
+{
+	/* If we used SIG_IGN instead of the do_nothing function,
+	   then we would be unable to tell if we set SIG_IGN (and
+	   thus should reset to SIG_DFL in fuse_remove_signal_handlers)
+	   or if it was already set to SIG_IGN (and should be left
+	   untouched). */
+	if (set_one_signal_handler(SIGHUP, exit_handler, 0) == -1 ||
+	    set_one_signal_handler(SIGINT, exit_handler, 0) == -1 ||
+	    set_one_signal_handler(SIGTERM, exit_handler, 0) == -1 ||
+	    set_one_signal_handler(SIGPIPE, do_nothing, 0) == -1)
+		return -1;
+
+	fuse_instance = se;
+	return 0;
+}
+
+void fuse_remove_signal_handlers(struct fuse_session *se)
+{
+	if (fuse_instance != se)
+		fuse_log(FUSE_LOG_ERR,
+			"fuse: fuse_remove_signal_handlers: unknown session\n");
+	else
+		fuse_instance = NULL;
+
+	set_one_signal_handler(SIGHUP, exit_handler, 1);
+	set_one_signal_handler(SIGINT, exit_handler, 1);
+	set_one_signal_handler(SIGTERM, exit_handler, 1);
+	set_one_signal_handler(SIGPIPE, do_nothing, 1);
+}
diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
new file mode 100644
index 0000000000..64ff7ad6d5
--- /dev/null
+++ b/tools/virtiofsd/helper.c
@@ -0,0 +1,440 @@
+/*
+  FUSE: Filesystem in Userspace
+  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
+
+  Helper functions to create (simple) standalone programs. With the
+  aid of these functions it should be possible to create full FUSE
+  file system by implementing nothing but the request handlers.
+
+  This program can be distributed under the terms of the GNU LGPLv2.
+  See the file COPYING.LIB.
+*/
+
+#include "config.h"
+#include "fuse_i.h"
+#include "fuse_misc.h"
+#include "fuse_opt.h"
+#include "fuse_lowlevel.h"
+#include "mount_util.h"
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <unistd.h>
+#include <string.h>
+#include <limits.h>
+#include <errno.h>
+#include <sys/param.h>
+
+#define FUSE_HELPER_OPT(t, p) \
+	{ t, offsetof(struct fuse_cmdline_opts, p), 1 }
+
+static const struct fuse_opt fuse_helper_opts[] = {
+	FUSE_HELPER_OPT("-h",		show_help),
+	FUSE_HELPER_OPT("--help",	show_help),
+	FUSE_HELPER_OPT("-V",		show_version),
+	FUSE_HELPER_OPT("--version",	show_version),
+	FUSE_HELPER_OPT("-d",		debug),
+	FUSE_HELPER_OPT("debug",	debug),
+	FUSE_HELPER_OPT("-d",		foreground),
+	FUSE_HELPER_OPT("debug",	foreground),
+	FUSE_OPT_KEY("-d",		FUSE_OPT_KEY_KEEP),
+	FUSE_OPT_KEY("debug",		FUSE_OPT_KEY_KEEP),
+	FUSE_HELPER_OPT("-f",		foreground),
+	FUSE_HELPER_OPT("-s",		singlethread),
+	FUSE_HELPER_OPT("fsname=",	nodefault_subtype),
+	FUSE_OPT_KEY("fsname=",		FUSE_OPT_KEY_KEEP),
+#ifndef __FreeBSD__
+	FUSE_HELPER_OPT("subtype=",	nodefault_subtype),
+	FUSE_OPT_KEY("subtype=",	FUSE_OPT_KEY_KEEP),
+#endif
+	FUSE_HELPER_OPT("clone_fd",	clone_fd),
+	FUSE_HELPER_OPT("max_idle_threads=%u", max_idle_threads),
+	FUSE_OPT_END
+};
+
+struct fuse_conn_info_opts {
+	int atomic_o_trunc;
+	int no_remote_posix_lock;
+	int no_remote_flock;
+	int splice_write;
+	int splice_move;
+	int splice_read;
+	int no_splice_write;
+	int no_splice_move;
+	int no_splice_read;
+	int auto_inval_data;
+	int no_auto_inval_data;
+	int no_readdirplus;
+	int no_readdirplus_auto;
+	int async_dio;
+	int no_async_dio;
+	int writeback_cache;
+	int no_writeback_cache;
+	int async_read;
+	int sync_read;
+	unsigned max_write;
+	unsigned max_readahead;
+	unsigned max_background;
+	unsigned congestion_threshold;
+	unsigned time_gran;
+	int set_max_write;
+	int set_max_readahead;
+	int set_max_background;
+	int set_congestion_threshold;
+	int set_time_gran;
+};
+
+#define CONN_OPTION(t, p, v)					\
+	{ t, offsetof(struct fuse_conn_info_opts, p), v }
+static const struct fuse_opt conn_info_opt_spec[] = {
+	CONN_OPTION("max_write=%u", max_write, 0),
+	CONN_OPTION("max_write=", set_max_write, 1),
+	CONN_OPTION("max_readahead=%u", max_readahead, 0),
+	CONN_OPTION("max_readahead=", set_max_readahead, 1),
+	CONN_OPTION("max_background=%u", max_background, 0),
+	CONN_OPTION("max_background=", set_max_background, 1),
+	CONN_OPTION("congestion_threshold=%u", congestion_threshold, 0),
+	CONN_OPTION("congestion_threshold=", set_congestion_threshold, 1),
+	CONN_OPTION("sync_read", sync_read, 1),
+	CONN_OPTION("async_read", async_read, 1),
+	CONN_OPTION("atomic_o_trunc", atomic_o_trunc, 1),
+	CONN_OPTION("no_remote_lock", no_remote_posix_lock, 1),
+	CONN_OPTION("no_remote_lock", no_remote_flock, 1),
+	CONN_OPTION("no_remote_flock", no_remote_flock, 1),
+	CONN_OPTION("no_remote_posix_lock", no_remote_posix_lock, 1),
+	CONN_OPTION("splice_write", splice_write, 1),
+	CONN_OPTION("no_splice_write", no_splice_write, 1),
+	CONN_OPTION("splice_move", splice_move, 1),
+	CONN_OPTION("no_splice_move", no_splice_move, 1),
+	CONN_OPTION("splice_read", splice_read, 1),
+	CONN_OPTION("no_splice_read", no_splice_read, 1),
+	CONN_OPTION("auto_inval_data", auto_inval_data, 1),
+	CONN_OPTION("no_auto_inval_data", no_auto_inval_data, 1),
+	CONN_OPTION("readdirplus=no", no_readdirplus, 1),
+	CONN_OPTION("readdirplus=yes", no_readdirplus, 0),
+	CONN_OPTION("readdirplus=yes", no_readdirplus_auto, 1),
+	CONN_OPTION("readdirplus=auto", no_readdirplus, 0),
+	CONN_OPTION("readdirplus=auto", no_readdirplus_auto, 0),
+	CONN_OPTION("async_dio", async_dio, 1),
+	CONN_OPTION("no_async_dio", no_async_dio, 1),
+	CONN_OPTION("writeback_cache", writeback_cache, 1),
+	CONN_OPTION("no_writeback_cache", no_writeback_cache, 1),
+	CONN_OPTION("time_gran=%u", time_gran, 0),
+	CONN_OPTION("time_gran=", set_time_gran, 1),
+	FUSE_OPT_END
+};
+
+
+void fuse_cmdline_help(void)
+{
+	printf("    -h   --help            print help\n"
+	       "    -V   --version         print version\n"
+	       "    -d   -o debug          enable debug output (implies -f)\n"
+	       "    -f                     foreground operation\n"
+	       "    -s                     disable multi-threaded operation\n"
+	       "    -o clone_fd            use separate fuse device fd for each thread\n"
+	       "                           (may improve performance)\n"
+	       "    -o max_idle_threads    the maximum number of idle worker threads\n"
+	       "                           allowed (default: 10)\n");
+}
+
+static int fuse_helper_opt_proc(void *data, const char *arg, int key,
+				struct fuse_args *outargs)
+{
+	(void) outargs;
+	struct fuse_cmdline_opts *opts = data;
+
+	switch (key) {
+	case FUSE_OPT_KEY_NONOPT:
+		if (!opts->mountpoint) {
+			if (fuse_mnt_parse_fuse_fd(arg) != -1) {
+				return fuse_opt_add_opt(&opts->mountpoint, arg);
+			}
+
+			char mountpoint[PATH_MAX] = "";
+			if (realpath(arg, mountpoint) == NULL) {
+				fuse_log(FUSE_LOG_ERR,
+					"fuse: bad mount point `%s': %s\n",
+					arg, strerror(errno));
+				return -1;
+			}
+			return fuse_opt_add_opt(&opts->mountpoint, mountpoint);
+		} else {
+			fuse_log(FUSE_LOG_ERR, "fuse: invalid argument `%s'\n", arg);
+			return -1;
+		}
+
+	default:
+		/* Pass through unknown options */
+		return 1;
+	}
+}
+
+/* Under FreeBSD, there is no subtype option so this
+   function actually sets the fsname */
+static int add_default_subtype(const char *progname, struct fuse_args *args)
+{
+	int res;
+	char *subtype_opt;
+
+	const char *basename = strrchr(progname, '/');
+	if (basename == NULL)
+		basename = progname;
+	else if (basename[1] != '\0')
+		basename++;
+
+	subtype_opt = (char *) malloc(strlen(basename) + 64);
+	if (subtype_opt == NULL) {
+		fuse_log(FUSE_LOG_ERR, "fuse: memory allocation failed\n");
+		return -1;
+	}
+#ifdef __FreeBSD__
+	sprintf(subtype_opt, "-ofsname=%s", basename);
+#else
+	sprintf(subtype_opt, "-osubtype=%s", basename);
+#endif
+	res = fuse_opt_add_arg(args, subtype_opt);
+	free(subtype_opt);
+	return res;
+}
+
+int fuse_parse_cmdline(struct fuse_args *args,
+		       struct fuse_cmdline_opts *opts)
+{
+	memset(opts, 0, sizeof(struct fuse_cmdline_opts));
+
+	opts->max_idle_threads = 10;
+
+	if (fuse_opt_parse(args, opts, fuse_helper_opts,
+			   fuse_helper_opt_proc) == -1)
+		return -1;
+
+	/* *Linux*: if neither -o subtype nor -o fsname are specified,
+	   set subtype to program's basename.
+	   *FreeBSD*: if fsname is not specified, set to program's
+	   basename. */
+	if (!opts->nodefault_subtype)
+		if (add_default_subtype(args->argv[0], args) == -1)
+			return -1;
+
+	return 0;
+}
+
+
+int fuse_daemonize(int foreground)
+{
+	if (!foreground) {
+		int nullfd;
+		int waiter[2];
+		char completed;
+
+		if (pipe(waiter)) {
+			perror("fuse_daemonize: pipe");
+			return -1;
+		}
+
+		/*
+		 * daemonize the current process by forking it and letting the
+		 * parent exit.  This makes the current process a child of 'init'.
+		 */
+		switch(fork()) {
+		case -1:
+			perror("fuse_daemonize: fork");
+			return -1;
+		case 0:
+			break;
+		default:
+			(void) read(waiter[0], &completed, sizeof(completed));
+			_exit(0);
+		}
+
+		if (setsid() == -1) {
+			perror("fuse_daemonize: setsid");
+			return -1;
+		}
+
+		(void) chdir("/");
+
+		nullfd = open("/dev/null", O_RDWR, 0);
+		if (nullfd != -1) {
+			(void) dup2(nullfd, 0);
+			(void) dup2(nullfd, 1);
+			(void) dup2(nullfd, 2);
+			if (nullfd > 2)
+				close(nullfd);
+		}
+
+		/* Propagate completion of daemon initialization */
+		completed = 1;
+		(void) write(waiter[1], &completed, sizeof(completed));
+		close(waiter[0]);
+		close(waiter[1]);
+	} else {
+		(void) chdir("/");
+	}
+	return 0;
+}
+
+int fuse_main_real(int argc, char *argv[], const struct fuse_operations *op,
+		   size_t op_size, void *user_data)
+{
+	struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
+	struct fuse *fuse;
+	struct fuse_cmdline_opts opts;
+	int res;
+
+	if (fuse_parse_cmdline(&args, &opts) != 0)
+		return 1;
+
+	if (opts.show_version) {
+		printf("FUSE library version %s\n", PACKAGE_VERSION);
+		fuse_lowlevel_version();
+		res = 0;
+		goto out1;
+	}
+
+	if (opts.show_help) {
+		if(args.argv[0][0] != '\0')
+			printf("usage: %s [options] <mountpoint>\n\n",
+			       args.argv[0]);
+		printf("FUSE options:\n");
+		fuse_cmdline_help();
+		fuse_lib_help(&args);
+		res = 0;
+		goto out1;
+	}
+
+	if (!opts.show_help &&
+	    !opts.mountpoint) {
+		fuse_log(FUSE_LOG_ERR, "error: no mountpoint specified\n");
+		res = 2;
+		goto out1;
+	}
+
+
+	fuse = fuse_new_31(&args, op, op_size, user_data);
+	if (fuse == NULL) {
+		res = 3;
+		goto out1;
+	}
+
+	if (fuse_mount(fuse,opts.mountpoint) != 0) {
+		res = 4;
+		goto out2;
+	}
+
+	if (fuse_daemonize(opts.foreground) != 0) {
+		res = 5;
+		goto out3;
+	}
+
+	struct fuse_session *se = fuse_get_session(fuse);
+	if (fuse_set_signal_handlers(se) != 0) {
+		res = 6;
+		goto out3;
+	}
+
+	if (opts.singlethread)
+		res = fuse_loop(fuse);
+	else {
+		struct fuse_loop_config loop_config;
+		loop_config.clone_fd = opts.clone_fd;
+		loop_config.max_idle_threads = opts.max_idle_threads;
+		res = fuse_loop_mt_32(fuse, &loop_config);
+	}
+	if (res)
+		res = 7;
+
+	fuse_remove_signal_handlers(se);
+out3:
+	fuse_unmount(fuse);
+out2:
+	fuse_destroy(fuse);
+out1:
+	free(opts.mountpoint);
+	fuse_opt_free_args(&args);
+	return res;
+}
+
+
+void fuse_apply_conn_info_opts(struct fuse_conn_info_opts *opts,
+			       struct fuse_conn_info *conn)
+{
+	if(opts->set_max_write)
+		conn->max_write = opts->max_write;
+	if(opts->set_max_background)
+		conn->max_background = opts->max_background;
+	if(opts->set_congestion_threshold)
+		conn->congestion_threshold = opts->congestion_threshold;
+	if(opts->set_time_gran)
+		conn->time_gran = opts->time_gran;
+	if(opts->set_max_readahead)
+		conn->max_readahead = opts->max_readahead;
+
+#define LL_ENABLE(cond,cap) \
+	if (cond) conn->want |= (cap)
+#define LL_DISABLE(cond,cap) \
+	if (cond) conn->want &= ~(cap)
+
+	LL_ENABLE(opts->splice_read, FUSE_CAP_SPLICE_READ);
+	LL_DISABLE(opts->no_splice_read, FUSE_CAP_SPLICE_READ);
+
+	LL_ENABLE(opts->splice_write, FUSE_CAP_SPLICE_WRITE);
+	LL_DISABLE(opts->no_splice_write, FUSE_CAP_SPLICE_WRITE);
+
+	LL_ENABLE(opts->splice_move, FUSE_CAP_SPLICE_MOVE);
+	LL_DISABLE(opts->no_splice_move, FUSE_CAP_SPLICE_MOVE);
+
+	LL_ENABLE(opts->auto_inval_data, FUSE_CAP_AUTO_INVAL_DATA);
+	LL_DISABLE(opts->no_auto_inval_data, FUSE_CAP_AUTO_INVAL_DATA);
+
+	LL_DISABLE(opts->no_readdirplus, FUSE_CAP_READDIRPLUS);
+	LL_DISABLE(opts->no_readdirplus_auto, FUSE_CAP_READDIRPLUS_AUTO);
+
+	LL_ENABLE(opts->async_dio, FUSE_CAP_ASYNC_DIO);
+	LL_DISABLE(opts->no_async_dio, FUSE_CAP_ASYNC_DIO);
+
+	LL_ENABLE(opts->writeback_cache, FUSE_CAP_WRITEBACK_CACHE);
+	LL_DISABLE(opts->no_writeback_cache, FUSE_CAP_WRITEBACK_CACHE);
+
+	LL_ENABLE(opts->async_read, FUSE_CAP_ASYNC_READ);
+	LL_DISABLE(opts->sync_read, FUSE_CAP_ASYNC_READ);
+
+	LL_DISABLE(opts->no_remote_posix_lock, FUSE_CAP_POSIX_LOCKS);
+	LL_DISABLE(opts->no_remote_flock, FUSE_CAP_FLOCK_LOCKS);
+}
+
+struct fuse_conn_info_opts* fuse_parse_conn_info_opts(struct fuse_args *args)
+{
+	struct fuse_conn_info_opts *opts;
+
+	opts = calloc(1, sizeof(struct fuse_conn_info_opts));
+	if(opts == NULL) {
+		fuse_log(FUSE_LOG_ERR, "calloc failed\n");
+		return NULL;
+	}
+	if(fuse_opt_parse(args, opts, conn_info_opt_spec, NULL) == -1) {
+		free(opts);
+		return NULL;
+	}
+	return opts;
+}
+
+int fuse_open_channel(const char *mountpoint, const char* options)
+{
+	struct mount_opts *opts = NULL;
+	int fd = -1;
+	const char *argv[] = { "", "-o", options };
+	int argc = sizeof(argv) / sizeof(argv[0]);
+	struct fuse_args args = FUSE_ARGS_INIT(argc, (char**) argv);
+
+	opts = parse_mount_opts(&args);
+	if (opts == NULL)
+		return -1;
+
+	fd = fuse_kern_mount(mountpoint, opts);
+	destroy_mount_opts(opts);
+
+	return fd;
+}
-- 
2.23.0

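Reader's note on the fuse_opt.c hunk in this patch: the loop in
process_real_option_group() splits a -o string on commas and resolves
backslash escapes, so that "\," and the octal form "\054" both produce a
literal comma inside a single option. Below is a minimal standalone sketch
of just that escape handling. The name unescape_one_opt() is hypothetical
and used only for illustration; it is not part of libfuse or of this series,
and the real code feeds each option to process_gopt() rather than returning
a length.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not in libfuse): copy one option from src to dst,
 * resolving backslash escapes as process_real_option_group() does.
 * Copying stops at an unescaped ',' or at the end of the string.
 * Returns the number of source bytes consumed (separator not included).
 * dst must have room for strlen(src) + 1 bytes.
 */
static size_t unescape_one_opt(const char *src, char *dst)
{
	const char *s = src;
	char *d = dst;

	assert(src != NULL && dst != NULL);

	while (*s != '\0' && *s != ',') {
		if (s[0] == '\\' && s[1] != '\0') {
			s++;
			if (s[0] >= '0' && s[0] <= '3' &&
			    s[1] >= '0' && s[1] <= '7' &&
			    s[2] >= '0' && s[2] <= '7') {
				/* three-digit octal escape, e.g. \054 is ',' */
				*d++ = (char)((s[0] - '0') * 0100 +
					      (s[1] - '0') * 0010 +
					      (s[2] - '0'));
				s += 2;
			} else {
				/* any other escaped character is copied as-is */
				*d++ = *s;
			}
		} else {
			*d++ = *s;
		}
		s++;
	}
	*d = '\0';
	return (size_t)(s - src);
}
```

With this, an option string such as "a\,b,c" yields "a,b" as the first
option, and the caller would continue parsing after the unescaped comma.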

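Reader's note on the fuse_signals.c hunk in this patch:
set_one_signal_handler() only installs a handler when the current
disposition is SIG_DFL, and only resets to SIG_DFL when the current handler
is the one it installed, so dispositions chosen by the application are left
alone. The sketch below splits those two directions into separate functions
to make the rule easier to see. The names install_if_default(),
restore_if_ours() and noop_handler() are assumptions made for this
illustration and do not appear in the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Stand-in for do_nothing() in fuse_signals.c */
static void noop_handler(int sig)
{
	(void) sig;
}

/* Install handler only if the signal still has its default disposition */
static int install_if_default(int sig, void (*handler)(int))
{
	struct sigaction sa;
	struct sigaction old_sa;

	if (sigaction(sig, NULL, &old_sa) == -1)
		return -1;
	if (old_sa.sa_handler != SIG_DFL)
		return 0;	/* someone else owns this signal */

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = handler;
	sigemptyset(&sa.sa_mask);
	return sigaction(sig, &sa, NULL);
}

/* Reset to SIG_DFL only if the current handler is the one we installed */
static int restore_if_ours(int sig, void (*handler)(int))
{
	struct sigaction sa;
	struct sigaction old_sa;

	if (sigaction(sig, NULL, &old_sa) == -1)
		return -1;
	if (old_sa.sa_handler != handler)
		return 0;	/* not ours, leave untouched */

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = SIG_DFL;
	sigemptyset(&sa.sa_mask);
	return sigaction(sig, &sa, NULL);
}
```

This is also why the patch uses a do-nothing function for SIGPIPE instead
of SIG_IGN: a function pointer can be recognised on removal, whereas
SIG_IGN could equally have been set by the application.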


* [PATCH 004/104] virtiofsd: Add fuse_lowlevel.c
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (2 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 003/104] virtiofsd: Add auxiliary .c's Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-03 11:58   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 005/104] virtiofsd: Add passthrough_ll Dr. David Alan Gilbert (git)
                   ` (102 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

fuse_lowlevel is one of the largest files from the library
and does most of the work.  Add it separately to keep the diff
sizes small.
Again, this is from upstream fuse-3.8.0.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
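Note (not part of the commit message): fill_entry() and fuse_reply_attr()
in this file turn the double-valued cache timeouts into the
seconds/nanoseconds pair used on the wire, via calc_timeout_sec() and
calc_timeout_nsec(). The following is a small standalone sketch of that
clamping, written under the assumed names split_sec() and split_nsec() so
it can be compiled outside the tree; it is meant as an illustration of the
logic in the diff, not as a replacement for it.

```c
#include <limits.h>

/* Whole seconds of a timeout, clamped to [0, ULONG_MAX] */
static unsigned long split_sec(double t)
{
	if (t > (double) ULONG_MAX)
		return ULONG_MAX;
	else if (t < 0.0)
		return 0;
	else
		return (unsigned long) t;
}

/* Fractional part of a timeout in nanoseconds, clamped to [0, 999999999] */
static unsigned int split_nsec(double t)
{
	double f = t - (double) split_sec(t);

	if (f < 0.0)
		return 0;
	else if (f >= 0.999999999)
		return 999999999;
	else
		return (unsigned int) (f * 1.0e9);
}
```

A timeout of 1.5 therefore becomes 1 second plus 500000000 nanoseconds,
and a negative timeout becomes 0/0, i.e. no caching.
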
 tools/virtiofsd/fuse_lowlevel.c | 3129 +++++++++++++++++++++++++++++++
 1 file changed, 3129 insertions(+)
 create mode 100644 tools/virtiofsd/fuse_lowlevel.c

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
new file mode 100644
index 0000000000..f2d7038e34
--- /dev/null
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -0,0 +1,3129 @@
+/*
+  FUSE: Filesystem in Userspace
+  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
+
+  Implementation of (most of) the low-level FUSE API. The session loop
+  functions are implemented in separate files.
+
+  This program can be distributed under the terms of the GNU LGPLv2.
+  See the file COPYING.LIB
+*/
+
+#define _GNU_SOURCE
+
+#include "config.h"
+#include "fuse_i.h"
+#include "fuse_kernel.h"
+#include "fuse_opt.h"
+#include "fuse_misc.h"
+#include "mount_util.h"
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <string.h>
+#include <unistd.h>
+#include <limits.h>
+#include <errno.h>
+#include <assert.h>
+#include <sys/file.h>
+
+#ifndef F_LINUX_SPECIFIC_BASE
+#define F_LINUX_SPECIFIC_BASE       1024
+#endif
+#ifndef F_SETPIPE_SZ
+#define F_SETPIPE_SZ	(F_LINUX_SPECIFIC_BASE + 7)
+#endif
+
+
+#define PARAM(inarg) (((char *)(inarg)) + sizeof(*(inarg)))
+#define OFFSET_MAX 0x7fffffffffffffffLL
+
+#define container_of(ptr, type, member) ({				\
+			const typeof( ((type *)0)->member ) *__mptr = (ptr); \
+			(type *)( (char *)__mptr - offsetof(type,member) );})
+
+struct fuse_pollhandle {
+	uint64_t kh;
+	struct fuse_session *se;
+};
+
+static size_t pagesize;
+
+static __attribute__((constructor)) void fuse_ll_init_pagesize(void)
+{
+	pagesize = getpagesize();
+}
+
+static void convert_stat(const struct stat *stbuf, struct fuse_attr *attr)
+{
+	attr->ino	= stbuf->st_ino;
+	attr->mode	= stbuf->st_mode;
+	attr->nlink	= stbuf->st_nlink;
+	attr->uid	= stbuf->st_uid;
+	attr->gid	= stbuf->st_gid;
+	attr->rdev	= stbuf->st_rdev;
+	attr->size	= stbuf->st_size;
+	attr->blksize	= stbuf->st_blksize;
+	attr->blocks	= stbuf->st_blocks;
+	attr->atime	= stbuf->st_atime;
+	attr->mtime	= stbuf->st_mtime;
+	attr->ctime	= stbuf->st_ctime;
+	attr->atimensec = ST_ATIM_NSEC(stbuf);
+	attr->mtimensec = ST_MTIM_NSEC(stbuf);
+	attr->ctimensec = ST_CTIM_NSEC(stbuf);
+}
+
+static void convert_attr(const struct fuse_setattr_in *attr, struct stat *stbuf)
+{
+	stbuf->st_mode	       = attr->mode;
+	stbuf->st_uid	       = attr->uid;
+	stbuf->st_gid	       = attr->gid;
+	stbuf->st_size	       = attr->size;
+	stbuf->st_atime	       = attr->atime;
+	stbuf->st_mtime	       = attr->mtime;
+	stbuf->st_ctime        = attr->ctime;
+	ST_ATIM_NSEC_SET(stbuf, attr->atimensec);
+	ST_MTIM_NSEC_SET(stbuf, attr->mtimensec);
+	ST_CTIM_NSEC_SET(stbuf, attr->ctimensec);
+}
+
+static	size_t iov_length(const struct iovec *iov, size_t count)
+{
+	size_t seg;
+	size_t ret = 0;
+
+	for (seg = 0; seg < count; seg++)
+		ret += iov[seg].iov_len;
+	return ret;
+}
+
+static void list_init_req(struct fuse_req *req)
+{
+	req->next = req;
+	req->prev = req;
+}
+
+static void list_del_req(struct fuse_req *req)
+{
+	struct fuse_req *prev = req->prev;
+	struct fuse_req *next = req->next;
+	prev->next = next;
+	next->prev = prev;
+}
+
+static void list_add_req(struct fuse_req *req, struct fuse_req *next)
+{
+	struct fuse_req *prev = next->prev;
+	req->next = next;
+	req->prev = prev;
+	prev->next = req;
+	next->prev = req;
+}
+
+static void destroy_req(fuse_req_t req)
+{
+	pthread_mutex_destroy(&req->lock);
+	free(req);
+}
+
+void fuse_free_req(fuse_req_t req)
+{
+	int ctr;
+	struct fuse_session *se = req->se;
+
+	pthread_mutex_lock(&se->lock);
+	req->u.ni.func = NULL;
+	req->u.ni.data = NULL;
+	list_del_req(req);
+	ctr = --req->ctr;
+	fuse_chan_put(req->ch);
+	req->ch = NULL;
+	pthread_mutex_unlock(&se->lock);
+	if (!ctr)
+		destroy_req(req);
+}
+
+static struct fuse_req *fuse_ll_alloc_req(struct fuse_session *se)
+{
+	struct fuse_req *req;
+
+	req = (struct fuse_req *) calloc(1, sizeof(struct fuse_req));
+	if (req == NULL) {
+		fuse_log(FUSE_LOG_ERR, "fuse: failed to allocate request\n");
+	} else {
+		req->se = se;
+		req->ctr = 1;
+		list_init_req(req);
+		fuse_mutex_init(&req->lock);
+	}
+
+	return req;
+}
+
+/* Send data. If *ch* is NULL, send via session master fd */
+static int fuse_send_msg(struct fuse_session *se, struct fuse_chan *ch,
+			 struct iovec *iov, int count)
+{
+	struct fuse_out_header *out = iov[0].iov_base;
+
+	out->len = iov_length(iov, count);
+	if (se->debug) {
+		if (out->unique == 0) {
+			fuse_log(FUSE_LOG_DEBUG, "NOTIFY: code=%d length=%u\n",
+				out->error, out->len);
+		} else if (out->error) {
+			fuse_log(FUSE_LOG_DEBUG,
+				"   unique: %llu, error: %i (%s), outsize: %i\n",
+				(unsigned long long) out->unique, out->error,
+				strerror(-out->error), out->len);
+		} else {
+			fuse_log(FUSE_LOG_DEBUG,
+				"   unique: %llu, success, outsize: %i\n",
+				(unsigned long long) out->unique, out->len);
+		}
+	}
+
+	ssize_t res = writev(ch ? ch->fd : se->fd,
+			     iov, count);
+	int err = errno;
+
+	if (res == -1) {
+		assert(se != NULL);
+
+		/* ENOENT means the operation was interrupted */
+		if (!fuse_session_exited(se) && err != ENOENT)
+			perror("fuse: writing device");
+		return -err;
+	}
+
+	return 0;
+}
+
+
+int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
+			       int count)
+{
+	struct fuse_out_header out;
+
+	if (error <= -1000 || error > 0) {
+		fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n",	error);
+		error = -ERANGE;
+	}
+
+	out.unique = req->unique;
+	out.error = error;
+
+	iov[0].iov_base = &out;
+	iov[0].iov_len = sizeof(struct fuse_out_header);
+
+	return fuse_send_msg(req->se, req->ch, iov, count);
+}
+
+static int send_reply_iov(fuse_req_t req, int error, struct iovec *iov,
+			  int count)
+{
+	int res;
+
+	res = fuse_send_reply_iov_nofree(req, error, iov, count);
+	fuse_free_req(req);
+	return res;
+}
+
+static int send_reply(fuse_req_t req, int error, const void *arg,
+		      size_t argsize)
+{
+	struct iovec iov[2];
+	int count = 1;
+	if (argsize) {
+		iov[1].iov_base = (void *) arg;
+		iov[1].iov_len = argsize;
+		count++;
+	}
+	return send_reply_iov(req, error, iov, count);
+}
+
+int fuse_reply_iov(fuse_req_t req, const struct iovec *iov, int count)
+{
+	int res;
+	struct iovec *padded_iov;
+
+	padded_iov = malloc((count + 1) * sizeof(struct iovec));
+	if (padded_iov == NULL)
+		return fuse_reply_err(req, ENOMEM);
+
+	memcpy(padded_iov + 1, iov, count * sizeof(struct iovec));
+	count++;
+
+	res = send_reply_iov(req, 0, padded_iov, count);
+	free(padded_iov);
+
+	return res;
+}
+
+
+/* `buf` is allowed to be NULL so that the proper size may be
+   allocated by the caller */
+size_t fuse_add_direntry(fuse_req_t req, char *buf, size_t bufsize,
+			 const char *name, const struct stat *stbuf, off_t off)
+{
+	(void)req;
+	size_t namelen;
+	size_t entlen;
+	size_t entlen_padded;
+	struct fuse_dirent *dirent;
+
+	namelen = strlen(name);
+	entlen = FUSE_NAME_OFFSET + namelen;
+	entlen_padded = FUSE_DIRENT_ALIGN(entlen);
+
+	if ((buf == NULL) || (entlen_padded > bufsize))
+	  return entlen_padded;
+
+	dirent = (struct fuse_dirent*) buf;
+	dirent->ino = stbuf->st_ino;
+	dirent->off = off;
+	dirent->namelen = namelen;
+	dirent->type = (stbuf->st_mode & S_IFMT) >> 12;
+	memcpy(dirent->name, name, namelen);
+	memset(dirent->name + namelen, 0, entlen_padded - entlen);
+
+	return entlen_padded;
+}
+
+static void convert_statfs(const struct statvfs *stbuf,
+			   struct fuse_kstatfs *kstatfs)
+{
+	kstatfs->bsize	 = stbuf->f_bsize;
+	kstatfs->frsize	 = stbuf->f_frsize;
+	kstatfs->blocks	 = stbuf->f_blocks;
+	kstatfs->bfree	 = stbuf->f_bfree;
+	kstatfs->bavail	 = stbuf->f_bavail;
+	kstatfs->files	 = stbuf->f_files;
+	kstatfs->ffree	 = stbuf->f_ffree;
+	kstatfs->namelen = stbuf->f_namemax;
+}
+
+static int send_reply_ok(fuse_req_t req, const void *arg, size_t argsize)
+{
+	return send_reply(req, 0, arg, argsize);
+}
+
+int fuse_reply_err(fuse_req_t req, int err)
+{
+	return send_reply(req, -err, NULL, 0);
+}
+
+void fuse_reply_none(fuse_req_t req)
+{
+	fuse_free_req(req);
+}
+
+static unsigned long calc_timeout_sec(double t)
+{
+	if (t > (double) ULONG_MAX)
+		return ULONG_MAX;
+	else if (t < 0.0)
+		return 0;
+	else
+		return (unsigned long) t;
+}
+
+static unsigned int calc_timeout_nsec(double t)
+{
+	double f = t - (double) calc_timeout_sec(t);
+	if (f < 0.0)
+		return 0;
+	else if (f >= 0.999999999)
+		return 999999999;
+	else
+		return (unsigned int) (f * 1.0e9);
+}
+
+static void fill_entry(struct fuse_entry_out *arg,
+		       const struct fuse_entry_param *e)
+{
+	arg->nodeid = e->ino;
+	arg->generation = e->generation;
+	arg->entry_valid = calc_timeout_sec(e->entry_timeout);
+	arg->entry_valid_nsec = calc_timeout_nsec(e->entry_timeout);
+	arg->attr_valid = calc_timeout_sec(e->attr_timeout);
+	arg->attr_valid_nsec = calc_timeout_nsec(e->attr_timeout);
+	convert_stat(&e->attr, &arg->attr);
+}
+
+/* `buf` is allowed to be NULL so that the proper size may be
+   allocated by the caller */
+size_t fuse_add_direntry_plus(fuse_req_t req, char *buf, size_t bufsize,
+			      const char *name,
+			      const struct fuse_entry_param *e, off_t off)
+{
+	(void)req;
+	size_t namelen;
+	size_t entlen;
+	size_t entlen_padded;
+
+	namelen = strlen(name);
+	entlen = FUSE_NAME_OFFSET_DIRENTPLUS + namelen;
+	entlen_padded = FUSE_DIRENT_ALIGN(entlen);
+	if ((buf == NULL) || (entlen_padded > bufsize))
+	  return entlen_padded;
+
+	struct fuse_direntplus *dp = (struct fuse_direntplus *) buf;
+	memset(&dp->entry_out, 0, sizeof(dp->entry_out));
+	fill_entry(&dp->entry_out, e);
+
+	struct fuse_dirent *dirent = &dp->dirent;
+	dirent->ino = e->attr.st_ino;
+	dirent->off = off;
+	dirent->namelen = namelen;
+	dirent->type = (e->attr.st_mode & S_IFMT) >> 12;
+	memcpy(dirent->name, name, namelen);
+	memset(dirent->name + namelen, 0, entlen_padded - entlen);
+
+	return entlen_padded;
+}
+
+static void fill_open(struct fuse_open_out *arg,
+		      const struct fuse_file_info *f)
+{
+	arg->fh = f->fh;
+	if (f->direct_io)
+		arg->open_flags |= FOPEN_DIRECT_IO;
+	if (f->keep_cache)
+		arg->open_flags |= FOPEN_KEEP_CACHE;
+	if (f->cache_readdir)
+		arg->open_flags |= FOPEN_CACHE_DIR;
+	if (f->nonseekable)
+		arg->open_flags |= FOPEN_NONSEEKABLE;
+}
+
+int fuse_reply_entry(fuse_req_t req, const struct fuse_entry_param *e)
+{
+	struct fuse_entry_out arg;
+	size_t size = req->se->conn.proto_minor < 9 ?
+		FUSE_COMPAT_ENTRY_OUT_SIZE : sizeof(arg);
+
+	/* before ABI 7.4 e->ino == 0 was invalid, only ENOENT meant
+	   negative entry */
+	if (!e->ino && req->se->conn.proto_minor < 4)
+		return fuse_reply_err(req, ENOENT);
+
+	memset(&arg, 0, sizeof(arg));
+	fill_entry(&arg, e);
+	return send_reply_ok(req, &arg, size);
+}
+
+int fuse_reply_create(fuse_req_t req, const struct fuse_entry_param *e,
+		      const struct fuse_file_info *f)
+{
+	char buf[sizeof(struct fuse_entry_out) + sizeof(struct fuse_open_out)];
+	size_t entrysize = req->se->conn.proto_minor < 9 ?
+		FUSE_COMPAT_ENTRY_OUT_SIZE : sizeof(struct fuse_entry_out);
+	struct fuse_entry_out *earg = (struct fuse_entry_out *) buf;
+	struct fuse_open_out *oarg = (struct fuse_open_out *) (buf + entrysize);
+
+	memset(buf, 0, sizeof(buf));
+	fill_entry(earg, e);
+	fill_open(oarg, f);
+	return send_reply_ok(req, buf,
+			     entrysize + sizeof(struct fuse_open_out));
+}
+
+int fuse_reply_attr(fuse_req_t req, const struct stat *attr,
+		    double attr_timeout)
+{
+	struct fuse_attr_out arg;
+	size_t size = req->se->conn.proto_minor < 9 ?
+		FUSE_COMPAT_ATTR_OUT_SIZE : sizeof(arg);
+
+	memset(&arg, 0, sizeof(arg));
+	arg.attr_valid = calc_timeout_sec(attr_timeout);
+	arg.attr_valid_nsec = calc_timeout_nsec(attr_timeout);
+	convert_stat(attr, &arg.attr);
+
+	return send_reply_ok(req, &arg, size);
+}
+
+int fuse_reply_readlink(fuse_req_t req, const char *linkname)
+{
+	return send_reply_ok(req, linkname, strlen(linkname));
+}
+
+int fuse_reply_open(fuse_req_t req, const struct fuse_file_info *f)
+{
+	struct fuse_open_out arg;
+
+	memset(&arg, 0, sizeof(arg));
+	fill_open(&arg, f);
+	return send_reply_ok(req, &arg, sizeof(arg));
+}
+
+int fuse_reply_write(fuse_req_t req, size_t count)
+{
+	struct fuse_write_out arg;
+
+	memset(&arg, 0, sizeof(arg));
+	arg.size = count;
+
+	return send_reply_ok(req, &arg, sizeof(arg));
+}
+
+int fuse_reply_buf(fuse_req_t req, const char *buf, size_t size)
+{
+	return send_reply_ok(req, buf, size);
+}
+
+static int fuse_send_data_iov_fallback(struct fuse_session *se,
+				       struct fuse_chan *ch,
+				       struct iovec *iov, int iov_count,
+				       struct fuse_bufvec *buf,
+				       size_t len)
+{
+	struct fuse_bufvec mem_buf = FUSE_BUFVEC_INIT(len);
+	void *mbuf;
+	int res;
+
+	/* Optimize common case */
+	if (buf->count == 1 && buf->idx == 0 && buf->off == 0 &&
+	    !(buf->buf[0].flags & FUSE_BUF_IS_FD)) {
+		/* FIXME: also avoid memory copy if there are multiple buffers
+		   but none of them contain an fd */
+
+		iov[iov_count].iov_base = buf->buf[0].mem;
+		iov[iov_count].iov_len = len;
+		iov_count++;
+		return fuse_send_msg(se, ch, iov, iov_count);
+	}
+
+	res = posix_memalign(&mbuf, pagesize, len);
+	if (res != 0)
+		return res;
+
+	mem_buf.buf[0].mem = mbuf;
+	res = fuse_buf_copy(&mem_buf, buf, 0);
+	if (res < 0) {
+		free(mbuf);
+		return -res;
+	}
+	len = res;
+
+	iov[iov_count].iov_base = mbuf;
+	iov[iov_count].iov_len = len;
+	iov_count++;
+	res = fuse_send_msg(se, ch, iov, iov_count);
+	free(mbuf);
+
+	return res;
+}
+
+struct fuse_ll_pipe {
+	size_t size;
+	int can_grow;
+	int pipe[2];
+};
+
+static void fuse_ll_pipe_free(struct fuse_ll_pipe *llp)
+{
+	close(llp->pipe[0]);
+	close(llp->pipe[1]);
+	free(llp);
+}
+
+#ifdef HAVE_SPLICE
+#if !defined(HAVE_PIPE2) || !defined(O_CLOEXEC)
+static int fuse_pipe(int fds[2])
+{
+	int rv = pipe(fds);
+
+	if (rv == -1)
+		return rv;
+
+	if (fcntl(fds[0], F_SETFL, O_NONBLOCK) == -1 ||
+	    fcntl(fds[1], F_SETFL, O_NONBLOCK) == -1 ||
+	    fcntl(fds[0], F_SETFD, FD_CLOEXEC) == -1 ||
+	    fcntl(fds[1], F_SETFD, FD_CLOEXEC) == -1) {
+		close(fds[0]);
+		close(fds[1]);
+		rv = -1;
+	}
+	return rv;
+}
+#else
+static int fuse_pipe(int fds[2])
+{
+	return pipe2(fds, O_CLOEXEC | O_NONBLOCK);
+}
+#endif
+
+static struct fuse_ll_pipe *fuse_ll_get_pipe(struct fuse_session *se)
+{
+	struct fuse_ll_pipe *llp = pthread_getspecific(se->pipe_key);
+	if (llp == NULL) {
+		int res;
+
+		llp = malloc(sizeof(struct fuse_ll_pipe));
+		if (llp == NULL)
+			return NULL;
+
+		res = fuse_pipe(llp->pipe);
+		if (res == -1) {
+			free(llp);
+			return NULL;
+		}
+
+		/*
+		 * the default size is 16 pages on Linux
+		 */
+		llp->size = pagesize * 16;
+		llp->can_grow = 1;
+
+		pthread_setspecific(se->pipe_key, llp);
+	}
+
+	return llp;
+}
+#endif
+
+static void fuse_ll_clear_pipe(struct fuse_session *se)
+{
+	struct fuse_ll_pipe *llp = pthread_getspecific(se->pipe_key);
+	if (llp) {
+		pthread_setspecific(se->pipe_key, NULL);
+		fuse_ll_pipe_free(llp);
+	}
+}
+
+#if defined(HAVE_SPLICE) && defined(HAVE_VMSPLICE)
+static int read_back(int fd, char *buf, size_t len)
+{
+	int res;
+
+	res = read(fd, buf, len);
+	if (res == -1) {
+		fuse_log(FUSE_LOG_ERR, "fuse: internal error: failed to read back from pipe: %s\n", strerror(errno));
+		return -EIO;
+	}
+	if (res != len) {
+		fuse_log(FUSE_LOG_ERR, "fuse: internal error: short read back from pipe: %i from %zi\n", res, len);
+		return -EIO;
+	}
+	return 0;
+}
+
+static int grow_pipe_to_max(int pipefd)
+{
+	int max;
+	int res;
+	int maxfd;
+	char buf[32];
+
+	maxfd = open("/proc/sys/fs/pipe-max-size", O_RDONLY);
+	if (maxfd < 0)
+		return -errno;
+
+	res = read(maxfd, buf, sizeof(buf) - 1);
+	if (res < 0) {
+		int saved_errno;
+
+		saved_errno = errno;
+		close(maxfd);
+		return -saved_errno;
+	}
+	close(maxfd);
+	buf[res] = '\0';
+
+	max = atoi(buf);
+	res = fcntl(pipefd, F_SETPIPE_SZ, max);
+	if (res < 0)
+		return -errno;
+	return max;
+}
+
+static int fuse_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
+			       struct iovec *iov, int iov_count,
+			       struct fuse_bufvec *buf, unsigned int flags)
+{
+	int res;
+	size_t len = fuse_buf_size(buf);
+	struct fuse_out_header *out = iov[0].iov_base;
+	struct fuse_ll_pipe *llp;
+	int splice_flags;
+	size_t pipesize;
+	size_t total_fd_size;
+	size_t idx;
+	size_t headerlen;
+	struct fuse_bufvec pipe_buf = FUSE_BUFVEC_INIT(len);
+
+	if (se->broken_splice_nonblock)
+		goto fallback;
+
+	if (flags & FUSE_BUF_NO_SPLICE)
+		goto fallback;
+
+	total_fd_size = 0;
+	for (idx = buf->idx; idx < buf->count; idx++) {
+		if (buf->buf[idx].flags & FUSE_BUF_IS_FD) {
+			total_fd_size = buf->buf[idx].size;
+			if (idx == buf->idx)
+				total_fd_size -= buf->off;
+		}
+	}
+	if (total_fd_size < 2 * pagesize)
+		goto fallback;
+
+	if (se->conn.proto_minor < 14 ||
+	    !(se->conn.want & FUSE_CAP_SPLICE_WRITE))
+		goto fallback;
+
+	llp = fuse_ll_get_pipe(se);
+	if (llp == NULL)
+		goto fallback;
+
+
+	headerlen = iov_length(iov, iov_count);
+
+	out->len = headerlen + len;
+
+	/*
+	 * Heuristic for the required pipe size; it does not work if the
+	 * source contains fragments smaller than a page
+	 */
+	pipesize = pagesize * (iov_count + buf->count + 1) + out->len;
+
+	if (llp->size < pipesize) {
+		if (llp->can_grow) {
+			res = fcntl(llp->pipe[0], F_SETPIPE_SZ, pipesize);
+			if (res == -1) {
+				res = grow_pipe_to_max(llp->pipe[0]);
+				if (res > 0)
+					llp->size = res;
+				llp->can_grow = 0;
+				goto fallback;
+			}
+			llp->size = res;
+		}
+		if (llp->size < pipesize)
+			goto fallback;
+	}
+
+
+	res = vmsplice(llp->pipe[1], iov, iov_count, SPLICE_F_NONBLOCK);
+	if (res == -1)
+		goto fallback;
+
+	if (res != headerlen) {
+		fuse_log(FUSE_LOG_ERR, "fuse: short vmsplice to pipe: %u/%zu\n", res,
+			headerlen);
+		res = -EIO;
+		goto clear_pipe;
+	}
+
+	pipe_buf.buf[0].flags = FUSE_BUF_IS_FD;
+	pipe_buf.buf[0].fd = llp->pipe[1];
+
+	res = fuse_buf_copy(&pipe_buf, buf,
+			    FUSE_BUF_FORCE_SPLICE | FUSE_BUF_SPLICE_NONBLOCK);
+	if (res < 0) {
+		if (res == -EAGAIN || res == -EINVAL) {
+			/*
+			 * Should only get EAGAIN on kernels with
+			 * broken SPLICE_F_NONBLOCK support (<=
+			 * 2.6.35) where this error or a short read is
+			 * returned even if the pipe itself is not
+			 * full
+			 *
+			 * EINVAL might mean that splice can't handle
+			 * this combination of input and output.
+			 */
+			if (res == -EAGAIN)
+				se->broken_splice_nonblock = 1;
+
+			pthread_setspecific(se->pipe_key, NULL);
+			fuse_ll_pipe_free(llp);
+			goto fallback;
+		}
+		res = -res;
+		goto clear_pipe;
+	}
+
+	if (res != 0 && res < len) {
+		struct fuse_bufvec mem_buf = FUSE_BUFVEC_INIT(len);
+		void *mbuf;
+		size_t now_len = res;
+		/*
+		 * For regular files a short count is either
+		 *  1) due to EOF, or
+		 *  2) because of broken SPLICE_F_NONBLOCK (see above)
+		 *
+		 * For other inputs it's possible that we overflowed
+		 * the pipe because of small buffer fragments.
+		 */
+
+		res = posix_memalign(&mbuf, pagesize, len);
+		if (res != 0)
+			goto clear_pipe;
+
+		mem_buf.buf[0].mem = mbuf;
+		mem_buf.off = now_len;
+		res = fuse_buf_copy(&mem_buf, buf, 0);
+		if (res > 0) {
+			char *tmpbuf;
+			size_t extra_len = res;
+			/*
+			 * Trickiest case: got more data.  Need to get
+			 * back the data from the pipe and then fall
+			 * back to regular write.
+			 */
+			tmpbuf = malloc(headerlen);
+			if (tmpbuf == NULL) {
+				free(mbuf);
+				res = ENOMEM;
+				goto clear_pipe;
+			}
+			res = read_back(llp->pipe[0], tmpbuf, headerlen);
+			free(tmpbuf);
+			if (res != 0) {
+				free(mbuf);
+				goto clear_pipe;
+			}
+			res = read_back(llp->pipe[0], mbuf, now_len);
+			if (res != 0) {
+				free(mbuf);
+				goto clear_pipe;
+			}
+			len = now_len + extra_len;
+			iov[iov_count].iov_base = mbuf;
+			iov[iov_count].iov_len = len;
+			iov_count++;
+			res = fuse_send_msg(se, ch, iov, iov_count);
+			free(mbuf);
+			return res;
+		}
+		free(mbuf);
+		res = now_len;
+	}
+	len = res;
+	out->len = headerlen + len;
+
+	if (se->debug) {
+		fuse_log(FUSE_LOG_DEBUG,
+			"   unique: %llu, success, outsize: %i (splice)\n",
+			(unsigned long long) out->unique, out->len);
+	}
+
+	splice_flags = 0;
+	if ((flags & FUSE_BUF_SPLICE_MOVE) &&
+	    (se->conn.want & FUSE_CAP_SPLICE_MOVE))
+		splice_flags |= SPLICE_F_MOVE;
+
+	res = splice(llp->pipe[0], NULL, ch ? ch->fd : se->fd,
+		     NULL, out->len, splice_flags);
+	if (res == -1) {
+		res = -errno;
+		perror("fuse: splice from pipe");
+		goto clear_pipe;
+	}
+	if (res != out->len) {
+		fuse_log(FUSE_LOG_ERR, "fuse: short splice from pipe: %u/%u\n",
+			res, out->len);
+		res = -EIO;
+		goto clear_pipe;
+	}
+	return 0;
+
+clear_pipe:
+	fuse_ll_clear_pipe(se);
+	return res;
+
+fallback:
+	return fuse_send_data_iov_fallback(se, ch, iov, iov_count, buf, len);
+}
+#else
+static int fuse_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
+			       struct iovec *iov, int iov_count,
+			       struct fuse_bufvec *buf, unsigned int flags)
+{
+	size_t len = fuse_buf_size(buf);
+	(void) flags;
+
+	return fuse_send_data_iov_fallback(se, ch, iov, iov_count, buf, len);
+}
+#endif
+
+int fuse_reply_data(fuse_req_t req, struct fuse_bufvec *bufv,
+		    enum fuse_buf_copy_flags flags)
+{
+	struct iovec iov[2];
+	struct fuse_out_header out;
+	int res;
+
+	iov[0].iov_base = &out;
+	iov[0].iov_len = sizeof(struct fuse_out_header);
+
+	out.unique = req->unique;
+	out.error = 0;
+
+	res = fuse_send_data_iov(req->se, req->ch, iov, 1, bufv, flags);
+	if (res <= 0) {
+		fuse_free_req(req);
+		return res;
+	} else {
+		return fuse_reply_err(req, res);
+	}
+}
+
+int fuse_reply_statfs(fuse_req_t req, const struct statvfs *stbuf)
+{
+	struct fuse_statfs_out arg;
+	size_t size = req->se->conn.proto_minor < 4 ?
+		FUSE_COMPAT_STATFS_SIZE : sizeof(arg);
+
+	memset(&arg, 0, sizeof(arg));
+	convert_statfs(stbuf, &arg.st);
+
+	return send_reply_ok(req, &arg, size);
+}
+
+int fuse_reply_xattr(fuse_req_t req, size_t count)
+{
+	struct fuse_getxattr_out arg;
+
+	memset(&arg, 0, sizeof(arg));
+	arg.size = count;
+
+	return send_reply_ok(req, &arg, sizeof(arg));
+}
+
+int fuse_reply_lock(fuse_req_t req, const struct flock *lock)
+{
+	struct fuse_lk_out arg;
+
+	memset(&arg, 0, sizeof(arg));
+	arg.lk.type = lock->l_type;
+	if (lock->l_type != F_UNLCK) {
+		arg.lk.start = lock->l_start;
+		if (lock->l_len == 0)
+			arg.lk.end = OFFSET_MAX;
+		else
+			arg.lk.end = lock->l_start + lock->l_len - 1;
+	}
+	arg.lk.pid = lock->l_pid;
+	return send_reply_ok(req, &arg, sizeof(arg));
+}
+
+int fuse_reply_bmap(fuse_req_t req, uint64_t idx)
+{
+	struct fuse_bmap_out arg;
+
+	memset(&arg, 0, sizeof(arg));
+	arg.block = idx;
+
+	return send_reply_ok(req, &arg, sizeof(arg));
+}
+
+static struct fuse_ioctl_iovec *fuse_ioctl_iovec_copy(const struct iovec *iov,
+						      size_t count)
+{
+	struct fuse_ioctl_iovec *fiov;
+	size_t i;
+
+	fiov = malloc(sizeof(fiov[0]) * count);
+	if (!fiov)
+		return NULL;
+
+	for (i = 0; i < count; i++) {
+		fiov[i].base = (uintptr_t) iov[i].iov_base;
+		fiov[i].len = iov[i].iov_len;
+	}
+
+	return fiov;
+}
+
+int fuse_reply_ioctl_retry(fuse_req_t req,
+			   const struct iovec *in_iov, size_t in_count,
+			   const struct iovec *out_iov, size_t out_count)
+{
+	struct fuse_ioctl_out arg;
+	struct fuse_ioctl_iovec *in_fiov = NULL;
+	struct fuse_ioctl_iovec *out_fiov = NULL;
+	struct iovec iov[4];
+	size_t count = 1;
+	int res;
+
+	memset(&arg, 0, sizeof(arg));
+	arg.flags |= FUSE_IOCTL_RETRY;
+	arg.in_iovs = in_count;
+	arg.out_iovs = out_count;
+	iov[count].iov_base = &arg;
+	iov[count].iov_len = sizeof(arg);
+	count++;
+
+	if (req->se->conn.proto_minor < 16) {
+		if (in_count) {
+			iov[count].iov_base = (void *)in_iov;
+			iov[count].iov_len = sizeof(in_iov[0]) * in_count;
+			count++;
+		}
+
+		if (out_count) {
+			iov[count].iov_base = (void *)out_iov;
+			iov[count].iov_len = sizeof(out_iov[0]) * out_count;
+			count++;
+		}
+	} else {
+		/* Can't handle non-compat 64bit ioctls on 32bit */
+		if (sizeof(void *) == 4 && req->ioctl_64bit) {
+			res = fuse_reply_err(req, EINVAL);
+			goto out;
+		}
+
+		if (in_count) {
+			in_fiov = fuse_ioctl_iovec_copy(in_iov, in_count);
+			if (!in_fiov)
+				goto enomem;
+
+			iov[count].iov_base = (void *)in_fiov;
+			iov[count].iov_len = sizeof(in_fiov[0]) * in_count;
+			count++;
+		}
+		if (out_count) {
+			out_fiov = fuse_ioctl_iovec_copy(out_iov, out_count);
+			if (!out_fiov)
+				goto enomem;
+
+			iov[count].iov_base = (void *)out_fiov;
+			iov[count].iov_len = sizeof(out_fiov[0]) * out_count;
+			count++;
+		}
+	}
+
+	res = send_reply_iov(req, 0, iov, count);
+out:
+	free(in_fiov);
+	free(out_fiov);
+
+	return res;
+
+enomem:
+	res = fuse_reply_err(req, ENOMEM);
+	goto out;
+}
+
+int fuse_reply_ioctl(fuse_req_t req, int result, const void *buf, size_t size)
+{
+	struct fuse_ioctl_out arg;
+	struct iovec iov[3];
+	size_t count = 1;
+
+	memset(&arg, 0, sizeof(arg));
+	arg.result = result;
+	iov[count].iov_base = &arg;
+	iov[count].iov_len = sizeof(arg);
+	count++;
+
+	if (size) {
+		iov[count].iov_base = (char *) buf;
+		iov[count].iov_len = size;
+		count++;
+	}
+
+	return send_reply_iov(req, 0, iov, count);
+}
+
+int fuse_reply_ioctl_iov(fuse_req_t req, int result, const struct iovec *iov,
+			 int count)
+{
+	struct iovec *padded_iov;
+	struct fuse_ioctl_out arg;
+	int res;
+
+	padded_iov = malloc((count + 2) * sizeof(struct iovec));
+	if (padded_iov == NULL)
+		return fuse_reply_err(req, ENOMEM);
+
+	memset(&arg, 0, sizeof(arg));
+	arg.result = result;
+	padded_iov[1].iov_base = &arg;
+	padded_iov[1].iov_len = sizeof(arg);
+
+	memcpy(&padded_iov[2], iov, count * sizeof(struct iovec));
+
+	res = send_reply_iov(req, 0, padded_iov, count + 2);
+	free(padded_iov);
+
+	return res;
+}
+
+int fuse_reply_poll(fuse_req_t req, unsigned revents)
+{
+	struct fuse_poll_out arg;
+
+	memset(&arg, 0, sizeof(arg));
+	arg.revents = revents;
+
+	return send_reply_ok(req, &arg, sizeof(arg));
+}
+
+int fuse_reply_lseek(fuse_req_t req, off_t off)
+{
+	struct fuse_lseek_out arg;
+
+	memset(&arg, 0, sizeof(arg));
+	arg.offset = off;
+
+	return send_reply_ok(req, &arg, sizeof(arg));
+}
+
+static void do_lookup(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	char *name = (char *) inarg;
+
+	if (req->se->op.lookup)
+		req->se->op.lookup(req, nodeid, name);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_forget(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_forget_in *arg = (struct fuse_forget_in *) inarg;
+
+	if (req->se->op.forget)
+		req->se->op.forget(req, nodeid, arg->nlookup);
+	else
+		fuse_reply_none(req);
+}
+
+static void do_batch_forget(fuse_req_t req, fuse_ino_t nodeid,
+			    const void *inarg)
+{
+	struct fuse_batch_forget_in *arg = (void *) inarg;
+	struct fuse_forget_one *param = (void *) PARAM(arg);
+	unsigned int i;
+
+	(void) nodeid;
+
+	if (req->se->op.forget_multi) {
+		req->se->op.forget_multi(req, arg->count,
+				     (struct fuse_forget_data *) param);
+	} else if (req->se->op.forget) {
+		for (i = 0; i < arg->count; i++) {
+			struct fuse_forget_one *forget = &param[i];
+			struct fuse_req *dummy_req;
+
+			dummy_req = fuse_ll_alloc_req(req->se);
+			if (dummy_req == NULL)
+				break;
+
+			dummy_req->unique = req->unique;
+			dummy_req->ctx = req->ctx;
+			dummy_req->ch = NULL;
+
+			req->se->op.forget(dummy_req, forget->nodeid,
+					  forget->nlookup);
+		}
+		fuse_reply_none(req);
+	} else {
+		fuse_reply_none(req);
+	}
+}
+
+static void do_getattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_file_info *fip = NULL;
+	struct fuse_file_info fi;
+
+	if (req->se->conn.proto_minor >= 9) {
+		struct fuse_getattr_in *arg = (struct fuse_getattr_in *) inarg;
+
+		if (arg->getattr_flags & FUSE_GETATTR_FH) {
+			memset(&fi, 0, sizeof(fi));
+			fi.fh = arg->fh;
+			fip = &fi;
+		}
+	}
+
+	if (req->se->op.getattr)
+		req->se->op.getattr(req, nodeid, fip);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_setattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_setattr_in *arg = (struct fuse_setattr_in *) inarg;
+
+	if (req->se->op.setattr) {
+		struct fuse_file_info *fi = NULL;
+		struct fuse_file_info fi_store;
+		struct stat stbuf;
+		memset(&stbuf, 0, sizeof(stbuf));
+		convert_attr(arg, &stbuf);
+		if (arg->valid & FATTR_FH) {
+			arg->valid &= ~FATTR_FH;
+			memset(&fi_store, 0, sizeof(fi_store));
+			fi = &fi_store;
+			fi->fh = arg->fh;
+		}
+		arg->valid &=
+			FUSE_SET_ATTR_MODE	|
+			FUSE_SET_ATTR_UID	|
+			FUSE_SET_ATTR_GID	|
+			FUSE_SET_ATTR_SIZE	|
+			FUSE_SET_ATTR_ATIME	|
+			FUSE_SET_ATTR_MTIME	|
+			FUSE_SET_ATTR_ATIME_NOW	|
+			FUSE_SET_ATTR_MTIME_NOW |
+			FUSE_SET_ATTR_CTIME;
+
+		req->se->op.setattr(req, nodeid, &stbuf, arg->valid, fi);
+	} else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_access(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_access_in *arg = (struct fuse_access_in *) inarg;
+
+	if (req->se->op.access)
+		req->se->op.access(req, nodeid, arg->mask);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_readlink(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	(void) inarg;
+
+	if (req->se->op.readlink)
+		req->se->op.readlink(req, nodeid);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_mknod(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_mknod_in *arg = (struct fuse_mknod_in *) inarg;
+	char *name = PARAM(arg);
+
+	if (req->se->conn.proto_minor >= 12)
+		req->ctx.umask = arg->umask;
+	else
+		name = (char *) inarg + FUSE_COMPAT_MKNOD_IN_SIZE;
+
+	if (req->se->op.mknod)
+		req->se->op.mknod(req, nodeid, name, arg->mode, arg->rdev);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_mkdir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_mkdir_in *arg = (struct fuse_mkdir_in *) inarg;
+
+	if (req->se->conn.proto_minor >= 12)
+		req->ctx.umask = arg->umask;
+
+	if (req->se->op.mkdir)
+		req->se->op.mkdir(req, nodeid, PARAM(arg), arg->mode);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_unlink(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	char *name = (char *) inarg;
+
+	if (req->se->op.unlink)
+		req->se->op.unlink(req, nodeid, name);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_rmdir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	char *name = (char *) inarg;
+
+	if (req->se->op.rmdir)
+		req->se->op.rmdir(req, nodeid, name);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_symlink(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	char *name = (char *) inarg;
+	char *linkname = ((char *) inarg) + strlen((char *) inarg) + 1;
+
+	if (req->se->op.symlink)
+		req->se->op.symlink(req, linkname, nodeid, name);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_rename(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_rename_in *arg = (struct fuse_rename_in *) inarg;
+	char *oldname = PARAM(arg);
+	char *newname = oldname + strlen(oldname) + 1;
+
+	if (req->se->op.rename)
+		req->se->op.rename(req, nodeid, oldname, arg->newdir, newname,
+				  0);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_rename2(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_rename2_in *arg = (struct fuse_rename2_in *) inarg;
+	char *oldname = PARAM(arg);
+	char *newname = oldname + strlen(oldname) + 1;
+
+	if (req->se->op.rename)
+		req->se->op.rename(req, nodeid, oldname, arg->newdir, newname,
+				  arg->flags);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_link(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_link_in *arg = (struct fuse_link_in *) inarg;
+
+	if (req->se->op.link)
+		req->se->op.link(req, arg->oldnodeid, nodeid, PARAM(arg));
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_create(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_create_in *arg = (struct fuse_create_in *) inarg;
+
+	if (req->se->op.create) {
+		struct fuse_file_info fi;
+		char *name = PARAM(arg);
+
+		memset(&fi, 0, sizeof(fi));
+		fi.flags = arg->flags;
+
+		if (req->se->conn.proto_minor >= 12)
+			req->ctx.umask = arg->umask;
+		else
+			name = (char *) inarg + sizeof(struct fuse_open_in);
+
+		req->se->op.create(req, nodeid, name, arg->mode, &fi);
+	} else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_open(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_open_in *arg = (struct fuse_open_in *) inarg;
+	struct fuse_file_info fi;
+
+	memset(&fi, 0, sizeof(fi));
+	fi.flags = arg->flags;
+
+	if (req->se->op.open)
+		req->se->op.open(req, nodeid, &fi);
+	else
+		fuse_reply_open(req, &fi);
+}
+
+static void do_read(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_read_in *arg = (struct fuse_read_in *) inarg;
+
+	if (req->se->op.read) {
+		struct fuse_file_info fi;
+
+		memset(&fi, 0, sizeof(fi));
+		fi.fh = arg->fh;
+		if (req->se->conn.proto_minor >= 9) {
+			fi.lock_owner = arg->lock_owner;
+			fi.flags = arg->flags;
+		}
+		req->se->op.read(req, nodeid, arg->size, arg->offset, &fi);
+	} else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_write(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_write_in *arg = (struct fuse_write_in *) inarg;
+	struct fuse_file_info fi;
+	char *param;
+
+	memset(&fi, 0, sizeof(fi));
+	fi.fh = arg->fh;
+	fi.writepage = (arg->write_flags & FUSE_WRITE_CACHE) != 0;
+
+	if (req->se->conn.proto_minor < 9) {
+		param = ((char *) arg) + FUSE_COMPAT_WRITE_IN_SIZE;
+	} else {
+		fi.lock_owner = arg->lock_owner;
+		fi.flags = arg->flags;
+		param = PARAM(arg);
+	}
+
+	if (req->se->op.write)
+		req->se->op.write(req, nodeid, param, arg->size,
+				 arg->offset, &fi);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid, const void *inarg,
+			 const struct fuse_buf *ibuf)
+{
+	struct fuse_session *se = req->se;
+	struct fuse_bufvec bufv = {
+		.buf[0] = *ibuf,
+		.count = 1,
+	};
+	struct fuse_write_in *arg = (struct fuse_write_in *) inarg;
+	struct fuse_file_info fi;
+
+	memset(&fi, 0, sizeof(fi));
+	fi.fh = arg->fh;
+	fi.writepage = arg->write_flags & FUSE_WRITE_CACHE;
+
+	if (se->conn.proto_minor < 9) {
+		bufv.buf[0].mem = ((char *) arg) + FUSE_COMPAT_WRITE_IN_SIZE;
+		bufv.buf[0].size -= sizeof(struct fuse_in_header) +
+			FUSE_COMPAT_WRITE_IN_SIZE;
+		assert(!(bufv.buf[0].flags & FUSE_BUF_IS_FD));
+	} else {
+		fi.lock_owner = arg->lock_owner;
+		fi.flags = arg->flags;
+		if (!(bufv.buf[0].flags & FUSE_BUF_IS_FD))
+			bufv.buf[0].mem = PARAM(arg);
+
+		bufv.buf[0].size -= sizeof(struct fuse_in_header) +
+			sizeof(struct fuse_write_in);
+	}
+	if (bufv.buf[0].size < arg->size) {
+		fuse_log(FUSE_LOG_ERR, "fuse: do_write_buf: buffer size too small\n");
+		fuse_reply_err(req, EIO);
+		goto out;
+	}
+	bufv.buf[0].size = arg->size;
+
+	se->op.write_buf(req, nodeid, &bufv, arg->offset, &fi);
+
+out:
+	/* Need to reset the pipe if ->write_buf() didn't consume all data */
+	if ((ibuf->flags & FUSE_BUF_IS_FD) && bufv.idx < bufv.count)
+		fuse_ll_clear_pipe(se);
+}
+
+static void do_flush(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_flush_in *arg = (struct fuse_flush_in *) inarg;
+	struct fuse_file_info fi;
+
+	memset(&fi, 0, sizeof(fi));
+	fi.fh = arg->fh;
+	fi.flush = 1;
+	if (req->se->conn.proto_minor >= 7)
+		fi.lock_owner = arg->lock_owner;
+
+	if (req->se->op.flush)
+		req->se->op.flush(req, nodeid, &fi);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_release(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_release_in *arg = (struct fuse_release_in *) inarg;
+	struct fuse_file_info fi;
+
+	memset(&fi, 0, sizeof(fi));
+	fi.flags = arg->flags;
+	fi.fh = arg->fh;
+	if (req->se->conn.proto_minor >= 8) {
+		fi.flush = (arg->release_flags & FUSE_RELEASE_FLUSH) ? 1 : 0;
+		fi.lock_owner = arg->lock_owner;
+	}
+	if (arg->release_flags & FUSE_RELEASE_FLOCK_UNLOCK) {
+		fi.flock_release = 1;
+		fi.lock_owner = arg->lock_owner;
+	}
+
+	if (req->se->op.release)
+		req->se->op.release(req, nodeid, &fi);
+	else
+		fuse_reply_err(req, 0);
+}
+
+static void do_fsync(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_fsync_in *arg = (struct fuse_fsync_in *) inarg;
+	struct fuse_file_info fi;
+	int datasync = arg->fsync_flags & 1;
+
+	memset(&fi, 0, sizeof(fi));
+	fi.fh = arg->fh;
+
+	if (req->se->op.fsync)
+		req->se->op.fsync(req, nodeid, datasync, &fi);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_opendir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_open_in *arg = (struct fuse_open_in *) inarg;
+	struct fuse_file_info fi;
+
+	memset(&fi, 0, sizeof(fi));
+	fi.flags = arg->flags;
+
+	if (req->se->op.opendir)
+		req->se->op.opendir(req, nodeid, &fi);
+	else
+		fuse_reply_open(req, &fi);
+}
+
+static void do_readdir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_read_in *arg = (struct fuse_read_in *) inarg;
+	struct fuse_file_info fi;
+
+	memset(&fi, 0, sizeof(fi));
+	fi.fh = arg->fh;
+
+	if (req->se->op.readdir)
+		req->se->op.readdir(req, nodeid, arg->size, arg->offset, &fi);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_readdirplus(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_read_in *arg = (struct fuse_read_in *) inarg;
+	struct fuse_file_info fi;
+
+	memset(&fi, 0, sizeof(fi));
+	fi.fh = arg->fh;
+
+	if (req->se->op.readdirplus)
+		req->se->op.readdirplus(req, nodeid, arg->size, arg->offset, &fi);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_releasedir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_release_in *arg = (struct fuse_release_in *) inarg;
+	struct fuse_file_info fi;
+
+	memset(&fi, 0, sizeof(fi));
+	fi.flags = arg->flags;
+	fi.fh = arg->fh;
+
+	if (req->se->op.releasedir)
+		req->se->op.releasedir(req, nodeid, &fi);
+	else
+		fuse_reply_err(req, 0);
+}
+
+static void do_fsyncdir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_fsync_in *arg = (struct fuse_fsync_in *) inarg;
+	struct fuse_file_info fi;
+	int datasync = arg->fsync_flags & 1;
+
+	memset(&fi, 0, sizeof(fi));
+	fi.fh = arg->fh;
+
+	if (req->se->op.fsyncdir)
+		req->se->op.fsyncdir(req, nodeid, datasync, &fi);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_statfs(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	(void) nodeid;
+	(void) inarg;
+
+	if (req->se->op.statfs)
+		req->se->op.statfs(req, nodeid);
+	else {
+		struct statvfs buf = {
+			.f_namemax = 255,
+			.f_bsize = 512,
+		};
+		fuse_reply_statfs(req, &buf);
+	}
+}
+
+static void do_setxattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_setxattr_in *arg = (struct fuse_setxattr_in *) inarg;
+	char *name = PARAM(arg);
+	char *value = name + strlen(name) + 1;
+
+	if (req->se->op.setxattr)
+		req->se->op.setxattr(req, nodeid, name, value, arg->size,
+				    arg->flags);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_getxattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_getxattr_in *arg = (struct fuse_getxattr_in *) inarg;
+
+	if (req->se->op.getxattr)
+		req->se->op.getxattr(req, nodeid, PARAM(arg), arg->size);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_listxattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_getxattr_in *arg = (struct fuse_getxattr_in *) inarg;
+
+	if (req->se->op.listxattr)
+		req->se->op.listxattr(req, nodeid, arg->size);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_removexattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	char *name = (char *) inarg;
+
+	if (req->se->op.removexattr)
+		req->se->op.removexattr(req, nodeid, name);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void convert_fuse_file_lock(struct fuse_file_lock *fl,
+				   struct flock *flock)
+{
+	memset(flock, 0, sizeof(struct flock));
+	flock->l_type = fl->type;
+	flock->l_whence = SEEK_SET;
+	flock->l_start = fl->start;
+	if (fl->end == OFFSET_MAX)
+		flock->l_len = 0;
+	else
+		flock->l_len = fl->end - fl->start + 1;
+	flock->l_pid = fl->pid;
+}
+
+static void do_getlk(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_lk_in *arg = (struct fuse_lk_in *) inarg;
+	struct fuse_file_info fi;
+	struct flock flock;
+
+	memset(&fi, 0, sizeof(fi));
+	fi.fh = arg->fh;
+	fi.lock_owner = arg->owner;
+
+	convert_fuse_file_lock(&arg->lk, &flock);
+	if (req->se->op.getlk)
+		req->se->op.getlk(req, nodeid, &fi, &flock);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_setlk_common(fuse_req_t req, fuse_ino_t nodeid,
+			    const void *inarg, int sleep)
+{
+	struct fuse_lk_in *arg = (struct fuse_lk_in *) inarg;
+	struct fuse_file_info fi;
+	struct flock flock;
+
+	memset(&fi, 0, sizeof(fi));
+	fi.fh = arg->fh;
+	fi.lock_owner = arg->owner;
+
+	if (arg->lk_flags & FUSE_LK_FLOCK) {
+		int op = 0;
+
+		switch (arg->lk.type) {
+		case F_RDLCK:
+			op = LOCK_SH;
+			break;
+		case F_WRLCK:
+			op = LOCK_EX;
+			break;
+		case F_UNLCK:
+			op = LOCK_UN;
+			break;
+		}
+		if (!sleep)
+			op |= LOCK_NB;
+
+		if (req->se->op.flock)
+			req->se->op.flock(req, nodeid, &fi, op);
+		else
+			fuse_reply_err(req, ENOSYS);
+	} else {
+		convert_fuse_file_lock(&arg->lk, &flock);
+		if (req->se->op.setlk)
+			req->se->op.setlk(req, nodeid, &fi, &flock, sleep);
+		else
+			fuse_reply_err(req, ENOSYS);
+	}
+}
+
+static void do_setlk(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	do_setlk_common(req, nodeid, inarg, 0);
+}
+
+static void do_setlkw(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	do_setlk_common(req, nodeid, inarg, 1);
+}
+
+static int find_interrupted(struct fuse_session *se, struct fuse_req *req)
+{
+	struct fuse_req *curr;
+
+	for (curr = se->list.next; curr != &se->list; curr = curr->next) {
+		if (curr->unique == req->u.i.unique) {
+			fuse_interrupt_func_t func;
+			void *data;
+
+			curr->ctr++;
+			pthread_mutex_unlock(&se->lock);
+
+			/* Ugh, ugly locking */
+			pthread_mutex_lock(&curr->lock);
+			pthread_mutex_lock(&se->lock);
+			curr->interrupted = 1;
+			func = curr->u.ni.func;
+			data = curr->u.ni.data;
+			pthread_mutex_unlock(&se->lock);
+			if (func)
+				func(curr, data);
+			pthread_mutex_unlock(&curr->lock);
+
+			pthread_mutex_lock(&se->lock);
+			curr->ctr--;
+			if (!curr->ctr)
+				destroy_req(curr);
+
+			return 1;
+		}
+	}
+	for (curr = se->interrupts.next; curr != &se->interrupts;
+	     curr = curr->next) {
+		if (curr->u.i.unique == req->u.i.unique)
+			return 1;
+	}
+	return 0;
+}
+
+static void do_interrupt(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_interrupt_in *arg = (struct fuse_interrupt_in *) inarg;
+	struct fuse_session *se = req->se;
+
+	(void) nodeid;
+	if (se->debug)
+		fuse_log(FUSE_LOG_DEBUG, "INTERRUPT: %llu\n",
+			(unsigned long long) arg->unique);
+
+	req->u.i.unique = arg->unique;
+
+	pthread_mutex_lock(&se->lock);
+	if (find_interrupted(se, req))
+		destroy_req(req);
+	else
+		list_add_req(req, &se->interrupts);
+	pthread_mutex_unlock(&se->lock);
+}
+
+static struct fuse_req *check_interrupt(struct fuse_session *se,
+					struct fuse_req *req)
+{
+	struct fuse_req *curr;
+
+	for (curr = se->interrupts.next; curr != &se->interrupts;
+	     curr = curr->next) {
+		if (curr->u.i.unique == req->unique) {
+			req->interrupted = 1;
+			list_del_req(curr);
+			free(curr);
+			return NULL;
+		}
+	}
+	curr = se->interrupts.next;
+	if (curr != &se->interrupts) {
+		list_del_req(curr);
+		list_init_req(curr);
+		return curr;
+	} else
+		return NULL;
+}
+
+static void do_bmap(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_bmap_in *arg = (struct fuse_bmap_in *) inarg;
+
+	if (req->se->op.bmap)
+		req->se->op.bmap(req, nodeid, arg->blocksize, arg->block);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_ioctl(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_ioctl_in *arg = (struct fuse_ioctl_in *) inarg;
+	unsigned int flags = arg->flags;
+	void *in_buf = arg->in_size ? PARAM(arg) : NULL;
+	struct fuse_file_info fi;
+
+	if (flags & FUSE_IOCTL_DIR &&
+	    !(req->se->conn.want & FUSE_CAP_IOCTL_DIR)) {
+		fuse_reply_err(req, ENOTTY);
+		return;
+	}
+
+	memset(&fi, 0, sizeof(fi));
+	fi.fh = arg->fh;
+
+	if (sizeof(void *) == 4 && req->se->conn.proto_minor >= 16 &&
+	    !(flags & FUSE_IOCTL_32BIT)) {
+		req->ioctl_64bit = 1;
+	}
+
+	if (req->se->op.ioctl)
+		req->se->op.ioctl(req, nodeid, arg->cmd,
+				 (void *)(uintptr_t)arg->arg, &fi, flags,
+				 in_buf, arg->in_size, arg->out_size);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+void fuse_pollhandle_destroy(struct fuse_pollhandle *ph)
+{
+	free(ph);
+}
+
+static void do_poll(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_poll_in *arg = (struct fuse_poll_in *) inarg;
+	struct fuse_file_info fi;
+
+	memset(&fi, 0, sizeof(fi));
+	fi.fh = arg->fh;
+	fi.poll_events = arg->events;
+
+	if (req->se->op.poll) {
+		struct fuse_pollhandle *ph = NULL;
+
+		if (arg->flags & FUSE_POLL_SCHEDULE_NOTIFY) {
+			ph = malloc(sizeof(struct fuse_pollhandle));
+			if (ph == NULL) {
+				fuse_reply_err(req, ENOMEM);
+				return;
+			}
+			ph->kh = arg->kh;
+			ph->se = req->se;
+		}
+
+		req->se->op.poll(req, nodeid, &fi, ph);
+	} else {
+		fuse_reply_err(req, ENOSYS);
+	}
+}
+
+static void do_fallocate(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_fallocate_in *arg = (struct fuse_fallocate_in *) inarg;
+	struct fuse_file_info fi;
+
+	memset(&fi, 0, sizeof(fi));
+	fi.fh = arg->fh;
+
+	if (req->se->op.fallocate)
+		req->se->op.fallocate(req, nodeid, arg->mode, arg->offset, arg->length, &fi);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_copy_file_range(fuse_req_t req, fuse_ino_t nodeid_in, const void *inarg)
+{
+	struct fuse_copy_file_range_in *arg = (struct fuse_copy_file_range_in *) inarg;
+	struct fuse_file_info fi_in, fi_out;
+
+	memset(&fi_in, 0, sizeof(fi_in));
+	fi_in.fh = arg->fh_in;
+
+	memset(&fi_out, 0, sizeof(fi_out));
+	fi_out.fh = arg->fh_out;
+
+
+	if (req->se->op.copy_file_range)
+		req->se->op.copy_file_range(req, nodeid_in, arg->off_in,
+					    &fi_in, arg->nodeid_out,
+					    arg->off_out, &fi_out, arg->len,
+					    arg->flags);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_lseek(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_lseek_in *arg = (struct fuse_lseek_in *) inarg;
+	struct fuse_file_info fi;
+
+	memset(&fi, 0, sizeof(fi));
+	fi.fh = arg->fh;
+
+	if (req->se->op.lseek)
+		req->se->op.lseek(req, nodeid, arg->offset, arg->whence, &fi);
+	else
+		fuse_reply_err(req, ENOSYS);
+}
+
+static void do_init(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_init_in *arg = (struct fuse_init_in *) inarg;
+	struct fuse_init_out outarg;
+	struct fuse_session *se = req->se;
+	size_t bufsize = se->bufsize;
+	size_t outargsize = sizeof(outarg);
+
+	(void) nodeid;
+	if (se->debug) {
+		fuse_log(FUSE_LOG_DEBUG, "INIT: %u.%u\n", arg->major, arg->minor);
+		if (arg->major == 7 && arg->minor >= 6) {
+			fuse_log(FUSE_LOG_DEBUG, "flags=0x%08x\n", arg->flags);
+			fuse_log(FUSE_LOG_DEBUG, "max_readahead=0x%08x\n",
+				arg->max_readahead);
+		}
+	}
+	se->conn.proto_major = arg->major;
+	se->conn.proto_minor = arg->minor;
+	se->conn.capable = 0;
+	se->conn.want = 0;
+
+	memset(&outarg, 0, sizeof(outarg));
+	outarg.major = FUSE_KERNEL_VERSION;
+	outarg.minor = FUSE_KERNEL_MINOR_VERSION;
+
+	if (arg->major < 7) {
+		fuse_log(FUSE_LOG_ERR, "fuse: unsupported protocol version: %u.%u\n",
+			arg->major, arg->minor);
+		fuse_reply_err(req, EPROTO);
+		return;
+	}
+
+	if (arg->major > 7) {
+		/* Wait for a second INIT request with a 7.X version */
+		send_reply_ok(req, &outarg, sizeof(outarg));
+		return;
+	}
+
+	if (arg->minor >= 6) {
+		if (arg->max_readahead < se->conn.max_readahead)
+			se->conn.max_readahead = arg->max_readahead;
+		if (arg->flags & FUSE_ASYNC_READ)
+			se->conn.capable |= FUSE_CAP_ASYNC_READ;
+		if (arg->flags & FUSE_POSIX_LOCKS)
+			se->conn.capable |= FUSE_CAP_POSIX_LOCKS;
+		if (arg->flags & FUSE_ATOMIC_O_TRUNC)
+			se->conn.capable |= FUSE_CAP_ATOMIC_O_TRUNC;
+		if (arg->flags & FUSE_EXPORT_SUPPORT)
+			se->conn.capable |= FUSE_CAP_EXPORT_SUPPORT;
+		if (arg->flags & FUSE_DONT_MASK)
+			se->conn.capable |= FUSE_CAP_DONT_MASK;
+		if (arg->flags & FUSE_FLOCK_LOCKS)
+			se->conn.capable |= FUSE_CAP_FLOCK_LOCKS;
+		if (arg->flags & FUSE_AUTO_INVAL_DATA)
+			se->conn.capable |= FUSE_CAP_AUTO_INVAL_DATA;
+		if (arg->flags & FUSE_DO_READDIRPLUS)
+			se->conn.capable |= FUSE_CAP_READDIRPLUS;
+		if (arg->flags & FUSE_READDIRPLUS_AUTO)
+			se->conn.capable |= FUSE_CAP_READDIRPLUS_AUTO;
+		if (arg->flags & FUSE_ASYNC_DIO)
+			se->conn.capable |= FUSE_CAP_ASYNC_DIO;
+		if (arg->flags & FUSE_WRITEBACK_CACHE)
+			se->conn.capable |= FUSE_CAP_WRITEBACK_CACHE;
+		if (arg->flags & FUSE_NO_OPEN_SUPPORT)
+			se->conn.capable |= FUSE_CAP_NO_OPEN_SUPPORT;
+		if (arg->flags & FUSE_PARALLEL_DIROPS)
+			se->conn.capable |= FUSE_CAP_PARALLEL_DIROPS;
+		if (arg->flags & FUSE_POSIX_ACL)
+			se->conn.capable |= FUSE_CAP_POSIX_ACL;
+		if (arg->flags & FUSE_HANDLE_KILLPRIV)
+			se->conn.capable |= FUSE_CAP_HANDLE_KILLPRIV;
+		if (arg->flags & FUSE_NO_OPENDIR_SUPPORT)
+			se->conn.capable |= FUSE_CAP_NO_OPENDIR_SUPPORT;
+		if (!(arg->flags & FUSE_MAX_PAGES)) {
+			size_t max_bufsize =
+				FUSE_DEFAULT_MAX_PAGES_PER_REQ * getpagesize()
+				+ FUSE_BUFFER_HEADER_SIZE;
+			if (bufsize > max_bufsize) {
+				bufsize = max_bufsize;
+			}
+		}
+	} else {
+		se->conn.max_readahead = 0;
+	}
+
+	if (se->conn.proto_minor >= 14) {
+#ifdef HAVE_SPLICE
+#ifdef HAVE_VMSPLICE
+		se->conn.capable |= FUSE_CAP_SPLICE_WRITE | FUSE_CAP_SPLICE_MOVE;
+#endif
+		se->conn.capable |= FUSE_CAP_SPLICE_READ;
+#endif
+	}
+	if (se->conn.proto_minor >= 18)
+		se->conn.capable |= FUSE_CAP_IOCTL_DIR;
+
+	/* Default settings for modern filesystems.
+	 *
+	 * Most of these capabilities were disabled by default in
+	 * libfuse2 for backwards compatibility reasons. In libfuse3,
+	 * we can finally enable them by default (as long as they're
+	 * supported by the kernel).
+	 */
+#define LL_SET_DEFAULT(cond, cap) \
+	if ((cond) && (se->conn.capable & (cap))) \
+		se->conn.want |= (cap)
+	LL_SET_DEFAULT(1, FUSE_CAP_ASYNC_READ);
+	LL_SET_DEFAULT(1, FUSE_CAP_PARALLEL_DIROPS);
+	LL_SET_DEFAULT(1, FUSE_CAP_AUTO_INVAL_DATA);
+	LL_SET_DEFAULT(1, FUSE_CAP_HANDLE_KILLPRIV);
+	LL_SET_DEFAULT(1, FUSE_CAP_ASYNC_DIO);
+	LL_SET_DEFAULT(1, FUSE_CAP_IOCTL_DIR);
+	LL_SET_DEFAULT(1, FUSE_CAP_ATOMIC_O_TRUNC);
+	LL_SET_DEFAULT(se->op.write_buf, FUSE_CAP_SPLICE_READ);
+	LL_SET_DEFAULT(se->op.getlk && se->op.setlk,
+		       FUSE_CAP_POSIX_LOCKS);
+	LL_SET_DEFAULT(se->op.flock, FUSE_CAP_FLOCK_LOCKS);
+	LL_SET_DEFAULT(se->op.readdirplus, FUSE_CAP_READDIRPLUS);
+	LL_SET_DEFAULT(se->op.readdirplus && se->op.readdir,
+		       FUSE_CAP_READDIRPLUS_AUTO);
+	se->conn.time_gran = 1;
+	
+	if (bufsize < FUSE_MIN_READ_BUFFER) {
+		fuse_log(FUSE_LOG_ERR, "fuse: warning: buffer size too small: %zu\n",
+			bufsize);
+		bufsize = FUSE_MIN_READ_BUFFER;
+	}
+	se->bufsize = bufsize;
+
+	if (se->conn.max_write > bufsize - FUSE_BUFFER_HEADER_SIZE)
+		se->conn.max_write = bufsize - FUSE_BUFFER_HEADER_SIZE;
+
+	se->got_init = 1;
+	if (se->op.init)
+		se->op.init(se->userdata, &se->conn);
+
+	if (se->conn.want & (~se->conn.capable)) {
+		fuse_log(FUSE_LOG_ERR, "fuse: error: filesystem requested capabilities "
+			"0x%x that are not supported by kernel, aborting.\n",
+			se->conn.want & (~se->conn.capable));
+		fuse_reply_err(req, EPROTO);
+		se->error = -EPROTO;
+		fuse_session_exit(se);
+		return;
+	}
+
+	unsigned max_read_mo = get_max_read(se->mo);
+	if (se->conn.max_read != max_read_mo) {
+		fuse_log(FUSE_LOG_ERR, "fuse: error: init() and fuse_session_new() "
+			"requested different maximum read size (%u vs %u)\n",
+			se->conn.max_read, max_read_mo);
+		fuse_reply_err(req, EPROTO);
+		se->error = -EPROTO;
+		fuse_session_exit(se);
+		return;
+	}
+
+	if (se->conn.max_write < bufsize - FUSE_BUFFER_HEADER_SIZE) {
+		se->bufsize = se->conn.max_write + FUSE_BUFFER_HEADER_SIZE;
+	}
+	if (arg->flags & FUSE_MAX_PAGES) {
+		outarg.flags |= FUSE_MAX_PAGES;
+		outarg.max_pages = (se->conn.max_write - 1) / getpagesize() + 1;
+	}
+
+	/* Always enable big writes; this is superseded
+	   by the max_write option */
+	outarg.flags |= FUSE_BIG_WRITES;
+
+	if (se->conn.want & FUSE_CAP_ASYNC_READ)
+		outarg.flags |= FUSE_ASYNC_READ;
+	if (se->conn.want & FUSE_CAP_POSIX_LOCKS)
+		outarg.flags |= FUSE_POSIX_LOCKS;
+	if (se->conn.want & FUSE_CAP_ATOMIC_O_TRUNC)
+		outarg.flags |= FUSE_ATOMIC_O_TRUNC;
+	if (se->conn.want & FUSE_CAP_EXPORT_SUPPORT)
+		outarg.flags |= FUSE_EXPORT_SUPPORT;
+	if (se->conn.want & FUSE_CAP_DONT_MASK)
+		outarg.flags |= FUSE_DONT_MASK;
+	if (se->conn.want & FUSE_CAP_FLOCK_LOCKS)
+		outarg.flags |= FUSE_FLOCK_LOCKS;
+	if (se->conn.want & FUSE_CAP_AUTO_INVAL_DATA)
+		outarg.flags |= FUSE_AUTO_INVAL_DATA;
+	if (se->conn.want & FUSE_CAP_READDIRPLUS)
+		outarg.flags |= FUSE_DO_READDIRPLUS;
+	if (se->conn.want & FUSE_CAP_READDIRPLUS_AUTO)
+		outarg.flags |= FUSE_READDIRPLUS_AUTO;
+	if (se->conn.want & FUSE_CAP_ASYNC_DIO)
+		outarg.flags |= FUSE_ASYNC_DIO;
+	if (se->conn.want & FUSE_CAP_WRITEBACK_CACHE)
+		outarg.flags |= FUSE_WRITEBACK_CACHE;
+	if (se->conn.want & FUSE_CAP_POSIX_ACL)
+		outarg.flags |= FUSE_POSIX_ACL;
+	outarg.max_readahead = se->conn.max_readahead;
+	outarg.max_write = se->conn.max_write;
+	if (se->conn.proto_minor >= 13) {
+		if (se->conn.max_background >= (1 << 16))
+			se->conn.max_background = (1 << 16) - 1;
+		if (se->conn.congestion_threshold > se->conn.max_background)
+			se->conn.congestion_threshold = se->conn.max_background;
+		if (!se->conn.congestion_threshold) {
+			se->conn.congestion_threshold =
+				se->conn.max_background * 3 / 4;
+		}
+
+		outarg.max_background = se->conn.max_background;
+		outarg.congestion_threshold = se->conn.congestion_threshold;
+	}
+	if (se->conn.proto_minor >= 23)
+		outarg.time_gran = se->conn.time_gran;
+
+	if (se->debug) {
+		fuse_log(FUSE_LOG_DEBUG, "   INIT: %u.%u\n", outarg.major, outarg.minor);
+		fuse_log(FUSE_LOG_DEBUG, "   flags=0x%08x\n", outarg.flags);
+		fuse_log(FUSE_LOG_DEBUG, "   max_readahead=0x%08x\n",
+			outarg.max_readahead);
+		fuse_log(FUSE_LOG_DEBUG, "   max_write=0x%08x\n", outarg.max_write);
+		fuse_log(FUSE_LOG_DEBUG, "   max_background=%i\n",
+			outarg.max_background);
+		fuse_log(FUSE_LOG_DEBUG, "   congestion_threshold=%i\n",
+			outarg.congestion_threshold);
+		fuse_log(FUSE_LOG_DEBUG, "   time_gran=%u\n",
+			outarg.time_gran);
+	}
+	if (arg->minor < 5)
+		outargsize = FUSE_COMPAT_INIT_OUT_SIZE;
+	else if (arg->minor < 23)
+		outargsize = FUSE_COMPAT_22_INIT_OUT_SIZE;
+
+	send_reply_ok(req, &outarg, outargsize);
+}
+
+static void do_destroy(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+{
+	struct fuse_session *se = req->se;
+
+	(void) nodeid;
+	(void) inarg;
+
+	se->got_destroy = 1;
+	if (se->op.destroy)
+		se->op.destroy(se->userdata);
+
+	send_reply_ok(req, NULL, 0);
+}
+
+static void list_del_nreq(struct fuse_notify_req *nreq)
+{
+	struct fuse_notify_req *prev = nreq->prev;
+	struct fuse_notify_req *next = nreq->next;
+	prev->next = next;
+	next->prev = prev;
+}
+
+static void list_add_nreq(struct fuse_notify_req *nreq,
+			  struct fuse_notify_req *next)
+{
+	struct fuse_notify_req *prev = next->prev;
+	nreq->next = next;
+	nreq->prev = prev;
+	prev->next = nreq;
+	next->prev = nreq;
+}
+
+static void list_init_nreq(struct fuse_notify_req *nreq)
+{
+	nreq->next = nreq;
+	nreq->prev = nreq;
+}
+
+static void do_notify_reply(fuse_req_t req, fuse_ino_t nodeid,
+			    const void *inarg, const struct fuse_buf *buf)
+{
+	struct fuse_session *se = req->se;
+	struct fuse_notify_req *nreq;
+	struct fuse_notify_req *head;
+
+	pthread_mutex_lock(&se->lock);
+	head = &se->notify_list;
+	for (nreq = head->next; nreq != head; nreq = nreq->next) {
+		if (nreq->unique == req->unique) {
+			list_del_nreq(nreq);
+			break;
+		}
+	}
+	pthread_mutex_unlock(&se->lock);
+
+	if (nreq != head)
+		nreq->reply(nreq, req, nodeid, inarg, buf);
+}
+
+static int send_notify_iov(struct fuse_session *se, int notify_code,
+			   struct iovec *iov, int count)
+{
+	struct fuse_out_header out;
+
+	if (!se->got_init)
+		return -ENOTCONN;
+
+	out.unique = 0;
+	out.error = notify_code;
+	iov[0].iov_base = &out;
+	iov[0].iov_len = sizeof(struct fuse_out_header);
+
+	return fuse_send_msg(se, NULL, iov, count);
+}
+
+int fuse_lowlevel_notify_poll(struct fuse_pollhandle *ph)
+{
+	if (ph != NULL) {
+		struct fuse_notify_poll_wakeup_out outarg;
+		struct iovec iov[2];
+
+		outarg.kh = ph->kh;
+
+		iov[1].iov_base = &outarg;
+		iov[1].iov_len = sizeof(outarg);
+
+		return send_notify_iov(ph->se, FUSE_NOTIFY_POLL, iov, 2);
+	} else {
+		return 0;
+	}
+}
+
+int fuse_lowlevel_notify_inval_inode(struct fuse_session *se, fuse_ino_t ino,
+				     off_t off, off_t len)
+{
+	struct fuse_notify_inval_inode_out outarg;
+	struct iovec iov[2];
+
+	if (!se)
+		return -EINVAL;
+
+	if (se->conn.proto_major < 6 || se->conn.proto_minor < 12)
+		return -ENOSYS;
+	
+	outarg.ino = ino;
+	outarg.off = off;
+	outarg.len = len;
+
+	iov[1].iov_base = &outarg;
+	iov[1].iov_len = sizeof(outarg);
+
+	return send_notify_iov(se, FUSE_NOTIFY_INVAL_INODE, iov, 2);
+}
+
+int fuse_lowlevel_notify_inval_entry(struct fuse_session *se, fuse_ino_t parent,
+				     const char *name, size_t namelen)
+{
+	struct fuse_notify_inval_entry_out outarg;
+	struct iovec iov[3];
+
+	if (!se)
+		return -EINVAL;
+	
+	if (se->conn.proto_major < 6 || se->conn.proto_minor < 12)
+		return -ENOSYS;
+
+	outarg.parent = parent;
+	outarg.namelen = namelen;
+	outarg.padding = 0;
+
+	iov[1].iov_base = &outarg;
+	iov[1].iov_len = sizeof(outarg);
+	iov[2].iov_base = (void *)name;
+	iov[2].iov_len = namelen + 1;
+
+	return send_notify_iov(se, FUSE_NOTIFY_INVAL_ENTRY, iov, 3);
+}
+
+int fuse_lowlevel_notify_delete(struct fuse_session *se,
+				fuse_ino_t parent, fuse_ino_t child,
+				const char *name, size_t namelen)
+{
+	struct fuse_notify_delete_out outarg;
+	struct iovec iov[3];
+
+	if (!se)
+		return -EINVAL;
+
+	if (se->conn.proto_major < 6 || se->conn.proto_minor < 18)
+		return -ENOSYS;
+
+	outarg.parent = parent;
+	outarg.child = child;
+	outarg.namelen = namelen;
+	outarg.padding = 0;
+
+	iov[1].iov_base = &outarg;
+	iov[1].iov_len = sizeof(outarg);
+	iov[2].iov_base = (void *)name;
+	iov[2].iov_len = namelen + 1;
+
+	return send_notify_iov(se, FUSE_NOTIFY_DELETE, iov, 3);
+}
+
+int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
+			       off_t offset, struct fuse_bufvec *bufv,
+			       enum fuse_buf_copy_flags flags)
+{
+	struct fuse_out_header out;
+	struct fuse_notify_store_out outarg;
+	struct iovec iov[3];
+	size_t size = fuse_buf_size(bufv);
+	int res;
+
+	if (!se)
+		return -EINVAL;
+
+	if (se->conn.proto_major < 6 || se->conn.proto_minor < 15)
+		return -ENOSYS;
+
+	out.unique = 0;
+	out.error = FUSE_NOTIFY_STORE;
+
+	outarg.nodeid = ino;
+	outarg.offset = offset;
+	outarg.size = size;
+	outarg.padding = 0;
+
+	iov[0].iov_base = &out;
+	iov[0].iov_len = sizeof(out);
+	iov[1].iov_base = &outarg;
+	iov[1].iov_len = sizeof(outarg);
+
+	res = fuse_send_data_iov(se, NULL, iov, 2, bufv, flags);
+	if (res > 0)
+		res = -res;
+
+	return res;
+}
+
+struct fuse_retrieve_req {
+	struct fuse_notify_req nreq;
+	void *cookie;
+};
+
+static void fuse_ll_retrieve_reply(struct fuse_notify_req *nreq,
+				   fuse_req_t req, fuse_ino_t ino,
+				   const void *inarg,
+				   const struct fuse_buf *ibuf)
+{
+	struct fuse_session *se = req->se;
+	struct fuse_retrieve_req *rreq =
+		container_of(nreq, struct fuse_retrieve_req, nreq);
+	const struct fuse_notify_retrieve_in *arg = inarg;
+	struct fuse_bufvec bufv = {
+		.buf[0] = *ibuf,
+		.count = 1,
+	};
+
+	if (!(bufv.buf[0].flags & FUSE_BUF_IS_FD))
+		bufv.buf[0].mem = PARAM(arg);
+
+	bufv.buf[0].size -= sizeof(struct fuse_in_header) +
+		sizeof(struct fuse_notify_retrieve_in);
+
+	if (bufv.buf[0].size < arg->size) {
+		fuse_log(FUSE_LOG_ERR, "fuse: retrieve reply: buffer size too small\n");
+		fuse_reply_none(req);
+		goto out;
+	}
+	bufv.buf[0].size = arg->size;
+
+	if (se->op.retrieve_reply) {
+		se->op.retrieve_reply(req, rreq->cookie, ino,
+					  arg->offset, &bufv);
+	} else {
+		fuse_reply_none(req);
+	}
+out:
+	free(rreq);
+	if ((ibuf->flags & FUSE_BUF_IS_FD) && bufv.idx < bufv.count)
+		fuse_ll_clear_pipe(se);
+}
+
+int fuse_lowlevel_notify_retrieve(struct fuse_session *se, fuse_ino_t ino,
+				  size_t size, off_t offset, void *cookie)
+{
+	struct fuse_notify_retrieve_out outarg;
+	struct iovec iov[2];
+	struct fuse_retrieve_req *rreq;
+	int err;
+
+	if (!se)
+		return -EINVAL;
+
+	if (se->conn.proto_major < 6 || se->conn.proto_minor < 15)
+		return -ENOSYS;
+
+	rreq = malloc(sizeof(*rreq));
+	if (rreq == NULL)
+		return -ENOMEM;
+
+	pthread_mutex_lock(&se->lock);
+	rreq->cookie = cookie;
+	rreq->nreq.unique = se->notify_ctr++;
+	rreq->nreq.reply = fuse_ll_retrieve_reply;
+	list_add_nreq(&rreq->nreq, &se->notify_list);
+	pthread_mutex_unlock(&se->lock);
+
+	outarg.notify_unique = rreq->nreq.unique;
+	outarg.nodeid = ino;
+	outarg.offset = offset;
+	outarg.size = size;
+	outarg.padding = 0;
+
+	iov[1].iov_base = &outarg;
+	iov[1].iov_len = sizeof(outarg);
+
+	err = send_notify_iov(se, FUSE_NOTIFY_RETRIEVE, iov, 2);
+	if (err) {
+		pthread_mutex_lock(&se->lock);
+		list_del_nreq(&rreq->nreq);
+		pthread_mutex_unlock(&se->lock);
+		free(rreq);
+	}
+
+	return err;
+}
+
+void *fuse_req_userdata(fuse_req_t req)
+{
+	return req->se->userdata;
+}
+
+const struct fuse_ctx *fuse_req_ctx(fuse_req_t req)
+{
+	return &req->ctx;
+}
+
+void fuse_req_interrupt_func(fuse_req_t req, fuse_interrupt_func_t func,
+			     void *data)
+{
+	pthread_mutex_lock(&req->lock);
+	pthread_mutex_lock(&req->se->lock);
+	req->u.ni.func = func;
+	req->u.ni.data = data;
+	pthread_mutex_unlock(&req->se->lock);
+	if (req->interrupted && func)
+		func(req, data);
+	pthread_mutex_unlock(&req->lock);
+}
+
+int fuse_req_interrupted(fuse_req_t req)
+{
+	int interrupted;
+
+	pthread_mutex_lock(&req->se->lock);
+	interrupted = req->interrupted;
+	pthread_mutex_unlock(&req->se->lock);
+
+	return interrupted;
+}
+
+static struct {
+	void (*func)(fuse_req_t, fuse_ino_t, const void *);
+	const char *name;
+} fuse_ll_ops[] = {
+	[FUSE_LOOKUP]	   = { do_lookup,      "LOOKUP"	     },
+	[FUSE_FORGET]	   = { do_forget,      "FORGET"	     },
+	[FUSE_GETATTR]	   = { do_getattr,     "GETATTR"     },
+	[FUSE_SETATTR]	   = { do_setattr,     "SETATTR"     },
+	[FUSE_READLINK]	   = { do_readlink,    "READLINK"    },
+	[FUSE_SYMLINK]	   = { do_symlink,     "SYMLINK"     },
+	[FUSE_MKNOD]	   = { do_mknod,       "MKNOD"	     },
+	[FUSE_MKDIR]	   = { do_mkdir,       "MKDIR"	     },
+	[FUSE_UNLINK]	   = { do_unlink,      "UNLINK"	     },
+	[FUSE_RMDIR]	   = { do_rmdir,       "RMDIR"	     },
+	[FUSE_RENAME]	   = { do_rename,      "RENAME"	     },
+	[FUSE_LINK]	   = { do_link,	       "LINK"	     },
+	[FUSE_OPEN]	   = { do_open,	       "OPEN"	     },
+	[FUSE_READ]	   = { do_read,	       "READ"	     },
+	[FUSE_WRITE]	   = { do_write,       "WRITE"	     },
+	[FUSE_STATFS]	   = { do_statfs,      "STATFS"	     },
+	[FUSE_RELEASE]	   = { do_release,     "RELEASE"     },
+	[FUSE_FSYNC]	   = { do_fsync,       "FSYNC"	     },
+	[FUSE_SETXATTR]	   = { do_setxattr,    "SETXATTR"    },
+	[FUSE_GETXATTR]	   = { do_getxattr,    "GETXATTR"    },
+	[FUSE_LISTXATTR]   = { do_listxattr,   "LISTXATTR"   },
+	[FUSE_REMOVEXATTR] = { do_removexattr, "REMOVEXATTR" },
+	[FUSE_FLUSH]	   = { do_flush,       "FLUSH"	     },
+	[FUSE_INIT]	   = { do_init,	       "INIT"	     },
+	[FUSE_OPENDIR]	   = { do_opendir,     "OPENDIR"     },
+	[FUSE_READDIR]	   = { do_readdir,     "READDIR"     },
+	[FUSE_RELEASEDIR]  = { do_releasedir,  "RELEASEDIR"  },
+	[FUSE_FSYNCDIR]	   = { do_fsyncdir,    "FSYNCDIR"    },
+	[FUSE_GETLK]	   = { do_getlk,       "GETLK"	     },
+	[FUSE_SETLK]	   = { do_setlk,       "SETLK"	     },
+	[FUSE_SETLKW]	   = { do_setlkw,      "SETLKW"	     },
+	[FUSE_ACCESS]	   = { do_access,      "ACCESS"	     },
+	[FUSE_CREATE]	   = { do_create,      "CREATE"	     },
+	[FUSE_INTERRUPT]   = { do_interrupt,   "INTERRUPT"   },
+	[FUSE_BMAP]	   = { do_bmap,	       "BMAP"	     },
+	[FUSE_IOCTL]	   = { do_ioctl,       "IOCTL"	     },
+	[FUSE_POLL]	   = { do_poll,        "POLL"	     },
+	[FUSE_FALLOCATE]   = { do_fallocate,   "FALLOCATE"   },
+	[FUSE_DESTROY]	   = { do_destroy,     "DESTROY"     },
+	[FUSE_NOTIFY_REPLY] = { (void *) 1,    "NOTIFY_REPLY" },
+	[FUSE_BATCH_FORGET] = { do_batch_forget, "BATCH_FORGET" },
+	[FUSE_READDIRPLUS] = { do_readdirplus,	"READDIRPLUS"},
+	[FUSE_RENAME2]     = { do_rename2,      "RENAME2"    },
+	[FUSE_COPY_FILE_RANGE] = { do_copy_file_range, "COPY_FILE_RANGE" },
+	[FUSE_LSEEK]	   = { do_lseek,       "LSEEK"	     },
+	[CUSE_INIT]	   = { cuse_lowlevel_init, "CUSE_INIT"   },
+};
+
+#define FUSE_MAXOP (sizeof(fuse_ll_ops) / sizeof(fuse_ll_ops[0]))
+
+static const char *opname(enum fuse_opcode opcode)
+{
+	if (opcode >= FUSE_MAXOP || !fuse_ll_ops[opcode].name)
+		return "???";
+	else
+		return fuse_ll_ops[opcode].name;
+}
+
+static int fuse_ll_copy_from_pipe(struct fuse_bufvec *dst,
+				  struct fuse_bufvec *src)
+{
+	ssize_t res = fuse_buf_copy(dst, src, 0);
+	if (res < 0) {
+		fuse_log(FUSE_LOG_ERR, "fuse: copy from pipe: %s\n", strerror(-res));
+		return res;
+	}
+	if ((size_t)res < fuse_buf_size(dst)) {
+		fuse_log(FUSE_LOG_ERR, "fuse: copy from pipe: short read\n");
+		return -1;
+	}
+	return 0;
+}
+
+void fuse_session_process_buf(struct fuse_session *se,
+			      const struct fuse_buf *buf)
+{
+	fuse_session_process_buf_int(se, buf, NULL);
+}
+
+void fuse_session_process_buf_int(struct fuse_session *se,
+				  const struct fuse_buf *buf, struct fuse_chan *ch)
+{
+	const size_t write_header_size = sizeof(struct fuse_in_header) +
+		sizeof(struct fuse_write_in);
+	struct fuse_bufvec bufv = { .buf[0] = *buf, .count = 1 };
+	struct fuse_bufvec tmpbuf = FUSE_BUFVEC_INIT(write_header_size);
+	struct fuse_in_header *in;
+	const void *inarg;
+	struct fuse_req *req;
+	void *mbuf = NULL;
+	int err;
+	int res;
+
+	if (buf->flags & FUSE_BUF_IS_FD) {
+		if (buf->size < tmpbuf.buf[0].size)
+			tmpbuf.buf[0].size = buf->size;
+
+		mbuf = malloc(tmpbuf.buf[0].size);
+		if (mbuf == NULL) {
+			fuse_log(FUSE_LOG_ERR, "fuse: failed to allocate header\n");
+			goto clear_pipe;
+		}
+		tmpbuf.buf[0].mem = mbuf;
+
+		res = fuse_ll_copy_from_pipe(&tmpbuf, &bufv);
+		if (res < 0)
+			goto clear_pipe;
+
+		in = mbuf;
+	} else {
+		in = buf->mem;
+	}
+
+	if (se->debug) {
+		fuse_log(FUSE_LOG_DEBUG,
+			"unique: %llu, opcode: %s (%i), nodeid: %llu, insize: %zu, pid: %u\n",
+			(unsigned long long) in->unique,
+			opname((enum fuse_opcode) in->opcode), in->opcode,
+			(unsigned long long) in->nodeid, buf->size, in->pid);
+	}
+
+	req = fuse_ll_alloc_req(se);
+	if (req == NULL) {
+		struct fuse_out_header out = {
+			.unique = in->unique,
+			.error = -ENOMEM,
+		};
+		struct iovec iov = {
+			.iov_base = &out,
+			.iov_len = sizeof(struct fuse_out_header),
+		};
+
+		fuse_send_msg(se, ch, &iov, 1);
+		goto clear_pipe;
+	}
+
+	req->unique = in->unique;
+	req->ctx.uid = in->uid;
+	req->ctx.gid = in->gid;
+	req->ctx.pid = in->pid;
+	req->ch = ch ? fuse_chan_get(ch) : NULL;
+
+	err = EIO;
+	if (!se->got_init) {
+		enum fuse_opcode expected;
+
+		expected = se->cuse_data ? CUSE_INIT : FUSE_INIT;
+		if (in->opcode != expected)
+			goto reply_err;
+	} else if (in->opcode == FUSE_INIT || in->opcode == CUSE_INIT)
+		goto reply_err;
+
+	err = EACCES;
+	/* Implement -o allow_root */
+	if (se->deny_others && in->uid != se->owner && in->uid != 0 &&
+		 in->opcode != FUSE_INIT && in->opcode != FUSE_READ &&
+		 in->opcode != FUSE_WRITE && in->opcode != FUSE_FSYNC &&
+		 in->opcode != FUSE_RELEASE && in->opcode != FUSE_READDIR &&
+		 in->opcode != FUSE_FSYNCDIR && in->opcode != FUSE_RELEASEDIR &&
+		 in->opcode != FUSE_NOTIFY_REPLY &&
+		 in->opcode != FUSE_READDIRPLUS)
+		goto reply_err;
+
+	err = ENOSYS;
+	if (in->opcode >= FUSE_MAXOP || !fuse_ll_ops[in->opcode].func)
+		goto reply_err;
+	if (in->opcode != FUSE_INTERRUPT) {
+		struct fuse_req *intr;
+		pthread_mutex_lock(&se->lock);
+		intr = check_interrupt(se, req);
+		list_add_req(req, &se->list);
+		pthread_mutex_unlock(&se->lock);
+		if (intr)
+			fuse_reply_err(intr, EAGAIN);
+	}
+
+	if ((buf->flags & FUSE_BUF_IS_FD) && write_header_size < buf->size &&
+	    (in->opcode != FUSE_WRITE || !se->op.write_buf) &&
+	    in->opcode != FUSE_NOTIFY_REPLY) {
+		void *newmbuf;
+
+		err = ENOMEM;
+		newmbuf = realloc(mbuf, buf->size);
+		if (newmbuf == NULL)
+			goto reply_err;
+		mbuf = newmbuf;
+
+		tmpbuf = FUSE_BUFVEC_INIT(buf->size - write_header_size);
+		tmpbuf.buf[0].mem = (char *)mbuf + write_header_size;
+
+		res = fuse_ll_copy_from_pipe(&tmpbuf, &bufv);
+		err = -res;
+		if (res < 0)
+			goto reply_err;
+
+		in = mbuf;
+	}
+
+	inarg = (void *) &in[1];
+	if (in->opcode == FUSE_WRITE && se->op.write_buf)
+		do_write_buf(req, in->nodeid, inarg, buf);
+	else if (in->opcode == FUSE_NOTIFY_REPLY)
+		do_notify_reply(req, in->nodeid, inarg, buf);
+	else
+		fuse_ll_ops[in->opcode].func(req, in->nodeid, inarg);
+
+out_free:
+	free(mbuf);
+	return;
+
+reply_err:
+	fuse_reply_err(req, err);
+clear_pipe:
+	if (buf->flags & FUSE_BUF_IS_FD)
+		fuse_ll_clear_pipe(se);
+	goto out_free;
+}
+
+#define LL_OPTION(n,o,v) \
+	{ n, offsetof(struct fuse_session, o), v }
+
+static const struct fuse_opt fuse_ll_opts[] = {
+	LL_OPTION("debug", debug, 1),
+	LL_OPTION("-d", debug, 1),
+	LL_OPTION("--debug", debug, 1),
+	LL_OPTION("allow_root", deny_others, 1),
+	FUSE_OPT_END
+};
+
+void fuse_lowlevel_version(void)
+{
+	printf("using FUSE kernel interface version %i.%i\n",
+	       FUSE_KERNEL_VERSION, FUSE_KERNEL_MINOR_VERSION);
+	fuse_mount_version();
+}
+
+void fuse_lowlevel_help(void)
+{
+	/* These are not all options, but the ones that are
+	   potentially of interest to an end-user */
+	printf(
+"    -o allow_other         allow access by all users\n"
+"    -o allow_root          allow access by root\n"
+"    -o auto_unmount        auto unmount on process termination\n");
+}
+
+void fuse_session_destroy(struct fuse_session *se)
+{
+	struct fuse_ll_pipe *llp;
+
+	if (se->got_init && !se->got_destroy) {
+		if (se->op.destroy)
+			se->op.destroy(se->userdata);
+	}
+	llp = pthread_getspecific(se->pipe_key);
+	if (llp != NULL)
+		fuse_ll_pipe_free(llp);
+	pthread_key_delete(se->pipe_key);
+	pthread_mutex_destroy(&se->lock);
+	free(se->cuse_data);
+	if (se->fd != -1)
+		close(se->fd);
+	destroy_mount_opts(se->mo);
+	free(se);
+}
+
+
+static void fuse_ll_pipe_destructor(void *data)
+{
+	struct fuse_ll_pipe *llp = data;
+	fuse_ll_pipe_free(llp);
+}
+
+int fuse_session_receive_buf(struct fuse_session *se, struct fuse_buf *buf)
+{
+	return fuse_session_receive_buf_int(se, buf, NULL);
+}
+
+int fuse_session_receive_buf_int(struct fuse_session *se, struct fuse_buf *buf,
+				 struct fuse_chan *ch)
+{
+	int err;
+	ssize_t res;
+#ifdef HAVE_SPLICE
+	size_t bufsize = se->bufsize;
+	struct fuse_ll_pipe *llp;
+	struct fuse_buf tmpbuf;
+
+	if (se->conn.proto_minor < 14 || !(se->conn.want & FUSE_CAP_SPLICE_READ))
+		goto fallback;
+
+	llp = fuse_ll_get_pipe(se);
+	if (llp == NULL)
+		goto fallback;
+
+	if (llp->size < bufsize) {
+		if (llp->can_grow) {
+			res = fcntl(llp->pipe[0], F_SETPIPE_SZ, bufsize);
+			if (res == -1) {
+				llp->can_grow = 0;
+				res = grow_pipe_to_max(llp->pipe[0]);
+				if (res > 0)
+					llp->size = res;
+				goto fallback;
+			}
+			llp->size = res;
+		}
+		if (llp->size < bufsize)
+			goto fallback;
+	}
+
+	res = splice(ch ? ch->fd : se->fd,
+		     NULL, llp->pipe[1], NULL, bufsize, 0);
+	err = errno;
+
+	if (fuse_session_exited(se))
+		return 0;
+
+	if (res == -1) {
+		if (err == ENODEV) {
+			/* Filesystem was unmounted, or connection was aborted
+			   via /sys/fs/fuse/connections */
+			fuse_session_exit(se);
+			return 0;
+		}
+		if (err != EINTR && err != EAGAIN)
+			perror("fuse: splice from device");
+		return -err;
+	}
+
+	if (res < sizeof(struct fuse_in_header)) {
+		fuse_log(FUSE_LOG_ERR, "short splice from fuse device\n");
+		return -EIO;
+	}
+
+	tmpbuf = (struct fuse_buf) {
+		.size = res,
+		.flags = FUSE_BUF_IS_FD,
+		.fd = llp->pipe[0],
+	};
+
+	/*
+	 * Don't bother with zero copy for small requests.
+	 * fuse_loop_mt() needs to check for FORGET, so this is more than
+	 * just an optimization.
+	 */
+	if (res < sizeof(struct fuse_in_header) +
+	    sizeof(struct fuse_write_in) + pagesize) {
+		struct fuse_bufvec src = { .buf[0] = tmpbuf, .count = 1 };
+		struct fuse_bufvec dst = { .count = 1 };
+
+		if (!buf->mem) {
+			buf->mem = malloc(se->bufsize);
+			if (!buf->mem) {
+				fuse_log(FUSE_LOG_ERR,
+					"fuse: failed to allocate read buffer\n");
+				return -ENOMEM;
+			}
+		}
+		buf->size = se->bufsize;
+		buf->flags = 0;
+		dst.buf[0] = *buf;
+
+		res = fuse_buf_copy(&dst, &src, 0);
+		if (res < 0) {
+			fuse_log(FUSE_LOG_ERR, "fuse: copy from pipe: %s\n",
+				strerror(-res));
+			fuse_ll_clear_pipe(se);
+			return res;
+		}
+		if (res < tmpbuf.size) {
+			fuse_log(FUSE_LOG_ERR, "fuse: copy from pipe: short read\n");
+			fuse_ll_clear_pipe(se);
+			return -EIO;
+		}
+		assert(res == tmpbuf.size);
+
+	} else {
+		/* Don't overwrite buf->mem, as that would cause a leak */
+		buf->fd = tmpbuf.fd;
+		buf->flags = tmpbuf.flags;
+	}
+	buf->size = tmpbuf.size;
+
+	return res;
+
+fallback:
+#endif
+	if (!buf->mem) {
+		buf->mem = malloc(se->bufsize);
+		if (!buf->mem) {
+			fuse_log(FUSE_LOG_ERR,
+				"fuse: failed to allocate read buffer\n");
+			return -ENOMEM;
+		}
+	}
+
+restart:
+	res = read(ch ? ch->fd : se->fd, buf->mem, se->bufsize);
+	err = errno;
+
+	if (fuse_session_exited(se))
+		return 0;
+	if (res == -1) {
+		/* ENOENT means the operation was interrupted; it's safe
+		   to restart */
+		if (err == ENOENT)
+			goto restart;
+
+		if (err == ENODEV) {
+			/* Filesystem was unmounted, or connection was aborted
+			   via /sys/fs/fuse/connections */
+			fuse_session_exit(se);
+			return 0;
+		}
+		/* Errors occurring during normal operation: EINTR (read
+		   interrupted), EAGAIN (nonblocking I/O), ENODEV (filesystem
+		   unmounted) */
+		if (err != EINTR && err != EAGAIN)
+			perror("fuse: reading device");
+		return -err;
+	}
+	if ((size_t) res < sizeof(struct fuse_in_header)) {
+		fuse_log(FUSE_LOG_ERR, "short read on fuse device\n");
+		return -EIO;
+	}
+
+	buf->size = res;
+
+	return res;
+}
+
+struct fuse_session *fuse_session_new(struct fuse_args *args,
+				      const struct fuse_lowlevel_ops *op,
+				      size_t op_size, void *userdata)
+{
+	int err;
+	struct fuse_session *se;
+	struct mount_opts *mo;
+
+	if (sizeof(struct fuse_lowlevel_ops) < op_size) {
+		fuse_log(FUSE_LOG_ERR, "fuse: warning: library too old, some operations may not work\n");
+		op_size = sizeof(struct fuse_lowlevel_ops);
+	}
+
+	if (args->argc == 0) {
+		fuse_log(FUSE_LOG_ERR, "fuse: empty argv passed to fuse_session_new().\n");
+		return NULL;
+	}
+
+	se = (struct fuse_session *) calloc(1, sizeof(struct fuse_session));
+	if (se == NULL) {
+		fuse_log(FUSE_LOG_ERR, "fuse: failed to allocate fuse object\n");
+		goto out1;
+	}
+	se->fd = -1;
+	se->conn.max_write = UINT_MAX;
+	se->conn.max_readahead = UINT_MAX;
+
+	/* Parse options */
+	if(fuse_opt_parse(args, se, fuse_ll_opts, NULL) == -1)
+		goto out2;
+	if(se->deny_others) {
+		/* Allowing access only by root is done by instructing
+		 * the kernel to allow access by everyone, and then
+		 * restricting access to root and mountpoint owner in libfuse.
+		 */
+		// We may be adding the option a second time, but
+		// that doesn't hurt.
+		if(fuse_opt_add_arg(args, "-oallow_other") == -1)
+			goto out2;
+	}
+	mo = parse_mount_opts(args);
+	if (mo == NULL)
+		goto out3;
+
+	if(args->argc == 1 &&
+	   args->argv[0][0] == '-') {
+		fuse_log(FUSE_LOG_ERR, "fuse: warning: argv[0] looks like an option, but "
+			"will be ignored\n");
+	} else if (args->argc != 1) {
+		int i;
+		fuse_log(FUSE_LOG_ERR, "fuse: unknown option(s): `");
+		for(i = 1; i < args->argc-1; i++)
+			fuse_log(FUSE_LOG_ERR, "%s ", args->argv[i]);
+		fuse_log(FUSE_LOG_ERR, "%s'\n", args->argv[i]);
+		goto out4;
+	}
+
+	if (se->debug)
+		fuse_log(FUSE_LOG_DEBUG, "FUSE library version: %s\n", PACKAGE_VERSION);
+
+	se->bufsize = FUSE_MAX_MAX_PAGES * getpagesize() +
+		FUSE_BUFFER_HEADER_SIZE;
+
+	list_init_req(&se->list);
+	list_init_req(&se->interrupts);
+	list_init_nreq(&se->notify_list);
+	se->notify_ctr = 1;
+	fuse_mutex_init(&se->lock);
+
+	err = pthread_key_create(&se->pipe_key, fuse_ll_pipe_destructor);
+	if (err) {
+		fuse_log(FUSE_LOG_ERR, "fuse: failed to create thread specific key: %s\n",
+			strerror(err));
+		goto out5;
+	}
+
+	memcpy(&se->op, op, op_size);
+	se->owner = getuid();
+	se->userdata = userdata;
+
+	se->mo = mo;
+	return se;
+
+out5:
+	pthread_mutex_destroy(&se->lock);
+out4:
+	fuse_opt_free_args(args);
+out3:
+	free(mo);
+out2:
+	free(se);
+out1:
+	return NULL;
+}
+
+int fuse_session_mount(struct fuse_session *se, const char *mountpoint)
+{
+	int fd;
+
+	/*
+	 * Make sure file descriptors 0, 1 and 2 are open, otherwise chaos
+	 * would ensue.
+	 */
+	do {
+		fd = open("/dev/null", O_RDWR);
+		if (fd > 2)
+			close(fd);
+	} while (fd >= 0 && fd <= 2);
+
+	/*
+	 * To allow FUSE daemons to run without privileges, the caller may open
+	 * /dev/fuse before launching the file system and pass on the file
+	 * descriptor by specifying /dev/fd/N as the mount point. Note that the
+	 * parent process takes care of performing the mount in this case.
+	 */
+	fd = fuse_mnt_parse_fuse_fd(mountpoint);
+	if (fd != -1) {
+		if (fcntl(fd, F_GETFD) == -1) {
+			fuse_log(FUSE_LOG_ERR,
+				"fuse: Invalid file descriptor /dev/fd/%d\n",
+				fd);
+			return -1;
+		}
+		se->fd = fd;
+		return 0;
+	}
+
+	/* Open channel */
+	fd = fuse_kern_mount(mountpoint, se->mo);
+	if (fd == -1)
+		return -1;
+	se->fd = fd;
+
+	/* Save mountpoint */
+	se->mountpoint = strdup(mountpoint);
+	if (se->mountpoint == NULL)
+		goto error_out;
+
+	return 0;
+
+error_out:
+	fuse_kern_unmount(mountpoint, fd);
+	return -1;
+}
+
+int fuse_session_fd(struct fuse_session *se)
+{
+	return se->fd;
+}
+
+void fuse_session_unmount(struct fuse_session *se)
+{
+	if (se->mountpoint != NULL) {
+		fuse_kern_unmount(se->mountpoint, se->fd);
+		free(se->mountpoint);
+		se->mountpoint = NULL;
+	}
+}
+
+#ifdef linux
+int fuse_req_getgroups(fuse_req_t req, int size, gid_t list[])
+{
+	char *buf;
+	size_t bufsize = 1024;
+	char path[128];
+	int ret;
+	int fd;
+	unsigned long pid = req->ctx.pid;
+	char *s;
+
+	sprintf(path, "/proc/%lu/task/%lu/status", pid, pid);
+
+retry:
+	buf = malloc(bufsize);
+	if (buf == NULL)
+		return -ENOMEM;
+
+	ret = -EIO;
+	fd = open(path, O_RDONLY);
+	if (fd == -1)
+		goto out_free;
+
+	ret = read(fd, buf, bufsize);
+	close(fd);
+	if (ret < 0) {
+		ret = -EIO;
+		goto out_free;
+	}
+
+	if ((size_t)ret == bufsize) {
+		free(buf);
+		bufsize *= 4;
+		goto retry;
+	}
+
+	ret = -EIO;
+	s = strstr(buf, "\nGroups:");
+	if (s == NULL)
+		goto out_free;
+
+	s += 8;
+	ret = 0;
+	while (1) {
+		char *end;
+		unsigned long val = strtoul(s, &end, 0);
+		if (end == s)
+			break;
+
+		s = end;
+		if (ret < size)
+			list[ret] = val;
+		ret++;
+	}
+
+out_free:
+	free(buf);
+	return ret;
+}
+#else /* linux */
+/*
+ * This is currently implemented only on Linux.
+ */
+int fuse_req_getgroups(fuse_req_t req, int size, gid_t list[])
+{
+	(void) req; (void) size; (void) list;
+	return -ENOSYS;
+}
+#endif
+
+void fuse_session_exit(struct fuse_session *se)
+{
+	se->exited = 1;
+}
+
+void fuse_session_reset(struct fuse_session *se)
+{
+	se->exited = 0;
+	se->error = 0;
+}
+
+int fuse_session_exited(struct fuse_session *se)
+{
+	return se->exited;
+}
-- 
2.23.0




* [PATCH 005/104] virtiofsd: Add passthrough_ll
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (3 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 004/104] virtiofsd: Add fuse_lowlevel.c Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-03 12:01   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 006/104] virtiofsd: Trim down imported files Dr. David Alan Gilbert (git)
                   ` (101 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

passthrough_ll is one of the examples in the upstream libfuse project
and forms the main part of our daemon.  It passes FUSE requests
through to the underlying filesystem, using syscalls as directly
as possible.

Imported from libfuse fuse-3.8.0.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 1338 ++++++++++++++++++++++++++++++
 1 file changed, 1338 insertions(+)
 create mode 100644 tools/virtiofsd/passthrough_ll.c

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
new file mode 100644
index 0000000000..5372d02934
--- /dev/null
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -0,0 +1,1338 @@
+/*
+  FUSE: Filesystem in Userspace
+  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
+
+  This program can be distributed under the terms of the GNU GPL.
+  See the file COPYING.
+*/
+
+/** @file
+ *
+ * This file system mirrors the existing file system hierarchy of the
+ * system, starting at the root file system. This is implemented by
+ * just "passing through" all requests to the corresponding user-space
+ * libc functions. In contrast to passthrough.c and passthrough_fh.c,
+ * this implementation uses the low-level API. Its performance should
+ * be the least bad among the three, but many operations are not
+ * implemented. In particular, it is not possible to remove files (or
+ * directories) because the code necessary to defer actual removal
+ * until the file is not opened anymore would make the example much
+ * more complicated.
+ *
+ * When writeback caching is enabled (-o writeback mount option), it
+ * is only possible to write to files for which the mounting user has
+ * read permissions. This is because the writeback cache requires the
+ * kernel to be able to issue read requests for all files (which the
+ * passthrough filesystem cannot satisfy if it can't read the file in
+ * the underlying filesystem).
+ *
+ * Compile with:
+ *
+ *     gcc -Wall passthrough_ll.c `pkg-config fuse3 --cflags --libs` -o passthrough_ll
+ *
+ * ## Source code ##
+ * \include passthrough_ll.c
+ */
+
+#define _GNU_SOURCE
+#define FUSE_USE_VERSION 31
+
+#include "config.h"
+
+#include <fuse_lowlevel.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <stddef.h>
+#include <stdbool.h>
+#include <string.h>
+#include <limits.h>
+#include <dirent.h>
+#include <assert.h>
+#include <errno.h>
+#include <inttypes.h>
+#include <pthread.h>
+#include <sys/file.h>
+#include <sys/xattr.h>
+
+#include "passthrough_helpers.h"
+
+/* We are re-using pointers to our `struct lo_inode` and `struct
+   lo_dirp` elements as inodes. This means that we must be able to
+   store uintptr_t values in a fuse_ino_t variable. The following
+   incantation checks this condition at compile time. */
+#if defined(__GNUC__) && (__GNUC__ > 4 || __GNUC__ == 4 && __GNUC_MINOR__ >= 6) && !defined __cplusplus
+_Static_assert(sizeof(fuse_ino_t) >= sizeof(uintptr_t),
+	       "fuse_ino_t too small to hold uintptr_t values!");
+#else
+struct _uintptr_to_must_hold_fuse_ino_t_dummy_struct \
+	{ unsigned _uintptr_to_must_hold_fuse_ino_t:
+			((sizeof(fuse_ino_t) >= sizeof(uintptr_t)) ? 1 : -1); };
+#endif
+
+struct lo_inode {
+	struct lo_inode *next; /* protected by lo->mutex */
+	struct lo_inode *prev; /* protected by lo->mutex */
+	int fd;
+	bool is_symlink;
+	ino_t ino;
+	dev_t dev;
+	uint64_t refcount; /* protected by lo->mutex */
+};
+
+enum {
+	CACHE_NEVER,
+	CACHE_NORMAL,
+	CACHE_ALWAYS,
+};
+
+struct lo_data {
+	pthread_mutex_t mutex;
+	int debug;
+	int writeback;
+	int flock;
+	int xattr;
+	const char *source;
+	double timeout;
+	int cache;
+	int timeout_set;
+	struct lo_inode root; /* protected by lo->mutex */
+};
+
+static const struct fuse_opt lo_opts[] = {
+	{ "writeback",
+	  offsetof(struct lo_data, writeback), 1 },
+	{ "no_writeback",
+	  offsetof(struct lo_data, writeback), 0 },
+	{ "source=%s",
+	  offsetof(struct lo_data, source), 0 },
+	{ "flock",
+	  offsetof(struct lo_data, flock), 1 },
+	{ "no_flock",
+	  offsetof(struct lo_data, flock), 0 },
+	{ "xattr",
+	  offsetof(struct lo_data, xattr), 1 },
+	{ "no_xattr",
+	  offsetof(struct lo_data, xattr), 0 },
+	{ "timeout=%lf",
+	  offsetof(struct lo_data, timeout), 0 },
+	{ "timeout=",
+	  offsetof(struct lo_data, timeout_set), 1 },
+	{ "cache=never",
+	  offsetof(struct lo_data, cache), CACHE_NEVER },
+	{ "cache=auto",
+	  offsetof(struct lo_data, cache), CACHE_NORMAL },
+	{ "cache=always",
+	  offsetof(struct lo_data, cache), CACHE_ALWAYS },
+
+	FUSE_OPT_END
+};
+
+static struct lo_data *lo_data(fuse_req_t req)
+{
+	return (struct lo_data *) fuse_req_userdata(req);
+}
+
+static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
+{
+	if (ino == FUSE_ROOT_ID)
+		return &lo_data(req)->root;
+	else
+		return (struct lo_inode *) (uintptr_t) ino;
+}
+
+static int lo_fd(fuse_req_t req, fuse_ino_t ino)
+{
+	return lo_inode(req, ino)->fd;
+}
+
+static bool lo_debug(fuse_req_t req)
+{
+	return lo_data(req)->debug != 0;
+}
+
+static void lo_init(void *userdata,
+		    struct fuse_conn_info *conn)
+{
+	struct lo_data *lo = (struct lo_data*) userdata;
+
+	if (conn->capable & FUSE_CAP_EXPORT_SUPPORT)
+		conn->want |= FUSE_CAP_EXPORT_SUPPORT;
+
+	if (lo->writeback &&
+	    conn->capable & FUSE_CAP_WRITEBACK_CACHE) {
+		if (lo->debug)
+			fuse_log(FUSE_LOG_DEBUG, "lo_init: activating writeback\n");
+		conn->want |= FUSE_CAP_WRITEBACK_CACHE;
+	}
+	if (lo->flock && conn->capable & FUSE_CAP_FLOCK_LOCKS) {
+		if (lo->debug)
+			fuse_log(FUSE_LOG_DEBUG, "lo_init: activating flock locks\n");
+		conn->want |= FUSE_CAP_FLOCK_LOCKS;
+	}
+}
+
+static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
+			     struct fuse_file_info *fi)
+{
+	int res;
+	struct stat buf;
+	struct lo_data *lo = lo_data(req);
+
+	(void) fi;
+
+	res = fstatat(lo_fd(req, ino), "", &buf, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
+	if (res == -1)
+		return (void) fuse_reply_err(req, errno);
+
+	fuse_reply_attr(req, &buf, lo->timeout);
+}
+
+static int utimensat_empty_nofollow(struct lo_inode *inode,
+				    const struct timespec *tv)
+{
+	int res;
+	char procname[64];
+
+	if (inode->is_symlink) {
+		res = utimensat(inode->fd, "", tv,
+				AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
+		if (res == -1 && errno == EINVAL) {
+			/* Sorry, no race free way to set times on symlink. */
+			errno = EPERM;
+		}
+		return res;
+	}
+	sprintf(procname, "/proc/self/fd/%i", inode->fd);
+
+	return utimensat(AT_FDCWD, procname, tv, 0);
+}
+
+static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
+		       int valid, struct fuse_file_info *fi)
+{
+	int saverr;
+	char procname[64];
+	struct lo_inode *inode = lo_inode(req, ino);
+	int ifd = inode->fd;
+	int res;
+
+	if (valid & FUSE_SET_ATTR_MODE) {
+		if (fi) {
+			res = fchmod(fi->fh, attr->st_mode);
+		} else {
+			sprintf(procname, "/proc/self/fd/%i", ifd);
+			res = chmod(procname, attr->st_mode);
+		}
+		if (res == -1)
+			goto out_err;
+	}
+	if (valid & (FUSE_SET_ATTR_UID | FUSE_SET_ATTR_GID)) {
+		uid_t uid = (valid & FUSE_SET_ATTR_UID) ?
+			attr->st_uid : (uid_t) -1;
+		gid_t gid = (valid & FUSE_SET_ATTR_GID) ?
+			attr->st_gid : (gid_t) -1;
+
+		res = fchownat(ifd, "", uid, gid,
+			       AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
+		if (res == -1)
+			goto out_err;
+	}
+	if (valid & FUSE_SET_ATTR_SIZE) {
+		if (fi) {
+			res = ftruncate(fi->fh, attr->st_size);
+		} else {
+			sprintf(procname, "/proc/self/fd/%i", ifd);
+			res = truncate(procname, attr->st_size);
+		}
+		if (res == -1)
+			goto out_err;
+	}
+	if (valid & (FUSE_SET_ATTR_ATIME | FUSE_SET_ATTR_MTIME)) {
+		struct timespec tv[2];
+
+		tv[0].tv_sec = 0;
+		tv[1].tv_sec = 0;
+		tv[0].tv_nsec = UTIME_OMIT;
+		tv[1].tv_nsec = UTIME_OMIT;
+
+		if (valid & FUSE_SET_ATTR_ATIME_NOW)
+			tv[0].tv_nsec = UTIME_NOW;
+		else if (valid & FUSE_SET_ATTR_ATIME)
+			tv[0] = attr->st_atim;
+
+		if (valid & FUSE_SET_ATTR_MTIME_NOW)
+			tv[1].tv_nsec = UTIME_NOW;
+		else if (valid & FUSE_SET_ATTR_MTIME)
+			tv[1] = attr->st_mtim;
+
+		if (fi)
+			res = futimens(fi->fh, tv);
+		else
+			res = utimensat_empty_nofollow(inode, tv);
+		if (res == -1)
+			goto out_err;
+	}
+
+	return lo_getattr(req, ino, fi);
+
+out_err:
+	saverr = errno;
+	fuse_reply_err(req, saverr);
+}
+
+static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st)
+{
+	struct lo_inode *p;
+	struct lo_inode *ret = NULL;
+
+	pthread_mutex_lock(&lo->mutex);
+	for (p = lo->root.next; p != &lo->root; p = p->next) {
+		if (p->ino == st->st_ino && p->dev == st->st_dev) {
+			assert(p->refcount > 0);
+			ret = p;
+			ret->refcount++;
+			break;
+		}
+	}
+	pthread_mutex_unlock(&lo->mutex);
+	return ret;
+}
+
+static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
+			 struct fuse_entry_param *e)
+{
+	int newfd;
+	int res;
+	int saverr;
+	struct lo_data *lo = lo_data(req);
+	struct lo_inode *inode;
+
+	memset(e, 0, sizeof(*e));
+	e->attr_timeout = lo->timeout;
+	e->entry_timeout = lo->timeout;
+
+	newfd = openat(lo_fd(req, parent), name, O_PATH | O_NOFOLLOW);
+	if (newfd == -1)
+		goto out_err;
+
+	res = fstatat(newfd, "", &e->attr, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
+	if (res == -1)
+		goto out_err;
+
+	inode = lo_find(lo_data(req), &e->attr);
+	if (inode) {
+		close(newfd);
+		newfd = -1;
+	} else {
+		struct lo_inode *prev, *next;
+
+		saverr = ENOMEM;
+		inode = calloc(1, sizeof(struct lo_inode));
+		if (!inode)
+			goto out_err;
+
+		inode->is_symlink = S_ISLNK(e->attr.st_mode);
+		inode->refcount = 1;
+		inode->fd = newfd;
+		inode->ino = e->attr.st_ino;
+		inode->dev = e->attr.st_dev;
+
+		pthread_mutex_lock(&lo->mutex);
+		prev = &lo->root;
+		next = prev->next;
+		next->prev = inode;
+		inode->next = next;
+		inode->prev = prev;
+		prev->next = inode;
+		pthread_mutex_unlock(&lo->mutex);
+	}
+	e->ino = (uintptr_t) inode;
+
+	if (lo_debug(req))
+		fuse_log(FUSE_LOG_DEBUG, "  %llu/%s -> %llu\n",
+			(unsigned long long) parent, name, (unsigned long long) e->ino);
+
+	return 0;
+
+out_err:
+	saverr = errno;
+	if (newfd != -1)
+		close(newfd);
+	return saverr;
+}
+
+static void lo_lookup(fuse_req_t req, fuse_ino_t parent, const char *name)
+{
+	struct fuse_entry_param e;
+	int err;
+
+	if (lo_debug(req))
+		fuse_log(FUSE_LOG_DEBUG, "lo_lookup(parent=%" PRIu64 ", name=%s)\n",
+			parent, name);
+
+	err = lo_do_lookup(req, parent, name, &e);
+	if (err)
+		fuse_reply_err(req, err);
+	else
+		fuse_reply_entry(req, &e);
+}
+
+static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
+			     const char *name, mode_t mode, dev_t rdev,
+			     const char *link)
+{
+	int res;
+	int saverr;
+	struct lo_inode *dir = lo_inode(req, parent);
+	struct fuse_entry_param e;
+
+	saverr = ENOMEM;
+
+	res = mknod_wrapper(dir->fd, name, link, mode, rdev);
+
+	saverr = errno;
+	if (res == -1)
+		goto out;
+
+	saverr = lo_do_lookup(req, parent, name, &e);
+	if (saverr)
+		goto out;
+
+	if (lo_debug(req))
+		fuse_log(FUSE_LOG_DEBUG, "  %llu/%s -> %llu\n",
+			(unsigned long long) parent, name, (unsigned long long) e.ino);
+
+	fuse_reply_entry(req, &e);
+	return;
+
+out:
+	fuse_reply_err(req, saverr);
+}
+
+static void lo_mknod(fuse_req_t req, fuse_ino_t parent,
+		     const char *name, mode_t mode, dev_t rdev)
+{
+	lo_mknod_symlink(req, parent, name, mode, rdev, NULL);
+}
+
+static void lo_mkdir(fuse_req_t req, fuse_ino_t parent, const char *name,
+		     mode_t mode)
+{
+	lo_mknod_symlink(req, parent, name, S_IFDIR | mode, 0, NULL);
+}
+
+static void lo_symlink(fuse_req_t req, const char *link,
+		       fuse_ino_t parent, const char *name)
+{
+	lo_mknod_symlink(req, parent, name, S_IFLNK, 0, link);
+}
+
+static int linkat_empty_nofollow(struct lo_inode *inode, int dfd,
+				 const char *name)
+{
+	int res;
+	char procname[64];
+
+	if (inode->is_symlink) {
+		res = linkat(inode->fd, "", dfd, name, AT_EMPTY_PATH);
+		if (res == -1 && (errno == ENOENT || errno == EINVAL)) {
+			/* Sorry, no race free way to hard-link a symlink. */
+			errno = EPERM;
+		}
+		return res;
+	}
+
+	sprintf(procname, "/proc/self/fd/%i", inode->fd);
+
+	return linkat(AT_FDCWD, procname, dfd, name, AT_SYMLINK_FOLLOW);
+}
+
+static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
+		    const char *name)
+{
+	int res;
+	struct lo_data *lo = lo_data(req);
+	struct lo_inode *inode = lo_inode(req, ino);
+	struct fuse_entry_param e;
+	int saverr;
+
+	memset(&e, 0, sizeof(struct fuse_entry_param));
+	e.attr_timeout = lo->timeout;
+	e.entry_timeout = lo->timeout;
+
+	res = linkat_empty_nofollow(inode, lo_fd(req, parent), name);
+	if (res == -1)
+		goto out_err;
+
+	res = fstatat(inode->fd, "", &e.attr, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
+	if (res == -1)
+		goto out_err;
+
+	pthread_mutex_lock(&lo->mutex);
+	inode->refcount++;
+	pthread_mutex_unlock(&lo->mutex);
+	e.ino = (uintptr_t) inode;
+
+	if (lo_debug(req))
+		fuse_log(FUSE_LOG_DEBUG, "  %llu/%s -> %llu\n",
+			(unsigned long long) parent, name,
+			(unsigned long long) e.ino);
+
+	fuse_reply_entry(req, &e);
+	return;
+
+out_err:
+	saverr = errno;
+	fuse_reply_err(req, saverr);
+}
+
+static void lo_rmdir(fuse_req_t req, fuse_ino_t parent, const char *name)
+{
+	int res;
+
+	res = unlinkat(lo_fd(req, parent), name, AT_REMOVEDIR);
+
+	fuse_reply_err(req, res == -1 ? errno : 0);
+}
+
+static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
+		      fuse_ino_t newparent, const char *newname,
+		      unsigned int flags)
+{
+	int res;
+
+	if (flags) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
+	res = renameat(lo_fd(req, parent), name,
+			lo_fd(req, newparent), newname);
+
+	fuse_reply_err(req, res == -1 ? errno : 0);
+}
+
+static void lo_unlink(fuse_req_t req, fuse_ino_t parent, const char *name)
+{
+	int res;
+
+	res = unlinkat(lo_fd(req, parent), name, 0);
+
+	fuse_reply_err(req, res == -1 ? errno : 0);
+}
+
+static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
+{
+	if (!inode)
+		return;
+
+	pthread_mutex_lock(&lo->mutex);
+	assert(inode->refcount >= n);
+	inode->refcount -= n;
+	if (!inode->refcount) {
+		struct lo_inode *prev, *next;
+
+		prev = inode->prev;
+		next = inode->next;
+		next->prev = prev;
+		prev->next = next;
+
+		pthread_mutex_unlock(&lo->mutex);
+		close(inode->fd);
+		free(inode);
+
+	} else {
+		pthread_mutex_unlock(&lo->mutex);
+	}
+}
+
+static void lo_forget_one(fuse_req_t req, fuse_ino_t ino, uint64_t nlookup)
+{
+	struct lo_data *lo = lo_data(req);
+	struct lo_inode *inode = lo_inode(req, ino);
+
+	if (lo_debug(req)) {
+		fuse_log(FUSE_LOG_DEBUG, "  forget %llu %llu -%llu\n",
+			(unsigned long long) ino,
+			(unsigned long long) inode->refcount,
+			(unsigned long long) nlookup);
+	}
+
+	unref_inode(lo, inode, nlookup);
+}
+
+static void lo_forget(fuse_req_t req, fuse_ino_t ino, uint64_t nlookup)
+{
+	lo_forget_one(req, ino, nlookup);
+	fuse_reply_none(req);
+}
+
+static void lo_forget_multi(fuse_req_t req, size_t count,
+				struct fuse_forget_data *forgets)
+{
+	size_t i;
+
+	for (i = 0; i < count; i++)
+		lo_forget_one(req, forgets[i].ino, forgets[i].nlookup);
+	fuse_reply_none(req);
+}
+
+static void lo_readlink(fuse_req_t req, fuse_ino_t ino)
+{
+	char buf[PATH_MAX + 1];
+	int res;
+
+	res = readlinkat(lo_fd(req, ino), "", buf, sizeof(buf));
+	if (res == -1)
+		return (void) fuse_reply_err(req, errno);
+
+	if (res == sizeof(buf))
+		return (void) fuse_reply_err(req, ENAMETOOLONG);
+
+	buf[res] = '\0';
+
+	fuse_reply_readlink(req, buf);
+}
+
+struct lo_dirp {
+	DIR *dp;
+	struct dirent *entry;
+	off_t offset;
+};
+
+static struct lo_dirp *lo_dirp(struct fuse_file_info *fi)
+{
+	return (struct lo_dirp *) (uintptr_t) fi->fh;
+}
+
+static void lo_opendir(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
+{
+	int error = ENOMEM;
+	struct lo_data *lo = lo_data(req);
+	struct lo_dirp *d;
+	int fd;
+
+	d = calloc(1, sizeof(struct lo_dirp));
+	if (d == NULL)
+		goto out_err;
+
+	fd = openat(lo_fd(req, ino), ".", O_RDONLY);
+	if (fd == -1)
+		goto out_errno;
+
+	d->dp = fdopendir(fd);
+	if (d->dp == NULL)
+		goto out_errno;
+
+	d->offset = 0;
+	d->entry = NULL;
+
+	fi->fh = (uintptr_t) d;
+	if (lo->cache == CACHE_ALWAYS)
+		fi->keep_cache = 1;
+	fuse_reply_open(req, fi);
+	return;
+
+out_errno:
+	error = errno;
+out_err:
+	if (d) {
+		if (fd != -1)
+			close(fd);
+		free(d);
+	}
+	fuse_reply_err(req, error);
+}
+
+static int is_dot_or_dotdot(const char *name)
+{
+	return name[0] == '.' && (name[1] == '\0' ||
+				  (name[1] == '.' && name[2] == '\0'));
+}
+
+static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
+			  off_t offset, struct fuse_file_info *fi, int plus)
+{
+	struct lo_dirp *d = lo_dirp(fi);
+	char *buf;
+	char *p;
+	size_t rem = size;
+	int err;
+
+	(void) ino;
+
+	buf = calloc(1, size);
+	if (!buf) {
+		err = ENOMEM;
+		goto error;
+	}
+	p = buf;
+
+	if (offset != d->offset) {
+		seekdir(d->dp, offset);
+		d->entry = NULL;
+		d->offset = offset;
+	}
+	while (1) {
+		size_t entsize;
+		off_t nextoff;
+		const char *name;
+
+		if (!d->entry) {
+			errno = 0;
+			d->entry = readdir(d->dp);
+			if (!d->entry) {
+				if (errno) {  /* Error */
+					err = errno;
+					goto error;
+				} else {  /* End of stream */
+					break;
+				}
+			}
+		}
+		nextoff = d->entry->d_off;
+		name = d->entry->d_name;
+		fuse_ino_t entry_ino = 0;
+		if (plus) {
+			struct fuse_entry_param e;
+			if (is_dot_or_dotdot(name)) {
+				e = (struct fuse_entry_param) {
+					.attr.st_ino = d->entry->d_ino,
+					.attr.st_mode = d->entry->d_type << 12,
+				};
+			} else {
+				err = lo_do_lookup(req, ino, name, &e);
+				if (err)
+					goto error;
+				entry_ino = e.ino;
+			}
+
+			entsize = fuse_add_direntry_plus(req, p, rem, name,
+							 &e, nextoff);
+		} else {
+			struct stat st = {
+				.st_ino = d->entry->d_ino,
+				.st_mode = d->entry->d_type << 12,
+			};
+			entsize = fuse_add_direntry(req, p, rem, name,
+						    &st, nextoff);
+		}
+		if (entsize > rem) {
+			if (entry_ino != 0)
+				lo_forget_one(req, entry_ino, 1);
+			break;
+		}
+
+		p += entsize;
+		rem -= entsize;
+
+		d->entry = NULL;
+		d->offset = nextoff;
+	}
+
+	err = 0;
+error:
+	/* If there's an error, we can only signal it if we haven't stored
+	 * any entries yet - otherwise we'd end up with wrong lookup
+	 * counts for the entries that are already in the buffer. So we
+	 * return what we've collected until that point. */
+	if (err && rem == size)
+		fuse_reply_err(req, err);
+	else
+		fuse_reply_buf(req, buf, size - rem);
+	free(buf);
+}
+
+static void lo_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
+		       off_t offset, struct fuse_file_info *fi)
+{
+	lo_do_readdir(req, ino, size, offset, fi, 0);
+}
+
+static void lo_readdirplus(fuse_req_t req, fuse_ino_t ino, size_t size,
+			   off_t offset, struct fuse_file_info *fi)
+{
+	lo_do_readdir(req, ino, size, offset, fi, 1);
+}
+
+static void lo_releasedir(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
+{
+	struct lo_dirp *d = lo_dirp(fi);
+	(void) ino;
+	closedir(d->dp);
+	free(d);
+	fuse_reply_err(req, 0);
+}
+
+static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
+		      mode_t mode, struct fuse_file_info *fi)
+{
+	int fd;
+	struct lo_data *lo = lo_data(req);
+	struct fuse_entry_param e;
+	int err;
+
+	if (lo_debug(req))
+		fuse_log(FUSE_LOG_DEBUG, "lo_create(parent=%" PRIu64 ", name=%s)\n",
+			parent, name);
+
+	fd = openat(lo_fd(req, parent), name,
+		    (fi->flags | O_CREAT) & ~O_NOFOLLOW, mode);
+	if (fd == -1)
+		return (void) fuse_reply_err(req, errno);
+
+	fi->fh = fd;
+	if (lo->cache == CACHE_NEVER)
+		fi->direct_io = 1;
+	else if (lo->cache == CACHE_ALWAYS)
+		fi->keep_cache = 1;
+
+	err = lo_do_lookup(req, parent, name, &e);
+	if (err)
+		fuse_reply_err(req, err);
+	else
+		fuse_reply_create(req, &e, fi);
+}
+
+static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,
+			struct fuse_file_info *fi)
+{
+	int res;
+	int fd = dirfd(lo_dirp(fi)->dp);
+	(void) ino;
+	if (datasync)
+		res = fdatasync(fd);
+	else
+		res = fsync(fd);
+	fuse_reply_err(req, res == -1 ? errno : 0);
+}
+
+static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
+{
+	int fd;
+	char buf[64];
+	struct lo_data *lo = lo_data(req);
+
+	if (lo_debug(req))
+		fuse_log(FUSE_LOG_DEBUG, "lo_open(ino=%" PRIu64 ", flags=%d)\n",
+			ino, fi->flags);
+
+	/* With writeback cache, kernel may send read requests even
+	   when userspace opened write-only */
+	if (lo->writeback && (fi->flags & O_ACCMODE) == O_WRONLY) {
+		fi->flags &= ~O_ACCMODE;
+		fi->flags |= O_RDWR;
+	}
+
+	/* With writeback cache, O_APPEND is handled by the kernel.
+	   This breaks atomicity (since the file may change in the
+	   underlying filesystem, so that the kernel's idea of the
+	   end of the file isn't accurate anymore). In this example,
+	   we just accept that. A more rigorous filesystem may want
+	   to return an error here */
+	if (lo->writeback && (fi->flags & O_APPEND))
+		fi->flags &= ~O_APPEND;
+
+	sprintf(buf, "/proc/self/fd/%i", lo_fd(req, ino));
+	fd = open(buf, fi->flags & ~O_NOFOLLOW);
+	if (fd == -1)
+		return (void) fuse_reply_err(req, errno);
+
+	fi->fh = fd;
+	if (lo->cache == CACHE_NEVER)
+		fi->direct_io = 1;
+	else if (lo->cache == CACHE_ALWAYS)
+		fi->keep_cache = 1;
+	fuse_reply_open(req, fi);
+}
+
+static void lo_release(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
+{
+	(void) ino;
+
+	close(fi->fh);
+	fuse_reply_err(req, 0);
+}
+
+static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
+{
+	int res;
+	(void) ino;
+	res = close(dup(fi->fh));
+	fuse_reply_err(req, res == -1 ? errno : 0);
+}
+
+static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
+		     struct fuse_file_info *fi)
+{
+	int res;
+	(void) ino;
+	if (datasync)
+		res = fdatasync(fi->fh);
+	else
+		res = fsync(fi->fh);
+	fuse_reply_err(req, res == -1 ? errno : 0);
+}
+
+static void lo_read(fuse_req_t req, fuse_ino_t ino, size_t size,
+		    off_t offset, struct fuse_file_info *fi)
+{
+	struct fuse_bufvec buf = FUSE_BUFVEC_INIT(size);
+
+	if (lo_debug(req))
+		fuse_log(FUSE_LOG_DEBUG, "lo_read(ino=%" PRIu64 ", size=%zu, "
+			"off=%lu)\n", ino, size, (unsigned long) offset);
+
+	buf.buf[0].flags = FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK;
+	buf.buf[0].fd = fi->fh;
+	buf.buf[0].pos = offset;
+
+	fuse_reply_data(req, &buf, FUSE_BUF_SPLICE_MOVE);
+}
+
+static void lo_write_buf(fuse_req_t req, fuse_ino_t ino,
+			 struct fuse_bufvec *in_buf, off_t off,
+			 struct fuse_file_info *fi)
+{
+	(void) ino;
+	ssize_t res;
+	struct fuse_bufvec out_buf = FUSE_BUFVEC_INIT(fuse_buf_size(in_buf));
+
+	out_buf.buf[0].flags = FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK;
+	out_buf.buf[0].fd = fi->fh;
+	out_buf.buf[0].pos = off;
+
+	if (lo_debug(req))
+		fuse_log(FUSE_LOG_DEBUG, "lo_write(ino=%" PRIu64 ", size=%zu, off=%lu)\n",
+			ino, out_buf.buf[0].size, (unsigned long) off);
+
+	res = fuse_buf_copy(&out_buf, in_buf, 0);
+	if (res < 0)
+		fuse_reply_err(req, -res);
+	else
+		fuse_reply_write(req, (size_t) res);
+}
+
+static void lo_statfs(fuse_req_t req, fuse_ino_t ino)
+{
+	int res;
+	struct statvfs stbuf;
+
+	res = fstatvfs(lo_fd(req, ino), &stbuf);
+	if (res == -1)
+		fuse_reply_err(req, errno);
+	else
+		fuse_reply_statfs(req, &stbuf);
+}
+
+static void lo_fallocate(fuse_req_t req, fuse_ino_t ino, int mode,
+			 off_t offset, off_t length, struct fuse_file_info *fi)
+{
+	int err = EOPNOTSUPP;
+	(void) ino;
+
+#ifdef HAVE_FALLOCATE
+	err = fallocate(fi->fh, mode, offset, length);
+	if (err < 0)
+		err = errno;
+
+#elif defined(HAVE_POSIX_FALLOCATE)
+	if (mode) {
+		fuse_reply_err(req, EOPNOTSUPP);
+		return;
+	}
+
+	err = posix_fallocate(fi->fh, offset, length);
+#endif
+
+	fuse_reply_err(req, err);
+}
+
+static void lo_flock(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
+		     int op)
+{
+	int res;
+	(void) ino;
+
+	res = flock(fi->fh, op);
+
+	fuse_reply_err(req, res == -1 ? errno : 0);
+}
+
+static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
+			size_t size)
+{
+	char *value = NULL;
+	char procname[64];
+	struct lo_inode *inode = lo_inode(req, ino);
+	ssize_t ret;
+	int saverr;
+
+	saverr = ENOSYS;
+	if (!lo_data(req)->xattr)
+		goto out;
+
+	if (lo_debug(req)) {
+		fuse_log(FUSE_LOG_DEBUG, "lo_getxattr(ino=%" PRIu64 ", name=%s size=%zu)\n",
+			ino, name, size);
+	}
+
+	if (inode->is_symlink) {
+		/* Sorry, no race free way to getxattr on symlink. */
+		saverr = EPERM;
+		goto out;
+	}
+
+	sprintf(procname, "/proc/self/fd/%i", inode->fd);
+
+	if (size) {
+		value = malloc(size);
+		if (!value)
+			goto out_err;
+
+		ret = getxattr(procname, name, value, size);
+		if (ret == -1)
+			goto out_err;
+		saverr = 0;
+		if (ret == 0)
+			goto out;
+
+		fuse_reply_buf(req, value, ret);
+	} else {
+		ret = getxattr(procname, name, NULL, 0);
+		if (ret == -1)
+			goto out_err;
+
+		fuse_reply_xattr(req, ret);
+	}
+out_free:
+	free(value);
+	return;
+
+out_err:
+	saverr = errno;
+out:
+	fuse_reply_err(req, saverr);
+	goto out_free;
+}
+
+static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
+{
+	char *value = NULL;
+	char procname[64];
+	struct lo_inode *inode = lo_inode(req, ino);
+	ssize_t ret;
+	int saverr;
+
+	saverr = ENOSYS;
+	if (!lo_data(req)->xattr)
+		goto out;
+
+	if (lo_debug(req)) {
+		fuse_log(FUSE_LOG_DEBUG, "lo_listxattr(ino=%" PRIu64 ", size=%zu)\n",
+			ino, size);
+	}
+
+	if (inode->is_symlink) {
+		/* Sorry, no race free way to listxattr on symlink. */
+		saverr = EPERM;
+		goto out;
+	}
+
+	sprintf(procname, "/proc/self/fd/%i", inode->fd);
+
+	if (size) {
+		value = malloc(size);
+		if (!value)
+			goto out_err;
+
+		ret = listxattr(procname, value, size);
+		if (ret == -1)
+			goto out_err;
+		saverr = 0;
+		if (ret == 0)
+			goto out;
+
+		fuse_reply_buf(req, value, ret);
+	} else {
+		ret = listxattr(procname, NULL, 0);
+		if (ret == -1)
+			goto out_err;
+
+		fuse_reply_xattr(req, ret);
+	}
+out_free:
+	free(value);
+	return;
+
+out_err:
+	saverr = errno;
+out:
+	fuse_reply_err(req, saverr);
+	goto out_free;
+}
+
+static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
+			const char *value, size_t size, int flags)
+{
+	char procname[64];
+	struct lo_inode *inode = lo_inode(req, ino);
+	ssize_t ret;
+	int saverr;
+
+	saverr = ENOSYS;
+	if (!lo_data(req)->xattr)
+		goto out;
+
+	if (lo_debug(req)) {
+		fuse_log(FUSE_LOG_DEBUG, "lo_setxattr(ino=%" PRIu64 ", name=%s value=%.*s size=%zu)\n",
+			ino, name, (int) size, value, size);
+	}
+
+	if (inode->is_symlink) {
+		/* Sorry, no race free way to setxattr on symlink. */
+		saverr = EPERM;
+		goto out;
+	}
+
+	sprintf(procname, "/proc/self/fd/%i", inode->fd);
+
+	ret = setxattr(procname, name, value, size, flags);
+	saverr = ret == -1 ? errno : 0;
+
+out:
+	fuse_reply_err(req, saverr);
+}
+
+static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *name)
+{
+	char procname[64];
+	struct lo_inode *inode = lo_inode(req, ino);
+	ssize_t ret;
+	int saverr;
+
+	saverr = ENOSYS;
+	if (!lo_data(req)->xattr)
+		goto out;
+
+	if (lo_debug(req)) {
+		fuse_log(FUSE_LOG_DEBUG, "lo_removexattr(ino=%" PRIu64 ", name=%s)\n",
+			ino, name);
+	}
+
+	if (inode->is_symlink) {
+		/* Sorry, no race free way to removexattr on symlink. */
+		saverr = EPERM;
+		goto out;
+	}
+
+	sprintf(procname, "/proc/self/fd/%i", inode->fd);
+
+	ret = removexattr(procname, name);
+	saverr = ret == -1 ? errno : 0;
+
+out:
+	fuse_reply_err(req, saverr);
+}
+
+#ifdef HAVE_COPY_FILE_RANGE
+static void lo_copy_file_range(fuse_req_t req, fuse_ino_t ino_in, off_t off_in,
+			       struct fuse_file_info *fi_in,
+			       fuse_ino_t ino_out, off_t off_out,
+			       struct fuse_file_info *fi_out, size_t len,
+			       int flags)
+{
+	ssize_t res;
+
+	if (lo_debug(req))
+		fuse_log(FUSE_LOG_DEBUG, "lo_copy_file_range(ino=%" PRIu64 "/fd=%" PRIu64 ", "
+				"off=%lld, ino=%" PRIu64 "/fd=%" PRIu64 ", "
+				"off=%lld, size=%zu, flags=0x%x)\n",
+			ino_in, fi_in->fh, (long long) off_in, ino_out, fi_out->fh,
+			(long long) off_out, len, flags);
+
+	res = copy_file_range(fi_in->fh, &off_in, fi_out->fh, &off_out, len,
+			      flags);
+	if (res < 0)
+		fuse_reply_err(req, errno);
+	else
+		fuse_reply_write(req, res);
+}
+#endif
+
+static void lo_lseek(fuse_req_t req, fuse_ino_t ino, off_t off, int whence,
+		     struct fuse_file_info *fi)
+{
+	off_t res;
+
+	(void)ino;
+	res = lseek(fi->fh, off, whence);
+	if (res != -1)
+		fuse_reply_lseek(req, res);
+	else
+		fuse_reply_err(req, errno);
+}
+
+static struct fuse_lowlevel_ops lo_oper = {
+	.init		= lo_init,
+	.lookup		= lo_lookup,
+	.mkdir		= lo_mkdir,
+	.mknod		= lo_mknod,
+	.symlink	= lo_symlink,
+	.link		= lo_link,
+	.unlink		= lo_unlink,
+	.rmdir		= lo_rmdir,
+	.rename		= lo_rename,
+	.forget		= lo_forget,
+	.forget_multi	= lo_forget_multi,
+	.getattr	= lo_getattr,
+	.setattr	= lo_setattr,
+	.readlink	= lo_readlink,
+	.opendir	= lo_opendir,
+	.readdir	= lo_readdir,
+	.readdirplus	= lo_readdirplus,
+	.releasedir	= lo_releasedir,
+	.fsyncdir	= lo_fsyncdir,
+	.create		= lo_create,
+	.open		= lo_open,
+	.release	= lo_release,
+	.flush		= lo_flush,
+	.fsync		= lo_fsync,
+	.read		= lo_read,
+	.write_buf      = lo_write_buf,
+	.statfs		= lo_statfs,
+	.fallocate	= lo_fallocate,
+	.flock		= lo_flock,
+	.getxattr	= lo_getxattr,
+	.listxattr	= lo_listxattr,
+	.setxattr	= lo_setxattr,
+	.removexattr	= lo_removexattr,
+#ifdef HAVE_COPY_FILE_RANGE
+	.copy_file_range = lo_copy_file_range,
+#endif
+	.lseek		= lo_lseek,
+};
+
+int main(int argc, char *argv[])
+{
+	struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
+	struct fuse_session *se;
+	struct fuse_cmdline_opts opts;
+	struct lo_data lo = { .debug = 0,
+	                      .writeback = 0 };
+	int ret = -1;
+
+	/* Don't mask creation mode, kernel already did that */
+	umask(0);
+
+	pthread_mutex_init(&lo.mutex, NULL);
+	lo.root.next = lo.root.prev = &lo.root;
+	lo.root.fd = -1;
+	lo.cache = CACHE_NORMAL;
+
+	if (fuse_parse_cmdline(&args, &opts) != 0)
+		return 1;
+	if (opts.show_help) {
+		printf("usage: %s [options] <mountpoint>\n\n", argv[0]);
+		fuse_cmdline_help();
+		fuse_lowlevel_help();
+		ret = 0;
+		goto err_out1;
+	} else if (opts.show_version) {
+		printf("FUSE library version %s\n", fuse_pkgversion());
+		fuse_lowlevel_version();
+		ret = 0;
+		goto err_out1;
+	}
+
+	if(opts.mountpoint == NULL) {
+		printf("usage: %s [options] <mountpoint>\n", argv[0]);
+		printf("       %s --help\n", argv[0]);
+		ret = 1;
+		goto err_out1;
+	}
+
+	if (fuse_opt_parse(&args, &lo, lo_opts, NULL)== -1)
+		return 1;
+
+	lo.debug = opts.debug;
+	lo.root.refcount = 2;
+	if (lo.source) {
+		struct stat stat;
+		int res;
+
+		res = lstat(lo.source, &stat);
+		if (res == -1) {
+			fuse_log(FUSE_LOG_ERR, "failed to stat source (\"%s\"): %m\n",
+				 lo.source);
+			exit(1);
+		}
+		if (!S_ISDIR(stat.st_mode)) {
+			fuse_log(FUSE_LOG_ERR, "source is not a directory\n");
+			exit(1);
+		}
+
+	} else {
+		lo.source = "/";
+	}
+	lo.root.is_symlink = false;
+	if (!lo.timeout_set) {
+		switch (lo.cache) {
+		case CACHE_NEVER:
+			lo.timeout = 0.0;
+			break;
+
+		case CACHE_NORMAL:
+			lo.timeout = 1.0;
+			break;
+
+		case CACHE_ALWAYS:
+			lo.timeout = 86400.0;
+			break;
+		}
+	} else if (lo.timeout < 0) {
+		fuse_log(FUSE_LOG_ERR, "timeout is negative (%lf)\n",
+			 lo.timeout);
+		exit(1);
+	}
+
+	lo.root.fd = open(lo.source, O_PATH);
+	if (lo.root.fd == -1) {
+		fuse_log(FUSE_LOG_ERR, "open(\"%s\", O_PATH): %m\n",
+			 lo.source);
+		exit(1);
+	}
+
+	se = fuse_session_new(&args, &lo_oper, sizeof(lo_oper), &lo);
+	if (se == NULL)
+	    goto err_out1;
+
+	if (fuse_set_signal_handlers(se) != 0)
+	    goto err_out2;
+
+	if (fuse_session_mount(se, opts.mountpoint) != 0)
+	    goto err_out3;
+
+	fuse_daemonize(opts.foreground);
+
+	/* Block until ctrl+c or fusermount -u */
+	if (opts.singlethread)
+		ret = fuse_session_loop(se);
+	else
+		ret = fuse_session_loop_mt(se, opts.clone_fd);
+
+	fuse_session_unmount(se);
+err_out3:
+	fuse_remove_signal_handlers(se);
+err_out2:
+	fuse_session_destroy(se);
+err_out1:
+	free(opts.mountpoint);
+	fuse_opt_free_args(&args);
+
+	if (lo.root.fd >= 0)
+		close(lo.root.fd);
+
+	return ret ? 1 : 0;
+}
-- 
2.23.0




* [PATCH 006/104] virtiofsd: Trim down imported files
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (4 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 005/104] virtiofsd: Add passthrough_ll Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-03 12:02   ` Daniel P. Berrangé
  2020-01-21  9:58   ` Xiao Yang
  2019-12-12 16:37 ` [PATCH 007/104] virtiofsd: Format imported files to qemu style Dr. David Alan Gilbert (git)
                   ` (100 subsequent siblings)
  106 siblings, 2 replies; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

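There's a lot of the original fuse code we don't need; trim it down.
This drops the C++ guards, the multithreaded loop workers, the
splice/pipe paths, and the /dev/fuse read and mount-option handling.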

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
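A minimal sketch of the pattern behind the new abort() in fuse_send_msg(),
assuming a later patch in the series hooks the virtio transport in ahead of
it. Every demo_* name and the virtio_send field are hypothetical; they are
not part of virtiofsd or libfuse.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/uio.h>

/* Hypothetical stand-in for struct fuse_session. */
struct demo_session {
    /* Assumed hook: set once a virtio transport is attached. */
    int (*virtio_send)(struct demo_session *se,
                       const struct iovec *iov, int count);
    size_t bytes_sent;
};

/* Sum the lengths of all iovec entries. */
static size_t demo_iov_length(const struct iovec *iov, int count)
{
    size_t len = 0;

    for (int i = 0; i < count; i++) {
        len += iov[i].iov_len;
    }
    return len;
}

/* Toy transport: only records how many bytes it was asked to send. */
static int demo_virtio_send(struct demo_session *se,
                            const struct iovec *iov, int count)
{
    se->bytes_sent += demo_iov_length(iov, count);
    return 0;
}

static int demo_send_msg(struct demo_session *se,
                         const struct iovec *iov, int count)
{
    assert(se != NULL);

    if (se->virtio_send) {
        return se->virtio_send(se, iov, count);
    }

    /* The writev() to /dev/fuse is gone, so reaching here is a bug. */
    abort();
    return 0;
}
```

With the hook set, demo_send_msg() hands the reply to the transport; with it
unset, the process aborts instead of silently dropping the reply.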
 tools/virtiofsd/fuse.h                |   8 -
 tools/virtiofsd/fuse_common.h         |   8 -
 tools/virtiofsd/fuse_i.h              |  22 -
 tools/virtiofsd/fuse_log.h            |   8 -
 tools/virtiofsd/fuse_loop_mt.c        | 308 ------------
 tools/virtiofsd/fuse_lowlevel.c       | 653 +-------------------------
 tools/virtiofsd/fuse_lowlevel.h       |   8 -
 tools/virtiofsd/fuse_opt.h            |   8 -
 tools/virtiofsd/helper.c              | 138 ------
 tools/virtiofsd/passthrough_helpers.h |  26 -
 10 files changed, 5 insertions(+), 1182 deletions(-)
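
With the FUSE_BUF_IS_FD branch removed, fuse_session_process_buf_int() reads
the request header straight from buf->mem. A rough sketch of that
memory-only path follows. The demo_* types are hypothetical and much smaller
than the real fuse_in_header and fuse_buf, and the length checks are an
addition for illustration; the patch itself does not add them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced request header: only what this sketch needs. */
struct demo_in_header {
    uint32_t len;      /* total request length, header included */
    uint32_t opcode;
    uint64_t unique;
};

/* Hypothetical memory-only buffer; no fd variant exists any more. */
struct demo_buf {
    void *mem;
    size_t size;
};

/*
 * Copy the header out of a memory buffer.
 * Returns 0 on success, or -1 if the buffer is missing, too short for
 * a header, or the header's len field does not fit in the buffer.
 */
static int demo_parse_header(const struct demo_buf *buf,
                             struct demo_in_header *out)
{
    if (buf->mem == NULL || buf->size < sizeof(*out)) {
        return -1;
    }

    memcpy(out, buf->mem, sizeof(*out));

    if (out->len < sizeof(*out) || out->len > buf->size) {
        return -1;
    }
    return 0;
}
```

Copying the header with memcpy() rather than casting buf->mem avoids any
alignment assumption about the buffer; the code in the patch uses the cast.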

diff --git a/tools/virtiofsd/fuse.h b/tools/virtiofsd/fuse.h
index 883f6e59fb..6c16a0041d 100644
--- a/tools/virtiofsd/fuse.h
+++ b/tools/virtiofsd/fuse.h
@@ -25,10 +25,6 @@
 #include <sys/statvfs.h>
 #include <sys/uio.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /* ----------------------------------------------------------- *
  * Basic FUSE API					       *
  * ----------------------------------------------------------- */
@@ -1268,8 +1264,4 @@ struct fuse_session *fuse_get_session(struct fuse *f);
  */
 int fuse_open_channel(const char *mountpoint, const char *options);
 
-#ifdef __cplusplus
-}
-#endif
-
 #endif /* FUSE_H_ */
diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
index 2d686b2ac4..18fba813da 100644
--- a/tools/virtiofsd/fuse_common.h
+++ b/tools/virtiofsd/fuse_common.h
@@ -28,10 +28,6 @@
 #define FUSE_MAKE_VERSION(maj, min)  ((maj) * 10 + (min))
 #define FUSE_VERSION FUSE_MAKE_VERSION(FUSE_MAJOR_VERSION, FUSE_MINOR_VERSION)
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * Information about an open file.
  *
@@ -802,10 +798,6 @@ void fuse_remove_signal_handlers(struct fuse_session *se);
 #  error only API version 30 or greater is supported
 #endif
 
-#ifdef __cplusplus
-}
-#endif
-
 
 /*
  * This interface uses 64 bit off_t.
diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
index d38b630ac5..bcd6a140fc 100644
--- a/tools/virtiofsd/fuse_i.h
+++ b/tools/virtiofsd/fuse_i.h
@@ -9,8 +9,6 @@
 #include "fuse.h"
 #include "fuse_lowlevel.h"
 
-struct mount_opts;
-
 struct fuse_req {
 	struct fuse_session *se;
 	uint64_t unique;
@@ -45,7 +43,6 @@ struct fuse_session {
 	char *mountpoint;
 	volatile int exited;
 	int fd;
-	struct mount_opts *mo;
 	int debug;
 	int deny_others;
 	struct fuse_lowlevel_ops op;
@@ -58,7 +55,6 @@ struct fuse_session {
 	struct fuse_req interrupts;
 	pthread_mutex_t lock;
 	int got_destroy;
-	pthread_key_t pipe_key;
 	int broken_splice_nonblock;
 	uint64_t notify_ctr;
 	struct fuse_notify_req notify_list;
@@ -106,34 +102,16 @@ struct fuse_chan *fuse_chan_get(struct fuse_chan *ch);
  */
 void fuse_chan_put(struct fuse_chan *ch);
 
-struct mount_opts *parse_mount_opts(struct fuse_args *args);
-void destroy_mount_opts(struct mount_opts *mo);
-void fuse_mount_version(void);
-unsigned get_max_read(struct mount_opts *o);
-void fuse_kern_unmount(const char *mountpoint, int fd);
-int fuse_kern_mount(const char *mountpoint, struct mount_opts *mo);
-
 int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
 			       int count);
 void fuse_free_req(fuse_req_t req);
 
-void cuse_lowlevel_init(fuse_req_t req, fuse_ino_t nodeide, const void *inarg);
-
-int fuse_start_thread(pthread_t *thread_id, void *(*func)(void *), void *arg);
-
-int fuse_session_receive_buf_int(struct fuse_session *se, struct fuse_buf *buf,
-				 struct fuse_chan *ch);
 void fuse_session_process_buf_int(struct fuse_session *se,
 				  const struct fuse_buf *buf, struct fuse_chan *ch);
 
-struct fuse *fuse_new_31(struct fuse_args *args, const struct fuse_operations *op,
-		      size_t op_size, void *private_data);
-int fuse_loop_mt_32(struct fuse *f, struct fuse_loop_config *config);
-int fuse_session_loop_mt_32(struct fuse_session *se, struct fuse_loop_config *config);
 
 #define FUSE_MAX_MAX_PAGES 256
 #define FUSE_DEFAULT_MAX_PAGES_PER_REQ 32
 
 /* room needed in buffer to accommodate header */
 #define FUSE_BUFFER_HEADER_SIZE 0x1000
-
diff --git a/tools/virtiofsd/fuse_log.h b/tools/virtiofsd/fuse_log.h
index 5e112e0f53..0af700da6b 100644
--- a/tools/virtiofsd/fuse_log.h
+++ b/tools/virtiofsd/fuse_log.h
@@ -16,10 +16,6 @@
 
 #include <stdarg.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * Log severity level
  *
@@ -75,8 +71,4 @@ void fuse_set_log_func(fuse_log_func_t func);
  */
 void fuse_log(enum fuse_log_level level, const char *fmt, ...);
 
-#ifdef __cplusplus
-}
-#endif
-
 #endif /* FUSE_LOG_H_ */
diff --git a/tools/virtiofsd/fuse_loop_mt.c b/tools/virtiofsd/fuse_loop_mt.c
index 445e9a0ab0..6fcd47e42c 100644
--- a/tools/virtiofsd/fuse_loop_mt.c
+++ b/tools/virtiofsd/fuse_loop_mt.c
@@ -28,48 +28,6 @@
 /* Environment var controlling the thread stack size */
 #define ENVNAME_THREAD_STACK "FUSE_THREAD_STACK"
 
-struct fuse_worker {
-	struct fuse_worker *prev;
-	struct fuse_worker *next;
-	pthread_t thread_id;
-	size_t bufsize;
-
-	// We need to include fuse_buf so that we can properly free
-	// it when a thread is terminated by pthread_cancel().
-	struct fuse_buf fbuf;
-	struct fuse_chan *ch;
-	struct fuse_mt *mt;
-};
-
-struct fuse_mt {
-	pthread_mutex_t lock;
-	int numworker;
-	int numavail;
-	struct fuse_session *se;
-	struct fuse_worker main;
-	sem_t finish;
-	int exit;
-	int error;
-	int clone_fd;
-	int max_idle;
-};
-
-static struct fuse_chan *fuse_chan_new(int fd)
-{
-	struct fuse_chan *ch = (struct fuse_chan *) malloc(sizeof(*ch));
-	if (ch == NULL) {
-		fuse_log(FUSE_LOG_ERR, "fuse: failed to allocate channel\n");
-		return NULL;
-	}
-
-	memset(ch, 0, sizeof(*ch));
-	ch->fd = fd;
-	ch->ctr = 1;
-	fuse_mutex_init(&ch->lock);
-
-	return ch;
-}
-
 struct fuse_chan *fuse_chan_get(struct fuse_chan *ch)
 {
 	assert(ch->ctr > 0);
@@ -94,269 +52,3 @@ void fuse_chan_put(struct fuse_chan *ch)
 	} else
 		pthread_mutex_unlock(&ch->lock);
 }
-
-static void list_add_worker(struct fuse_worker *w, struct fuse_worker *next)
-{
-	struct fuse_worker *prev = next->prev;
-	w->next = next;
-	w->prev = prev;
-	prev->next = w;
-	next->prev = w;
-}
-
-static void list_del_worker(struct fuse_worker *w)
-{
-	struct fuse_worker *prev = w->prev;
-	struct fuse_worker *next = w->next;
-	prev->next = next;
-	next->prev = prev;
-}
-
-static int fuse_loop_start_thread(struct fuse_mt *mt);
-
-static void *fuse_do_work(void *data)
-{
-	struct fuse_worker *w = (struct fuse_worker *) data;
-	struct fuse_mt *mt = w->mt;
-
-	while (!fuse_session_exited(mt->se)) {
-		int isforget = 0;
-		int res;
-
-		pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
-		res = fuse_session_receive_buf_int(mt->se, &w->fbuf, w->ch);
-		pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
-		if (res == -EINTR)
-			continue;
-		if (res <= 0) {
-			if (res < 0) {
-				fuse_session_exit(mt->se);
-				mt->error = res;
-			}
-			break;
-		}
-
-		pthread_mutex_lock(&mt->lock);
-		if (mt->exit) {
-			pthread_mutex_unlock(&mt->lock);
-			return NULL;
-		}
-
-		/*
-		 * This disgusting hack is needed so that zillions of threads
-		 * are not created on a burst of FORGET messages
-		 */
-		if (!(w->fbuf.flags & FUSE_BUF_IS_FD)) {
-			struct fuse_in_header *in = w->fbuf.mem;
-
-			if (in->opcode == FUSE_FORGET ||
-			    in->opcode == FUSE_BATCH_FORGET)
-				isforget = 1;
-		}
-
-		if (!isforget)
-			mt->numavail--;
-		if (mt->numavail == 0)
-			fuse_loop_start_thread(mt);
-		pthread_mutex_unlock(&mt->lock);
-
-		fuse_session_process_buf_int(mt->se, &w->fbuf, w->ch);
-
-		pthread_mutex_lock(&mt->lock);
-		if (!isforget)
-			mt->numavail++;
-		if (mt->numavail > mt->max_idle) {
-			if (mt->exit) {
-				pthread_mutex_unlock(&mt->lock);
-				return NULL;
-			}
-			list_del_worker(w);
-			mt->numavail--;
-			mt->numworker--;
-			pthread_mutex_unlock(&mt->lock);
-
-			pthread_detach(w->thread_id);
-			free(w->fbuf.mem);
-			fuse_chan_put(w->ch);
-			free(w);
-			return NULL;
-		}
-		pthread_mutex_unlock(&mt->lock);
-	}
-
-	sem_post(&mt->finish);
-
-	return NULL;
-}
-
-int fuse_start_thread(pthread_t *thread_id, void *(*func)(void *), void *arg)
-{
-	sigset_t oldset;
-	sigset_t newset;
-	int res;
-	pthread_attr_t attr;
-	char *stack_size;
-
-	/* Override default stack size */
-	pthread_attr_init(&attr);
-	stack_size = getenv(ENVNAME_THREAD_STACK);
-	if (stack_size && pthread_attr_setstacksize(&attr, atoi(stack_size)))
-		fuse_log(FUSE_LOG_ERR, "fuse: invalid stack size: %s\n", stack_size);
-
-	/* Disallow signal reception in worker threads */
-	sigemptyset(&newset);
-	sigaddset(&newset, SIGTERM);
-	sigaddset(&newset, SIGINT);
-	sigaddset(&newset, SIGHUP);
-	sigaddset(&newset, SIGQUIT);
-	pthread_sigmask(SIG_BLOCK, &newset, &oldset);
-	res = pthread_create(thread_id, &attr, func, arg);
-	pthread_sigmask(SIG_SETMASK, &oldset, NULL);
-	pthread_attr_destroy(&attr);
-	if (res != 0) {
-		fuse_log(FUSE_LOG_ERR, "fuse: error creating thread: %s\n",
-			strerror(res));
-		return -1;
-	}
-
-	return 0;
-}
-
-static struct fuse_chan *fuse_clone_chan(struct fuse_mt *mt)
-{
-	int res;
-	int clonefd;
-	uint32_t masterfd;
-	struct fuse_chan *newch;
-	const char *devname = "/dev/fuse";
-
-#ifndef O_CLOEXEC
-#define O_CLOEXEC 0
-#endif
-	clonefd = open(devname, O_RDWR | O_CLOEXEC);
-	if (clonefd == -1) {
-		fuse_log(FUSE_LOG_ERR, "fuse: failed to open %s: %s\n", devname,
-			strerror(errno));
-		return NULL;
-	}
-	fcntl(clonefd, F_SETFD, FD_CLOEXEC);
-
-	masterfd = mt->se->fd;
-	res = ioctl(clonefd, FUSE_DEV_IOC_CLONE, &masterfd);
-	if (res == -1) {
-		fuse_log(FUSE_LOG_ERR, "fuse: failed to clone device fd: %s\n",
-			strerror(errno));
-		close(clonefd);
-		return NULL;
-	}
-	newch = fuse_chan_new(clonefd);
-	if (newch == NULL)
-		close(clonefd);
-
-	return newch;
-}
-
-static int fuse_loop_start_thread(struct fuse_mt *mt)
-{
-	int res;
-
-	struct fuse_worker *w = malloc(sizeof(struct fuse_worker));
-	if (!w) {
-		fuse_log(FUSE_LOG_ERR, "fuse: failed to allocate worker structure\n");
-		return -1;
-	}
-	memset(w, 0, sizeof(struct fuse_worker));
-	w->fbuf.mem = NULL;
-	w->mt = mt;
-
-	w->ch = NULL;
-	if (mt->clone_fd) {
-		w->ch = fuse_clone_chan(mt);
-		if(!w->ch) {
-			/* Don't attempt this again */
-			fuse_log(FUSE_LOG_ERR, "fuse: trying to continue "
-				"without -o clone_fd.\n");
-			mt->clone_fd = 0;
-		}
-	}
-
-	res = fuse_start_thread(&w->thread_id, fuse_do_work, w);
-	if (res == -1) {
-		fuse_chan_put(w->ch);
-		free(w);
-		return -1;
-	}
-	list_add_worker(w, &mt->main);
-	mt->numavail ++;
-	mt->numworker ++;
-
-	return 0;
-}
-
-static void fuse_join_worker(struct fuse_mt *mt, struct fuse_worker *w)
-{
-	pthread_join(w->thread_id, NULL);
-	pthread_mutex_lock(&mt->lock);
-	list_del_worker(w);
-	pthread_mutex_unlock(&mt->lock);
-	free(w->fbuf.mem);
-	fuse_chan_put(w->ch);
-	free(w);
-}
-
-FUSE_SYMVER(".symver fuse_session_loop_mt_32,fuse_session_loop_mt@@FUSE_3.2");
-int fuse_session_loop_mt_32(struct fuse_session *se, struct fuse_loop_config *config)
-{
-	int err;
-	struct fuse_mt mt;
-	struct fuse_worker *w;
-
-	memset(&mt, 0, sizeof(struct fuse_mt));
-	mt.se = se;
-	mt.clone_fd = config->clone_fd;
-	mt.error = 0;
-	mt.numworker = 0;
-	mt.numavail = 0;
-	mt.max_idle = config->max_idle_threads;
-	mt.main.thread_id = pthread_self();
-	mt.main.prev = mt.main.next = &mt.main;
-	sem_init(&mt.finish, 0, 0);
-	fuse_mutex_init(&mt.lock);
-
-	pthread_mutex_lock(&mt.lock);
-	err = fuse_loop_start_thread(&mt);
-	pthread_mutex_unlock(&mt.lock);
-	if (!err) {
-		/* sem_wait() is interruptible */
-		while (!fuse_session_exited(se))
-			sem_wait(&mt.finish);
-
-		pthread_mutex_lock(&mt.lock);
-		for (w = mt.main.next; w != &mt.main; w = w->next)
-			pthread_cancel(w->thread_id);
-		mt.exit = 1;
-		pthread_mutex_unlock(&mt.lock);
-
-		while (mt.main.next != &mt.main)
-			fuse_join_worker(&mt, mt.main.next);
-
-		err = mt.error;
-	}
-
-	pthread_mutex_destroy(&mt.lock);
-	sem_destroy(&mt.finish);
-	if(se->error != 0)
-		err = se->error;
-	fuse_session_reset(se);
-	return err;
-}
-
-int fuse_session_loop_mt_31(struct fuse_session *se, int clone_fd);
-FUSE_SYMVER(".symver fuse_session_loop_mt_31,fuse_session_loop_mt@FUSE_3.0");
-int fuse_session_loop_mt_31(struct fuse_session *se, int clone_fd)
-{
-	struct fuse_loop_config config;
-	config.clone_fd = clone_fd;
-	config.max_idle_threads = 10;
-	return fuse_session_loop_mt_32(se, &config);
-}
diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index f2d7038e34..5d915666d8 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -28,12 +28,6 @@
 #include <assert.h>
 #include <sys/file.h>
 
-#ifndef F_LINUX_SPECIFIC_BASE
-#define F_LINUX_SPECIFIC_BASE       1024
-#endif
-#ifndef F_SETPIPE_SZ
-#define F_SETPIPE_SZ	(F_LINUX_SPECIFIC_BASE + 7)
-#endif
 
 
 #define PARAM(inarg) (((char *)(inarg)) + sizeof(*(inarg)))
@@ -184,19 +178,7 @@ static int fuse_send_msg(struct fuse_session *se, struct fuse_chan *ch,
 		}
 	}
 
-	ssize_t res = writev(ch ? ch->fd : se->fd,
-			     iov, count);
-	int err = errno;
-
-	if (res == -1) {
-		assert(se != NULL);
-
-		/* ENOENT means the operation was interrupted */
-		if (!fuse_session_exited(se) && err != ENOENT)
-			perror("fuse: writing device");
-		return -err;
-	}
-
+	abort(); /* virtio should have taken it before here */
 	return 0;
 }
 
@@ -480,10 +462,6 @@ static int fuse_send_data_iov_fallback(struct fuse_session *se,
 				       struct fuse_bufvec *buf,
 				       size_t len)
 {
-	struct fuse_bufvec mem_buf = FUSE_BUFVEC_INIT(len);
-	void *mbuf;
-	int res;
-
 	/* Optimize common case */
 	if (buf->count == 1 && buf->idx == 0 && buf->off == 0 &&
 	    !(buf->buf[0].flags & FUSE_BUF_IS_FD)) {
@@ -496,350 +474,10 @@ static int fuse_send_data_iov_fallback(struct fuse_session *se,
 		return fuse_send_msg(se, ch, iov, iov_count);
 	}
 
-	res = posix_memalign(&mbuf, pagesize, len);
-	if (res != 0)
-		return res;
-
-	mem_buf.buf[0].mem = mbuf;
-	res = fuse_buf_copy(&mem_buf, buf, 0);
-	if (res < 0) {
-		free(mbuf);
-		return -res;
-	}
-	len = res;
-
-	iov[iov_count].iov_base = mbuf;
-	iov[iov_count].iov_len = len;
-	iov_count++;
-	res = fuse_send_msg(se, ch, iov, iov_count);
-	free(mbuf);
-
-	return res;
-}
-
-struct fuse_ll_pipe {
-	size_t size;
-	int can_grow;
-	int pipe[2];
-};
-
-static void fuse_ll_pipe_free(struct fuse_ll_pipe *llp)
-{
-	close(llp->pipe[0]);
-	close(llp->pipe[1]);
-	free(llp);
-}
-
-#ifdef HAVE_SPLICE
-#if !defined(HAVE_PIPE2) || !defined(O_CLOEXEC)
-static int fuse_pipe(int fds[2])
-{
-	int rv = pipe(fds);
-
-	if (rv == -1)
-		return rv;
-
-	if (fcntl(fds[0], F_SETFL, O_NONBLOCK) == -1 ||
-	    fcntl(fds[1], F_SETFL, O_NONBLOCK) == -1 ||
-	    fcntl(fds[0], F_SETFD, FD_CLOEXEC) == -1 ||
-	    fcntl(fds[1], F_SETFD, FD_CLOEXEC) == -1) {
-		close(fds[0]);
-		close(fds[1]);
-		rv = -1;
-	}
-	return rv;
-}
-#else
-static int fuse_pipe(int fds[2])
-{
-	return pipe2(fds, O_CLOEXEC | O_NONBLOCK);
-}
-#endif
-
-static struct fuse_ll_pipe *fuse_ll_get_pipe(struct fuse_session *se)
-{
-	struct fuse_ll_pipe *llp = pthread_getspecific(se->pipe_key);
-	if (llp == NULL) {
-		int res;
-
-		llp = malloc(sizeof(struct fuse_ll_pipe));
-		if (llp == NULL)
-			return NULL;
-
-		res = fuse_pipe(llp->pipe);
-		if (res == -1) {
-			free(llp);
-			return NULL;
-		}
-
-		/*
-		 *the default size is 16 pages on linux
-		 */
-		llp->size = pagesize * 16;
-		llp->can_grow = 1;
-
-		pthread_setspecific(se->pipe_key, llp);
-	}
-
-	return llp;
-}
-#endif
-
-static void fuse_ll_clear_pipe(struct fuse_session *se)
-{
-	struct fuse_ll_pipe *llp = pthread_getspecific(se->pipe_key);
-	if (llp) {
-		pthread_setspecific(se->pipe_key, NULL);
-		fuse_ll_pipe_free(llp);
-	}
-}
-
-#if defined(HAVE_SPLICE) && defined(HAVE_VMSPLICE)
-static int read_back(int fd, char *buf, size_t len)
-{
-	int res;
-
-	res = read(fd, buf, len);
-	if (res == -1) {
-		fuse_log(FUSE_LOG_ERR, "fuse: internal error: failed to read back from pipe: %s\n", strerror(errno));
-		return -EIO;
-	}
-	if (res != len) {
-		fuse_log(FUSE_LOG_ERR, "fuse: internal error: short read back from pipe: %i from %zi\n", res, len);
-		return -EIO;
-	}
+	abort(); /* Will have taken vhost path */
 	return 0;
 }
 
-static int grow_pipe_to_max(int pipefd)
-{
-	int max;
-	int res;
-	int maxfd;
-	char buf[32];
-
-	maxfd = open("/proc/sys/fs/pipe-max-size", O_RDONLY);
-	if (maxfd < 0)
-		return -errno;
-
-	res = read(maxfd, buf, sizeof(buf) - 1);
-	if (res < 0) {
-		int saved_errno;
-
-		saved_errno = errno;
-		close(maxfd);
-		return -saved_errno;
-	}
-	close(maxfd);
-	buf[res] = '\0';
-
-	max = atoi(buf);
-	res = fcntl(pipefd, F_SETPIPE_SZ, max);
-	if (res < 0)
-		return -errno;
-	return max;
-}
-
-static int fuse_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
-			       struct iovec *iov, int iov_count,
-			       struct fuse_bufvec *buf, unsigned int flags)
-{
-	int res;
-	size_t len = fuse_buf_size(buf);
-	struct fuse_out_header *out = iov[0].iov_base;
-	struct fuse_ll_pipe *llp;
-	int splice_flags;
-	size_t pipesize;
-	size_t total_fd_size;
-	size_t idx;
-	size_t headerlen;
-	struct fuse_bufvec pipe_buf = FUSE_BUFVEC_INIT(len);
-
-	if (se->broken_splice_nonblock)
-		goto fallback;
-
-	if (flags & FUSE_BUF_NO_SPLICE)
-		goto fallback;
-
-	total_fd_size = 0;
-	for (idx = buf->idx; idx < buf->count; idx++) {
-		if (buf->buf[idx].flags & FUSE_BUF_IS_FD) {
-			total_fd_size = buf->buf[idx].size;
-			if (idx == buf->idx)
-				total_fd_size -= buf->off;
-		}
-	}
-	if (total_fd_size < 2 * pagesize)
-		goto fallback;
-
-	if (se->conn.proto_minor < 14 ||
-	    !(se->conn.want & FUSE_CAP_SPLICE_WRITE))
-		goto fallback;
-
-	llp = fuse_ll_get_pipe(se);
-	if (llp == NULL)
-		goto fallback;
-
-
-	headerlen = iov_length(iov, iov_count);
-
-	out->len = headerlen + len;
-
-	/*
-	 * Heuristic for the required pipe size, does not work if the
-	 * source contains less than page size fragments
-	 */
-	pipesize = pagesize * (iov_count + buf->count + 1) + out->len;
-
-	if (llp->size < pipesize) {
-		if (llp->can_grow) {
-			res = fcntl(llp->pipe[0], F_SETPIPE_SZ, pipesize);
-			if (res == -1) {
-				res = grow_pipe_to_max(llp->pipe[0]);
-				if (res > 0)
-					llp->size = res;
-				llp->can_grow = 0;
-				goto fallback;
-			}
-			llp->size = res;
-		}
-		if (llp->size < pipesize)
-			goto fallback;
-	}
-
-
-	res = vmsplice(llp->pipe[1], iov, iov_count, SPLICE_F_NONBLOCK);
-	if (res == -1)
-		goto fallback;
-
-	if (res != headerlen) {
-		res = -EIO;
-		fuse_log(FUSE_LOG_ERR, "fuse: short vmsplice to pipe: %u/%zu\n", res,
-			headerlen);
-		goto clear_pipe;
-	}
-
-	pipe_buf.buf[0].flags = FUSE_BUF_IS_FD;
-	pipe_buf.buf[0].fd = llp->pipe[1];
-
-	res = fuse_buf_copy(&pipe_buf, buf,
-			    FUSE_BUF_FORCE_SPLICE | FUSE_BUF_SPLICE_NONBLOCK);
-	if (res < 0) {
-		if (res == -EAGAIN || res == -EINVAL) {
-			/*
-			 * Should only get EAGAIN on kernels with
-			 * broken SPLICE_F_NONBLOCK support (<=
-			 * 2.6.35) where this error or a short read is
-			 * returned even if the pipe itself is not
-			 * full
-			 *
-			 * EINVAL might mean that splice can't handle
-			 * this combination of input and output.
-			 */
-			if (res == -EAGAIN)
-				se->broken_splice_nonblock = 1;
-
-			pthread_setspecific(se->pipe_key, NULL);
-			fuse_ll_pipe_free(llp);
-			goto fallback;
-		}
-		res = -res;
-		goto clear_pipe;
-	}
-
-	if (res != 0 && res < len) {
-		struct fuse_bufvec mem_buf = FUSE_BUFVEC_INIT(len);
-		void *mbuf;
-		size_t now_len = res;
-		/*
-		 * For regular files a short count is either
-		 *  1) due to EOF, or
-		 *  2) because of broken SPLICE_F_NONBLOCK (see above)
-		 *
-		 * For other inputs it's possible that we overflowed
-		 * the pipe because of small buffer fragments.
-		 */
-
-		res = posix_memalign(&mbuf, pagesize, len);
-		if (res != 0)
-			goto clear_pipe;
-
-		mem_buf.buf[0].mem = mbuf;
-		mem_buf.off = now_len;
-		res = fuse_buf_copy(&mem_buf, buf, 0);
-		if (res > 0) {
-			char *tmpbuf;
-			size_t extra_len = res;
-			/*
-			 * Trickiest case: got more data.  Need to get
-			 * back the data from the pipe and then fall
-			 * back to regular write.
-			 */
-			tmpbuf = malloc(headerlen);
-			if (tmpbuf == NULL) {
-				free(mbuf);
-				res = ENOMEM;
-				goto clear_pipe;
-			}
-			res = read_back(llp->pipe[0], tmpbuf, headerlen);
-			free(tmpbuf);
-			if (res != 0) {
-				free(mbuf);
-				goto clear_pipe;
-			}
-			res = read_back(llp->pipe[0], mbuf, now_len);
-			if (res != 0) {
-				free(mbuf);
-				goto clear_pipe;
-			}
-			len = now_len + extra_len;
-			iov[iov_count].iov_base = mbuf;
-			iov[iov_count].iov_len = len;
-			iov_count++;
-			res = fuse_send_msg(se, ch, iov, iov_count);
-			free(mbuf);
-			return res;
-		}
-		free(mbuf);
-		res = now_len;
-	}
-	len = res;
-	out->len = headerlen + len;
-
-	if (se->debug) {
-		fuse_log(FUSE_LOG_DEBUG,
-			"   unique: %llu, success, outsize: %i (splice)\n",
-			(unsigned long long) out->unique, out->len);
-	}
-
-	splice_flags = 0;
-	if ((flags & FUSE_BUF_SPLICE_MOVE) &&
-	    (se->conn.want & FUSE_CAP_SPLICE_MOVE))
-		splice_flags |= SPLICE_F_MOVE;
-
-	res = splice(llp->pipe[0], NULL, ch ? ch->fd : se->fd,
-		     NULL, out->len, splice_flags);
-	if (res == -1) {
-		res = -errno;
-		perror("fuse: splice from pipe");
-		goto clear_pipe;
-	}
-	if (res != out->len) {
-		res = -EIO;
-		fuse_log(FUSE_LOG_ERR, "fuse: short splice from pipe: %u/%u\n",
-			res, out->len);
-		goto clear_pipe;
-	}
-	return 0;
-
-clear_pipe:
-	fuse_ll_clear_pipe(se);
-	return res;
-
-fallback:
-	return fuse_send_data_iov_fallback(se, ch, iov, iov_count, buf, len);
-}
-#else
 static int fuse_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
 			       struct iovec *iov, int iov_count,
 			       struct fuse_bufvec *buf, unsigned int flags)
@@ -849,7 +487,6 @@ static int fuse_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
 
 	return fuse_send_data_iov_fallback(se, ch, iov, iov_count, buf, len);
 }
-#endif
 
 int fuse_reply_data(fuse_req_t req, struct fuse_bufvec *bufv,
 		    enum fuse_buf_copy_flags flags)
@@ -1408,16 +1045,11 @@ static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid, const void *inarg,
 	if (bufv.buf[0].size < arg->size) {
 		fuse_log(FUSE_LOG_ERR, "fuse: do_write_buf: buffer size too small\n");
 		fuse_reply_err(req, EIO);
-		goto out;
+		return;
 	}
 	bufv.buf[0].size = arg->size;
 
 	se->op.write_buf(req, nodeid, &bufv, arg->offset, &fi);
-
-out:
-	/* Need to reset the pipe if ->write_buf() didn't consume all data */
-	if ((ibuf->flags & FUSE_BUF_IS_FD) && bufv.idx < bufv.count)
-		fuse_ll_clear_pipe(se);
 }
 
 static void do_flush(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
@@ -2038,17 +1670,6 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		return;
 	}
 
-	unsigned max_read_mo = get_max_read(se->mo);
-	if (se->conn.max_read != max_read_mo) {
-		fuse_log(FUSE_LOG_ERR, "fuse: error: init() and fuse_session_new() "
-			"requested different maximum read size (%u vs %u)\n",
-			se->conn.max_read, max_read_mo);
-		fuse_reply_err(req, EPROTO);
-		se->error = -EPROTO;
-		fuse_session_exit(se);
-		return;
-	}
-
 	if (se->conn.max_write < bufsize - FUSE_BUFFER_HEADER_SIZE) {
 		se->bufsize = se->conn.max_write + FUSE_BUFFER_HEADER_SIZE;
 	}
@@ -2364,8 +1985,6 @@ static void fuse_ll_retrieve_reply(struct fuse_notify_req *nreq,
 	}
 out:
 	free(rreq);
-	if ((ibuf->flags & FUSE_BUF_IS_FD) && bufv.idx < bufv.count)
-		fuse_ll_clear_pipe(se);
 }
 
 int fuse_lowlevel_notify_retrieve(struct fuse_session *se, fuse_ino_t ino,
@@ -2509,21 +2128,6 @@ static const char *opname(enum fuse_opcode opcode)
 		return fuse_ll_ops[opcode].name;
 }
 
-static int fuse_ll_copy_from_pipe(struct fuse_bufvec *dst,
-				  struct fuse_bufvec *src)
-{
-	ssize_t res = fuse_buf_copy(dst, src, 0);
-	if (res < 0) {
-		fuse_log(FUSE_LOG_ERR, "fuse: copy from pipe: %s\n", strerror(-res));
-		return res;
-	}
-	if ((size_t)res < fuse_buf_size(dst)) {
-		fuse_log(FUSE_LOG_ERR, "fuse: copy from pipe: short read\n");
-		return -1;
-	}
-	return 0;
-}
-
 void fuse_session_process_buf(struct fuse_session *se,
 			      const struct fuse_buf *buf)
 {
@@ -2533,36 +2137,12 @@ void fuse_session_process_buf(struct fuse_session *se,
 void fuse_session_process_buf_int(struct fuse_session *se,
 				  const struct fuse_buf *buf, struct fuse_chan *ch)
 {
-	const size_t write_header_size = sizeof(struct fuse_in_header) +
-		sizeof(struct fuse_write_in);
-	struct fuse_bufvec bufv = { .buf[0] = *buf, .count = 1 };
-	struct fuse_bufvec tmpbuf = FUSE_BUFVEC_INIT(write_header_size);
 	struct fuse_in_header *in;
 	const void *inarg;
 	struct fuse_req *req;
-	void *mbuf = NULL;
 	int err;
-	int res;
 
-	if (buf->flags & FUSE_BUF_IS_FD) {
-		if (buf->size < tmpbuf.buf[0].size)
-			tmpbuf.buf[0].size = buf->size;
-
-		mbuf = malloc(tmpbuf.buf[0].size);
-		if (mbuf == NULL) {
-			fuse_log(FUSE_LOG_ERR, "fuse: failed to allocate header\n");
-			goto clear_pipe;
-		}
-		tmpbuf.buf[0].mem = mbuf;
-
-		res = fuse_ll_copy_from_pipe(&tmpbuf, &bufv);
-		if (res < 0)
-			goto clear_pipe;
-
-		in = mbuf;
-	} else {
-		in = buf->mem;
-	}
+	in = buf->mem;
 
 	if (se->debug) {
 		fuse_log(FUSE_LOG_DEBUG,
@@ -2584,7 +2164,7 @@ void fuse_session_process_buf_int(struct fuse_session *se,
 		};
 
 		fuse_send_msg(se, ch, &iov, 1);
-		goto clear_pipe;
+		return;
 	}
 
 	req->unique = in->unique;
@@ -2627,28 +2207,6 @@ void fuse_session_process_buf_int(struct fuse_session *se,
 			fuse_reply_err(intr, EAGAIN);
 	}
 
-	if ((buf->flags & FUSE_BUF_IS_FD) && write_header_size < buf->size &&
-	    (in->opcode != FUSE_WRITE || !se->op.write_buf) &&
-	    in->opcode != FUSE_NOTIFY_REPLY) {
-		void *newmbuf;
-
-		err = ENOMEM;
-		newmbuf = realloc(mbuf, buf->size);
-		if (newmbuf == NULL)
-			goto reply_err;
-		mbuf = newmbuf;
-
-		tmpbuf = FUSE_BUFVEC_INIT(buf->size - write_header_size);
-		tmpbuf.buf[0].mem = (char *)mbuf + write_header_size;
-
-		res = fuse_ll_copy_from_pipe(&tmpbuf, &bufv);
-		err = -res;
-		if (res < 0)
-			goto reply_err;
-
-		in = mbuf;
-	}
-
 	inarg = (void *) &in[1];
 	if (in->opcode == FUSE_WRITE && se->op.write_buf)
 		do_write_buf(req, in->nodeid, inarg, buf);
@@ -2657,16 +2215,10 @@ void fuse_session_process_buf_int(struct fuse_session *se,
 	else
 		fuse_ll_ops[in->opcode].func(req, in->nodeid, inarg);
 
-out_free:
-	free(mbuf);
 	return;
 
 reply_err:
 	fuse_reply_err(req, err);
-clear_pipe:
-	if (buf->flags & FUSE_BUF_IS_FD)
-		fuse_ll_clear_pipe(se);
-	goto out_free;
 }
 
 #define LL_OPTION(n,o,v) \
@@ -2699,197 +2251,23 @@ void fuse_lowlevel_help(void)
 
 void fuse_session_destroy(struct fuse_session *se)
 {
-	struct fuse_ll_pipe *llp;
-
 	if (se->got_init && !se->got_destroy) {
 		if (se->op.destroy)
 			se->op.destroy(se->userdata);
 	}
-	llp = pthread_getspecific(se->pipe_key);
-	if (llp != NULL)
-		fuse_ll_pipe_free(llp);
-	pthread_key_delete(se->pipe_key);
 	pthread_mutex_destroy(&se->lock);
 	free(se->cuse_data);
 	if (se->fd != -1)
 		close(se->fd);
-	destroy_mount_opts(se->mo);
 	free(se);
 }
 
 
-static void fuse_ll_pipe_destructor(void *data)
-{
-	struct fuse_ll_pipe *llp = data;
-	fuse_ll_pipe_free(llp);
-}
-
-int fuse_session_receive_buf(struct fuse_session *se, struct fuse_buf *buf)
-{
-	return fuse_session_receive_buf_int(se, buf, NULL);
-}
-
-int fuse_session_receive_buf_int(struct fuse_session *se, struct fuse_buf *buf,
-				 struct fuse_chan *ch)
-{
-	int err;
-	ssize_t res;
-#ifdef HAVE_SPLICE
-	size_t bufsize = se->bufsize;
-	struct fuse_ll_pipe *llp;
-	struct fuse_buf tmpbuf;
-
-	if (se->conn.proto_minor < 14 || !(se->conn.want & FUSE_CAP_SPLICE_READ))
-		goto fallback;
-
-	llp = fuse_ll_get_pipe(se);
-	if (llp == NULL)
-		goto fallback;
-
-	if (llp->size < bufsize) {
-		if (llp->can_grow) {
-			res = fcntl(llp->pipe[0], F_SETPIPE_SZ, bufsize);
-			if (res == -1) {
-				llp->can_grow = 0;
-				res = grow_pipe_to_max(llp->pipe[0]);
-				if (res > 0)
-					llp->size = res;
-				goto fallback;
-			}
-			llp->size = res;
-		}
-		if (llp->size < bufsize)
-			goto fallback;
-	}
-
-	res = splice(ch ? ch->fd : se->fd,
-		     NULL, llp->pipe[1], NULL, bufsize, 0);
-	err = errno;
-
-	if (fuse_session_exited(se))
-		return 0;
-
-	if (res == -1) {
-		if (err == ENODEV) {
-			/* Filesystem was unmounted, or connection was aborted
-			   via /sys/fs/fuse/connections */
-			fuse_session_exit(se);
-			return 0;
-		}
-		if (err != EINTR && err != EAGAIN)
-			perror("fuse: splice from device");
-		return -err;
-	}
-
-	if (res < sizeof(struct fuse_in_header)) {
-		fuse_log(FUSE_LOG_ERR, "short splice from fuse device\n");
-		return -EIO;
-	}
-
-	tmpbuf = (struct fuse_buf) {
-		.size = res,
-		.flags = FUSE_BUF_IS_FD,
-		.fd = llp->pipe[0],
-	};
-
-	/*
-	 * Don't bother with zero copy for small requests.
-	 * fuse_loop_mt() needs to check for FORGET so this more than
-	 * just an optimization.
-	 */
-	if (res < sizeof(struct fuse_in_header) +
-	    sizeof(struct fuse_write_in) + pagesize) {
-		struct fuse_bufvec src = { .buf[0] = tmpbuf, .count = 1 };
-		struct fuse_bufvec dst = { .count = 1 };
-
-		if (!buf->mem) {
-			buf->mem = malloc(se->bufsize);
-			if (!buf->mem) {
-				fuse_log(FUSE_LOG_ERR,
-					"fuse: failed to allocate read buffer\n");
-				return -ENOMEM;
-			}
-		}
-		buf->size = se->bufsize;
-		buf->flags = 0;
-		dst.buf[0] = *buf;
-
-		res = fuse_buf_copy(&dst, &src, 0);
-		if (res < 0) {
-			fuse_log(FUSE_LOG_ERR, "fuse: copy from pipe: %s\n",
-				strerror(-res));
-			fuse_ll_clear_pipe(se);
-			return res;
-		}
-		if (res < tmpbuf.size) {
-			fuse_log(FUSE_LOG_ERR, "fuse: copy from pipe: short read\n");
-			fuse_ll_clear_pipe(se);
-			return -EIO;
-		}
-		assert(res == tmpbuf.size);
-
-	} else {
-		/* Don't overwrite buf->mem, as that would cause a leak */
-		buf->fd = tmpbuf.fd;
-		buf->flags = tmpbuf.flags;
-	}
-	buf->size = tmpbuf.size;
-
-	return res;
-
-fallback:
-#endif
-	if (!buf->mem) {
-		buf->mem = malloc(se->bufsize);
-		if (!buf->mem) {
-			fuse_log(FUSE_LOG_ERR,
-				"fuse: failed to allocate read buffer\n");
-			return -ENOMEM;
-		}
-	}
-
-restart:
-	res = read(ch ? ch->fd : se->fd, buf->mem, se->bufsize);
-	err = errno;
-
-	if (fuse_session_exited(se))
-		return 0;
-	if (res == -1) {
-		/* ENOENT means the operation was interrupted, it's safe
-		   to restart */
-		if (err == ENOENT)
-			goto restart;
-
-		if (err == ENODEV) {
-			/* Filesystem was unmounted, or connection was aborted
-			   via /sys/fs/fuse/connections */
-			fuse_session_exit(se);
-			return 0;
-		}
-		/* Errors occurring during normal operation: EINTR (read
-		   interrupted), EAGAIN (nonblocking I/O), ENODEV (filesystem
-		   umounted) */
-		if (err != EINTR && err != EAGAIN)
-			perror("fuse: reading device");
-		return -err;
-	}
-	if ((size_t) res < sizeof(struct fuse_in_header)) {
-		fuse_log(FUSE_LOG_ERR, "short read on fuse device\n");
-		return -EIO;
-	}
-
-	buf->size = res;
-
-	return res;
-}
-
 struct fuse_session *fuse_session_new(struct fuse_args *args,
 				      const struct fuse_lowlevel_ops *op,
 				      size_t op_size, void *userdata)
 {
-	int err;
 	struct fuse_session *se;
-	struct mount_opts *mo;
 
 	if (sizeof(struct fuse_lowlevel_ops) < op_size) {
 		fuse_log(FUSE_LOG_ERR, "fuse: warning: library too old, some operations may not work\n");
@@ -2923,10 +2301,6 @@ struct fuse_session *fuse_session_new(struct fuse_args *args,
 		if(fuse_opt_add_arg(args, "-oallow_other") == -1)
 			goto out2;
 	}
-	mo = parse_mount_opts(args);
-	if (mo == NULL)
-		goto out3;
-
 	if(args->argc == 1 &&
 	   args->argv[0][0] == '-') {
 		fuse_log(FUSE_LOG_ERR, "fuse: warning: argv[0] looks like an option, but "
@@ -2952,26 +2326,14 @@ struct fuse_session *fuse_session_new(struct fuse_args *args,
 	se->notify_ctr = 1;
 	fuse_mutex_init(&se->lock);
 
-	err = pthread_key_create(&se->pipe_key, fuse_ll_pipe_destructor);
-	if (err) {
-		fuse_log(FUSE_LOG_ERR, "fuse: failed to create thread specific key: %s\n",
-			strerror(err));
-		goto out5;
-	}
-
 	memcpy(&se->op, op, op_size);
 	se->owner = getuid();
 	se->userdata = userdata;
 
-	se->mo = mo;
 	return se;
 
-out5:
-	pthread_mutex_destroy(&se->lock);
 out4:
 	fuse_opt_free_args(args);
-out3:
-	free(mo);
 out2:
 	free(se);
 out1:
@@ -3035,11 +2397,6 @@ int fuse_session_fd(struct fuse_session *se)
 
 void fuse_session_unmount(struct fuse_session *se)
 {
-	if (se->mountpoint != NULL) {
-		fuse_kern_unmount(se->mountpoint, se->fd);
-		free(se->mountpoint);
-		se->mountpoint = NULL;
-	}
 }
 
 #ifdef linux
diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
index 18c6363f07..7aa7bad5b2 100644
--- a/tools/virtiofsd/fuse_lowlevel.h
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -31,10 +31,6 @@
 #include <sys/statvfs.h>
 #include <sys/uio.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /* ----------------------------------------------------------- *
  * Miscellaneous definitions				       *
  * ----------------------------------------------------------- */
@@ -2082,8 +2078,4 @@ void fuse_session_process_buf(struct fuse_session *se,
  */
 int fuse_session_receive_buf(struct fuse_session *se, struct fuse_buf *buf);
 
-#ifdef __cplusplus
-}
-#endif
-
 #endif /* FUSE_LOWLEVEL_H_ */
diff --git a/tools/virtiofsd/fuse_opt.h b/tools/virtiofsd/fuse_opt.h
index d8573e74fd..69102555be 100644
--- a/tools/virtiofsd/fuse_opt.h
+++ b/tools/virtiofsd/fuse_opt.h
@@ -14,10 +14,6 @@
  * This file defines the option parsing interface of FUSE
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * Option description
  *
@@ -264,8 +260,4 @@ void fuse_opt_free_args(struct fuse_args *args);
  */
 int fuse_opt_match(const struct fuse_opt opts[], const char *opt);
 
-#ifdef __cplusplus
-}
-#endif
-
 #endif /* FUSE_OPT_H_ */
diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
index 64ff7ad6d5..511f981ce6 100644
--- a/tools/virtiofsd/helper.c
+++ b/tools/virtiofsd/helper.c
@@ -44,10 +44,8 @@ static const struct fuse_opt fuse_helper_opts[] = {
 	FUSE_HELPER_OPT("-s",		singlethread),
 	FUSE_HELPER_OPT("fsname=",	nodefault_subtype),
 	FUSE_OPT_KEY("fsname=",		FUSE_OPT_KEY_KEEP),
-#ifndef __FreeBSD__
 	FUSE_HELPER_OPT("subtype=",	nodefault_subtype),
 	FUSE_OPT_KEY("subtype=",	FUSE_OPT_KEY_KEEP),
-#endif
 	FUSE_HELPER_OPT("clone_fd",	clone_fd),
 	FUSE_HELPER_OPT("max_idle_threads=%u", max_idle_threads),
 	FUSE_OPT_END
@@ -171,34 +169,6 @@ static int fuse_helper_opt_proc(void *data, const char *arg, int key,
 	}
 }
 
-/* Under FreeBSD, there is no subtype option so this
-   function actually sets the fsname */
-static int add_default_subtype(const char *progname, struct fuse_args *args)
-{
-	int res;
-	char *subtype_opt;
-
-	const char *basename = strrchr(progname, '/');
-	if (basename == NULL)
-		basename = progname;
-	else if (basename[1] != '\0')
-		basename++;
-
-	subtype_opt = (char *) malloc(strlen(basename) + 64);
-	if (subtype_opt == NULL) {
-		fuse_log(FUSE_LOG_ERR, "fuse: memory allocation failed\n");
-		return -1;
-	}
-#ifdef __FreeBSD__
-	sprintf(subtype_opt, "-ofsname=%s", basename);
-#else
-	sprintf(subtype_opt, "-osubtype=%s", basename);
-#endif
-	res = fuse_opt_add_arg(args, subtype_opt);
-	free(subtype_opt);
-	return res;
-}
-
 int fuse_parse_cmdline(struct fuse_args *args,
 		       struct fuse_cmdline_opts *opts)
 {
@@ -210,14 +180,6 @@ int fuse_parse_cmdline(struct fuse_args *args,
 			   fuse_helper_opt_proc) == -1)
 		return -1;
 
-	/* *Linux*: if neither -o subtype nor -o fsname are specified,
-	   set subtype to program's basename.
-	   *FreeBSD*: if fsname is not specified, set to program's
-	   basename. */
-	if (!opts->nodefault_subtype)
-		if (add_default_subtype(args->argv[0], args) == -1)
-			return -1;
-
 	return 0;
 }
 
@@ -276,88 +238,6 @@ int fuse_daemonize(int foreground)
 	return 0;
 }
 
-int fuse_main_real(int argc, char *argv[], const struct fuse_operations *op,
-		   size_t op_size, void *user_data)
-{
-	struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
-	struct fuse *fuse;
-	struct fuse_cmdline_opts opts;
-	int res;
-
-	if (fuse_parse_cmdline(&args, &opts) != 0)
-		return 1;
-
-	if (opts.show_version) {
-		printf("FUSE library version %s\n", PACKAGE_VERSION);
-		fuse_lowlevel_version();
-		res = 0;
-		goto out1;
-	}
-
-	if (opts.show_help) {
-		if(args.argv[0][0] != '\0')
-			printf("usage: %s [options] <mountpoint>\n\n",
-			       args.argv[0]);
-		printf("FUSE options:\n");
-		fuse_cmdline_help();
-		fuse_lib_help(&args);
-		res = 0;
-		goto out1;
-	}
-
-	if (!opts.show_help &&
-	    !opts.mountpoint) {
-		fuse_log(FUSE_LOG_ERR, "error: no mountpoint specified\n");
-		res = 2;
-		goto out1;
-	}
-
-
-	fuse = fuse_new_31(&args, op, op_size, user_data);
-	if (fuse == NULL) {
-		res = 3;
-		goto out1;
-	}
-
-	if (fuse_mount(fuse,opts.mountpoint) != 0) {
-		res = 4;
-		goto out2;
-	}
-
-	if (fuse_daemonize(opts.foreground) != 0) {
-		res = 5;
-		goto out3;
-	}
-
-	struct fuse_session *se = fuse_get_session(fuse);
-	if (fuse_set_signal_handlers(se) != 0) {
-		res = 6;
-		goto out3;
-	}
-
-	if (opts.singlethread)
-		res = fuse_loop(fuse);
-	else {
-		struct fuse_loop_config loop_config;
-		loop_config.clone_fd = opts.clone_fd;
-		loop_config.max_idle_threads = opts.max_idle_threads;
-		res = fuse_loop_mt_32(fuse, &loop_config);
-	}
-	if (res)
-		res = 7;
-
-	fuse_remove_signal_handlers(se);
-out3:
-	fuse_unmount(fuse);
-out2:
-	fuse_destroy(fuse);
-out1:
-	free(opts.mountpoint);
-	fuse_opt_free_args(&args);
-	return res;
-}
-
-
 void fuse_apply_conn_info_opts(struct fuse_conn_info_opts *opts,
 			       struct fuse_conn_info *conn)
 {
@@ -420,21 +300,3 @@ struct fuse_conn_info_opts* fuse_parse_conn_info_opts(struct fuse_args *args)
 	}
 	return opts;
 }
-
-int fuse_open_channel(const char *mountpoint, const char* options)
-{
-	struct mount_opts *opts = NULL;
-	int fd = -1;
-	const char *argv[] = { "", "-o", options };
-	int argc = sizeof(argv) / sizeof(argv[0]);
-	struct fuse_args args = FUSE_ARGS_INIT(argc, (char**) argv);
-
-	opts = parse_mount_opts(&args);
-	if (opts == NULL)
-		return -1;
-
-	fd = fuse_kern_mount(mountpoint, opts);
-	destroy_mount_opts(opts);
-
-	return fd;
-}
diff --git a/tools/virtiofsd/passthrough_helpers.h b/tools/virtiofsd/passthrough_helpers.h
index 6b77c33600..7c5f561fbc 100644
--- a/tools/virtiofsd/passthrough_helpers.h
+++ b/tools/virtiofsd/passthrough_helpers.h
@@ -42,32 +42,6 @@ static int mknod_wrapper(int dirfd, const char *path, const char *link,
 		res = symlinkat(link, dirfd, path);
 	} else if (S_ISFIFO(mode)) {
 		res = mkfifoat(dirfd, path, mode);
-#ifdef __FreeBSD__
-	} else if (S_ISSOCK(mode)) {
-		struct sockaddr_un su;
-		int fd;
-
-		if (strlen(path) >= sizeof(su.sun_path)) {
-			errno = ENAMETOOLONG;
-			return -1;
-		}
-		fd = socket(AF_UNIX, SOCK_STREAM, 0);
-		if (fd >= 0) {
-			/*
-			 * We must bind the socket to the underlying file
-			 * system to create the socket file, even though
-			 * we'll never listen on this socket.
-			 */
-			su.sun_family = AF_UNIX;
-			strncpy(su.sun_path, path, sizeof(su.sun_path));
-			res = bindat(dirfd, fd, (struct sockaddr*)&su,
-				sizeof(su));
-			if (res == 0)
-				close(fd);
-		} else {
-			res = -1;
-		}
-#endif
 	} else {
 		res = mknodat(dirfd, path, mode, rdev);
 	}
-- 
2.23.0




* [PATCH 007/104] virtiofsd: Format imported files to qemu style
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (5 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 006/104] virtiofsd: Trim down imported files Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-03 12:04   ` Daniel P. Berrangé
  2020-01-09 12:21   ` Aleksandar Markovic
  2019-12-12 16:37 ` [PATCH 008/104] virtiofsd: remove mountpoint dummy argument Dr. David Alan Gilbert (git)
                   ` (99 subsequent siblings)
  106 siblings, 2 replies; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The files were reformatted mostly by running commands like these on each one:

indent -nut -i 4 -nlp -br -cs -ce --no-space-after-function-call-names file
clang-format -style=file -i -- file
clang-tidy -fix-errors -checks=readability-braces-around-statements file
clang-format -style=file -i -- file

With manual cleanups.
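
As a rough sketch only, the four commands above could be wrapped in a
small shell function so they can be applied to one file at a time. The
function name format_file and the DRY_RUN switch are hypothetical and
not part of this series; the tool invocations are copied from the list
above. The sketch assumes indent, clang-format and clang-tidy are on
PATH, and that it is run from a directory where -style=file can find
the .clang-format.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): runs the formatting
# sequence from the commit message on a single file.
# Set DRY_RUN=1 to print each command instead of executing it.
format_file() {
    f=$1
    # Expands to "echo" when DRY_RUN is set and non-empty, else to nothing.
    run=${DRY_RUN:+echo}

    # Pass 1: GNU indent, spaces instead of tabs, 4-column indent.
    $run indent -nut -i 4 -nlp -br -cs -ce \
        --no-space-after-function-call-names "$f" || return 1

    # Pass 2: apply the project's .clang-format in place.
    $run clang-format -style=file -i -- "$f" || return 1

    # Pass 3: add braces around single-statement bodies.
    $run clang-tidy -fix-errors \
        -checks=readability-braces-around-statements "$f" || return 1

    # Pass 4: re-run clang-format to lay out the newly added braces.
    $run clang-format -style=file -i -- "$f" || return 1
}
```

For example, `DRY_RUN=1 format_file tools/virtiofsd/buffer.c` would only
print the four commands, which is a cheap way to check them before
touching any file. The manual cleanups mentioned above are still needed.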

The .clang-format used is below.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Language:        Cpp
AlignAfterOpenBracket: Align
AlignConsecutiveAssignments: false # although we like it, it creates churn
AlignConsecutiveDeclarations: false
AlignEscapedNewlinesLeft: true
AlignOperands:   true
AlignTrailingComments: false # churn
AllowAllParametersOfDeclarationOnNextLine: true
AllowShortBlocksOnASingleLine: false
AllowShortCaseLabelsOnASingleLine: false
AllowShortFunctionsOnASingleLine: None
AllowShortIfStatementsOnASingleLine: false
AllowShortLoopsOnASingleLine: false
AlwaysBreakAfterReturnType: None # AlwaysBreakAfterDefinitionReturnType is taken into account
AlwaysBreakBeforeMultilineStrings: false
BinPackArguments: true
BinPackParameters: true
BraceWrapping:
  AfterControlStatement: false
  AfterEnum:       false
  AfterFunction:   true
  AfterStruct:     false
  AfterUnion:      false
  BeforeElse:      false
  IndentBraces:    false
BreakBeforeBinaryOperators: None
BreakBeforeBraces: Custom
BreakBeforeTernaryOperators: false
BreakStringLiterals: true
ColumnLimit:     80
ContinuationIndentWidth: 4
Cpp11BracedListStyle: false
DerivePointerAlignment: false
DisableFormat:   false
ForEachMacros:   [
  'CPU_FOREACH',
  'CPU_FOREACH_REVERSE',
  'CPU_FOREACH_SAFE',
  'IOMMU_NOTIFIER_FOREACH',
  'QLIST_FOREACH',
  'QLIST_FOREACH_ENTRY',
  'QLIST_FOREACH_RCU',
  'QLIST_FOREACH_SAFE',
  'QLIST_FOREACH_SAFE_RCU',
  'QSIMPLEQ_FOREACH',
  'QSIMPLEQ_FOREACH_SAFE',
  'QSLIST_FOREACH',
  'QSLIST_FOREACH_SAFE',
  'QTAILQ_FOREACH',
  'QTAILQ_FOREACH_REVERSE',
  'QTAILQ_FOREACH_SAFE',
  'QTAILQ_RAW_FOREACH',
  'RAMBLOCK_FOREACH'
]
IncludeCategories:
  - Regex:           '^"qemu/osdep.h'
    Priority:        -3
  - Regex:           '^"(block|chardev|crypto|disas|exec|fpu|hw|io|libdecnumber|migration|monitor|net|qapi|qemu|qom|standard-headers|sysemu|ui)/'
    Priority:        -2
  - Regex:           '^"(elf.h|qemu-common.h|glib-compat.h|qemu-io.h|trace-tcg.h)'
    Priority:        -1
  - Regex:           '.*'
    Priority:        1
IncludeIsMainRegex: '$'
IndentCaseLabels: false
IndentWidth:     4
IndentWrappedFunctionNames: false
KeepEmptyLinesAtTheStartOfBlocks: false
MacroBlockBegin: '.*_BEGIN$' # only PREC_BEGIN ?
MacroBlockEnd:   '.*_END$'
MaxEmptyLinesToKeep: 2
PointerAlignment: Right
ReflowComments:  true
SortIncludes:    true
SpaceAfterCStyleCast: false
SpaceBeforeAssignmentOperators: true
SpaceBeforeParens: ControlStatements
SpaceInEmptyParentheses: false
SpacesBeforeTrailingComments: 1
SpacesInContainerLiterals: true
SpacesInParentheses: false
SpacesInSquareBrackets: false
Standard:        Auto
UseTab:          Never
...
---
 tools/virtiofsd/buffer.c              |  550 ++--
 tools/virtiofsd/fuse.h                | 1572 +++++------
 tools/virtiofsd/fuse_common.h         |  764 ++---
 tools/virtiofsd/fuse_i.h              |  127 +-
 tools/virtiofsd/fuse_log.c            |   38 +-
 tools/virtiofsd/fuse_log.h            |   32 +-
 tools/virtiofsd/fuse_loop_mt.c        |   66 +-
 tools/virtiofsd/fuse_lowlevel.c       | 3678 +++++++++++++------------
 tools/virtiofsd/fuse_lowlevel.h       | 2401 ++++++++--------
 tools/virtiofsd/fuse_misc.h           |   30 +-
 tools/virtiofsd/fuse_opt.c            |  659 ++---
 tools/virtiofsd/fuse_opt.h            |   79 +-
 tools/virtiofsd/fuse_signals.c        |  118 +-
 tools/virtiofsd/helper.c              |  517 ++--
 tools/virtiofsd/passthrough_helpers.h |   33 +-
 tools/virtiofsd/passthrough_ll.c      | 2063 +++++++-------
 16 files changed, 6530 insertions(+), 6197 deletions(-)

diff --git a/tools/virtiofsd/buffer.c b/tools/virtiofsd/buffer.c
index 5ab9b87455..38521f5889 100644
--- a/tools/virtiofsd/buffer.c
+++ b/tools/virtiofsd/buffer.c
@@ -1,321 +1,343 @@
 /*
-  FUSE: Filesystem in Userspace
-  Copyright (C) 2010  Miklos Szeredi <miklos@szeredi.hu>
-
-  Functions for dealing with `struct fuse_buf` and `struct
-  fuse_bufvec`.
-
-  This program can be distributed under the terms of the GNU LGPLv2.
-  See the file COPYING.LIB
-*/
+ * FUSE: Filesystem in Userspace
+ * Copyright (C) 2010  Miklos Szeredi <miklos@szeredi.hu>
+ *
+ * Functions for dealing with `struct fuse_buf` and `struct
+ * fuse_bufvec`.
+ *
+ * This program can be distributed under the terms of the GNU LGPLv2.
+ * See the file COPYING.LIB
+ */
 
 #define _GNU_SOURCE
 
 #include "config.h"
 #include "fuse_i.h"
 #include "fuse_lowlevel.h"
+#include <assert.h>
+#include <errno.h>
 #include <string.h>
 #include <unistd.h>
-#include <errno.h>
-#include <assert.h>
 
 size_t fuse_buf_size(const struct fuse_bufvec *bufv)
 {
-	size_t i;
-	size_t size = 0;
-
-	for (i = 0; i < bufv->count; i++) {
-		if (bufv->buf[i].size == SIZE_MAX)
-			size = SIZE_MAX;
-		else
-			size += bufv->buf[i].size;
-	}
-
-	return size;
+    size_t i;
+    size_t size = 0;
+
+    for (i = 0; i < bufv->count; i++) {
+        if (bufv->buf[i].size == SIZE_MAX) {
+            size = SIZE_MAX;
+        } else {
+            size += bufv->buf[i].size;
+        }
+    }
+
+    return size;
 }
 
 static size_t min_size(size_t s1, size_t s2)
 {
-	return s1 < s2 ? s1 : s2;
+    return s1 < s2 ? s1 : s2;
 }
 
 static ssize_t fuse_buf_write(const struct fuse_buf *dst, size_t dst_off,
-			      const struct fuse_buf *src, size_t src_off,
-			      size_t len)
+                              const struct fuse_buf *src, size_t src_off,
+                              size_t len)
 {
-	ssize_t res = 0;
-	size_t copied = 0;
-
-	while (len) {
-		if (dst->flags & FUSE_BUF_FD_SEEK) {
-			res = pwrite(dst->fd, (char *)src->mem + src_off, len,
-				     dst->pos + dst_off);
-		} else {
-			res = write(dst->fd, (char *)src->mem + src_off, len);
-		}
-		if (res == -1) {
-			if (!copied)
-				return -errno;
-			break;
-		}
-		if (res == 0)
-			break;
-
-		copied += res;
-		if (!(dst->flags & FUSE_BUF_FD_RETRY))
-			break;
-
-		src_off += res;
-		dst_off += res;
-		len -= res;
-	}
-
-	return copied;
+    ssize_t res = 0;
+    size_t copied = 0;
+
+    while (len) {
+        if (dst->flags & FUSE_BUF_FD_SEEK) {
+            res = pwrite(dst->fd, (char *)src->mem + src_off, len,
+                         dst->pos + dst_off);
+        } else {
+            res = write(dst->fd, (char *)src->mem + src_off, len);
+        }
+        if (res == -1) {
+            if (!copied) {
+                return -errno;
+            }
+            break;
+        }
+        if (res == 0) {
+            break;
+        }
+
+        copied += res;
+        if (!(dst->flags & FUSE_BUF_FD_RETRY)) {
+            break;
+        }
+
+        src_off += res;
+        dst_off += res;
+        len -= res;
+    }
+
+    return copied;
 }
 
 static ssize_t fuse_buf_read(const struct fuse_buf *dst, size_t dst_off,
-			     const struct fuse_buf *src, size_t src_off,
-			     size_t len)
+                             const struct fuse_buf *src, size_t src_off,
+                             size_t len)
 {
-	ssize_t res = 0;
-	size_t copied = 0;
-
-	while (len) {
-		if (src->flags & FUSE_BUF_FD_SEEK) {
-			res = pread(src->fd, (char *)dst->mem + dst_off, len,
-				     src->pos + src_off);
-		} else {
-			res = read(src->fd, (char *)dst->mem + dst_off, len);
-		}
-		if (res == -1) {
-			if (!copied)
-				return -errno;
-			break;
-		}
-		if (res == 0)
-			break;
-
-		copied += res;
-		if (!(src->flags & FUSE_BUF_FD_RETRY))
-			break;
-
-		dst_off += res;
-		src_off += res;
-		len -= res;
-	}
-
-	return copied;
+    ssize_t res = 0;
+    size_t copied = 0;
+
+    while (len) {
+        if (src->flags & FUSE_BUF_FD_SEEK) {
+            res = pread(src->fd, (char *)dst->mem + dst_off, len,
+                        src->pos + src_off);
+        } else {
+            res = read(src->fd, (char *)dst->mem + dst_off, len);
+        }
+        if (res == -1) {
+            if (!copied) {
+                return -errno;
+            }
+            break;
+        }
+        if (res == 0) {
+            break;
+        }
+
+        copied += res;
+        if (!(src->flags & FUSE_BUF_FD_RETRY)) {
+            break;
+        }
+
+        dst_off += res;
+        src_off += res;
+        len -= res;
+    }
+
+    return copied;
 }
 
 static ssize_t fuse_buf_fd_to_fd(const struct fuse_buf *dst, size_t dst_off,
-				 const struct fuse_buf *src, size_t src_off,
-				 size_t len)
+                                 const struct fuse_buf *src, size_t src_off,
+                                 size_t len)
 {
-	char buf[4096];
-	struct fuse_buf tmp = {
-		.size = sizeof(buf),
-		.flags = 0,
-	};
-	ssize_t res;
-	size_t copied = 0;
-
-	tmp.mem = buf;
-
-	while (len) {
-		size_t this_len = min_size(tmp.size, len);
-		size_t read_len;
-
-		res = fuse_buf_read(&tmp, 0, src, src_off, this_len);
-		if (res < 0) {
-			if (!copied)
-				return res;
-			break;
-		}
-		if (res == 0)
-			break;
-
-		read_len = res;
-		res = fuse_buf_write(dst, dst_off, &tmp, 0, read_len);
-		if (res < 0) {
-			if (!copied)
-				return res;
-			break;
-		}
-		if (res == 0)
-			break;
-
-		copied += res;
-
-		if (res < this_len)
-			break;
-
-		dst_off += res;
-		src_off += res;
-		len -= res;
-	}
-
-	return copied;
+    char buf[4096];
+    struct fuse_buf tmp = {
+        .size = sizeof(buf),
+        .flags = 0,
+    };
+    ssize_t res;
+    size_t copied = 0;
+
+    tmp.mem = buf;
+
+    while (len) {
+        size_t this_len = min_size(tmp.size, len);
+        size_t read_len;
+
+        res = fuse_buf_read(&tmp, 0, src, src_off, this_len);
+        if (res < 0) {
+            if (!copied) {
+                return res;
+            }
+            break;
+        }
+        if (res == 0) {
+            break;
+        }
+
+        read_len = res;
+        res = fuse_buf_write(dst, dst_off, &tmp, 0, read_len);
+        if (res < 0) {
+            if (!copied) {
+                return res;
+            }
+            break;
+        }
+        if (res == 0) {
+            break;
+        }
+
+        copied += res;
+
+        if (res < this_len) {
+            break;
+        }
+
+        dst_off += res;
+        src_off += res;
+        len -= res;
+    }
+
+    return copied;
 }
 
 #ifdef HAVE_SPLICE
 static ssize_t fuse_buf_splice(const struct fuse_buf *dst, size_t dst_off,
-			       const struct fuse_buf *src, size_t src_off,
-			       size_t len, enum fuse_buf_copy_flags flags)
+                               const struct fuse_buf *src, size_t src_off,
+                               size_t len, enum fuse_buf_copy_flags flags)
 {
-	int splice_flags = 0;
-	off_t *srcpos = NULL;
-	off_t *dstpos = NULL;
-	off_t srcpos_val;
-	off_t dstpos_val;
-	ssize_t res;
-	size_t copied = 0;
-
-	if (flags & FUSE_BUF_SPLICE_MOVE)
-		splice_flags |= SPLICE_F_MOVE;
-	if (flags & FUSE_BUF_SPLICE_NONBLOCK)
-		splice_flags |= SPLICE_F_NONBLOCK;
-
-	if (src->flags & FUSE_BUF_FD_SEEK) {
-		srcpos_val = src->pos + src_off;
-		srcpos = &srcpos_val;
-	}
-	if (dst->flags & FUSE_BUF_FD_SEEK) {
-		dstpos_val = dst->pos + dst_off;
-		dstpos = &dstpos_val;
-	}
-
-	while (len) {
-		res = splice(src->fd, srcpos, dst->fd, dstpos, len,
-			     splice_flags);
-		if (res == -1) {
-			if (copied)
-				break;
-
-			if (errno != EINVAL || (flags & FUSE_BUF_FORCE_SPLICE))
-				return -errno;
-
-			/* Maybe splice is not supported for this combination */
-			return fuse_buf_fd_to_fd(dst, dst_off, src, src_off,
-						 len);
-		}
-		if (res == 0)
-			break;
-
-		copied += res;
-		if (!(src->flags & FUSE_BUF_FD_RETRY) &&
-		    !(dst->flags & FUSE_BUF_FD_RETRY)) {
-			break;
-		}
-
-		len -= res;
-	}
-
-	return copied;
+    int splice_flags = 0;
+    off_t *srcpos = NULL;
+    off_t *dstpos = NULL;
+    off_t srcpos_val;
+    off_t dstpos_val;
+    ssize_t res;
+    size_t copied = 0;
+
+    if (flags & FUSE_BUF_SPLICE_MOVE) {
+        splice_flags |= SPLICE_F_MOVE;
+    }
+    if (flags & FUSE_BUF_SPLICE_NONBLOCK) {
+        splice_flags |= SPLICE_F_NONBLOCK;
+    }
+    if (src->flags & FUSE_BUF_FD_SEEK) {
+        srcpos_val = src->pos + src_off;
+        srcpos = &srcpos_val;
+    }
+    if (dst->flags & FUSE_BUF_FD_SEEK) {
+        dstpos_val = dst->pos + dst_off;
+        dstpos = &dstpos_val;
+    }
+
+    while (len) {
+        res = splice(src->fd, srcpos, dst->fd, dstpos, len, splice_flags);
+        if (res == -1) {
+            if (copied) {
+                break;
+            }
+
+            if (errno != EINVAL || (flags & FUSE_BUF_FORCE_SPLICE)) {
+                return -errno;
+            }
+
+            /* Maybe splice is not supported for this combination */
+            return fuse_buf_fd_to_fd(dst, dst_off, src, src_off, len);
+        }
+        if (res == 0) {
+            break;
+        }
+
+        copied += res;
+        if (!(src->flags & FUSE_BUF_FD_RETRY) &&
+            !(dst->flags & FUSE_BUF_FD_RETRY)) {
+            break;
+        }
+
+        len -= res;
+    }
+
+    return copied;
 }
 #else
 static ssize_t fuse_buf_splice(const struct fuse_buf *dst, size_t dst_off,
-			       const struct fuse_buf *src, size_t src_off,
-			       size_t len, enum fuse_buf_copy_flags flags)
+                               const struct fuse_buf *src, size_t src_off,
+                               size_t len, enum fuse_buf_copy_flags flags)
 {
-	(void) flags;
+    (void)flags;
 
-	return fuse_buf_fd_to_fd(dst, dst_off, src, src_off, len);
+    return fuse_buf_fd_to_fd(dst, dst_off, src, src_off, len);
 }
 #endif
 
 
 static ssize_t fuse_buf_copy_one(const struct fuse_buf *dst, size_t dst_off,
-				 const struct fuse_buf *src, size_t src_off,
-				 size_t len, enum fuse_buf_copy_flags flags)
+                                 const struct fuse_buf *src, size_t src_off,
+                                 size_t len, enum fuse_buf_copy_flags flags)
 {
-	int src_is_fd = src->flags & FUSE_BUF_IS_FD;
-	int dst_is_fd = dst->flags & FUSE_BUF_IS_FD;
-
-	if (!src_is_fd && !dst_is_fd) {
-		char *dstmem = (char *)dst->mem + dst_off;
-		char *srcmem = (char *)src->mem + src_off;
-
-		if (dstmem != srcmem) {
-			if (dstmem + len <= srcmem || srcmem + len <= dstmem)
-				memcpy(dstmem, srcmem, len);
-			else
-				memmove(dstmem, srcmem, len);
-		}
-
-		return len;
-	} else if (!src_is_fd) {
-		return fuse_buf_write(dst, dst_off, src, src_off, len);
-	} else if (!dst_is_fd) {
-		return fuse_buf_read(dst, dst_off, src, src_off, len);
-	} else if (flags & FUSE_BUF_NO_SPLICE) {
-		return fuse_buf_fd_to_fd(dst, dst_off, src, src_off, len);
-	} else {
-		return fuse_buf_splice(dst, dst_off, src, src_off, len, flags);
-	}
+    int src_is_fd = src->flags & FUSE_BUF_IS_FD;
+    int dst_is_fd = dst->flags & FUSE_BUF_IS_FD;
+
+    if (!src_is_fd && !dst_is_fd) {
+        char *dstmem = (char *)dst->mem + dst_off;
+        char *srcmem = (char *)src->mem + src_off;
+
+        if (dstmem != srcmem) {
+            if (dstmem + len <= srcmem || srcmem + len <= dstmem) {
+                memcpy(dstmem, srcmem, len);
+            } else {
+                memmove(dstmem, srcmem, len);
+            }
+        }
+
+        return len;
+    } else if (!src_is_fd) {
+        return fuse_buf_write(dst, dst_off, src, src_off, len);
+    } else if (!dst_is_fd) {
+        return fuse_buf_read(dst, dst_off, src, src_off, len);
+    } else if (flags & FUSE_BUF_NO_SPLICE) {
+        return fuse_buf_fd_to_fd(dst, dst_off, src, src_off, len);
+    } else {
+        return fuse_buf_splice(dst, dst_off, src, src_off, len, flags);
+    }
 }
 
 static const struct fuse_buf *fuse_bufvec_current(struct fuse_bufvec *bufv)
 {
-	if (bufv->idx < bufv->count)
-		return &bufv->buf[bufv->idx];
-	else
-		return NULL;
+    if (bufv->idx < bufv->count) {
+        return &bufv->buf[bufv->idx];
+    } else {
+        return NULL;
+    }
 }
 
 static int fuse_bufvec_advance(struct fuse_bufvec *bufv, size_t len)
 {
-	const struct fuse_buf *buf = fuse_bufvec_current(bufv);
-
-	bufv->off += len;
-	assert(bufv->off <= buf->size);
-	if (bufv->off == buf->size) {
-		assert(bufv->idx < bufv->count);
-		bufv->idx++;
-		if (bufv->idx == bufv->count)
-			return 0;
-		bufv->off = 0;
-	}
-	return 1;
+    const struct fuse_buf *buf = fuse_bufvec_current(bufv);
+
+    bufv->off += len;
+    assert(bufv->off <= buf->size);
+    if (bufv->off == buf->size) {
+        assert(bufv->idx < bufv->count);
+        bufv->idx++;
+        if (bufv->idx == bufv->count) {
+            return 0;
+        }
+        bufv->off = 0;
+    }
+    return 1;
 }
 
 ssize_t fuse_buf_copy(struct fuse_bufvec *dstv, struct fuse_bufvec *srcv,
-		      enum fuse_buf_copy_flags flags)
+                      enum fuse_buf_copy_flags flags)
 {
-	size_t copied = 0;
-
-	if (dstv == srcv)
-		return fuse_buf_size(dstv);
-
-	for (;;) {
-		const struct fuse_buf *src = fuse_bufvec_current(srcv);
-		const struct fuse_buf *dst = fuse_bufvec_current(dstv);
-		size_t src_len;
-		size_t dst_len;
-		size_t len;
-		ssize_t res;
-
-		if (src == NULL || dst == NULL)
-			break;
-
-		src_len = src->size - srcv->off;
-		dst_len = dst->size - dstv->off;
-		len = min_size(src_len, dst_len);
-
-		res = fuse_buf_copy_one(dst, dstv->off, src, srcv->off, len, flags);
-		if (res < 0) {
-			if (!copied)
-				return res;
-			break;
-		}
-		copied += res;
-
-		if (!fuse_bufvec_advance(srcv, res) ||
-		    !fuse_bufvec_advance(dstv, res))
-			break;
-
-		if (res < len)
-			break;
-	}
-
-	return copied;
+    size_t copied = 0;
+
+    if (dstv == srcv) {
+        return fuse_buf_size(dstv);
+    }
+
+    for (;;) {
+        const struct fuse_buf *src = fuse_bufvec_current(srcv);
+        const struct fuse_buf *dst = fuse_bufvec_current(dstv);
+        size_t src_len;
+        size_t dst_len;
+        size_t len;
+        ssize_t res;
+
+        if (src == NULL || dst == NULL) {
+            break;
+        }
+
+        src_len = src->size - srcv->off;
+        dst_len = dst->size - dstv->off;
+        len = min_size(src_len, dst_len);
+
+        res = fuse_buf_copy_one(dst, dstv->off, src, srcv->off, len, flags);
+        if (res < 0) {
+            if (!copied) {
+                return res;
+            }
+            break;
+        }
+        copied += res;
+
+        if (!fuse_bufvec_advance(srcv, res) ||
+            !fuse_bufvec_advance(dstv, res)) {
+            break;
+        }
+
+        if (res < len) {
+            break;
+        }
+    }
+
+    return copied;
 }
diff --git a/tools/virtiofsd/fuse.h b/tools/virtiofsd/fuse.h
index 6c16a0041d..945ebc7a0d 100644
--- a/tools/virtiofsd/fuse.h
+++ b/tools/virtiofsd/fuse.h
@@ -1,15 +1,15 @@
 /*
-  FUSE: Filesystem in Userspace
-  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
-
-  This program can be distributed under the terms of the GNU LGPLv2.
-  See the file COPYING.LIB.
-*/
+ * FUSE: Filesystem in Userspace
+ * Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
+ *
+ * This program can be distributed under the terms of the GNU LGPLv2.
+ * See the file COPYING.LIB.
+ */
 
 #ifndef FUSE_H_
 #define FUSE_H_
 
-/** @file
+/*
  *
  * This file defines the library interface of FUSE
  *
@@ -19,15 +19,15 @@
 #include "fuse_common.h"
 
 #include <fcntl.h>
-#include <time.h>
-#include <sys/types.h>
 #include <sys/stat.h>
 #include <sys/statvfs.h>
+#include <sys/types.h>
 #include <sys/uio.h>
+#include <time.h>
 
-/* ----------------------------------------------------------- *
- * Basic FUSE API					       *
- * ----------------------------------------------------------- */
+/*
+ * Basic FUSE API
+ */
 
 /** Handle for a FUSE filesystem */
 struct fuse;
@@ -36,38 +36,39 @@ struct fuse;
  * Readdir flags, passed to ->readdir()
  */
 enum fuse_readdir_flags {
-	/**
-	 * "Plus" mode.
-	 *
-	 * The kernel wants to prefill the inode cache during readdir.  The
-	 * filesystem may honour this by filling in the attributes and setting
-	 * FUSE_FILL_DIR_FLAGS for the filler function.  The filesystem may also
-	 * just ignore this flag completely.
-	 */
-	FUSE_READDIR_PLUS = (1 << 0),
+    /**
+     * "Plus" mode.
+     *
+     * The kernel wants to prefill the inode cache during readdir.  The
+     * filesystem may honour this by filling in the attributes and setting
+     * FUSE_FILL_DIR_FLAGS for the filler function.  The filesystem may also
+     * just ignore this flag completely.
+     */
+    FUSE_READDIR_PLUS = (1 << 0),
 };
 
 enum fuse_fill_dir_flags {
-	/**
-	 * "Plus" mode: all file attributes are valid
-	 *
-	 * The attributes are used by the kernel to prefill the inode cache
-	 * during a readdir.
-	 *
-	 * It is okay to set FUSE_FILL_DIR_PLUS if FUSE_READDIR_PLUS is not set
-	 * and vice versa.
-	 */
-	FUSE_FILL_DIR_PLUS = (1 << 1),
+    /**
+     * "Plus" mode: all file attributes are valid
+     *
+     * The attributes are used by the kernel to prefill the inode cache
+     * during a readdir.
+     *
+     * It is okay to set FUSE_FILL_DIR_PLUS if FUSE_READDIR_PLUS is not set
+     * and vice versa.
+     */
+    FUSE_FILL_DIR_PLUS = (1 << 1),
 };
 
-/** Function to add an entry in a readdir() operation
+/**
+ * Function to add an entry in a readdir() operation
  *
  * The *off* parameter can be any non-zero value that enables the
  * filesystem to identify the current point in the directory
  * stream. It does not need to be the actual physical position. A
  * value of zero is reserved to indicate that seeking in directories
  * is not supported.
- * 
+ *
  * @param buf the buffer passed to the readdir() operation
  * @param name the file name of the directory entry
  * @param stat file attributes, can be NULL
@@ -75,9 +76,9 @@ enum fuse_fill_dir_flags {
  * @param flags fill flags
  * @return 1 if buffer is full, zero otherwise
  */
-typedef int (*fuse_fill_dir_t) (void *buf, const char *name,
-				const struct stat *stbuf, off_t off,
-				enum fuse_fill_dir_flags flags);
+typedef int (*fuse_fill_dir_t)(void *buf, const char *name,
+                               const struct stat *stbuf, off_t off,
+                               enum fuse_fill_dir_flags flags);
 /**
  * Configuration of the high-level API
  *
@@ -87,186 +88,186 @@ typedef int (*fuse_fill_dir_t) (void *buf, const char *name,
  * file system implementation.
  */
 struct fuse_config {
-	/**
-	 * If `set_gid` is non-zero, the st_gid attribute of each file
-	 * is overwritten with the value of `gid`.
-	 */
-	int set_gid;
-	unsigned int gid;
-
-	/**
-	 * If `set_uid` is non-zero, the st_uid attribute of each file
-	 * is overwritten with the value of `uid`.
-	 */
-	int set_uid;
-	unsigned int uid;
-
-	/**
-	 * If `set_mode` is non-zero, the any permissions bits set in
-	 * `umask` are unset in the st_mode attribute of each file.
-	 */
-	int set_mode;
-	unsigned int umask;
-
-	/**
-	 * The timeout in seconds for which name lookups will be
-	 * cached.
-	 */
-	double entry_timeout;
-
-	/**
-	 * The timeout in seconds for which a negative lookup will be
-	 * cached. This means, that if file did not exist (lookup
-	 * retuned ENOENT), the lookup will only be redone after the
-	 * timeout, and the file/directory will be assumed to not
-	 * exist until then. A value of zero means that negative
-	 * lookups are not cached.
-	 */
-	double negative_timeout;
-
-	/**
-	 * The timeout in seconds for which file/directory attributes
-	 * (as returned by e.g. the `getattr` handler) are cached.
-	 */
-	double attr_timeout;
-
-	/**
-	 * Allow requests to be interrupted
-	 */
-	int intr;
-
-	/**
-	 * Specify which signal number to send to the filesystem when
-	 * a request is interrupted.  The default is hardcoded to
-	 * USR1.
-	 */
-	int intr_signal;
-
-	/**
-	 * Normally, FUSE assigns inodes to paths only for as long as
-	 * the kernel is aware of them. With this option inodes are
-	 * instead remembered for at least this many seconds.  This
-	 * will require more memory, but may be necessary when using
-	 * applications that make use of inode numbers.
-	 *
-	 * A number of -1 means that inodes will be remembered for the
-	 * entire life-time of the file-system process.
-	 */
-	int remember;
-
-	/**
-	 * The default behavior is that if an open file is deleted,
-	 * the file is renamed to a hidden file (.fuse_hiddenXXX), and
-	 * only removed when the file is finally released.  This
-	 * relieves the filesystem implementation of having to deal
-	 * with this problem. This option disables the hiding
-	 * behavior, and files are removed immediately in an unlink
-	 * operation (or in a rename operation which overwrites an
-	 * existing file).
-	 *
-	 * It is recommended that you not use the hard_remove
-	 * option. When hard_remove is set, the following libc
-	 * functions fail on unlinked files (returning errno of
-	 * ENOENT): read(2), write(2), fsync(2), close(2), f*xattr(2),
-	 * ftruncate(2), fstat(2), fchmod(2), fchown(2)
-	 */
-	int hard_remove;
-
-	/**
-	 * Honor the st_ino field in the functions getattr() and
-	 * fill_dir(). This value is used to fill in the st_ino field
-	 * in the stat(2), lstat(2), fstat(2) functions and the d_ino
-	 * field in the readdir(2) function. The filesystem does not
-	 * have to guarantee uniqueness, however some applications
-	 * rely on this value being unique for the whole filesystem.
-	 *
-	 * Note that this does *not* affect the inode that libfuse 
-	 * and the kernel use internally (also called the "nodeid").
-	 */
-	int use_ino;
-
-	/**
-	 * If use_ino option is not given, still try to fill in the
-	 * d_ino field in readdir(2). If the name was previously
-	 * looked up, and is still in the cache, the inode number
-	 * found there will be used.  Otherwise it will be set to -1.
-	 * If use_ino option is given, this option is ignored.
-	 */
-	int readdir_ino;
-
-	/**
-	 * This option disables the use of page cache (file content cache)
-	 * in the kernel for this filesystem. This has several affects:
-	 *
-	 * 1. Each read(2) or write(2) system call will initiate one
-	 *    or more read or write operations, data will not be
-	 *    cached in the kernel.
-	 *
-	 * 2. The return value of the read() and write() system calls
-	 *    will correspond to the return values of the read and
-	 *    write operations. This is useful for example if the
-	 *    file size is not known in advance (before reading it).
-	 *
-	 * Internally, enabling this option causes fuse to set the
-	 * `direct_io` field of `struct fuse_file_info` - overwriting
-	 * any value that was put there by the file system.
-	 */
-	int direct_io;
-
-	/**
-	 * This option disables flushing the cache of the file
-	 * contents on every open(2).  This should only be enabled on
-	 * filesystems where the file data is never changed
-	 * externally (not through the mounted FUSE filesystem).  Thus
-	 * it is not suitable for network filesystems and other
-	 * intermediate filesystems.
-	 *
-	 * NOTE: if this option is not specified (and neither
-	 * direct_io) data is still cached after the open(2), so a
-	 * read(2) system call will not always initiate a read
-	 * operation.
-	 *
-	 * Internally, enabling this option causes fuse to set the
-	 * `keep_cache` field of `struct fuse_file_info` - overwriting
-	 * any value that was put there by the file system.
-	 */
-	int kernel_cache;
-
-	/**
-	 * This option is an alternative to `kernel_cache`. Instead of
-	 * unconditionally keeping cached data, the cached data is
-	 * invalidated on open(2) if if the modification time or the
-	 * size of the file has changed since it was last opened.
-	 */
-	int auto_cache;
-
-	/**
-	 * The timeout in seconds for which file attributes are cached
-	 * for the purpose of checking if auto_cache should flush the
-	 * file data on open.
-	 */
-	int ac_attr_timeout_set;
-	double ac_attr_timeout;
-
-	/**
-	 * If this option is given the file-system handlers for the
-	 * following operations will not receive path information:
-	 * read, write, flush, release, fsync, readdir, releasedir,
-	 * fsyncdir, lock, ioctl and poll.
-	 *
-	 * For the truncate, getattr, chmod, chown and utimens
-	 * operations the path will be provided only if the struct
-	 * fuse_file_info argument is NULL.
-	 */
-	int nullpath_ok;
-
-	/**
-	 * The remaining options are used by libfuse internally and
-	 * should not be touched.
-	 */
-	int show_help;
-	char *modules;
-	int debug;
+    /**
+     * If `set_gid` is non-zero, the st_gid attribute of each file
+     * is overwritten with the value of `gid`.
+     */
+    int set_gid;
+    unsigned int gid;
+
+    /**
+     * If `set_uid` is non-zero, the st_uid attribute of each file
+     * is overwritten with the value of `uid`.
+     */
+    int set_uid;
+    unsigned int uid;
+
+    /**
+     * If `set_mode` is non-zero, any permission bits set in
+     * `umask` are unset in the st_mode attribute of each file.
+     */
+    int set_mode;
+    unsigned int umask;
+
+    /**
+     * The timeout in seconds for which name lookups will be
+     * cached.
+     */
+    double entry_timeout;
+
+    /**
+     * The timeout in seconds for which a negative lookup will be
+     * cached. This means that if the file did not exist (lookup
+     * returned ENOENT), the lookup will only be redone after the
+     * timeout, and the file/directory will be assumed to not
+     * exist until then. A value of zero means that negative
+     * lookups are not cached.
+     */
+    double negative_timeout;
+
+    /**
+     * The timeout in seconds for which file/directory attributes
+     * (as returned by e.g. the `getattr` handler) are cached.
+     */
+    double attr_timeout;
+
+    /**
+     * Allow requests to be interrupted
+     */
+    int intr;
+
+    /**
+     * Specify which signal number to send to the filesystem when
+     * a request is interrupted.  The default is hardcoded to
+     * USR1.
+     */
+    int intr_signal;
+
+    /**
+     * Normally, FUSE assigns inodes to paths only for as long as
+     * the kernel is aware of them. With this option inodes are
+     * instead remembered for at least this many seconds.  This
+     * will require more memory, but may be necessary when using
+     * applications that make use of inode numbers.
+     *
+     * A number of -1 means that inodes will be remembered for the
+     * entire life-time of the file-system process.
+     */
+    int remember;
+
+    /**
+     * The default behavior is that if an open file is deleted,
+     * the file is renamed to a hidden file (.fuse_hiddenXXX), and
+     * only removed when the file is finally released.  This
+     * relieves the filesystem implementation of having to deal
+     * with this problem. This option disables the hiding
+     * behavior, and files are removed immediately in an unlink
+     * operation (or in a rename operation which overwrites an
+     * existing file).
+     *
+     * It is recommended that you not use the hard_remove
+     * option. When hard_remove is set, the following libc
+     * functions fail on unlinked files (returning errno of
+     * ENOENT): read(2), write(2), fsync(2), close(2), f*xattr(2),
+     * ftruncate(2), fstat(2), fchmod(2), fchown(2)
+     */
+    int hard_remove;
+
+    /**
+     * Honor the st_ino field in the functions getattr() and
+     * fill_dir(). This value is used to fill in the st_ino field
+     * in the stat(2), lstat(2), fstat(2) functions and the d_ino
+     * field in the readdir(2) function. The filesystem does not
+     * have to guarantee uniqueness, however some applications
+     * rely on this value being unique for the whole filesystem.
+     *
+     * Note that this does *not* affect the inode that libfuse
+     * and the kernel use internally (also called the "nodeid").
+     */
+    int use_ino;
+
+    /**
+     * If use_ino option is not given, still try to fill in the
+     * d_ino field in readdir(2). If the name was previously
+     * looked up, and is still in the cache, the inode number
+     * found there will be used.  Otherwise it will be set to -1.
+     * If use_ino option is given, this option is ignored.
+     */
+    int readdir_ino;
+
+    /**
+     * This option disables the use of page cache (file content cache)
+     * in the kernel for this filesystem. This has several effects:
+     *
+     * 1. Each read(2) or write(2) system call will initiate one
+     *    or more read or write operations, data will not be
+     *    cached in the kernel.
+     *
+     * 2. The return value of the read() and write() system calls
+     *    will correspond to the return values of the read and
+     *    write operations. This is useful for example if the
+     *    file size is not known in advance (before reading it).
+     *
+     * Internally, enabling this option causes fuse to set the
+     * `direct_io` field of `struct fuse_file_info` - overwriting
+     * any value that was put there by the file system.
+     */
+    int direct_io;
+
+    /**
+     * This option disables flushing the cache of the file
+     * contents on every open(2).  This should only be enabled on
+     * filesystems where the file data is never changed
+     * externally (not through the mounted FUSE filesystem).  Thus
+     * it is not suitable for network filesystems and other
+     * intermediate filesystems.
+     *
+     * NOTE: if this option is not specified (and neither
+     * direct_io) data is still cached after the open(2), so a
+     * read(2) system call will not always initiate a read
+     * operation.
+     *
+     * Internally, enabling this option causes fuse to set the
+     * `keep_cache` field of `struct fuse_file_info` - overwriting
+     * any value that was put there by the file system.
+     */
+    int kernel_cache;
+
+    /**
+     * This option is an alternative to `kernel_cache`. Instead of
+     * unconditionally keeping cached data, the cached data is
+     * invalidated on open(2) if the modification time or the
+     * size of the file has changed since it was last opened.
+     */
+    int auto_cache;
+
+    /**
+     * The timeout in seconds for which file attributes are cached
+     * for the purpose of checking if auto_cache should flush the
+     * file data on open.
+     */
+    int ac_attr_timeout_set;
+    double ac_attr_timeout;
+
+    /**
+     * If this option is given the file-system handlers for the
+     * following operations will not receive path information:
+     * read, write, flush, release, fsync, readdir, releasedir,
+     * fsyncdir, lock, ioctl and poll.
+     *
+     * For the truncate, getattr, chmod, chown and utimens
+     * operations the path will be provided only if the struct
+     * fuse_file_info argument is NULL.
+     */
+    int nullpath_ok;
+
+    /**
+     * The remaining options are used by libfuse internally and
+     * should not be touched.
+     */
+    int show_help;
+    char *modules;
+    int debug;
 };
 
 
@@ -293,515 +294,535 @@ struct fuse_config {
  * Almost all operations take a path which can be of any length.
  */
 struct fuse_operations {
-	/** Get file attributes.
-	 *
-	 * Similar to stat().  The 'st_dev' and 'st_blksize' fields are
-	 * ignored. The 'st_ino' field is ignored except if the 'use_ino'
-	 * mount option is given. In that case it is passed to userspace,
-	 * but libfuse and the kernel will still assign a different
-	 * inode for internal use (called the "nodeid").
-	 *
-	 * `fi` will always be NULL if the file is not currently open, but
-	 * may also be NULL if the file is open.
-	 */
-	int (*getattr) (const char *, struct stat *, struct fuse_file_info *fi);
-
-	/** Read the target of a symbolic link
-	 *
-	 * The buffer should be filled with a null terminated string.  The
-	 * buffer size argument includes the space for the terminating
-	 * null character.	If the linkname is too long to fit in the
-	 * buffer, it should be truncated.	The return value should be 0
-	 * for success.
-	 */
-	int (*readlink) (const char *, char *, size_t);
-
-	/** Create a file node
-	 *
-	 * This is called for creation of all non-directory, non-symlink
-	 * nodes.  If the filesystem defines a create() method, then for
-	 * regular files that will be called instead.
-	 */
-	int (*mknod) (const char *, mode_t, dev_t);
-
-	/** Create a directory
-	 *
-	 * Note that the mode argument may not have the type specification
-	 * bits set, i.e. S_ISDIR(mode) can be false.  To obtain the
-	 * correct directory type bits use  mode|S_IFDIR
-	 * */
-	int (*mkdir) (const char *, mode_t);
-
-	/** Remove a file */
-	int (*unlink) (const char *);
-
-	/** Remove a directory */
-	int (*rmdir) (const char *);
-
-	/** Create a symbolic link */
-	int (*symlink) (const char *, const char *);
-
-	/** Rename a file
-	 *
-	 * *flags* may be `RENAME_EXCHANGE` or `RENAME_NOREPLACE`. If
-	 * RENAME_NOREPLACE is specified, the filesystem must not
-	 * overwrite *newname* if it exists and return an error
-	 * instead. If `RENAME_EXCHANGE` is specified, the filesystem
-	 * must atomically exchange the two files, i.e. both must
-	 * exist and neither may be deleted.
-	 */
-	int (*rename) (const char *, const char *, unsigned int flags);
-
-	/** Create a hard link to a file */
-	int (*link) (const char *, const char *);
-
-	/** Change the permission bits of a file
-	 *
-	 * `fi` will always be NULL if the file is not currenlty open, but
-	 * may also be NULL if the file is open.
-	 */
-	int (*chmod) (const char *, mode_t, struct fuse_file_info *fi);
-
-	/** Change the owner and group of a file
-	 *
-	 * `fi` will always be NULL if the file is not currenlty open, but
-	 * may also be NULL if the file is open.
-	 *
-	 * Unless FUSE_CAP_HANDLE_KILLPRIV is disabled, this method is
-	 * expected to reset the setuid and setgid bits.
-	 */
-	int (*chown) (const char *, uid_t, gid_t, struct fuse_file_info *fi);
-
-	/** Change the size of a file
-	 *
-	 * `fi` will always be NULL if the file is not currenlty open, but
-	 * may also be NULL if the file is open.
-	 *
-	 * Unless FUSE_CAP_HANDLE_KILLPRIV is disabled, this method is
-	 * expected to reset the setuid and setgid bits.
-	 */
-	int (*truncate) (const char *, off_t, struct fuse_file_info *fi);
-
-	/** Open a file
-	 *
-	 * Open flags are available in fi->flags. The following rules
-	 * apply.
-	 *
-	 *  - Creation (O_CREAT, O_EXCL, O_NOCTTY) flags will be
-	 *    filtered out / handled by the kernel.
-	 *
-	 *  - Access modes (O_RDONLY, O_WRONLY, O_RDWR, O_EXEC, O_SEARCH)
-	 *    should be used by the filesystem to check if the operation is
-	 *    permitted.  If the ``-o default_permissions`` mount option is
-	 *    given, this check is already done by the kernel before calling
-	 *    open() and may thus be omitted by the filesystem.
-	 *
-	 *  - When writeback caching is enabled, the kernel may send
-	 *    read requests even for files opened with O_WRONLY. The
-	 *    filesystem should be prepared to handle this.
-	 *
-	 *  - When writeback caching is disabled, the filesystem is
-	 *    expected to properly handle the O_APPEND flag and ensure
-	 *    that each write is appending to the end of the file.
-	 * 
-         *  - When writeback caching is enabled, the kernel will
-	 *    handle O_APPEND. However, unless all changes to the file
-	 *    come through the kernel this will not work reliably. The
-	 *    filesystem should thus either ignore the O_APPEND flag
-	 *    (and let the kernel handle it), or return an error
-	 *    (indicating that reliably O_APPEND is not available).
-	 *
-	 * Filesystem may store an arbitrary file handle (pointer,
-	 * index, etc) in fi->fh, and use this in other all other file
-	 * operations (read, write, flush, release, fsync).
-	 *
-	 * Filesystem may also implement stateless file I/O and not store
-	 * anything in fi->fh.
-	 *
-	 * There are also some flags (direct_io, keep_cache) which the
-	 * filesystem may set in fi, to change the way the file is opened.
-	 * See fuse_file_info structure in <fuse_common.h> for more details.
-	 *
-	 * If this request is answered with an error code of ENOSYS
-	 * and FUSE_CAP_NO_OPEN_SUPPORT is set in
-	 * `fuse_conn_info.capable`, this is treated as success and
-	 * future calls to open will also succeed without being send
-	 * to the filesystem process.
-	 *
-	 */
-	int (*open) (const char *, struct fuse_file_info *);
-
-	/** Read data from an open file
-	 *
-	 * Read should return exactly the number of bytes requested except
-	 * on EOF or error, otherwise the rest of the data will be
-	 * substituted with zeroes.	 An exception to this is when the
-	 * 'direct_io' mount option is specified, in which case the return
-	 * value of the read system call will reflect the return value of
-	 * this operation.
-	 */
-	int (*read) (const char *, char *, size_t, off_t,
-		     struct fuse_file_info *);
-
-	/** Write data to an open file
-	 *
-	 * Write should return exactly the number of bytes requested
-	 * except on error.	 An exception to this is when the 'direct_io'
-	 * mount option is specified (see read operation).
-	 *
-	 * Unless FUSE_CAP_HANDLE_KILLPRIV is disabled, this method is
-	 * expected to reset the setuid and setgid bits.
-	 */
-	int (*write) (const char *, const char *, size_t, off_t,
-		      struct fuse_file_info *);
-
-	/** Get file system statistics
-	 *
-	 * The 'f_favail', 'f_fsid' and 'f_flag' fields are ignored
-	 */
-	int (*statfs) (const char *, struct statvfs *);
-
-	/** Possibly flush cached data
-	 *
-	 * BIG NOTE: This is not equivalent to fsync().  It's not a
-	 * request to sync dirty data.
-	 *
-	 * Flush is called on each close() of a file descriptor, as opposed to
-	 * release which is called on the close of the last file descriptor for
-	 * a file.  Under Linux, errors returned by flush() will be passed to 
-	 * userspace as errors from close(), so flush() is a good place to write
-	 * back any cached dirty data. However, many applications ignore errors 
-	 * on close(), and on non-Linux systems, close() may succeed even if flush()
-	 * returns an error. For these reasons, filesystems should not assume
-	 * that errors returned by flush will ever be noticed or even
-	 * delivered.
-	 *
-	 * NOTE: The flush() method may be called more than once for each
-	 * open().  This happens if more than one file descriptor refers to an
-	 * open file handle, e.g. due to dup(), dup2() or fork() calls.  It is
-	 * not possible to determine if a flush is final, so each flush should
-	 * be treated equally.  Multiple write-flush sequences are relatively
-	 * rare, so this shouldn't be a problem.
-	 *
-	 * Filesystems shouldn't assume that flush will be called at any
-	 * particular point.  It may be called more times than expected, or not
-	 * at all.
-	 *
-	 * [close]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/close.html
-	 */
-	int (*flush) (const char *, struct fuse_file_info *);
-
-	/** Release an open file
-	 *
-	 * Release is called when there are no more references to an open
-	 * file: all file descriptors are closed and all memory mappings
-	 * are unmapped.
-	 *
-	 * For every open() call there will be exactly one release() call
-	 * with the same flags and file handle.  It is possible to
-	 * have a file opened more than once, in which case only the last
-	 * release will mean, that no more reads/writes will happen on the
-	 * file.  The return value of release is ignored.
-	 */
-	int (*release) (const char *, struct fuse_file_info *);
-
-	/** Synchronize file contents
-	 *
-	 * If the datasync parameter is non-zero, then only the user data
-	 * should be flushed, not the meta data.
-	 */
-	int (*fsync) (const char *, int, struct fuse_file_info *);
-
-	/** Set extended attributes */
-	int (*setxattr) (const char *, const char *, const char *, size_t, int);
-
-	/** Get extended attributes */
-	int (*getxattr) (const char *, const char *, char *, size_t);
-
-	/** List extended attributes */
-	int (*listxattr) (const char *, char *, size_t);
-
-	/** Remove extended attributes */
-	int (*removexattr) (const char *, const char *);
-
-	/** Open directory
-	 *
-	 * Unless the 'default_permissions' mount option is given,
-	 * this method should check if opendir is permitted for this
-	 * directory. Optionally opendir may also return an arbitrary
-	 * filehandle in the fuse_file_info structure, which will be
-	 * passed to readdir, releasedir and fsyncdir.
-	 */
-	int (*opendir) (const char *, struct fuse_file_info *);
-
-	/** Read directory
-	 *
-	 * The filesystem may choose between two modes of operation:
-	 *
-	 * 1) The readdir implementation ignores the offset parameter, and
-	 * passes zero to the filler function's offset.  The filler
-	 * function will not return '1' (unless an error happens), so the
-	 * whole directory is read in a single readdir operation.
-	 *
-	 * 2) The readdir implementation keeps track of the offsets of the
-	 * directory entries.  It uses the offset parameter and always
-	 * passes non-zero offset to the filler function.  When the buffer
-	 * is full (or an error happens) the filler function will return
-	 * '1'.
-	 */
-	int (*readdir) (const char *, void *, fuse_fill_dir_t, off_t,
-			struct fuse_file_info *, enum fuse_readdir_flags);
-
-	/** Release directory
-	 */
-	int (*releasedir) (const char *, struct fuse_file_info *);
-
-	/** Synchronize directory contents
-	 *
-	 * If the datasync parameter is non-zero, then only the user data
-	 * should be flushed, not the meta data
-	 */
-	int (*fsyncdir) (const char *, int, struct fuse_file_info *);
-
-	/**
-	 * Initialize filesystem
-	 *
-	 * The return value will passed in the `private_data` field of
-	 * `struct fuse_context` to all file operations, and as a
-	 * parameter to the destroy() method. It overrides the initial
-	 * value provided to fuse_main() / fuse_new().
-	 */
-	void *(*init) (struct fuse_conn_info *conn,
-		       struct fuse_config *cfg);
-
-	/**
-	 * Clean up filesystem
-	 *
-	 * Called on filesystem exit.
-	 */
-	void (*destroy) (void *private_data);
-
-	/**
-	 * Check file access permissions
-	 *
-	 * This will be called for the access() system call.  If the
-	 * 'default_permissions' mount option is given, this method is not
-	 * called.
-	 *
-	 * This method is not called under Linux kernel versions 2.4.x
-	 */
-	int (*access) (const char *, int);
-
-	/**
-	 * Create and open a file
-	 *
-	 * If the file does not exist, first create it with the specified
-	 * mode, and then open it.
-	 *
-	 * If this method is not implemented or under Linux kernel
-	 * versions earlier than 2.6.15, the mknod() and open() methods
-	 * will be called instead.
-	 */
-	int (*create) (const char *, mode_t, struct fuse_file_info *);
-
-	/**
-	 * Perform POSIX file locking operation
-	 *
-	 * The cmd argument will be either F_GETLK, F_SETLK or F_SETLKW.
-	 *
-	 * For the meaning of fields in 'struct flock' see the man page
-	 * for fcntl(2).  The l_whence field will always be set to
-	 * SEEK_SET.
-	 *
-	 * For checking lock ownership, the 'fuse_file_info->owner'
-	 * argument must be used.
-	 *
-	 * For F_GETLK operation, the library will first check currently
-	 * held locks, and if a conflicting lock is found it will return
-	 * information without calling this method.	 This ensures, that
-	 * for local locks the l_pid field is correctly filled in.	The
-	 * results may not be accurate in case of race conditions and in
-	 * the presence of hard links, but it's unlikely that an
-	 * application would rely on accurate GETLK results in these
-	 * cases.  If a conflicting lock is not found, this method will be
-	 * called, and the filesystem may fill out l_pid by a meaningful
-	 * value, or it may leave this field zero.
-	 *
-	 * For F_SETLK and F_SETLKW the l_pid field will be set to the pid
-	 * of the process performing the locking operation.
-	 *
-	 * Note: if this method is not implemented, the kernel will still
-	 * allow file locking to work locally.  Hence it is only
-	 * interesting for network filesystems and similar.
-	 */
-	int (*lock) (const char *, struct fuse_file_info *, int cmd,
-		     struct flock *);
-
-	/**
-	 * Change the access and modification times of a file with
-	 * nanosecond resolution
-	 *
-	 * This supersedes the old utime() interface.  New applications
-	 * should use this.
-	 *
-	 * `fi` will always be NULL if the file is not currenlty open, but
-	 * may also be NULL if the file is open.
-	 *
-	 * See the utimensat(2) man page for details.
-	 */
-	 int (*utimens) (const char *, const struct timespec tv[2],
-			 struct fuse_file_info *fi);
-
-	/**
-	 * Map block index within file to block index within device
-	 *
-	 * Note: This makes sense only for block device backed filesystems
-	 * mounted with the 'blkdev' option
-	 */
-	int (*bmap) (const char *, size_t blocksize, uint64_t *idx);
-
-	/**
-	 * Ioctl
-	 *
-	 * flags will have FUSE_IOCTL_COMPAT set for 32bit ioctls in
-	 * 64bit environment.  The size and direction of data is
-	 * determined by _IOC_*() decoding of cmd.  For _IOC_NONE,
-	 * data will be NULL, for _IOC_WRITE data is out area, for
-	 * _IOC_READ in area and if both are set in/out area.  In all
-	 * non-NULL cases, the area is of _IOC_SIZE(cmd) bytes.
-	 *
-	 * If flags has FUSE_IOCTL_DIR then the fuse_file_info refers to a
-	 * directory file handle.
-	 *
-	 * Note : the unsigned long request submitted by the application
-	 * is truncated to 32 bits.
-	 */
-	int (*ioctl) (const char *, unsigned int cmd, void *arg,
-		      struct fuse_file_info *, unsigned int flags, void *data);
-
-	/**
-	 * Poll for IO readiness events
-	 *
-	 * Note: If ph is non-NULL, the client should notify
-	 * when IO readiness events occur by calling
-	 * fuse_notify_poll() with the specified ph.
-	 *
-	 * Regardless of the number of times poll with a non-NULL ph
-	 * is received, single notification is enough to clear all.
-	 * Notifying more times incurs overhead but doesn't harm
-	 * correctness.
-	 *
-	 * The callee is responsible for destroying ph with
-	 * fuse_pollhandle_destroy() when no longer in use.
-	 */
-	int (*poll) (const char *, struct fuse_file_info *,
-		     struct fuse_pollhandle *ph, unsigned *reventsp);
-
-	/** Write contents of buffer to an open file
-	 *
-	 * Similar to the write() method, but data is supplied in a
-	 * generic buffer.  Use fuse_buf_copy() to transfer data to
-	 * the destination.
-	 *
-	 * Unless FUSE_CAP_HANDLE_KILLPRIV is disabled, this method is
-	 * expected to reset the setuid and setgid bits.
-	 */
-	int (*write_buf) (const char *, struct fuse_bufvec *buf, off_t off,
-			  struct fuse_file_info *);
-
-	/** Store data from an open file in a buffer
-	 *
-	 * Similar to the read() method, but data is stored and
-	 * returned in a generic buffer.
-	 *
-	 * No actual copying of data has to take place, the source
-	 * file descriptor may simply be stored in the buffer for
-	 * later data transfer.
-	 *
-	 * The buffer must be allocated dynamically and stored at the
-	 * location pointed to by bufp.  If the buffer contains memory
-	 * regions, they too must be allocated using malloc().  The
-	 * allocated memory will be freed by the caller.
-	 */
-	int (*read_buf) (const char *, struct fuse_bufvec **bufp,
-			 size_t size, off_t off, struct fuse_file_info *);
-	/**
-	 * Perform BSD file locking operation
-	 *
-	 * The op argument will be either LOCK_SH, LOCK_EX or LOCK_UN
-	 *
-	 * Nonblocking requests will be indicated by ORing LOCK_NB to
-	 * the above operations
-	 *
-	 * For more information see the flock(2) manual page.
-	 *
-	 * Additionally fi->owner will be set to a value unique to
-	 * this open file.  This same value will be supplied to
-	 * ->release() when the file is released.
-	 *
-	 * Note: if this method is not implemented, the kernel will still
-	 * allow file locking to work locally.  Hence it is only
-	 * interesting for network filesystems and similar.
-	 */
-	int (*flock) (const char *, struct fuse_file_info *, int op);
-
-	/**
-	 * Allocates space for an open file
-	 *
-	 * This function ensures that required space is allocated for specified
-	 * file.  If this function returns success then any subsequent write
-	 * request to specified range is guaranteed not to fail because of lack
-	 * of space on the file system media.
-	 */
-	int (*fallocate) (const char *, int, off_t, off_t,
-			  struct fuse_file_info *);
-
-	/**
-	 * Copy a range of data from one file to another
-	 *
-	 * Performs an optimized copy between two file descriptors without the
-	 * additional cost of transferring data through the FUSE kernel module
-	 * to user space (glibc) and then back into the FUSE filesystem again.
-	 *
-	 * In case this method is not implemented, glibc falls back to reading
-	 * data from the source and writing to the destination. Effectively
-	 * doing an inefficient copy of the data.
-	 */
-	ssize_t (*copy_file_range) (const char *path_in,
-				    struct fuse_file_info *fi_in,
-				    off_t offset_in, const char *path_out,
-				    struct fuse_file_info *fi_out,
-				    off_t offset_out, size_t size, int flags);
-
-	/**
-	 * Find next data or hole after the specified offset
-	 */
-	off_t (*lseek) (const char *, off_t off, int whence, struct fuse_file_info *);
+    /**
+     * Get file attributes.
+     *
+     * Similar to stat().  The 'st_dev' and 'st_blksize' fields are
+     * ignored. The 'st_ino' field is ignored except if the 'use_ino'
+     * mount option is given. In that case it is passed to userspace,
+     * but libfuse and the kernel will still assign a different
+     * inode for internal use (called the "nodeid").
+     *
+     * `fi` will always be NULL if the file is not currently open, but
+     * may also be NULL if the file is open.
+     */
+    int (*getattr)(const char *, struct stat *, struct fuse_file_info *fi);
+
+    /**
+     * Read the target of a symbolic link
+     *
+     * The buffer should be filled with a null terminated string.  The
+     * buffer size argument includes the space for the terminating
+     * null character. If the linkname is too long to fit in the
+     * buffer, it should be truncated. The return value should be 0
+     * for success.
+     */
+    int (*readlink)(const char *, char *, size_t);
+
+    /**
+     * Create a file node
+     *
+     * This is called for creation of all non-directory, non-symlink
+     * nodes.  If the filesystem defines a create() method, then for
+     * regular files that will be called instead.
+     */
+    int (*mknod)(const char *, mode_t, dev_t);
+
+    /**
+     * Create a directory
+     *
+     * Note that the mode argument may not have the type specification
+     * bits set, i.e. S_ISDIR(mode) can be false.  To obtain the
+     * correct directory type bits use  mode|S_IFDIR
+     */
+    int (*mkdir)(const char *, mode_t);
+
+    /** Remove a file */
+    int (*unlink)(const char *);
+
+    /** Remove a directory */
+    int (*rmdir)(const char *);
+
+    /** Create a symbolic link */
+    int (*symlink)(const char *, const char *);
+
+    /**
+     * Rename a file
+     *
+     * *flags* may be `RENAME_EXCHANGE` or `RENAME_NOREPLACE`. If
+     * RENAME_NOREPLACE is specified, the filesystem must not
+     * overwrite *newname* if it exists and return an error
+     * instead. If `RENAME_EXCHANGE` is specified, the filesystem
+     * must atomically exchange the two files, i.e. both must
+     * exist and neither may be deleted.
+     */
+    int (*rename)(const char *, const char *, unsigned int flags);
+
+    /** Create a hard link to a file */
+    int (*link)(const char *, const char *);
+
+    /**
+     * Change the permission bits of a file
+     *
+     * `fi` will always be NULL if the file is not currently open, but
+     * may also be NULL if the file is open.
+     */
+    int (*chmod)(const char *, mode_t, struct fuse_file_info *fi);
+
+    /**
+     * Change the owner and group of a file
+     *
+     * `fi` will always be NULL if the file is not currently open, but
+     * may also be NULL if the file is open.
+     *
+     * Unless FUSE_CAP_HANDLE_KILLPRIV is disabled, this method is
+     * expected to reset the setuid and setgid bits.
+     */
+    int (*chown)(const char *, uid_t, gid_t, struct fuse_file_info *fi);
+
+    /**
+     * Change the size of a file
+     *
+     * `fi` will always be NULL if the file is not currently open, but
+     * may also be NULL if the file is open.
+     *
+     * Unless FUSE_CAP_HANDLE_KILLPRIV is disabled, this method is
+     * expected to reset the setuid and setgid bits.
+     */
+    int (*truncate)(const char *, off_t, struct fuse_file_info *fi);
+
+    /**
+     * Open a file
+     *
+     * Open flags are available in fi->flags. The following rules
+     * apply.
+     *
+     *  - Creation (O_CREAT, O_EXCL, O_NOCTTY) flags will be
+     *    filtered out / handled by the kernel.
+     *
+     *  - Access modes (O_RDONLY, O_WRONLY, O_RDWR, O_EXEC, O_SEARCH)
+     *    should be used by the filesystem to check if the operation is
+     *    permitted.  If the ``-o default_permissions`` mount option is
+     *    given, this check is already done by the kernel before calling
+     *    open() and may thus be omitted by the filesystem.
+     *
+     *  - When writeback caching is enabled, the kernel may send
+     *    read requests even for files opened with O_WRONLY. The
+     *    filesystem should be prepared to handle this.
+     *
+     *  - When writeback caching is disabled, the filesystem is
+     *    expected to properly handle the O_APPEND flag and ensure
+     *    that each write is appending to the end of the file.
+     *
+     *  - When writeback caching is enabled, the kernel will
+     *    handle O_APPEND. However, unless all changes to the file
+     *    come through the kernel this will not work reliably. The
+     *    filesystem should thus either ignore the O_APPEND flag
+     *    (and let the kernel handle it), or return an error
+     *    (indicating that reliable O_APPEND is not available).
+     *
+     * Filesystem may store an arbitrary file handle (pointer,
+     * index, etc) in fi->fh, and use this in all other file
+     * operations (read, write, flush, release, fsync).
+     *
+     * Filesystem may also implement stateless file I/O and not store
+     * anything in fi->fh.
+     *
+     * There are also some flags (direct_io, keep_cache) which the
+     * filesystem may set in fi, to change the way the file is opened.
+     * See fuse_file_info structure in <fuse_common.h> for more details.
+     *
+     * If this request is answered with an error code of ENOSYS
+     * and FUSE_CAP_NO_OPEN_SUPPORT is set in
+     * `fuse_conn_info.capable`, this is treated as success and
+     * future calls to open will also succeed without being sent
+     * to the filesystem process.
+     *
+     */
+    int (*open)(const char *, struct fuse_file_info *);
+
+    /**
+     * Read data from an open file
+     *
+     * Read should return exactly the number of bytes requested except
+     * on EOF or error, otherwise the rest of the data will be
+     * substituted with zeroes.  An exception to this is when the
+     * 'direct_io' mount option is specified, in which case the return
+     * value of the read system call will reflect the return value of
+     * this operation.
+     */
+    int (*read)(const char *, char *, size_t, off_t, struct fuse_file_info *);
+
+    /**
+     * Write data to an open file
+     *
+     * Write should return exactly the number of bytes requested
+     * except on error.  An exception to this is when the 'direct_io'
+     * mount option is specified (see read operation).
+     *
+     * Unless FUSE_CAP_HANDLE_KILLPRIV is disabled, this method is
+     * expected to reset the setuid and setgid bits.
+     */
+    int (*write)(const char *, const char *, size_t, off_t,
+                 struct fuse_file_info *);
+
+    /**
+     * Get file system statistics
+     *
+     * The 'f_favail', 'f_fsid' and 'f_flag' fields are ignored
+     */
+    int (*statfs)(const char *, struct statvfs *);
+
+    /**
+     * Possibly flush cached data
+     *
+     * BIG NOTE: This is not equivalent to fsync().  It's not a
+     * request to sync dirty data.
+     *
+     * Flush is called on each close() of a file descriptor, as opposed to
+     * release which is called on the close of the last file descriptor for
+     * a file.  Under Linux, errors returned by flush() will be passed to
+     * userspace as errors from close(), so flush() is a good place to write
+     * back any cached dirty data. However, many applications ignore errors
+     * on close(), and on non-Linux systems, close() may succeed even if flush()
+     * returns an error. For these reasons, filesystems should not assume
+     * that errors returned by flush will ever be noticed or even
+     * delivered.
+     *
+     * NOTE: The flush() method may be called more than once for each
+     * open().  This happens if more than one file descriptor refers to an
+     * open file handle, e.g. due to dup(), dup2() or fork() calls.  It is
+     * not possible to determine if a flush is final, so each flush should
+     * be treated equally.  Multiple write-flush sequences are relatively
+     * rare, so this shouldn't be a problem.
+     *
+     * Filesystems shouldn't assume that flush will be called at any
+     * particular point.  It may be called more times than expected, or not
+     * at all.
+     *
+     * [close]:
+     * http://pubs.opengroup.org/onlinepubs/9699919799/functions/close.html
+     */
+    int (*flush)(const char *, struct fuse_file_info *);
+
+    /**
+     * Release an open file
+     *
+     * Release is called when there are no more references to an open
+     * file: all file descriptors are closed and all memory mappings
+     * are unmapped.
+     *
+     * For every open() call there will be exactly one release() call
+     * with the same flags and file handle.  It is possible to
+     * have a file opened more than once, in which case only the last
+     * release will mean that no more reads/writes will happen on the
+     * file.  The return value of release is ignored.
+     */
+    int (*release)(const char *, struct fuse_file_info *);
+
+    /*
+     * Synchronize file contents
+     *
+     * If the datasync parameter is non-zero, then only the user data
+     * should be flushed, not the meta data.
+     */
+    int (*fsync)(const char *, int, struct fuse_file_info *);
+
+    /** Set extended attributes */
+    int (*setxattr)(const char *, const char *, const char *, size_t, int);
+
+    /** Get extended attributes */
+    int (*getxattr)(const char *, const char *, char *, size_t);
+
+    /** List extended attributes */
+    int (*listxattr)(const char *, char *, size_t);
+
+    /** Remove extended attributes */
+    int (*removexattr)(const char *, const char *);
+
+    /*
+     * Open directory
+     *
+     * Unless the 'default_permissions' mount option is given,
+     * this method should check if opendir is permitted for this
+     * directory. Optionally opendir may also return an arbitrary
+     * filehandle in the fuse_file_info structure, which will be
+     * passed to readdir, releasedir and fsyncdir.
+     */
+    int (*opendir)(const char *, struct fuse_file_info *);
+
+    /*
+     * Read directory
+     *
+     * The filesystem may choose between two modes of operation:
+     *
+     * 1) The readdir implementation ignores the offset parameter, and
+     * passes zero to the filler function's offset.  The filler
+     * function will not return '1' (unless an error happens), so the
+     * whole directory is read in a single readdir operation.
+     *
+     * 2) The readdir implementation keeps track of the offsets of the
+     * directory entries.  It uses the offset parameter and always
+     * passes non-zero offset to the filler function.  When the buffer
+     * is full (or an error happens) the filler function will return
+     * '1'.
+     */
+    int (*readdir)(const char *, void *, fuse_fill_dir_t, off_t,
+                   struct fuse_file_info *, enum fuse_readdir_flags);
+
+    /**
+     *  Release directory
+     */
+    int (*releasedir)(const char *, struct fuse_file_info *);
+
+    /**
+     * Synchronize directory contents
+     *
+     * If the datasync parameter is non-zero, then only the user data
+     * should be flushed, not the meta data
+     */
+    int (*fsyncdir)(const char *, int, struct fuse_file_info *);
+
+    /**
+     * Initialize filesystem
+     *
+     * The return value will be passed in the `private_data` field of
+     * `struct fuse_context` to all file operations, and as a
+     * parameter to the destroy() method. It overrides the initial
+     * value provided to fuse_main() / fuse_new().
+     */
+    void *(*init)(struct fuse_conn_info *conn, struct fuse_config *cfg);
+
+    /**
+     * Clean up filesystem
+     *
+     * Called on filesystem exit.
+     */
+    void (*destroy)(void *private_data);
+
+    /**
+     * Check file access permissions
+     *
+     * This will be called for the access() system call.  If the
+     * 'default_permissions' mount option is given, this method is not
+     * called.
+     *
+     * This method is not called under Linux kernel versions 2.4.x
+     */
+    int (*access)(const char *, int);
+
+    /**
+     * Create and open a file
+     *
+     * If the file does not exist, first create it with the specified
+     * mode, and then open it.
+     *
+     * If this method is not implemented or under Linux kernel
+     * versions earlier than 2.6.15, the mknod() and open() methods
+     * will be called instead.
+     */
+    int (*create)(const char *, mode_t, struct fuse_file_info *);
+
+    /**
+     * Perform POSIX file locking operation
+     *
+     * The cmd argument will be either F_GETLK, F_SETLK or F_SETLKW.
+     *
+     * For the meaning of fields in 'struct flock' see the man page
+     * for fcntl(2).  The l_whence field will always be set to
+     * SEEK_SET.
+     *
+     * For checking lock ownership, the 'fuse_file_info->owner'
+     * argument must be used.
+     *
+     * For F_GETLK operation, the library will first check currently
+     * held locks, and if a conflicting lock is found it will return
+     * information without calling this method.  This ensures, that
+     * for local locks the l_pid field is correctly filled in. The
+     * results may not be accurate in case of race conditions and in
+     * the presence of hard links, but it's unlikely that an
+     * application would rely on accurate GETLK results in these
+     * cases.  If a conflicting lock is not found, this method will be
+     * called, and the filesystem may fill out l_pid by a meaningful
+     * value, or it may leave this field zero.
+     *
+     * For F_SETLK and F_SETLKW the l_pid field will be set to the pid
+     * of the process performing the locking operation.
+     *
+     * Note: if this method is not implemented, the kernel will still
+     * allow file locking to work locally.  Hence it is only
+     * interesting for network filesystems and similar.
+     */
+    int (*lock)(const char *, struct fuse_file_info *, int cmd, struct flock *);
+
+    /**
+     * Change the access and modification times of a file with
+     * nanosecond resolution
+     *
+     * This supersedes the old utime() interface.  New applications
+     * should use this.
+     *
+     * `fi` will always be NULL if the file is not currently open, but
+     * may also be NULL if the file is open.
+     *
+     * See the utimensat(2) man page for details.
+     */
+    int (*utimens)(const char *, const struct timespec tv[2],
+                   struct fuse_file_info *fi);
+
+    /**
+     * Map block index within file to block index within device
+     *
+     * Note: This makes sense only for block device backed filesystems
+     * mounted with the 'blkdev' option
+     */
+    int (*bmap)(const char *, size_t blocksize, uint64_t *idx);
+
+    /**
+     * Ioctl
+     *
+     * flags will have FUSE_IOCTL_COMPAT set for 32bit ioctls in
+     * 64bit environment.  The size and direction of data is
+     * determined by _IOC_*() decoding of cmd.  For _IOC_NONE,
+     * data will be NULL, for _IOC_WRITE data is out area, for
+     * _IOC_READ in area and if both are set in/out area.  In all
+     * non-NULL cases, the area is of _IOC_SIZE(cmd) bytes.
+     *
+     * If flags has FUSE_IOCTL_DIR then the fuse_file_info refers to a
+     * directory file handle.
+     *
+     * Note : the unsigned long request submitted by the application
+     * is truncated to 32 bits.
+     */
+    int (*ioctl)(const char *, unsigned int cmd, void *arg,
+                 struct fuse_file_info *, unsigned int flags, void *data);
+
+    /**
+     * Poll for IO readiness events
+     *
+     * Note: If ph is non-NULL, the client should notify
+     * when IO readiness events occur by calling
+     * fuse_notify_poll() with the specified ph.
+     *
+     * Regardless of the number of times poll with a non-NULL ph
+     * is received, a single notification is enough to clear all.
+     * Notifying more times incurs overhead but doesn't harm
+     * correctness.
+     *
+     * The callee is responsible for destroying ph with
+     * fuse_pollhandle_destroy() when no longer in use.
+     */
+    int (*poll)(const char *, struct fuse_file_info *,
+                struct fuse_pollhandle *ph, unsigned *reventsp);
+
+    /*
+     * Write contents of buffer to an open file
+     *
+     * Similar to the write() method, but data is supplied in a
+     * generic buffer.  Use fuse_buf_copy() to transfer data to
+     * the destination.
+     *
+     * Unless FUSE_CAP_HANDLE_KILLPRIV is disabled, this method is
+     * expected to reset the setuid and setgid bits.
+     */
+    int (*write_buf)(const char *, struct fuse_bufvec *buf, off_t off,
+                     struct fuse_file_info *);
+
+    /*
+     *  Store data from an open file in a buffer
+     *
+     * Similar to the read() method, but data is stored and
+     * returned in a generic buffer.
+     *
+     * No actual copying of data has to take place, the source
+     * file descriptor may simply be stored in the buffer for
+     * later data transfer.
+     *
+     * The buffer must be allocated dynamically and stored at the
+     * location pointed to by bufp.  If the buffer contains memory
+     * regions, they too must be allocated using malloc().  The
+     * allocated memory will be freed by the caller.
+     */
+    int (*read_buf)(const char *, struct fuse_bufvec **bufp, size_t size,
+                    off_t off, struct fuse_file_info *);
+    /**
+     * Perform BSD file locking operation
+     *
+     * The op argument will be either LOCK_SH, LOCK_EX or LOCK_UN
+     *
+     * Nonblocking requests will be indicated by ORing LOCK_NB to
+     * the above operations
+     *
+     * For more information see the flock(2) manual page.
+     *
+     * Additionally fi->owner will be set to a value unique to
+     * this open file.  This same value will be supplied to
+     * ->release() when the file is released.
+     *
+     * Note: if this method is not implemented, the kernel will still
+     * allow file locking to work locally.  Hence it is only
+     * interesting for network filesystems and similar.
+     */
+    int (*flock)(const char *, struct fuse_file_info *, int op);
+
+    /**
+     * Allocates space for an open file
+     *
+     * This function ensures that required space is allocated for specified
+     * file.  If this function returns success then any subsequent write
+     * request to specified range is guaranteed not to fail because of lack
+     * of space on the file system media.
+     */
+    int (*fallocate)(const char *, int, off_t, off_t, struct fuse_file_info *);
+
+    /**
+     * Copy a range of data from one file to another
+     *
+     * Performs an optimized copy between two file descriptors without the
+     * additional cost of transferring data through the FUSE kernel module
+     * to user space (glibc) and then back into the FUSE filesystem again.
+     *
+     * In case this method is not implemented, glibc falls back to reading
+     * data from the source and writing to the destination. Effectively
+     * doing an inefficient copy of the data.
+     */
+    ssize_t (*copy_file_range)(const char *path_in,
+                               struct fuse_file_info *fi_in, off_t offset_in,
+                               const char *path_out,
+                               struct fuse_file_info *fi_out, off_t offset_out,
+                               size_t size, int flags);
+
+    /**
+     * Find next data or hole after the specified offset
+     */
+    off_t (*lseek)(const char *, off_t off, int whence,
+                   struct fuse_file_info *);
 };
 
-/** Extra context that may be needed by some filesystems
+/*
+ * Extra context that may be needed by some filesystems
  *
  * The uid, gid and pid fields are not filled in case of a writepage
  * operation.
  */
 struct fuse_context {
-	/** Pointer to the fuse object */
-	struct fuse *fuse;
+    /** Pointer to the fuse object */
+    struct fuse *fuse;
 
-	/** User ID of the calling process */
-	uid_t uid;
+    /** User ID of the calling process */
+    uid_t uid;
 
-	/** Group ID of the calling process */
-	gid_t gid;
+    /** Group ID of the calling process */
+    gid_t gid;
 
-	/** Process ID of the calling thread */
-	pid_t pid;
+    /** Process ID of the calling thread */
+    pid_t pid;
 
-	/** Private filesystem data */
-	void *private_data;
+    /** Private filesystem data */
+    void *private_data;
 
-	/** Umask of the calling process */
-	mode_t umask;
+    /** Umask of the calling process */
+    mode_t umask;
 };
 
 /**
@@ -859,15 +880,15 @@ struct fuse_context {
  * Example usage, see hello.c
  */
 /*
-  int fuse_main(int argc, char *argv[], const struct fuse_operations *op,
-  void *private_data);
-*/
-#define fuse_main(argc, argv, op, private_data)				\
-	fuse_main_real(argc, argv, op, sizeof(*(op)), private_data)
+ * int fuse_main(int argc, char *argv[], const struct fuse_operations *op,
+ * void *private_data);
+ */
+#define fuse_main(argc, argv, op, private_data) \
+    fuse_main_real(argc, argv, op, sizeof(*(op)), private_data)
 
-/* ----------------------------------------------------------- *
- * More detailed API					       *
- * ----------------------------------------------------------- */
+/*
+ * More detailed API
+ */
 
 /**
  * Print available options (high- and low-level) to stdout.  This is
@@ -910,12 +931,13 @@ void fuse_lib_help(struct fuse_args *args);
  * @return the created FUSE handle
  */
 #if FUSE_USE_VERSION == 30
-struct fuse *fuse_new_30(struct fuse_args *args, const struct fuse_operations *op,
-			 size_t op_size, void *private_data);
+struct fuse *fuse_new_30(struct fuse_args *args,
+                         const struct fuse_operations *op, size_t op_size,
+                         void *private_data);
 #define fuse_new(args, op, size, data) fuse_new_30(args, op, size, data)
 #else
 struct fuse *fuse_new(struct fuse_args *args, const struct fuse_operations *op,
-		      size_t op_size, void *private_data);
+                      size_t op_size, void *private_data);
 #endif
 
 /**
@@ -940,7 +962,7 @@ void fuse_unmount(struct fuse *f);
 /**
  * Destroy the FUSE handle.
  *
- * NOTE: This function does not unmount the filesystem.	 If this is
+ * NOTE: This function does not unmount the filesystem.  If this is
  * needed, call fuse_unmount() before calling this function.
  *
  * @param f the FUSE handle
@@ -1068,7 +1090,7 @@ int fuse_invalidate_path(struct fuse *f, const char *path);
  * Do not call this directly, use fuse_main()
  */
 int fuse_main_real(int argc, char *argv[], const struct fuse_operations *op,
-		   size_t op_size, void *private_data);
+                   size_t op_size, void *private_data);
 
 /**
  * Start the cleanup thread when using option "remember".
@@ -1119,89 +1141,87 @@ struct fuse_fs;
  */
 
 int fuse_fs_getattr(struct fuse_fs *fs, const char *path, struct stat *buf,
-		    struct fuse_file_info *fi);
-int fuse_fs_rename(struct fuse_fs *fs, const char *oldpath,
-		   const char *newpath, unsigned int flags);
+                    struct fuse_file_info *fi);
+int fuse_fs_rename(struct fuse_fs *fs, const char *oldpath, const char *newpath,
+                   unsigned int flags);
 int fuse_fs_unlink(struct fuse_fs *fs, const char *path);
 int fuse_fs_rmdir(struct fuse_fs *fs, const char *path);
-int fuse_fs_symlink(struct fuse_fs *fs, const char *linkname,
-		    const char *path);
+int fuse_fs_symlink(struct fuse_fs *fs, const char *linkname, const char *path);
 int fuse_fs_link(struct fuse_fs *fs, const char *oldpath, const char *newpath);
-int fuse_fs_release(struct fuse_fs *fs,	 const char *path,
-		    struct fuse_file_info *fi);
+int fuse_fs_release(struct fuse_fs *fs, const char *path,
+                    struct fuse_file_info *fi);
 int fuse_fs_open(struct fuse_fs *fs, const char *path,
-		 struct fuse_file_info *fi);
+                 struct fuse_file_info *fi);
 int fuse_fs_read(struct fuse_fs *fs, const char *path, char *buf, size_t size,
-		 off_t off, struct fuse_file_info *fi);
+                 off_t off, struct fuse_file_info *fi);
 int fuse_fs_read_buf(struct fuse_fs *fs, const char *path,
-		     struct fuse_bufvec **bufp, size_t size, off_t off,
-		     struct fuse_file_info *fi);
+                     struct fuse_bufvec **bufp, size_t size, off_t off,
+                     struct fuse_file_info *fi);
 int fuse_fs_write(struct fuse_fs *fs, const char *path, const char *buf,
-		  size_t size, off_t off, struct fuse_file_info *fi);
+                  size_t size, off_t off, struct fuse_file_info *fi);
 int fuse_fs_write_buf(struct fuse_fs *fs, const char *path,
-		      struct fuse_bufvec *buf, off_t off,
-		      struct fuse_file_info *fi);
+                      struct fuse_bufvec *buf, off_t off,
+                      struct fuse_file_info *fi);
 int fuse_fs_fsync(struct fuse_fs *fs, const char *path, int datasync,
-		  struct fuse_file_info *fi);
+                  struct fuse_file_info *fi);
 int fuse_fs_flush(struct fuse_fs *fs, const char *path,
-		  struct fuse_file_info *fi);
+                  struct fuse_file_info *fi);
 int fuse_fs_statfs(struct fuse_fs *fs, const char *path, struct statvfs *buf);
 int fuse_fs_opendir(struct fuse_fs *fs, const char *path,
-		    struct fuse_file_info *fi);
+                    struct fuse_file_info *fi);
 int fuse_fs_readdir(struct fuse_fs *fs, const char *path, void *buf,
-		    fuse_fill_dir_t filler, off_t off,
-		    struct fuse_file_info *fi, enum fuse_readdir_flags flags);
+                    fuse_fill_dir_t filler, off_t off,
+                    struct fuse_file_info *fi, enum fuse_readdir_flags flags);
 int fuse_fs_fsyncdir(struct fuse_fs *fs, const char *path, int datasync,
-		     struct fuse_file_info *fi);
+                     struct fuse_file_info *fi);
 int fuse_fs_releasedir(struct fuse_fs *fs, const char *path,
-		       struct fuse_file_info *fi);
+                       struct fuse_file_info *fi);
 int fuse_fs_create(struct fuse_fs *fs, const char *path, mode_t mode,
-		   struct fuse_file_info *fi);
+                   struct fuse_file_info *fi);
 int fuse_fs_lock(struct fuse_fs *fs, const char *path,
-		 struct fuse_file_info *fi, int cmd, struct flock *lock);
+                 struct fuse_file_info *fi, int cmd, struct flock *lock);
 int fuse_fs_flock(struct fuse_fs *fs, const char *path,
-		  struct fuse_file_info *fi, int op);
+                  struct fuse_file_info *fi, int op);
 int fuse_fs_chmod(struct fuse_fs *fs, const char *path, mode_t mode,
-		  struct fuse_file_info *fi);
+                  struct fuse_file_info *fi);
 int fuse_fs_chown(struct fuse_fs *fs, const char *path, uid_t uid, gid_t gid,
-		  struct fuse_file_info *fi);
+                  struct fuse_file_info *fi);
 int fuse_fs_truncate(struct fuse_fs *fs, const char *path, off_t size,
-		     struct fuse_file_info *fi);
+                     struct fuse_file_info *fi);
 int fuse_fs_utimens(struct fuse_fs *fs, const char *path,
-		    const struct timespec tv[2], struct fuse_file_info *fi);
+                    const struct timespec tv[2], struct fuse_file_info *fi);
 int fuse_fs_access(struct fuse_fs *fs, const char *path, int mask);
 int fuse_fs_readlink(struct fuse_fs *fs, const char *path, char *buf,
-		     size_t len);
+                     size_t len);
 int fuse_fs_mknod(struct fuse_fs *fs, const char *path, mode_t mode,
-		  dev_t rdev);
+                  dev_t rdev);
 int fuse_fs_mkdir(struct fuse_fs *fs, const char *path, mode_t mode);
 int fuse_fs_setxattr(struct fuse_fs *fs, const char *path, const char *name,
-		     const char *value, size_t size, int flags);
+                     const char *value, size_t size, int flags);
 int fuse_fs_getxattr(struct fuse_fs *fs, const char *path, const char *name,
-		     char *value, size_t size);
+                     char *value, size_t size);
 int fuse_fs_listxattr(struct fuse_fs *fs, const char *path, char *list,
-		      size_t size);
-int fuse_fs_removexattr(struct fuse_fs *fs, const char *path,
-			const char *name);
+                      size_t size);
+int fuse_fs_removexattr(struct fuse_fs *fs, const char *path, const char *name);
 int fuse_fs_bmap(struct fuse_fs *fs, const char *path, size_t blocksize,
-		 uint64_t *idx);
+                 uint64_t *idx);
 int fuse_fs_ioctl(struct fuse_fs *fs, const char *path, unsigned int cmd,
-		  void *arg, struct fuse_file_info *fi, unsigned int flags,
-		  void *data);
+                  void *arg, struct fuse_file_info *fi, unsigned int flags,
+                  void *data);
 int fuse_fs_poll(struct fuse_fs *fs, const char *path,
-		 struct fuse_file_info *fi, struct fuse_pollhandle *ph,
-		 unsigned *reventsp);
+                 struct fuse_file_info *fi, struct fuse_pollhandle *ph,
+                 unsigned *reventsp);
 int fuse_fs_fallocate(struct fuse_fs *fs, const char *path, int mode,
-		 off_t offset, off_t length, struct fuse_file_info *fi);
+                      off_t offset, off_t length, struct fuse_file_info *fi);
 ssize_t fuse_fs_copy_file_range(struct fuse_fs *fs, const char *path_in,
-				struct fuse_file_info *fi_in, off_t off_in,
-				const char *path_out,
-				struct fuse_file_info *fi_out, off_t off_out,
-				size_t len, int flags);
+                                struct fuse_file_info *fi_in, off_t off_in,
+                                const char *path_out,
+                                struct fuse_file_info *fi_out, off_t off_out,
+                                size_t len, int flags);
 off_t fuse_fs_lseek(struct fuse_fs *fs, const char *path, off_t off, int whence,
-		    struct fuse_file_info *fi);
+                    struct fuse_file_info *fi);
 void fuse_fs_init(struct fuse_fs *fs, struct fuse_conn_info *conn,
-		struct fuse_config *cfg);
+                  struct fuse_config *cfg);
 void fuse_fs_destroy(struct fuse_fs *fs);
 
 int fuse_notify_poll(struct fuse_pollhandle *ph);
@@ -1220,7 +1240,7 @@ int fuse_notify_poll(struct fuse_pollhandle *ph);
  * @return a new filesystem object
  */
 struct fuse_fs *fuse_fs_new(const struct fuse_operations *op, size_t op_size,
-			    void *private_data);
+                            void *private_data);
 
 /**
  * Factory for creating filesystem objects
@@ -1237,7 +1257,7 @@ struct fuse_fs *fuse_fs_new(const struct fuse_operations *op, size_t op_size,
  * @return the new filesystem object
  */
 typedef struct fuse_fs *(*fuse_module_factory_t)(struct fuse_args *args,
-						 struct fuse_fs *fs[]);
+                                                 struct fuse_fs *fs[]);
 /**
  * Register filesystem module
  *
@@ -1249,7 +1269,7 @@ typedef struct fuse_fs *(*fuse_module_factory_t)(struct fuse_args *args,
  * @param factory_ the factory function for this filesystem module
  */
 #define FUSE_REGISTER_MODULE(name_, factory_) \
-	fuse_module_factory_t fuse_module_ ## name_ ## _factory = factory_
+    fuse_module_factory_t fuse_module_##name_##_factory = factory_
 
 /** Get session from fuse object */
 struct fuse_session *fuse_get_session(struct fuse *f);
diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
index 18fba813da..7bed38b436 100644
--- a/tools/virtiofsd/fuse_common.h
+++ b/tools/virtiofsd/fuse_common.h
@@ -1,21 +1,23 @@
-/*  FUSE: Filesystem in Userspace
-  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
-
-  This program can be distributed under the terms of the GNU LGPLv2.
-  See the file COPYING.LIB.
-*/
+/*
+ * FUSE: Filesystem in Userspace
+ * Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
+ *
+ * This program can be distributed under the terms of the GNU LGPLv2.
+ * See the file COPYING.LIB.
+ */
 
 /** @file */
 
 #if !defined(FUSE_H_) && !defined(FUSE_LOWLEVEL_H_)
-#error "Never include <fuse_common.h> directly; use <fuse.h> or <fuse_lowlevel.h> instead."
+#error \
+    "Never include <fuse_common.h> directly; use <fuse.h> or <fuse_lowlevel.h> instead."
 #endif
 
 #ifndef FUSE_COMMON_H_
 #define FUSE_COMMON_H_
 
-#include "fuse_opt.h"
 #include "fuse_log.h"
+#include "fuse_opt.h"
 #include <stdint.h>
 #include <sys/types.h>
 
@@ -25,7 +27,7 @@
 /** Minor version of FUSE library interface */
 #define FUSE_MINOR_VERSION 2
 
-#define FUSE_MAKE_VERSION(maj, min)  ((maj) * 10 + (min))
+#define FUSE_MAKE_VERSION(maj, min) ((maj) * 10 + (min))
 #define FUSE_VERSION FUSE_MAKE_VERSION(FUSE_MAJOR_VERSION, FUSE_MINOR_VERSION)
 
 /**
@@ -38,62 +40,78 @@
  * descriptors can share a single file handle.
  */
 struct fuse_file_info {
-	/** Open flags.	 Available in open() and release() */
-	int flags;
-
-	/** In case of a write operation indicates if this was caused
-	    by a delayed write from the page cache. If so, then the
-	    context's pid, uid, and gid fields will not be valid, and
-	    the *fh* value may not match the *fh* value that would
-	    have been sent with the corresponding individual write
-	    requests if write caching had been disabled. */
-	unsigned int writepage : 1;
-
-	/** Can be filled in by open, to use direct I/O on this file. */
-	unsigned int direct_io : 1;
-
-	/** Can be filled in by open. It signals the kernel that any
-	    currently cached file data (ie., data that the filesystem
-	    provided the last time the file was open) need not be
-	    invalidated. Has no effect when set in other contexts (in
-	    particular it does nothing when set by opendir()). */
-	unsigned int keep_cache : 1;
-
-	/** Indicates a flush operation.  Set in flush operation, also
-	    maybe set in highlevel lock operation and lowlevel release
-	    operation. */
-	unsigned int flush : 1;
-
-	/** Can be filled in by open, to indicate that the file is not
-	    seekable. */
-	unsigned int nonseekable : 1;
-
-	/* Indicates that flock locks for this file should be
-	   released.  If set, lock_owner shall contain a valid value.
-	   May only be set in ->release(). */
-	unsigned int flock_release : 1;
-
-	/** Can be filled in by opendir. It signals the kernel to
-	    enable caching of entries returned by readdir().  Has no
-	    effect when set in other contexts (in particular it does
-	    nothing when set by open()). */
-	unsigned int cache_readdir : 1;
-
-	/** Padding.  Reserved for future use*/
-	unsigned int padding : 25;
-	unsigned int padding2 : 32;
-
-	/** File handle id.  May be filled in by filesystem in create,
-	 * open, and opendir().  Available in most other file operations on the
-	 * same file handle. */
-	uint64_t fh;
-
-	/** Lock owner id.  Available in locking operations and flush */
-	uint64_t lock_owner;
-
-	/** Requested poll events.  Available in ->poll.  Only set on kernels
-	    which support it.  If unsupported, this field is set to zero. */
-	uint32_t poll_events;
+    /** Open flags. Available in open() and release() */
+    int flags;
+
+    /*
+     * In case of a write operation indicates if this was caused
+     * by a delayed write from the page cache. If so, then the
+     * context's pid, uid, and gid fields will not be valid, and
+     * the *fh* value may not match the *fh* value that would
+     * have been sent with the corresponding individual write
+     * requests if write caching had been disabled.
+     */
+    unsigned int writepage:1;
+
+    /** Can be filled in by open, to use direct I/O on this file. */
+    unsigned int direct_io:1;
+
+    /*
+     * Can be filled in by open. It signals the kernel that any
+     * currently cached file data (i.e., data that the filesystem
+     * provided the last time the file was open) need not be
+     * invalidated. Has no effect when set in other contexts (in
+     * particular it does nothing when set by opendir()).
+     */
+    unsigned int keep_cache:1;
+
+    /*
+     * Indicates a flush operation.  Set in flush operation, also
+     * may be set in highlevel lock operation and lowlevel release
+     * operation.
+     */
+    unsigned int flush:1;
+
+    /*
+     * Can be filled in by open, to indicate that the file is not
+     * seekable.
+     */
+    unsigned int nonseekable:1;
+
+    /*
+     * Indicates that flock locks for this file should be
+     * released.  If set, lock_owner shall contain a valid value.
+     * May only be set in ->release().
+     */
+    unsigned int flock_release:1;
+
+    /*
+     * Can be filled in by opendir. It signals the kernel to
+     * enable caching of entries returned by readdir().  Has no
+     * effect when set in other contexts (in particular it does
+     * nothing when set by open()).
+     */
+    unsigned int cache_readdir:1;
+
+    /** Padding.  Reserved for future use */
+    unsigned int padding:25;
+    unsigned int padding2:32;
+
+    /*
+     * File handle id.  May be filled in by filesystem in create,
+     * open, and opendir().  Available in most other file operations on the
+     * same file handle.
+     */
+    uint64_t fh;
+
+    /** Lock owner id.  Available in locking operations and flush */
+    uint64_t lock_owner;
+
+    /*
+     * Requested poll events.  Available in ->poll.  Only set on kernels
+     * which support it.  If unsupported, this field is set to zero.
+     */
+    uint32_t poll_events;
 };
 
 /**
@@ -101,28 +119,28 @@ struct fuse_file_info {
  * fuse_loop_mt().
  */
 struct fuse_loop_config {
-	/**
-	 * whether to use separate device fds for each thread
-	 * (may increase performance)
-	 */
-	int clone_fd;
-
-	/**
-	 * The maximum number of available worker threads before they
-	 * start to get deleted when they become idle. If not
-	 * specified, the default is 10.
-	 *
-	 * Adjusting this has performance implications; a very small number
-	 * of threads in the pool will cause a lot of thread creation and
-	 * deletion overhead and performance may suffer. When set to 0, a new
-	 * thread will be created to service every operation.
-	 */
-	unsigned int max_idle_threads;
+    /**
+     * whether to use separate device fds for each thread
+     * (may increase performance)
+     */
+    int clone_fd;
+
+    /**
+     * The maximum number of available worker threads before they
+     * start to get deleted when they become idle. If not
+     * specified, the default is 10.
+     *
+     * Adjusting this has performance implications; a very small number
+     * of threads in the pool will cause a lot of thread creation and
+     * deletion overhead and performance may suffer. When set to 0, a new
+     * thread will be created to service every operation.
+     */
+    unsigned int max_idle_threads;
 };
 
-/**************************************************************************
- * Capability bits for 'fuse_conn_info.capable' and 'fuse_conn_info.want' *
- **************************************************************************/
+/*
+ * Capability bits for 'fuse_conn_info.capable' and 'fuse_conn_info.want'
+ */
 
 /**
  * Indicates that the filesystem supports asynchronous read requests.
@@ -134,7 +152,7 @@ struct fuse_loop_config {
  *
  * This feature is enabled by default when supported by the kernel.
  */
-#define FUSE_CAP_ASYNC_READ		(1 << 0)
+#define FUSE_CAP_ASYNC_READ (1 << 0)
 
 /**
  * Indicates that the filesystem supports "remote" locking.
@@ -142,7 +160,7 @@ struct fuse_loop_config {
  * This feature is enabled by default when supported by the kernel,
  * and if getlk() and setlk() handlers are implemented.
  */
-#define FUSE_CAP_POSIX_LOCKS		(1 << 1)
+#define FUSE_CAP_POSIX_LOCKS (1 << 1)
 
 /**
  * Indicates that the filesystem supports the O_TRUNC open flag.  If
@@ -151,14 +169,14 @@ struct fuse_loop_config {
  *
  * This feature is enabled by default when supported by the kernel.
  */
-#define FUSE_CAP_ATOMIC_O_TRUNC		(1 << 3)
+#define FUSE_CAP_ATOMIC_O_TRUNC (1 << 3)
 
 /**
  * Indicates that the filesystem supports lookups of "." and "..".
  *
  * This feature is disabled by default.
  */
-#define FUSE_CAP_EXPORT_SUPPORT		(1 << 4)
+#define FUSE_CAP_EXPORT_SUPPORT (1 << 4)
 
 /**
  * Indicates that the kernel should not apply the umask to the
@@ -166,7 +184,7 @@ struct fuse_loop_config {
  *
  * This feature is disabled by default.
  */
-#define FUSE_CAP_DONT_MASK		(1 << 6)
+#define FUSE_CAP_DONT_MASK (1 << 6)
 
 /**
  * Indicates that libfuse should try to use splice() when writing to
@@ -174,7 +192,7 @@ struct fuse_loop_config {
  *
  * This feature is disabled by default.
  */
-#define FUSE_CAP_SPLICE_WRITE		(1 << 7)
+#define FUSE_CAP_SPLICE_WRITE (1 << 7)
 
 /**
  * Indicates that libfuse should try to move pages instead of copying when
@@ -182,7 +200,7 @@ struct fuse_loop_config {
  *
  * This feature is disabled by default.
  */
-#define FUSE_CAP_SPLICE_MOVE		(1 << 8)
+#define FUSE_CAP_SPLICE_MOVE (1 << 8)
 
 /**
  * Indicates that libfuse should try to use splice() when reading from
@@ -191,7 +209,7 @@ struct fuse_loop_config {
  * This feature is enabled by default when supported by the kernel and
  * if the filesystem implements a write_buf() handler.
  */
-#define FUSE_CAP_SPLICE_READ		(1 << 9)
+#define FUSE_CAP_SPLICE_READ (1 << 9)
 
 /**
  * If set, the calls to flock(2) will be emulated using POSIX locks and must
@@ -204,14 +222,14 @@ struct fuse_loop_config {
  * This feature is enabled by default when supported by the kernel and
  * if the filesystem implements a flock() handler.
  */
-#define FUSE_CAP_FLOCK_LOCKS		(1 << 10)
+#define FUSE_CAP_FLOCK_LOCKS (1 << 10)
 
 /**
  * Indicates that the filesystem supports ioctl's on directories.
  *
  * This feature is enabled by default when supported by the kernel.
  */
-#define FUSE_CAP_IOCTL_DIR		(1 << 11)
+#define FUSE_CAP_IOCTL_DIR (1 << 11)
 
 /**
  * Traditionally, while a file is open the FUSE kernel module only
@@ -233,7 +251,7 @@ struct fuse_loop_config {
  *
  * This feature is enabled by default when supported by the kernel.
  */
-#define FUSE_CAP_AUTO_INVAL_DATA	(1 << 12)
+#define FUSE_CAP_AUTO_INVAL_DATA (1 << 12)
 
 /**
  * Indicates that the filesystem supports readdirplus.
@@ -241,7 +259,7 @@ struct fuse_loop_config {
  * This feature is enabled by default when supported by the kernel and if the
  * filesystem implements a readdirplus() handler.
  */
-#define FUSE_CAP_READDIRPLUS		(1 << 13)
+#define FUSE_CAP_READDIRPLUS (1 << 13)
 
 /**
  * Indicates that the filesystem supports adaptive readdirplus.
@@ -269,7 +287,7 @@ struct fuse_loop_config {
  * if the filesystem implements both a readdirplus() and a readdir()
  * handler.
  */
-#define FUSE_CAP_READDIRPLUS_AUTO	(1 << 14)
+#define FUSE_CAP_READDIRPLUS_AUTO (1 << 14)
 
 /**
  * Indicates that the filesystem supports asynchronous direct I/O submission.
@@ -280,7 +298,7 @@ struct fuse_loop_config {
  *
  * This feature is enabled by default when supported by the kernel.
  */
-#define FUSE_CAP_ASYNC_DIO		(1 << 15)
+#define FUSE_CAP_ASYNC_DIO (1 << 15)
 
 /**
  * Indicates that writeback caching should be enabled. This means that
@@ -289,7 +307,7 @@ struct fuse_loop_config {
  *
  * This feature is disabled by default.
  */
-#define FUSE_CAP_WRITEBACK_CACHE	(1 << 16)
+#define FUSE_CAP_WRITEBACK_CACHE (1 << 16)
 
 /**
  * Indicates support for zero-message opens. If this flag is set in
@@ -302,7 +320,7 @@ struct fuse_loop_config {
  * Setting (or unsetting) this flag in the `want` field has *no
  * effect*.
  */
-#define FUSE_CAP_NO_OPEN_SUPPORT	(1 << 17)
+#define FUSE_CAP_NO_OPEN_SUPPORT (1 << 17)
 
 /**
  * Indicates support for parallel directory operations. If this flag
@@ -312,7 +330,7 @@ struct fuse_loop_config {
  *
  * This feature is enabled by default when supported by the kernel.
  */
-#define FUSE_CAP_PARALLEL_DIROPS        (1 << 18)
+#define FUSE_CAP_PARALLEL_DIROPS (1 << 18)
 
 /**
  * Indicates support for POSIX ACLs.
@@ -331,7 +349,7 @@ struct fuse_loop_config {
  *
  * This feature is disabled by default.
  */
-#define FUSE_CAP_POSIX_ACL              (1 << 19)
+#define FUSE_CAP_POSIX_ACL (1 << 19)
 
 /**
  * Indicates that the filesystem is responsible for unsetting
@@ -340,7 +358,7 @@ struct fuse_loop_config {
  *
  * This feature is enabled by default when supported by the kernel.
  */
-#define FUSE_CAP_HANDLE_KILLPRIV         (1 << 20)
+#define FUSE_CAP_HANDLE_KILLPRIV (1 << 20)
 
 /**
  * Indicates support for zero-message opendirs. If this flag is set in
@@ -352,7 +370,7 @@ struct fuse_loop_config {
  *
  * Setting (or unsetting) this flag in the `want` field has *no effect*.
  */
-#define FUSE_CAP_NO_OPENDIR_SUPPORT    (1 << 24)
+#define FUSE_CAP_NO_OPENDIR_SUPPORT (1 << 24)
 
 /**
  * Ioctl flags
@@ -364,12 +382,12 @@ struct fuse_loop_config {
  *
  * FUSE_IOCTL_MAX_IOV: maximum of in_iovecs + out_iovecs
  */
-#define FUSE_IOCTL_COMPAT	(1 << 0)
-#define FUSE_IOCTL_UNRESTRICTED	(1 << 1)
-#define FUSE_IOCTL_RETRY	(1 << 2)
-#define FUSE_IOCTL_DIR		(1 << 4)
+#define FUSE_IOCTL_COMPAT (1 << 0)
+#define FUSE_IOCTL_UNRESTRICTED (1 << 1)
+#define FUSE_IOCTL_RETRY (1 << 2)
+#define FUSE_IOCTL_DIR (1 << 4)
 
-#define FUSE_IOCTL_MAX_IOV	256
+#define FUSE_IOCTL_MAX_IOV 256
 
 /**
  * Connection information, passed to the ->init() method
@@ -379,114 +397,114 @@ struct fuse_loop_config {
  * value must usually be smaller than the indicated value.
  */
 struct fuse_conn_info {
-	/**
-	 * Major version of the protocol (read-only)
-	 */
-	unsigned proto_major;
-
-	/**
-	 * Minor version of the protocol (read-only)
-	 */
-	unsigned proto_minor;
-
-	/**
-	 * Maximum size of the write buffer
-	 */
-	unsigned max_write;
-
-	/**
-	 * Maximum size of read requests. A value of zero indicates no
-	 * limit. However, even if the filesystem does not specify a
-	 * limit, the maximum size of read requests will still be
-	 * limited by the kernel.
-	 *
-	 * NOTE: For the time being, the maximum size of read requests
-	 * must be set both here *and* passed to fuse_session_new()
-	 * using the ``-o max_read=<n>`` mount option. At some point
-	 * in the future, specifying the mount option will no longer
-	 * be necessary.
-	 */
-	unsigned max_read;
-
-	/**
-	 * Maximum readahead
-	 */
-	unsigned max_readahead;
-
-	/**
-	 * Capability flags that the kernel supports (read-only)
-	 */
-	unsigned capable;
-
-	/**
-	 * Capability flags that the filesystem wants to enable.
-	 *
-	 * libfuse attempts to initialize this field with
-	 * reasonable default values before calling the init() handler.
-	 */
-	unsigned want;
-
-	/**
-	 * Maximum number of pending "background" requests. A
-	 * background request is any type of request for which the
-	 * total number is not limited by other means. As of kernel
-	 * 4.8, only two types of requests fall into this category:
-	 *
-	 *   1. Read-ahead requests
-	 *   2. Asynchronous direct I/O requests
-	 *
-	 * Read-ahead requests are generated (if max_readahead is
-	 * non-zero) by the kernel to preemptively fill its caches
-	 * when it anticipates that userspace will soon read more
-	 * data.
-	 *
-	 * Asynchronous direct I/O requests are generated if
-	 * FUSE_CAP_ASYNC_DIO is enabled and userspace submits a large
-	 * direct I/O request. In this case the kernel will internally
-	 * split it up into multiple smaller requests and submit them
-	 * to the filesystem concurrently.
-	 *
-	 * Note that the following requests are *not* background
-	 * requests: writeback requests (limited by the kernel's
-	 * flusher algorithm), regular (i.e., synchronous and
-	 * buffered) userspace read/write requests (limited to one per
-	 * thread), asynchronous read requests (Linux's io_submit(2)
-	 * call actually blocks, so these are also limited to one per
-	 * thread).
-	 */
-	unsigned max_background;
-
-	/**
-	 * Kernel congestion threshold parameter. If the number of pending
-	 * background requests exceeds this number, the FUSE kernel module will
-	 * mark the filesystem as "congested". This instructs the kernel to
-	 * expect that queued requests will take some time to complete, and to
-	 * adjust its algorithms accordingly (e.g. by putting a waiting thread
-	 * to sleep instead of using a busy-loop).
-	 */
-	unsigned congestion_threshold;
-
-	/**
-	 * When FUSE_CAP_WRITEBACK_CACHE is enabled, the kernel is responsible
-	 * for updating mtime and ctime when write requests are received. The
-	 * updated values are passed to the filesystem with setattr() requests.
-	 * However, if the filesystem does not support the full resolution of
-	 * the kernel timestamps (nanoseconds), the mtime and ctime values used
-	 * by kernel and filesystem will differ (and result in an apparent
-	 * change of times after a cache flush).
-	 *
-	 * To prevent this problem, this variable can be used to inform the
-	 * kernel about the timestamp granularity supported by the file-system.
-	 * The value should be power of 10.  The default is 1, i.e. full
-	 * nano-second resolution. Filesystems supporting only second resolution
-	 * should set this to 1000000000.
-	 */
-	unsigned time_gran;
-
-	/**
-	 * For future use.
-	 */
-	unsigned reserved[22];
+    /**
+     * Major version of the protocol (read-only)
+     */
+    unsigned proto_major;
+
+    /**
+     * Minor version of the protocol (read-only)
+     */
+    unsigned proto_minor;
+
+    /**
+     * Maximum size of the write buffer
+     */
+    unsigned max_write;
+
+    /**
+     * Maximum size of read requests. A value of zero indicates no
+     * limit. However, even if the filesystem does not specify a
+     * limit, the maximum size of read requests will still be
+     * limited by the kernel.
+     *
+     * NOTE: For the time being, the maximum size of read requests
+     * must be set both here *and* passed to fuse_session_new()
+     * using the ``-o max_read=<n>`` mount option. At some point
+     * in the future, specifying the mount option will no longer
+     * be necessary.
+     */
+    unsigned max_read;
+
+    /**
+     * Maximum readahead
+     */
+    unsigned max_readahead;
+
+    /**
+     * Capability flags that the kernel supports (read-only)
+     */
+    unsigned capable;
+
+    /**
+     * Capability flags that the filesystem wants to enable.
+     *
+     * libfuse attempts to initialize this field with
+     * reasonable default values before calling the init() handler.
+     */
+    unsigned want;
+
+    /**
+     * Maximum number of pending "background" requests. A
+     * background request is any type of request for which the
+     * total number is not limited by other means. As of kernel
+     * 4.8, only two types of requests fall into this category:
+     *
+     *   1. Read-ahead requests
+     *   2. Asynchronous direct I/O requests
+     *
+     * Read-ahead requests are generated (if max_readahead is
+     * non-zero) by the kernel to preemptively fill its caches
+     * when it anticipates that userspace will soon read more
+     * data.
+     *
+     * Asynchronous direct I/O requests are generated if
+     * FUSE_CAP_ASYNC_DIO is enabled and userspace submits a large
+     * direct I/O request. In this case the kernel will internally
+     * split it up into multiple smaller requests and submit them
+     * to the filesystem concurrently.
+     *
+     * Note that the following requests are *not* background
+     * requests: writeback requests (limited by the kernel's
+     * flusher algorithm), regular (i.e., synchronous and
+     * buffered) userspace read/write requests (limited to one per
+     * thread), asynchronous read requests (Linux's io_submit(2)
+     * call actually blocks, so these are also limited to one per
+     * thread).
+     */
+    unsigned max_background;
+
+    /**
+     * Kernel congestion threshold parameter. If the number of pending
+     * background requests exceeds this number, the FUSE kernel module will
+     * mark the filesystem as "congested". This instructs the kernel to
+     * expect that queued requests will take some time to complete, and to
+     * adjust its algorithms accordingly (e.g. by putting a waiting thread
+     * to sleep instead of using a busy-loop).
+     */
+    unsigned congestion_threshold;
+
+    /**
+     * When FUSE_CAP_WRITEBACK_CACHE is enabled, the kernel is responsible
+     * for updating mtime and ctime when write requests are received. The
+     * updated values are passed to the filesystem with setattr() requests.
+     * However, if the filesystem does not support the full resolution of
+     * the kernel timestamps (nanoseconds), the mtime and ctime values used
+     * by kernel and filesystem will differ (and result in an apparent
+     * change of times after a cache flush).
+     *
+     * To prevent this problem, this variable can be used to inform the
+     * kernel about the timestamp granularity supported by the file-system.
+     * The value should be a power of 10.  The default is 1, i.e. full
+     * nano-second resolution. Filesystems supporting only second resolution
+     * should set this to 1000000000.
+     */
+    unsigned time_gran;
+
+    /**
+     * For future use.
+     */
+    unsigned reserved[22];
 };
 
 struct fuse_session;
@@ -513,21 +531,21 @@ struct fuse_conn_info_opts;
  *   -o async_read          sets FUSE_CAP_ASYNC_READ in conn->want
  *   -o sync_read           unsets FUSE_CAP_ASYNC_READ in conn->want
  *   -o atomic_o_trunc      sets FUSE_CAP_ATOMIC_O_TRUNC in conn->want
- *   -o no_remote_lock      Equivalent to -o no_remote_flock,no_remote_posix_lock
- *   -o no_remote_flock     Unsets FUSE_CAP_FLOCK_LOCKS in conn->want
- *   -o no_remote_posix_lock  Unsets FUSE_CAP_POSIX_LOCKS in conn->want
- *   -o [no_]splice_write     (un-)sets FUSE_CAP_SPLICE_WRITE in conn->want
- *   -o [no_]splice_move      (un-)sets FUSE_CAP_SPLICE_MOVE in conn->want
- *   -o [no_]splice_read      (un-)sets FUSE_CAP_SPLICE_READ in conn->want
- *   -o [no_]auto_inval_data  (un-)sets FUSE_CAP_AUTO_INVAL_DATA in conn->want
- *   -o readdirplus=no        unsets FUSE_CAP_READDIRPLUS in conn->want
- *   -o readdirplus=yes       sets FUSE_CAP_READDIRPLUS and unsets
- *                            FUSE_CAP_READDIRPLUS_AUTO in conn->want
- *   -o readdirplus=auto      sets FUSE_CAP_READDIRPLUS and
- *                            FUSE_CAP_READDIRPLUS_AUTO in conn->want
- *   -o [no_]async_dio        (un-)sets FUSE_CAP_ASYNC_DIO in conn->want
- *   -o [no_]writeback_cache  (un-)sets FUSE_CAP_WRITEBACK_CACHE in conn->want
- *   -o time_gran=N           sets conn->time_gran
+ *   -o no_remote_lock      Equivalent to -o no_remote_flock,no_remote_posix_lock
+ *   -o no_remote_flock     Unsets FUSE_CAP_FLOCK_LOCKS in conn->want
+ *   -o no_remote_posix_lock  Unsets FUSE_CAP_POSIX_LOCKS in conn->want
+ *   -o [no_]splice_write     (un-)sets FUSE_CAP_SPLICE_WRITE in conn->want
+ *   -o [no_]splice_move      (un-)sets FUSE_CAP_SPLICE_MOVE in conn->want
+ *   -o [no_]splice_read      (un-)sets FUSE_CAP_SPLICE_READ in conn->want
+ *   -o [no_]auto_inval_data  (un-)sets FUSE_CAP_AUTO_INVAL_DATA in conn->want
+ *   -o readdirplus=no        unsets FUSE_CAP_READDIRPLUS in conn->want
+ *   -o readdirplus=yes       sets FUSE_CAP_READDIRPLUS and unsets
+ *                            FUSE_CAP_READDIRPLUS_AUTO in conn->want
+ *   -o readdirplus=auto      sets FUSE_CAP_READDIRPLUS and
+ *                            FUSE_CAP_READDIRPLUS_AUTO in conn->want
+ *   -o [no_]async_dio        (un-)sets FUSE_CAP_ASYNC_DIO in conn->want
+ *   -o [no_]writeback_cache  (un-)sets FUSE_CAP_WRITEBACK_CACHE in conn->want
+ *   -o time_gran=N           sets conn->time_gran
  *
  * Known options will be removed from *args*, unknown options will be
  * passed through unchanged.
@@ -535,7 +552,7 @@ struct fuse_conn_info_opts;
  * @param args argument vector (input+output)
  * @return parsed options
  **/
-struct fuse_conn_info_opts* fuse_parse_conn_info_opts(struct fuse_args *args);
+struct fuse_conn_info_opts *fuse_parse_conn_info_opts(struct fuse_args *args);
 
 /**
  * This function applies the (parsed) parameters in *opts* to the
@@ -545,7 +562,7 @@ struct fuse_conn_info_opts* fuse_parse_conn_info_opts(struct fuse_args *args);
  * option has been explicitly set.
  */
 void fuse_apply_conn_info_opts(struct fuse_conn_info_opts *opts,
-			  struct fuse_conn_info *conn);
+                               struct fuse_conn_info *conn);
 
 /**
  * Go into the background
@@ -576,81 +593,81 @@ const char *fuse_pkgversion(void);
  */
 void fuse_pollhandle_destroy(struct fuse_pollhandle *ph);
 
-/* ----------------------------------------------------------- *
- * Data buffer						       *
- * ----------------------------------------------------------- */
+/*
+ * Data buffer
+ */
 
 /**
  * Buffer flags
  */
 enum fuse_buf_flags {
-	/**
-	 * Buffer contains a file descriptor
-	 *
-	 * If this flag is set, the .fd field is valid, otherwise the
-	 * .mem fields is valid.
-	 */
-	FUSE_BUF_IS_FD		= (1 << 1),
-
-	/**
-	 * Seek on the file descriptor
-	 *
-	 * If this flag is set then the .pos field is valid and is
-	 * used to seek to the given offset before performing
-	 * operation on file descriptor.
-	 */
-	FUSE_BUF_FD_SEEK	= (1 << 2),
-
-	/**
-	 * Retry operation on file descriptor
-	 *
-	 * If this flag is set then retry operation on file descriptor
-	 * until .size bytes have been copied or an error or EOF is
-	 * detected.
-	 */
-	FUSE_BUF_FD_RETRY	= (1 << 3),
+    /**
+     * Buffer contains a file descriptor
+     *
+     * If this flag is set, the .fd field is valid, otherwise the
+     * .mem field is valid.
+     */
+    FUSE_BUF_IS_FD = (1 << 1),
+
+    /**
+     * Seek on the file descriptor
+     *
+     * If this flag is set then the .pos field is valid and is
+     * used to seek to the given offset before performing
+     * operation on file descriptor.
+     */
+    FUSE_BUF_FD_SEEK = (1 << 2),
+
+    /**
+     * Retry operation on file descriptor
+     *
+     * If this flag is set then retry operation on file descriptor
+     * until .size bytes have been copied or an error or EOF is
+     * detected.
+     */
+    FUSE_BUF_FD_RETRY = (1 << 3),
 };
 
 /**
  * Buffer copy flags
  */
 enum fuse_buf_copy_flags {
-	/**
-	 * Don't use splice(2)
-	 *
-	 * Always fall back to using read and write instead of
-	 * splice(2) to copy data from one file descriptor to another.
-	 *
-	 * If this flag is not set, then only fall back if splice is
-	 * unavailable.
-	 */
-	FUSE_BUF_NO_SPLICE	= (1 << 1),
-
-	/**
-	 * Force splice
-	 *
-	 * Always use splice(2) to copy data from one file descriptor
-	 * to another.  If splice is not available, return -EINVAL.
-	 */
-	FUSE_BUF_FORCE_SPLICE	= (1 << 2),
-
-	/**
-	 * Try to move data with splice.
-	 *
-	 * If splice is used, try to move pages from the source to the
-	 * destination instead of copying.  See documentation of
-	 * SPLICE_F_MOVE in splice(2) man page.
-	 */
-	FUSE_BUF_SPLICE_MOVE	= (1 << 3),
-
-	/**
-	 * Don't block on the pipe when copying data with splice
-	 *
-	 * Makes the operations on the pipe non-blocking (if the pipe
-	 * is full or empty).  See SPLICE_F_NONBLOCK in the splice(2)
-	 * man page.
-	 */
-	FUSE_BUF_SPLICE_NONBLOCK= (1 << 4),
+    /**
+     * Don't use splice(2)
+     *
+     * Always fall back to using read and write instead of
+     * splice(2) to copy data from one file descriptor to another.
+     *
+     * If this flag is not set, then only fall back if splice is
+     * unavailable.
+     */
+    FUSE_BUF_NO_SPLICE = (1 << 1),
+
+    /**
+     * Force splice
+     *
+     * Always use splice(2) to copy data from one file descriptor
+     * to another.  If splice is not available, return -EINVAL.
+     */
+    FUSE_BUF_FORCE_SPLICE = (1 << 2),
+
+    /**
+     * Try to move data with splice.
+     *
+     * If splice is used, try to move pages from the source to the
+     * destination instead of copying.  See documentation of
+     * SPLICE_F_MOVE in splice(2) man page.
+     */
+    FUSE_BUF_SPLICE_MOVE = (1 << 3),
+
+    /**
+     * Don't block on the pipe when copying data with splice
+     *
+     * Makes the operations on the pipe non-blocking (if the pipe
+     * is full or empty).  See SPLICE_F_NONBLOCK in the splice(2)
+     * man page.
+     */
+    FUSE_BUF_SPLICE_NONBLOCK = (1 << 4),
 };
 
 /**
@@ -660,36 +677,36 @@ enum fuse_buf_copy_flags {
  * be supplied as a memory pointer or as a file descriptor
  */
 struct fuse_buf {
-	/**
-	 * Size of data in bytes
-	 */
-	size_t size;
-
-	/**
-	 * Buffer flags
-	 */
-	enum fuse_buf_flags flags;
-
-	/**
-	 * Memory pointer
-	 *
-	 * Used unless FUSE_BUF_IS_FD flag is set.
-	 */
-	void *mem;
-
-	/**
-	 * File descriptor
-	 *
-	 * Used if FUSE_BUF_IS_FD flag is set.
-	 */
-	int fd;
-
-	/**
-	 * File position
-	 *
-	 * Used if FUSE_BUF_FD_SEEK flag is set.
-	 */
-	off_t pos;
+    /**
+     * Size of data in bytes
+     */
+    size_t size;
+
+    /**
+     * Buffer flags
+     */
+    enum fuse_buf_flags flags;
+
+    /**
+     * Memory pointer
+     *
+     * Used unless FUSE_BUF_IS_FD flag is set.
+     */
+    void *mem;
+
+    /**
+     * File descriptor
+     *
+     * Used if FUSE_BUF_IS_FD flag is set.
+     */
+    int fd;
+
+    /**
+     * File position
+     *
+     * Used if FUSE_BUF_FD_SEEK flag is set.
+     */
+    off_t pos;
 };
 
 /**
@@ -701,41 +718,39 @@ struct fuse_buf {
  * Allocate dynamically to add more than one buffer.
  */
 struct fuse_bufvec {
-	/**
-	 * Number of buffers in the array
-	 */
-	size_t count;
-
-	/**
-	 * Index of current buffer within the array
-	 */
-	size_t idx;
-
-	/**
-	 * Current offset within the current buffer
-	 */
-	size_t off;
-
-	/**
-	 * Array of buffers
-	 */
-	struct fuse_buf buf[1];
+    /**
+     * Number of buffers in the array
+     */
+    size_t count;
+
+    /**
+     * Index of current buffer within the array
+     */
+    size_t idx;
+
+    /**
+     * Current offset within the current buffer
+     */
+    size_t off;
+
+    /**
+     * Array of buffers
+     */
+    struct fuse_buf buf[1];
 };
 
 /* Initialize bufvec with a single buffer of given size */
-#define FUSE_BUFVEC_INIT(size__)				\
-	((struct fuse_bufvec) {					\
-		/* .count= */ 1,				\
-		/* .idx =  */ 0,				\
-		/* .off =  */ 0,				\
-		/* .buf =  */ { /* [0] = */ {			\
-			/* .size =  */ (size__),		\
-			/* .flags = */ (enum fuse_buf_flags) 0,	\
-			/* .mem =   */ NULL,			\
-			/* .fd =    */ -1,			\
-			/* .pos =   */ 0,			\
-		} }						\
-	} )
+#define FUSE_BUFVEC_INIT(size__)                                      \
+    ((struct fuse_bufvec){ /* .count= */ 1,                           \
+                           /* .idx =  */ 0,                           \
+                           /* .off =  */ 0, /* .buf =  */             \
+                           { /* [0] = */ {                            \
+                               /* .size =  */ (size__),               \
+                               /* .flags = */ (enum fuse_buf_flags)0, \
+                               /* .mem =   */ NULL,                   \
+                               /* .fd =    */ -1,                     \
+                               /* .pos =   */ 0,                      \
+                           } } })
 
 /**
  * Get total size of data in a fuse buffer vector
@@ -754,16 +769,16 @@ size_t fuse_buf_size(const struct fuse_bufvec *bufv);
  * @return actual number of bytes copied or -errno on error
  */
 ssize_t fuse_buf_copy(struct fuse_bufvec *dst, struct fuse_bufvec *src,
-		      enum fuse_buf_copy_flags flags);
+                      enum fuse_buf_copy_flags flags);
 
-/* ----------------------------------------------------------- *
- * Signal handling					       *
- * ----------------------------------------------------------- */
+/*
+ * Signal handling
+ */
 
 /**
  * Exit session on HUP, TERM and INT signals and ignore PIPE signal
  *
- * Stores session in a global variable.	 May only be called once per
+ * Stores session in a global variable. May only be called once per
  * process until fuse_remove_signal_handlers() is called.
  *
  * Once either of the POSIX signals arrives, the signal handler calls
@@ -790,12 +805,12 @@ int fuse_set_signal_handlers(struct fuse_session *se);
  */
 void fuse_remove_signal_handlers(struct fuse_session *se);
 
-/* ----------------------------------------------------------- *
- * Compatibility stuff					       *
- * ----------------------------------------------------------- */
+/*
+ * Compatibility stuff
+ */
 
 #if !defined(FUSE_USE_VERSION) || FUSE_USE_VERSION < 30
-#  error only API version 30 or greater is supported
+#error only API version 30 or greater is supported
 #endif
 
 
@@ -805,11 +820,14 @@ void fuse_remove_signal_handlers(struct fuse_session *se);
  * On 32bit systems please add -D_FILE_OFFSET_BITS=64 to your compile flags!
  */
 
-#if defined(__GNUC__) && (__GNUC__ > 4 || __GNUC__ == 4 && __GNUC_MINOR__ >= 6) && !defined __cplusplus
+#if defined(__GNUC__) &&                                      \
+    (__GNUC__ > 4 || __GNUC__ == 4 && __GNUC_MINOR__ >= 6) && \
+    !defined __cplusplus
 _Static_assert(sizeof(off_t) == 8, "fuse: off_t must be 64bit");
 #else
-struct _fuse_off_t_must_be_64bit_dummy_struct \
-	{ unsigned _fuse_off_t_must_be_64bit:((sizeof(off_t) == 8) ? 1 : -1); };
+struct _fuse_off_t_must_be_64bit_dummy_struct {
+    unsigned _fuse_off_t_must_be_64bit:((sizeof(off_t) == 8) ? 1 : -1);
+};
 #endif
 
 #endif /* FUSE_COMMON_H_ */
diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
index bcd6a140fc..1119e85e57 100644
--- a/tools/virtiofsd/fuse_i.h
+++ b/tools/virtiofsd/fuse_i.h
@@ -1,71 +1,71 @@
 /*
-  FUSE: Filesystem in Userspace
-  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
-
-  This program can be distributed under the terms of the GNU LGPLv2.
-  See the file COPYING.LIB
-*/
+ * FUSE: Filesystem in Userspace
+ * Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
+ *
+ * This program can be distributed under the terms of the GNU LGPLv2.
+ * See the file COPYING.LIB
+ */
 
 #include "fuse.h"
 #include "fuse_lowlevel.h"
 
 struct fuse_req {
-	struct fuse_session *se;
-	uint64_t unique;
-	int ctr;
-	pthread_mutex_t lock;
-	struct fuse_ctx ctx;
-	struct fuse_chan *ch;
-	int interrupted;
-	unsigned int ioctl_64bit : 1;
-	union {
-		struct {
-			uint64_t unique;
-		} i;
-		struct {
-			fuse_interrupt_func_t func;
-			void *data;
-		} ni;
-	} u;
-	struct fuse_req *next;
-	struct fuse_req *prev;
+    struct fuse_session *se;
+    uint64_t unique;
+    int ctr;
+    pthread_mutex_t lock;
+    struct fuse_ctx ctx;
+    struct fuse_chan *ch;
+    int interrupted;
+    unsigned int ioctl_64bit:1;
+    union {
+        struct {
+            uint64_t unique;
+        } i;
+        struct {
+            fuse_interrupt_func_t func;
+            void *data;
+        } ni;
+    } u;
+    struct fuse_req *next;
+    struct fuse_req *prev;
 };
 
 struct fuse_notify_req {
-	uint64_t unique;
-	void (*reply)(struct fuse_notify_req *, fuse_req_t, fuse_ino_t,
-		      const void *, const struct fuse_buf *);
-	struct fuse_notify_req *next;
-	struct fuse_notify_req *prev;
+    uint64_t unique;
+    void (*reply)(struct fuse_notify_req *, fuse_req_t, fuse_ino_t,
+                  const void *, const struct fuse_buf *);
+    struct fuse_notify_req *next;
+    struct fuse_notify_req *prev;
 };
 
 struct fuse_session {
-	char *mountpoint;
-	volatile int exited;
-	int fd;
-	int debug;
-	int deny_others;
-	struct fuse_lowlevel_ops op;
-	int got_init;
-	struct cuse_data *cuse_data;
-	void *userdata;
-	uid_t owner;
-	struct fuse_conn_info conn;
-	struct fuse_req list;
-	struct fuse_req interrupts;
-	pthread_mutex_t lock;
-	int got_destroy;
-	int broken_splice_nonblock;
-	uint64_t notify_ctr;
-	struct fuse_notify_req notify_list;
-	size_t bufsize;
-	int error;
+    char *mountpoint;
+    volatile int exited;
+    int fd;
+    int debug;
+    int deny_others;
+    struct fuse_lowlevel_ops op;
+    int got_init;
+    struct cuse_data *cuse_data;
+    void *userdata;
+    uid_t owner;
+    struct fuse_conn_info conn;
+    struct fuse_req list;
+    struct fuse_req interrupts;
+    pthread_mutex_t lock;
+    int got_destroy;
+    int broken_splice_nonblock;
+    uint64_t notify_ctr;
+    struct fuse_notify_req notify_list;
+    size_t bufsize;
+    int error;
 };
 
 struct fuse_chan {
-	pthread_mutex_t lock;
-	int ctr;
-	int fd;
+    pthread_mutex_t lock;
+    int ctr;
+    int fd;
 };
 
 /**
@@ -76,16 +76,16 @@ struct fuse_chan {
  *
  */
 struct fuse_module {
-	char *name;
-	fuse_module_factory_t factory;
-	struct fuse_module *next;
-	struct fusemod_so *so;
-	int ctr;
+    char *name;
+    fuse_module_factory_t factory;
+    struct fuse_module *next;
+    struct fusemod_so *so;
+    int ctr;
 };
 
-/* ----------------------------------------------------------- *
- * Channel interface (when using -o clone_fd)		       *
- * ----------------------------------------------------------- */
+/*
+ * Channel interface (when using -o clone_fd)
+ */
 
 /**
  * Obtain counted reference to the channel
@@ -103,11 +103,12 @@ struct fuse_chan *fuse_chan_get(struct fuse_chan *ch);
 void fuse_chan_put(struct fuse_chan *ch);
 
 int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
-			       int count);
+                               int count);
 void fuse_free_req(fuse_req_t req);
 
 void fuse_session_process_buf_int(struct fuse_session *se,
-				  const struct fuse_buf *buf, struct fuse_chan *ch);
+                                  const struct fuse_buf *buf,
+                                  struct fuse_chan *ch);
 
 
 #define FUSE_MAX_MAX_PAGES 256
diff --git a/tools/virtiofsd/fuse_log.c b/tools/virtiofsd/fuse_log.c
index 0d268ab014..11345f9ec8 100644
--- a/tools/virtiofsd/fuse_log.c
+++ b/tools/virtiofsd/fuse_log.c
@@ -1,40 +1,40 @@
 /*
-  FUSE: Filesystem in Userspace
-  Copyright (C) 2019  Red Hat, Inc.
-
-  Logging API.
-
-  This program can be distributed under the terms of the GNU LGPLv2.
-  See the file COPYING.LIB
-*/
+ * FUSE: Filesystem in Userspace
+ * Copyright (C) 2019  Red Hat, Inc.
+ *
+ * Logging API.
+ *
+ * This program can be distributed under the terms of the GNU LGPLv2.
+ * See the file COPYING.LIB
+ */
 
 #include "fuse_log.h"
 
 #include <stdarg.h>
 #include <stdio.h>
 
-static void default_log_func(
-		__attribute__(( unused )) enum fuse_log_level level,
-		const char *fmt, va_list ap)
+static void default_log_func(__attribute__((unused)) enum fuse_log_level level,
+                             const char *fmt, va_list ap)
 {
-	vfprintf(stderr, fmt, ap);
+    vfprintf(stderr, fmt, ap);
 }
 
 static fuse_log_func_t log_func = default_log_func;
 
 void fuse_set_log_func(fuse_log_func_t func)
 {
-	if (!func)
-		func = default_log_func;
+    if (!func) {
+        func = default_log_func;
+    }
 
-	log_func = func;
+    log_func = func;
 }
 
 void fuse_log(enum fuse_log_level level, const char *fmt, ...)
 {
-	va_list ap;
+    va_list ap;
 
-	va_start(ap, fmt);
-	log_func(level, fmt, ap);
-	va_end(ap);
+    va_start(ap, fmt);
+    log_func(level, fmt, ap);
+    va_end(ap);
 }
diff --git a/tools/virtiofsd/fuse_log.h b/tools/virtiofsd/fuse_log.h
index 0af700da6b..bf6c11ff11 100644
--- a/tools/virtiofsd/fuse_log.h
+++ b/tools/virtiofsd/fuse_log.h
@@ -1,10 +1,10 @@
 /*
-  FUSE: Filesystem in Userspace
-  Copyright (C) 2019  Red Hat, Inc.
-
-  This program can be distributed under the terms of the GNU LGPLv2.
-  See the file COPYING.LIB.
-*/
+ * FUSE: Filesystem in Userspace
+ * Copyright (C) 2019  Red Hat, Inc.
+ *
+ * This program can be distributed under the terms of the GNU LGPLv2.
+ * See the file COPYING.LIB.
+ */
 
 #ifndef FUSE_LOG_H_
 #define FUSE_LOG_H_
@@ -22,14 +22,14 @@
  * These levels correspond to syslog(2) log levels since they are widely used.
  */
 enum fuse_log_level {
-	FUSE_LOG_EMERG,
-	FUSE_LOG_ALERT,
-	FUSE_LOG_CRIT,
-	FUSE_LOG_ERR,
-	FUSE_LOG_WARNING,
-	FUSE_LOG_NOTICE,
-	FUSE_LOG_INFO,
-	FUSE_LOG_DEBUG
+    FUSE_LOG_EMERG,
+    FUSE_LOG_ALERT,
+    FUSE_LOG_CRIT,
+    FUSE_LOG_ERR,
+    FUSE_LOG_WARNING,
+    FUSE_LOG_NOTICE,
+    FUSE_LOG_INFO,
+    FUSE_LOG_DEBUG
 };
 
 /**
@@ -45,8 +45,8 @@ enum fuse_log_level {
  * @param fmt sprintf-style format string including newline
  * @param ap format string arguments
  */
-typedef void (*fuse_log_func_t)(enum fuse_log_level level,
-				const char *fmt, va_list ap);
+typedef void (*fuse_log_func_t)(enum fuse_log_level level, const char *fmt,
+                                va_list ap);
 
 /**
  * Install a custom log handler function.
diff --git a/tools/virtiofsd/fuse_loop_mt.c b/tools/virtiofsd/fuse_loop_mt.c
index 6fcd47e42c..39e080d9ff 100644
--- a/tools/virtiofsd/fuse_loop_mt.c
+++ b/tools/virtiofsd/fuse_loop_mt.c
@@ -1,54 +1,56 @@
 /*
-  FUSE: Filesystem in Userspace
-  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
-
-  Implementation of the multi-threaded FUSE session loop.
-
-  This program can be distributed under the terms of the GNU LGPLv2.
-  See the file COPYING.LIB.
-*/
+ * FUSE: Filesystem in Userspace
+ * Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
+ *
+ * Implementation of the multi-threaded FUSE session loop.
+ *
+ * This program can be distributed under the terms of the GNU LGPLv2.
+ * See the file COPYING.LIB.
+ */
 
 #include "config.h"
+#include "fuse_i.h"
+#include "fuse_kernel.h"
 #include "fuse_lowlevel.h"
 #include "fuse_misc.h"
-#include "fuse_kernel.h"
-#include "fuse_i.h"
 
+#include <assert.h>
+#include <errno.h>
+#include <semaphore.h>
+#include <signal.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
-#include <unistd.h>
-#include <signal.h>
-#include <semaphore.h>
-#include <errno.h>
-#include <sys/time.h>
 #include <sys/ioctl.h>
-#include <assert.h>
+#include <sys/time.h>
+#include <unistd.h>
 
 /* Environment var controlling the thread stack size */
 #define ENVNAME_THREAD_STACK "FUSE_THREAD_STACK"
 
 struct fuse_chan *fuse_chan_get(struct fuse_chan *ch)
 {
-	assert(ch->ctr > 0);
-	pthread_mutex_lock(&ch->lock);
-	ch->ctr++;
-	pthread_mutex_unlock(&ch->lock);
+    assert(ch->ctr > 0);
+    pthread_mutex_lock(&ch->lock);
+    ch->ctr++;
+    pthread_mutex_unlock(&ch->lock);
 
-	return ch;
+    return ch;
 }
 
 void fuse_chan_put(struct fuse_chan *ch)
 {
-	if (ch == NULL)
-		return;
-	pthread_mutex_lock(&ch->lock);
-	ch->ctr--;
-	if (!ch->ctr) {
-		pthread_mutex_unlock(&ch->lock);
-		close(ch->fd);
-		pthread_mutex_destroy(&ch->lock);
-		free(ch);
-	} else
-		pthread_mutex_unlock(&ch->lock);
+    if (ch == NULL) {
+        return;
+    }
+    pthread_mutex_lock(&ch->lock);
+    ch->ctr--;
+    if (!ch->ctr) {
+        pthread_mutex_unlock(&ch->lock);
+        close(ch->fd);
+        pthread_mutex_destroy(&ch->lock);
+        free(ch);
+    } else {
+        pthread_mutex_unlock(&ch->lock);
+    }
 }
diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 5d915666d8..42feee5c1c 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -1,2398 +1,2539 @@
 /*
-  FUSE: Filesystem in Userspace
-  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
-
-  Implementation of (most of) the low-level FUSE API. The session loop
-  functions are implemented in separate files.
-
-  This program can be distributed under the terms of the GNU LGPLv2.
-  See the file COPYING.LIB
-*/
+ * FUSE: Filesystem in Userspace
+ * Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
+ *
+ * Implementation of (most of) the low-level FUSE API. The session loop
+ * functions are implemented in separate files.
+ *
+ * This program can be distributed under the terms of the GNU LGPLv2.
+ * See the file COPYING.LIB
+ */
 
 #define _GNU_SOURCE
 
 #include "config.h"
 #include "fuse_i.h"
 #include "fuse_kernel.h"
-#include "fuse_opt.h"
 #include "fuse_misc.h"
+#include "fuse_opt.h"
 #include "mount_util.h"
 
+#include <assert.h>
+#include <errno.h>
+#include <limits.h>
+#include <stddef.h>
 #include <stdio.h>
 #include <stdlib.h>
-#include <stddef.h>
 #include <string.h>
-#include <unistd.h>
-#include <limits.h>
-#include <errno.h>
-#include <assert.h>
 #include <sys/file.h>
-
+#include <unistd.h>
 
 
 #define PARAM(inarg) (((char *)(inarg)) + sizeof(*(inarg)))
 #define OFFSET_MAX 0x7fffffffffffffffLL
 
-#define container_of(ptr, type, member) ({				\
-			const typeof( ((type *)0)->member ) *__mptr = (ptr); \
-			(type *)( (char *)__mptr - offsetof(type,member) );})
+#define container_of(ptr, type, member)                    \
+    ({                                                     \
+        const typeof(((type *)0)->member) *__mptr = (ptr); \
+        (type *)((char *)__mptr - offsetof(type, member)); \
+    })
 
 struct fuse_pollhandle {
-	uint64_t kh;
-	struct fuse_session *se;
+    uint64_t kh;
+    struct fuse_session *se;
 };
 
 static size_t pagesize;
 
 static __attribute__((constructor)) void fuse_ll_init_pagesize(void)
 {
-	pagesize = getpagesize();
+    pagesize = getpagesize();
 }
 
 static void convert_stat(const struct stat *stbuf, struct fuse_attr *attr)
 {
-	attr->ino	= stbuf->st_ino;
-	attr->mode	= stbuf->st_mode;
-	attr->nlink	= stbuf->st_nlink;
-	attr->uid	= stbuf->st_uid;
-	attr->gid	= stbuf->st_gid;
-	attr->rdev	= stbuf->st_rdev;
-	attr->size	= stbuf->st_size;
-	attr->blksize	= stbuf->st_blksize;
-	attr->blocks	= stbuf->st_blocks;
-	attr->atime	= stbuf->st_atime;
-	attr->mtime	= stbuf->st_mtime;
-	attr->ctime	= stbuf->st_ctime;
-	attr->atimensec = ST_ATIM_NSEC(stbuf);
-	attr->mtimensec = ST_MTIM_NSEC(stbuf);
-	attr->ctimensec = ST_CTIM_NSEC(stbuf);
+    attr->ino = stbuf->st_ino;
+    attr->mode = stbuf->st_mode;
+    attr->nlink = stbuf->st_nlink;
+    attr->uid = stbuf->st_uid;
+    attr->gid = stbuf->st_gid;
+    attr->rdev = stbuf->st_rdev;
+    attr->size = stbuf->st_size;
+    attr->blksize = stbuf->st_blksize;
+    attr->blocks = stbuf->st_blocks;
+    attr->atime = stbuf->st_atime;
+    attr->mtime = stbuf->st_mtime;
+    attr->ctime = stbuf->st_ctime;
+    attr->atimensec = ST_ATIM_NSEC(stbuf);
+    attr->mtimensec = ST_MTIM_NSEC(stbuf);
+    attr->ctimensec = ST_CTIM_NSEC(stbuf);
 }
 
 static void convert_attr(const struct fuse_setattr_in *attr, struct stat *stbuf)
 {
-	stbuf->st_mode	       = attr->mode;
-	stbuf->st_uid	       = attr->uid;
-	stbuf->st_gid	       = attr->gid;
-	stbuf->st_size	       = attr->size;
-	stbuf->st_atime	       = attr->atime;
-	stbuf->st_mtime	       = attr->mtime;
-	stbuf->st_ctime        = attr->ctime;
-	ST_ATIM_NSEC_SET(stbuf, attr->atimensec);
-	ST_MTIM_NSEC_SET(stbuf, attr->mtimensec);
-	ST_CTIM_NSEC_SET(stbuf, attr->ctimensec);
+    stbuf->st_mode = attr->mode;
+    stbuf->st_uid = attr->uid;
+    stbuf->st_gid = attr->gid;
+    stbuf->st_size = attr->size;
+    stbuf->st_atime = attr->atime;
+    stbuf->st_mtime = attr->mtime;
+    stbuf->st_ctime = attr->ctime;
+    ST_ATIM_NSEC_SET(stbuf, attr->atimensec);
+    ST_MTIM_NSEC_SET(stbuf, attr->mtimensec);
+    ST_CTIM_NSEC_SET(stbuf, attr->ctimensec);
 }
 
-static	size_t iov_length(const struct iovec *iov, size_t count)
+static size_t iov_length(const struct iovec *iov, size_t count)
 {
-	size_t seg;
-	size_t ret = 0;
+    size_t seg;
+    size_t ret = 0;
 
-	for (seg = 0; seg < count; seg++)
-		ret += iov[seg].iov_len;
-	return ret;
+    for (seg = 0; seg < count; seg++) {
+        ret += iov[seg].iov_len;
+    }
+    return ret;
 }
 
 static void list_init_req(struct fuse_req *req)
 {
-	req->next = req;
-	req->prev = req;
+    req->next = req;
+    req->prev = req;
 }
 
 static void list_del_req(struct fuse_req *req)
 {
-	struct fuse_req *prev = req->prev;
-	struct fuse_req *next = req->next;
-	prev->next = next;
-	next->prev = prev;
+    struct fuse_req *prev = req->prev;
+    struct fuse_req *next = req->next;
+    prev->next = next;
+    next->prev = prev;
 }
 
 static void list_add_req(struct fuse_req *req, struct fuse_req *next)
 {
-	struct fuse_req *prev = next->prev;
-	req->next = next;
-	req->prev = prev;
-	prev->next = req;
-	next->prev = req;
+    struct fuse_req *prev = next->prev;
+    req->next = next;
+    req->prev = prev;
+    prev->next = req;
+    next->prev = req;
 }
 
 static void destroy_req(fuse_req_t req)
 {
-	pthread_mutex_destroy(&req->lock);
-	free(req);
+    pthread_mutex_destroy(&req->lock);
+    free(req);
 }
 
 void fuse_free_req(fuse_req_t req)
 {
-	int ctr;
-	struct fuse_session *se = req->se;
+    int ctr;
+    struct fuse_session *se = req->se;
 
-	pthread_mutex_lock(&se->lock);
-	req->u.ni.func = NULL;
-	req->u.ni.data = NULL;
-	list_del_req(req);
-	ctr = --req->ctr;
-	fuse_chan_put(req->ch);
-	req->ch = NULL;
-	pthread_mutex_unlock(&se->lock);
-	if (!ctr)
-		destroy_req(req);
+    pthread_mutex_lock(&se->lock);
+    req->u.ni.func = NULL;
+    req->u.ni.data = NULL;
+    list_del_req(req);
+    ctr = --req->ctr;
+    fuse_chan_put(req->ch);
+    req->ch = NULL;
+    pthread_mutex_unlock(&se->lock);
+    if (!ctr) {
+        destroy_req(req);
+    }
 }
 
 static struct fuse_req *fuse_ll_alloc_req(struct fuse_session *se)
 {
-	struct fuse_req *req;
+    struct fuse_req *req;
 
-	req = (struct fuse_req *) calloc(1, sizeof(struct fuse_req));
-	if (req == NULL) {
-		fuse_log(FUSE_LOG_ERR, "fuse: failed to allocate request\n");
-	} else {
-		req->se = se;
-		req->ctr = 1;
-		list_init_req(req);
-		fuse_mutex_init(&req->lock);
-	}
+    req = (struct fuse_req *)calloc(1, sizeof(struct fuse_req));
+    if (req == NULL) {
+        fuse_log(FUSE_LOG_ERR, "fuse: failed to allocate request\n");
+    } else {
+        req->se = se;
+        req->ctr = 1;
+        list_init_req(req);
+        fuse_mutex_init(&req->lock);
+    }
 
-	return req;
+    return req;
 }
 
 /* Send data. If *ch* is NULL, send via session master fd */
 static int fuse_send_msg(struct fuse_session *se, struct fuse_chan *ch,
-			 struct iovec *iov, int count)
+                         struct iovec *iov, int count)
 {
-	struct fuse_out_header *out = iov[0].iov_base;
+    struct fuse_out_header *out = iov[0].iov_base;
 
-	out->len = iov_length(iov, count);
-	if (se->debug) {
-		if (out->unique == 0) {
-			fuse_log(FUSE_LOG_DEBUG, "NOTIFY: code=%d length=%u\n",
-				out->error, out->len);
-		} else if (out->error) {
-			fuse_log(FUSE_LOG_DEBUG,
-				"   unique: %llu, error: %i (%s), outsize: %i\n",
-				(unsigned long long) out->unique, out->error,
-				strerror(-out->error), out->len);
-		} else {
-			fuse_log(FUSE_LOG_DEBUG,
-				"   unique: %llu, success, outsize: %i\n",
-				(unsigned long long) out->unique, out->len);
-		}
-	}
+    out->len = iov_length(iov, count);
+    if (se->debug) {
+        if (out->unique == 0) {
+            fuse_log(FUSE_LOG_DEBUG, "NOTIFY: code=%d length=%u\n", out->error,
+                     out->len);
+        } else if (out->error) {
+            fuse_log(FUSE_LOG_DEBUG,
+                     "   unique: %llu, error: %i (%s), outsize: %i\n",
+                     (unsigned long long)out->unique, out->error,
+                     strerror(-out->error), out->len);
+        } else {
+            fuse_log(FUSE_LOG_DEBUG, "   unique: %llu, success, outsize: %i\n",
+                     (unsigned long long)out->unique, out->len);
+        }
+    }
 
-	abort(); /* virtio should have taken it before here */
-	return 0;
+    abort(); /* virtio should have taken it before here */
+    return 0;
 }
 
 
 int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
-			       int count)
+                               int count)
 {
-	struct fuse_out_header out;
+    struct fuse_out_header out;
 
-	if (error <= -1000 || error > 0) {
-		fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n",	error);
-		error = -ERANGE;
-	}
+    if (error <= -1000 || error > 0) {
+        fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n", error);
+        error = -ERANGE;
+    }
 
-	out.unique = req->unique;
-	out.error = error;
+    out.unique = req->unique;
+    out.error = error;
 
-	iov[0].iov_base = &out;
-	iov[0].iov_len = sizeof(struct fuse_out_header);
+    iov[0].iov_base = &out;
+    iov[0].iov_len = sizeof(struct fuse_out_header);
 
-	return fuse_send_msg(req->se, req->ch, iov, count);
+    return fuse_send_msg(req->se, req->ch, iov, count);
 }
 
 static int send_reply_iov(fuse_req_t req, int error, struct iovec *iov,
-			  int count)
+                          int count)
 {
-	int res;
+    int res;
 
-	res = fuse_send_reply_iov_nofree(req, error, iov, count);
-	fuse_free_req(req);
-	return res;
+    res = fuse_send_reply_iov_nofree(req, error, iov, count);
+    fuse_free_req(req);
+    return res;
 }
 
 static int send_reply(fuse_req_t req, int error, const void *arg,
-		      size_t argsize)
+                      size_t argsize)
 {
-	struct iovec iov[2];
-	int count = 1;
-	if (argsize) {
-		iov[1].iov_base = (void *) arg;
-		iov[1].iov_len = argsize;
-		count++;
-	}
-	return send_reply_iov(req, error, iov, count);
+    struct iovec iov[2];
+    int count = 1;
+    if (argsize) {
+        iov[1].iov_base = (void *)arg;
+        iov[1].iov_len = argsize;
+        count++;
+    }
+    return send_reply_iov(req, error, iov, count);
 }
 
 int fuse_reply_iov(fuse_req_t req, const struct iovec *iov, int count)
 {
-	int res;
-	struct iovec *padded_iov;
+    int res;
+    struct iovec *padded_iov;
 
-	padded_iov = malloc((count + 1) * sizeof(struct iovec));
-	if (padded_iov == NULL)
-		return fuse_reply_err(req, ENOMEM);
+    padded_iov = malloc((count + 1) * sizeof(struct iovec));
+    if (padded_iov == NULL) {
+        return fuse_reply_err(req, ENOMEM);
+    }
 
-	memcpy(padded_iov + 1, iov, count * sizeof(struct iovec));
-	count++;
+    memcpy(padded_iov + 1, iov, count * sizeof(struct iovec));
+    count++;
 
-	res = send_reply_iov(req, 0, padded_iov, count);
-	free(padded_iov);
+    res = send_reply_iov(req, 0, padded_iov, count);
+    free(padded_iov);
 
-	return res;
+    return res;
 }
 
 
-/* `buf` is allowed to be empty so that the proper size may be
-   allocated by the caller */
+/*
+ * `buf` is allowed to be empty so that the proper size may be
+ * allocated by the caller
+ */
 size_t fuse_add_direntry(fuse_req_t req, char *buf, size_t bufsize,
-			 const char *name, const struct stat *stbuf, off_t off)
+                         const char *name, const struct stat *stbuf, off_t off)
 {
-	(void)req;
-	size_t namelen;
-	size_t entlen;
-	size_t entlen_padded;
-	struct fuse_dirent *dirent;
+    (void)req;
+    size_t namelen;
+    size_t entlen;
+    size_t entlen_padded;
+    struct fuse_dirent *dirent;
 
-	namelen = strlen(name);
-	entlen = FUSE_NAME_OFFSET + namelen;
-	entlen_padded = FUSE_DIRENT_ALIGN(entlen);
+    namelen = strlen(name);
+    entlen = FUSE_NAME_OFFSET + namelen;
+    entlen_padded = FUSE_DIRENT_ALIGN(entlen);
 
-	if ((buf == NULL) || (entlen_padded > bufsize))
-	  return entlen_padded;
+    if ((buf == NULL) || (entlen_padded > bufsize)) {
+        return entlen_padded;
+    }
 
-	dirent = (struct fuse_dirent*) buf;
-	dirent->ino = stbuf->st_ino;
-	dirent->off = off;
-	dirent->namelen = namelen;
-	dirent->type = (stbuf->st_mode & S_IFMT) >> 12;
-	memcpy(dirent->name, name, namelen);
-	memset(dirent->name + namelen, 0, entlen_padded - entlen);
+    dirent = (struct fuse_dirent *)buf;
+    dirent->ino = stbuf->st_ino;
+    dirent->off = off;
+    dirent->namelen = namelen;
+    dirent->type = (stbuf->st_mode & S_IFMT) >> 12;
+    memcpy(dirent->name, name, namelen);
+    memset(dirent->name + namelen, 0, entlen_padded - entlen);
 
-	return entlen_padded;
+    return entlen_padded;
 }
 
 static void convert_statfs(const struct statvfs *stbuf,
-			   struct fuse_kstatfs *kstatfs)
+                           struct fuse_kstatfs *kstatfs)
 {
-	kstatfs->bsize	 = stbuf->f_bsize;
-	kstatfs->frsize	 = stbuf->f_frsize;
-	kstatfs->blocks	 = stbuf->f_blocks;
-	kstatfs->bfree	 = stbuf->f_bfree;
-	kstatfs->bavail	 = stbuf->f_bavail;
-	kstatfs->files	 = stbuf->f_files;
-	kstatfs->ffree	 = stbuf->f_ffree;
-	kstatfs->namelen = stbuf->f_namemax;
+    kstatfs->bsize = stbuf->f_bsize;
+    kstatfs->frsize = stbuf->f_frsize;
+    kstatfs->blocks = stbuf->f_blocks;
+    kstatfs->bfree = stbuf->f_bfree;
+    kstatfs->bavail = stbuf->f_bavail;
+    kstatfs->files = stbuf->f_files;
+    kstatfs->ffree = stbuf->f_ffree;
+    kstatfs->namelen = stbuf->f_namemax;
 }
 
 static int send_reply_ok(fuse_req_t req, const void *arg, size_t argsize)
 {
-	return send_reply(req, 0, arg, argsize);
+    return send_reply(req, 0, arg, argsize);
 }
 
 int fuse_reply_err(fuse_req_t req, int err)
 {
-	return send_reply(req, -err, NULL, 0);
+    return send_reply(req, -err, NULL, 0);
 }
 
 void fuse_reply_none(fuse_req_t req)
 {
-	fuse_free_req(req);
+    fuse_free_req(req);
 }
 
 static unsigned long calc_timeout_sec(double t)
 {
-	if (t > (double) ULONG_MAX)
-		return ULONG_MAX;
-	else if (t < 0.0)
-		return 0;
-	else
-		return (unsigned long) t;
+    if (t > (double)ULONG_MAX) {
+        return ULONG_MAX;
+    } else if (t < 0.0) {
+        return 0;
+    } else {
+        return (unsigned long)t;
+    }
 }
 
 static unsigned int calc_timeout_nsec(double t)
 {
-	double f = t - (double) calc_timeout_sec(t);
-	if (f < 0.0)
-		return 0;
-	else if (f >= 0.999999999)
-		return 999999999;
-	else
-		return (unsigned int) (f * 1.0e9);
+    double f = t - (double)calc_timeout_sec(t);
+    if (f < 0.0) {
+        return 0;
+    } else if (f >= 0.999999999) {
+        return 999999999;
+    } else {
+        return (unsigned int)(f * 1.0e9);
+    }
 }
 
 static void fill_entry(struct fuse_entry_out *arg,
-		       const struct fuse_entry_param *e)
+                       const struct fuse_entry_param *e)
 {
-	arg->nodeid = e->ino;
-	arg->generation = e->generation;
-	arg->entry_valid = calc_timeout_sec(e->entry_timeout);
-	arg->entry_valid_nsec = calc_timeout_nsec(e->entry_timeout);
-	arg->attr_valid = calc_timeout_sec(e->attr_timeout);
-	arg->attr_valid_nsec = calc_timeout_nsec(e->attr_timeout);
-	convert_stat(&e->attr, &arg->attr);
+    arg->nodeid = e->ino;
+    arg->generation = e->generation;
+    arg->entry_valid = calc_timeout_sec(e->entry_timeout);
+    arg->entry_valid_nsec = calc_timeout_nsec(e->entry_timeout);
+    arg->attr_valid = calc_timeout_sec(e->attr_timeout);
+    arg->attr_valid_nsec = calc_timeout_nsec(e->attr_timeout);
+    convert_stat(&e->attr, &arg->attr);
 }
 
-/* `buf` is allowed to be empty so that the proper size may be
-   allocated by the caller */
+/*
+ * `buf` is allowed to be empty so that the proper size may be
+ * allocated by the caller
+ */
 size_t fuse_add_direntry_plus(fuse_req_t req, char *buf, size_t bufsize,
-			      const char *name,
-			      const struct fuse_entry_param *e, off_t off)
-{
-	(void)req;
-	size_t namelen;
-	size_t entlen;
-	size_t entlen_padded;
-
-	namelen = strlen(name);
-	entlen = FUSE_NAME_OFFSET_DIRENTPLUS + namelen;
-	entlen_padded = FUSE_DIRENT_ALIGN(entlen);
-	if ((buf == NULL) || (entlen_padded > bufsize))
-	  return entlen_padded;
-
-	struct fuse_direntplus *dp = (struct fuse_direntplus *) buf;
-	memset(&dp->entry_out, 0, sizeof(dp->entry_out));
-	fill_entry(&dp->entry_out, e);
-
-	struct fuse_dirent *dirent = &dp->dirent;
-	dirent->ino = e->attr.st_ino;
-	dirent->off = off;
-	dirent->namelen = namelen;
-	dirent->type = (e->attr.st_mode & S_IFMT) >> 12;
-	memcpy(dirent->name, name, namelen);
-	memset(dirent->name + namelen, 0, entlen_padded - entlen);
-
-	return entlen_padded;
-}
-
-static void fill_open(struct fuse_open_out *arg,
-		      const struct fuse_file_info *f)
-{
-	arg->fh = f->fh;
-	if (f->direct_io)
-		arg->open_flags |= FOPEN_DIRECT_IO;
-	if (f->keep_cache)
-		arg->open_flags |= FOPEN_KEEP_CACHE;
-	if (f->cache_readdir)
-		arg->open_flags |= FOPEN_CACHE_DIR;
-	if (f->nonseekable)
-		arg->open_flags |= FOPEN_NONSEEKABLE;
+                              const char *name,
+                              const struct fuse_entry_param *e, off_t off)
+{
+    (void)req;
+    size_t namelen;
+    size_t entlen;
+    size_t entlen_padded;
+
+    namelen = strlen(name);
+    entlen = FUSE_NAME_OFFSET_DIRENTPLUS + namelen;
+    entlen_padded = FUSE_DIRENT_ALIGN(entlen);
+    if ((buf == NULL) || (entlen_padded > bufsize)) {
+        return entlen_padded;
+    }
+
+    struct fuse_direntplus *dp = (struct fuse_direntplus *)buf;
+    memset(&dp->entry_out, 0, sizeof(dp->entry_out));
+    fill_entry(&dp->entry_out, e);
+
+    struct fuse_dirent *dirent = &dp->dirent;
+    dirent->ino = e->attr.st_ino;
+    dirent->off = off;
+    dirent->namelen = namelen;
+    dirent->type = (e->attr.st_mode & S_IFMT) >> 12;
+    memcpy(dirent->name, name, namelen);
+    memset(dirent->name + namelen, 0, entlen_padded - entlen);
+
+    return entlen_padded;
+}
+
+static void fill_open(struct fuse_open_out *arg, const struct fuse_file_info *f)
+{
+    arg->fh = f->fh;
+    if (f->direct_io) {
+        arg->open_flags |= FOPEN_DIRECT_IO;
+    }
+    if (f->keep_cache) {
+        arg->open_flags |= FOPEN_KEEP_CACHE;
+    }
+    if (f->cache_readdir) {
+        arg->open_flags |= FOPEN_CACHE_DIR;
+    }
+    if (f->nonseekable) {
+        arg->open_flags |= FOPEN_NONSEEKABLE;
+    }
 }
 
 int fuse_reply_entry(fuse_req_t req, const struct fuse_entry_param *e)
 {
-	struct fuse_entry_out arg;
-	size_t size = req->se->conn.proto_minor < 9 ?
-		FUSE_COMPAT_ENTRY_OUT_SIZE : sizeof(arg);
+    struct fuse_entry_out arg;
+    size_t size = req->se->conn.proto_minor < 9 ? FUSE_COMPAT_ENTRY_OUT_SIZE :
+                                                  sizeof(arg);
 
-	/* before ABI 7.4 e->ino == 0 was invalid, only ENOENT meant
-	   negative entry */
-	if (!e->ino && req->se->conn.proto_minor < 4)
-		return fuse_reply_err(req, ENOENT);
+    /*
+     * before ABI 7.4 e->ino == 0 was invalid, only ENOENT meant
+     * negative entry
+     */
+    if (!e->ino && req->se->conn.proto_minor < 4) {
+        return fuse_reply_err(req, ENOENT);
+    }
 
-	memset(&arg, 0, sizeof(arg));
-	fill_entry(&arg, e);
-	return send_reply_ok(req, &arg, size);
+    memset(&arg, 0, sizeof(arg));
+    fill_entry(&arg, e);
+    return send_reply_ok(req, &arg, size);
 }
 
 int fuse_reply_create(fuse_req_t req, const struct fuse_entry_param *e,
-		      const struct fuse_file_info *f)
+                      const struct fuse_file_info *f)
 {
-	char buf[sizeof(struct fuse_entry_out) + sizeof(struct fuse_open_out)];
-	size_t entrysize = req->se->conn.proto_minor < 9 ?
-		FUSE_COMPAT_ENTRY_OUT_SIZE : sizeof(struct fuse_entry_out);
-	struct fuse_entry_out *earg = (struct fuse_entry_out *) buf;
-	struct fuse_open_out *oarg = (struct fuse_open_out *) (buf + entrysize);
+    char buf[sizeof(struct fuse_entry_out) + sizeof(struct fuse_open_out)];
+    size_t entrysize = req->se->conn.proto_minor < 9 ?
+                           FUSE_COMPAT_ENTRY_OUT_SIZE :
+                           sizeof(struct fuse_entry_out);
+    struct fuse_entry_out *earg = (struct fuse_entry_out *)buf;
+    struct fuse_open_out *oarg = (struct fuse_open_out *)(buf + entrysize);
 
-	memset(buf, 0, sizeof(buf));
-	fill_entry(earg, e);
-	fill_open(oarg, f);
-	return send_reply_ok(req, buf,
-			     entrysize + sizeof(struct fuse_open_out));
+    memset(buf, 0, sizeof(buf));
+    fill_entry(earg, e);
+    fill_open(oarg, f);
+    return send_reply_ok(req, buf, entrysize + sizeof(struct fuse_open_out));
 }
 
 int fuse_reply_attr(fuse_req_t req, const struct stat *attr,
-		    double attr_timeout)
+                    double attr_timeout)
 {
-	struct fuse_attr_out arg;
-	size_t size = req->se->conn.proto_minor < 9 ?
-		FUSE_COMPAT_ATTR_OUT_SIZE : sizeof(arg);
+    struct fuse_attr_out arg;
+    size_t size =
+        req->se->conn.proto_minor < 9 ? FUSE_COMPAT_ATTR_OUT_SIZE : sizeof(arg);
 
-	memset(&arg, 0, sizeof(arg));
-	arg.attr_valid = calc_timeout_sec(attr_timeout);
-	arg.attr_valid_nsec = calc_timeout_nsec(attr_timeout);
-	convert_stat(attr, &arg.attr);
+    memset(&arg, 0, sizeof(arg));
+    arg.attr_valid = calc_timeout_sec(attr_timeout);
+    arg.attr_valid_nsec = calc_timeout_nsec(attr_timeout);
+    convert_stat(attr, &arg.attr);
 
-	return send_reply_ok(req, &arg, size);
+    return send_reply_ok(req, &arg, size);
 }
 
 int fuse_reply_readlink(fuse_req_t req, const char *linkname)
 {
-	return send_reply_ok(req, linkname, strlen(linkname));
+    return send_reply_ok(req, linkname, strlen(linkname));
 }
 
 int fuse_reply_open(fuse_req_t req, const struct fuse_file_info *f)
 {
-	struct fuse_open_out arg;
+    struct fuse_open_out arg;
 
-	memset(&arg, 0, sizeof(arg));
-	fill_open(&arg, f);
-	return send_reply_ok(req, &arg, sizeof(arg));
+    memset(&arg, 0, sizeof(arg));
+    fill_open(&arg, f);
+    return send_reply_ok(req, &arg, sizeof(arg));
 }
 
 int fuse_reply_write(fuse_req_t req, size_t count)
 {
-	struct fuse_write_out arg;
+    struct fuse_write_out arg;
 
-	memset(&arg, 0, sizeof(arg));
-	arg.size = count;
+    memset(&arg, 0, sizeof(arg));
+    arg.size = count;
 
-	return send_reply_ok(req, &arg, sizeof(arg));
+    return send_reply_ok(req, &arg, sizeof(arg));
 }
 
 int fuse_reply_buf(fuse_req_t req, const char *buf, size_t size)
 {
-	return send_reply_ok(req, buf, size);
+    return send_reply_ok(req, buf, size);
 }
 
 static int fuse_send_data_iov_fallback(struct fuse_session *se,
-				       struct fuse_chan *ch,
-				       struct iovec *iov, int iov_count,
-				       struct fuse_bufvec *buf,
-				       size_t len)
+                                       struct fuse_chan *ch, struct iovec *iov,
+                                       int iov_count, struct fuse_bufvec *buf,
+                                       size_t len)
 {
-	/* Optimize common case */
-	if (buf->count == 1 && buf->idx == 0 && buf->off == 0 &&
-	    !(buf->buf[0].flags & FUSE_BUF_IS_FD)) {
-		/* FIXME: also avoid memory copy if there are multiple buffers
-		   but none of them contain an fd */
+    /* Optimize common case */
+    if (buf->count == 1 && buf->idx == 0 && buf->off == 0 &&
+        !(buf->buf[0].flags & FUSE_BUF_IS_FD)) {
+        /*
+         * FIXME: also avoid memory copy if there are multiple buffers
+         * but none of them contain an fd
+         */
 
-		iov[iov_count].iov_base = buf->buf[0].mem;
-		iov[iov_count].iov_len = len;
-		iov_count++;
-		return fuse_send_msg(se, ch, iov, iov_count);
-	}
+        iov[iov_count].iov_base = buf->buf[0].mem;
+        iov[iov_count].iov_len = len;
+        iov_count++;
+        return fuse_send_msg(se, ch, iov, iov_count);
+    }
 
-	abort(); /* Will have taken vhost path */
-	return 0;
+    abort(); /* Will have taken vhost path */
+    return 0;
 }
 
 static int fuse_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
-			       struct iovec *iov, int iov_count,
-			       struct fuse_bufvec *buf, unsigned int flags)
+                              struct iovec *iov, int iov_count,
+                              struct fuse_bufvec *buf, unsigned int flags)
 {
-	size_t len = fuse_buf_size(buf);
-	(void) flags;
+    size_t len = fuse_buf_size(buf);
+    (void)flags;
 
-	return fuse_send_data_iov_fallback(se, ch, iov, iov_count, buf, len);
+    return fuse_send_data_iov_fallback(se, ch, iov, iov_count, buf, len);
 }
 
 int fuse_reply_data(fuse_req_t req, struct fuse_bufvec *bufv,
-		    enum fuse_buf_copy_flags flags)
+                    enum fuse_buf_copy_flags flags)
 {
-	struct iovec iov[2];
-	struct fuse_out_header out;
-	int res;
+    struct iovec iov[2];
+    struct fuse_out_header out;
+    int res;
 
-	iov[0].iov_base = &out;
-	iov[0].iov_len = sizeof(struct fuse_out_header);
+    iov[0].iov_base = &out;
+    iov[0].iov_len = sizeof(struct fuse_out_header);
 
-	out.unique = req->unique;
-	out.error = 0;
+    out.unique = req->unique;
+    out.error = 0;
 
-	res = fuse_send_data_iov(req->se, req->ch, iov, 1, bufv, flags);
-	if (res <= 0) {
-		fuse_free_req(req);
-		return res;
-	} else {
-		return fuse_reply_err(req, res);
-	}
+    res = fuse_send_data_iov(req->se, req->ch, iov, 1, bufv, flags);
+    if (res <= 0) {
+        fuse_free_req(req);
+        return res;
+    } else {
+        return fuse_reply_err(req, res);
+    }
 }
 
 int fuse_reply_statfs(fuse_req_t req, const struct statvfs *stbuf)
 {
-	struct fuse_statfs_out arg;
-	size_t size = req->se->conn.proto_minor < 4 ?
-		FUSE_COMPAT_STATFS_SIZE : sizeof(arg);
+    struct fuse_statfs_out arg;
+    size_t size =
+        req->se->conn.proto_minor < 4 ? FUSE_COMPAT_STATFS_SIZE : sizeof(arg);
 
-	memset(&arg, 0, sizeof(arg));
-	convert_statfs(stbuf, &arg.st);
+    memset(&arg, 0, sizeof(arg));
+    convert_statfs(stbuf, &arg.st);
 
-	return send_reply_ok(req, &arg, size);
+    return send_reply_ok(req, &arg, size);
 }
 
 int fuse_reply_xattr(fuse_req_t req, size_t count)
 {
-	struct fuse_getxattr_out arg;
+    struct fuse_getxattr_out arg;
 
-	memset(&arg, 0, sizeof(arg));
-	arg.size = count;
+    memset(&arg, 0, sizeof(arg));
+    arg.size = count;
 
-	return send_reply_ok(req, &arg, sizeof(arg));
+    return send_reply_ok(req, &arg, sizeof(arg));
 }
 
 int fuse_reply_lock(fuse_req_t req, const struct flock *lock)
 {
-	struct fuse_lk_out arg;
+    struct fuse_lk_out arg;
 
-	memset(&arg, 0, sizeof(arg));
-	arg.lk.type = lock->l_type;
-	if (lock->l_type != F_UNLCK) {
-		arg.lk.start = lock->l_start;
-		if (lock->l_len == 0)
-			arg.lk.end = OFFSET_MAX;
-		else
-			arg.lk.end = lock->l_start + lock->l_len - 1;
-	}
-	arg.lk.pid = lock->l_pid;
-	return send_reply_ok(req, &arg, sizeof(arg));
+    memset(&arg, 0, sizeof(arg));
+    arg.lk.type = lock->l_type;
+    if (lock->l_type != F_UNLCK) {
+        arg.lk.start = lock->l_start;
+        if (lock->l_len == 0) {
+            arg.lk.end = OFFSET_MAX;
+        } else {
+            arg.lk.end = lock->l_start + lock->l_len - 1;
+        }
+    }
+    arg.lk.pid = lock->l_pid;
+    return send_reply_ok(req, &arg, sizeof(arg));
 }
 
 int fuse_reply_bmap(fuse_req_t req, uint64_t idx)
 {
-	struct fuse_bmap_out arg;
+    struct fuse_bmap_out arg;
 
-	memset(&arg, 0, sizeof(arg));
-	arg.block = idx;
+    memset(&arg, 0, sizeof(arg));
+    arg.block = idx;
 
-	return send_reply_ok(req, &arg, sizeof(arg));
+    return send_reply_ok(req, &arg, sizeof(arg));
 }
 
 static struct fuse_ioctl_iovec *fuse_ioctl_iovec_copy(const struct iovec *iov,
-						      size_t count)
-{
-	struct fuse_ioctl_iovec *fiov;
-	size_t i;
-
-	fiov = malloc(sizeof(fiov[0]) * count);
-	if (!fiov)
-		return NULL;
-
-	for (i = 0; i < count; i++) {
-		fiov[i].base = (uintptr_t) iov[i].iov_base;
-		fiov[i].len = iov[i].iov_len;
-	}
-
-	return fiov;
-}
-
-int fuse_reply_ioctl_retry(fuse_req_t req,
-			   const struct iovec *in_iov, size_t in_count,
-			   const struct iovec *out_iov, size_t out_count)
-{
-	struct fuse_ioctl_out arg;
-	struct fuse_ioctl_iovec *in_fiov = NULL;
-	struct fuse_ioctl_iovec *out_fiov = NULL;
-	struct iovec iov[4];
-	size_t count = 1;
-	int res;
-
-	memset(&arg, 0, sizeof(arg));
-	arg.flags |= FUSE_IOCTL_RETRY;
-	arg.in_iovs = in_count;
-	arg.out_iovs = out_count;
-	iov[count].iov_base = &arg;
-	iov[count].iov_len = sizeof(arg);
-	count++;
-
-	if (req->se->conn.proto_minor < 16) {
-		if (in_count) {
-			iov[count].iov_base = (void *)in_iov;
-			iov[count].iov_len = sizeof(in_iov[0]) * in_count;
-			count++;
-		}
-
-		if (out_count) {
-			iov[count].iov_base = (void *)out_iov;
-			iov[count].iov_len = sizeof(out_iov[0]) * out_count;
-			count++;
-		}
-	} else {
-		/* Can't handle non-compat 64bit ioctls on 32bit */
-		if (sizeof(void *) == 4 && req->ioctl_64bit) {
-			res = fuse_reply_err(req, EINVAL);
-			goto out;
-		}
-
-		if (in_count) {
-			in_fiov = fuse_ioctl_iovec_copy(in_iov, in_count);
-			if (!in_fiov)
-				goto enomem;
-
-			iov[count].iov_base = (void *)in_fiov;
-			iov[count].iov_len = sizeof(in_fiov[0]) * in_count;
-			count++;
-		}
-		if (out_count) {
-			out_fiov = fuse_ioctl_iovec_copy(out_iov, out_count);
-			if (!out_fiov)
-				goto enomem;
-
-			iov[count].iov_base = (void *)out_fiov;
-			iov[count].iov_len = sizeof(out_fiov[0]) * out_count;
-			count++;
-		}
-	}
-
-	res = send_reply_iov(req, 0, iov, count);
+                                                      size_t count)
+{
+    struct fuse_ioctl_iovec *fiov;
+    size_t i;
+
+    fiov = malloc(sizeof(fiov[0]) * count);
+    if (!fiov) {
+        return NULL;
+    }
+
+    for (i = 0; i < count; i++) {
+        fiov[i].base = (uintptr_t)iov[i].iov_base;
+        fiov[i].len = iov[i].iov_len;
+    }
+
+    return fiov;
+}
+
+int fuse_reply_ioctl_retry(fuse_req_t req, const struct iovec *in_iov,
+                           size_t in_count, const struct iovec *out_iov,
+                           size_t out_count)
+{
+    struct fuse_ioctl_out arg;
+    struct fuse_ioctl_iovec *in_fiov = NULL;
+    struct fuse_ioctl_iovec *out_fiov = NULL;
+    struct iovec iov[4];
+    size_t count = 1;
+    int res;
+
+    memset(&arg, 0, sizeof(arg));
+    arg.flags |= FUSE_IOCTL_RETRY;
+    arg.in_iovs = in_count;
+    arg.out_iovs = out_count;
+    iov[count].iov_base = &arg;
+    iov[count].iov_len = sizeof(arg);
+    count++;
+
+    if (req->se->conn.proto_minor < 16) {
+        if (in_count) {
+            iov[count].iov_base = (void *)in_iov;
+            iov[count].iov_len = sizeof(in_iov[0]) * in_count;
+            count++;
+        }
+
+        if (out_count) {
+            iov[count].iov_base = (void *)out_iov;
+            iov[count].iov_len = sizeof(out_iov[0]) * out_count;
+            count++;
+        }
+    } else {
+        /* Can't handle non-compat 64bit ioctls on 32bit */
+        if (sizeof(void *) == 4 && req->ioctl_64bit) {
+            res = fuse_reply_err(req, EINVAL);
+            goto out;
+        }
+
+        if (in_count) {
+            in_fiov = fuse_ioctl_iovec_copy(in_iov, in_count);
+            if (!in_fiov) {
+                goto enomem;
+            }
+
+            iov[count].iov_base = (void *)in_fiov;
+            iov[count].iov_len = sizeof(in_fiov[0]) * in_count;
+            count++;
+        }
+        if (out_count) {
+            out_fiov = fuse_ioctl_iovec_copy(out_iov, out_count);
+            if (!out_fiov) {
+                goto enomem;
+            }
+
+            iov[count].iov_base = (void *)out_fiov;
+            iov[count].iov_len = sizeof(out_fiov[0]) * out_count;
+            count++;
+        }
+    }
+
+    res = send_reply_iov(req, 0, iov, count);
 out:
-	free(in_fiov);
-	free(out_fiov);
+    free(in_fiov);
+    free(out_fiov);
 
-	return res;
+    return res;
 
 enomem:
-	res = fuse_reply_err(req, ENOMEM);
-	goto out;
+    res = fuse_reply_err(req, ENOMEM);
+    goto out;
 }
 
 int fuse_reply_ioctl(fuse_req_t req, int result, const void *buf, size_t size)
 {
-	struct fuse_ioctl_out arg;
-	struct iovec iov[3];
-	size_t count = 1;
+    struct fuse_ioctl_out arg;
+    struct iovec iov[3];
+    size_t count = 1;
 
-	memset(&arg, 0, sizeof(arg));
-	arg.result = result;
-	iov[count].iov_base = &arg;
-	iov[count].iov_len = sizeof(arg);
-	count++;
+    memset(&arg, 0, sizeof(arg));
+    arg.result = result;
+    iov[count].iov_base = &arg;
+    iov[count].iov_len = sizeof(arg);
+    count++;
 
-	if (size) {
-		iov[count].iov_base = (char *) buf;
-		iov[count].iov_len = size;
-		count++;
-	}
+    if (size) {
+        iov[count].iov_base = (char *)buf;
+        iov[count].iov_len = size;
+        count++;
+    }
 
-	return send_reply_iov(req, 0, iov, count);
+    return send_reply_iov(req, 0, iov, count);
 }
 
 int fuse_reply_ioctl_iov(fuse_req_t req, int result, const struct iovec *iov,
-			 int count)
+                         int count)
 {
-	struct iovec *padded_iov;
-	struct fuse_ioctl_out arg;
-	int res;
+    struct iovec *padded_iov;
+    struct fuse_ioctl_out arg;
+    int res;
 
-	padded_iov = malloc((count + 2) * sizeof(struct iovec));
-	if (padded_iov == NULL)
-		return fuse_reply_err(req, ENOMEM);
+    padded_iov = malloc((count + 2) * sizeof(struct iovec));
+    if (padded_iov == NULL) {
+        return fuse_reply_err(req, ENOMEM);
+    }
 
-	memset(&arg, 0, sizeof(arg));
-	arg.result = result;
-	padded_iov[1].iov_base = &arg;
-	padded_iov[1].iov_len = sizeof(arg);
+    memset(&arg, 0, sizeof(arg));
+    arg.result = result;
+    padded_iov[1].iov_base = &arg;
+    padded_iov[1].iov_len = sizeof(arg);
 
-	memcpy(&padded_iov[2], iov, count * sizeof(struct iovec));
+    memcpy(&padded_iov[2], iov, count * sizeof(struct iovec));
 
-	res = send_reply_iov(req, 0, padded_iov, count + 2);
-	free(padded_iov);
+    res = send_reply_iov(req, 0, padded_iov, count + 2);
+    free(padded_iov);
 
-	return res;
+    return res;
 }
 
 int fuse_reply_poll(fuse_req_t req, unsigned revents)
 {
-	struct fuse_poll_out arg;
+    struct fuse_poll_out arg;
 
-	memset(&arg, 0, sizeof(arg));
-	arg.revents = revents;
+    memset(&arg, 0, sizeof(arg));
+    arg.revents = revents;
 
-	return send_reply_ok(req, &arg, sizeof(arg));
+    return send_reply_ok(req, &arg, sizeof(arg));
 }
 
 int fuse_reply_lseek(fuse_req_t req, off_t off)
 {
-	struct fuse_lseek_out arg;
+    struct fuse_lseek_out arg;
 
-	memset(&arg, 0, sizeof(arg));
-	arg.offset = off;
+    memset(&arg, 0, sizeof(arg));
+    arg.offset = off;
 
-	return send_reply_ok(req, &arg, sizeof(arg));
+    return send_reply_ok(req, &arg, sizeof(arg));
 }
 
 static void do_lookup(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	char *name = (char *) inarg;
+    char *name = (char *)inarg;
 
-	if (req->se->op.lookup)
-		req->se->op.lookup(req, nodeid, name);
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.lookup) {
+        req->se->op.lookup(req, nodeid, name);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_forget(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_forget_in *arg = (struct fuse_forget_in *) inarg;
+    struct fuse_forget_in *arg = (struct fuse_forget_in *)inarg;
 
-	if (req->se->op.forget)
-		req->se->op.forget(req, nodeid, arg->nlookup);
-	else
-		fuse_reply_none(req);
+    if (req->se->op.forget) {
+        req->se->op.forget(req, nodeid, arg->nlookup);
+    } else {
+        fuse_reply_none(req);
+    }
 }
 
 static void do_batch_forget(fuse_req_t req, fuse_ino_t nodeid,
-			    const void *inarg)
+                            const void *inarg)
 {
-	struct fuse_batch_forget_in *arg = (void *) inarg;
-	struct fuse_forget_one *param = (void *) PARAM(arg);
-	unsigned int i;
+    struct fuse_batch_forget_in *arg = (void *)inarg;
+    struct fuse_forget_one *param = (void *)PARAM(arg);
+    unsigned int i;
 
-	(void) nodeid;
+    (void)nodeid;
 
-	if (req->se->op.forget_multi) {
-		req->se->op.forget_multi(req, arg->count,
-				     (struct fuse_forget_data *) param);
-	} else if (req->se->op.forget) {
-		for (i = 0; i < arg->count; i++) {
-			struct fuse_forget_one *forget = &param[i];
-			struct fuse_req *dummy_req;
+    if (req->se->op.forget_multi) {
+        req->se->op.forget_multi(req, arg->count,
+                                 (struct fuse_forget_data *)param);
+    } else if (req->se->op.forget) {
+        for (i = 0; i < arg->count; i++) {
+            struct fuse_forget_one *forget = &param[i];
+            struct fuse_req *dummy_req;
 
-			dummy_req = fuse_ll_alloc_req(req->se);
-			if (dummy_req == NULL)
-				break;
+            dummy_req = fuse_ll_alloc_req(req->se);
+            if (dummy_req == NULL) {
+                break;
+            }
 
-			dummy_req->unique = req->unique;
-			dummy_req->ctx = req->ctx;
-			dummy_req->ch = NULL;
+            dummy_req->unique = req->unique;
+            dummy_req->ctx = req->ctx;
+            dummy_req->ch = NULL;
 
-			req->se->op.forget(dummy_req, forget->nodeid,
-					  forget->nlookup);
-		}
-		fuse_reply_none(req);
-	} else {
-		fuse_reply_none(req);
-	}
+            req->se->op.forget(dummy_req, forget->nodeid, forget->nlookup);
+        }
+        fuse_reply_none(req);
+    } else {
+        fuse_reply_none(req);
+    }
 }
 
 static void do_getattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_file_info *fip = NULL;
-	struct fuse_file_info fi;
+    struct fuse_file_info *fip = NULL;
+    struct fuse_file_info fi;
 
-	if (req->se->conn.proto_minor >= 9) {
-		struct fuse_getattr_in *arg = (struct fuse_getattr_in *) inarg;
+    if (req->se->conn.proto_minor >= 9) {
+        struct fuse_getattr_in *arg = (struct fuse_getattr_in *)inarg;
 
-		if (arg->getattr_flags & FUSE_GETATTR_FH) {
-			memset(&fi, 0, sizeof(fi));
-			fi.fh = arg->fh;
-			fip = &fi;
-		}
-	}
+        if (arg->getattr_flags & FUSE_GETATTR_FH) {
+            memset(&fi, 0, sizeof(fi));
+            fi.fh = arg->fh;
+            fip = &fi;
+        }
+    }
 
-	if (req->se->op.getattr)
-		req->se->op.getattr(req, nodeid, fip);
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.getattr) {
+        req->se->op.getattr(req, nodeid, fip);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_setattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_setattr_in *arg = (struct fuse_setattr_in *) inarg;
-
-	if (req->se->op.setattr) {
-		struct fuse_file_info *fi = NULL;
-		struct fuse_file_info fi_store;
-		struct stat stbuf;
-		memset(&stbuf, 0, sizeof(stbuf));
-		convert_attr(arg, &stbuf);
-		if (arg->valid & FATTR_FH) {
-			arg->valid &= ~FATTR_FH;
-			memset(&fi_store, 0, sizeof(fi_store));
-			fi = &fi_store;
-			fi->fh = arg->fh;
-		}
-		arg->valid &=
-			FUSE_SET_ATTR_MODE	|
-			FUSE_SET_ATTR_UID	|
-			FUSE_SET_ATTR_GID	|
-			FUSE_SET_ATTR_SIZE	|
-			FUSE_SET_ATTR_ATIME	|
-			FUSE_SET_ATTR_MTIME	|
-			FUSE_SET_ATTR_ATIME_NOW	|
-			FUSE_SET_ATTR_MTIME_NOW |
-			FUSE_SET_ATTR_CTIME;
-
-		req->se->op.setattr(req, nodeid, &stbuf, arg->valid, fi);
-	} else
-		fuse_reply_err(req, ENOSYS);
+    struct fuse_setattr_in *arg = (struct fuse_setattr_in *)inarg;
+
+    if (req->se->op.setattr) {
+        struct fuse_file_info *fi = NULL;
+        struct fuse_file_info fi_store;
+        struct stat stbuf;
+        memset(&stbuf, 0, sizeof(stbuf));
+        convert_attr(arg, &stbuf);
+        if (arg->valid & FATTR_FH) {
+            arg->valid &= ~FATTR_FH;
+            memset(&fi_store, 0, sizeof(fi_store));
+            fi = &fi_store;
+            fi->fh = arg->fh;
+        }
+        arg->valid &= FUSE_SET_ATTR_MODE | FUSE_SET_ATTR_UID |
+                      FUSE_SET_ATTR_GID | FUSE_SET_ATTR_SIZE |
+                      FUSE_SET_ATTR_ATIME | FUSE_SET_ATTR_MTIME |
+                      FUSE_SET_ATTR_ATIME_NOW | FUSE_SET_ATTR_MTIME_NOW |
+                      FUSE_SET_ATTR_CTIME;
+
+        req->se->op.setattr(req, nodeid, &stbuf, arg->valid, fi);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_access(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_access_in *arg = (struct fuse_access_in *) inarg;
+    struct fuse_access_in *arg = (struct fuse_access_in *)inarg;
 
-	if (req->se->op.access)
-		req->se->op.access(req, nodeid, arg->mask);
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.access) {
+        req->se->op.access(req, nodeid, arg->mask);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_readlink(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	(void) inarg;
+    (void)inarg;
 
-	if (req->se->op.readlink)
-		req->se->op.readlink(req, nodeid);
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.readlink) {
+        req->se->op.readlink(req, nodeid);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_mknod(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_mknod_in *arg = (struct fuse_mknod_in *) inarg;
-	char *name = PARAM(arg);
+    struct fuse_mknod_in *arg = (struct fuse_mknod_in *)inarg;
+    char *name = PARAM(arg);
 
-	if (req->se->conn.proto_minor >= 12)
-		req->ctx.umask = arg->umask;
-	else
-		name = (char *) inarg + FUSE_COMPAT_MKNOD_IN_SIZE;
+    if (req->se->conn.proto_minor >= 12) {
+        req->ctx.umask = arg->umask;
+    } else {
+        name = (char *)inarg + FUSE_COMPAT_MKNOD_IN_SIZE;
+    }
 
-	if (req->se->op.mknod)
-		req->se->op.mknod(req, nodeid, name, arg->mode, arg->rdev);
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.mknod) {
+        req->se->op.mknod(req, nodeid, name, arg->mode, arg->rdev);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_mkdir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_mkdir_in *arg = (struct fuse_mkdir_in *) inarg;
+    struct fuse_mkdir_in *arg = (struct fuse_mkdir_in *)inarg;
 
-	if (req->se->conn.proto_minor >= 12)
-		req->ctx.umask = arg->umask;
+    if (req->se->conn.proto_minor >= 12) {
+        req->ctx.umask = arg->umask;
+    }
 
-	if (req->se->op.mkdir)
-		req->se->op.mkdir(req, nodeid, PARAM(arg), arg->mode);
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.mkdir) {
+        req->se->op.mkdir(req, nodeid, PARAM(arg), arg->mode);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_unlink(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	char *name = (char *) inarg;
+    char *name = (char *)inarg;
 
-	if (req->se->op.unlink)
-		req->se->op.unlink(req, nodeid, name);
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.unlink) {
+        req->se->op.unlink(req, nodeid, name);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_rmdir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	char *name = (char *) inarg;
+    char *name = (char *)inarg;
 
-	if (req->se->op.rmdir)
-		req->se->op.rmdir(req, nodeid, name);
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.rmdir) {
+        req->se->op.rmdir(req, nodeid, name);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_symlink(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	char *name = (char *) inarg;
-	char *linkname = ((char *) inarg) + strlen((char *) inarg) + 1;
+    char *name = (char *)inarg;
+    char *linkname = ((char *)inarg) + strlen((char *)inarg) + 1;
 
-	if (req->se->op.symlink)
-		req->se->op.symlink(req, linkname, nodeid, name);
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.symlink) {
+        req->se->op.symlink(req, linkname, nodeid, name);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_rename(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_rename_in *arg = (struct fuse_rename_in *) inarg;
-	char *oldname = PARAM(arg);
-	char *newname = oldname + strlen(oldname) + 1;
+    struct fuse_rename_in *arg = (struct fuse_rename_in *)inarg;
+    char *oldname = PARAM(arg);
+    char *newname = oldname + strlen(oldname) + 1;
 
-	if (req->se->op.rename)
-		req->se->op.rename(req, nodeid, oldname, arg->newdir, newname,
-				  0);
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.rename) {
+        req->se->op.rename(req, nodeid, oldname, arg->newdir, newname, 0);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_rename2(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_rename2_in *arg = (struct fuse_rename2_in *) inarg;
-	char *oldname = PARAM(arg);
-	char *newname = oldname + strlen(oldname) + 1;
+    struct fuse_rename2_in *arg = (struct fuse_rename2_in *)inarg;
+    char *oldname = PARAM(arg);
+    char *newname = oldname + strlen(oldname) + 1;
 
-	if (req->se->op.rename)
-		req->se->op.rename(req, nodeid, oldname, arg->newdir, newname,
-				  arg->flags);
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.rename) {
+        req->se->op.rename(req, nodeid, oldname, arg->newdir, newname,
+                           arg->flags);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_link(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_link_in *arg = (struct fuse_link_in *) inarg;
+    struct fuse_link_in *arg = (struct fuse_link_in *)inarg;
 
-	if (req->se->op.link)
-		req->se->op.link(req, arg->oldnodeid, nodeid, PARAM(arg));
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.link) {
+        req->se->op.link(req, arg->oldnodeid, nodeid, PARAM(arg));
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_create(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_create_in *arg = (struct fuse_create_in *) inarg;
+    struct fuse_create_in *arg = (struct fuse_create_in *)inarg;
 
-	if (req->se->op.create) {
-		struct fuse_file_info fi;
-		char *name = PARAM(arg);
+    if (req->se->op.create) {
+        struct fuse_file_info fi;
+        char *name = PARAM(arg);
 
-		memset(&fi, 0, sizeof(fi));
-		fi.flags = arg->flags;
+        memset(&fi, 0, sizeof(fi));
+        fi.flags = arg->flags;
 
-		if (req->se->conn.proto_minor >= 12)
-			req->ctx.umask = arg->umask;
-		else
-			name = (char *) inarg + sizeof(struct fuse_open_in);
+        if (req->se->conn.proto_minor >= 12) {
+            req->ctx.umask = arg->umask;
+        } else {
+            name = (char *)inarg + sizeof(struct fuse_open_in);
+        }
 
-		req->se->op.create(req, nodeid, name, arg->mode, &fi);
-	} else
-		fuse_reply_err(req, ENOSYS);
+        req->se->op.create(req, nodeid, name, arg->mode, &fi);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_open(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_open_in *arg = (struct fuse_open_in *) inarg;
-	struct fuse_file_info fi;
+    struct fuse_open_in *arg = (struct fuse_open_in *)inarg;
+    struct fuse_file_info fi;
 
-	memset(&fi, 0, sizeof(fi));
-	fi.flags = arg->flags;
+    memset(&fi, 0, sizeof(fi));
+    fi.flags = arg->flags;
 
-	if (req->se->op.open)
-		req->se->op.open(req, nodeid, &fi);
-	else
-		fuse_reply_open(req, &fi);
+    if (req->se->op.open) {
+        req->se->op.open(req, nodeid, &fi);
+    } else {
+        fuse_reply_open(req, &fi);
+    }
 }
 
 static void do_read(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_read_in *arg = (struct fuse_read_in *) inarg;
+    struct fuse_read_in *arg = (struct fuse_read_in *)inarg;
 
-	if (req->se->op.read) {
-		struct fuse_file_info fi;
+    if (req->se->op.read) {
+        struct fuse_file_info fi;
 
-		memset(&fi, 0, sizeof(fi));
-		fi.fh = arg->fh;
-		if (req->se->conn.proto_minor >= 9) {
-			fi.lock_owner = arg->lock_owner;
-			fi.flags = arg->flags;
-		}
-		req->se->op.read(req, nodeid, arg->size, arg->offset, &fi);
-	} else
-		fuse_reply_err(req, ENOSYS);
+        memset(&fi, 0, sizeof(fi));
+        fi.fh = arg->fh;
+        if (req->se->conn.proto_minor >= 9) {
+            fi.lock_owner = arg->lock_owner;
+            fi.flags = arg->flags;
+        }
+        req->se->op.read(req, nodeid, arg->size, arg->offset, &fi);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_write(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_write_in *arg = (struct fuse_write_in *) inarg;
-	struct fuse_file_info fi;
-	char *param;
+    struct fuse_write_in *arg = (struct fuse_write_in *)inarg;
+    struct fuse_file_info fi;
+    char *param;
 
-	memset(&fi, 0, sizeof(fi));
-	fi.fh = arg->fh;
-	fi.writepage = (arg->write_flags & FUSE_WRITE_CACHE) != 0;
+    memset(&fi, 0, sizeof(fi));
+    fi.fh = arg->fh;
+    fi.writepage = (arg->write_flags & FUSE_WRITE_CACHE) != 0;
 
-	if (req->se->conn.proto_minor < 9) {
-		param = ((char *) arg) + FUSE_COMPAT_WRITE_IN_SIZE;
-	} else {
-		fi.lock_owner = arg->lock_owner;
-		fi.flags = arg->flags;
-		param = PARAM(arg);
-	}
+    if (req->se->conn.proto_minor < 9) {
+        param = ((char *)arg) + FUSE_COMPAT_WRITE_IN_SIZE;
+    } else {
+        fi.lock_owner = arg->lock_owner;
+        fi.flags = arg->flags;
+        param = PARAM(arg);
+    }
 
-	if (req->se->op.write)
-		req->se->op.write(req, nodeid, param, arg->size,
-				 arg->offset, &fi);
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.write) {
+        req->se->op.write(req, nodeid, param, arg->size, arg->offset, &fi);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid, const void *inarg,
-			 const struct fuse_buf *ibuf)
-{
-	struct fuse_session *se = req->se;
-	struct fuse_bufvec bufv = {
-		.buf[0] = *ibuf,
-		.count = 1,
-	};
-	struct fuse_write_in *arg = (struct fuse_write_in *) inarg;
-	struct fuse_file_info fi;
-
-	memset(&fi, 0, sizeof(fi));
-	fi.fh = arg->fh;
-	fi.writepage = arg->write_flags & FUSE_WRITE_CACHE;
-
-	if (se->conn.proto_minor < 9) {
-		bufv.buf[0].mem = ((char *) arg) + FUSE_COMPAT_WRITE_IN_SIZE;
-		bufv.buf[0].size -= sizeof(struct fuse_in_header) +
-			FUSE_COMPAT_WRITE_IN_SIZE;
-		assert(!(bufv.buf[0].flags & FUSE_BUF_IS_FD));
-	} else {
-		fi.lock_owner = arg->lock_owner;
-		fi.flags = arg->flags;
-		if (!(bufv.buf[0].flags & FUSE_BUF_IS_FD))
-			bufv.buf[0].mem = PARAM(arg);
-
-		bufv.buf[0].size -= sizeof(struct fuse_in_header) +
-			sizeof(struct fuse_write_in);
-	}
-	if (bufv.buf[0].size < arg->size) {
-		fuse_log(FUSE_LOG_ERR, "fuse: do_write_buf: buffer size too small\n");
-		fuse_reply_err(req, EIO);
-		return;
-	}
-	bufv.buf[0].size = arg->size;
-
-	se->op.write_buf(req, nodeid, &bufv, arg->offset, &fi);
+                         const struct fuse_buf *ibuf)
+{
+    struct fuse_session *se = req->se;
+    struct fuse_bufvec bufv = {
+        .buf[0] = *ibuf,
+        .count = 1,
+    };
+    struct fuse_write_in *arg = (struct fuse_write_in *)inarg;
+    struct fuse_file_info fi;
+
+    memset(&fi, 0, sizeof(fi));
+    fi.fh = arg->fh;
+    fi.writepage = arg->write_flags & FUSE_WRITE_CACHE;
+
+    if (se->conn.proto_minor < 9) {
+        bufv.buf[0].mem = ((char *)arg) + FUSE_COMPAT_WRITE_IN_SIZE;
+        bufv.buf[0].size -=
+            sizeof(struct fuse_in_header) + FUSE_COMPAT_WRITE_IN_SIZE;
+        assert(!(bufv.buf[0].flags & FUSE_BUF_IS_FD));
+    } else {
+        fi.lock_owner = arg->lock_owner;
+        fi.flags = arg->flags;
+        if (!(bufv.buf[0].flags & FUSE_BUF_IS_FD)) {
+            bufv.buf[0].mem = PARAM(arg);
+        }
+
+        bufv.buf[0].size -=
+            sizeof(struct fuse_in_header) + sizeof(struct fuse_write_in);
+    }
+    if (bufv.buf[0].size < arg->size) {
+        fuse_log(FUSE_LOG_ERR, "fuse: do_write_buf: buffer size too small\n");
+        fuse_reply_err(req, EIO);
+        return;
+    }
+    bufv.buf[0].size = arg->size;
+
+    se->op.write_buf(req, nodeid, &bufv, arg->offset, &fi);
 }
 
 static void do_flush(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_flush_in *arg = (struct fuse_flush_in *) inarg;
-	struct fuse_file_info fi;
+    struct fuse_flush_in *arg = (struct fuse_flush_in *)inarg;
+    struct fuse_file_info fi;
 
-	memset(&fi, 0, sizeof(fi));
-	fi.fh = arg->fh;
-	fi.flush = 1;
-	if (req->se->conn.proto_minor >= 7)
-		fi.lock_owner = arg->lock_owner;
+    memset(&fi, 0, sizeof(fi));
+    fi.fh = arg->fh;
+    fi.flush = 1;
+    if (req->se->conn.proto_minor >= 7) {
+        fi.lock_owner = arg->lock_owner;
+    }
 
-	if (req->se->op.flush)
-		req->se->op.flush(req, nodeid, &fi);
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.flush) {
+        req->se->op.flush(req, nodeid, &fi);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_release(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_release_in *arg = (struct fuse_release_in *) inarg;
-	struct fuse_file_info fi;
+    struct fuse_release_in *arg = (struct fuse_release_in *)inarg;
+    struct fuse_file_info fi;
 
-	memset(&fi, 0, sizeof(fi));
-	fi.flags = arg->flags;
-	fi.fh = arg->fh;
-	if (req->se->conn.proto_minor >= 8) {
-		fi.flush = (arg->release_flags & FUSE_RELEASE_FLUSH) ? 1 : 0;
-		fi.lock_owner = arg->lock_owner;
-	}
-	if (arg->release_flags & FUSE_RELEASE_FLOCK_UNLOCK) {
-		fi.flock_release = 1;
-		fi.lock_owner = arg->lock_owner;
-	}
+    memset(&fi, 0, sizeof(fi));
+    fi.flags = arg->flags;
+    fi.fh = arg->fh;
+    if (req->se->conn.proto_minor >= 8) {
+        fi.flush = (arg->release_flags & FUSE_RELEASE_FLUSH) ? 1 : 0;
+        fi.lock_owner = arg->lock_owner;
+    }
+    if (arg->release_flags & FUSE_RELEASE_FLOCK_UNLOCK) {
+        fi.flock_release = 1;
+        fi.lock_owner = arg->lock_owner;
+    }
 
-	if (req->se->op.release)
-		req->se->op.release(req, nodeid, &fi);
-	else
-		fuse_reply_err(req, 0);
+    if (req->se->op.release) {
+        req->se->op.release(req, nodeid, &fi);
+    } else {
+        fuse_reply_err(req, 0);
+    }
 }
 
 static void do_fsync(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_fsync_in *arg = (struct fuse_fsync_in *) inarg;
-	struct fuse_file_info fi;
-	int datasync = arg->fsync_flags & 1;
+    struct fuse_fsync_in *arg = (struct fuse_fsync_in *)inarg;
+    struct fuse_file_info fi;
+    int datasync = arg->fsync_flags & 1;
 
-	memset(&fi, 0, sizeof(fi));
-	fi.fh = arg->fh;
+    memset(&fi, 0, sizeof(fi));
+    fi.fh = arg->fh;
 
-	if (req->se->op.fsync)
-		req->se->op.fsync(req, nodeid, datasync, &fi);
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.fsync) {
+        req->se->op.fsync(req, nodeid, datasync, &fi);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_opendir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_open_in *arg = (struct fuse_open_in *) inarg;
-	struct fuse_file_info fi;
+    struct fuse_open_in *arg = (struct fuse_open_in *)inarg;
+    struct fuse_file_info fi;
 
-	memset(&fi, 0, sizeof(fi));
-	fi.flags = arg->flags;
+    memset(&fi, 0, sizeof(fi));
+    fi.flags = arg->flags;
 
-	if (req->se->op.opendir)
-		req->se->op.opendir(req, nodeid, &fi);
-	else
-		fuse_reply_open(req, &fi);
+    if (req->se->op.opendir) {
+        req->se->op.opendir(req, nodeid, &fi);
+    } else {
+        fuse_reply_open(req, &fi);
+    }
 }
 
 static void do_readdir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_read_in *arg = (struct fuse_read_in *) inarg;
-	struct fuse_file_info fi;
+    struct fuse_read_in *arg = (struct fuse_read_in *)inarg;
+    struct fuse_file_info fi;
 
-	memset(&fi, 0, sizeof(fi));
-	fi.fh = arg->fh;
+    memset(&fi, 0, sizeof(fi));
+    fi.fh = arg->fh;
 
-	if (req->se->op.readdir)
-		req->se->op.readdir(req, nodeid, arg->size, arg->offset, &fi);
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.readdir) {
+        req->se->op.readdir(req, nodeid, arg->size, arg->offset, &fi);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_readdirplus(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_read_in *arg = (struct fuse_read_in *) inarg;
-	struct fuse_file_info fi;
+    struct fuse_read_in *arg = (struct fuse_read_in *)inarg;
+    struct fuse_file_info fi;
 
-	memset(&fi, 0, sizeof(fi));
-	fi.fh = arg->fh;
+    memset(&fi, 0, sizeof(fi));
+    fi.fh = arg->fh;
 
-	if (req->se->op.readdirplus)
-		req->se->op.readdirplus(req, nodeid, arg->size, arg->offset, &fi);
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.readdirplus) {
+        req->se->op.readdirplus(req, nodeid, arg->size, arg->offset, &fi);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_releasedir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_release_in *arg = (struct fuse_release_in *) inarg;
-	struct fuse_file_info fi;
+    struct fuse_release_in *arg = (struct fuse_release_in *)inarg;
+    struct fuse_file_info fi;
 
-	memset(&fi, 0, sizeof(fi));
-	fi.flags = arg->flags;
-	fi.fh = arg->fh;
+    memset(&fi, 0, sizeof(fi));
+    fi.flags = arg->flags;
+    fi.fh = arg->fh;
 
-	if (req->se->op.releasedir)
-		req->se->op.releasedir(req, nodeid, &fi);
-	else
-		fuse_reply_err(req, 0);
+    if (req->se->op.releasedir) {
+        req->se->op.releasedir(req, nodeid, &fi);
+    } else {
+        fuse_reply_err(req, 0);
+    }
 }
 
 static void do_fsyncdir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_fsync_in *arg = (struct fuse_fsync_in *) inarg;
-	struct fuse_file_info fi;
-	int datasync = arg->fsync_flags & 1;
+    struct fuse_fsync_in *arg = (struct fuse_fsync_in *)inarg;
+    struct fuse_file_info fi;
+    int datasync = arg->fsync_flags & 1;
 
-	memset(&fi, 0, sizeof(fi));
-	fi.fh = arg->fh;
+    memset(&fi, 0, sizeof(fi));
+    fi.fh = arg->fh;
 
-	if (req->se->op.fsyncdir)
-		req->se->op.fsyncdir(req, nodeid, datasync, &fi);
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.fsyncdir) {
+        req->se->op.fsyncdir(req, nodeid, datasync, &fi);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_statfs(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	(void) nodeid;
-	(void) inarg;
+    (void)nodeid;
+    (void)inarg;
 
-	if (req->se->op.statfs)
-		req->se->op.statfs(req, nodeid);
-	else {
-		struct statvfs buf = {
-			.f_namemax = 255,
-			.f_bsize = 512,
-		};
-		fuse_reply_statfs(req, &buf);
-	}
+    if (req->se->op.statfs) {
+        req->se->op.statfs(req, nodeid);
+    } else {
+        struct statvfs buf = {
+            .f_namemax = 255,
+            .f_bsize = 512,
+        };
+        fuse_reply_statfs(req, &buf);
+    }
 }
 
 static void do_setxattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_setxattr_in *arg = (struct fuse_setxattr_in *) inarg;
-	char *name = PARAM(arg);
-	char *value = name + strlen(name) + 1;
+    struct fuse_setxattr_in *arg = (struct fuse_setxattr_in *)inarg;
+    char *name = PARAM(arg);
+    char *value = name + strlen(name) + 1;
 
-	if (req->se->op.setxattr)
-		req->se->op.setxattr(req, nodeid, name, value, arg->size,
-				    arg->flags);
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.setxattr) {
+        req->se->op.setxattr(req, nodeid, name, value, arg->size, arg->flags);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_getxattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_getxattr_in *arg = (struct fuse_getxattr_in *) inarg;
+    struct fuse_getxattr_in *arg = (struct fuse_getxattr_in *)inarg;
 
-	if (req->se->op.getxattr)
-		req->se->op.getxattr(req, nodeid, PARAM(arg), arg->size);
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.getxattr) {
+        req->se->op.getxattr(req, nodeid, PARAM(arg), arg->size);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_listxattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_getxattr_in *arg = (struct fuse_getxattr_in *) inarg;
+    struct fuse_getxattr_in *arg = (struct fuse_getxattr_in *)inarg;
 
-	if (req->se->op.listxattr)
-		req->se->op.listxattr(req, nodeid, arg->size);
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.listxattr) {
+        req->se->op.listxattr(req, nodeid, arg->size);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_removexattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	char *name = (char *) inarg;
+    char *name = (char *)inarg;
 
-	if (req->se->op.removexattr)
-		req->se->op.removexattr(req, nodeid, name);
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.removexattr) {
+        req->se->op.removexattr(req, nodeid, name);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void convert_fuse_file_lock(struct fuse_file_lock *fl,
-				   struct flock *flock)
+                                   struct flock *flock)
 {
-	memset(flock, 0, sizeof(struct flock));
-	flock->l_type = fl->type;
-	flock->l_whence = SEEK_SET;
-	flock->l_start = fl->start;
-	if (fl->end == OFFSET_MAX)
-		flock->l_len = 0;
-	else
-		flock->l_len = fl->end - fl->start + 1;
-	flock->l_pid = fl->pid;
+    memset(flock, 0, sizeof(struct flock));
+    flock->l_type = fl->type;
+    flock->l_whence = SEEK_SET;
+    flock->l_start = fl->start;
+    if (fl->end == OFFSET_MAX) {
+        flock->l_len = 0;
+    } else {
+        flock->l_len = fl->end - fl->start + 1;
+    }
+    flock->l_pid = fl->pid;
 }
 
 static void do_getlk(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_lk_in *arg = (struct fuse_lk_in *) inarg;
-	struct fuse_file_info fi;
-	struct flock flock;
+    struct fuse_lk_in *arg = (struct fuse_lk_in *)inarg;
+    struct fuse_file_info fi;
+    struct flock flock;
 
-	memset(&fi, 0, sizeof(fi));
-	fi.fh = arg->fh;
-	fi.lock_owner = arg->owner;
+    memset(&fi, 0, sizeof(fi));
+    fi.fh = arg->fh;
+    fi.lock_owner = arg->owner;
 
-	convert_fuse_file_lock(&arg->lk, &flock);
-	if (req->se->op.getlk)
-		req->se->op.getlk(req, nodeid, &fi, &flock);
-	else
-		fuse_reply_err(req, ENOSYS);
+    convert_fuse_file_lock(&arg->lk, &flock);
+    if (req->se->op.getlk) {
+        req->se->op.getlk(req, nodeid, &fi, &flock);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_setlk_common(fuse_req_t req, fuse_ino_t nodeid,
-			    const void *inarg, int sleep)
-{
-	struct fuse_lk_in *arg = (struct fuse_lk_in *) inarg;
-	struct fuse_file_info fi;
-	struct flock flock;
-
-	memset(&fi, 0, sizeof(fi));
-	fi.fh = arg->fh;
-	fi.lock_owner = arg->owner;
-
-	if (arg->lk_flags & FUSE_LK_FLOCK) {
-		int op = 0;
-
-		switch (arg->lk.type) {
-		case F_RDLCK:
-			op = LOCK_SH;
-			break;
-		case F_WRLCK:
-			op = LOCK_EX;
-			break;
-		case F_UNLCK:
-			op = LOCK_UN;
-			break;
-		}
-		if (!sleep)
-			op |= LOCK_NB;
-
-		if (req->se->op.flock)
-			req->se->op.flock(req, nodeid, &fi, op);
-		else
-			fuse_reply_err(req, ENOSYS);
-	} else {
-		convert_fuse_file_lock(&arg->lk, &flock);
-		if (req->se->op.setlk)
-			req->se->op.setlk(req, nodeid, &fi, &flock, sleep);
-		else
-			fuse_reply_err(req, ENOSYS);
-	}
+                            const void *inarg, int sleep)
+{
+    struct fuse_lk_in *arg = (struct fuse_lk_in *)inarg;
+    struct fuse_file_info fi;
+    struct flock flock;
+
+    memset(&fi, 0, sizeof(fi));
+    fi.fh = arg->fh;
+    fi.lock_owner = arg->owner;
+
+    if (arg->lk_flags & FUSE_LK_FLOCK) {
+        int op = 0;
+
+        switch (arg->lk.type) {
+        case F_RDLCK:
+            op = LOCK_SH;
+            break;
+        case F_WRLCK:
+            op = LOCK_EX;
+            break;
+        case F_UNLCK:
+            op = LOCK_UN;
+            break;
+        }
+        if (!sleep) {
+            op |= LOCK_NB;
+        }
+
+        if (req->se->op.flock) {
+            req->se->op.flock(req, nodeid, &fi, op);
+        } else {
+            fuse_reply_err(req, ENOSYS);
+        }
+    } else {
+        convert_fuse_file_lock(&arg->lk, &flock);
+        if (req->se->op.setlk) {
+            req->se->op.setlk(req, nodeid, &fi, &flock, sleep);
+        } else {
+            fuse_reply_err(req, ENOSYS);
+        }
+    }
 }
 
 static void do_setlk(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	do_setlk_common(req, nodeid, inarg, 0);
+    do_setlk_common(req, nodeid, inarg, 0);
 }
 
 static void do_setlkw(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	do_setlk_common(req, nodeid, inarg, 1);
+    do_setlk_common(req, nodeid, inarg, 1);
 }
 
 static int find_interrupted(struct fuse_session *se, struct fuse_req *req)
 {
-	struct fuse_req *curr;
-
-	for (curr = se->list.next; curr != &se->list; curr = curr->next) {
-		if (curr->unique == req->u.i.unique) {
-			fuse_interrupt_func_t func;
-			void *data;
-
-			curr->ctr++;
-			pthread_mutex_unlock(&se->lock);
-
-			/* Ugh, ugly locking */
-			pthread_mutex_lock(&curr->lock);
-			pthread_mutex_lock(&se->lock);
-			curr->interrupted = 1;
-			func = curr->u.ni.func;
-			data = curr->u.ni.data;
-			pthread_mutex_unlock(&se->lock);
-			if (func)
-				func(curr, data);
-			pthread_mutex_unlock(&curr->lock);
-
-			pthread_mutex_lock(&se->lock);
-			curr->ctr--;
-			if (!curr->ctr)
-				destroy_req(curr);
-
-			return 1;
-		}
-	}
-	for (curr = se->interrupts.next; curr != &se->interrupts;
-	     curr = curr->next) {
-		if (curr->u.i.unique == req->u.i.unique)
-			return 1;
-	}
-	return 0;
+    struct fuse_req *curr;
+
+    for (curr = se->list.next; curr != &se->list; curr = curr->next) {
+        if (curr->unique == req->u.i.unique) {
+            fuse_interrupt_func_t func;
+            void *data;
+
+            curr->ctr++;
+            pthread_mutex_unlock(&se->lock);
+
+            /* Ugh, ugly locking */
+            pthread_mutex_lock(&curr->lock);
+            pthread_mutex_lock(&se->lock);
+            curr->interrupted = 1;
+            func = curr->u.ni.func;
+            data = curr->u.ni.data;
+            pthread_mutex_unlock(&se->lock);
+            if (func) {
+                func(curr, data);
+            }
+            pthread_mutex_unlock(&curr->lock);
+
+            pthread_mutex_lock(&se->lock);
+            curr->ctr--;
+            if (!curr->ctr) {
+                destroy_req(curr);
+            }
+
+            return 1;
+        }
+    }
+    for (curr = se->interrupts.next; curr != &se->interrupts;
+         curr = curr->next) {
+        if (curr->u.i.unique == req->u.i.unique) {
+            return 1;
+        }
+    }
+    return 0;
 }
 
 static void do_interrupt(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_interrupt_in *arg = (struct fuse_interrupt_in *) inarg;
-	struct fuse_session *se = req->se;
+    struct fuse_interrupt_in *arg = (struct fuse_interrupt_in *)inarg;
+    struct fuse_session *se = req->se;
 
-	(void) nodeid;
-	if (se->debug)
-		fuse_log(FUSE_LOG_DEBUG, "INTERRUPT: %llu\n",
-			(unsigned long long) arg->unique);
+    (void)nodeid;
+    if (se->debug) {
+        fuse_log(FUSE_LOG_DEBUG, "INTERRUPT: %llu\n",
+                 (unsigned long long)arg->unique);
+    }
 
-	req->u.i.unique = arg->unique;
+    req->u.i.unique = arg->unique;
 
-	pthread_mutex_lock(&se->lock);
-	if (find_interrupted(se, req))
-		destroy_req(req);
-	else
-		list_add_req(req, &se->interrupts);
-	pthread_mutex_unlock(&se->lock);
+    pthread_mutex_lock(&se->lock);
+    if (find_interrupted(se, req)) {
+        destroy_req(req);
+    } else {
+        list_add_req(req, &se->interrupts);
+    }
+    pthread_mutex_unlock(&se->lock);
 }
 
 static struct fuse_req *check_interrupt(struct fuse_session *se,
-					struct fuse_req *req)
-{
-	struct fuse_req *curr;
-
-	for (curr = se->interrupts.next; curr != &se->interrupts;
-	     curr = curr->next) {
-		if (curr->u.i.unique == req->unique) {
-			req->interrupted = 1;
-			list_del_req(curr);
-			free(curr);
-			return NULL;
-		}
-	}
-	curr = se->interrupts.next;
-	if (curr != &se->interrupts) {
-		list_del_req(curr);
-		list_init_req(curr);
-		return curr;
-	} else
-		return NULL;
+                                        struct fuse_req *req)
+{
+    struct fuse_req *curr;
+
+    for (curr = se->interrupts.next; curr != &se->interrupts;
+         curr = curr->next) {
+        if (curr->u.i.unique == req->unique) {
+            req->interrupted = 1;
+            list_del_req(curr);
+            free(curr);
+            return NULL;
+        }
+    }
+    curr = se->interrupts.next;
+    if (curr != &se->interrupts) {
+        list_del_req(curr);
+        list_init_req(curr);
+        return curr;
+    } else {
+        return NULL;
+    }
 }
 
 static void do_bmap(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_bmap_in *arg = (struct fuse_bmap_in *) inarg;
+    struct fuse_bmap_in *arg = (struct fuse_bmap_in *)inarg;
 
-	if (req->se->op.bmap)
-		req->se->op.bmap(req, nodeid, arg->blocksize, arg->block);
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.bmap) {
+        req->se->op.bmap(req, nodeid, arg->blocksize, arg->block);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_ioctl(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_ioctl_in *arg = (struct fuse_ioctl_in *) inarg;
-	unsigned int flags = arg->flags;
-	void *in_buf = arg->in_size ? PARAM(arg) : NULL;
-	struct fuse_file_info fi;
+    struct fuse_ioctl_in *arg = (struct fuse_ioctl_in *)inarg;
+    unsigned int flags = arg->flags;
+    void *in_buf = arg->in_size ? PARAM(arg) : NULL;
+    struct fuse_file_info fi;
 
-	if (flags & FUSE_IOCTL_DIR &&
-	    !(req->se->conn.want & FUSE_CAP_IOCTL_DIR)) {
-		fuse_reply_err(req, ENOTTY);
-		return;
-	}
+    if (flags & FUSE_IOCTL_DIR && !(req->se->conn.want & FUSE_CAP_IOCTL_DIR)) {
+        fuse_reply_err(req, ENOTTY);
+        return;
+    }
 
-	memset(&fi, 0, sizeof(fi));
-	fi.fh = arg->fh;
+    memset(&fi, 0, sizeof(fi));
+    fi.fh = arg->fh;
 
-	if (sizeof(void *) == 4 && req->se->conn.proto_minor >= 16 &&
-	    !(flags & FUSE_IOCTL_32BIT)) {
-		req->ioctl_64bit = 1;
-	}
+    if (sizeof(void *) == 4 && req->se->conn.proto_minor >= 16 &&
+        !(flags & FUSE_IOCTL_32BIT)) {
+        req->ioctl_64bit = 1;
+    }
 
-	if (req->se->op.ioctl)
-		req->se->op.ioctl(req, nodeid, arg->cmd,
-				 (void *)(uintptr_t)arg->arg, &fi, flags,
-				 in_buf, arg->in_size, arg->out_size);
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.ioctl) {
+        req->se->op.ioctl(req, nodeid, arg->cmd, (void *)(uintptr_t)arg->arg,
+                          &fi, flags, in_buf, arg->in_size, arg->out_size);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 void fuse_pollhandle_destroy(struct fuse_pollhandle *ph)
 {
-	free(ph);
+    free(ph);
 }
 
 static void do_poll(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_poll_in *arg = (struct fuse_poll_in *) inarg;
-	struct fuse_file_info fi;
+    struct fuse_poll_in *arg = (struct fuse_poll_in *)inarg;
+    struct fuse_file_info fi;
 
-	memset(&fi, 0, sizeof(fi));
-	fi.fh = arg->fh;
-	fi.poll_events = arg->events;
+    memset(&fi, 0, sizeof(fi));
+    fi.fh = arg->fh;
+    fi.poll_events = arg->events;
 
-	if (req->se->op.poll) {
-		struct fuse_pollhandle *ph = NULL;
+    if (req->se->op.poll) {
+        struct fuse_pollhandle *ph = NULL;
 
-		if (arg->flags & FUSE_POLL_SCHEDULE_NOTIFY) {
-			ph = malloc(sizeof(struct fuse_pollhandle));
-			if (ph == NULL) {
-				fuse_reply_err(req, ENOMEM);
-				return;
-			}
-			ph->kh = arg->kh;
-			ph->se = req->se;
-		}
+        if (arg->flags & FUSE_POLL_SCHEDULE_NOTIFY) {
+            ph = malloc(sizeof(struct fuse_pollhandle));
+            if (ph == NULL) {
+                fuse_reply_err(req, ENOMEM);
+                return;
+            }
+            ph->kh = arg->kh;
+            ph->se = req->se;
+        }
 
-		req->se->op.poll(req, nodeid, &fi, ph);
-	} else {
-		fuse_reply_err(req, ENOSYS);
-	}
+        req->se->op.poll(req, nodeid, &fi, ph);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_fallocate(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_fallocate_in *arg = (struct fuse_fallocate_in *) inarg;
-	struct fuse_file_info fi;
+    struct fuse_fallocate_in *arg = (struct fuse_fallocate_in *)inarg;
+    struct fuse_file_info fi;
 
-	memset(&fi, 0, sizeof(fi));
-	fi.fh = arg->fh;
+    memset(&fi, 0, sizeof(fi));
+    fi.fh = arg->fh;
 
-	if (req->se->op.fallocate)
-		req->se->op.fallocate(req, nodeid, arg->mode, arg->offset, arg->length, &fi);
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.fallocate) {
+        req->se->op.fallocate(req, nodeid, arg->mode, arg->offset, arg->length,
+                              &fi);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
-static void do_copy_file_range(fuse_req_t req, fuse_ino_t nodeid_in, const void *inarg)
+static void do_copy_file_range(fuse_req_t req, fuse_ino_t nodeid_in,
+                               const void *inarg)
 {
-	struct fuse_copy_file_range_in *arg = (struct fuse_copy_file_range_in *) inarg;
-	struct fuse_file_info fi_in, fi_out;
+    struct fuse_copy_file_range_in *arg =
+        (struct fuse_copy_file_range_in *)inarg;
+    struct fuse_file_info fi_in, fi_out;
 
-	memset(&fi_in, 0, sizeof(fi_in));
-	fi_in.fh = arg->fh_in;
+    memset(&fi_in, 0, sizeof(fi_in));
+    fi_in.fh = arg->fh_in;
 
-	memset(&fi_out, 0, sizeof(fi_out));
-	fi_out.fh = arg->fh_out;
+    memset(&fi_out, 0, sizeof(fi_out));
+    fi_out.fh = arg->fh_out;
 
 
-	if (req->se->op.copy_file_range)
-		req->se->op.copy_file_range(req, nodeid_in, arg->off_in,
-					    &fi_in, arg->nodeid_out,
-					    arg->off_out, &fi_out, arg->len,
-					    arg->flags);
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.copy_file_range) {
+        req->se->op.copy_file_range(req, nodeid_in, arg->off_in, &fi_in,
+                                    arg->nodeid_out, arg->off_out, &fi_out,
+                                    arg->len, arg->flags);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_lseek(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_lseek_in *arg = (struct fuse_lseek_in *) inarg;
-	struct fuse_file_info fi;
+    struct fuse_lseek_in *arg = (struct fuse_lseek_in *)inarg;
+    struct fuse_file_info fi;
 
-	memset(&fi, 0, sizeof(fi));
-	fi.fh = arg->fh;
+    memset(&fi, 0, sizeof(fi));
+    fi.fh = arg->fh;
 
-	if (req->se->op.lseek)
-		req->se->op.lseek(req, nodeid, arg->offset, arg->whence, &fi);
-	else
-		fuse_reply_err(req, ENOSYS);
+    if (req->se->op.lseek) {
+        req->se->op.lseek(req, nodeid, arg->offset, arg->whence, &fi);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
 }
 
 static void do_init(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_init_in *arg = (struct fuse_init_in *) inarg;
-	struct fuse_init_out outarg;
-	struct fuse_session *se = req->se;
-	size_t bufsize = se->bufsize;
-	size_t outargsize = sizeof(outarg);
-
-	(void) nodeid;
-	if (se->debug) {
-		fuse_log(FUSE_LOG_DEBUG, "INIT: %u.%u\n", arg->major, arg->minor);
-		if (arg->major == 7 && arg->minor >= 6) {
-			fuse_log(FUSE_LOG_DEBUG, "flags=0x%08x\n", arg->flags);
-			fuse_log(FUSE_LOG_DEBUG, "max_readahead=0x%08x\n",
-				arg->max_readahead);
-		}
-	}
-	se->conn.proto_major = arg->major;
-	se->conn.proto_minor = arg->minor;
-	se->conn.capable = 0;
-	se->conn.want = 0;
-
-	memset(&outarg, 0, sizeof(outarg));
-	outarg.major = FUSE_KERNEL_VERSION;
-	outarg.minor = FUSE_KERNEL_MINOR_VERSION;
-
-	if (arg->major < 7) {
-		fuse_log(FUSE_LOG_ERR, "fuse: unsupported protocol version: %u.%u\n",
-			arg->major, arg->minor);
-		fuse_reply_err(req, EPROTO);
-		return;
-	}
-
-	if (arg->major > 7) {
-		/* Wait for a second INIT request with a 7.X version */
-		send_reply_ok(req, &outarg, sizeof(outarg));
-		return;
-	}
-
-	if (arg->minor >= 6) {
-		if (arg->max_readahead < se->conn.max_readahead)
-			se->conn.max_readahead = arg->max_readahead;
-		if (arg->flags & FUSE_ASYNC_READ)
-			se->conn.capable |= FUSE_CAP_ASYNC_READ;
-		if (arg->flags & FUSE_POSIX_LOCKS)
-			se->conn.capable |= FUSE_CAP_POSIX_LOCKS;
-		if (arg->flags & FUSE_ATOMIC_O_TRUNC)
-			se->conn.capable |= FUSE_CAP_ATOMIC_O_TRUNC;
-		if (arg->flags & FUSE_EXPORT_SUPPORT)
-			se->conn.capable |= FUSE_CAP_EXPORT_SUPPORT;
-		if (arg->flags & FUSE_DONT_MASK)
-			se->conn.capable |= FUSE_CAP_DONT_MASK;
-		if (arg->flags & FUSE_FLOCK_LOCKS)
-			se->conn.capable |= FUSE_CAP_FLOCK_LOCKS;
-		if (arg->flags & FUSE_AUTO_INVAL_DATA)
-			se->conn.capable |= FUSE_CAP_AUTO_INVAL_DATA;
-		if (arg->flags & FUSE_DO_READDIRPLUS)
-			se->conn.capable |= FUSE_CAP_READDIRPLUS;
-		if (arg->flags & FUSE_READDIRPLUS_AUTO)
-			se->conn.capable |= FUSE_CAP_READDIRPLUS_AUTO;
-		if (arg->flags & FUSE_ASYNC_DIO)
-			se->conn.capable |= FUSE_CAP_ASYNC_DIO;
-		if (arg->flags & FUSE_WRITEBACK_CACHE)
-			se->conn.capable |= FUSE_CAP_WRITEBACK_CACHE;
-		if (arg->flags & FUSE_NO_OPEN_SUPPORT)
-			se->conn.capable |= FUSE_CAP_NO_OPEN_SUPPORT;
-		if (arg->flags & FUSE_PARALLEL_DIROPS)
-			se->conn.capable |= FUSE_CAP_PARALLEL_DIROPS;
-		if (arg->flags & FUSE_POSIX_ACL)
-			se->conn.capable |= FUSE_CAP_POSIX_ACL;
-		if (arg->flags & FUSE_HANDLE_KILLPRIV)
-			se->conn.capable |= FUSE_CAP_HANDLE_KILLPRIV;
-		if (arg->flags & FUSE_NO_OPENDIR_SUPPORT)
-			se->conn.capable |= FUSE_CAP_NO_OPENDIR_SUPPORT;
-		if (!(arg->flags & FUSE_MAX_PAGES)) {
-			size_t max_bufsize =
-				FUSE_DEFAULT_MAX_PAGES_PER_REQ * getpagesize()
-				+ FUSE_BUFFER_HEADER_SIZE;
-			if (bufsize > max_bufsize) {
-				bufsize = max_bufsize;
-			}
-		}
-	} else {
-		se->conn.max_readahead = 0;
-	}
-
-	if (se->conn.proto_minor >= 14) {
+    struct fuse_init_in *arg = (struct fuse_init_in *)inarg;
+    struct fuse_init_out outarg;
+    struct fuse_session *se = req->se;
+    size_t bufsize = se->bufsize;
+    size_t outargsize = sizeof(outarg);
+
+    (void)nodeid;
+    if (se->debug) {
+        fuse_log(FUSE_LOG_DEBUG, "INIT: %u.%u\n", arg->major, arg->minor);
+        if (arg->major == 7 && arg->minor >= 6) {
+            fuse_log(FUSE_LOG_DEBUG, "flags=0x%08x\n", arg->flags);
+            fuse_log(FUSE_LOG_DEBUG, "max_readahead=0x%08x\n",
+                     arg->max_readahead);
+        }
+    }
+    se->conn.proto_major = arg->major;
+    se->conn.proto_minor = arg->minor;
+    se->conn.capable = 0;
+    se->conn.want = 0;
+
+    memset(&outarg, 0, sizeof(outarg));
+    outarg.major = FUSE_KERNEL_VERSION;
+    outarg.minor = FUSE_KERNEL_MINOR_VERSION;
+
+    if (arg->major < 7) {
+        fuse_log(FUSE_LOG_ERR, "fuse: unsupported protocol version: %u.%u\n",
+                 arg->major, arg->minor);
+        fuse_reply_err(req, EPROTO);
+        return;
+    }
+
+    if (arg->major > 7) {
+        /* Wait for a second INIT request with a 7.X version */
+        send_reply_ok(req, &outarg, sizeof(outarg));
+        return;
+    }
+
+    if (arg->minor >= 6) {
+        if (arg->max_readahead < se->conn.max_readahead) {
+            se->conn.max_readahead = arg->max_readahead;
+        }
+        if (arg->flags & FUSE_ASYNC_READ) {
+            se->conn.capable |= FUSE_CAP_ASYNC_READ;
+        }
+        if (arg->flags & FUSE_POSIX_LOCKS) {
+            se->conn.capable |= FUSE_CAP_POSIX_LOCKS;
+        }
+        if (arg->flags & FUSE_ATOMIC_O_TRUNC) {
+            se->conn.capable |= FUSE_CAP_ATOMIC_O_TRUNC;
+        }
+        if (arg->flags & FUSE_EXPORT_SUPPORT) {
+            se->conn.capable |= FUSE_CAP_EXPORT_SUPPORT;
+        }
+        if (arg->flags & FUSE_DONT_MASK) {
+            se->conn.capable |= FUSE_CAP_DONT_MASK;
+        }
+        if (arg->flags & FUSE_FLOCK_LOCKS) {
+            se->conn.capable |= FUSE_CAP_FLOCK_LOCKS;
+        }
+        if (arg->flags & FUSE_AUTO_INVAL_DATA) {
+            se->conn.capable |= FUSE_CAP_AUTO_INVAL_DATA;
+        }
+        if (arg->flags & FUSE_DO_READDIRPLUS) {
+            se->conn.capable |= FUSE_CAP_READDIRPLUS;
+        }
+        if (arg->flags & FUSE_READDIRPLUS_AUTO) {
+            se->conn.capable |= FUSE_CAP_READDIRPLUS_AUTO;
+        }
+        if (arg->flags & FUSE_ASYNC_DIO) {
+            se->conn.capable |= FUSE_CAP_ASYNC_DIO;
+        }
+        if (arg->flags & FUSE_WRITEBACK_CACHE) {
+            se->conn.capable |= FUSE_CAP_WRITEBACK_CACHE;
+        }
+        if (arg->flags & FUSE_NO_OPEN_SUPPORT) {
+            se->conn.capable |= FUSE_CAP_NO_OPEN_SUPPORT;
+        }
+        if (arg->flags & FUSE_PARALLEL_DIROPS) {
+            se->conn.capable |= FUSE_CAP_PARALLEL_DIROPS;
+        }
+        if (arg->flags & FUSE_POSIX_ACL) {
+            se->conn.capable |= FUSE_CAP_POSIX_ACL;
+        }
+        if (arg->flags & FUSE_HANDLE_KILLPRIV) {
+            se->conn.capable |= FUSE_CAP_HANDLE_KILLPRIV;
+        }
+        if (arg->flags & FUSE_NO_OPENDIR_SUPPORT) {
+            se->conn.capable |= FUSE_CAP_NO_OPENDIR_SUPPORT;
+        }
+        if (!(arg->flags & FUSE_MAX_PAGES)) {
+            size_t max_bufsize =
+                FUSE_DEFAULT_MAX_PAGES_PER_REQ * getpagesize() +
+                FUSE_BUFFER_HEADER_SIZE;
+            if (bufsize > max_bufsize) {
+                bufsize = max_bufsize;
+            }
+        }
+    } else {
+        se->conn.max_readahead = 0;
+    }
+
+    if (se->conn.proto_minor >= 14) {
 #ifdef HAVE_SPLICE
 #ifdef HAVE_VMSPLICE
-		se->conn.capable |= FUSE_CAP_SPLICE_WRITE | FUSE_CAP_SPLICE_MOVE;
+        se->conn.capable |= FUSE_CAP_SPLICE_WRITE | FUSE_CAP_SPLICE_MOVE;
 #endif
-		se->conn.capable |= FUSE_CAP_SPLICE_READ;
+        se->conn.capable |= FUSE_CAP_SPLICE_READ;
 #endif
-	}
-	if (se->conn.proto_minor >= 18)
-		se->conn.capable |= FUSE_CAP_IOCTL_DIR;
-
-	/* Default settings for modern filesystems.
-	 *
-	 * Most of these capabilities were disabled by default in
-	 * libfuse2 for backwards compatibility reasons. In libfuse3,
-	 * we can finally enable them by default (as long as they're
-	 * supported by the kernel).
-	 */
-#define LL_SET_DEFAULT(cond, cap) \
-	if ((cond) && (se->conn.capable & (cap))) \
-		se->conn.want |= (cap)
-	LL_SET_DEFAULT(1, FUSE_CAP_ASYNC_READ);
-	LL_SET_DEFAULT(1, FUSE_CAP_PARALLEL_DIROPS);
-	LL_SET_DEFAULT(1, FUSE_CAP_AUTO_INVAL_DATA);
-	LL_SET_DEFAULT(1, FUSE_CAP_HANDLE_KILLPRIV);
-	LL_SET_DEFAULT(1, FUSE_CAP_ASYNC_DIO);
-	LL_SET_DEFAULT(1, FUSE_CAP_IOCTL_DIR);
-	LL_SET_DEFAULT(1, FUSE_CAP_ATOMIC_O_TRUNC);
-	LL_SET_DEFAULT(se->op.write_buf, FUSE_CAP_SPLICE_READ);
-	LL_SET_DEFAULT(se->op.getlk && se->op.setlk,
-		       FUSE_CAP_POSIX_LOCKS);
-	LL_SET_DEFAULT(se->op.flock, FUSE_CAP_FLOCK_LOCKS);
-	LL_SET_DEFAULT(se->op.readdirplus, FUSE_CAP_READDIRPLUS);
-	LL_SET_DEFAULT(se->op.readdirplus && se->op.readdir,
-		       FUSE_CAP_READDIRPLUS_AUTO);
-	se->conn.time_gran = 1;
-	
-	if (bufsize < FUSE_MIN_READ_BUFFER) {
-		fuse_log(FUSE_LOG_ERR, "fuse: warning: buffer size too small: %zu\n",
-			bufsize);
-		bufsize = FUSE_MIN_READ_BUFFER;
-	}
-	se->bufsize = bufsize;
-
-	if (se->conn.max_write > bufsize - FUSE_BUFFER_HEADER_SIZE)
-		se->conn.max_write = bufsize - FUSE_BUFFER_HEADER_SIZE;
-
-	se->got_init = 1;
-	if (se->op.init)
-		se->op.init(se->userdata, &se->conn);
-
-	if (se->conn.want & (~se->conn.capable)) {
-		fuse_log(FUSE_LOG_ERR, "fuse: error: filesystem requested capabilities "
-			"0x%x that are not supported by kernel, aborting.\n",
-			se->conn.want & (~se->conn.capable));
-		fuse_reply_err(req, EPROTO);
-		se->error = -EPROTO;
-		fuse_session_exit(se);
-		return;
-	}
-
-	if (se->conn.max_write < bufsize - FUSE_BUFFER_HEADER_SIZE) {
-		se->bufsize = se->conn.max_write + FUSE_BUFFER_HEADER_SIZE;
-	}
-	if (arg->flags & FUSE_MAX_PAGES) {
-		outarg.flags |= FUSE_MAX_PAGES;
-		outarg.max_pages = (se->conn.max_write - 1) / getpagesize() + 1;
-	}
-
-	/* Always enable big writes, this is superseded
-	   by the max_write option */
-	outarg.flags |= FUSE_BIG_WRITES;
-
-	if (se->conn.want & FUSE_CAP_ASYNC_READ)
-		outarg.flags |= FUSE_ASYNC_READ;
-	if (se->conn.want & FUSE_CAP_POSIX_LOCKS)
-		outarg.flags |= FUSE_POSIX_LOCKS;
-	if (se->conn.want & FUSE_CAP_ATOMIC_O_TRUNC)
-		outarg.flags |= FUSE_ATOMIC_O_TRUNC;
-	if (se->conn.want & FUSE_CAP_EXPORT_SUPPORT)
-		outarg.flags |= FUSE_EXPORT_SUPPORT;
-	if (se->conn.want & FUSE_CAP_DONT_MASK)
-		outarg.flags |= FUSE_DONT_MASK;
-	if (se->conn.want & FUSE_CAP_FLOCK_LOCKS)
-		outarg.flags |= FUSE_FLOCK_LOCKS;
-	if (se->conn.want & FUSE_CAP_AUTO_INVAL_DATA)
-		outarg.flags |= FUSE_AUTO_INVAL_DATA;
-	if (se->conn.want & FUSE_CAP_READDIRPLUS)
-		outarg.flags |= FUSE_DO_READDIRPLUS;
-	if (se->conn.want & FUSE_CAP_READDIRPLUS_AUTO)
-		outarg.flags |= FUSE_READDIRPLUS_AUTO;
-	if (se->conn.want & FUSE_CAP_ASYNC_DIO)
-		outarg.flags |= FUSE_ASYNC_DIO;
-	if (se->conn.want & FUSE_CAP_WRITEBACK_CACHE)
-		outarg.flags |= FUSE_WRITEBACK_CACHE;
-	if (se->conn.want & FUSE_CAP_POSIX_ACL)
-		outarg.flags |= FUSE_POSIX_ACL;
-	outarg.max_readahead = se->conn.max_readahead;
-	outarg.max_write = se->conn.max_write;
-	if (se->conn.proto_minor >= 13) {
-		if (se->conn.max_background >= (1 << 16))
-			se->conn.max_background = (1 << 16) - 1;
-		if (se->conn.congestion_threshold > se->conn.max_background)
-			se->conn.congestion_threshold = se->conn.max_background;
-		if (!se->conn.congestion_threshold) {
-			se->conn.congestion_threshold =
-				se->conn.max_background * 3 / 4;
-		}
-
-		outarg.max_background = se->conn.max_background;
-		outarg.congestion_threshold = se->conn.congestion_threshold;
-	}
-	if (se->conn.proto_minor >= 23)
-		outarg.time_gran = se->conn.time_gran;
-
-	if (se->debug) {
-		fuse_log(FUSE_LOG_DEBUG, "   INIT: %u.%u\n", outarg.major, outarg.minor);
-		fuse_log(FUSE_LOG_DEBUG, "   flags=0x%08x\n", outarg.flags);
-		fuse_log(FUSE_LOG_DEBUG, "   max_readahead=0x%08x\n",
-			outarg.max_readahead);
-		fuse_log(FUSE_LOG_DEBUG, "   max_write=0x%08x\n", outarg.max_write);
-		fuse_log(FUSE_LOG_DEBUG, "   max_background=%i\n",
-			outarg.max_background);
-		fuse_log(FUSE_LOG_DEBUG, "   congestion_threshold=%i\n",
-			outarg.congestion_threshold);
-		fuse_log(FUSE_LOG_DEBUG, "   time_gran=%u\n",
-			outarg.time_gran);
-	}
-	if (arg->minor < 5)
-		outargsize = FUSE_COMPAT_INIT_OUT_SIZE;
-	else if (arg->minor < 23)
-		outargsize = FUSE_COMPAT_22_INIT_OUT_SIZE;
-
-	send_reply_ok(req, &outarg, outargsize);
+    }
+    if (se->conn.proto_minor >= 18) {
+        se->conn.capable |= FUSE_CAP_IOCTL_DIR;
+    }
+
+    /*
+     * Default settings for modern filesystems.
+     *
+     * Most of these capabilities were disabled by default in
+     * libfuse2 for backwards compatibility reasons. In libfuse3,
+     * we can finally enable them by default (as long as they're
+     * supported by the kernel).
+     */
+#define LL_SET_DEFAULT(cond, cap)             \
+    if ((cond) && (se->conn.capable & (cap))) \
+        se->conn.want |= (cap)
+    LL_SET_DEFAULT(1, FUSE_CAP_ASYNC_READ);
+    LL_SET_DEFAULT(1, FUSE_CAP_PARALLEL_DIROPS);
+    LL_SET_DEFAULT(1, FUSE_CAP_AUTO_INVAL_DATA);
+    LL_SET_DEFAULT(1, FUSE_CAP_HANDLE_KILLPRIV);
+    LL_SET_DEFAULT(1, FUSE_CAP_ASYNC_DIO);
+    LL_SET_DEFAULT(1, FUSE_CAP_IOCTL_DIR);
+    LL_SET_DEFAULT(1, FUSE_CAP_ATOMIC_O_TRUNC);
+    LL_SET_DEFAULT(se->op.write_buf, FUSE_CAP_SPLICE_READ);
+    LL_SET_DEFAULT(se->op.getlk && se->op.setlk, FUSE_CAP_POSIX_LOCKS);
+    LL_SET_DEFAULT(se->op.flock, FUSE_CAP_FLOCK_LOCKS);
+    LL_SET_DEFAULT(se->op.readdirplus, FUSE_CAP_READDIRPLUS);
+    LL_SET_DEFAULT(se->op.readdirplus && se->op.readdir,
+                   FUSE_CAP_READDIRPLUS_AUTO);
+    se->conn.time_gran = 1;
+
+    if (bufsize < FUSE_MIN_READ_BUFFER) {
+        fuse_log(FUSE_LOG_ERR, "fuse: warning: buffer size too small: %zu\n",
+                 bufsize);
+        bufsize = FUSE_MIN_READ_BUFFER;
+    }
+    se->bufsize = bufsize;
+
+    if (se->conn.max_write > bufsize - FUSE_BUFFER_HEADER_SIZE) {
+        se->conn.max_write = bufsize - FUSE_BUFFER_HEADER_SIZE;
+    }
+
+    se->got_init = 1;
+    if (se->op.init) {
+        se->op.init(se->userdata, &se->conn);
+    }
+
+    if (se->conn.want & (~se->conn.capable)) {
+        fuse_log(FUSE_LOG_ERR,
+                 "fuse: error: filesystem requested capabilities "
+                 "0x%x that are not supported by kernel, aborting.\n",
+                 se->conn.want & (~se->conn.capable));
+        fuse_reply_err(req, EPROTO);
+        se->error = -EPROTO;
+        fuse_session_exit(se);
+        return;
+    }
+
+    if (se->conn.max_write < bufsize - FUSE_BUFFER_HEADER_SIZE) {
+        se->bufsize = se->conn.max_write + FUSE_BUFFER_HEADER_SIZE;
+    }
+    if (arg->flags & FUSE_MAX_PAGES) {
+        outarg.flags |= FUSE_MAX_PAGES;
+        outarg.max_pages = (se->conn.max_write - 1) / getpagesize() + 1;
+    }
+
+    /*
+     * Always enable big writes, this is superseded
+     * by the max_write option
+     */
+    outarg.flags |= FUSE_BIG_WRITES;
+
+    if (se->conn.want & FUSE_CAP_ASYNC_READ) {
+        outarg.flags |= FUSE_ASYNC_READ;
+    }
+    if (se->conn.want & FUSE_CAP_POSIX_LOCKS) {
+        outarg.flags |= FUSE_POSIX_LOCKS;
+    }
+    if (se->conn.want & FUSE_CAP_ATOMIC_O_TRUNC) {
+        outarg.flags |= FUSE_ATOMIC_O_TRUNC;
+    }
+    if (se->conn.want & FUSE_CAP_EXPORT_SUPPORT) {
+        outarg.flags |= FUSE_EXPORT_SUPPORT;
+    }
+    if (se->conn.want & FUSE_CAP_DONT_MASK) {
+        outarg.flags |= FUSE_DONT_MASK;
+    }
+    if (se->conn.want & FUSE_CAP_FLOCK_LOCKS) {
+        outarg.flags |= FUSE_FLOCK_LOCKS;
+    }
+    if (se->conn.want & FUSE_CAP_AUTO_INVAL_DATA) {
+        outarg.flags |= FUSE_AUTO_INVAL_DATA;
+    }
+    if (se->conn.want & FUSE_CAP_READDIRPLUS) {
+        outarg.flags |= FUSE_DO_READDIRPLUS;
+    }
+    if (se->conn.want & FUSE_CAP_READDIRPLUS_AUTO) {
+        outarg.flags |= FUSE_READDIRPLUS_AUTO;
+    }
+    if (se->conn.want & FUSE_CAP_ASYNC_DIO) {
+        outarg.flags |= FUSE_ASYNC_DIO;
+    }
+    if (se->conn.want & FUSE_CAP_WRITEBACK_CACHE) {
+        outarg.flags |= FUSE_WRITEBACK_CACHE;
+    }
+    if (se->conn.want & FUSE_CAP_POSIX_ACL) {
+        outarg.flags |= FUSE_POSIX_ACL;
+    }
+    outarg.max_readahead = se->conn.max_readahead;
+    outarg.max_write = se->conn.max_write;
+    if (se->conn.proto_minor >= 13) {
+        if (se->conn.max_background >= (1 << 16)) {
+            se->conn.max_background = (1 << 16) - 1;
+        }
+        if (se->conn.congestion_threshold > se->conn.max_background) {
+            se->conn.congestion_threshold = se->conn.max_background;
+        }
+        if (!se->conn.congestion_threshold) {
+            se->conn.congestion_threshold = se->conn.max_background * 3 / 4;
+        }
+
+        outarg.max_background = se->conn.max_background;
+        outarg.congestion_threshold = se->conn.congestion_threshold;
+    }
+    if (se->conn.proto_minor >= 23) {
+        outarg.time_gran = se->conn.time_gran;
+    }
+
+    if (se->debug) {
+        fuse_log(FUSE_LOG_DEBUG, "   INIT: %u.%u\n", outarg.major,
+                 outarg.minor);
+        fuse_log(FUSE_LOG_DEBUG, "   flags=0x%08x\n", outarg.flags);
+        fuse_log(FUSE_LOG_DEBUG, "   max_readahead=0x%08x\n",
+                 outarg.max_readahead);
+        fuse_log(FUSE_LOG_DEBUG, "   max_write=0x%08x\n", outarg.max_write);
+        fuse_log(FUSE_LOG_DEBUG, "   max_background=%i\n",
+                 outarg.max_background);
+        fuse_log(FUSE_LOG_DEBUG, "   congestion_threshold=%i\n",
+                 outarg.congestion_threshold);
+        fuse_log(FUSE_LOG_DEBUG, "   time_gran=%u\n", outarg.time_gran);
+    }
+    if (arg->minor < 5) {
+        outargsize = FUSE_COMPAT_INIT_OUT_SIZE;
+    } else if (arg->minor < 23) {
+        outargsize = FUSE_COMPAT_22_INIT_OUT_SIZE;
+    }
+
+    send_reply_ok(req, &outarg, outargsize);
 }
 
 static void do_destroy(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
-	struct fuse_session *se = req->se;
+    struct fuse_session *se = req->se;
 
-	(void) nodeid;
-	(void) inarg;
+    (void)nodeid;
+    (void)inarg;
 
-	se->got_destroy = 1;
-	if (se->op.destroy)
-		se->op.destroy(se->userdata);
+    se->got_destroy = 1;
+    if (se->op.destroy) {
+        se->op.destroy(se->userdata);
+    }
 
-	send_reply_ok(req, NULL, 0);
+    send_reply_ok(req, NULL, 0);
 }
 
 static void list_del_nreq(struct fuse_notify_req *nreq)
 {
-	struct fuse_notify_req *prev = nreq->prev;
-	struct fuse_notify_req *next = nreq->next;
-	prev->next = next;
-	next->prev = prev;
+    struct fuse_notify_req *prev = nreq->prev;
+    struct fuse_notify_req *next = nreq->next;
+    prev->next = next;
+    next->prev = prev;
 }
 
 static void list_add_nreq(struct fuse_notify_req *nreq,
-			  struct fuse_notify_req *next)
+                          struct fuse_notify_req *next)
 {
-	struct fuse_notify_req *prev = next->prev;
-	nreq->next = next;
-	nreq->prev = prev;
-	prev->next = nreq;
-	next->prev = nreq;
+    struct fuse_notify_req *prev = next->prev;
+    nreq->next = next;
+    nreq->prev = prev;
+    prev->next = nreq;
+    next->prev = nreq;
 }
 
 static void list_init_nreq(struct fuse_notify_req *nreq)
 {
-	nreq->next = nreq;
-	nreq->prev = nreq;
+    nreq->next = nreq;
+    nreq->prev = nreq;
 }
 
 static void do_notify_reply(fuse_req_t req, fuse_ino_t nodeid,
-			    const void *inarg, const struct fuse_buf *buf)
+                            const void *inarg, const struct fuse_buf *buf)
 {
-	struct fuse_session *se = req->se;
-	struct fuse_notify_req *nreq;
-	struct fuse_notify_req *head;
+    struct fuse_session *se = req->se;
+    struct fuse_notify_req *nreq;
+    struct fuse_notify_req *head;
 
-	pthread_mutex_lock(&se->lock);
-	head = &se->notify_list;
-	for (nreq = head->next; nreq != head; nreq = nreq->next) {
-		if (nreq->unique == req->unique) {
-			list_del_nreq(nreq);
-			break;
-		}
-	}
-	pthread_mutex_unlock(&se->lock);
+    pthread_mutex_lock(&se->lock);
+    head = &se->notify_list;
+    for (nreq = head->next; nreq != head; nreq = nreq->next) {
+        if (nreq->unique == req->unique) {
+            list_del_nreq(nreq);
+            break;
+        }
+    }
+    pthread_mutex_unlock(&se->lock);
 
-	if (nreq != head)
-		nreq->reply(nreq, req, nodeid, inarg, buf);
+    if (nreq != head) {
+        nreq->reply(nreq, req, nodeid, inarg, buf);
+    }
 }
 
 static int send_notify_iov(struct fuse_session *se, int notify_code,
-			   struct iovec *iov, int count)
+                           struct iovec *iov, int count)
 {
-	struct fuse_out_header out;
+    struct fuse_out_header out;
 
-	if (!se->got_init)
-		return -ENOTCONN;
+    if (!se->got_init) {
+        return -ENOTCONN;
+    }
 
-	out.unique = 0;
-	out.error = notify_code;
-	iov[0].iov_base = &out;
-	iov[0].iov_len = sizeof(struct fuse_out_header);
+    out.unique = 0;
+    out.error = notify_code;
+    iov[0].iov_base = &out;
+    iov[0].iov_len = sizeof(struct fuse_out_header);
 
-	return fuse_send_msg(se, NULL, iov, count);
+    return fuse_send_msg(se, NULL, iov, count);
 }
 
 int fuse_lowlevel_notify_poll(struct fuse_pollhandle *ph)
 {
-	if (ph != NULL) {
-		struct fuse_notify_poll_wakeup_out outarg;
-		struct iovec iov[2];
+    if (ph != NULL) {
+        struct fuse_notify_poll_wakeup_out outarg;
+        struct iovec iov[2];
 
-		outarg.kh = ph->kh;
+        outarg.kh = ph->kh;
 
-		iov[1].iov_base = &outarg;
-		iov[1].iov_len = sizeof(outarg);
+        iov[1].iov_base = &outarg;
+        iov[1].iov_len = sizeof(outarg);
 
-		return send_notify_iov(ph->se, FUSE_NOTIFY_POLL, iov, 2);
-	} else {
-		return 0;
-	}
+        return send_notify_iov(ph->se, FUSE_NOTIFY_POLL, iov, 2);
+    } else {
+        return 0;
+    }
 }
 
 int fuse_lowlevel_notify_inval_inode(struct fuse_session *se, fuse_ino_t ino,
-				     off_t off, off_t len)
+                                     off_t off, off_t len)
 {
-	struct fuse_notify_inval_inode_out outarg;
-	struct iovec iov[2];
+    struct fuse_notify_inval_inode_out outarg;
+    struct iovec iov[2];
+
+    if (!se) {
+        return -EINVAL;
+    }
 
-	if (!se)
-		return -EINVAL;
+    if (se->conn.proto_major < 6 || se->conn.proto_minor < 12) {
+        return -ENOSYS;
+    }
 
-	if (se->conn.proto_major < 6 || se->conn.proto_minor < 12)
-		return -ENOSYS;
-	
-	outarg.ino = ino;
-	outarg.off = off;
-	outarg.len = len;
+    outarg.ino = ino;
+    outarg.off = off;
+    outarg.len = len;
 
-	iov[1].iov_base = &outarg;
-	iov[1].iov_len = sizeof(outarg);
+    iov[1].iov_base = &outarg;
+    iov[1].iov_len = sizeof(outarg);
 
-	return send_notify_iov(se, FUSE_NOTIFY_INVAL_INODE, iov, 2);
+    return send_notify_iov(se, FUSE_NOTIFY_INVAL_INODE, iov, 2);
 }
 
 int fuse_lowlevel_notify_inval_entry(struct fuse_session *se, fuse_ino_t parent,
-				     const char *name, size_t namelen)
+                                     const char *name, size_t namelen)
 {
-	struct fuse_notify_inval_entry_out outarg;
-	struct iovec iov[3];
+    struct fuse_notify_inval_entry_out outarg;
+    struct iovec iov[3];
+
+    if (!se) {
+        return -EINVAL;
+    }
 
-	if (!se)
-		return -EINVAL;
-	
-	if (se->conn.proto_major < 6 || se->conn.proto_minor < 12)
-		return -ENOSYS;
+    if (se->conn.proto_major < 6 || se->conn.proto_minor < 12) {
+        return -ENOSYS;
+    }
 
-	outarg.parent = parent;
-	outarg.namelen = namelen;
-	outarg.padding = 0;
+    outarg.parent = parent;
+    outarg.namelen = namelen;
+    outarg.padding = 0;
 
-	iov[1].iov_base = &outarg;
-	iov[1].iov_len = sizeof(outarg);
-	iov[2].iov_base = (void *)name;
-	iov[2].iov_len = namelen + 1;
+    iov[1].iov_base = &outarg;
+    iov[1].iov_len = sizeof(outarg);
+    iov[2].iov_base = (void *)name;
+    iov[2].iov_len = namelen + 1;
 
-	return send_notify_iov(se, FUSE_NOTIFY_INVAL_ENTRY, iov, 3);
+    return send_notify_iov(se, FUSE_NOTIFY_INVAL_ENTRY, iov, 3);
 }
 
-int fuse_lowlevel_notify_delete(struct fuse_session *se,
-				fuse_ino_t parent, fuse_ino_t child,
-				const char *name, size_t namelen)
+int fuse_lowlevel_notify_delete(struct fuse_session *se, fuse_ino_t parent,
+                                fuse_ino_t child, const char *name,
+                                size_t namelen)
 {
-	struct fuse_notify_delete_out outarg;
-	struct iovec iov[3];
+    struct fuse_notify_delete_out outarg;
+    struct iovec iov[3];
 
-	if (!se)
-		return -EINVAL;
+    if (!se) {
+        return -EINVAL;
+    }
 
-	if (se->conn.proto_major < 6 || se->conn.proto_minor < 18)
-		return -ENOSYS;
+    if (se->conn.proto_major < 6 || se->conn.proto_minor < 18) {
+        return -ENOSYS;
+    }
 
-	outarg.parent = parent;
-	outarg.child = child;
-	outarg.namelen = namelen;
-	outarg.padding = 0;
+    outarg.parent = parent;
+    outarg.child = child;
+    outarg.namelen = namelen;
+    outarg.padding = 0;
 
-	iov[1].iov_base = &outarg;
-	iov[1].iov_len = sizeof(outarg);
-	iov[2].iov_base = (void *)name;
-	iov[2].iov_len = namelen + 1;
+    iov[1].iov_base = &outarg;
+    iov[1].iov_len = sizeof(outarg);
+    iov[2].iov_base = (void *)name;
+    iov[2].iov_len = namelen + 1;
 
-	return send_notify_iov(se, FUSE_NOTIFY_DELETE, iov, 3);
+    return send_notify_iov(se, FUSE_NOTIFY_DELETE, iov, 3);
 }
 
 int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
-			       off_t offset, struct fuse_bufvec *bufv,
-			       enum fuse_buf_copy_flags flags)
+                               off_t offset, struct fuse_bufvec *bufv,
+                               enum fuse_buf_copy_flags flags)
 {
-	struct fuse_out_header out;
-	struct fuse_notify_store_out outarg;
-	struct iovec iov[3];
-	size_t size = fuse_buf_size(bufv);
-	int res;
+    struct fuse_out_header out;
+    struct fuse_notify_store_out outarg;
+    struct iovec iov[3];
+    size_t size = fuse_buf_size(bufv);
+    int res;
 
-	if (!se)
-		return -EINVAL;
+    if (!se) {
+        return -EINVAL;
+    }
 
-	if (se->conn.proto_major < 6 || se->conn.proto_minor < 15)
-		return -ENOSYS;
+    if (se->conn.proto_major < 6 || se->conn.proto_minor < 15) {
+        return -ENOSYS;
+    }
 
-	out.unique = 0;
-	out.error = FUSE_NOTIFY_STORE;
+    out.unique = 0;
+    out.error = FUSE_NOTIFY_STORE;
 
-	outarg.nodeid = ino;
-	outarg.offset = offset;
-	outarg.size = size;
-	outarg.padding = 0;
+    outarg.nodeid = ino;
+    outarg.offset = offset;
+    outarg.size = size;
+    outarg.padding = 0;
 
-	iov[0].iov_base = &out;
-	iov[0].iov_len = sizeof(out);
-	iov[1].iov_base = &outarg;
-	iov[1].iov_len = sizeof(outarg);
+    iov[0].iov_base = &out;
+    iov[0].iov_len = sizeof(out);
+    iov[1].iov_base = &outarg;
+    iov[1].iov_len = sizeof(outarg);
 
-	res = fuse_send_data_iov(se, NULL, iov, 2, bufv, flags);
-	if (res > 0)
-		res = -res;
+    res = fuse_send_data_iov(se, NULL, iov, 2, bufv, flags);
+    if (res > 0) {
+        res = -res;
+    }
 
-	return res;
+    return res;
 }
 
 struct fuse_retrieve_req {
-	struct fuse_notify_req nreq;
-	void *cookie;
+    struct fuse_notify_req nreq;
+    void *cookie;
 };
 
-static void fuse_ll_retrieve_reply(struct fuse_notify_req *nreq,
-				   fuse_req_t req, fuse_ino_t ino,
-				   const void *inarg,
-				   const struct fuse_buf *ibuf)
-{
-	struct fuse_session *se = req->se;
-	struct fuse_retrieve_req *rreq =
-		container_of(nreq, struct fuse_retrieve_req, nreq);
-	const struct fuse_notify_retrieve_in *arg = inarg;
-	struct fuse_bufvec bufv = {
-		.buf[0] = *ibuf,
-		.count = 1,
-	};
-
-	if (!(bufv.buf[0].flags & FUSE_BUF_IS_FD))
-		bufv.buf[0].mem = PARAM(arg);
-
-	bufv.buf[0].size -= sizeof(struct fuse_in_header) +
-		sizeof(struct fuse_notify_retrieve_in);
-
-	if (bufv.buf[0].size < arg->size) {
-		fuse_log(FUSE_LOG_ERR, "fuse: retrieve reply: buffer size too small\n");
-		fuse_reply_none(req);
-		goto out;
-	}
-	bufv.buf[0].size = arg->size;
-
-	if (se->op.retrieve_reply) {
-		se->op.retrieve_reply(req, rreq->cookie, ino,
-					  arg->offset, &bufv);
-	} else {
-		fuse_reply_none(req);
-	}
+static void fuse_ll_retrieve_reply(struct fuse_notify_req *nreq, fuse_req_t req,
+                                   fuse_ino_t ino, const void *inarg,
+                                   const struct fuse_buf *ibuf)
+{
+    struct fuse_session *se = req->se;
+    struct fuse_retrieve_req *rreq =
+        container_of(nreq, struct fuse_retrieve_req, nreq);
+    const struct fuse_notify_retrieve_in *arg = inarg;
+    struct fuse_bufvec bufv = {
+        .buf[0] = *ibuf,
+        .count = 1,
+    };
+
+    if (!(bufv.buf[0].flags & FUSE_BUF_IS_FD)) {
+        bufv.buf[0].mem = PARAM(arg);
+    }
+
+    bufv.buf[0].size -=
+        sizeof(struct fuse_in_header) + sizeof(struct fuse_notify_retrieve_in);
+
+    if (bufv.buf[0].size < arg->size) {
+        fuse_log(FUSE_LOG_ERR, "fuse: retrieve reply: buffer size too small\n");
+        fuse_reply_none(req);
+        goto out;
+    }
+    bufv.buf[0].size = arg->size;
+
+    if (se->op.retrieve_reply) {
+        se->op.retrieve_reply(req, rreq->cookie, ino, arg->offset, &bufv);
+    } else {
+        fuse_reply_none(req);
+    }
 out:
-	free(rreq);
+    free(rreq);
 }
 
 int fuse_lowlevel_notify_retrieve(struct fuse_session *se, fuse_ino_t ino,
-				  size_t size, off_t offset, void *cookie)
+                                  size_t size, off_t offset, void *cookie)
 {
-	struct fuse_notify_retrieve_out outarg;
-	struct iovec iov[2];
-	struct fuse_retrieve_req *rreq;
-	int err;
+    struct fuse_notify_retrieve_out outarg;
+    struct iovec iov[2];
+    struct fuse_retrieve_req *rreq;
+    int err;
 
-	if (!se)
-		return -EINVAL;
+    if (!se) {
+        return -EINVAL;
+    }
 
-	if (se->conn.proto_major < 6 || se->conn.proto_minor < 15)
-		return -ENOSYS;
+    if (se->conn.proto_major < 6 || se->conn.proto_minor < 15) {
+        return -ENOSYS;
+    }
 
-	rreq = malloc(sizeof(*rreq));
-	if (rreq == NULL)
-		return -ENOMEM;
+    rreq = malloc(sizeof(*rreq));
+    if (rreq == NULL) {
+        return -ENOMEM;
+    }
 
-	pthread_mutex_lock(&se->lock);
-	rreq->cookie = cookie;
-	rreq->nreq.unique = se->notify_ctr++;
-	rreq->nreq.reply = fuse_ll_retrieve_reply;
-	list_add_nreq(&rreq->nreq, &se->notify_list);
-	pthread_mutex_unlock(&se->lock);
+    pthread_mutex_lock(&se->lock);
+    rreq->cookie = cookie;
+    rreq->nreq.unique = se->notify_ctr++;
+    rreq->nreq.reply = fuse_ll_retrieve_reply;
+    list_add_nreq(&rreq->nreq, &se->notify_list);
+    pthread_mutex_unlock(&se->lock);
 
-	outarg.notify_unique = rreq->nreq.unique;
-	outarg.nodeid = ino;
-	outarg.offset = offset;
-	outarg.size = size;
-	outarg.padding = 0;
+    outarg.notify_unique = rreq->nreq.unique;
+    outarg.nodeid = ino;
+    outarg.offset = offset;
+    outarg.size = size;
+    outarg.padding = 0;
 
-	iov[1].iov_base = &outarg;
-	iov[1].iov_len = sizeof(outarg);
+    iov[1].iov_base = &outarg;
+    iov[1].iov_len = sizeof(outarg);
 
-	err = send_notify_iov(se, FUSE_NOTIFY_RETRIEVE, iov, 2);
-	if (err) {
-		pthread_mutex_lock(&se->lock);
-		list_del_nreq(&rreq->nreq);
-		pthread_mutex_unlock(&se->lock);
-		free(rreq);
-	}
+    err = send_notify_iov(se, FUSE_NOTIFY_RETRIEVE, iov, 2);
+    if (err) {
+        pthread_mutex_lock(&se->lock);
+        list_del_nreq(&rreq->nreq);
+        pthread_mutex_unlock(&se->lock);
+        free(rreq);
+    }
 
-	return err;
+    return err;
 }
 
 void *fuse_req_userdata(fuse_req_t req)
 {
-	return req->se->userdata;
+    return req->se->userdata;
 }
 
 const struct fuse_ctx *fuse_req_ctx(fuse_req_t req)
 {
-	return &req->ctx;
+    return &req->ctx;
 }
 
 void fuse_req_interrupt_func(fuse_req_t req, fuse_interrupt_func_t func,
-			     void *data)
+                             void *data)
 {
-	pthread_mutex_lock(&req->lock);
-	pthread_mutex_lock(&req->se->lock);
-	req->u.ni.func = func;
-	req->u.ni.data = data;
-	pthread_mutex_unlock(&req->se->lock);
-	if (req->interrupted && func)
-		func(req, data);
-	pthread_mutex_unlock(&req->lock);
+    pthread_mutex_lock(&req->lock);
+    pthread_mutex_lock(&req->se->lock);
+    req->u.ni.func = func;
+    req->u.ni.data = data;
+    pthread_mutex_unlock(&req->se->lock);
+    if (req->interrupted && func) {
+        func(req, data);
+    }
+    pthread_mutex_unlock(&req->lock);
 }
 
 int fuse_req_interrupted(fuse_req_t req)
 {
-	int interrupted;
+    int interrupted;
 
-	pthread_mutex_lock(&req->se->lock);
-	interrupted = req->interrupted;
-	pthread_mutex_unlock(&req->se->lock);
+    pthread_mutex_lock(&req->se->lock);
+    interrupted = req->interrupted;
+    pthread_mutex_unlock(&req->se->lock);
 
-	return interrupted;
+    return interrupted;
 }
 
 static struct {
-	void (*func)(fuse_req_t, fuse_ino_t, const void *);
-	const char *name;
+    void (*func)(fuse_req_t, fuse_ino_t, const void *);
+    const char *name;
 } fuse_ll_ops[] = {
-	[FUSE_LOOKUP]	   = { do_lookup,      "LOOKUP"	     },
-	[FUSE_FORGET]	   = { do_forget,      "FORGET"	     },
-	[FUSE_GETATTR]	   = { do_getattr,     "GETATTR"     },
-	[FUSE_SETATTR]	   = { do_setattr,     "SETATTR"     },
-	[FUSE_READLINK]	   = { do_readlink,    "READLINK"    },
-	[FUSE_SYMLINK]	   = { do_symlink,     "SYMLINK"     },
-	[FUSE_MKNOD]	   = { do_mknod,       "MKNOD"	     },
-	[FUSE_MKDIR]	   = { do_mkdir,       "MKDIR"	     },
-	[FUSE_UNLINK]	   = { do_unlink,      "UNLINK"	     },
-	[FUSE_RMDIR]	   = { do_rmdir,       "RMDIR"	     },
-	[FUSE_RENAME]	   = { do_rename,      "RENAME"	     },
-	[FUSE_LINK]	   = { do_link,	       "LINK"	     },
-	[FUSE_OPEN]	   = { do_open,	       "OPEN"	     },
-	[FUSE_READ]	   = { do_read,	       "READ"	     },
-	[FUSE_WRITE]	   = { do_write,       "WRITE"	     },
-	[FUSE_STATFS]	   = { do_statfs,      "STATFS"	     },
-	[FUSE_RELEASE]	   = { do_release,     "RELEASE"     },
-	[FUSE_FSYNC]	   = { do_fsync,       "FSYNC"	     },
-	[FUSE_SETXATTR]	   = { do_setxattr,    "SETXATTR"    },
-	[FUSE_GETXATTR]	   = { do_getxattr,    "GETXATTR"    },
-	[FUSE_LISTXATTR]   = { do_listxattr,   "LISTXATTR"   },
-	[FUSE_REMOVEXATTR] = { do_removexattr, "REMOVEXATTR" },
-	[FUSE_FLUSH]	   = { do_flush,       "FLUSH"	     },
-	[FUSE_INIT]	   = { do_init,	       "INIT"	     },
-	[FUSE_OPENDIR]	   = { do_opendir,     "OPENDIR"     },
-	[FUSE_READDIR]	   = { do_readdir,     "READDIR"     },
-	[FUSE_RELEASEDIR]  = { do_releasedir,  "RELEASEDIR"  },
-	[FUSE_FSYNCDIR]	   = { do_fsyncdir,    "FSYNCDIR"    },
-	[FUSE_GETLK]	   = { do_getlk,       "GETLK"	     },
-	[FUSE_SETLK]	   = { do_setlk,       "SETLK"	     },
-	[FUSE_SETLKW]	   = { do_setlkw,      "SETLKW"	     },
-	[FUSE_ACCESS]	   = { do_access,      "ACCESS"	     },
-	[FUSE_CREATE]	   = { do_create,      "CREATE"	     },
-	[FUSE_INTERRUPT]   = { do_interrupt,   "INTERRUPT"   },
-	[FUSE_BMAP]	   = { do_bmap,	       "BMAP"	     },
-	[FUSE_IOCTL]	   = { do_ioctl,       "IOCTL"	     },
-	[FUSE_POLL]	   = { do_poll,        "POLL"	     },
-	[FUSE_FALLOCATE]   = { do_fallocate,   "FALLOCATE"   },
-	[FUSE_DESTROY]	   = { do_destroy,     "DESTROY"     },
-	[FUSE_NOTIFY_REPLY] = { (void *) 1,    "NOTIFY_REPLY" },
-	[FUSE_BATCH_FORGET] = { do_batch_forget, "BATCH_FORGET" },
-	[FUSE_READDIRPLUS] = { do_readdirplus,	"READDIRPLUS"},
-	[FUSE_RENAME2]     = { do_rename2,      "RENAME2"    },
-	[FUSE_COPY_FILE_RANGE] = { do_copy_file_range, "COPY_FILE_RANGE" },
-	[FUSE_LSEEK]	   = { do_lseek,       "LSEEK"	     },
-	[CUSE_INIT]	   = { cuse_lowlevel_init, "CUSE_INIT"   },
+    [FUSE_LOOKUP] = { do_lookup, "LOOKUP" },
+    [FUSE_FORGET] = { do_forget, "FORGET" },
+    [FUSE_GETATTR] = { do_getattr, "GETATTR" },
+    [FUSE_SETATTR] = { do_setattr, "SETATTR" },
+    [FUSE_READLINK] = { do_readlink, "READLINK" },
+    [FUSE_SYMLINK] = { do_symlink, "SYMLINK" },
+    [FUSE_MKNOD] = { do_mknod, "MKNOD" },
+    [FUSE_MKDIR] = { do_mkdir, "MKDIR" },
+    [FUSE_UNLINK] = { do_unlink, "UNLINK" },
+    [FUSE_RMDIR] = { do_rmdir, "RMDIR" },
+    [FUSE_RENAME] = { do_rename, "RENAME" },
+    [FUSE_LINK] = { do_link, "LINK" },
+    [FUSE_OPEN] = { do_open, "OPEN" },
+    [FUSE_READ] = { do_read, "READ" },
+    [FUSE_WRITE] = { do_write, "WRITE" },
+    [FUSE_STATFS] = { do_statfs, "STATFS" },
+    [FUSE_RELEASE] = { do_release, "RELEASE" },
+    [FUSE_FSYNC] = { do_fsync, "FSYNC" },
+    [FUSE_SETXATTR] = { do_setxattr, "SETXATTR" },
+    [FUSE_GETXATTR] = { do_getxattr, "GETXATTR" },
+    [FUSE_LISTXATTR] = { do_listxattr, "LISTXATTR" },
+    [FUSE_REMOVEXATTR] = { do_removexattr, "REMOVEXATTR" },
+    [FUSE_FLUSH] = { do_flush, "FLUSH" },
+    [FUSE_INIT] = { do_init, "INIT" },
+    [FUSE_OPENDIR] = { do_opendir, "OPENDIR" },
+    [FUSE_READDIR] = { do_readdir, "READDIR" },
+    [FUSE_RELEASEDIR] = { do_releasedir, "RELEASEDIR" },
+    [FUSE_FSYNCDIR] = { do_fsyncdir, "FSYNCDIR" },
+    [FUSE_GETLK] = { do_getlk, "GETLK" },
+    [FUSE_SETLK] = { do_setlk, "SETLK" },
+    [FUSE_SETLKW] = { do_setlkw, "SETLKW" },
+    [FUSE_ACCESS] = { do_access, "ACCESS" },
+    [FUSE_CREATE] = { do_create, "CREATE" },
+    [FUSE_INTERRUPT] = { do_interrupt, "INTERRUPT" },
+    [FUSE_BMAP] = { do_bmap, "BMAP" },
+    [FUSE_IOCTL] = { do_ioctl, "IOCTL" },
+    [FUSE_POLL] = { do_poll, "POLL" },
+    [FUSE_FALLOCATE] = { do_fallocate, "FALLOCATE" },
+    [FUSE_DESTROY] = { do_destroy, "DESTROY" },
+    [FUSE_NOTIFY_REPLY] = { (void *)1, "NOTIFY_REPLY" },
+    [FUSE_BATCH_FORGET] = { do_batch_forget, "BATCH_FORGET" },
+    [FUSE_READDIRPLUS] = { do_readdirplus, "READDIRPLUS" },
+    [FUSE_RENAME2] = { do_rename2, "RENAME2" },
+    [FUSE_COPY_FILE_RANGE] = { do_copy_file_range, "COPY_FILE_RANGE" },
+    [FUSE_LSEEK] = { do_lseek, "LSEEK" },
+    [CUSE_INIT] = { cuse_lowlevel_init, "CUSE_INIT" },
 };
 
 #define FUSE_MAXOP (sizeof(fuse_ll_ops) / sizeof(fuse_ll_ops[0]))
 
 static const char *opname(enum fuse_opcode opcode)
 {
-	if (opcode >= FUSE_MAXOP || !fuse_ll_ops[opcode].name)
-		return "???";
-	else
-		return fuse_ll_ops[opcode].name;
+    if (opcode >= FUSE_MAXOP || !fuse_ll_ops[opcode].name) {
+        return "???";
+    } else {
+        return fuse_ll_ops[opcode].name;
+    }
 }
 
 void fuse_session_process_buf(struct fuse_session *se,
-			      const struct fuse_buf *buf)
+                              const struct fuse_buf *buf)
 {
-	fuse_session_process_buf_int(se, buf, NULL);
+    fuse_session_process_buf_int(se, buf, NULL);
 }
 
 void fuse_session_process_buf_int(struct fuse_session *se,
-				  const struct fuse_buf *buf, struct fuse_chan *ch)
-{
-	struct fuse_in_header *in;
-	const void *inarg;
-	struct fuse_req *req;
-	int err;
-
-	in = buf->mem;
-
-	if (se->debug) {
-		fuse_log(FUSE_LOG_DEBUG,
-			"unique: %llu, opcode: %s (%i), nodeid: %llu, insize: %zu, pid: %u\n",
-			(unsigned long long) in->unique,
-			opname((enum fuse_opcode) in->opcode), in->opcode,
-			(unsigned long long) in->nodeid, buf->size, in->pid);
-	}
-
-	req = fuse_ll_alloc_req(se);
-	if (req == NULL) {
-		struct fuse_out_header out = {
-			.unique = in->unique,
-			.error = -ENOMEM,
-		};
-		struct iovec iov = {
-			.iov_base = &out,
-			.iov_len = sizeof(struct fuse_out_header),
-		};
-
-		fuse_send_msg(se, ch, &iov, 1);
-		return;
-	}
-
-	req->unique = in->unique;
-	req->ctx.uid = in->uid;
-	req->ctx.gid = in->gid;
-	req->ctx.pid = in->pid;
-	req->ch = ch ? fuse_chan_get(ch) : NULL;
-
-	err = EIO;
-	if (!se->got_init) {
-		enum fuse_opcode expected;
-
-		expected = se->cuse_data ? CUSE_INIT : FUSE_INIT;
-		if (in->opcode != expected)
-			goto reply_err;
-	} else if (in->opcode == FUSE_INIT || in->opcode == CUSE_INIT)
-		goto reply_err;
-
-	err = EACCES;
-	/* Implement -o allow_root */
-	if (se->deny_others && in->uid != se->owner && in->uid != 0 &&
-		 in->opcode != FUSE_INIT && in->opcode != FUSE_READ &&
-		 in->opcode != FUSE_WRITE && in->opcode != FUSE_FSYNC &&
-		 in->opcode != FUSE_RELEASE && in->opcode != FUSE_READDIR &&
-		 in->opcode != FUSE_FSYNCDIR && in->opcode != FUSE_RELEASEDIR &&
-		 in->opcode != FUSE_NOTIFY_REPLY &&
-		 in->opcode != FUSE_READDIRPLUS)
-		goto reply_err;
-
-	err = ENOSYS;
-	if (in->opcode >= FUSE_MAXOP || !fuse_ll_ops[in->opcode].func)
-		goto reply_err;
-	if (in->opcode != FUSE_INTERRUPT) {
-		struct fuse_req *intr;
-		pthread_mutex_lock(&se->lock);
-		intr = check_interrupt(se, req);
-		list_add_req(req, &se->list);
-		pthread_mutex_unlock(&se->lock);
-		if (intr)
-			fuse_reply_err(intr, EAGAIN);
-	}
-
-	inarg = (void *) &in[1];
-	if (in->opcode == FUSE_WRITE && se->op.write_buf)
-		do_write_buf(req, in->nodeid, inarg, buf);
-	else if (in->opcode == FUSE_NOTIFY_REPLY)
-		do_notify_reply(req, in->nodeid, inarg, buf);
-	else
-		fuse_ll_ops[in->opcode].func(req, in->nodeid, inarg);
-
-	return;
+                                  const struct fuse_buf *buf,
+                                  struct fuse_chan *ch)
+{
+    struct fuse_in_header *in;
+    const void *inarg;
+    struct fuse_req *req;
+    int err;
+
+    in = buf->mem;
+
+    if (se->debug) {
+        fuse_log(FUSE_LOG_DEBUG,
+                 "unique: %llu, opcode: %s (%i), nodeid: %llu, insize: %zu, "
+                 "pid: %u\n",
+                 (unsigned long long)in->unique,
+                 opname((enum fuse_opcode)in->opcode), in->opcode,
+                 (unsigned long long)in->nodeid, buf->size, in->pid);
+    }
+
+    req = fuse_ll_alloc_req(se);
+    if (req == NULL) {
+        struct fuse_out_header out = {
+            .unique = in->unique,
+            .error = -ENOMEM,
+        };
+        struct iovec iov = {
+            .iov_base = &out,
+            .iov_len = sizeof(struct fuse_out_header),
+        };
+
+        fuse_send_msg(se, ch, &iov, 1);
+        return;
+    }
+
+    req->unique = in->unique;
+    req->ctx.uid = in->uid;
+    req->ctx.gid = in->gid;
+    req->ctx.pid = in->pid;
+    req->ch = ch ? fuse_chan_get(ch) : NULL;
+
+    err = EIO;
+    if (!se->got_init) {
+        enum fuse_opcode expected;
+
+        expected = se->cuse_data ? CUSE_INIT : FUSE_INIT;
+        if (in->opcode != expected) {
+            goto reply_err;
+        }
+    } else if (in->opcode == FUSE_INIT || in->opcode == CUSE_INIT) {
+        goto reply_err;
+    }
+
+    err = EACCES;
+    /* Implement -o allow_root */
+    if (se->deny_others && in->uid != se->owner && in->uid != 0 &&
+        in->opcode != FUSE_INIT && in->opcode != FUSE_READ &&
+        in->opcode != FUSE_WRITE && in->opcode != FUSE_FSYNC &&
+        in->opcode != FUSE_RELEASE && in->opcode != FUSE_READDIR &&
+        in->opcode != FUSE_FSYNCDIR && in->opcode != FUSE_RELEASEDIR &&
+        in->opcode != FUSE_NOTIFY_REPLY && in->opcode != FUSE_READDIRPLUS) {
+        goto reply_err;
+    }
+
+    err = ENOSYS;
+    if (in->opcode >= FUSE_MAXOP || !fuse_ll_ops[in->opcode].func) {
+        goto reply_err;
+    }
+    if (in->opcode != FUSE_INTERRUPT) {
+        struct fuse_req *intr;
+        pthread_mutex_lock(&se->lock);
+        intr = check_interrupt(se, req);
+        list_add_req(req, &se->list);
+        pthread_mutex_unlock(&se->lock);
+        if (intr) {
+            fuse_reply_err(intr, EAGAIN);
+        }
+    }
+
+    inarg = (void *)&in[1];
+    if (in->opcode == FUSE_WRITE && se->op.write_buf) {
+        do_write_buf(req, in->nodeid, inarg, buf);
+    } else if (in->opcode == FUSE_NOTIFY_REPLY) {
+        do_notify_reply(req, in->nodeid, inarg, buf);
+    } else {
+        fuse_ll_ops[in->opcode].func(req, in->nodeid, inarg);
+    }
+
+    return;
 
 reply_err:
-	fuse_reply_err(req, err);
+    fuse_reply_err(req, err);
 }
 
-#define LL_OPTION(n,o,v) \
-	{ n, offsetof(struct fuse_session, o), v }
+#define LL_OPTION(n, o, v)                     \
+    {                                          \
+        n, offsetof(struct fuse_session, o), v \
+    }
 
 static const struct fuse_opt fuse_ll_opts[] = {
-	LL_OPTION("debug", debug, 1),
-	LL_OPTION("-d", debug, 1),
-	LL_OPTION("--debug", debug, 1),
-	LL_OPTION("allow_root", deny_others, 1),
-	FUSE_OPT_END
+    LL_OPTION("debug", debug, 1), LL_OPTION("-d", debug, 1),
+    LL_OPTION("--debug", debug, 1), LL_OPTION("allow_root", deny_others, 1),
+    FUSE_OPT_END
 };
 
 void fuse_lowlevel_version(void)
 {
-	printf("using FUSE kernel interface version %i.%i\n",
-	       FUSE_KERNEL_VERSION, FUSE_KERNEL_MINOR_VERSION);
-	fuse_mount_version();
+    printf("using FUSE kernel interface version %i.%i\n", FUSE_KERNEL_VERSION,
+           FUSE_KERNEL_MINOR_VERSION);
+    fuse_mount_version();
 }
 
 void fuse_lowlevel_help(void)
 {
-	/* These are not all options, but the ones that are
-	   potentially of interest to an end-user */
-	printf(
-"    -o allow_other         allow access by all users\n"
-"    -o allow_root          allow access by root\n"
-"    -o auto_unmount        auto unmount on process termination\n");
+    /*
+     * These are not all options, but the ones that are
+     * potentially of interest to an end-user
+     */
+    printf("    -o allow_other         allow access by all users\n"
+           "    -o allow_root          allow access by root\n"
+           "    -o auto_unmount        auto unmount on process termination\n");
 }
 
 void fuse_session_destroy(struct fuse_session *se)
 {
-	if (se->got_init && !se->got_destroy) {
-		if (se->op.destroy)
-			se->op.destroy(se->userdata);
-	}
-	pthread_mutex_destroy(&se->lock);
-	free(se->cuse_data);
-	if (se->fd != -1)
-		close(se->fd);
-	free(se);
+    if (se->got_init && !se->got_destroy) {
+        if (se->op.destroy) {
+            se->op.destroy(se->userdata);
+        }
+    }
+    pthread_mutex_destroy(&se->lock);
+    free(se->cuse_data);
+    if (se->fd != -1) {
+        close(se->fd);
+    }
+    free(se);
 }
 
 
 struct fuse_session *fuse_session_new(struct fuse_args *args,
-				      const struct fuse_lowlevel_ops *op,
-				      size_t op_size, void *userdata)
-{
-	struct fuse_session *se;
-
-	if (sizeof(struct fuse_lowlevel_ops) < op_size) {
-		fuse_log(FUSE_LOG_ERR, "fuse: warning: library too old, some operations may not work\n");
-		op_size = sizeof(struct fuse_lowlevel_ops);
-	}
-
-	if (args->argc == 0) {
-		fuse_log(FUSE_LOG_ERR, "fuse: empty argv passed to fuse_session_new().\n");
-		return NULL;
-	}
-
-	se = (struct fuse_session *) calloc(1, sizeof(struct fuse_session));
-	if (se == NULL) {
-		fuse_log(FUSE_LOG_ERR, "fuse: failed to allocate fuse object\n");
-		goto out1;
-	}
-	se->fd = -1;
-	se->conn.max_write = UINT_MAX;
-	se->conn.max_readahead = UINT_MAX;
-
-	/* Parse options */
-	if(fuse_opt_parse(args, se, fuse_ll_opts, NULL) == -1)
-		goto out2;
-	if(se->deny_others) {
-		/* Allowing access only by root is done by instructing
-		 * kernel to allow access by everyone, and then restricting
-		 * access to root and mountpoint owner in libfuse.
-		 */
-		// We may be adding the option a second time, but
-		// that doesn't hurt.
-		if(fuse_opt_add_arg(args, "-oallow_other") == -1)
-			goto out2;
-	}
-	if(args->argc == 1 &&
-	   args->argv[0][0] == '-') {
-		fuse_log(FUSE_LOG_ERR, "fuse: warning: argv[0] looks like an option, but "
-			"will be ignored\n");
-	} else if (args->argc != 1) {
-		int i;
-		fuse_log(FUSE_LOG_ERR, "fuse: unknown option(s): `");
-		for(i = 1; i < args->argc-1; i++)
-			fuse_log(FUSE_LOG_ERR, "%s ", args->argv[i]);
-		fuse_log(FUSE_LOG_ERR, "%s'\n", args->argv[i]);
-		goto out4;
-	}
-
-	if (se->debug)
-		fuse_log(FUSE_LOG_DEBUG, "FUSE library version: %s\n", PACKAGE_VERSION);
-
-	se->bufsize = FUSE_MAX_MAX_PAGES * getpagesize() +
-		FUSE_BUFFER_HEADER_SIZE;
-
-	list_init_req(&se->list);
-	list_init_req(&se->interrupts);
-	list_init_nreq(&se->notify_list);
-	se->notify_ctr = 1;
-	fuse_mutex_init(&se->lock);
-
-	memcpy(&se->op, op, op_size);
-	se->owner = getuid();
-	se->userdata = userdata;
-
-	return se;
+                                      const struct fuse_lowlevel_ops *op,
+                                      size_t op_size, void *userdata)
+{
+    struct fuse_session *se;
+
+    if (sizeof(struct fuse_lowlevel_ops) < op_size) {
+        fuse_log(
+            FUSE_LOG_ERR,
+            "fuse: warning: library too old, some operations may not work\n");
+        op_size = sizeof(struct fuse_lowlevel_ops);
+    }
+
+    if (args->argc == 0) {
+        fuse_log(FUSE_LOG_ERR,
+                 "fuse: empty argv passed to fuse_session_new().\n");
+        return NULL;
+    }
+
+    se = (struct fuse_session *)calloc(1, sizeof(struct fuse_session));
+    if (se == NULL) {
+        fuse_log(FUSE_LOG_ERR, "fuse: failed to allocate fuse object\n");
+        goto out1;
+    }
+    se->fd = -1;
+    se->conn.max_write = UINT_MAX;
+    se->conn.max_readahead = UINT_MAX;
+
+    /* Parse options */
+    if (fuse_opt_parse(args, se, fuse_ll_opts, NULL) == -1) {
+        goto out2;
+    }
+    if (se->deny_others) {
+        /*
+         * Allowing access only by root is done by instructing
+         * kernel to allow access by everyone, and then restricting
+         * access to root and mountpoint owner in libfuse.
+         */
+        /*
+         * We may be adding the option a second time, but
+         * that doesn't hurt.
+         */
+        if (fuse_opt_add_arg(args, "-oallow_other") == -1) {
+            goto out2;
+        }
+    }
+    if (args->argc == 1 && args->argv[0][0] == '-') {
+        fuse_log(FUSE_LOG_ERR,
+                 "fuse: warning: argv[0] looks like an option, but "
+                 "will be ignored\n");
+    } else if (args->argc != 1) {
+        int i;
+        fuse_log(FUSE_LOG_ERR, "fuse: unknown option(s): `");
+        for (i = 1; i < args->argc - 1; i++) {
+            fuse_log(FUSE_LOG_ERR, "%s ", args->argv[i]);
+        }
+        fuse_log(FUSE_LOG_ERR, "%s'\n", args->argv[i]);
+        goto out4;
+    }
+
+    if (se->debug) {
+        fuse_log(FUSE_LOG_DEBUG, "FUSE library version: %s\n", PACKAGE_VERSION);
+    }
+
+    se->bufsize = FUSE_MAX_MAX_PAGES * getpagesize() + FUSE_BUFFER_HEADER_SIZE;
+
+    list_init_req(&se->list);
+    list_init_req(&se->interrupts);
+    list_init_nreq(&se->notify_list);
+    se->notify_ctr = 1;
+    fuse_mutex_init(&se->lock);
+
+    memcpy(&se->op, op, op_size);
+    se->owner = getuid();
+    se->userdata = userdata;
+
+    return se;
 
 out4:
-	fuse_opt_free_args(args);
+    fuse_opt_free_args(args);
 out2:
-	free(se);
+    free(se);
 out1:
-	return NULL;
+    return NULL;
 }
 
 int fuse_session_mount(struct fuse_session *se, const char *mountpoint)
 {
-	int fd;
-
-	/*
-	 * Make sure file descriptors 0, 1 and 2 are open, otherwise chaos
-	 * would ensue.
-	 */
-	do {
-		fd = open("/dev/null", O_RDWR);
-		if (fd > 2)
-			close(fd);
-	} while (fd >= 0 && fd <= 2);
-
-	/*
-	 * To allow FUSE daemons to run without privileges, the caller may open
-	 * /dev/fuse before launching the file system and pass on the file
-	 * descriptor by specifying /dev/fd/N as the mount point. Note that the
-	 * parent process takes care of performing the mount in this case.
-	 */
-	fd = fuse_mnt_parse_fuse_fd(mountpoint);
-	if (fd != -1) {
-		if (fcntl(fd, F_GETFD) == -1) {
-			fuse_log(FUSE_LOG_ERR,
-				"fuse: Invalid file descriptor /dev/fd/%u\n",
-				fd);
-			return -1;
-		}
-		se->fd = fd;
-		return 0;
-	}
-
-	/* Open channel */
-	fd = fuse_kern_mount(mountpoint, se->mo);
-	if (fd == -1)
-		return -1;
-	se->fd = fd;
-
-	/* Save mountpoint */
-	se->mountpoint = strdup(mountpoint);
-	if (se->mountpoint == NULL)
-		goto error_out;
-
-	return 0;
+    int fd;
+
+    /*
+     * Make sure file descriptors 0, 1 and 2 are open, otherwise chaos
+     * would ensue.
+     */
+    do {
+        fd = open("/dev/null", O_RDWR);
+        if (fd > 2) {
+            close(fd);
+        }
+    } while (fd >= 0 && fd <= 2);
+
+    /*
+     * To allow FUSE daemons to run without privileges, the caller may open
+     * /dev/fuse before launching the file system and pass on the file
+     * descriptor by specifying /dev/fd/N as the mount point. Note that the
+     * parent process takes care of performing the mount in this case.
+     */
+    fd = fuse_mnt_parse_fuse_fd(mountpoint);
+    if (fd != -1) {
+        if (fcntl(fd, F_GETFD) == -1) {
+            fuse_log(FUSE_LOG_ERR, "fuse: Invalid file descriptor /dev/fd/%u\n",
+                     fd);
+            return -1;
+        }
+        se->fd = fd;
+        return 0;
+    }
+
+    /* Open channel */
+    fd = fuse_kern_mount(mountpoint, se->mo);
+    if (fd == -1) {
+        return -1;
+    }
+    se->fd = fd;
+
+    /* Save mountpoint */
+    se->mountpoint = strdup(mountpoint);
+    if (se->mountpoint == NULL) {
+        goto error_out;
+    }
+
+    return 0;
 
 error_out:
-	fuse_kern_unmount(mountpoint, fd);
-	return -1;
+    fuse_kern_unmount(mountpoint, fd);
+    return -1;
 }
 
 int fuse_session_fd(struct fuse_session *se)
 {
-	return se->fd;
+    return se->fd;
 }
 
 void fuse_session_unmount(struct fuse_session *se)
@@ -2402,61 +2543,66 @@ void fuse_session_unmount(struct fuse_session *se)
 #ifdef linux
 int fuse_req_getgroups(fuse_req_t req, int size, gid_t list[])
 {
-	char *buf;
-	size_t bufsize = 1024;
-	char path[128];
-	int ret;
-	int fd;
-	unsigned long pid = req->ctx.pid;
-	char *s;
+    char *buf;
+    size_t bufsize = 1024;
+    char path[128];
+    int ret;
+    int fd;
+    unsigned long pid = req->ctx.pid;
+    char *s;
 
-	sprintf(path, "/proc/%lu/task/%lu/status", pid, pid);
+    sprintf(path, "/proc/%lu/task/%lu/status", pid, pid);
 
 retry:
-	buf = malloc(bufsize);
-	if (buf == NULL)
-		return -ENOMEM;
-
-	ret = -EIO;
-	fd = open(path, O_RDONLY);
-	if (fd == -1)
-		goto out_free;
-
-	ret = read(fd, buf, bufsize);
-	close(fd);
-	if (ret < 0) {
-		ret = -EIO;
-		goto out_free;
-	}
-
-	if ((size_t)ret == bufsize) {
-		free(buf);
-		bufsize *= 4;
-		goto retry;
-	}
-
-	ret = -EIO;
-	s = strstr(buf, "\nGroups:");
-	if (s == NULL)
-		goto out_free;
-
-	s += 8;
-	ret = 0;
-	while (1) {
-		char *end;
-		unsigned long val = strtoul(s, &end, 0);
-		if (end == s)
-			break;
-
-		s = end;
-		if (ret < size)
-			list[ret] = val;
-		ret++;
-	}
+    buf = malloc(bufsize);
+    if (buf == NULL) {
+        return -ENOMEM;
+    }
+
+    ret = -EIO;
+    fd = open(path, O_RDONLY);
+    if (fd == -1) {
+        goto out_free;
+    }
+
+    ret = read(fd, buf, bufsize);
+    close(fd);
+    if (ret < 0) {
+        ret = -EIO;
+        goto out_free;
+    }
+
+    if ((size_t)ret == bufsize) {
+        free(buf);
+        bufsize *= 4;
+        goto retry;
+    }
+
+    ret = -EIO;
+    s = strstr(buf, "\nGroups:");
+    if (s == NULL) {
+        goto out_free;
+    }
+
+    s += 8;
+    ret = 0;
+    while (1) {
+        char *end;
+        unsigned long val = strtoul(s, &end, 0);
+        if (end == s) {
+            break;
+        }
+
+        s = end;
+        if (ret < size) {
+            list[ret] = val;
+        }
+        ret++;
+    }
 
 out_free:
-	free(buf);
-	return ret;
+    free(buf);
+    return ret;
 }
 #else /* linux */
 /*
@@ -2464,23 +2610,25 @@ out_free:
  */
 int fuse_req_getgroups(fuse_req_t req, int size, gid_t list[])
 {
-	(void) req; (void) size; (void) list;
-	return -ENOSYS;
+    (void)req;
+    (void)size;
+    (void)list;
+    return -ENOSYS;
 }
 #endif
 
 void fuse_session_exit(struct fuse_session *se)
 {
-	se->exited = 1;
+    se->exited = 1;
 }
 
 void fuse_session_reset(struct fuse_session *se)
 {
-	se->exited = 0;
-	se->error = 0;
+    se->exited = 0;
+    se->error = 0;
 }
 
 int fuse_session_exited(struct fuse_session *se)
 {
-	return se->exited;
+    return se->exited;
 }
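
Reviewer note on fuse_req_getgroups() in the hunk above: it reads
/proc/<pid>/task/<pid>/status into a heap buffer, looks for the
"\nGroups:" line with strstr(), and converts each following number with
strtoul(). The return value is the number of IDs found, even when that
exceeds 'size'. The hunk as shown does not NUL-terminate the buffer after
read() before passing it to strstr(). The sketch below isolates only the
parsing step so it can be exercised without /proc. It is a hypothetical
helper, not code from this series: the name parse_status_groups() is made
up, it takes an already NUL-terminated string, and (unlike the patch) it
uses base 10 and stops at the end of the "Groups:" line rather than
relying on strtoul() failing on the next line.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not part of the patch: extract the supplementary
 * group IDs from the text of a /proc/<pid>/status file.
 *
 * 'status' must be a NUL-terminated string.  Up to 'size' IDs are stored
 * in 'list'.  Returns the total number of IDs found (which may be larger
 * than 'size'), or -EIO if there is no "Groups:" line.
 */
static int parse_status_groups(const char *status, int size, gid_t list[])
{
    const char *s;
    int count = 0;

    assert(status != NULL);

    s = strstr(status, "\nGroups:");
    if (s == NULL) {
        return -EIO;
    }
    s += strlen("\nGroups:");

    for (;;) {
        char *end;
        unsigned long val;

        /* Skip blanks by hand so we never step over the newline */
        while (*s == ' ' || *s == '\t') {
            s++;
        }
        /* Anything that is not a digit ends the list */
        if (*s < '0' || *s > '9') {
            break;
        }

        val = strtoul(s, &end, 10);
        s = end;
        if (count < size) {
            list[count] = (gid_t)val;
        }
        count++;
    }

    return count;
}
```

A caller that wants this behaviour on real data would read the status file
itself, append a terminating '\0', and then hand the buffer to the helper.
As with fuse_req_getgroups(), a return value larger than 'size' means the
list was truncated and the call should be repeated with a bigger array.
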
diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
index 7aa7bad5b2..85cc027382 100644
--- a/tools/virtiofsd/fuse_lowlevel.h
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -1,15 +1,16 @@
 /*
-  FUSE: Filesystem in Userspace
-  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
-
-  This program can be distributed under the terms of the GNU LGPLv2.
-  See the file COPYING.LIB.
-*/
+ * FUSE: Filesystem in Userspace
+ * Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
+ *
+ * This program can be distributed under the terms of the GNU LGPLv2.
+ * See the file COPYING.LIB.
+ */
 
 #ifndef FUSE_LOWLEVEL_H_
 #define FUSE_LOWLEVEL_H_
 
-/** @file
+/**
+ * @file
  *
  * Low level API
  *
@@ -24,16 +25,16 @@
 
 #include "fuse_common.h"
 
-#include <utime.h>
 #include <fcntl.h>
-#include <sys/types.h>
 #include <sys/stat.h>
 #include <sys/statvfs.h>
+#include <sys/types.h>
 #include <sys/uio.h>
+#include <utime.h>
 
-/* ----------------------------------------------------------- *
- * Miscellaneous definitions				       *
- * ----------------------------------------------------------- */
+/*
+ * Miscellaneous definitions
+ */
 
 /** The node ID of the root inode */
 #define FUSE_ROOT_ID 1
@@ -53,47 +54,54 @@ struct fuse_session;
 
 /** Directory entry parameters supplied to fuse_reply_entry() */
 struct fuse_entry_param {
-	/** Unique inode number
-	 *
-	 * In lookup, zero means negative entry (from version 2.5)
-	 * Returning ENOENT also means negative entry, but by setting zero
-	 * ino the kernel may cache negative entries for entry_timeout
-	 * seconds.
-	 */
-	fuse_ino_t ino;
-
-	/** Generation number for this entry.
-	 *
-	 * If the file system will be exported over NFS, the
-	 * ino/generation pairs need to be unique over the file
-	 * system's lifetime (rather than just the mount time). So if
-	 * the file system reuses an inode after it has been deleted,
-	 * it must assign a new, previously unused generation number
-	 * to the inode at the same time.
-	 *
-	 */
-	uint64_t generation;
-
-	/** Inode attributes.
-	 *
-	 * Even if attr_timeout == 0, attr must be correct. For example,
-	 * for open(), FUSE uses attr.st_size from lookup() to determine
-	 * how many bytes to request. If this value is not correct,
-	 * incorrect data will be returned.
-	 */
-	struct stat attr;
-
-	/** Validity timeout (in seconds) for inode attributes. If
-	    attributes only change as a result of requests that come
-	    through the kernel, this should be set to a very large
-	    value. */
-	double attr_timeout;
-
-	/** Validity timeout (in seconds) for the name. If directory
-	    entries are changed/deleted only as a result of requests
-	    that come through the kernel, this should be set to a very
-	    large value. */
-	double entry_timeout;
+    /**
+     * Unique inode number
+     *
+     * In lookup, zero means negative entry (from version 2.5)
+     * Returning ENOENT also means negative entry, but by setting zero
+     * ino the kernel may cache negative entries for entry_timeout
+     * seconds.
+     */
+    fuse_ino_t ino;
+
+    /**
+     * Generation number for this entry.
+     *
+     * If the file system will be exported over NFS, the
+     * ino/generation pairs need to be unique over the file
+     * system's lifetime (rather than just the mount time). So if
+     * the file system reuses an inode after it has been deleted,
+     * it must assign a new, previously unused generation number
+     * to the inode at the same time.
+     *
+     */
+    uint64_t generation;
+
+    /**
+     * Inode attributes.
+     *
+     * Even if attr_timeout == 0, attr must be correct. For example,
+     * for open(), FUSE uses attr.st_size from lookup() to determine
+     * how many bytes to request. If this value is not correct,
+     * incorrect data will be returned.
+     */
+    struct stat attr;
+
+    /**
+     * Validity timeout (in seconds) for inode attributes. If
+     *  attributes only change as a result of requests that come
+     *  through the kernel, this should be set to a very large
+     *  value.
+     */
+    double attr_timeout;
+
+    /**
+     * Validity timeout (in seconds) for the name. If directory
+     *  entries are changed/deleted only as a result of requests
+     *  that come through the kernel, this should be set to a very
+     *  large value.
+     */
+    double entry_timeout;
 };
 
 /**
@@ -105,38 +113,38 @@ struct fuse_entry_param {
  * there is no valid uid/pid/gid that could be reported.
  */
 struct fuse_ctx {
-	/** User ID of the calling process */
-	uid_t uid;
+    /** User ID of the calling process */
+    uid_t uid;
 
-	/** Group ID of the calling process */
-	gid_t gid;
+    /** Group ID of the calling process */
+    gid_t gid;
 
-	/** Thread ID of the calling process */
-	pid_t pid;
+    /** Thread ID of the calling process */
+    pid_t pid;
 
-	/** Umask of the calling process */
-	mode_t umask;
+    /** Umask of the calling process */
+    mode_t umask;
 };
 
 struct fuse_forget_data {
-	fuse_ino_t ino;
-	uint64_t nlookup;
+    fuse_ino_t ino;
+    uint64_t nlookup;
 };
 
 /* 'to_set' flags in setattr */
-#define FUSE_SET_ATTR_MODE	(1 << 0)
-#define FUSE_SET_ATTR_UID	(1 << 1)
-#define FUSE_SET_ATTR_GID	(1 << 2)
-#define FUSE_SET_ATTR_SIZE	(1 << 3)
-#define FUSE_SET_ATTR_ATIME	(1 << 4)
-#define FUSE_SET_ATTR_MTIME	(1 << 5)
-#define FUSE_SET_ATTR_ATIME_NOW	(1 << 7)
-#define FUSE_SET_ATTR_MTIME_NOW	(1 << 8)
-#define FUSE_SET_ATTR_CTIME	(1 << 10)
-
-/* ----------------------------------------------------------- *
- * Request methods and replies				       *
- * ----------------------------------------------------------- */
+#define FUSE_SET_ATTR_MODE (1 << 0)
+#define FUSE_SET_ATTR_UID (1 << 1)
+#define FUSE_SET_ATTR_GID (1 << 2)
+#define FUSE_SET_ATTR_SIZE (1 << 3)
+#define FUSE_SET_ATTR_ATIME (1 << 4)
+#define FUSE_SET_ATTR_MTIME (1 << 5)
+#define FUSE_SET_ATTR_ATIME_NOW (1 << 7)
+#define FUSE_SET_ATTR_MTIME_NOW (1 << 8)
+#define FUSE_SET_ATTR_CTIME (1 << 10)
+
+/*
+ * Request methods and replies
+ */
 
 /**
  * Low level filesystem operations
@@ -166,1075 +174,1069 @@ struct fuse_forget_data {
  * this file will not be called.
  */
 struct fuse_lowlevel_ops {
-	/**
-	 * Initialize filesystem
-	 *
-	 * This function is called when libfuse establishes
-	 * communication with the FUSE kernel module. The file system
-	 * should use this module to inspect and/or modify the
-	 * connection parameters provided in the `conn` structure.
-	 *
-	 * Note that some parameters may be overwritten by options
-	 * passed to fuse_session_new() which take precedence over the
-	 * values set in this handler.
-	 *
-	 * There's no reply to this function
-	 *
-	 * @param userdata the user data passed to fuse_session_new()
-	 */
-	void (*init) (void *userdata, struct fuse_conn_info *conn);
-
-	/**
-	 * Clean up filesystem.
-	 *
-	 * Called on filesystem exit. When this method is called, the
-	 * connection to the kernel may be gone already, so that eg. calls
-	 * to fuse_lowlevel_notify_* will fail.
-	 *
-	 * There's no reply to this function
-	 *
-	 * @param userdata the user data passed to fuse_session_new()
-	 */
-	void (*destroy) (void *userdata);
-
-	/**
-	 * Look up a directory entry by name and get its attributes.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_entry
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param parent inode number of the parent directory
-	 * @param name the name to look up
-	 */
-	void (*lookup) (fuse_req_t req, fuse_ino_t parent, const char *name);
-
-	/**
-	 * Forget about an inode
-	 *
-	 * This function is called when the kernel removes an inode
-	 * from its internal caches.
-	 *
-	 * The inode's lookup count increases by one for every call to
-	 * fuse_reply_entry and fuse_reply_create. The nlookup parameter
-	 * indicates by how much the lookup count should be decreased.
-	 *
-	 * Inodes with a non-zero lookup count may receive request from
-	 * the kernel even after calls to unlink, rmdir or (when
-	 * overwriting an existing file) rename. Filesystems must handle
-	 * such requests properly and it is recommended to defer removal
-	 * of the inode until the lookup count reaches zero. Calls to
-	 * unlink, rmdir or rename will be followed closely by forget
-	 * unless the file or directory is open, in which case the
-	 * kernel issues forget only after the release or releasedir
-	 * calls.
-	 *
-	 * Note that if a file system will be exported over NFS the
-	 * inodes lifetime must extend even beyond forget. See the
-	 * generation field in struct fuse_entry_param above.
-	 *
-	 * On unmount the lookup count for all inodes implicitly drops
-	 * to zero. It is not guaranteed that the file system will
-	 * receive corresponding forget messages for the affected
-	 * inodes.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_none
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param nlookup the number of lookups to forget
-	 */
-	void (*forget) (fuse_req_t req, fuse_ino_t ino, uint64_t nlookup);
-
-	/**
-	 * Get file attributes.
-	 *
-	 * If writeback caching is enabled, the kernel may have a
-	 * better idea of a file's length than the FUSE file system
-	 * (eg if there has been a write that extended the file size,
-	 * but that has not yet been passed to the filesystem.n
-	 *
-	 * In this case, the st_size value provided by the file system
-	 * will be ignored.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_attr
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param fi for future use, currently always NULL
-	 */
-	void (*getattr) (fuse_req_t req, fuse_ino_t ino,
-			 struct fuse_file_info *fi);
-
-	/**
-	 * Set file attributes
-	 *
-	 * In the 'attr' argument only members indicated by the 'to_set'
-	 * bitmask contain valid values.  Other members contain undefined
-	 * values.
-	 *
-	 * Unless FUSE_CAP_HANDLE_KILLPRIV is disabled, this method is
-	 * expected to reset the setuid and setgid bits if the file
-	 * size or owner is being changed.
-	 *
-	 * If the setattr was invoked from the ftruncate() system call
-	 * under Linux kernel versions 2.6.15 or later, the fi->fh will
-	 * contain the value set by the open method or will be undefined
-	 * if the open method didn't set any value.  Otherwise (not
-	 * ftruncate call, or kernel version earlier than 2.6.15) the fi
-	 * parameter will be NULL.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_attr
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param attr the attributes
-	 * @param to_set bit mask of attributes which should be set
-	 * @param fi file information, or NULL
-	 */
-	void (*setattr) (fuse_req_t req, fuse_ino_t ino, struct stat *attr,
-			 int to_set, struct fuse_file_info *fi);
-
-	/**
-	 * Read symbolic link
-	 *
-	 * Valid replies:
-	 *   fuse_reply_readlink
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 */
-	void (*readlink) (fuse_req_t req, fuse_ino_t ino);
-
-	/**
-	 * Create file node
-	 *
-	 * Create a regular file, character device, block device, fifo or
-	 * socket node.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_entry
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param parent inode number of the parent directory
-	 * @param name to create
-	 * @param mode file type and mode with which to create the new file
-	 * @param rdev the device number (only valid if created file is a device)
-	 */
-	void (*mknod) (fuse_req_t req, fuse_ino_t parent, const char *name,
-		       mode_t mode, dev_t rdev);
-
-	/**
-	 * Create a directory
-	 *
-	 * Valid replies:
-	 *   fuse_reply_entry
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param parent inode number of the parent directory
-	 * @param name to create
-	 * @param mode with which to create the new file
-	 */
-	void (*mkdir) (fuse_req_t req, fuse_ino_t parent, const char *name,
-		       mode_t mode);
-
-	/**
-	 * Remove a file
-	 *
-	 * If the file's inode's lookup count is non-zero, the file
-	 * system is expected to postpone any removal of the inode
-	 * until the lookup count reaches zero (see description of the
-	 * forget function).
-	 *
-	 * Valid replies:
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param parent inode number of the parent directory
-	 * @param name to remove
-	 */
-	void (*unlink) (fuse_req_t req, fuse_ino_t parent, const char *name);
-
-	/**
-	 * Remove a directory
-	 *
-	 * If the directory's inode's lookup count is non-zero, the
-	 * file system is expected to postpone any removal of the
-	 * inode until the lookup count reaches zero (see description
-	 * of the forget function).
-	 *
-	 * Valid replies:
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param parent inode number of the parent directory
-	 * @param name to remove
-	 */
-	void (*rmdir) (fuse_req_t req, fuse_ino_t parent, const char *name);
-
-	/**
-	 * Create a symbolic link
-	 *
-	 * Valid replies:
-	 *   fuse_reply_entry
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param link the contents of the symbolic link
-	 * @param parent inode number of the parent directory
-	 * @param name to create
-	 */
-	void (*symlink) (fuse_req_t req, const char *link, fuse_ino_t parent,
-			 const char *name);
-
-	/** Rename a file
-	 *
-	 * If the target exists it should be atomically replaced. If
-	 * the target's inode's lookup count is non-zero, the file
-	 * system is expected to postpone any removal of the inode
-	 * until the lookup count reaches zero (see description of the
-	 * forget function).
-	 *
-	 * If this request is answered with an error code of ENOSYS, this is
-	 * treated as a permanent failure with error code EINVAL, i.e. all
-	 * future bmap requests will fail with EINVAL without being
-	 * send to the filesystem process.
-	 *
-	 * *flags* may be `RENAME_EXCHANGE` or `RENAME_NOREPLACE`. If
-	 * RENAME_NOREPLACE is specified, the filesystem must not
-	 * overwrite *newname* if it exists and return an error
-	 * instead. If `RENAME_EXCHANGE` is specified, the filesystem
-	 * must atomically exchange the two files, i.e. both must
-	 * exist and neither may be deleted.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param parent inode number of the old parent directory
-	 * @param name old name
-	 * @param newparent inode number of the new parent directory
-	 * @param newname new name
-	 */
-	void (*rename) (fuse_req_t req, fuse_ino_t parent, const char *name,
-			fuse_ino_t newparent, const char *newname,
-			unsigned int flags);
-
-	/**
-	 * Create a hard link
-	 *
-	 * Valid replies:
-	 *   fuse_reply_entry
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the old inode number
-	 * @param newparent inode number of the new parent directory
-	 * @param newname new name to create
-	 */
-	void (*link) (fuse_req_t req, fuse_ino_t ino, fuse_ino_t newparent,
-		      const char *newname);
-
-	/**
-	 * Open a file
-	 *
-	 * Open flags are available in fi->flags. The following rules
-	 * apply.
-	 *
-	 *  - Creation (O_CREAT, O_EXCL, O_NOCTTY) flags will be
-	 *    filtered out / handled by the kernel.
-	 *
-	 *  - Access modes (O_RDONLY, O_WRONLY, O_RDWR) should be used
-	 *    by the filesystem to check if the operation is
-	 *    permitted.  If the ``-o default_permissions`` mount
-	 *    option is given, this check is already done by the
-	 *    kernel before calling open() and may thus be omitted by
-	 *    the filesystem.
-	 *
-	 *  - When writeback caching is enabled, the kernel may send
-	 *    read requests even for files opened with O_WRONLY. The
-	 *    filesystem should be prepared to handle this.
-	 *
-	 *  - When writeback caching is disabled, the filesystem is
-	 *    expected to properly handle the O_APPEND flag and ensure
-	 *    that each write is appending to the end of the file.
-	 * 
-         *  - When writeback caching is enabled, the kernel will
-	 *    handle O_APPEND. However, unless all changes to the file
-	 *    come through the kernel this will not work reliably. The
-	 *    filesystem should thus either ignore the O_APPEND flag
-	 *    (and let the kernel handle it), or return an error
-	 *    (indicating that reliably O_APPEND is not available).
-	 *
-	 * Filesystem may store an arbitrary file handle (pointer,
-	 * index, etc) in fi->fh, and use this in other all other file
-	 * operations (read, write, flush, release, fsync).
-	 *
-	 * Filesystem may also implement stateless file I/O and not store
-	 * anything in fi->fh.
-	 *
-	 * There are also some flags (direct_io, keep_cache) which the
-	 * filesystem may set in fi, to change the way the file is opened.
-	 * See fuse_file_info structure in <fuse_common.h> for more details.
-	 *
-	 * If this request is answered with an error code of ENOSYS
-	 * and FUSE_CAP_NO_OPEN_SUPPORT is set in
-	 * `fuse_conn_info.capable`, this is treated as success and
-	 * future calls to open and release will also succeed without being
-	 * sent to the filesystem process.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_open
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param fi file information
-	 */
-	void (*open) (fuse_req_t req, fuse_ino_t ino,
-		      struct fuse_file_info *fi);
-
-	/**
-	 * Read data
-	 *
-	 * Read should send exactly the number of bytes requested except
-	 * on EOF or error, otherwise the rest of the data will be
-	 * substituted with zeroes.  An exception to this is when the file
-	 * has been opened in 'direct_io' mode, in which case the return
-	 * value of the read system call will reflect the return value of
-	 * this operation.
-	 *
-	 * fi->fh will contain the value set by the open method, or will
-	 * be undefined if the open method didn't set any value.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_buf
-	 *   fuse_reply_iov
-	 *   fuse_reply_data
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param size number of bytes to read
-	 * @param off offset to read from
-	 * @param fi file information
-	 */
-	void (*read) (fuse_req_t req, fuse_ino_t ino, size_t size, off_t off,
-		      struct fuse_file_info *fi);
-
-	/**
-	 * Write data
-	 *
-	 * Write should return exactly the number of bytes requested
-	 * except on error.  An exception to this is when the file has
-	 * been opened in 'direct_io' mode, in which case the return value
-	 * of the write system call will reflect the return value of this
-	 * operation.
-	 *
-	 * Unless FUSE_CAP_HANDLE_KILLPRIV is disabled, this method is
-	 * expected to reset the setuid and setgid bits.
-	 *
-	 * fi->fh will contain the value set by the open method, or will
-	 * be undefined if the open method didn't set any value.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_write
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param buf data to write
-	 * @param size number of bytes to write
-	 * @param off offset to write to
-	 * @param fi file information
-	 */
-	void (*write) (fuse_req_t req, fuse_ino_t ino, const char *buf,
-		       size_t size, off_t off, struct fuse_file_info *fi);
-
-	/**
-	 * Flush method
-	 *
-	 * This is called on each close() of the opened file.
-	 *
-	 * Since file descriptors can be duplicated (dup, dup2, fork), for
-	 * one open call there may be many flush calls.
-	 *
-	 * Filesystems shouldn't assume that flush will always be called
-	 * after some writes, or that if will be called at all.
-	 *
-	 * fi->fh will contain the value set by the open method, or will
-	 * be undefined if the open method didn't set any value.
-	 *
-	 * NOTE: the name of the method is misleading, since (unlike
-	 * fsync) the filesystem is not forced to flush pending writes.
-	 * One reason to flush data is if the filesystem wants to return
-	 * write errors during close.  However, such use is non-portable
-	 * because POSIX does not require [close] to wait for delayed I/O to
-	 * complete.
-	 *
-	 * If the filesystem supports file locking operations (setlk,
-	 * getlk) it should remove all locks belonging to 'fi->owner'.
-	 *
-	 * If this request is answered with an error code of ENOSYS,
-	 * this is treated as success and future calls to flush() will
-	 * succeed automatically without being send to the filesystem
-	 * process.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param fi file information
-	 *
-	 * [close]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/close.html
-	 */
-	void (*flush) (fuse_req_t req, fuse_ino_t ino,
-		       struct fuse_file_info *fi);
-
-	/**
-	 * Release an open file
-	 *
-	 * Release is called when there are no more references to an open
-	 * file: all file descriptors are closed and all memory mappings
-	 * are unmapped.
-	 *
-	 * For every open call there will be exactly one release call (unless
-	 * the filesystem is force-unmounted).
-	 *
-	 * The filesystem may reply with an error, but error values are
-	 * not returned to close() or munmap() which triggered the
-	 * release.
-	 *
-	 * fi->fh will contain the value set by the open method, or will
-	 * be undefined if the open method didn't set any value.
-	 * fi->flags will contain the same flags as for open.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param fi file information
-	 */
-	void (*release) (fuse_req_t req, fuse_ino_t ino,
-			 struct fuse_file_info *fi);
-
-	/**
-	 * Synchronize file contents
-	 *
-	 * If the datasync parameter is non-zero, then only the user data
-	 * should be flushed, not the meta data.
-	 *
-	 * If this request is answered with an error code of ENOSYS,
-	 * this is treated as success and future calls to fsync() will
-	 * succeed automatically without being send to the filesystem
-	 * process.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param datasync flag indicating if only data should be flushed
-	 * @param fi file information
-	 */
-	void (*fsync) (fuse_req_t req, fuse_ino_t ino, int datasync,
-		       struct fuse_file_info *fi);
-
-	/**
-	 * Open a directory
-	 *
-	 * Filesystem may store an arbitrary file handle (pointer, index,
-	 * etc) in fi->fh, and use this in other all other directory
-	 * stream operations (readdir, releasedir, fsyncdir).
-	 *
-	 * If this request is answered with an error code of ENOSYS and
-	 * FUSE_CAP_NO_OPENDIR_SUPPORT is set in `fuse_conn_info.capable`,
-	 * this is treated as success and future calls to opendir and
-	 * releasedir will also succeed without being sent to the filesystem
-	 * process. In addition, the kernel will cache readdir results
-	 * as if opendir returned FOPEN_KEEP_CACHE | FOPEN_CACHE_DIR.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_open
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param fi file information
-	 */
-	void (*opendir) (fuse_req_t req, fuse_ino_t ino,
-			 struct fuse_file_info *fi);
-
-	/**
-	 * Read directory
-	 *
-	 * Send a buffer filled using fuse_add_direntry(), with size not
-	 * exceeding the requested size.  Send an empty buffer on end of
-	 * stream.
-	 *
-	 * fi->fh will contain the value set by the opendir method, or
-	 * will be undefined if the opendir method didn't set any value.
-	 *
-	 * Returning a directory entry from readdir() does not affect
-	 * its lookup count.
-	 *
-         * If off_t is non-zero, then it will correspond to one of the off_t
-	 * values that was previously returned by readdir() for the same
-	 * directory handle. In this case, readdir() should skip over entries
-	 * coming before the position defined by the off_t value. If entries
-	 * are added or removed while the directory handle is open, they filesystem
-	 * may still include the entries that have been removed, and may not
-	 * report the entries that have been created. However, addition or
-	 * removal of entries must never cause readdir() to skip over unrelated
-	 * entries or to report them more than once. This means
-	 * that off_t can not be a simple index that enumerates the entries
-	 * that have been returned but must contain sufficient information to
-	 * uniquely determine the next directory entry to return even when the
-	 * set of entries is changing.
-	 *
-	 * The function does not have to report the '.' and '..'
-	 * entries, but is allowed to do so. Note that, if readdir does
-	 * not return '.' or '..', they will not be implicitly returned,
-	 * and this behavior is observable by the caller.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_buf
-	 *   fuse_reply_data
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param size maximum number of bytes to send
-	 * @param off offset to continue reading the directory stream
-	 * @param fi file information
-	 */
-	void (*readdir) (fuse_req_t req, fuse_ino_t ino, size_t size, off_t off,
-			 struct fuse_file_info *fi);
-
-	/**
-	 * Release an open directory
-	 *
-	 * For every opendir call there will be exactly one releasedir
-	 * call (unless the filesystem is force-unmounted).
-	 *
-	 * fi->fh will contain the value set by the opendir method, or
-	 * will be undefined if the opendir method didn't set any value.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param fi file information
-	 */
-	void (*releasedir) (fuse_req_t req, fuse_ino_t ino,
-			    struct fuse_file_info *fi);
-
-	/**
-	 * Synchronize directory contents
-	 *
-	 * If the datasync parameter is non-zero, then only the directory
-	 * contents should be flushed, not the meta data.
-	 *
-	 * fi->fh will contain the value set by the opendir method, or
-	 * will be undefined if the opendir method didn't set any value.
-	 *
-	 * If this request is answered with an error code of ENOSYS,
-	 * this is treated as success and future calls to fsyncdir() will
-	 * succeed automatically without being send to the filesystem
-	 * process.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param datasync flag indicating if only data should be flushed
-	 * @param fi file information
-	 */
-	void (*fsyncdir) (fuse_req_t req, fuse_ino_t ino, int datasync,
-			  struct fuse_file_info *fi);
-
-	/**
-	 * Get file system statistics
-	 *
-	 * Valid replies:
-	 *   fuse_reply_statfs
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number, zero means "undefined"
-	 */
-	void (*statfs) (fuse_req_t req, fuse_ino_t ino);
-
-	/**
-	 * Set an extended attribute
-	 *
-	 * If this request is answered with an error code of ENOSYS, this is
-	 * treated as a permanent failure with error code EOPNOTSUPP, i.e. all
-	 * future setxattr() requests will fail with EOPNOTSUPP without being
-	 * send to the filesystem process.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_err
-	 */
-	void (*setxattr) (fuse_req_t req, fuse_ino_t ino, const char *name,
-			  const char *value, size_t size, int flags);
-
-	/**
-	 * Get an extended attribute
-	 *
-	 * If size is zero, the size of the value should be sent with
-	 * fuse_reply_xattr.
-	 *
-	 * If the size is non-zero, and the value fits in the buffer, the
-	 * value should be sent with fuse_reply_buf.
-	 *
-	 * If the size is too small for the value, the ERANGE error should
-	 * be sent.
-	 *
-	 * If this request is answered with an error code of ENOSYS, this is
-	 * treated as a permanent failure with error code EOPNOTSUPP, i.e. all
-	 * future getxattr() requests will fail with EOPNOTSUPP without being
-	 * send to the filesystem process.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_buf
-	 *   fuse_reply_data
-	 *   fuse_reply_xattr
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param name of the extended attribute
-	 * @param size maximum size of the value to send
-	 */
-	void (*getxattr) (fuse_req_t req, fuse_ino_t ino, const char *name,
-			  size_t size);
-
-	/**
-	 * List extended attribute names
-	 *
-	 * If size is zero, the total size of the attribute list should be
-	 * sent with fuse_reply_xattr.
-	 *
-	 * If the size is non-zero, and the null character separated
-	 * attribute list fits in the buffer, the list should be sent with
-	 * fuse_reply_buf.
-	 *
-	 * If the size is too small for the list, the ERANGE error should
-	 * be sent.
-	 *
-	 * If this request is answered with an error code of ENOSYS, this is
-	 * treated as a permanent failure with error code EOPNOTSUPP, i.e. all
-	 * future listxattr() requests will fail with EOPNOTSUPP without being
-	 * send to the filesystem process.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_buf
-	 *   fuse_reply_data
-	 *   fuse_reply_xattr
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param size maximum size of the list to send
-	 */
-	void (*listxattr) (fuse_req_t req, fuse_ino_t ino, size_t size);
-
-	/**
-	 * Remove an extended attribute
-	 *
-	 * If this request is answered with an error code of ENOSYS, this is
-	 * treated as a permanent failure with error code EOPNOTSUPP, i.e. all
-	 * future removexattr() requests will fail with EOPNOTSUPP without being
-	 * send to the filesystem process.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param name of the extended attribute
-	 */
-	void (*removexattr) (fuse_req_t req, fuse_ino_t ino, const char *name);
-
-	/**
-	 * Check file access permissions
-	 *
-	 * This will be called for the access() and chdir() system
-	 * calls.  If the 'default_permissions' mount option is given,
-	 * this method is not called.
-	 *
-	 * This method is not called under Linux kernel versions 2.4.x
-	 *
-	 * If this request is answered with an error code of ENOSYS, this is
-	 * treated as a permanent success, i.e. this and all future access()
-	 * requests will succeed without being send to the filesystem process.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param mask requested access mode
-	 */
-	void (*access) (fuse_req_t req, fuse_ino_t ino, int mask);
-
-	/**
-	 * Create and open a file
-	 *
-	 * If the file does not exist, first create it with the specified
-	 * mode, and then open it.
-	 *
-	 * See the description of the open handler for more
-	 * information.
-	 *
-	 * If this method is not implemented or under Linux kernel
-	 * versions earlier than 2.6.15, the mknod() and open() methods
-	 * will be called instead.
-	 *
-	 * If this request is answered with an error code of ENOSYS, the handler
-	 * is treated as not implemented (i.e., for this and future requests the
-	 * mknod() and open() handlers will be called instead).
-	 *
-	 * Valid replies:
-	 *   fuse_reply_create
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param parent inode number of the parent directory
-	 * @param name to create
-	 * @param mode file type and mode with which to create the new file
-	 * @param fi file information
-	 */
-	void (*create) (fuse_req_t req, fuse_ino_t parent, const char *name,
-			mode_t mode, struct fuse_file_info *fi);
-
-	/**
-	 * Test for a POSIX file lock
-	 *
-	 * Valid replies:
-	 *   fuse_reply_lock
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param fi file information
-	 * @param lock the region/type to test
-	 */
-	void (*getlk) (fuse_req_t req, fuse_ino_t ino,
-		       struct fuse_file_info *fi, struct flock *lock);
-
-	/**
-	 * Acquire, modify or release a POSIX file lock
-	 *
-	 * For POSIX threads (NPTL) there's a 1-1 relation between pid and
-	 * owner, but otherwise this is not always the case.  For checking
-	 * lock ownership, 'fi->owner' must be used.  The l_pid field in
-	 * 'struct flock' should only be used to fill in this field in
-	 * getlk().
-	 *
-	 * Note: if the locking methods are not implemented, the kernel
-	 * will still allow file locking to work locally.  Hence these are
-	 * only interesting for network filesystems and similar.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param fi file information
-	 * @param lock the region/type to set
-	 * @param sleep locking operation may sleep
-	 */
-	void (*setlk) (fuse_req_t req, fuse_ino_t ino,
-		       struct fuse_file_info *fi,
-		       struct flock *lock, int sleep);
-
-	/**
-	 * Map block index within file to block index within device
-	 *
-	 * Note: This makes sense only for block device backed filesystems
-	 * mounted with the 'blkdev' option
-	 *
-	 * If this request is answered with an error code of ENOSYS, this is
-	 * treated as a permanent failure, i.e. all future bmap() requests will
-	 * fail with the same error code without being send to the filesystem
-	 * process.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_bmap
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param blocksize unit of block index
-	 * @param idx block index within file
-	 */
-	void (*bmap) (fuse_req_t req, fuse_ino_t ino, size_t blocksize,
-		      uint64_t idx);
-
-	/**
-	 * Ioctl
-	 *
-	 * Note: For unrestricted ioctls (not allowed for FUSE
-	 * servers), data in and out areas can be discovered by giving
-	 * iovs and setting FUSE_IOCTL_RETRY in *flags*.  For
-	 * restricted ioctls, kernel prepares in/out data area
-	 * according to the information encoded in cmd.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_ioctl_retry
-	 *   fuse_reply_ioctl
-	 *   fuse_reply_ioctl_iov
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param cmd ioctl command
-	 * @param arg ioctl argument
-	 * @param fi file information
-	 * @param flags for FUSE_IOCTL_* flags
-	 * @param in_buf data fetched from the caller
-	 * @param in_bufsz number of fetched bytes
-	 * @param out_bufsz maximum size of output data
-	 *
-	 * Note : the unsigned long request submitted by the application
-	 * is truncated to 32 bits.
-	 */
-	void (*ioctl) (fuse_req_t req, fuse_ino_t ino, unsigned int cmd,
-		       void *arg, struct fuse_file_info *fi, unsigned flags,
-		       const void *in_buf, size_t in_bufsz, size_t out_bufsz);
-
-	/**
-	 * Poll for IO readiness
-	 *
-	 * Note: If ph is non-NULL, the client should notify
-	 * when IO readiness events occur by calling
-	 * fuse_lowlevel_notify_poll() with the specified ph.
-	 *
-	 * Regardless of the number of times poll with a non-NULL ph
-	 * is received, single notification is enough to clear all.
-	 * Notifying more times incurs overhead but doesn't harm
-	 * correctness.
-	 *
-	 * The callee is responsible for destroying ph with
-	 * fuse_pollhandle_destroy() when no longer in use.
-	 *
-	 * If this request is answered with an error code of ENOSYS, this is
-	 * treated as success (with a kernel-defined default poll-mask) and
-	 * future calls to pull() will succeed the same way without being send
-	 * to the filesystem process.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_poll
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param fi file information
-	 * @param ph poll handle to be used for notification
-	 */
-	void (*poll) (fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
-		      struct fuse_pollhandle *ph);
-
-	/**
-	 * Write data made available in a buffer
-	 *
-	 * This is a more generic version of the ->write() method.  If
-	 * FUSE_CAP_SPLICE_READ is set in fuse_conn_info.want and the
-	 * kernel supports splicing from the fuse device, then the
-	 * data will be made available in pipe for supporting zero
-	 * copy data transfer.
-	 *
-	 * buf->count is guaranteed to be one (and thus buf->idx is
-	 * always zero). The write_buf handler must ensure that
-	 * bufv->off is correctly updated (reflecting the number of
-	 * bytes read from bufv->buf[0]).
-	 *
-	 * Unless FUSE_CAP_HANDLE_KILLPRIV is disabled, this method is
-	 * expected to reset the setuid and setgid bits.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_write
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param bufv buffer containing the data
-	 * @param off offset to write to
-	 * @param fi file information
-	 */
-	void (*write_buf) (fuse_req_t req, fuse_ino_t ino,
-			   struct fuse_bufvec *bufv, off_t off,
-			   struct fuse_file_info *fi);
-
-	/**
-	 * Callback function for the retrieve request
-	 *
-	 * Valid replies:
-	 *	fuse_reply_none
-	 *
-	 * @param req request handle
-	 * @param cookie user data supplied to fuse_lowlevel_notify_retrieve()
-	 * @param ino the inode number supplied to fuse_lowlevel_notify_retrieve()
-	 * @param offset the offset supplied to fuse_lowlevel_notify_retrieve()
-	 * @param bufv the buffer containing the returned data
-	 */
-	void (*retrieve_reply) (fuse_req_t req, void *cookie, fuse_ino_t ino,
-				off_t offset, struct fuse_bufvec *bufv);
-
-	/**
-	 * Forget about multiple inodes
-	 *
-	 * See description of the forget function for more
-	 * information.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_none
-	 *
-	 * @param req request handle
-	 */
-	void (*forget_multi) (fuse_req_t req, size_t count,
-			      struct fuse_forget_data *forgets);
-
-	/**
-	 * Acquire, modify or release a BSD file lock
-	 *
-	 * Note: if the locking methods are not implemented, the kernel
-	 * will still allow file locking to work locally.  Hence these are
-	 * only interesting for network filesystems and similar.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param fi file information
-	 * @param op the locking operation, see flock(2)
-	 */
-	void (*flock) (fuse_req_t req, fuse_ino_t ino,
-		       struct fuse_file_info *fi, int op);
-
-	/**
-	 * Allocate requested space. If this function returns success then
-	 * subsequent writes to the specified range shall not fail due to the lack
-	 * of free space on the file system storage media.
-	 *
-	 * If this request is answered with an error code of ENOSYS, this is
-	 * treated as a permanent failure with error code EOPNOTSUPP, i.e. all
-	 * future fallocate() requests will fail with EOPNOTSUPP without being
-	 * send to the filesystem process.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param offset starting point for allocated region
-	 * @param length size of allocated region
-	 * @param mode determines the operation to be performed on the given range,
-	 *             see fallocate(2)
-	 */
-	void (*fallocate) (fuse_req_t req, fuse_ino_t ino, int mode,
-		       off_t offset, off_t length, struct fuse_file_info *fi);
-
-	/**
-	 * Read directory with attributes
-	 *
-	 * Send a buffer filled using fuse_add_direntry_plus(), with size not
-	 * exceeding the requested size.  Send an empty buffer on end of
-	 * stream.
-	 *
-	 * fi->fh will contain the value set by the opendir method, or
-	 * will be undefined if the opendir method didn't set any value.
-	 *
-	 * In contrast to readdir() (which does not affect the lookup counts),
-	 * the lookup count of every entry returned by readdirplus(), except "."
-	 * and "..", is incremented by one.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_buf
-	 *   fuse_reply_data
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param size maximum number of bytes to send
-	 * @param off offset to continue reading the directory stream
-	 * @param fi file information
-	 */
-	void (*readdirplus) (fuse_req_t req, fuse_ino_t ino, size_t size, off_t off,
-			 struct fuse_file_info *fi);
-
-	/**
-	 * Copy a range of data from one file to another
-	 *
-	 * Performs an optimized copy between two file descriptors without the
-	 * additional cost of transferring data through the FUSE kernel module
-	 * to user space (glibc) and then back into the FUSE filesystem again.
-	 *
-	 * In case this method is not implemented, glibc falls back to reading
-	 * data from the source and writing to the destination. Effectively
-	 * doing an inefficient copy of the data.
-	 *
-	 * If this request is answered with an error code of ENOSYS, this is
-	 * treated as a permanent failure with error code EOPNOTSUPP, i.e. all
-	 * future copy_file_range() requests will fail with EOPNOTSUPP without
-	 * being send to the filesystem process.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_write
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino_in the inode number or the source file
-	 * @param off_in starting point from were the data should be read
-	 * @param fi_in file information of the source file
-	 * @param ino_out the inode number or the destination file
-	 * @param off_out starting point where the data should be written
-	 * @param fi_out file information of the destination file
-	 * @param len maximum size of the data to copy
-	 * @param flags passed along with the copy_file_range() syscall
-	 */
-	void (*copy_file_range) (fuse_req_t req, fuse_ino_t ino_in,
-				 off_t off_in, struct fuse_file_info *fi_in,
-				 fuse_ino_t ino_out, off_t off_out,
-				 struct fuse_file_info *fi_out, size_t len,
-				 int flags);
-
-	/**
-	 * Find next data or hole after the specified offset
-	 *
-	 * If this request is answered with an error code of ENOSYS, this is
-	 * treated as a permanent failure, i.e. all future lseek() requests will
-	 * fail with the same error code without being send to the filesystem
-	 * process.
-	 *
-	 * Valid replies:
-	 *   fuse_reply_lseek
-	 *   fuse_reply_err
-	 *
-	 * @param req request handle
-	 * @param ino the inode number
-	 * @param off offset to start search from
-	 * @param whence either SEEK_DATA or SEEK_HOLE
-	 * @param fi file information
-	 */
-	void (*lseek) (fuse_req_t req, fuse_ino_t ino, off_t off, int whence,
-		       struct fuse_file_info *fi);
+    /**
+     * Initialize filesystem
+     *
+     * This function is called when libfuse establishes
+     * communication with the FUSE kernel module. The file system
+     * should use this function to inspect and/or modify the
+     * connection parameters provided in the `conn` structure.
+     *
+     * Note that some parameters may be overwritten by options
+     * passed to fuse_session_new() which take precedence over the
+     * values set in this handler.
+     *
+     * There's no reply to this function
+     *
+     * @param userdata the user data passed to fuse_session_new()
+     */
+    void (*init)(void *userdata, struct fuse_conn_info *conn);
+
+    /**
+     * Clean up filesystem.
+     *
+     * Called on filesystem exit. When this method is called, the
+     * connection to the kernel may be gone already, so that e.g. calls
+     * to fuse_lowlevel_notify_* will fail.
+     *
+     * There's no reply to this function
+     *
+     * @param userdata the user data passed to fuse_session_new()
+     */
+    void (*destroy)(void *userdata);
+
+    /**
+     * Look up a directory entry by name and get its attributes.
+     *
+     * Valid replies:
+     *   fuse_reply_entry
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param parent inode number of the parent directory
+     * @param name the name to look up
+     */
+    void (*lookup)(fuse_req_t req, fuse_ino_t parent, const char *name);
+
+    /**
+     * Forget about an inode
+     *
+     * This function is called when the kernel removes an inode
+     * from its internal caches.
+     *
+     * The inode's lookup count increases by one for every call to
+     * fuse_reply_entry and fuse_reply_create. The nlookup parameter
+     * indicates by how much the lookup count should be decreased.
+     *
+     * Inodes with a non-zero lookup count may receive requests from
+     * the kernel even after calls to unlink, rmdir or (when
+     * overwriting an existing file) rename. Filesystems must handle
+     * such requests properly and it is recommended to defer removal
+     * of the inode until the lookup count reaches zero. Calls to
+     * unlink, rmdir or rename will be followed closely by forget
+     * unless the file or directory is open, in which case the
+     * kernel issues forget only after the release or releasedir
+     * calls.
+     *
+     * Note that if a file system will be exported over NFS the
+     * inodes' lifetime must extend even beyond forget. See the
+     * generation field in struct fuse_entry_param above.
+     *
+     * On unmount the lookup count for all inodes implicitly drops
+     * to zero. It is not guaranteed that the file system will
+     * receive corresponding forget messages for the affected
+     * inodes.
+     *
+     * Valid replies:
+     *   fuse_reply_none
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param nlookup the number of lookups to forget
+     */
+    void (*forget)(fuse_req_t req, fuse_ino_t ino, uint64_t nlookup);
+
+    /**
+     * Get file attributes.
+     *
+     * If writeback caching is enabled, the kernel may have a
+     * better idea of a file's length than the FUSE file system
+     * (e.g. if there has been a write that extended the file size,
+     * but that has not yet been passed to the filesystem).
+     *
+     * In this case, the st_size value provided by the file system
+     * will be ignored.
+     *
+     * Valid replies:
+     *   fuse_reply_attr
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param fi for future use, currently always NULL
+     */
+    void (*getattr)(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi);
+
+    /**
+     * Set file attributes
+     *
+     * In the 'attr' argument only members indicated by the 'to_set'
+     * bitmask contain valid values.  Other members contain undefined
+     * values.
+     *
+     * Unless FUSE_CAP_HANDLE_KILLPRIV is disabled, this method is
+     * expected to reset the setuid and setgid bits if the file
+     * size or owner is being changed.
+     *
+     * If the setattr was invoked from the ftruncate() system call
+     * under Linux kernel versions 2.6.15 or later, the fi->fh will
+     * contain the value set by the open method or will be undefined
+     * if the open method didn't set any value.  Otherwise (not
+     * ftruncate call, or kernel version earlier than 2.6.15) the fi
+     * parameter will be NULL.
+     *
+     * Valid replies:
+     *   fuse_reply_attr
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param attr the attributes
+     * @param to_set bit mask of attributes which should be set
+     * @param fi file information, or NULL
+     */
+    void (*setattr)(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
+                    int to_set, struct fuse_file_info *fi);
+
+    /**
+     * Read symbolic link
+     *
+     * Valid replies:
+     *   fuse_reply_readlink
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     */
+    void (*readlink)(fuse_req_t req, fuse_ino_t ino);
+
+    /**
+     * Create file node
+     *
+     * Create a regular file, character device, block device, fifo or
+     * socket node.
+     *
+     * Valid replies:
+     *   fuse_reply_entry
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param parent inode number of the parent directory
+     * @param name to create
+     * @param mode file type and mode with which to create the new file
+     * @param rdev the device number (only valid if created file is a device)
+     */
+    void (*mknod)(fuse_req_t req, fuse_ino_t parent, const char *name,
+                  mode_t mode, dev_t rdev);
+
+    /**
+     * Create a directory
+     *
+     * Valid replies:
+     *   fuse_reply_entry
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param parent inode number of the parent directory
+     * @param name to create
+     * @param mode with which to create the new file
+     */
+    void (*mkdir)(fuse_req_t req, fuse_ino_t parent, const char *name,
+                  mode_t mode);
+
+    /**
+     * Remove a file
+     *
+     * If the file's inode's lookup count is non-zero, the file
+     * system is expected to postpone any removal of the inode
+     * until the lookup count reaches zero (see description of the
+     * forget function).
+     *
+     * Valid replies:
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param parent inode number of the parent directory
+     * @param name to remove
+     */
+    void (*unlink)(fuse_req_t req, fuse_ino_t parent, const char *name);
+
+    /**
+     * Remove a directory
+     *
+     * If the directory's inode's lookup count is non-zero, the
+     * file system is expected to postpone any removal of the
+     * inode until the lookup count reaches zero (see description
+     * of the forget function).
+     *
+     * Valid replies:
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param parent inode number of the parent directory
+     * @param name to remove
+     */
+    void (*rmdir)(fuse_req_t req, fuse_ino_t parent, const char *name);
+
+    /**
+     * Create a symbolic link
+     *
+     * Valid replies:
+     *   fuse_reply_entry
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param link the contents of the symbolic link
+     * @param parent inode number of the parent directory
+     * @param name to create
+     */
+    void (*symlink)(fuse_req_t req, const char *link, fuse_ino_t parent,
+                    const char *name);
+
+    /**
+     * Rename a file
+     *
+     * If the target exists it should be atomically replaced. If
+     * the target's inode's lookup count is non-zero, the file
+     * system is expected to postpone any removal of the inode
+     * until the lookup count reaches zero (see description of the
+     * forget function).
+     *
+     * If this request is answered with an error code of ENOSYS, this is
+     * treated as a permanent failure with error code EINVAL, i.e. all
+     * future rename requests will fail with EINVAL without being
+     * sent to the filesystem process.
+     *
+     * *flags* may be `RENAME_EXCHANGE` or `RENAME_NOREPLACE`. If
+     * RENAME_NOREPLACE is specified, the filesystem must not
+     * overwrite *newname* if it exists and return an error
+     * instead. If `RENAME_EXCHANGE` is specified, the filesystem
+     * must atomically exchange the two files, i.e. both must
+     * exist and neither may be deleted.
+     *
+     * Valid replies:
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param parent inode number of the old parent directory
+     * @param name old name
+     * @param newparent inode number of the new parent directory
+     * @param newname new name
+     */
+    void (*rename)(fuse_req_t req, fuse_ino_t parent, const char *name,
+                   fuse_ino_t newparent, const char *newname,
+                   unsigned int flags);
+
+    /**
+     * Create a hard link
+     *
+     * Valid replies:
+     *   fuse_reply_entry
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the old inode number
+     * @param newparent inode number of the new parent directory
+     * @param newname new name to create
+     */
+    void (*link)(fuse_req_t req, fuse_ino_t ino, fuse_ino_t newparent,
+                 const char *newname);
+
+    /**
+     * Open a file
+     *
+     * Open flags are available in fi->flags. The following rules
+     * apply.
+     *
+     *  - Creation (O_CREAT, O_EXCL, O_NOCTTY) flags will be
+     *    filtered out / handled by the kernel.
+     *
+     *  - Access modes (O_RDONLY, O_WRONLY, O_RDWR) should be used
+     *    by the filesystem to check if the operation is
+     *    permitted.  If the ``-o default_permissions`` mount
+     *    option is given, this check is already done by the
+     *    kernel before calling open() and may thus be omitted by
+     *    the filesystem.
+     *
+     *  - When writeback caching is enabled, the kernel may send
+     *    read requests even for files opened with O_WRONLY. The
+     *    filesystem should be prepared to handle this.
+     *
+     *  - When writeback caching is disabled, the filesystem is
+     *    expected to properly handle the O_APPEND flag and ensure
+     *    that each write is appending to the end of the file.
+     *
+     *  - When writeback caching is enabled, the kernel will
+     *    handle O_APPEND. However, unless all changes to the file
+     *    come through the kernel this will not work reliably. The
+     *    filesystem should thus either ignore the O_APPEND flag
+     *    (and let the kernel handle it), or return an error
+     *    (indicating that reliable O_APPEND is not available).
+     *
+     * Filesystem may store an arbitrary file handle (pointer,
+     * index, etc) in fi->fh, and use this in all other file
+     * operations (read, write, flush, release, fsync).
+     *
+     * Filesystem may also implement stateless file I/O and not store
+     * anything in fi->fh.
+     *
+     * There are also some flags (direct_io, keep_cache) which the
+     * filesystem may set in fi, to change the way the file is opened.
+     * See fuse_file_info structure in <fuse_common.h> for more details.
+     *
+     * If this request is answered with an error code of ENOSYS
+     * and FUSE_CAP_NO_OPEN_SUPPORT is set in
+     * `fuse_conn_info.capable`, this is treated as success and
+     * future calls to open and release will also succeed without being
+     * sent to the filesystem process.
+     *
+     * Valid replies:
+     *   fuse_reply_open
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param fi file information
+     */
+    void (*open)(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi);
+
+    /**
+     * Read data
+     *
+     * Read should send exactly the number of bytes requested except
+     * on EOF or error, otherwise the rest of the data will be
+     * substituted with zeroes.  An exception to this is when the file
+     * has been opened in 'direct_io' mode, in which case the return
+     * value of the read system call will reflect the return value of
+     * this operation.
+     *
+     * fi->fh will contain the value set by the open method, or will
+     * be undefined if the open method didn't set any value.
+     *
+     * Valid replies:
+     *   fuse_reply_buf
+     *   fuse_reply_iov
+     *   fuse_reply_data
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param size number of bytes to read
+     * @param off offset to read from
+     * @param fi file information
+     */
+    void (*read)(fuse_req_t req, fuse_ino_t ino, size_t size, off_t off,
+                 struct fuse_file_info *fi);
+
+    /**
+     * Write data
+     *
+     * Write should return exactly the number of bytes requested
+     * except on error.  An exception to this is when the file has
+     * been opened in 'direct_io' mode, in which case the return value
+     * of the write system call will reflect the return value of this
+     * operation.
+     *
+     * Unless FUSE_CAP_HANDLE_KILLPRIV is disabled, this method is
+     * expected to reset the setuid and setgid bits.
+     *
+     * fi->fh will contain the value set by the open method, or will
+     * be undefined if the open method didn't set any value.
+     *
+     * Valid replies:
+     *   fuse_reply_write
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param buf data to write
+     * @param size number of bytes to write
+     * @param off offset to write to
+     * @param fi file information
+     */
+    void (*write)(fuse_req_t req, fuse_ino_t ino, const char *buf, size_t size,
+                  off_t off, struct fuse_file_info *fi);
+
+    /**
+     * Flush method
+     *
+     * This is called on each close() of the opened file.
+     *
+     * Since file descriptors can be duplicated (dup, dup2, fork), for
+     * one open call there may be many flush calls.
+     *
+     * Filesystems shouldn't assume that flush will always be called
+     * after some writes, or that it will be called at all.
+     *
+     * fi->fh will contain the value set by the open method, or will
+     * be undefined if the open method didn't set any value.
+     *
+     * NOTE: the name of the method is misleading, since (unlike
+     * fsync) the filesystem is not forced to flush pending writes.
+     * One reason to flush data is if the filesystem wants to return
+     * write errors during close.  However, such use is non-portable
+     * because POSIX does not require [close] to wait for delayed I/O to
+     * complete.
+     *
+     * If the filesystem supports file locking operations (setlk,
+     * getlk) it should remove all locks belonging to 'fi->owner'.
+     *
+     * If this request is answered with an error code of ENOSYS,
+     * this is treated as success and future calls to flush() will
+     * succeed automatically without being sent to the filesystem
+     * process.
+     *
+     * Valid replies:
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param fi file information
+     *
+     * [close]:
+     * http://pubs.opengroup.org/onlinepubs/9699919799/functions/close.html
+     */
+    void (*flush)(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi);
+
+    /**
+     * Release an open file
+     *
+     * Release is called when there are no more references to an open
+     * file: all file descriptors are closed and all memory mappings
+     * are unmapped.
+     *
+     * For every open call there will be exactly one release call (unless
+     * the filesystem is force-unmounted).
+     *
+     * The filesystem may reply with an error, but error values are
+     * not returned to close() or munmap() which triggered the
+     * release.
+     *
+     * fi->fh will contain the value set by the open method, or will
+     * be undefined if the open method didn't set any value.
+     * fi->flags will contain the same flags as for open.
+     *
+     * Valid replies:
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param fi file information
+     */
+    void (*release)(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi);
+
+    /**
+     * Synchronize file contents
+     *
+     * If the datasync parameter is non-zero, then only the user data
+     * should be flushed, not the meta data.
+     *
+     * If this request is answered with an error code of ENOSYS,
+     * this is treated as success and future calls to fsync() will
+     * succeed automatically without being sent to the filesystem
+     * process.
+     *
+     * Valid replies:
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param datasync flag indicating if only data should be flushed
+     * @param fi file information
+     */
+    void (*fsync)(fuse_req_t req, fuse_ino_t ino, int datasync,
+                  struct fuse_file_info *fi);
+
+    /**
+     * Open a directory
+     *
+     * Filesystem may store an arbitrary file handle (pointer, index,
+     * etc) in fi->fh, and use this in all other directory
+     * stream operations (readdir, releasedir, fsyncdir).
+     *
+     * If this request is answered with an error code of ENOSYS and
+     * FUSE_CAP_NO_OPENDIR_SUPPORT is set in `fuse_conn_info.capable`,
+     * this is treated as success and future calls to opendir and
+     * releasedir will also succeed without being sent to the filesystem
+     * process. In addition, the kernel will cache readdir results
+     * as if opendir returned FOPEN_KEEP_CACHE | FOPEN_CACHE_DIR.
+     *
+     * Valid replies:
+     *   fuse_reply_open
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param fi file information
+     */
+    void (*opendir)(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi);
+
+    /**
+     * Read directory
+     *
+     * Send a buffer filled using fuse_add_direntry(), with size not
+     * exceeding the requested size.  Send an empty buffer on end of
+     * stream.
+     *
+     * fi->fh will contain the value set by the opendir method, or
+     * will be undefined if the opendir method didn't set any value.
+     *
+     * Returning a directory entry from readdir() does not affect
+     * its lookup count.
+     *
+     * If off is non-zero, then it will correspond to one of the off_t
+     * values that was previously returned by readdir() for the same
+     * directory handle. In this case, readdir() should skip over entries
+     * coming before the position defined by the off_t value. If entries
+     * are added or removed while the directory handle is open, the filesystem
+     * may still include the entries that have been removed, and may not
+     * report the entries that have been created. However, addition or
+     * removal of entries must never cause readdir() to skip over unrelated
+     * entries or to report them more than once. This means
+     * that off_t can not be a simple index that enumerates the entries
+     * that have been returned but must contain sufficient information to
+     * uniquely determine the next directory entry to return even when the
+     * set of entries is changing.
+     *
+     * The function does not have to report the '.' and '..'
+     * entries, but is allowed to do so. Note that, if readdir does
+     * not return '.' or '..', they will not be implicitly returned,
+     * and this behavior is observable by the caller.
+     *
+     * Valid replies:
+     *   fuse_reply_buf
+     *   fuse_reply_data
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param size maximum number of bytes to send
+     * @param off offset to continue reading the directory stream
+     * @param fi file information
+     */
+    void (*readdir)(fuse_req_t req, fuse_ino_t ino, size_t size, off_t off,
+                    struct fuse_file_info *fi);
+
+    /**
+     * Release an open directory
+     *
+     * For every opendir call there will be exactly one releasedir
+     * call (unless the filesystem is force-unmounted).
+     *
+     * fi->fh will contain the value set by the opendir method, or
+     * will be undefined if the opendir method didn't set any value.
+     *
+     * Valid replies:
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param fi file information
+     */
+    void (*releasedir)(fuse_req_t req, fuse_ino_t ino,
+                       struct fuse_file_info *fi);
+
+    /**
+     * Synchronize directory contents
+     *
+     * If the datasync parameter is non-zero, then only the directory
+     * contents should be flushed, not the meta data.
+     *
+     * fi->fh will contain the value set by the opendir method, or
+     * will be undefined if the opendir method didn't set any value.
+     *
+     * If this request is answered with an error code of ENOSYS,
+     * this is treated as success and future calls to fsyncdir() will
+     * succeed automatically without being sent to the filesystem
+     * process.
+     *
+     * Valid replies:
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param datasync flag indicating if only data should be flushed
+     * @param fi file information
+     */
+    void (*fsyncdir)(fuse_req_t req, fuse_ino_t ino, int datasync,
+                     struct fuse_file_info *fi);
+
+    /**
+     * Get file system statistics
+     *
+     * Valid replies:
+     *   fuse_reply_statfs
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number, zero means "undefined"
+     */
+    void (*statfs)(fuse_req_t req, fuse_ino_t ino);
+
+    /**
+     * Set an extended attribute
+     *
+     * If this request is answered with an error code of ENOSYS, this is
+     * treated as a permanent failure with error code EOPNOTSUPP, i.e. all
+     * future setxattr() requests will fail with EOPNOTSUPP without being
+     * sent to the filesystem process.
+     *
+     * Valid replies:
+     *   fuse_reply_err
+     */
+    void (*setxattr)(fuse_req_t req, fuse_ino_t ino, const char *name,
+                     const char *value, size_t size, int flags);
+
+    /**
+     * Get an extended attribute
+     *
+     * If size is zero, the size of the value should be sent with
+     * fuse_reply_xattr.
+     *
+     * If the size is non-zero, and the value fits in the buffer, the
+     * value should be sent with fuse_reply_buf.
+     *
+     * If the size is too small for the value, the ERANGE error should
+     * be sent.
+     *
+     * If this request is answered with an error code of ENOSYS, this is
+     * treated as a permanent failure with error code EOPNOTSUPP, i.e. all
+     * future getxattr() requests will fail with EOPNOTSUPP without being
+     * sent to the filesystem process.
+     *
+     * Valid replies:
+     *   fuse_reply_buf
+     *   fuse_reply_data
+     *   fuse_reply_xattr
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param name of the extended attribute
+     * @param size maximum size of the value to send
+     */
+    void (*getxattr)(fuse_req_t req, fuse_ino_t ino, const char *name,
+                     size_t size);
+
+    /**
+     * List extended attribute names
+     *
+     * If size is zero, the total size of the attribute list should be
+     * sent with fuse_reply_xattr.
+     *
+     * If the size is non-zero, and the null character separated
+     * attribute list fits in the buffer, the list should be sent with
+     * fuse_reply_buf.
+     *
+     * If the size is too small for the list, the ERANGE error should
+     * be sent.
+     *
+     * If this request is answered with an error code of ENOSYS, this is
+     * treated as a permanent failure with error code EOPNOTSUPP, i.e. all
+     * future listxattr() requests will fail with EOPNOTSUPP without being
+     * sent to the filesystem process.
+     *
+     * Valid replies:
+     *   fuse_reply_buf
+     *   fuse_reply_data
+     *   fuse_reply_xattr
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param size maximum size of the list to send
+     */
+    void (*listxattr)(fuse_req_t req, fuse_ino_t ino, size_t size);
+
+    /**
+     * Remove an extended attribute
+     *
+     * If this request is answered with an error code of ENOSYS, this is
+     * treated as a permanent failure with error code EOPNOTSUPP, i.e. all
+     * future removexattr() requests will fail with EOPNOTSUPP without being
+     * sent to the filesystem process.
+     *
+     * Valid replies:
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param name of the extended attribute
+     */
+    void (*removexattr)(fuse_req_t req, fuse_ino_t ino, const char *name);
+
+    /**
+     * Check file access permissions
+     *
+     * This will be called for the access() and chdir() system
+     * calls.  If the 'default_permissions' mount option is given,
+     * this method is not called.
+     *
+     * This method is not called under Linux kernel versions 2.4.x
+     *
+     * If this request is answered with an error code of ENOSYS, this is
+     * treated as a permanent success, i.e. this and all future access()
+     * requests will succeed without being sent to the filesystem process.
+     *
+     * Valid replies:
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param mask requested access mode
+     */
+    void (*access)(fuse_req_t req, fuse_ino_t ino, int mask);
+
+    /**
+     * Create and open a file
+     *
+     * If the file does not exist, first create it with the specified
+     * mode, and then open it.
+     *
+     * See the description of the open handler for more
+     * information.
+     *
+     * If this method is not implemented or under Linux kernel
+     * versions earlier than 2.6.15, the mknod() and open() methods
+     * will be called instead.
+     *
+     * If this request is answered with an error code of ENOSYS, the handler
+     * is treated as not implemented (i.e., for this and future requests the
+     * mknod() and open() handlers will be called instead).
+     *
+     * Valid replies:
+     *   fuse_reply_create
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param parent inode number of the parent directory
+     * @param name to create
+     * @param mode file type and mode with which to create the new file
+     * @param fi file information
+     */
+    void (*create)(fuse_req_t req, fuse_ino_t parent, const char *name,
+                   mode_t mode, struct fuse_file_info *fi);
+
+    /**
+     * Test for a POSIX file lock
+     *
+     * Valid replies:
+     *   fuse_reply_lock
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param fi file information
+     * @param lock the region/type to test
+     */
+    void (*getlk)(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
+                  struct flock *lock);
+
+    /**
+     * Acquire, modify or release a POSIX file lock
+     *
+     * For POSIX threads (NPTL) there's a 1-1 relation between pid and
+     * owner, but otherwise this is not always the case.  For checking
+     * lock ownership, 'fi->owner' must be used.  The l_pid field in
+     * 'struct flock' should only be used to fill in this field in
+     * getlk().
+     *
+     * Note: if the locking methods are not implemented, the kernel
+     * will still allow file locking to work locally.  Hence these are
+     * only interesting for network filesystems and similar.
+     *
+     * Valid replies:
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param fi file information
+     * @param lock the region/type to set
+     * @param sleep locking operation may sleep
+     */
+    void (*setlk)(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
+                  struct flock *lock, int sleep);
+
+    /**
+     * Map block index within file to block index within device
+     *
+     * Note: This makes sense only for block device backed filesystems
+     * mounted with the 'blkdev' option
+     *
+     * If this request is answered with an error code of ENOSYS, this is
+     * treated as a permanent failure, i.e. all future bmap() requests will
+     * fail with the same error code without being sent to the filesystem
+     * process.
+     *
+     * Valid replies:
+     *   fuse_reply_bmap
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param blocksize unit of block index
+     * @param idx block index within file
+     */
+    void (*bmap)(fuse_req_t req, fuse_ino_t ino, size_t blocksize,
+                 uint64_t idx);
+
+    /**
+     * Ioctl
+     *
+     * Note: For unrestricted ioctls (not allowed for FUSE
+     * servers), data in and out areas can be discovered by giving
+     * iovs and setting FUSE_IOCTL_RETRY in *flags*.  For
+     * restricted ioctls, kernel prepares in/out data area
+     * according to the information encoded in cmd.
+     *
+     * Valid replies:
+     *   fuse_reply_ioctl_retry
+     *   fuse_reply_ioctl
+     *   fuse_reply_ioctl_iov
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param cmd ioctl command
+     * @param arg ioctl argument
+     * @param fi file information
+     * @param flags for FUSE_IOCTL_* flags
+     * @param in_buf data fetched from the caller
+     * @param in_bufsz number of fetched bytes
+     * @param out_bufsz maximum size of output data
+     *
+     * Note: the unsigned long request submitted by the application
+     * is truncated to 32 bits.
+     */
+    void (*ioctl)(fuse_req_t req, fuse_ino_t ino, unsigned int cmd, void *arg,
+                  struct fuse_file_info *fi, unsigned flags, const void *in_buf,
+                  size_t in_bufsz, size_t out_bufsz);
+
+    /**
+     * Poll for IO readiness
+     *
+     * Note: If ph is non-NULL, the client should notify
+     * when IO readiness events occur by calling
+     * fuse_lowlevel_notify_poll() with the specified ph.
+     *
+     * Regardless of the number of times poll with a non-NULL ph
+     * is received, a single notification is enough to clear all.
+     * Notifying more times incurs overhead but doesn't harm
+     * correctness.
+     *
+     * The callee is responsible for destroying ph with
+     * fuse_pollhandle_destroy() when no longer in use.
+     *
+     * If this request is answered with an error code of ENOSYS, this is
+     * treated as success (with a kernel-defined default poll-mask) and
+     * future calls to poll() will succeed the same way without being sent
+     * to the filesystem process.
+     *
+     * Valid replies:
+     *   fuse_reply_poll
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param fi file information
+     * @param ph poll handle to be used for notification
+     */
+    void (*poll)(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
+                 struct fuse_pollhandle *ph);
+
+    /**
+     * Write data made available in a buffer
+     *
+     * This is a more generic version of the ->write() method.  If
+     * FUSE_CAP_SPLICE_READ is set in fuse_conn_info.want and the
+     * kernel supports splicing from the fuse device, then the
+     * data will be made available in a pipe for supporting zero
+     * copy data transfer.
+     *
+     * buf->count is guaranteed to be one (and thus buf->idx is
+     * always zero). The write_buf handler must ensure that
+     * bufv->off is correctly updated (reflecting the number of
+     * bytes read from bufv->buf[0]).
+     *
+     * Unless FUSE_CAP_HANDLE_KILLPRIV is disabled, this method is
+     * expected to reset the setuid and setgid bits.
+     *
+     * Valid replies:
+     *   fuse_reply_write
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param bufv buffer containing the data
+     * @param off offset to write to
+     * @param fi file information
+     */
+    void (*write_buf)(fuse_req_t req, fuse_ino_t ino, struct fuse_bufvec *bufv,
+                      off_t off, struct fuse_file_info *fi);
+
+    /**
+     * Callback function for the retrieve request
+     *
+     * Valid replies:
+     *  fuse_reply_none
+     *
+     * @param req request handle
+     * @param cookie user data supplied to fuse_lowlevel_notify_retrieve()
+     * @param ino the inode number supplied to fuse_lowlevel_notify_retrieve()
+     * @param offset the offset supplied to fuse_lowlevel_notify_retrieve()
+     * @param bufv the buffer containing the returned data
+     */
+    void (*retrieve_reply)(fuse_req_t req, void *cookie, fuse_ino_t ino,
+                           off_t offset, struct fuse_bufvec *bufv);
+
+    /**
+     * Forget about multiple inodes
+     *
+     * See description of the forget function for more
+     * information.
+     *
+     * Valid replies:
+     *   fuse_reply_none
+     *
+     * @param req request handle
+     */
+    void (*forget_multi)(fuse_req_t req, size_t count,
+                         struct fuse_forget_data *forgets);
+
+    /**
+     * Acquire, modify or release a BSD file lock
+     *
+     * Note: if the locking methods are not implemented, the kernel
+     * will still allow file locking to work locally.  Hence these are
+     * only interesting for network filesystems and similar.
+     *
+     * Valid replies:
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param fi file information
+     * @param op the locking operation, see flock(2)
+     */
+    void (*flock)(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
+                  int op);
+
+    /**
+     * Allocate requested space. If this function returns success then
+     * subsequent writes to the specified range shall not fail due to the lack
+     * of free space on the file system storage media.
+     *
+     * If this request is answered with an error code of ENOSYS, this is
+     * treated as a permanent failure with error code EOPNOTSUPP, i.e. all
+     * future fallocate() requests will fail with EOPNOTSUPP without being
+     * sent to the filesystem process.
+     *
+     * Valid replies:
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param offset starting point for allocated region
+     * @param length size of allocated region
+     * @param mode determines the operation to be performed on the given range,
+     *             see fallocate(2)
+     */
+    void (*fallocate)(fuse_req_t req, fuse_ino_t ino, int mode, off_t offset,
+                      off_t length, struct fuse_file_info *fi);
+
+    /**
+     * Read directory with attributes
+     *
+     * Send a buffer filled using fuse_add_direntry_plus(), with size not
+     * exceeding the requested size.  Send an empty buffer on end of
+     * stream.
+     *
+     * fi->fh will contain the value set by the opendir method, or
+     * will be undefined if the opendir method didn't set any value.
+     *
+     * In contrast to readdir() (which does not affect the lookup counts),
+     * the lookup count of every entry returned by readdirplus(), except "."
+     * and "..", is incremented by one.
+     *
+     * Valid replies:
+     *   fuse_reply_buf
+     *   fuse_reply_data
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param size maximum number of bytes to send
+     * @param off offset to continue reading the directory stream
+     * @param fi file information
+     */
+    void (*readdirplus)(fuse_req_t req, fuse_ino_t ino, size_t size, off_t off,
+                        struct fuse_file_info *fi);
+
+    /**
+     * Copy a range of data from one file to another
+     *
+     * Performs an optimized copy between two file descriptors without the
+     * additional cost of transferring data through the FUSE kernel module
+     * to user space (glibc) and then back into the FUSE filesystem again.
+     *
+     * In case this method is not implemented, glibc falls back to reading
+     * data from the source and writing to the destination, effectively
+     * doing an inefficient copy of the data.
+     *
+     * If this request is answered with an error code of ENOSYS, this is
+     * treated as a permanent failure with error code EOPNOTSUPP, i.e. all
+     * future copy_file_range() requests will fail with EOPNOTSUPP without
+     * being sent to the filesystem process.
+     *
+     * Valid replies:
+     *   fuse_reply_write
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino_in the inode number of the source file
+     * @param off_in starting point from where the data should be read
+     * @param fi_in file information of the source file
+     * @param ino_out the inode number of the destination file
+     * @param off_out starting point where the data should be written
+     * @param fi_out file information of the destination file
+     * @param len maximum size of the data to copy
+     * @param flags passed along with the copy_file_range() syscall
+     */
+    void (*copy_file_range)(fuse_req_t req, fuse_ino_t ino_in, off_t off_in,
+                            struct fuse_file_info *fi_in, fuse_ino_t ino_out,
+                            off_t off_out, struct fuse_file_info *fi_out,
+                            size_t len, int flags);
+
+    /**
+     * Find next data or hole after the specified offset
+     *
+     * If this request is answered with an error code of ENOSYS, this is
+     * treated as a permanent failure, i.e. all future lseek() requests will
+     * fail with the same error code without being sent to the filesystem
+     * process.
+     *
+     * Valid replies:
+     *   fuse_reply_lseek
+     *   fuse_reply_err
+     *
+     * @param req request handle
+     * @param ino the inode number
+     * @param off offset to start search from
+     * @param whence either SEEK_DATA or SEEK_HOLE
+     * @param fi file information
+     */
+    void (*lseek)(fuse_req_t req, fuse_ino_t ino, off_t off, int whence,
+                  struct fuse_file_info *fi);
 };
 
 /**
@@ -1305,7 +1307,7 @@ int fuse_reply_entry(fuse_req_t req, const struct fuse_entry_param *e);
  * @return zero for success, -errno for failure to send reply
  */
 int fuse_reply_create(fuse_req_t req, const struct fuse_entry_param *e,
-		      const struct fuse_file_info *fi);
+                      const struct fuse_file_info *fi);
 
 /**
  * Reply with attributes
@@ -1315,11 +1317,11 @@ int fuse_reply_create(fuse_req_t req, const struct fuse_entry_param *e,
  *
  * @param req request handle
  * @param attr the attributes
- * @param attr_timeout	validity timeout (in seconds) for the attributes
+ * @param attr_timeout validity timeout (in seconds) for the attributes
  * @return zero for success, -errno for failure to send reply
  */
 int fuse_reply_attr(fuse_req_t req, const struct stat *attr,
-		    double attr_timeout);
+                    double attr_timeout);
 
 /**
  * Reply with the contents of a symbolic link
@@ -1417,7 +1419,7 @@ int fuse_reply_buf(fuse_req_t req, const char *buf, size_t size);
  * @return zero for success, -errno for failure to send reply
  */
 int fuse_reply_data(fuse_req_t req, struct fuse_bufvec *bufv,
-		    enum fuse_buf_copy_flags flags);
+                    enum fuse_buf_copy_flags flags);
 
 /**
  * Reply with data vector
@@ -1480,9 +1482,9 @@ int fuse_reply_lock(fuse_req_t req, const struct flock *lock);
  */
 int fuse_reply_bmap(fuse_req_t req, uint64_t idx);
 
-/* ----------------------------------------------------------- *
- * Filling a buffer in readdir				       *
- * ----------------------------------------------------------- */
+/*
+ * Filling a buffer in readdir
+ */
 
 /**
  * Add a directory entry to the buffer
@@ -1512,8 +1514,7 @@ int fuse_reply_bmap(fuse_req_t req, uint64_t idx);
  * @return the space needed for the entry
  */
 size_t fuse_add_direntry(fuse_req_t req, char *buf, size_t bufsize,
-			 const char *name, const struct stat *stbuf,
-			 off_t off);
+                         const char *name, const struct stat *stbuf, off_t off);
 
 /**
  * Add a directory entry to the buffer with the attributes
@@ -1529,8 +1530,8 @@ size_t fuse_add_direntry(fuse_req_t req, char *buf, size_t bufsize,
  * @return the space needed for the entry
  */
 size_t fuse_add_direntry_plus(fuse_req_t req, char *buf, size_t bufsize,
-			      const char *name,
-			      const struct fuse_entry_param *e, off_t off);
+                              const char *name,
+                              const struct fuse_entry_param *e, off_t off);
 
 /**
  * Reply to ask for data fetch and output buffer preparation.  ioctl
@@ -1547,9 +1548,9 @@ size_t fuse_add_direntry_plus(fuse_req_t req, char *buf, size_t bufsize,
  * @param out_count number of entries in out_iov
  * @return zero for success, -errno for failure to send reply
  */
-int fuse_reply_ioctl_retry(fuse_req_t req,
-			   const struct iovec *in_iov, size_t in_count,
-			   const struct iovec *out_iov, size_t out_count);
+int fuse_reply_ioctl_retry(fuse_req_t req, const struct iovec *in_iov,
+                           size_t in_count, const struct iovec *out_iov,
+                           size_t out_count);
 
 /**
  * Reply to finish ioctl
@@ -1576,7 +1577,7 @@ int fuse_reply_ioctl(fuse_req_t req, int result, const void *buf, size_t size);
  * @param count the size of vector
  */
 int fuse_reply_ioctl_iov(fuse_req_t req, int result, const struct iovec *iov,
-			 int count);
+                         int count);
 
 /**
  * Reply with poll result event mask
@@ -1598,9 +1599,9 @@ int fuse_reply_poll(fuse_req_t req, unsigned revents);
  */
 int fuse_reply_lseek(fuse_req_t req, off_t off);
 
-/* ----------------------------------------------------------- *
- * Notification						       *
- * ----------------------------------------------------------- */
+/*
+ * Notification
+ */
 
 /**
  * Notify IO readiness event
@@ -1635,7 +1636,7 @@ int fuse_lowlevel_notify_poll(struct fuse_pollhandle *ph);
  * @return zero for success, -errno for failure
  */
 int fuse_lowlevel_notify_inval_inode(struct fuse_session *se, fuse_ino_t ino,
-				     off_t off, off_t len);
+                                     off_t off, off_t len);
 
 /**
  * Notify to invalidate parent attributes and the dentry matching
@@ -1663,7 +1664,7 @@ int fuse_lowlevel_notify_inval_inode(struct fuse_session *se, fuse_ino_t ino,
  * @return zero for success, -errno for failure
  */
 int fuse_lowlevel_notify_inval_entry(struct fuse_session *se, fuse_ino_t parent,
-				     const char *name, size_t namelen);
+                                     const char *name, size_t namelen);
 
 /**
  * This function behaves like fuse_lowlevel_notify_inval_entry() with
@@ -1693,9 +1694,9 @@ int fuse_lowlevel_notify_inval_entry(struct fuse_session *se, fuse_ino_t parent,
  * @param namelen strlen() of file name
  * @return zero for success, -errno for failure
  */
-int fuse_lowlevel_notify_delete(struct fuse_session *se,
-				fuse_ino_t parent, fuse_ino_t child,
-				const char *name, size_t namelen);
+int fuse_lowlevel_notify_delete(struct fuse_session *se, fuse_ino_t parent,
+                                fuse_ino_t child, const char *name,
+                                size_t namelen);
 
 /**
  * Store data to the kernel buffers
@@ -1723,8 +1724,8 @@ int fuse_lowlevel_notify_delete(struct fuse_session *se,
  * @return zero for success, -errno for failure
  */
 int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
-			       off_t offset, struct fuse_bufvec *bufv,
-			       enum fuse_buf_copy_flags flags);
+                               off_t offset, struct fuse_bufvec *bufv,
+                               enum fuse_buf_copy_flags flags);
 /**
  * Retrieve data from the kernel buffers
  *
@@ -1755,12 +1756,12 @@ int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
  * @return zero for success, -errno for failure
  */
 int fuse_lowlevel_notify_retrieve(struct fuse_session *se, fuse_ino_t ino,
-				  size_t size, off_t offset, void *cookie);
+                                  size_t size, off_t offset, void *cookie);
 
 
-/* ----------------------------------------------------------- *
- * Utility functions					       *
- * ----------------------------------------------------------- */
+/*
+ * Utility functions
+ */
 
 /**
  * Get the userdata from the request
@@ -1822,7 +1823,7 @@ typedef void (*fuse_interrupt_func_t)(fuse_req_t req, void *data);
  * @param data user data passed to the callback function
  */
 void fuse_req_interrupt_func(fuse_req_t req, fuse_interrupt_func_t func,
-			     void *data);
+                             void *data);
 
 /**
  * Check if a request has already been interrupted
@@ -1833,9 +1834,9 @@ void fuse_req_interrupt_func(fuse_req_t req, fuse_interrupt_func_t func,
 int fuse_req_interrupted(fuse_req_t req);
 
 
-/* ----------------------------------------------------------- *
- * Inquiry functions                                           *
- * ----------------------------------------------------------- */
+/*
+ * Inquiry functions
+ */
 
 /**
  * Print low-level version information to stdout.
@@ -1854,20 +1855,20 @@ void fuse_lowlevel_help(void);
  */
 void fuse_cmdline_help(void);
 
-/* ----------------------------------------------------------- *
- * Filesystem setup & teardown                                 *
- * ----------------------------------------------------------- */
+/*
+ * Filesystem setup & teardown
+ */
 
 struct fuse_cmdline_opts {
-	int singlethread;
-	int foreground;
-	int debug;
-	int nodefault_subtype;
-	char *mountpoint;
-	int show_version;
-	int show_help;
-	int clone_fd;
-	unsigned int max_idle_threads;
+    int singlethread;
+    int foreground;
+    int debug;
+    int nodefault_subtype;
+    char *mountpoint;
+    int show_version;
+    int show_help;
+    int clone_fd;
+    unsigned int max_idle_threads;
 };
 
 /**
@@ -1888,8 +1889,7 @@ struct fuse_cmdline_opts {
  * @param opts output argument for parsed options
  * @return 0 on success, -1 on failure
  */
-int fuse_parse_cmdline(struct fuse_args *args,
-		       struct fuse_cmdline_opts *opts);
+int fuse_parse_cmdline(struct fuse_args *args, struct fuse_cmdline_opts *opts);
 
 /**
  * Create a low level session.
@@ -1920,8 +1920,8 @@ int fuse_parse_cmdline(struct fuse_args *args,
  * @return the fuse session on success, NULL on failure
  **/
 struct fuse_session *fuse_session_new(struct fuse_args *args,
-				      const struct fuse_lowlevel_ops *op,
-				      size_t op_size, void *userdata);
+                                      const struct fuse_lowlevel_ops *op,
+                                      size_t op_size, void *userdata);
 
 /**
  * Mount a FUSE file system.
@@ -1965,14 +1965,15 @@ int fuse_session_loop(struct fuse_session *se);
  * fuse_session_loop().
  *
  * @param se the session
- * @param config session loop configuration 
+ * @param config session loop configuration
  * @return see fuse_session_loop()
  */
 #if FUSE_USE_VERSION < 32
 int fuse_session_loop_mt_31(struct fuse_session *se, int clone_fd);
 #define fuse_session_loop_mt(se, clone_fd) fuse_session_loop_mt_31(se, clone_fd)
 #else
-int fuse_session_loop_mt(struct fuse_session *se, struct fuse_loop_config *config);
+int fuse_session_loop_mt(struct fuse_session *se,
+                         struct fuse_loop_config *config);
 #endif
 
 /**
@@ -2034,9 +2035,9 @@ void fuse_session_unmount(struct fuse_session *se);
  */
 void fuse_session_destroy(struct fuse_session *se);
 
-/* ----------------------------------------------------------- *
- * Custom event loop support                                   *
- * ----------------------------------------------------------- */
+/*
+ * Custom event loop support
+ */
 
 /**
  * Return file descriptor for communication with kernel.
@@ -2063,7 +2064,7 @@ int fuse_session_fd(struct fuse_session *se);
  * @param buf the fuse_buf containing the request
  */
 void fuse_session_process_buf(struct fuse_session *se,
-			      const struct fuse_buf *buf);
+                              const struct fuse_buf *buf);
 
 /**
  * Read a raw request from the kernel into the supplied buffer.
diff --git a/tools/virtiofsd/fuse_misc.h b/tools/virtiofsd/fuse_misc.h
index 2f6663ed7d..f252baa752 100644
--- a/tools/virtiofsd/fuse_misc.h
+++ b/tools/virtiofsd/fuse_misc.h
@@ -1,18 +1,18 @@
 /*
-  FUSE: Filesystem in Userspace
-  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
-
-  This program can be distributed under the terms of the GNU LGPLv2.
-  See the file COPYING.LIB
-*/
+ * FUSE: Filesystem in Userspace
+ * Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
+ *
+ * This program can be distributed under the terms of the GNU LGPLv2.
+ * See the file COPYING.LIB
+ */
 
 #include <pthread.h>
 
 /*
-  Versioned symbols cannot be used in some cases because it
-    - confuse the dynamic linker in uClibc
-    - not supported on MacOSX (in MachO binary format)
-*/
+ * Versioned symbols cannot be used in some cases because it
+ *   - confuse the dynamic linker in uClibc
+ *   - not supported on MacOSX (in MachO binary format)
+ */
 #if (!defined(__UCLIBC__) && !defined(__APPLE__))
 #define FUSE_SYMVER(x) __asm__(x)
 #else
@@ -25,11 +25,11 @@
 /* Is this hack still needed? */
 static inline void fuse_mutex_init(pthread_mutex_t *mut)
 {
-	pthread_mutexattr_t attr;
-	pthread_mutexattr_init(&attr);
-	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
-	pthread_mutex_init(mut, &attr);
-	pthread_mutexattr_destroy(&attr);
+    pthread_mutexattr_t attr;
+    pthread_mutexattr_init(&attr);
+    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
+    pthread_mutex_init(mut, &attr);
+    pthread_mutexattr_destroy(&attr);
 }
 #endif
 
diff --git a/tools/virtiofsd/fuse_opt.c b/tools/virtiofsd/fuse_opt.c
index 93066b926e..edd36f4a3b 100644
--- a/tools/virtiofsd/fuse_opt.c
+++ b/tools/virtiofsd/fuse_opt.c
@@ -1,423 +1,450 @@
 /*
-  FUSE: Filesystem in Userspace
-  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
-
-  Implementation of option parsing routines (dealing with `struct
-  fuse_args`).
-
-  This program can be distributed under the terms of the GNU LGPLv2.
-  See the file COPYING.LIB
-*/
+ * FUSE: Filesystem in Userspace
+ * Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
+ *
+ * Implementation of option parsing routines (dealing with `struct
+ * fuse_args`).
+ *
+ * This program can be distributed under the terms of the GNU LGPLv2.
+ * See the file COPYING.LIB
+ */
 
+#include "fuse_opt.h"
 #include "config.h"
 #include "fuse_i.h"
-#include "fuse_opt.h"
 #include "fuse_misc.h"
 
+#include <assert.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
-#include <assert.h>
 
 struct fuse_opt_context {
-	void *data;
-	const struct fuse_opt *opt;
-	fuse_opt_proc_t proc;
-	int argctr;
-	int argc;
-	char **argv;
-	struct fuse_args outargs;
-	char *opts;
-	int nonopt;
+    void *data;
+    const struct fuse_opt *opt;
+    fuse_opt_proc_t proc;
+    int argctr;
+    int argc;
+    char **argv;
+    struct fuse_args outargs;
+    char *opts;
+    int nonopt;
 };
 
 void fuse_opt_free_args(struct fuse_args *args)
 {
-	if (args) {
-		if (args->argv && args->allocated) {
-			int i;
-			for (i = 0; i < args->argc; i++)
-				free(args->argv[i]);
-			free(args->argv);
-		}
-		args->argc = 0;
-		args->argv = NULL;
-		args->allocated = 0;
-	}
+    if (args) {
+        if (args->argv && args->allocated) {
+            int i;
+            for (i = 0; i < args->argc; i++) {
+                free(args->argv[i]);
+            }
+            free(args->argv);
+        }
+        args->argc = 0;
+        args->argv = NULL;
+        args->allocated = 0;
+    }
 }
 
 static int alloc_failed(void)
 {
-	fuse_log(FUSE_LOG_ERR, "fuse: memory allocation failed\n");
-	return -1;
+    fuse_log(FUSE_LOG_ERR, "fuse: memory allocation failed\n");
+    return -1;
 }
 
 int fuse_opt_add_arg(struct fuse_args *args, const char *arg)
 {
-	char **newargv;
-	char *newarg;
-
-	assert(!args->argv || args->allocated);
-
-	newarg = strdup(arg);
-	if (!newarg)
-		return alloc_failed();
-
-	newargv = realloc(args->argv, (args->argc + 2) * sizeof(char *));
-	if (!newargv) {
-		free(newarg);
-		return alloc_failed();
-	}
-
-	args->argv = newargv;
-	args->allocated = 1;
-	args->argv[args->argc++] = newarg;
-	args->argv[args->argc] = NULL;
-	return 0;
+    char **newargv;
+    char *newarg;
+
+    assert(!args->argv || args->allocated);
+
+    newarg = strdup(arg);
+    if (!newarg) {
+        return alloc_failed();
+    }
+
+    newargv = realloc(args->argv, (args->argc + 2) * sizeof(char *));
+    if (!newargv) {
+        free(newarg);
+        return alloc_failed();
+    }
+
+    args->argv = newargv;
+    args->allocated = 1;
+    args->argv[args->argc++] = newarg;
+    args->argv[args->argc] = NULL;
+    return 0;
 }
 
 static int fuse_opt_insert_arg_common(struct fuse_args *args, int pos,
-				      const char *arg)
+                                      const char *arg)
 {
-	assert(pos <= args->argc);
-	if (fuse_opt_add_arg(args, arg) == -1)
-		return -1;
-
-	if (pos != args->argc - 1) {
-		char *newarg = args->argv[args->argc - 1];
-		memmove(&args->argv[pos + 1], &args->argv[pos],
-			sizeof(char *) * (args->argc - pos - 1));
-		args->argv[pos] = newarg;
-	}
-	return 0;
+    assert(pos <= args->argc);
+    if (fuse_opt_add_arg(args, arg) == -1) {
+        return -1;
+    }
+
+    if (pos != args->argc - 1) {
+        char *newarg = args->argv[args->argc - 1];
+        memmove(&args->argv[pos + 1], &args->argv[pos],
+                sizeof(char *) * (args->argc - pos - 1));
+        args->argv[pos] = newarg;
+    }
+    return 0;
 }
 
 int fuse_opt_insert_arg(struct fuse_args *args, int pos, const char *arg)
 {
-	return fuse_opt_insert_arg_common(args, pos, arg);
+    return fuse_opt_insert_arg_common(args, pos, arg);
 }
 
 static int next_arg(struct fuse_opt_context *ctx, const char *opt)
 {
-	if (ctx->argctr + 1 >= ctx->argc) {
-		fuse_log(FUSE_LOG_ERR, "fuse: missing argument after `%s'\n", opt);
-		return -1;
-	}
-	ctx->argctr++;
-	return 0;
+    if (ctx->argctr + 1 >= ctx->argc) {
+        fuse_log(FUSE_LOG_ERR, "fuse: missing argument after `%s'\n", opt);
+        return -1;
+    }
+    ctx->argctr++;
+    return 0;
 }
 
 static int add_arg(struct fuse_opt_context *ctx, const char *arg)
 {
-	return fuse_opt_add_arg(&ctx->outargs, arg);
+    return fuse_opt_add_arg(&ctx->outargs, arg);
 }
 
 static int add_opt_common(char **opts, const char *opt, int esc)
 {
-	unsigned oldlen = *opts ? strlen(*opts) : 0;
-	char *d = realloc(*opts, oldlen + 1 + strlen(opt) * 2 + 1);
-
-	if (!d)
-		return alloc_failed();
-
-	*opts = d;
-	if (oldlen) {
-		d += oldlen;
-		*d++ = ',';
-	}
-
-	for (; *opt; opt++) {
-		if (esc && (*opt == ',' || *opt == '\\'))
-			*d++ = '\\';
-		*d++ = *opt;
-	}
-	*d = '\0';
-
-	return 0;
+    unsigned oldlen = *opts ? strlen(*opts) : 0;
+    char *d = realloc(*opts, oldlen + 1 + strlen(opt) * 2 + 1);
+
+    if (!d) {
+        return alloc_failed();
+    }
+
+    *opts = d;
+    if (oldlen) {
+        d += oldlen;
+        *d++ = ',';
+    }
+
+    for (; *opt; opt++) {
+        if (esc && (*opt == ',' || *opt == '\\')) {
+            *d++ = '\\';
+        }
+        *d++ = *opt;
+    }
+    *d = '\0';
+
+    return 0;
 }
 
 int fuse_opt_add_opt(char **opts, const char *opt)
 {
-	return add_opt_common(opts, opt, 0);
+    return add_opt_common(opts, opt, 0);
 }
 
 int fuse_opt_add_opt_escaped(char **opts, const char *opt)
 {
-	return add_opt_common(opts, opt, 1);
+    return add_opt_common(opts, opt, 1);
 }
 
 static int add_opt(struct fuse_opt_context *ctx, const char *opt)
 {
-	return add_opt_common(&ctx->opts, opt, 1);
+    return add_opt_common(&ctx->opts, opt, 1);
 }
 
 static int call_proc(struct fuse_opt_context *ctx, const char *arg, int key,
-		     int iso)
+                     int iso)
 {
-	if (key == FUSE_OPT_KEY_DISCARD)
-		return 0;
-
-	if (key != FUSE_OPT_KEY_KEEP && ctx->proc) {
-		int res = ctx->proc(ctx->data, arg, key, &ctx->outargs);
-		if (res == -1 || !res)
-			return res;
-	}
-	if (iso)
-		return add_opt(ctx, arg);
-	else
-		return add_arg(ctx, arg);
+    if (key == FUSE_OPT_KEY_DISCARD) {
+        return 0;
+    }
+
+    if (key != FUSE_OPT_KEY_KEEP && ctx->proc) {
+        int res = ctx->proc(ctx->data, arg, key, &ctx->outargs);
+        if (res == -1 || !res) {
+            return res;
+        }
+    }
+    if (iso) {
+        return add_opt(ctx, arg);
+    } else {
+        return add_arg(ctx, arg);
+    }
 }
 
 static int match_template(const char *t, const char *arg, unsigned *sepp)
 {
-	int arglen = strlen(arg);
-	const char *sep = strchr(t, '=');
-	sep = sep ? sep : strchr(t, ' ');
-	if (sep && (!sep[1] || sep[1] == '%')) {
-		int tlen = sep - t;
-		if (sep[0] == '=')
-			tlen ++;
-		if (arglen >= tlen && strncmp(arg, t, tlen) == 0) {
-			*sepp = sep - t;
-			return 1;
-		}
-	}
-	if (strcmp(t, arg) == 0) {
-		*sepp = 0;
-		return 1;
-	}
-	return 0;
+    int arglen = strlen(arg);
+    const char *sep = strchr(t, '=');
+    sep = sep ? sep : strchr(t, ' ');
+    if (sep && (!sep[1] || sep[1] == '%')) {
+        int tlen = sep - t;
+        if (sep[0] == '=') {
+            tlen++;
+        }
+        if (arglen >= tlen && strncmp(arg, t, tlen) == 0) {
+            *sepp = sep - t;
+            return 1;
+        }
+    }
+    if (strcmp(t, arg) == 0) {
+        *sepp = 0;
+        return 1;
+    }
+    return 0;
 }
 
 static const struct fuse_opt *find_opt(const struct fuse_opt *opt,
-				       const char *arg, unsigned *sepp)
+                                       const char *arg, unsigned *sepp)
 {
-	for (; opt && opt->templ; opt++)
-		if (match_template(opt->templ, arg, sepp))
-			return opt;
-	return NULL;
+    for (; opt && opt->templ; opt++) {
+        if (match_template(opt->templ, arg, sepp)) {
+            return opt;
+        }
+    }
+    return NULL;
 }
 
 int fuse_opt_match(const struct fuse_opt *opts, const char *opt)
 {
-	unsigned dummy;
-	return find_opt(opts, opt, &dummy) ? 1 : 0;
+    unsigned dummy;
+    return find_opt(opts, opt, &dummy) ? 1 : 0;
 }
 
 static int process_opt_param(void *var, const char *format, const char *param,
-			     const char *arg)
+                             const char *arg)
 {
-	assert(format[0] == '%');
-	if (format[1] == 's') {
-		char **s = var;
-		char *copy = strdup(param);
-		if (!copy)
-			return alloc_failed();
-
-		free(*s);
-		*s = copy;
-	} else {
-		if (sscanf(param, format, var) != 1) {
-			fuse_log(FUSE_LOG_ERR, "fuse: invalid parameter in option `%s'\n", arg);
-			return -1;
-		}
-	}
-	return 0;
+    assert(format[0] == '%');
+    if (format[1] == 's') {
+        char **s = var;
+        char *copy = strdup(param);
+        if (!copy) {
+            return alloc_failed();
+        }
+
+        free(*s);
+        *s = copy;
+    } else {
+        if (sscanf(param, format, var) != 1) {
+            fuse_log(FUSE_LOG_ERR, "fuse: invalid parameter in option `%s'\n",
+                     arg);
+            return -1;
+        }
+    }
+    return 0;
 }
 
-static int process_opt(struct fuse_opt_context *ctx,
-		       const struct fuse_opt *opt, unsigned sep,
-		       const char *arg, int iso)
+static int process_opt(struct fuse_opt_context *ctx, const struct fuse_opt *opt,
+                       unsigned sep, const char *arg, int iso)
 {
-	if (opt->offset == -1U) {
-		if (call_proc(ctx, arg, opt->value, iso) == -1)
-			return -1;
-	} else {
-		void *var = (char *)ctx->data + opt->offset;
-		if (sep && opt->templ[sep + 1]) {
-			const char *param = arg + sep;
-			if (opt->templ[sep] == '=')
-				param ++;
-			if (process_opt_param(var, opt->templ + sep + 1,
-					      param, arg) == -1)
-				return -1;
-		} else
-			*(int *)var = opt->value;
-	}
-	return 0;
+    if (opt->offset == -1U) {
+        if (call_proc(ctx, arg, opt->value, iso) == -1) {
+            return -1;
+        }
+    } else {
+        void *var = (char *)ctx->data + opt->offset;
+        if (sep && opt->templ[sep + 1]) {
+            const char *param = arg + sep;
+            if (opt->templ[sep] == '=') {
+                param++;
+            }
+            if (process_opt_param(var, opt->templ + sep + 1, param, arg) ==
+                -1) {
+                return -1;
+            }
+        } else {
+            *(int *)var = opt->value;
+        }
+    }
+    return 0;
 }
 
 static int process_opt_sep_arg(struct fuse_opt_context *ctx,
-			       const struct fuse_opt *opt, unsigned sep,
-			       const char *arg, int iso)
+                               const struct fuse_opt *opt, unsigned sep,
+                               const char *arg, int iso)
 {
-	int res;
-	char *newarg;
-	char *param;
-
-	if (next_arg(ctx, arg) == -1)
-		return -1;
-
-	param = ctx->argv[ctx->argctr];
-	newarg = malloc(sep + strlen(param) + 1);
-	if (!newarg)
-		return alloc_failed();
-
-	memcpy(newarg, arg, sep);
-	strcpy(newarg + sep, param);
-	res = process_opt(ctx, opt, sep, newarg, iso);
-	free(newarg);
-
-	return res;
+    int res;
+    char *newarg;
+    char *param;
+
+    if (next_arg(ctx, arg) == -1) {
+        return -1;
+    }
+
+    param = ctx->argv[ctx->argctr];
+    newarg = malloc(sep + strlen(param) + 1);
+    if (!newarg) {
+        return alloc_failed();
+    }
+
+    memcpy(newarg, arg, sep);
+    strcpy(newarg + sep, param);
+    res = process_opt(ctx, opt, sep, newarg, iso);
+    free(newarg);
+
+    return res;
 }
 
 static int process_gopt(struct fuse_opt_context *ctx, const char *arg, int iso)
 {
-	unsigned sep;
-	const struct fuse_opt *opt = find_opt(ctx->opt, arg, &sep);
-	if (opt) {
-		for (; opt; opt = find_opt(opt + 1, arg, &sep)) {
-			int res;
-			if (sep && opt->templ[sep] == ' ' && !arg[sep])
-				res = process_opt_sep_arg(ctx, opt, sep, arg,
-							  iso);
-			else
-				res = process_opt(ctx, opt, sep, arg, iso);
-			if (res == -1)
-				return -1;
-		}
-		return 0;
-	} else
-		return call_proc(ctx, arg, FUSE_OPT_KEY_OPT, iso);
+    unsigned sep;
+    const struct fuse_opt *opt = find_opt(ctx->opt, arg, &sep);
+    if (opt) {
+        for (; opt; opt = find_opt(opt + 1, arg, &sep)) {
+            int res;
+            if (sep && opt->templ[sep] == ' ' && !arg[sep]) {
+                res = process_opt_sep_arg(ctx, opt, sep, arg, iso);
+            } else {
+                res = process_opt(ctx, opt, sep, arg, iso);
+            }
+            if (res == -1) {
+                return -1;
+            }
+        }
+        return 0;
+    } else {
+        return call_proc(ctx, arg, FUSE_OPT_KEY_OPT, iso);
+    }
 }
 
 static int process_real_option_group(struct fuse_opt_context *ctx, char *opts)
 {
-	char *s = opts;
-	char *d = s;
-	int end = 0;
-
-	while (!end) {
-		if (*s == '\0')
-			end = 1;
-		if (*s == ',' || end) {
-			int res;
-
-			*d = '\0';
-			res = process_gopt(ctx, opts, 1);
-			if (res == -1)
-				return -1;
-			d = opts;
-		} else {
-			if (s[0] == '\\' && s[1] != '\0') {
-				s++;
-				if (s[0] >= '0' && s[0] <= '3' &&
-				    s[1] >= '0' && s[1] <= '7' &&
-				    s[2] >= '0' && s[2] <= '7') {
-					*d++ = (s[0] - '0') * 0100 +
-						(s[1] - '0') * 0010 +
-						(s[2] - '0');
-					s += 2;
-				} else {
-					*d++ = *s;
-				}
-			} else {
-				*d++ = *s;
-			}
-		}
-		s++;
-	}
-
-	return 0;
+    char *s = opts;
+    char *d = s;
+    int end = 0;
+
+    while (!end) {
+        if (*s == '\0') {
+            end = 1;
+        }
+        if (*s == ',' || end) {
+            int res;
+
+            *d = '\0';
+            res = process_gopt(ctx, opts, 1);
+            if (res == -1) {
+                return -1;
+            }
+            d = opts;
+        } else {
+            if (s[0] == '\\' && s[1] != '\0') {
+                s++;
+                if (s[0] >= '0' && s[0] <= '3' && s[1] >= '0' && s[1] <= '7' &&
+                    s[2] >= '0' && s[2] <= '7') {
+                    *d++ = (s[0] - '0') * 0100 + (s[1] - '0') * 0010 +
+                           (s[2] - '0');
+                    s += 2;
+                } else {
+                    *d++ = *s;
+                }
+            } else {
+                *d++ = *s;
+            }
+        }
+        s++;
+    }
+
+    return 0;
 }
 
 static int process_option_group(struct fuse_opt_context *ctx, const char *opts)
 {
-	int res;
-	char *copy = strdup(opts);
-
-	if (!copy) {
-		fuse_log(FUSE_LOG_ERR, "fuse: memory allocation failed\n");
-		return -1;
-	}
-	res = process_real_option_group(ctx, copy);
-	free(copy);
-	return res;
+    int res;
+    char *copy = strdup(opts);
+
+    if (!copy) {
+        fuse_log(FUSE_LOG_ERR, "fuse: memory allocation failed\n");
+        return -1;
+    }
+    res = process_real_option_group(ctx, copy);
+    free(copy);
+    return res;
 }
 
 static int process_one(struct fuse_opt_context *ctx, const char *arg)
 {
-	if (ctx->nonopt || arg[0] != '-')
-		return call_proc(ctx, arg, FUSE_OPT_KEY_NONOPT, 0);
-	else if (arg[1] == 'o') {
-		if (arg[2])
-			return process_option_group(ctx, arg + 2);
-		else {
-			if (next_arg(ctx, arg) == -1)
-				return -1;
-
-			return process_option_group(ctx,
-						    ctx->argv[ctx->argctr]);
-		}
-	} else if (arg[1] == '-' && !arg[2]) {
-		if (add_arg(ctx, arg) == -1)
-			return -1;
-		ctx->nonopt = ctx->outargs.argc;
-		return 0;
-	} else
-		return process_gopt(ctx, arg, 0);
+    if (ctx->nonopt || arg[0] != '-') {
+        return call_proc(ctx, arg, FUSE_OPT_KEY_NONOPT, 0);
+    } else if (arg[1] == 'o') {
+        if (arg[2]) {
+            return process_option_group(ctx, arg + 2);
+        } else {
+            if (next_arg(ctx, arg) == -1) {
+                return -1;
+            }
+
+            return process_option_group(ctx, ctx->argv[ctx->argctr]);
+        }
+    } else if (arg[1] == '-' && !arg[2]) {
+        if (add_arg(ctx, arg) == -1) {
+            return -1;
+        }
+        ctx->nonopt = ctx->outargs.argc;
+        return 0;
+    } else {
+        return process_gopt(ctx, arg, 0);
+    }
 }
 
 static int opt_parse(struct fuse_opt_context *ctx)
 {
-	if (ctx->argc) {
-		if (add_arg(ctx, ctx->argv[0]) == -1)
-			return -1;
-	}
-
-	for (ctx->argctr = 1; ctx->argctr < ctx->argc; ctx->argctr++)
-		if (process_one(ctx, ctx->argv[ctx->argctr]) == -1)
-			return -1;
-
-	if (ctx->opts) {
-		if (fuse_opt_insert_arg(&ctx->outargs, 1, "-o") == -1 ||
-		    fuse_opt_insert_arg(&ctx->outargs, 2, ctx->opts) == -1)
-			return -1;
-	}
-
-	/* If option separator ("--") is the last argument, remove it */
-	if (ctx->nonopt && ctx->nonopt == ctx->outargs.argc &&
-	    strcmp(ctx->outargs.argv[ctx->outargs.argc - 1], "--") == 0) {
-		free(ctx->outargs.argv[ctx->outargs.argc - 1]);
-		ctx->outargs.argv[--ctx->outargs.argc] = NULL;
-	}
-
-	return 0;
+    if (ctx->argc) {
+        if (add_arg(ctx, ctx->argv[0]) == -1) {
+            return -1;
+        }
+    }
+
+    for (ctx->argctr = 1; ctx->argctr < ctx->argc; ctx->argctr++) {
+        if (process_one(ctx, ctx->argv[ctx->argctr]) == -1) {
+            return -1;
+        }
+    }
+
+    if (ctx->opts) {
+        if (fuse_opt_insert_arg(&ctx->outargs, 1, "-o") == -1 ||
+            fuse_opt_insert_arg(&ctx->outargs, 2, ctx->opts) == -1) {
+            return -1;
+        }
+    }
+
+    /* If option separator ("--") is the last argument, remove it */
+    if (ctx->nonopt && ctx->nonopt == ctx->outargs.argc &&
+        strcmp(ctx->outargs.argv[ctx->outargs.argc - 1], "--") == 0) {
+        free(ctx->outargs.argv[ctx->outargs.argc - 1]);
+        ctx->outargs.argv[--ctx->outargs.argc] = NULL;
+    }
+
+    return 0;
 }
 
 int fuse_opt_parse(struct fuse_args *args, void *data,
-		   const struct fuse_opt opts[], fuse_opt_proc_t proc)
+                   const struct fuse_opt opts[], fuse_opt_proc_t proc)
 {
-	int res;
-	struct fuse_opt_context ctx = {
-		.data = data,
-		.opt = opts,
-		.proc = proc,
-	};
-
-	if (!args || !args->argv || !args->argc)
-		return 0;
-
-	ctx.argc = args->argc;
-	ctx.argv = args->argv;
-
-	res = opt_parse(&ctx);
-	if (res != -1) {
-		struct fuse_args tmp = *args;
-		*args = ctx.outargs;
-		ctx.outargs = tmp;
-	}
-	free(ctx.opts);
-	fuse_opt_free_args(&ctx.outargs);
-	return res;
+    int res;
+    struct fuse_opt_context ctx = {
+        .data = data,
+        .opt = opts,
+        .proc = proc,
+    };
+
+    if (!args || !args->argv || !args->argc) {
+        return 0;
+    }
+
+    ctx.argc = args->argc;
+    ctx.argv = args->argv;
+
+    res = opt_parse(&ctx);
+    if (res != -1) {
+        struct fuse_args tmp = *args;
+        *args = ctx.outargs;
+        ctx.outargs = tmp;
+    }
+    free(ctx.opts);
+    fuse_opt_free_args(&ctx.outargs);
+    return res;
 }
diff --git a/tools/virtiofsd/fuse_opt.h b/tools/virtiofsd/fuse_opt.h
index 69102555be..8f59b4d301 100644
--- a/tools/virtiofsd/fuse_opt.h
+++ b/tools/virtiofsd/fuse_opt.h
@@ -1,10 +1,10 @@
 /*
-  FUSE: Filesystem in Userspace
-  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
-
-  This program can be distributed under the terms of the GNU LGPLv2.
-  See the file COPYING.LIB.
-*/
+ * FUSE: Filesystem in Userspace
+ * Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
+ *
+ * This program can be distributed under the terms of the GNU LGPLv2.
+ * See the file COPYING.LIB.
+ */
 
 #ifndef FUSE_OPT_H_
 #define FUSE_OPT_H_
@@ -37,7 +37,7 @@
  *
  *  - 'offsetof(struct foo, member)'  actions i) and iii)
  *
- *  - -1			      action ii)
+ *  - -1                              action ii)
  *
  * The 'offsetof()' macro is defined in the <stddef.h> header.
  *
@@ -48,7 +48,7 @@
  *
  * The types of templates are:
  *
- * 1) "-x", "-foo", "--foo", "--foo-bar", etc.	These match only
+ * 1) "-x", "-foo", "--foo", "--foo-bar", etc. These match only
  *   themselves.  Invalid values are "--" and anything beginning
  *   with "-o"
  *
@@ -71,58 +71,67 @@
  * freed.
  */
 struct fuse_opt {
-	/** Matching template and optional parameter formatting */
-	const char *templ;
+    /** Matching template and optional parameter formatting */
+    const char *templ;
 
-	/**
-	 * Offset of variable within 'data' parameter of fuse_opt_parse()
-	 * or -1
-	 */
-	unsigned long offset;
+    /**
+     * Offset of variable within 'data' parameter of fuse_opt_parse()
+     * or -1
+     */
+    unsigned long offset;
 
-	/**
-	 * Value to set the variable to, or to be passed as 'key' to the
-	 * processing function.	 Ignored if template has a format
-	 */
-	int value;
+    /**
+     * Value to set the variable to, or to be passed as 'key' to the
+     * processing function. Ignored if template has a format
+     */
+    int value;
 };
 
 /**
- * Key option.	In case of a match, the processing function will be
+ * Key option. In case of a match, the processing function will be
  * called with the specified key.
  */
-#define FUSE_OPT_KEY(templ, key) { templ, -1U, key }
+#define FUSE_OPT_KEY(templ, key) \
+    {                            \
+        templ, -1U, key          \
+    }
 
 /**
- * Last option.	 An array of 'struct fuse_opt' must end with a NULL
+ * Last option. An array of 'struct fuse_opt' must end with a NULL
  * template value
  */
-#define FUSE_OPT_END { NULL, 0, 0 }
+#define FUSE_OPT_END \
+    {                \
+        NULL, 0, 0   \
+    }
 
 /**
  * Argument list
  */
 struct fuse_args {
-	/** Argument count */
-	int argc;
+    /** Argument count */
+    int argc;
 
-	/** Argument vector.  NULL terminated */
-	char **argv;
+    /** Argument vector.  NULL terminated */
+    char **argv;
 
-	/** Is 'argv' allocated? */
-	int allocated;
+    /** Is 'argv' allocated? */
+    int allocated;
 };
 
 /**
  * Initializer for 'struct fuse_args'
  */
-#define FUSE_ARGS_INIT(argc, argv) { argc, argv, 0 }
+#define FUSE_ARGS_INIT(argc, argv) \
+    {                              \
+        argc, argv, 0              \
+    }
 
 /**
  * Key value passed to the processing function if an option did not
  * match any template
  */
-#define FUSE_OPT_KEY_OPT     -1
+#define FUSE_OPT_KEY_OPT -1
 
 /**
  * Key value passed to the processing function for all non-options
@@ -130,7 +139,7 @@ struct fuse_args {
  * Non-options are the arguments beginning with a character other than
  * '-' or all arguments after the special '--' option
  */
-#define FUSE_OPT_KEY_NONOPT  -2
+#define FUSE_OPT_KEY_NONOPT -2
 
 /**
  * Special key value for options to keep
@@ -174,7 +183,7 @@ struct fuse_args {
  * @return -1 on error, 0 if arg is to be discarded, 1 if arg should be kept
  */
 typedef int (*fuse_opt_proc_t)(void *data, const char *arg, int key,
-			       struct fuse_args *outargs);
+                               struct fuse_args *outargs);
 
 /**
  * Option parsing function
@@ -197,7 +206,7 @@ typedef int (*fuse_opt_proc_t)(void *data, const char *arg, int key,
  * @return -1 on error, 0 on success
  */
 int fuse_opt_parse(struct fuse_args *args, void *data,
-		   const struct fuse_opt opts[], fuse_opt_proc_t proc);
+                   const struct fuse_opt opts[], fuse_opt_proc_t proc);
 
 /**
  * Add an option to a comma separated option list
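A note for readers of the fuse_opt.h hunks above: each 'struct fuse_opt'
entry pairs a template string with the offset of a variable inside the
'data' argument and a value to store there. The sketch below illustrates
that offset-table idea in isolation. It is not libfuse's parser: the
demo_* names are hypothetical, only exact-match templates are handled,
and it assumes every matching entry is applied (which is what listing
"-d" twice in a table relies on).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for 'struct fuse_opt'; not the libfuse type. */
struct demo_opt {
    const char *templ;    /* exact option text; NULL ends the table */
    unsigned long offset; /* offset of an int field within 'data' */
    int value;            /* value stored in that field on a match */
};

#define DEMO_OPT_END \
    {                \
        NULL, 0, 0   \
    }

/* Hypothetical settings structure filled in through the table. */
struct demo_settings {
    int debug;
    int foreground;
};

static const struct demo_opt demo_opts[] = {
    { "-d", offsetof(struct demo_settings, debug), 1 },
    { "-d", offsetof(struct demo_settings, foreground), 1 },
    { "-f", offsetof(struct demo_settings, foreground), 1 },
    DEMO_OPT_END
};

/*
 * Apply every table entry whose template equals 'arg' by writing
 * 'value' at 'offset' bytes into 'data'.
 * Returns the number of entries that matched.
 */
static int demo_apply(const struct demo_opt *opts, void *data, const char *arg)
{
    int matched = 0;

    assert(data != NULL);
    for (; opts->templ != NULL; opts++) {
        if (strcmp(opts->templ, arg) == 0) {
            int *field = (int *)((char *)data + opts->offset);
            *field = opts->value;
            matched++;
        }
    }
    return matched;
}
```

Calling demo_apply(demo_opts, &settings, "-d") on a zeroed
'struct demo_settings' sets both 'debug' and 'foreground', while "-f"
sets only 'foreground'.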
diff --git a/tools/virtiofsd/fuse_signals.c b/tools/virtiofsd/fuse_signals.c
index 4271947bd4..19d6791cb9 100644
--- a/tools/virtiofsd/fuse_signals.c
+++ b/tools/virtiofsd/fuse_signals.c
@@ -1,91 +1,95 @@
 /*
-  FUSE: Filesystem in Userspace
-  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
-
-  Utility functions for setting signal handlers.
-
-  This program can be distributed under the terms of the GNU LGPLv2.
-  See the file COPYING.LIB
-*/
+ * FUSE: Filesystem in Userspace
+ * Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
+ *
+ * Utility functions for setting signal handlers.
+ *
+ * This program can be distributed under the terms of the GNU LGPLv2.
+ * See the file COPYING.LIB
+ */
 
 #include "config.h"
-#include "fuse_lowlevel.h"
 #include "fuse_i.h"
+#include "fuse_lowlevel.h"
 
-#include <stdio.h>
-#include <string.h>
 #include <signal.h>
+#include <stdio.h>
 #include <stdlib.h>
+#include <string.h>
 
 static struct fuse_session *fuse_instance;
 
 static void exit_handler(int sig)
 {
-	if (fuse_instance) {
-		fuse_session_exit(fuse_instance);
-		if(sig <= 0) {
-			fuse_log(FUSE_LOG_ERR, "assertion error: signal value <= 0\n");
-			abort();
-		}
-		fuse_instance->error = sig;
-	}
+    if (fuse_instance) {
+        fuse_session_exit(fuse_instance);
+        if (sig <= 0) {
+            fuse_log(FUSE_LOG_ERR, "assertion error: signal value <= 0\n");
+            abort();
+        }
+        fuse_instance->error = sig;
+    }
 }
 
 static void do_nothing(int sig)
 {
-	(void) sig;
+    (void)sig;
 }
 
 static int set_one_signal_handler(int sig, void (*handler)(int), int remove)
 {
-	struct sigaction sa;
-	struct sigaction old_sa;
+    struct sigaction sa;
+    struct sigaction old_sa;
 
-	memset(&sa, 0, sizeof(struct sigaction));
-	sa.sa_handler = remove ? SIG_DFL : handler;
-	sigemptyset(&(sa.sa_mask));
-	sa.sa_flags = 0;
+    memset(&sa, 0, sizeof(struct sigaction));
+    sa.sa_handler = remove ? SIG_DFL : handler;
+    sigemptyset(&(sa.sa_mask));
+    sa.sa_flags = 0;
 
-	if (sigaction(sig, NULL, &old_sa) == -1) {
-		perror("fuse: cannot get old signal handler");
-		return -1;
-	}
+    if (sigaction(sig, NULL, &old_sa) == -1) {
+        perror("fuse: cannot get old signal handler");
+        return -1;
+    }
 
-	if (old_sa.sa_handler == (remove ? handler : SIG_DFL) &&
-	    sigaction(sig, &sa, NULL) == -1) {
-		perror("fuse: cannot set signal handler");
-		return -1;
-	}
-	return 0;
+    if (old_sa.sa_handler == (remove ? handler : SIG_DFL) &&
+        sigaction(sig, &sa, NULL) == -1) {
+        perror("fuse: cannot set signal handler");
+        return -1;
+    }
+    return 0;
 }
 
 int fuse_set_signal_handlers(struct fuse_session *se)
 {
-	/* If we used SIG_IGN instead of the do_nothing function,
-	   then we would be unable to tell if we set SIG_IGN (and
-	   thus should reset to SIG_DFL in fuse_remove_signal_handlers)
-	   or if it was already set to SIG_IGN (and should be left
-	   untouched. */
-	if (set_one_signal_handler(SIGHUP, exit_handler, 0) == -1 ||
-	    set_one_signal_handler(SIGINT, exit_handler, 0) == -1 ||
-	    set_one_signal_handler(SIGTERM, exit_handler, 0) == -1 ||
-	    set_one_signal_handler(SIGPIPE, do_nothing, 0) == -1)
-		return -1;
+    /*
+     * If we used SIG_IGN instead of the do_nothing function,
+     * then we would be unable to tell if we set SIG_IGN (and
+     * thus should reset to SIG_DFL in fuse_remove_signal_handlers)
+     * or if it was already set to SIG_IGN (and should be left
+     * untouched).
+     */
+    if (set_one_signal_handler(SIGHUP, exit_handler, 0) == -1 ||
+        set_one_signal_handler(SIGINT, exit_handler, 0) == -1 ||
+        set_one_signal_handler(SIGTERM, exit_handler, 0) == -1 ||
+        set_one_signal_handler(SIGPIPE, do_nothing, 0) == -1) {
+        return -1;
+    }
 
-	fuse_instance = se;
-	return 0;
+    fuse_instance = se;
+    return 0;
 }
 
 void fuse_remove_signal_handlers(struct fuse_session *se)
 {
-	if (fuse_instance != se)
-		fuse_log(FUSE_LOG_ERR,
-			"fuse: fuse_remove_signal_handlers: unknown session\n");
-	else
-		fuse_instance = NULL;
+    if (fuse_instance != se) {
+        fuse_log(FUSE_LOG_ERR,
+                 "fuse: fuse_remove_signal_handlers: unknown session\n");
+    } else {
+        fuse_instance = NULL;
+    }
 
-	set_one_signal_handler(SIGHUP, exit_handler, 1);
-	set_one_signal_handler(SIGINT, exit_handler, 1);
-	set_one_signal_handler(SIGTERM, exit_handler, 1);
-	set_one_signal_handler(SIGPIPE, do_nothing, 1);
+    set_one_signal_handler(SIGHUP, exit_handler, 1);
+    set_one_signal_handler(SIGINT, exit_handler, 1);
+    set_one_signal_handler(SIGTERM, exit_handler, 1);
+    set_one_signal_handler(SIGPIPE, do_nothing, 1);
 }
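A note on the handler logic in fuse_signals.c above:
set_one_signal_handler() installs a handler only when the signal is
still at SIG_DFL, and on removal restores SIG_DFL only when the given
handler is the one currently installed. A disposition chosen by someone
else (for example SIG_IGN) is therefore left alone in both directions.
The following is a minimal standalone sketch of that rule; the demo_*
names are hypothetical and error reporting is reduced to a return code.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical no-op handler, playing the role of do_nothing(). */
static void demo_handler(int sig)
{
    (void)sig;
}

/* Return the handler currently set for 'sig', or SIG_ERR on failure. */
static void (*demo_current(int sig))(int)
{
    struct sigaction old_sa;

    if (sigaction(sig, NULL, &old_sa) == -1) {
        return SIG_ERR;
    }
    return old_sa.sa_handler;
}

/*
 * remove == 0: install 'handler', but only if 'sig' is still SIG_DFL.
 * remove != 0: restore SIG_DFL, but only if 'handler' is installed.
 * Returns 0 on success (including "left untouched"), -1 on error.
 */
static int demo_set_one(int sig, void (*handler)(int), int remove)
{
    struct sigaction sa;
    void (*cur)(int);

    assert(sig > 0);
    cur = demo_current(sig);
    if (cur == SIG_ERR) {
        return -1;
    }

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = remove ? SIG_DFL : handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;

    if (cur == (remove ? handler : SIG_DFL) &&
        sigaction(sig, &sa, NULL) == -1) {
        return -1;
    }
    return 0;
}
```

With SIGUSR1 at its default, demo_set_one(SIGUSR1, demo_handler, 0)
installs the handler and a later call with remove set to 1 restores
SIG_DFL. If the signal was set to SIG_IGN beforehand, neither call
changes it.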
diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
index 511f981ce6..e077943558 100644
--- a/tools/virtiofsd/helper.c
+++ b/tools/virtiofsd/helper.c
@@ -1,302 +1,315 @@
 /*
-  FUSE: Filesystem in Userspace
-  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
+ * FUSE: Filesystem in Userspace
+ * Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
+ *
+ * Helper functions to create (simple) standalone programs. With the
+ * aid of these functions it should be possible to create full FUSE
+ * file system by implementing nothing but the request handlers.
-
+ *
-  Helper functions to create (simple) standalone programs. With the
-  aid of these functions it should be possible to create full FUSE
-  file system by implementing nothing but the request handlers.
-
-  This program can be distributed under the terms of the GNU LGPLv2.
-  See the file COPYING.LIB.
-*/
+ * This program can be distributed under the terms of the GNU LGPLv2.
+ * See the file COPYING.LIB.
+ */
 
 #include "config.h"
 #include "fuse_i.h"
+#include "fuse_lowlevel.h"
 #include "fuse_misc.h"
 #include "fuse_opt.h"
-#include "fuse_lowlevel.h"
 #include "mount_util.h"
 
+#include <errno.h>
+#include <limits.h>
+#include <stddef.h>
 #include <stdio.h>
 #include <stdlib.h>
-#include <stddef.h>
-#include <unistd.h>
 #include <string.h>
-#include <limits.h>
-#include <errno.h>
 #include <sys/param.h>
+#include <unistd.h>
 
-#define FUSE_HELPER_OPT(t, p) \
-	{ t, offsetof(struct fuse_cmdline_opts, p), 1 }
+#define FUSE_HELPER_OPT(t, p)                       \
+    {                                               \
+        t, offsetof(struct fuse_cmdline_opts, p), 1 \
+    }
 
 static const struct fuse_opt fuse_helper_opts[] = {
-	FUSE_HELPER_OPT("-h",		show_help),
-	FUSE_HELPER_OPT("--help",	show_help),
-	FUSE_HELPER_OPT("-V",		show_version),
-	FUSE_HELPER_OPT("--version",	show_version),
-	FUSE_HELPER_OPT("-d",		debug),
-	FUSE_HELPER_OPT("debug",	debug),
-	FUSE_HELPER_OPT("-d",		foreground),
-	FUSE_HELPER_OPT("debug",	foreground),
-	FUSE_OPT_KEY("-d",		FUSE_OPT_KEY_KEEP),
-	FUSE_OPT_KEY("debug",		FUSE_OPT_KEY_KEEP),
-	FUSE_HELPER_OPT("-f",		foreground),
-	FUSE_HELPER_OPT("-s",		singlethread),
-	FUSE_HELPER_OPT("fsname=",	nodefault_subtype),
-	FUSE_OPT_KEY("fsname=",		FUSE_OPT_KEY_KEEP),
-	FUSE_HELPER_OPT("subtype=",	nodefault_subtype),
-	FUSE_OPT_KEY("subtype=",	FUSE_OPT_KEY_KEEP),
-	FUSE_HELPER_OPT("clone_fd",	clone_fd),
-	FUSE_HELPER_OPT("max_idle_threads=%u", max_idle_threads),
-	FUSE_OPT_END
+    FUSE_HELPER_OPT("-h", show_help),
+    FUSE_HELPER_OPT("--help", show_help),
+    FUSE_HELPER_OPT("-V", show_version),
+    FUSE_HELPER_OPT("--version", show_version),
+    FUSE_HELPER_OPT("-d", debug),
+    FUSE_HELPER_OPT("debug", debug),
+    FUSE_HELPER_OPT("-d", foreground),
+    FUSE_HELPER_OPT("debug", foreground),
+    FUSE_OPT_KEY("-d", FUSE_OPT_KEY_KEEP),
+    FUSE_OPT_KEY("debug", FUSE_OPT_KEY_KEEP),
+    FUSE_HELPER_OPT("-f", foreground),
+    FUSE_HELPER_OPT("-s", singlethread),
+    FUSE_HELPER_OPT("fsname=", nodefault_subtype),
+    FUSE_OPT_KEY("fsname=", FUSE_OPT_KEY_KEEP),
+    FUSE_HELPER_OPT("subtype=", nodefault_subtype),
+    FUSE_OPT_KEY("subtype=", FUSE_OPT_KEY_KEEP),
+    FUSE_HELPER_OPT("clone_fd", clone_fd),
+    FUSE_HELPER_OPT("max_idle_threads=%u", max_idle_threads),
+    FUSE_OPT_END
 };
 
 struct fuse_conn_info_opts {
-	int atomic_o_trunc;
-	int no_remote_posix_lock;
-	int no_remote_flock;
-	int splice_write;
-	int splice_move;
-	int splice_read;
-	int no_splice_write;
-	int no_splice_move;
-	int no_splice_read;
-	int auto_inval_data;
-	int no_auto_inval_data;
-	int no_readdirplus;
-	int no_readdirplus_auto;
-	int async_dio;
-	int no_async_dio;
-	int writeback_cache;
-	int no_writeback_cache;
-	int async_read;
-	int sync_read;
-	unsigned max_write;
-	unsigned max_readahead;
-	unsigned max_background;
-	unsigned congestion_threshold;
-	unsigned time_gran;
-	int set_max_write;
-	int set_max_readahead;
-	int set_max_background;
-	int set_congestion_threshold;
-	int set_time_gran;
+    int atomic_o_trunc;
+    int no_remote_posix_lock;
+    int no_remote_flock;
+    int splice_write;
+    int splice_move;
+    int splice_read;
+    int no_splice_write;
+    int no_splice_move;
+    int no_splice_read;
+    int auto_inval_data;
+    int no_auto_inval_data;
+    int no_readdirplus;
+    int no_readdirplus_auto;
+    int async_dio;
+    int no_async_dio;
+    int writeback_cache;
+    int no_writeback_cache;
+    int async_read;
+    int sync_read;
+    unsigned max_write;
+    unsigned max_readahead;
+    unsigned max_background;
+    unsigned congestion_threshold;
+    unsigned time_gran;
+    int set_max_write;
+    int set_max_readahead;
+    int set_max_background;
+    int set_congestion_threshold;
+    int set_time_gran;
 };
 
-#define CONN_OPTION(t, p, v)					\
-	{ t, offsetof(struct fuse_conn_info_opts, p), v }
+#define CONN_OPTION(t, p, v)                          \
+    {                                                 \
+        t, offsetof(struct fuse_conn_info_opts, p), v \
+    }
 static const struct fuse_opt conn_info_opt_spec[] = {
-	CONN_OPTION("max_write=%u", max_write, 0),
-	CONN_OPTION("max_write=", set_max_write, 1),
-	CONN_OPTION("max_readahead=%u", max_readahead, 0),
-	CONN_OPTION("max_readahead=", set_max_readahead, 1),
-	CONN_OPTION("max_background=%u", max_background, 0),
-	CONN_OPTION("max_background=", set_max_background, 1),
-	CONN_OPTION("congestion_threshold=%u", congestion_threshold, 0),
-	CONN_OPTION("congestion_threshold=", set_congestion_threshold, 1),
-	CONN_OPTION("sync_read", sync_read, 1),
-	CONN_OPTION("async_read", async_read, 1),
-	CONN_OPTION("atomic_o_trunc", atomic_o_trunc, 1),
-	CONN_OPTION("no_remote_lock", no_remote_posix_lock, 1),
-	CONN_OPTION("no_remote_lock", no_remote_flock, 1),
-	CONN_OPTION("no_remote_flock", no_remote_flock, 1),
-	CONN_OPTION("no_remote_posix_lock", no_remote_posix_lock, 1),
-	CONN_OPTION("splice_write", splice_write, 1),
-	CONN_OPTION("no_splice_write", no_splice_write, 1),
-	CONN_OPTION("splice_move", splice_move, 1),
-	CONN_OPTION("no_splice_move", no_splice_move, 1),
-	CONN_OPTION("splice_read", splice_read, 1),
-	CONN_OPTION("no_splice_read", no_splice_read, 1),
-	CONN_OPTION("auto_inval_data", auto_inval_data, 1),
-	CONN_OPTION("no_auto_inval_data", no_auto_inval_data, 1),
-	CONN_OPTION("readdirplus=no", no_readdirplus, 1),
-	CONN_OPTION("readdirplus=yes", no_readdirplus, 0),
-	CONN_OPTION("readdirplus=yes", no_readdirplus_auto, 1),
-	CONN_OPTION("readdirplus=auto", no_readdirplus, 0),
-	CONN_OPTION("readdirplus=auto", no_readdirplus_auto, 0),
-	CONN_OPTION("async_dio", async_dio, 1),
-	CONN_OPTION("no_async_dio", no_async_dio, 1),
-	CONN_OPTION("writeback_cache", writeback_cache, 1),
-	CONN_OPTION("no_writeback_cache", no_writeback_cache, 1),
-	CONN_OPTION("time_gran=%u", time_gran, 0),
-	CONN_OPTION("time_gran=", set_time_gran, 1),
-	FUSE_OPT_END
+    CONN_OPTION("max_write=%u", max_write, 0),
+    CONN_OPTION("max_write=", set_max_write, 1),
+    CONN_OPTION("max_readahead=%u", max_readahead, 0),
+    CONN_OPTION("max_readahead=", set_max_readahead, 1),
+    CONN_OPTION("max_background=%u", max_background, 0),
+    CONN_OPTION("max_background=", set_max_background, 1),
+    CONN_OPTION("congestion_threshold=%u", congestion_threshold, 0),
+    CONN_OPTION("congestion_threshold=", set_congestion_threshold, 1),
+    CONN_OPTION("sync_read", sync_read, 1),
+    CONN_OPTION("async_read", async_read, 1),
+    CONN_OPTION("atomic_o_trunc", atomic_o_trunc, 1),
+    CONN_OPTION("no_remote_lock", no_remote_posix_lock, 1),
+    CONN_OPTION("no_remote_lock", no_remote_flock, 1),
+    CONN_OPTION("no_remote_flock", no_remote_flock, 1),
+    CONN_OPTION("no_remote_posix_lock", no_remote_posix_lock, 1),
+    CONN_OPTION("splice_write", splice_write, 1),
+    CONN_OPTION("no_splice_write", no_splice_write, 1),
+    CONN_OPTION("splice_move", splice_move, 1),
+    CONN_OPTION("no_splice_move", no_splice_move, 1),
+    CONN_OPTION("splice_read", splice_read, 1),
+    CONN_OPTION("no_splice_read", no_splice_read, 1),
+    CONN_OPTION("auto_inval_data", auto_inval_data, 1),
+    CONN_OPTION("no_auto_inval_data", no_auto_inval_data, 1),
+    CONN_OPTION("readdirplus=no", no_readdirplus, 1),
+    CONN_OPTION("readdirplus=yes", no_readdirplus, 0),
+    CONN_OPTION("readdirplus=yes", no_readdirplus_auto, 1),
+    CONN_OPTION("readdirplus=auto", no_readdirplus, 0),
+    CONN_OPTION("readdirplus=auto", no_readdirplus_auto, 0),
+    CONN_OPTION("async_dio", async_dio, 1),
+    CONN_OPTION("no_async_dio", no_async_dio, 1),
+    CONN_OPTION("writeback_cache", writeback_cache, 1),
+    CONN_OPTION("no_writeback_cache", no_writeback_cache, 1),
+    CONN_OPTION("time_gran=%u", time_gran, 0),
+    CONN_OPTION("time_gran=", set_time_gran, 1),
+    FUSE_OPT_END
 };
 
 
 void fuse_cmdline_help(void)
 {
-	printf("    -h   --help            print help\n"
-	       "    -V   --version         print version\n"
-	       "    -d   -o debug          enable debug output (implies -f)\n"
-	       "    -f                     foreground operation\n"
-	       "    -s                     disable multi-threaded operation\n"
-	       "    -o clone_fd            use separate fuse device fd for each thread\n"
-	       "                           (may improve performance)\n"
-	       "    -o max_idle_threads    the maximum number of idle worker threads\n"
-	       "                           allowed (default: 10)\n");
+    printf(
+        "    -h   --help            print help\n"
+        "    -V   --version         print version\n"
+        "    -d   -o debug          enable debug output (implies -f)\n"
+        "    -f                     foreground operation\n"
+        "    -s                     disable multi-threaded operation\n"
+        "    -o clone_fd            use separate fuse device fd for each "
+        "thread\n"
+        "                           (may improve performance)\n"
+        "    -o max_idle_threads    the maximum number of idle worker threads\n"
+        "                           allowed (default: 10)\n");
 }
 
 static int fuse_helper_opt_proc(void *data, const char *arg, int key,
-				struct fuse_args *outargs)
+                                struct fuse_args *outargs)
 {
-	(void) outargs;
-	struct fuse_cmdline_opts *opts = data;
-
-	switch (key) {
-	case FUSE_OPT_KEY_NONOPT:
-		if (!opts->mountpoint) {
-			if (fuse_mnt_parse_fuse_fd(arg) != -1) {
-				return fuse_opt_add_opt(&opts->mountpoint, arg);
-			}
-
-			char mountpoint[PATH_MAX] = "";
-			if (realpath(arg, mountpoint) == NULL) {
-				fuse_log(FUSE_LOG_ERR,
-					"fuse: bad mount point `%s': %s\n",
-					arg, strerror(errno));
-				return -1;
-			}
-			return fuse_opt_add_opt(&opts->mountpoint, mountpoint);
-		} else {
-			fuse_log(FUSE_LOG_ERR, "fuse: invalid argument `%s'\n", arg);
-			return -1;
-		}
-
-	default:
-		/* Pass through unknown options */
-		return 1;
-	}
+    (void)outargs;
+    struct fuse_cmdline_opts *opts = data;
+
+    switch (key) {
+    case FUSE_OPT_KEY_NONOPT:
+        if (!opts->mountpoint) {
+            if (fuse_mnt_parse_fuse_fd(arg) != -1) {
+                return fuse_opt_add_opt(&opts->mountpoint, arg);
+            }
+
+            char mountpoint[PATH_MAX] = "";
+            if (realpath(arg, mountpoint) == NULL) {
+                fuse_log(FUSE_LOG_ERR, "fuse: bad mount point `%s': %s\n", arg,
+                         strerror(errno));
+                return -1;
+            }
+            return fuse_opt_add_opt(&opts->mountpoint, mountpoint);
+        } else {
+            fuse_log(FUSE_LOG_ERR, "fuse: invalid argument `%s'\n", arg);
+            return -1;
+        }
+
+    default:
+        /* Pass through unknown options */
+        return 1;
+    }
 }
 
-int fuse_parse_cmdline(struct fuse_args *args,
-		       struct fuse_cmdline_opts *opts)
+int fuse_parse_cmdline(struct fuse_args *args, struct fuse_cmdline_opts *opts)
 {
-	memset(opts, 0, sizeof(struct fuse_cmdline_opts));
+    memset(opts, 0, sizeof(struct fuse_cmdline_opts));
 
-	opts->max_idle_threads = 10;
+    opts->max_idle_threads = 10;
 
-	if (fuse_opt_parse(args, opts, fuse_helper_opts,
-			   fuse_helper_opt_proc) == -1)
-		return -1;
+    if (fuse_opt_parse(args, opts, fuse_helper_opts, fuse_helper_opt_proc) ==
+        -1) {
+        return -1;
+    }
 
-	return 0;
+    return 0;
 }
 
 
 int fuse_daemonize(int foreground)
 {
-	if (!foreground) {
-		int nullfd;
-		int waiter[2];
-		char completed;
-
-		if (pipe(waiter)) {
-			perror("fuse_daemonize: pipe");
-			return -1;
-		}
-
-		/*
-		 * demonize current process by forking it and killing the
-		 * parent.  This makes current process as a child of 'init'.
-		 */
-		switch(fork()) {
-		case -1:
-			perror("fuse_daemonize: fork");
-			return -1;
-		case 0:
-			break;
-		default:
-			(void) read(waiter[0], &completed, sizeof(completed));
-			_exit(0);
-		}
-
-		if (setsid() == -1) {
-			perror("fuse_daemonize: setsid");
-			return -1;
-		}
-
-		(void) chdir("/");
-
-		nullfd = open("/dev/null", O_RDWR, 0);
-		if (nullfd != -1) {
-			(void) dup2(nullfd, 0);
-			(void) dup2(nullfd, 1);
-			(void) dup2(nullfd, 2);
-			if (nullfd > 2)
-				close(nullfd);
-		}
-
-		/* Propagate completion of daemon initialization */
-		completed = 1;
-		(void) write(waiter[1], &completed, sizeof(completed));
-		close(waiter[0]);
-		close(waiter[1]);
-	} else {
-		(void) chdir("/");
-	}
-	return 0;
+    if (!foreground) {
+        int nullfd;
+        int waiter[2];
+        char completed;
+
+        if (pipe(waiter)) {
+            perror("fuse_daemonize: pipe");
+            return -1;
+        }
+
+        /*
+         * Daemonize the current process by forking it and letting the
+         * parent exit, so that the child is reparented to 'init'.
+         */
+        switch (fork()) {
+        case -1:
+            perror("fuse_daemonize: fork");
+            return -1;
+        case 0:
+            break;
+        default:
+            (void)read(waiter[0], &completed, sizeof(completed));
+            _exit(0);
+        }
+
+        if (setsid() == -1) {
+            perror("fuse_daemonize: setsid");
+            return -1;
+        }
+
+        (void)chdir("/");
+
+        nullfd = open("/dev/null", O_RDWR, 0);
+        if (nullfd != -1) {
+            (void)dup2(nullfd, 0);
+            (void)dup2(nullfd, 1);
+            (void)dup2(nullfd, 2);
+            if (nullfd > 2) {
+                close(nullfd);
+            }
+        }
+
+        /* Propagate completion of daemon initialization */
+        completed = 1;
+        (void)write(waiter[1], &completed, sizeof(completed));
+        close(waiter[0]);
+        close(waiter[1]);
+    } else {
+        (void)chdir("/");
+    }
+    return 0;
 }
 
 void fuse_apply_conn_info_opts(struct fuse_conn_info_opts *opts,
-			       struct fuse_conn_info *conn)
+                               struct fuse_conn_info *conn)
 {
-	if(opts->set_max_write)
-		conn->max_write = opts->max_write;
-	if(opts->set_max_background)
-		conn->max_background = opts->max_background;
-	if(opts->set_congestion_threshold)
-		conn->congestion_threshold = opts->congestion_threshold;
-	if(opts->set_time_gran)
-		conn->time_gran = opts->time_gran;
-	if(opts->set_max_readahead)
-		conn->max_readahead = opts->max_readahead;
-
-#define LL_ENABLE(cond,cap) \
-	if (cond) conn->want |= (cap)
-#define LL_DISABLE(cond,cap) \
-	if (cond) conn->want &= ~(cap)
-
-	LL_ENABLE(opts->splice_read, FUSE_CAP_SPLICE_READ);
-	LL_DISABLE(opts->no_splice_read, FUSE_CAP_SPLICE_READ);
-
-	LL_ENABLE(opts->splice_write, FUSE_CAP_SPLICE_WRITE);
-	LL_DISABLE(opts->no_splice_write, FUSE_CAP_SPLICE_WRITE);
-
-	LL_ENABLE(opts->splice_move, FUSE_CAP_SPLICE_MOVE);
-	LL_DISABLE(opts->no_splice_move, FUSE_CAP_SPLICE_MOVE);
-
-	LL_ENABLE(opts->auto_inval_data, FUSE_CAP_AUTO_INVAL_DATA);
-	LL_DISABLE(opts->no_auto_inval_data, FUSE_CAP_AUTO_INVAL_DATA);
-
-	LL_DISABLE(opts->no_readdirplus, FUSE_CAP_READDIRPLUS);
-	LL_DISABLE(opts->no_readdirplus_auto, FUSE_CAP_READDIRPLUS_AUTO);
-
-	LL_ENABLE(opts->async_dio, FUSE_CAP_ASYNC_DIO);
-	LL_DISABLE(opts->no_async_dio, FUSE_CAP_ASYNC_DIO);
-
-	LL_ENABLE(opts->writeback_cache, FUSE_CAP_WRITEBACK_CACHE);
-	LL_DISABLE(opts->no_writeback_cache, FUSE_CAP_WRITEBACK_CACHE);
-
-	LL_ENABLE(opts->async_read, FUSE_CAP_ASYNC_READ);
-	LL_DISABLE(opts->sync_read, FUSE_CAP_ASYNC_READ);
-
-	LL_DISABLE(opts->no_remote_posix_lock, FUSE_CAP_POSIX_LOCKS);
-	LL_DISABLE(opts->no_remote_flock, FUSE_CAP_FLOCK_LOCKS);
+    if (opts->set_max_write) {
+        conn->max_write = opts->max_write;
+    }
+    if (opts->set_max_background) {
+        conn->max_background = opts->max_background;
+    }
+    if (opts->set_congestion_threshold) {
+        conn->congestion_threshold = opts->congestion_threshold;
+    }
+    if (opts->set_time_gran) {
+        conn->time_gran = opts->time_gran;
+    }
+    if (opts->set_max_readahead) {
+        conn->max_readahead = opts->max_readahead;
+    }
+
+#define LL_ENABLE(cond, cap) \
+    if (cond)                \
+        conn->want |= (cap)
+#define LL_DISABLE(cond, cap) \
+    if (cond)                 \
+        conn->want &= ~(cap)
+
+    LL_ENABLE(opts->splice_read, FUSE_CAP_SPLICE_READ);
+    LL_DISABLE(opts->no_splice_read, FUSE_CAP_SPLICE_READ);
+
+    LL_ENABLE(opts->splice_write, FUSE_CAP_SPLICE_WRITE);
+    LL_DISABLE(opts->no_splice_write, FUSE_CAP_SPLICE_WRITE);
+
+    LL_ENABLE(opts->splice_move, FUSE_CAP_SPLICE_MOVE);
+    LL_DISABLE(opts->no_splice_move, FUSE_CAP_SPLICE_MOVE);
+
+    LL_ENABLE(opts->auto_inval_data, FUSE_CAP_AUTO_INVAL_DATA);
+    LL_DISABLE(opts->no_auto_inval_data, FUSE_CAP_AUTO_INVAL_DATA);
+
+    LL_DISABLE(opts->no_readdirplus, FUSE_CAP_READDIRPLUS);
+    LL_DISABLE(opts->no_readdirplus_auto, FUSE_CAP_READDIRPLUS_AUTO);
+
+    LL_ENABLE(opts->async_dio, FUSE_CAP_ASYNC_DIO);
+    LL_DISABLE(opts->no_async_dio, FUSE_CAP_ASYNC_DIO);
+
+    LL_ENABLE(opts->writeback_cache, FUSE_CAP_WRITEBACK_CACHE);
+    LL_DISABLE(opts->no_writeback_cache, FUSE_CAP_WRITEBACK_CACHE);
+
+    LL_ENABLE(opts->async_read, FUSE_CAP_ASYNC_READ);
+    LL_DISABLE(opts->sync_read, FUSE_CAP_ASYNC_READ);
+
+    LL_DISABLE(opts->no_remote_posix_lock, FUSE_CAP_POSIX_LOCKS);
+    LL_DISABLE(opts->no_remote_flock, FUSE_CAP_FLOCK_LOCKS);
 }
 
-struct fuse_conn_info_opts* fuse_parse_conn_info_opts(struct fuse_args *args)
+struct fuse_conn_info_opts *fuse_parse_conn_info_opts(struct fuse_args *args)
 {
-	struct fuse_conn_info_opts *opts;
-
-	opts = calloc(1, sizeof(struct fuse_conn_info_opts));
-	if(opts == NULL) {
-		fuse_log(FUSE_LOG_ERR, "calloc failed\n");
-		return NULL;
-	}
-	if(fuse_opt_parse(args, opts, conn_info_opt_spec, NULL) == -1) {
-		free(opts);
-		return NULL;
-	}
-	return opts;
+    struct fuse_conn_info_opts *opts;
+
+    opts = calloc(1, sizeof(struct fuse_conn_info_opts));
+    if (opts == NULL) {
+        fuse_log(FUSE_LOG_ERR, "calloc failed\n");
+        return NULL;
+    }
+    if (fuse_opt_parse(args, opts, conn_info_opt_spec, NULL) == -1) {
+        free(opts);
+        return NULL;
+    }
+    return opts;
 }
diff --git a/tools/virtiofsd/passthrough_helpers.h b/tools/virtiofsd/passthrough_helpers.h
index 7c5f561fbc..0b98275ed5 100644
--- a/tools/virtiofsd/passthrough_helpers.h
+++ b/tools/virtiofsd/passthrough_helpers.h
@@ -28,23 +28,24 @@
  * operation
  */
 static int mknod_wrapper(int dirfd, const char *path, const char *link,
-	int mode, dev_t rdev)
+                         int mode, dev_t rdev)
 {
-	int res;
+    int res;
 
-	if (S_ISREG(mode)) {
-		res = openat(dirfd, path, O_CREAT | O_EXCL | O_WRONLY, mode);
-		if (res >= 0)
-			res = close(res);
-	} else if (S_ISDIR(mode)) {
-		res = mkdirat(dirfd, path, mode);
-	} else if (S_ISLNK(mode) && link != NULL) {
-		res = symlinkat(link, dirfd, path);
-	} else if (S_ISFIFO(mode)) {
-		res = mkfifoat(dirfd, path, mode);
-	} else {
-		res = mknodat(dirfd, path, mode, rdev);
-	}
+    if (S_ISREG(mode)) {
+        res = openat(dirfd, path, O_CREAT | O_EXCL | O_WRONLY, mode);
+        if (res >= 0) {
+            res = close(res);
+        }
+    } else if (S_ISDIR(mode)) {
+        res = mkdirat(dirfd, path, mode);
+    } else if (S_ISLNK(mode) && link != NULL) {
+        res = symlinkat(link, dirfd, path);
+    } else if (S_ISFIFO(mode)) {
+        res = mkfifoat(dirfd, path, mode);
+    } else {
+        res = mknodat(dirfd, path, mode, rdev);
+    }
 
-	return res;
+    return res;
 }
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 5372d02934..cd399d5c4b 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1,12 +1,12 @@
 /*
-  FUSE: Filesystem in Userspace
-  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
-
-  This program can be distributed under the terms of the GNU GPL.
-  See the file COPYING.
-*/
+ * FUSE: Filesystem in Userspace
+ * Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
+ *
+ * This program can be distributed under the terms of the GNU GPL.
+ * See the file COPYING.
+ */
 
-/** @file
+/*
  *
  * This file system mirrors the existing file system hierarchy of the
  * system, starting at the root file system. This is implemented by
@@ -28,7 +28,8 @@
  *
  * Compile with:
  *
- *     gcc -Wall passthrough_ll.c `pkg-config fuse3 --cflags --libs` -o passthrough_ll
+ *     gcc -Wall passthrough_ll.c `pkg-config fuse3 --cflags --libs` \
+ *         -o passthrough_ll
  *
  * ## Source code ##
  * \include passthrough_ll.c
@@ -39,1300 +40,1366 @@
 
 #include "config.h"
 
-#include <fuse_lowlevel.h>
-#include <unistd.h>
-#include <stdlib.h>
-#include <stdio.h>
-#include <stddef.h>
-#include <stdbool.h>
-#include <string.h>
-#include <limits.h>
-#include <dirent.h>
 #include <assert.h>
+#include <dirent.h>
 #include <errno.h>
+#include <fuse_lowlevel.h>
 #include <inttypes.h>
+#include <limits.h>
 #include <pthread.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
 #include <sys/file.h>
 #include <sys/xattr.h>
+#include <unistd.h>
 
 #include "passthrough_helpers.h"
 
-/* We are re-using pointers to our `struct lo_inode` and `struct
-   lo_dirp` elements as inodes. This means that we must be able to
-   store uintptr_t values in a fuse_ino_t variable. The following
-   incantation checks this condition at compile time. */
-#if defined(__GNUC__) && (__GNUC__ > 4 || __GNUC__ == 4 && __GNUC_MINOR__ >= 6) && !defined __cplusplus
+/*
+ * We are re-using pointers to our `struct lo_inode` and `struct
+ * lo_dirp` elements as inodes. This means that we must be able to
+ * store uintptr_t values in a fuse_ino_t variable. The following
+ * incantation checks this condition at compile time.
+ */
+#if defined(__GNUC__) &&                                      \
+    (__GNUC__ > 4 || __GNUC__ == 4 && __GNUC_MINOR__ >= 6) && \
+    !defined __cplusplus
 _Static_assert(sizeof(fuse_ino_t) >= sizeof(uintptr_t),
-	       "fuse_ino_t too small to hold uintptr_t values!");
+               "fuse_ino_t too small to hold uintptr_t values!");
 #else
-struct _uintptr_to_must_hold_fuse_ino_t_dummy_struct \
-	{ unsigned _uintptr_to_must_hold_fuse_ino_t:
-			((sizeof(fuse_ino_t) >= sizeof(uintptr_t)) ? 1 : -1); };
+struct _uintptr_to_must_hold_fuse_ino_t_dummy_struct {
+    unsigned _uintptr_to_must_hold_fuse_ino_t
+        : ((sizeof(fuse_ino_t) >= sizeof(uintptr_t)) ? 1 : -1);
+};
 #endif
 
 struct lo_inode {
-	struct lo_inode *next; /* protected by lo->mutex */
-	struct lo_inode *prev; /* protected by lo->mutex */
-	int fd;
-	bool is_symlink;
-	ino_t ino;
-	dev_t dev;
-	uint64_t refcount; /* protected by lo->mutex */
+    struct lo_inode *next; /* protected by lo->mutex */
+    struct lo_inode *prev; /* protected by lo->mutex */
+    int fd;
+    bool is_symlink;
+    ino_t ino;
+    dev_t dev;
+    uint64_t refcount; /* protected by lo->mutex */
 };
 
 enum {
-	CACHE_NEVER,
-	CACHE_NORMAL,
-	CACHE_ALWAYS,
+    CACHE_NEVER,
+    CACHE_NORMAL,
+    CACHE_ALWAYS,
 };
 
 struct lo_data {
-	pthread_mutex_t mutex;
-	int debug;
-	int writeback;
-	int flock;
-	int xattr;
-	const char *source;
-	double timeout;
-	int cache;
-	int timeout_set;
-	struct lo_inode root; /* protected by lo->mutex */
+    pthread_mutex_t mutex;
+    int debug;
+    int writeback;
+    int flock;
+    int xattr;
+    const char *source;
+    double timeout;
+    int cache;
+    int timeout_set;
+    struct lo_inode root; /* protected by lo->mutex */
 };
 
 static const struct fuse_opt lo_opts[] = {
-	{ "writeback",
-	  offsetof(struct lo_data, writeback), 1 },
-	{ "no_writeback",
-	  offsetof(struct lo_data, writeback), 0 },
-	{ "source=%s",
-	  offsetof(struct lo_data, source), 0 },
-	{ "flock",
-	  offsetof(struct lo_data, flock), 1 },
-	{ "no_flock",
-	  offsetof(struct lo_data, flock), 0 },
-	{ "xattr",
-	  offsetof(struct lo_data, xattr), 1 },
-	{ "no_xattr",
-	  offsetof(struct lo_data, xattr), 0 },
-	{ "timeout=%lf",
-	  offsetof(struct lo_data, timeout), 0 },
-	{ "timeout=",
-	  offsetof(struct lo_data, timeout_set), 1 },
-	{ "cache=never",
-	  offsetof(struct lo_data, cache), CACHE_NEVER },
-	{ "cache=auto",
-	  offsetof(struct lo_data, cache), CACHE_NORMAL },
-	{ "cache=always",
-	  offsetof(struct lo_data, cache), CACHE_ALWAYS },
-
-	FUSE_OPT_END
+    { "writeback", offsetof(struct lo_data, writeback), 1 },
+    { "no_writeback", offsetof(struct lo_data, writeback), 0 },
+    { "source=%s", offsetof(struct lo_data, source), 0 },
+    { "flock", offsetof(struct lo_data, flock), 1 },
+    { "no_flock", offsetof(struct lo_data, flock), 0 },
+    { "xattr", offsetof(struct lo_data, xattr), 1 },
+    { "no_xattr", offsetof(struct lo_data, xattr), 0 },
+    { "timeout=%lf", offsetof(struct lo_data, timeout), 0 },
+    { "timeout=", offsetof(struct lo_data, timeout_set), 1 },
+    { "cache=never", offsetof(struct lo_data, cache), CACHE_NEVER },
+    { "cache=auto", offsetof(struct lo_data, cache), CACHE_NORMAL },
+    { "cache=always", offsetof(struct lo_data, cache), CACHE_ALWAYS },
+
+    FUSE_OPT_END
 };
 
 static struct lo_data *lo_data(fuse_req_t req)
 {
-	return (struct lo_data *) fuse_req_userdata(req);
+    return (struct lo_data *)fuse_req_userdata(req);
 }
 
 static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
 {
-	if (ino == FUSE_ROOT_ID)
-		return &lo_data(req)->root;
-	else
-		return (struct lo_inode *) (uintptr_t) ino;
+    if (ino == FUSE_ROOT_ID) {
+        return &lo_data(req)->root;
+    } else {
+        return (struct lo_inode *)(uintptr_t)ino;
+    }
 }
 
 static int lo_fd(fuse_req_t req, fuse_ino_t ino)
 {
-	return lo_inode(req, ino)->fd;
+    return lo_inode(req, ino)->fd;
 }
 
 static bool lo_debug(fuse_req_t req)
 {
-	return lo_data(req)->debug != 0;
+    return lo_data(req)->debug != 0;
 }
 
-static void lo_init(void *userdata,
-		    struct fuse_conn_info *conn)
+static void lo_init(void *userdata, struct fuse_conn_info *conn)
 {
-	struct lo_data *lo = (struct lo_data*) userdata;
-
-	if(conn->capable & FUSE_CAP_EXPORT_SUPPORT)
-		conn->want |= FUSE_CAP_EXPORT_SUPPORT;
-
-	if (lo->writeback &&
-	    conn->capable & FUSE_CAP_WRITEBACK_CACHE) {
-		if (lo->debug)
-			fuse_log(FUSE_LOG_DEBUG, "lo_init: activating writeback\n");
-		conn->want |= FUSE_CAP_WRITEBACK_CACHE;
-	}
-	if (lo->flock && conn->capable & FUSE_CAP_FLOCK_LOCKS) {
-		if (lo->debug)
-			fuse_log(FUSE_LOG_DEBUG, "lo_init: activating flock locks\n");
-		conn->want |= FUSE_CAP_FLOCK_LOCKS;
-	}
+    struct lo_data *lo = (struct lo_data *)userdata;
+
+    if (conn->capable & FUSE_CAP_EXPORT_SUPPORT) {
+        conn->want |= FUSE_CAP_EXPORT_SUPPORT;
+    }
+
+    if (lo->writeback && conn->capable & FUSE_CAP_WRITEBACK_CACHE) {
+        if (lo->debug) {
+            fuse_log(FUSE_LOG_DEBUG, "lo_init: activating writeback\n");
+        }
+        conn->want |= FUSE_CAP_WRITEBACK_CACHE;
+    }
+    if (lo->flock && conn->capable & FUSE_CAP_FLOCK_LOCKS) {
+        if (lo->debug) {
+            fuse_log(FUSE_LOG_DEBUG, "lo_init: activating flock locks\n");
+        }
+        conn->want |= FUSE_CAP_FLOCK_LOCKS;
+    }
 }
 
 static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
-			     struct fuse_file_info *fi)
+                       struct fuse_file_info *fi)
 {
-	int res;
-	struct stat buf;
-	struct lo_data *lo = lo_data(req);
+    int res;
+    struct stat buf;
+    struct lo_data *lo = lo_data(req);
 
-	(void) fi;
+    (void)fi;
 
-	res = fstatat(lo_fd(req, ino), "", &buf, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
-	if (res == -1)
-		return (void) fuse_reply_err(req, errno);
+    res =
+        fstatat(lo_fd(req, ino), "", &buf, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
+    if (res == -1) {
+        return (void)fuse_reply_err(req, errno);
+    }
 
-	fuse_reply_attr(req, &buf, lo->timeout);
+    fuse_reply_attr(req, &buf, lo->timeout);
 }
 
 static int utimensat_empty_nofollow(struct lo_inode *inode,
-				    const struct timespec *tv)
+                                    const struct timespec *tv)
 {
-	int res;
-	char procname[64];
-
-	if (inode->is_symlink) {
-		res = utimensat(inode->fd, "", tv,
-				AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
-		if (res == -1 && errno == EINVAL) {
-			/* Sorry, no race free way to set times on symlink. */
-			errno = EPERM;
-		}
-		return res;
-	}
-	sprintf(procname, "/proc/self/fd/%i", inode->fd);
-
-	return utimensat(AT_FDCWD, procname, tv, 0);
+    int res;
+    char procname[64];
+
+    if (inode->is_symlink) {
+        res = utimensat(inode->fd, "", tv, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
+        if (res == -1 && errno == EINVAL) {
+            /* Sorry, no race free way to set times on symlink. */
+            errno = EPERM;
+        }
+        return res;
+    }
+    sprintf(procname, "/proc/self/fd/%i", inode->fd);
+
+    return utimensat(AT_FDCWD, procname, tv, 0);
 }
 
 static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
-		       int valid, struct fuse_file_info *fi)
+                       int valid, struct fuse_file_info *fi)
 {
-	int saverr;
-	char procname[64];
-	struct lo_inode *inode = lo_inode(req, ino);
-	int ifd = inode->fd;
-	int res;
-
-	if (valid & FUSE_SET_ATTR_MODE) {
-		if (fi) {
-			res = fchmod(fi->fh, attr->st_mode);
-		} else {
-			sprintf(procname, "/proc/self/fd/%i", ifd);
-			res = chmod(procname, attr->st_mode);
-		}
-		if (res == -1)
-			goto out_err;
-	}
-	if (valid & (FUSE_SET_ATTR_UID | FUSE_SET_ATTR_GID)) {
-		uid_t uid = (valid & FUSE_SET_ATTR_UID) ?
-			attr->st_uid : (uid_t) -1;
-		gid_t gid = (valid & FUSE_SET_ATTR_GID) ?
-			attr->st_gid : (gid_t) -1;
-
-		res = fchownat(ifd, "", uid, gid,
-			       AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
-		if (res == -1)
-			goto out_err;
-	}
-	if (valid & FUSE_SET_ATTR_SIZE) {
-		if (fi) {
-			res = ftruncate(fi->fh, attr->st_size);
-		} else {
-			sprintf(procname, "/proc/self/fd/%i", ifd);
-			res = truncate(procname, attr->st_size);
-		}
-		if (res == -1)
-			goto out_err;
-	}
-	if (valid & (FUSE_SET_ATTR_ATIME | FUSE_SET_ATTR_MTIME)) {
-		struct timespec tv[2];
-
-		tv[0].tv_sec = 0;
-		tv[1].tv_sec = 0;
-		tv[0].tv_nsec = UTIME_OMIT;
-		tv[1].tv_nsec = UTIME_OMIT;
-
-		if (valid & FUSE_SET_ATTR_ATIME_NOW)
-			tv[0].tv_nsec = UTIME_NOW;
-		else if (valid & FUSE_SET_ATTR_ATIME)
-			tv[0] = attr->st_atim;
-
-		if (valid & FUSE_SET_ATTR_MTIME_NOW)
-			tv[1].tv_nsec = UTIME_NOW;
-		else if (valid & FUSE_SET_ATTR_MTIME)
-			tv[1] = attr->st_mtim;
-
-		if (fi)
-			res = futimens(fi->fh, tv);
-		else
-			res = utimensat_empty_nofollow(inode, tv);
-		if (res == -1)
-			goto out_err;
-	}
-
-	return lo_getattr(req, ino, fi);
+    int saverr;
+    char procname[64];
+    struct lo_inode *inode = lo_inode(req, ino);
+    int ifd = inode->fd;
+    int res;
+
+    if (valid & FUSE_SET_ATTR_MODE) {
+        if (fi) {
+            res = fchmod(fi->fh, attr->st_mode);
+        } else {
+            sprintf(procname, "/proc/self/fd/%i", ifd);
+            res = chmod(procname, attr->st_mode);
+        }
+        if (res == -1) {
+            goto out_err;
+        }
+    }
+    if (valid & (FUSE_SET_ATTR_UID | FUSE_SET_ATTR_GID)) {
+        uid_t uid = (valid & FUSE_SET_ATTR_UID) ? attr->st_uid : (uid_t)-1;
+        gid_t gid = (valid & FUSE_SET_ATTR_GID) ? attr->st_gid : (gid_t)-1;
+
+        res = fchownat(ifd, "", uid, gid, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
+        if (res == -1) {
+            goto out_err;
+        }
+    }
+    if (valid & FUSE_SET_ATTR_SIZE) {
+        if (fi) {
+            res = ftruncate(fi->fh, attr->st_size);
+        } else {
+            sprintf(procname, "/proc/self/fd/%i", ifd);
+            res = truncate(procname, attr->st_size);
+        }
+        if (res == -1) {
+            goto out_err;
+        }
+    }
+    if (valid & (FUSE_SET_ATTR_ATIME | FUSE_SET_ATTR_MTIME)) {
+        struct timespec tv[2];
+
+        tv[0].tv_sec = 0;
+        tv[1].tv_sec = 0;
+        tv[0].tv_nsec = UTIME_OMIT;
+        tv[1].tv_nsec = UTIME_OMIT;
+
+        if (valid & FUSE_SET_ATTR_ATIME_NOW) {
+            tv[0].tv_nsec = UTIME_NOW;
+        } else if (valid & FUSE_SET_ATTR_ATIME) {
+            tv[0] = attr->st_atim;
+        }
+
+        if (valid & FUSE_SET_ATTR_MTIME_NOW) {
+            tv[1].tv_nsec = UTIME_NOW;
+        } else if (valid & FUSE_SET_ATTR_MTIME) {
+            tv[1] = attr->st_mtim;
+        }
+
+        if (fi) {
+            res = futimens(fi->fh, tv);
+        } else {
+            res = utimensat_empty_nofollow(inode, tv);
+        }
+        if (res == -1) {
+            goto out_err;
+        }
+    }
+
+    return lo_getattr(req, ino, fi);
 
 out_err:
-	saverr = errno;
-	fuse_reply_err(req, saverr);
+    saverr = errno;
+    fuse_reply_err(req, saverr);
 }
 
 static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st)
 {
-	struct lo_inode *p;
-	struct lo_inode *ret = NULL;
-
-	pthread_mutex_lock(&lo->mutex);
-	for (p = lo->root.next; p != &lo->root; p = p->next) {
-		if (p->ino == st->st_ino && p->dev == st->st_dev) {
-			assert(p->refcount > 0);
-			ret = p;
-			ret->refcount++;
-			break;
-		}
-	}
-	pthread_mutex_unlock(&lo->mutex);
-	return ret;
+    struct lo_inode *p;
+    struct lo_inode *ret = NULL;
+
+    pthread_mutex_lock(&lo->mutex);
+    for (p = lo->root.next; p != &lo->root; p = p->next) {
+        if (p->ino == st->st_ino && p->dev == st->st_dev) {
+            assert(p->refcount > 0);
+            ret = p;
+            ret->refcount++;
+            break;
+        }
+    }
+    pthread_mutex_unlock(&lo->mutex);
+    return ret;
 }
 
 static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
-			 struct fuse_entry_param *e)
+                        struct fuse_entry_param *e)
 {
-	int newfd;
-	int res;
-	int saverr;
-	struct lo_data *lo = lo_data(req);
-	struct lo_inode *inode;
-
-	memset(e, 0, sizeof(*e));
-	e->attr_timeout = lo->timeout;
-	e->entry_timeout = lo->timeout;
-
-	newfd = openat(lo_fd(req, parent), name, O_PATH | O_NOFOLLOW);
-	if (newfd == -1)
-		goto out_err;
-
-	res = fstatat(newfd, "", &e->attr, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
-	if (res == -1)
-		goto out_err;
-
-	inode = lo_find(lo_data(req), &e->attr);
-	if (inode) {
-		close(newfd);
-		newfd = -1;
-	} else {
-		struct lo_inode *prev, *next;
-
-		saverr = ENOMEM;
-		inode = calloc(1, sizeof(struct lo_inode));
-		if (!inode)
-			goto out_err;
-
-		inode->is_symlink = S_ISLNK(e->attr.st_mode);
-		inode->refcount = 1;
-		inode->fd = newfd;
-		inode->ino = e->attr.st_ino;
-		inode->dev = e->attr.st_dev;
-
-		pthread_mutex_lock(&lo->mutex);
-		prev = &lo->root;
-		next = prev->next;
-		next->prev = inode;
-		inode->next = next;
-		inode->prev = prev;
-		prev->next = inode;
-		pthread_mutex_unlock(&lo->mutex);
-	}
-	e->ino = (uintptr_t) inode;
-
-	if (lo_debug(req))
-		fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n",
-			(unsigned long long) parent, name, (unsigned long long) e->ino);
-
-	return 0;
+    int newfd;
+    int res;
+    int saverr;
+    struct lo_data *lo = lo_data(req);
+    struct lo_inode *inode;
+
+    memset(e, 0, sizeof(*e));
+    e->attr_timeout = lo->timeout;
+    e->entry_timeout = lo->timeout;
+
+    newfd = openat(lo_fd(req, parent), name, O_PATH | O_NOFOLLOW);
+    if (newfd == -1) {
+        goto out_err;
+    }
+
+    res = fstatat(newfd, "", &e->attr, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
+    if (res == -1) {
+        goto out_err;
+    }
+
+    inode = lo_find(lo_data(req), &e->attr);
+    if (inode) {
+        close(newfd);
+        newfd = -1;
+    } else {
+        struct lo_inode *prev, *next;
+
+        saverr = ENOMEM;
+        inode = calloc(1, sizeof(struct lo_inode));
+        if (!inode) {
+            goto out_err;
+        }
+
+        inode->is_symlink = S_ISLNK(e->attr.st_mode);
+        inode->refcount = 1;
+        inode->fd = newfd;
+        inode->ino = e->attr.st_ino;
+        inode->dev = e->attr.st_dev;
+
+        pthread_mutex_lock(&lo->mutex);
+        prev = &lo->root;
+        next = prev->next;
+        next->prev = inode;
+        inode->next = next;
+        inode->prev = prev;
+        prev->next = inode;
+        pthread_mutex_unlock(&lo->mutex);
+    }
+    e->ino = (uintptr_t)inode;
+
+    if (lo_debug(req)) {
+        fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n",
+                 (unsigned long long)parent, name, (unsigned long long)e->ino);
+    }
+
+    return 0;
 
 out_err:
-	saverr = errno;
-	if (newfd != -1)
-		close(newfd);
-	return saverr;
+    saverr = errno;
+    if (newfd != -1) {
+        close(newfd);
+    }
+    return saverr;
 }
 
 static void lo_lookup(fuse_req_t req, fuse_ino_t parent, const char *name)
 {
-	struct fuse_entry_param e;
-	int err;
-
-	if (lo_debug(req))
-		fuse_log(FUSE_LOG_DEBUG, "lo_lookup(parent=%" PRIu64 ", name=%s)\n",
-			parent, name);
-
-	err = lo_do_lookup(req, parent, name, &e);
-	if (err)
-		fuse_reply_err(req, err);
-	else
-		fuse_reply_entry(req, &e);
+    struct fuse_entry_param e;
+    int err;
+
+    if (lo_debug(req)) {
+        fuse_log(FUSE_LOG_DEBUG, "lo_lookup(parent=%" PRIu64 ", name=%s)\n",
+                 parent, name);
+    }
+
+    err = lo_do_lookup(req, parent, name, &e);
+    if (err) {
+        fuse_reply_err(req, err);
+    } else {
+        fuse_reply_entry(req, &e);
+    }
 }
 
 static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
-			     const char *name, mode_t mode, dev_t rdev,
-			     const char *link)
+                             const char *name, mode_t mode, dev_t rdev,
+                             const char *link)
 {
-	int res;
-	int saverr;
-	struct lo_inode *dir = lo_inode(req, parent);
-	struct fuse_entry_param e;
+    int res;
+    int saverr;
+    struct lo_inode *dir = lo_inode(req, parent);
+    struct fuse_entry_param e;
 
-	saverr = ENOMEM;
+    saverr = ENOMEM;
 
-	res = mknod_wrapper(dir->fd, name, link, mode, rdev);
+    res = mknod_wrapper(dir->fd, name, link, mode, rdev);
 
-	saverr = errno;
-	if (res == -1)
-		goto out;
+    saverr = errno;
+    if (res == -1) {
+        goto out;
+    }
 
-	saverr = lo_do_lookup(req, parent, name, &e);
-	if (saverr)
-		goto out;
+    saverr = lo_do_lookup(req, parent, name, &e);
+    if (saverr) {
+        goto out;
+    }
 
-	if (lo_debug(req))
-		fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n",
-			(unsigned long long) parent, name, (unsigned long long) e.ino);
+    if (lo_debug(req)) {
+        fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n",
+                 (unsigned long long)parent, name, (unsigned long long)e.ino);
+    }
 
-	fuse_reply_entry(req, &e);
-	return;
+    fuse_reply_entry(req, &e);
+    return;
 
 out:
-	fuse_reply_err(req, saverr);
+    fuse_reply_err(req, saverr);
 }
 
-static void lo_mknod(fuse_req_t req, fuse_ino_t parent,
-		     const char *name, mode_t mode, dev_t rdev)
+static void lo_mknod(fuse_req_t req, fuse_ino_t parent, const char *name,
+                     mode_t mode, dev_t rdev)
 {
-	lo_mknod_symlink(req, parent, name, mode, rdev, NULL);
+    lo_mknod_symlink(req, parent, name, mode, rdev, NULL);
 }
 
 static void lo_mkdir(fuse_req_t req, fuse_ino_t parent, const char *name,
-		     mode_t mode)
+                     mode_t mode)
 {
-	lo_mknod_symlink(req, parent, name, S_IFDIR | mode, 0, NULL);
+    lo_mknod_symlink(req, parent, name, S_IFDIR | mode, 0, NULL);
 }
 
-static void lo_symlink(fuse_req_t req, const char *link,
-		       fuse_ino_t parent, const char *name)
+static void lo_symlink(fuse_req_t req, const char *link, fuse_ino_t parent,
+                       const char *name)
 {
-	lo_mknod_symlink(req, parent, name, S_IFLNK, 0, link);
+    lo_mknod_symlink(req, parent, name, S_IFLNK, 0, link);
 }
 
 static int linkat_empty_nofollow(struct lo_inode *inode, int dfd,
-				 const char *name)
+                                 const char *name)
 {
-	int res;
-	char procname[64];
+    int res;
+    char procname[64];
 
-	if (inode->is_symlink) {
-		res = linkat(inode->fd, "", dfd, name, AT_EMPTY_PATH);
-		if (res == -1 && (errno == ENOENT || errno == EINVAL)) {
-			/* Sorry, no race free way to hard-link a symlink. */
-			errno = EPERM;
-		}
-		return res;
-	}
+    if (inode->is_symlink) {
+        res = linkat(inode->fd, "", dfd, name, AT_EMPTY_PATH);
+        if (res == -1 && (errno == ENOENT || errno == EINVAL)) {
+            /* Sorry, no race free way to hard-link a symlink. */
+            errno = EPERM;
+        }
+        return res;
+    }
 
-	sprintf(procname, "/proc/self/fd/%i", inode->fd);
+    sprintf(procname, "/proc/self/fd/%i", inode->fd);
 
-	return linkat(AT_FDCWD, procname, dfd, name, AT_SYMLINK_FOLLOW);
+    return linkat(AT_FDCWD, procname, dfd, name, AT_SYMLINK_FOLLOW);
 }
 
 static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
-		    const char *name)
+                    const char *name)
 {
-	int res;
-	struct lo_data *lo = lo_data(req);
-	struct lo_inode *inode = lo_inode(req, ino);
-	struct fuse_entry_param e;
-	int saverr;
-
-	memset(&e, 0, sizeof(struct fuse_entry_param));
-	e.attr_timeout = lo->timeout;
-	e.entry_timeout = lo->timeout;
-
-	res = linkat_empty_nofollow(inode, lo_fd(req, parent), name);
-	if (res == -1)
-		goto out_err;
-
-	res = fstatat(inode->fd, "", &e.attr, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
-	if (res == -1)
-		goto out_err;
-
-	pthread_mutex_lock(&lo->mutex);
-	inode->refcount++;
-	pthread_mutex_unlock(&lo->mutex);
-	e.ino = (uintptr_t) inode;
-
-	if (lo_debug(req))
-		fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n",
-			(unsigned long long) parent, name,
-			(unsigned long long) e.ino);
-
-	fuse_reply_entry(req, &e);
-	return;
+    int res;
+    struct lo_data *lo = lo_data(req);
+    struct lo_inode *inode = lo_inode(req, ino);
+    struct fuse_entry_param e;
+    int saverr;
+
+    memset(&e, 0, sizeof(struct fuse_entry_param));
+    e.attr_timeout = lo->timeout;
+    e.entry_timeout = lo->timeout;
+
+    res = linkat_empty_nofollow(inode, lo_fd(req, parent), name);
+    if (res == -1) {
+        goto out_err;
+    }
+
+    res = fstatat(inode->fd, "", &e.attr, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
+    if (res == -1) {
+        goto out_err;
+    }
+
+    pthread_mutex_lock(&lo->mutex);
+    inode->refcount++;
+    pthread_mutex_unlock(&lo->mutex);
+    e.ino = (uintptr_t)inode;
+
+    if (lo_debug(req)) {
+        fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n",
+                 (unsigned long long)parent, name, (unsigned long long)e.ino);
+    }
+
+    fuse_reply_entry(req, &e);
+    return;
 
 out_err:
-	saverr = errno;
-	fuse_reply_err(req, saverr);
+    saverr = errno;
+    fuse_reply_err(req, saverr);
 }
 
 static void lo_rmdir(fuse_req_t req, fuse_ino_t parent, const char *name)
 {
-	int res;
+    int res;
 
-	res = unlinkat(lo_fd(req, parent), name, AT_REMOVEDIR);
+    res = unlinkat(lo_fd(req, parent), name, AT_REMOVEDIR);
 
-	fuse_reply_err(req, res == -1 ? errno : 0);
+    fuse_reply_err(req, res == -1 ? errno : 0);
 }
 
 static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
-		      fuse_ino_t newparent, const char *newname,
-		      unsigned int flags)
+                      fuse_ino_t newparent, const char *newname,
+                      unsigned int flags)
 {
-	int res;
+    int res;
 
-	if (flags) {
-		fuse_reply_err(req, EINVAL);
-		return;
-	}
+    if (flags) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
 
-	res = renameat(lo_fd(req, parent), name,
-			lo_fd(req, newparent), newname);
+    res = renameat(lo_fd(req, parent), name, lo_fd(req, newparent), newname);
 
-	fuse_reply_err(req, res == -1 ? errno : 0);
+    fuse_reply_err(req, res == -1 ? errno : 0);
 }
 
 static void lo_unlink(fuse_req_t req, fuse_ino_t parent, const char *name)
 {
-	int res;
+    int res;
 
-	res = unlinkat(lo_fd(req, parent), name, 0);
+    res = unlinkat(lo_fd(req, parent), name, 0);
 
-	fuse_reply_err(req, res == -1 ? errno : 0);
+    fuse_reply_err(req, res == -1 ? errno : 0);
 }
 
 static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
 {
-	if (!inode)
-		return;
-
-	pthread_mutex_lock(&lo->mutex);
-	assert(inode->refcount >= n);
-	inode->refcount -= n;
-	if (!inode->refcount) {
-		struct lo_inode *prev, *next;
-
-		prev = inode->prev;
-		next = inode->next;
-		next->prev = prev;
-		prev->next = next;
-
-		pthread_mutex_unlock(&lo->mutex);
-		close(inode->fd);
-		free(inode);
-
-	} else {
-		pthread_mutex_unlock(&lo->mutex);
-	}
+    if (!inode) {
+        return;
+    }
+
+    pthread_mutex_lock(&lo->mutex);
+    assert(inode->refcount >= n);
+    inode->refcount -= n;
+    if (!inode->refcount) {
+        struct lo_inode *prev, *next;
+
+        prev = inode->prev;
+        next = inode->next;
+        next->prev = prev;
+        prev->next = next;
+
+        pthread_mutex_unlock(&lo->mutex);
+        close(inode->fd);
+        free(inode);
+
+    } else {
+        pthread_mutex_unlock(&lo->mutex);
+    }
 }
 
 static void lo_forget_one(fuse_req_t req, fuse_ino_t ino, uint64_t nlookup)
 {
-	struct lo_data *lo = lo_data(req);
-	struct lo_inode *inode = lo_inode(req, ino);
+    struct lo_data *lo = lo_data(req);
+    struct lo_inode *inode = lo_inode(req, ino);
 
-	if (lo_debug(req)) {
-		fuse_log(FUSE_LOG_DEBUG, "  forget %lli %lli -%lli\n",
-			(unsigned long long) ino,
-			(unsigned long long) inode->refcount,
-			(unsigned long long) nlookup);
-	}
+    if (lo_debug(req)) {
+        fuse_log(FUSE_LOG_DEBUG, "  forget %lli %lli -%lli\n",
+                 (unsigned long long)ino, (unsigned long long)inode->refcount,
+                 (unsigned long long)nlookup);
+    }
 
-	unref_inode(lo, inode, nlookup);
+    unref_inode(lo, inode, nlookup);
 }
 
 static void lo_forget(fuse_req_t req, fuse_ino_t ino, uint64_t nlookup)
 {
-	lo_forget_one(req, ino, nlookup);
-	fuse_reply_none(req);
+    lo_forget_one(req, ino, nlookup);
+    fuse_reply_none(req);
 }
 
 static void lo_forget_multi(fuse_req_t req, size_t count,
-				struct fuse_forget_data *forgets)
+                            struct fuse_forget_data *forgets)
 {
-	int i;
+    int i;
 
-	for (i = 0; i < count; i++)
-		lo_forget_one(req, forgets[i].ino, forgets[i].nlookup);
-	fuse_reply_none(req);
+    for (i = 0; i < count; i++) {
+        lo_forget_one(req, forgets[i].ino, forgets[i].nlookup);
+    }
+    fuse_reply_none(req);
 }
 
 static void lo_readlink(fuse_req_t req, fuse_ino_t ino)
 {
-	char buf[PATH_MAX + 1];
-	int res;
+    char buf[PATH_MAX + 1];
+    int res;
 
-	res = readlinkat(lo_fd(req, ino), "", buf, sizeof(buf));
-	if (res == -1)
-		return (void) fuse_reply_err(req, errno);
+    res = readlinkat(lo_fd(req, ino), "", buf, sizeof(buf));
+    if (res == -1) {
+        return (void)fuse_reply_err(req, errno);
+    }
 
-	if (res == sizeof(buf))
-		return (void) fuse_reply_err(req, ENAMETOOLONG);
+    if (res == sizeof(buf)) {
+        return (void)fuse_reply_err(req, ENAMETOOLONG);
+    }
 
-	buf[res] = '\0';
+    buf[res] = '\0';
 
-	fuse_reply_readlink(req, buf);
+    fuse_reply_readlink(req, buf);
 }
 
 struct lo_dirp {
-	DIR *dp;
-	struct dirent *entry;
-	off_t offset;
+    DIR *dp;
+    struct dirent *entry;
+    off_t offset;
 };
 
 static struct lo_dirp *lo_dirp(struct fuse_file_info *fi)
 {
-	return (struct lo_dirp *) (uintptr_t) fi->fh;
+    return (struct lo_dirp *)(uintptr_t)fi->fh;
 }
 
-static void lo_opendir(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
+static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
+                       struct fuse_file_info *fi)
 {
-	int error = ENOMEM;
-	struct lo_data *lo = lo_data(req);
-	struct lo_dirp *d;
-	int fd;
-
-	d = calloc(1, sizeof(struct lo_dirp));
-	if (d == NULL)
-		goto out_err;
-
-	fd = openat(lo_fd(req, ino), ".", O_RDONLY);
-	if (fd == -1)
-		goto out_errno;
-
-	d->dp = fdopendir(fd);
-	if (d->dp == NULL)
-		goto out_errno;
-
-	d->offset = 0;
-	d->entry = NULL;
-
-	fi->fh = (uintptr_t) d;
-	if (lo->cache == CACHE_ALWAYS)
-		fi->keep_cache = 1;
-	fuse_reply_open(req, fi);
-	return;
+    int error = ENOMEM;
+    struct lo_data *lo = lo_data(req);
+    struct lo_dirp *d;
+    int fd;
+
+    d = calloc(1, sizeof(struct lo_dirp));
+    if (d == NULL) {
+        goto out_err;
+    }
+
+    fd = openat(lo_fd(req, ino), ".", O_RDONLY);
+    if (fd == -1) {
+        goto out_errno;
+    }
+
+    d->dp = fdopendir(fd);
+    if (d->dp == NULL) {
+        goto out_errno;
+    }
+
+    d->offset = 0;
+    d->entry = NULL;
+
+    fi->fh = (uintptr_t)d;
+    if (lo->cache == CACHE_ALWAYS) {
+        fi->keep_cache = 1;
+    }
+    fuse_reply_open(req, fi);
+    return;
 
 out_errno:
-	error = errno;
+    error = errno;
 out_err:
-	if (d) {
-		if (fd != -1)
-			close(fd);
-		free(d);
-	}
-	fuse_reply_err(req, error);
+    if (d) {
+        if (fd != -1) {
+            close(fd);
+        }
+        free(d);
+    }
+    fuse_reply_err(req, error);
 }
 
 static int is_dot_or_dotdot(const char *name)
 {
-	return name[0] == '.' && (name[1] == '\0' ||
-				  (name[1] == '.' && name[2] == '\0'));
+    return name[0] == '.' &&
+           (name[1] == '\0' || (name[1] == '.' && name[2] == '\0'));
 }
 
 static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
-			  off_t offset, struct fuse_file_info *fi, int plus)
+                          off_t offset, struct fuse_file_info *fi, int plus)
 {
-	struct lo_dirp *d = lo_dirp(fi);
-	char *buf;
-	char *p;
-	size_t rem = size;
-	int err;
-
-	(void) ino;
-
-	buf = calloc(1, size);
-	if (!buf) {
-		err = ENOMEM;
-		goto error;
-	}
-	p = buf;
-
-	if (offset != d->offset) {
-		seekdir(d->dp, offset);
-		d->entry = NULL;
-		d->offset = offset;
-	}
-	while (1) {
-		size_t entsize;
-		off_t nextoff;
-		const char *name;
-
-		if (!d->entry) {
-			errno = 0;
-			d->entry = readdir(d->dp);
-			if (!d->entry) {
-				if (errno) {  // Error
-					err = errno;
-					goto error;
-				} else {  // End of stream
-					break; 
-				}
-			}
-		}
-		nextoff = d->entry->d_off;
-		name = d->entry->d_name;
-		fuse_ino_t entry_ino = 0;
-		if (plus) {
-			struct fuse_entry_param e;
-			if (is_dot_or_dotdot(name)) {
-				e = (struct fuse_entry_param) {
-					.attr.st_ino = d->entry->d_ino,
-					.attr.st_mode = d->entry->d_type << 12,
-				};
-			} else {
-				err = lo_do_lookup(req, ino, name, &e);
-				if (err)
-					goto error;
-				entry_ino = e.ino;
-			}
-
-			entsize = fuse_add_direntry_plus(req, p, rem, name,
-							 &e, nextoff);
-		} else {
-			struct stat st = {
-				.st_ino = d->entry->d_ino,
-				.st_mode = d->entry->d_type << 12,
-			};
-			entsize = fuse_add_direntry(req, p, rem, name,
-						    &st, nextoff);
-		}
-		if (entsize > rem) {
-			if (entry_ino != 0) 
-				lo_forget_one(req, entry_ino, 1);
-			break;
-		}
-		
-		p += entsize;
-		rem -= entsize;
-
-		d->entry = NULL;
-		d->offset = nextoff;
-	}
+    struct lo_dirp *d = lo_dirp(fi);
+    char *buf;
+    char *p;
+    size_t rem = size;
+    int err;
+
+    (void)ino;
+
+    buf = calloc(1, size);
+    if (!buf) {
+        err = ENOMEM;
+        goto error;
+    }
+    p = buf;
+
+    if (offset != d->offset) {
+        seekdir(d->dp, offset);
+        d->entry = NULL;
+        d->offset = offset;
+    }
+    while (1) {
+        size_t entsize;
+        off_t nextoff;
+        const char *name;
+
+        if (!d->entry) {
+            errno = 0;
+            d->entry = readdir(d->dp);
+            if (!d->entry) {
+                if (errno) { /* Error */
+                    err = errno;
+                    goto error;
+                } else { /* End of stream */
+                    break;
+                }
+            }
+        }
+        nextoff = d->entry->d_off;
+        name = d->entry->d_name;
+        fuse_ino_t entry_ino = 0;
+        if (plus) {
+            struct fuse_entry_param e;
+            if (is_dot_or_dotdot(name)) {
+                e = (struct fuse_entry_param){
+                    .attr.st_ino = d->entry->d_ino,
+                    .attr.st_mode = d->entry->d_type << 12,
+                };
+            } else {
+                err = lo_do_lookup(req, ino, name, &e);
+                if (err) {
+                    goto error;
+                }
+                entry_ino = e.ino;
+            }
+
+            entsize = fuse_add_direntry_plus(req, p, rem, name, &e, nextoff);
+        } else {
+            struct stat st = {
+                .st_ino = d->entry->d_ino,
+                .st_mode = d->entry->d_type << 12,
+            };
+            entsize = fuse_add_direntry(req, p, rem, name, &st, nextoff);
+        }
+        if (entsize > rem) {
+            if (entry_ino != 0) {
+                lo_forget_one(req, entry_ino, 1);
+            }
+            break;
+        }
+
+        p += entsize;
+        rem -= entsize;
+
+        d->entry = NULL;
+        d->offset = nextoff;
+    }
 
     err = 0;
 error:
-    // If there's an error, we can only signal it if we haven't stored
-    // any entries yet - otherwise we'd end up with wrong lookup
-    // counts for the entries that are already in the buffer. So we
-    // return what we've collected until that point.
-    if (err && rem == size)
-	    fuse_reply_err(req, err);
-    else
-	    fuse_reply_buf(req, buf, size - rem);
+    /*
+     * If there's an error, we can only signal it if we haven't stored
+     * any entries yet - otherwise we'd end up with wrong lookup
+     * counts for the entries that are already in the buffer. So we
+     * return what we've collected until that point.
+     */
+    if (err && rem == size) {
+        fuse_reply_err(req, err);
+    } else {
+        fuse_reply_buf(req, buf, size - rem);
+    }
     free(buf);
 }
 
 static void lo_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
-		       off_t offset, struct fuse_file_info *fi)
+                       off_t offset, struct fuse_file_info *fi)
 {
-	lo_do_readdir(req, ino, size, offset, fi, 0);
+    lo_do_readdir(req, ino, size, offset, fi, 0);
 }
 
 static void lo_readdirplus(fuse_req_t req, fuse_ino_t ino, size_t size,
-			   off_t offset, struct fuse_file_info *fi)
+                           off_t offset, struct fuse_file_info *fi)
 {
-	lo_do_readdir(req, ino, size, offset, fi, 1);
+    lo_do_readdir(req, ino, size, offset, fi, 1);
 }
 
-static void lo_releasedir(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
+static void lo_releasedir(fuse_req_t req, fuse_ino_t ino,
+                          struct fuse_file_info *fi)
 {
-	struct lo_dirp *d = lo_dirp(fi);
-	(void) ino;
-	closedir(d->dp);
-	free(d);
-	fuse_reply_err(req, 0);
+    struct lo_dirp *d = lo_dirp(fi);
+    (void)ino;
+    closedir(d->dp);
+    free(d);
+    fuse_reply_err(req, 0);
 }
 
 static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
-		      mode_t mode, struct fuse_file_info *fi)
+                      mode_t mode, struct fuse_file_info *fi)
 {
-	int fd;
-	struct lo_data *lo = lo_data(req);
-	struct fuse_entry_param e;
-	int err;
-
-	if (lo_debug(req))
-		fuse_log(FUSE_LOG_DEBUG, "lo_create(parent=%" PRIu64 ", name=%s)\n",
-			parent, name);
-
-	fd = openat(lo_fd(req, parent), name,
-		    (fi->flags | O_CREAT) & ~O_NOFOLLOW, mode);
-	if (fd == -1)
-		return (void) fuse_reply_err(req, errno);
-
-	fi->fh = fd;
-	if (lo->cache == CACHE_NEVER)
-		fi->direct_io = 1;
-	else if (lo->cache == CACHE_ALWAYS)
-		fi->keep_cache = 1;
-
-	err = lo_do_lookup(req, parent, name, &e);
-	if (err)
-		fuse_reply_err(req, err);
-	else
-		fuse_reply_create(req, &e, fi);
+    int fd;
+    struct lo_data *lo = lo_data(req);
+    struct fuse_entry_param e;
+    int err;
+
+    if (lo_debug(req)) {
+        fuse_log(FUSE_LOG_DEBUG, "lo_create(parent=%" PRIu64 ", name=%s)\n",
+                 parent, name);
+    }
+
+    fd = openat(lo_fd(req, parent), name, (fi->flags | O_CREAT) & ~O_NOFOLLOW,
+                mode);
+    if (fd == -1) {
+        return (void)fuse_reply_err(req, errno);
+    }
+
+    fi->fh = fd;
+    if (lo->cache == CACHE_NEVER) {
+        fi->direct_io = 1;
+    } else if (lo->cache == CACHE_ALWAYS) {
+        fi->keep_cache = 1;
+    }
+
+    err = lo_do_lookup(req, parent, name, &e);
+    if (err) {
+        fuse_reply_err(req, err);
+    } else {
+        fuse_reply_create(req, &e, fi);
+    }
 }
 
 static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,
-			struct fuse_file_info *fi)
+                        struct fuse_file_info *fi)
 {
-	int res;
-	int fd = dirfd(lo_dirp(fi)->dp);
-	(void) ino;
-	if (datasync)
-		res = fdatasync(fd);
-	else
-		res = fsync(fd);
-	fuse_reply_err(req, res == -1 ? errno : 0);
+    int res;
+    int fd = dirfd(lo_dirp(fi)->dp);
+    (void)ino;
+    if (datasync) {
+        res = fdatasync(fd);
+    } else {
+        res = fsync(fd);
+    }
+    fuse_reply_err(req, res == -1 ? errno : 0);
 }
 
 static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 {
-	int fd;
-	char buf[64];
-	struct lo_data *lo = lo_data(req);
-
-	if (lo_debug(req))
-		fuse_log(FUSE_LOG_DEBUG, "lo_open(ino=%" PRIu64 ", flags=%d)\n",
-			ino, fi->flags);
-
-	/* With writeback cache, kernel may send read requests even
-	   when userspace opened write-only */
-	if (lo->writeback && (fi->flags & O_ACCMODE) == O_WRONLY) {
-		fi->flags &= ~O_ACCMODE;
-		fi->flags |= O_RDWR;
-	}
-
-	/* With writeback cache, O_APPEND is handled by the kernel.
-	   This breaks atomicity (since the file may change in the
-	   underlying filesystem, so that the kernel's idea of the
-	   end of the file isn't accurate anymore). In this example,
-	   we just accept that. A more rigorous filesystem may want
-	   to return an error here */
-	if (lo->writeback && (fi->flags & O_APPEND))
-		fi->flags &= ~O_APPEND;
-
-	sprintf(buf, "/proc/self/fd/%i", lo_fd(req, ino));
-	fd = open(buf, fi->flags & ~O_NOFOLLOW);
-	if (fd == -1)
-		return (void) fuse_reply_err(req, errno);
-
-	fi->fh = fd;
-	if (lo->cache == CACHE_NEVER)
-		fi->direct_io = 1;
-	else if (lo->cache == CACHE_ALWAYS)
-		fi->keep_cache = 1;
-	fuse_reply_open(req, fi);
+    int fd;
+    char buf[64];
+    struct lo_data *lo = lo_data(req);
+
+    if (lo_debug(req)) {
+        fuse_log(FUSE_LOG_DEBUG, "lo_open(ino=%" PRIu64 ", flags=%d)\n", ino,
+                 fi->flags);
+    }
+
+    /*
+     * With writeback cache, kernel may send read requests even
+     * when userspace opened write-only
+     */
+    if (lo->writeback && (fi->flags & O_ACCMODE) == O_WRONLY) {
+        fi->flags &= ~O_ACCMODE;
+        fi->flags |= O_RDWR;
+    }
+
+    /*
+     * With writeback cache, O_APPEND is handled by the kernel.
+     * This breaks atomicity (since the file may change in the
+     * underlying filesystem, so that the kernel's idea of the
+     * end of the file isn't accurate anymore). In this example,
+     * we just accept that. A more rigorous filesystem may want
+     * to return an error here
+     */
+    if (lo->writeback && (fi->flags & O_APPEND)) {
+        fi->flags &= ~O_APPEND;
+    }
+
+    sprintf(buf, "/proc/self/fd/%i", lo_fd(req, ino));
+    fd = open(buf, fi->flags & ~O_NOFOLLOW);
+    if (fd == -1) {
+        return (void)fuse_reply_err(req, errno);
+    }
+
+    fi->fh = fd;
+    if (lo->cache == CACHE_NEVER) {
+        fi->direct_io = 1;
+    } else if (lo->cache == CACHE_ALWAYS) {
+        fi->keep_cache = 1;
+    }
+    fuse_reply_open(req, fi);
 }
 
-static void lo_release(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
+static void lo_release(fuse_req_t req, fuse_ino_t ino,
+                       struct fuse_file_info *fi)
 {
-	(void) ino;
+    (void)ino;
 
-	close(fi->fh);
-	fuse_reply_err(req, 0);
+    close(fi->fh);
+    fuse_reply_err(req, 0);
 }
 
 static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 {
-	int res;
-	(void) ino;
-	res = close(dup(fi->fh));
-	fuse_reply_err(req, res == -1 ? errno : 0);
+    int res;
+    (void)ino;
+    res = close(dup(fi->fh));
+    fuse_reply_err(req, res == -1 ? errno : 0);
 }
 
 static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
-		     struct fuse_file_info *fi)
+                     struct fuse_file_info *fi)
 {
-	int res;
-	(void) ino;
-	if (datasync)
-		res = fdatasync(fi->fh);
-	else
-		res = fsync(fi->fh);
-	fuse_reply_err(req, res == -1 ? errno : 0);
+    int res;
+    (void)ino;
+    if (datasync) {
+        res = fdatasync(fi->fh);
+    } else {
+        res = fsync(fi->fh);
+    }
+    fuse_reply_err(req, res == -1 ? errno : 0);
 }
 
-static void lo_read(fuse_req_t req, fuse_ino_t ino, size_t size,
-		    off_t offset, struct fuse_file_info *fi)
+static void lo_read(fuse_req_t req, fuse_ino_t ino, size_t size, off_t offset,
+                    struct fuse_file_info *fi)
 {
-	struct fuse_bufvec buf = FUSE_BUFVEC_INIT(size);
+    struct fuse_bufvec buf = FUSE_BUFVEC_INIT(size);
 
-	if (lo_debug(req))
-		fuse_log(FUSE_LOG_DEBUG, "lo_read(ino=%" PRIu64 ", size=%zd, "
-			"off=%lu)\n", ino, size, (unsigned long) offset);
+    if (lo_debug(req)) {
+        fuse_log(FUSE_LOG_DEBUG,
+                 "lo_read(ino=%" PRIu64 ", size=%zd, "
+                 "off=%lu)\n",
+                 ino, size, (unsigned long)offset);
+    }
 
-	buf.buf[0].flags = FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK;
-	buf.buf[0].fd = fi->fh;
-	buf.buf[0].pos = offset;
+    buf.buf[0].flags = FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK;
+    buf.buf[0].fd = fi->fh;
+    buf.buf[0].pos = offset;
 
-	fuse_reply_data(req, &buf, FUSE_BUF_SPLICE_MOVE);
+    fuse_reply_data(req, &buf, FUSE_BUF_SPLICE_MOVE);
 }
 
 static void lo_write_buf(fuse_req_t req, fuse_ino_t ino,
-			 struct fuse_bufvec *in_buf, off_t off,
-			 struct fuse_file_info *fi)
+                         struct fuse_bufvec *in_buf, off_t off,
+                         struct fuse_file_info *fi)
 {
-	(void) ino;
-	ssize_t res;
-	struct fuse_bufvec out_buf = FUSE_BUFVEC_INIT(fuse_buf_size(in_buf));
-
-	out_buf.buf[0].flags = FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK;
-	out_buf.buf[0].fd = fi->fh;
-	out_buf.buf[0].pos = off;
-
-	if (lo_debug(req))
-		fuse_log(FUSE_LOG_DEBUG, "lo_write(ino=%" PRIu64 ", size=%zd, off=%lu)\n",
-			ino, out_buf.buf[0].size, (unsigned long) off);
-
-	res = fuse_buf_copy(&out_buf, in_buf, 0);
-	if(res < 0)
-		fuse_reply_err(req, -res);
-	else
-		fuse_reply_write(req, (size_t) res);
+    (void)ino;
+    ssize_t res;
+    struct fuse_bufvec out_buf = FUSE_BUFVEC_INIT(fuse_buf_size(in_buf));
+
+    out_buf.buf[0].flags = FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK;
+    out_buf.buf[0].fd = fi->fh;
+    out_buf.buf[0].pos = off;
+
+    if (lo_debug(req)) {
+        fuse_log(FUSE_LOG_DEBUG,
+                 "lo_write(ino=%" PRIu64 ", size=%zd, off=%lu)\n", ino,
+                 out_buf.buf[0].size, (unsigned long)off);
+    }
+
+    res = fuse_buf_copy(&out_buf, in_buf, 0);
+    if (res < 0) {
+        fuse_reply_err(req, -res);
+    } else {
+        fuse_reply_write(req, (size_t)res);
+    }
 }
 
 static void lo_statfs(fuse_req_t req, fuse_ino_t ino)
 {
-	int res;
-	struct statvfs stbuf;
-
-	res = fstatvfs(lo_fd(req, ino), &stbuf);
-	if (res == -1)
-		fuse_reply_err(req, errno);
-	else
-		fuse_reply_statfs(req, &stbuf);
+    int res;
+    struct statvfs stbuf;
+
+    res = fstatvfs(lo_fd(req, ino), &stbuf);
+    if (res == -1) {
+        fuse_reply_err(req, errno);
+    } else {
+        fuse_reply_statfs(req, &stbuf);
+    }
 }
 
-static void lo_fallocate(fuse_req_t req, fuse_ino_t ino, int mode,
-			 off_t offset, off_t length, struct fuse_file_info *fi)
+static void lo_fallocate(fuse_req_t req, fuse_ino_t ino, int mode, off_t offset,
+                         off_t length, struct fuse_file_info *fi)
 {
-	int err = EOPNOTSUPP;
-	(void) ino;
+    int err = EOPNOTSUPP;
+    (void)ino;
 
 #ifdef HAVE_FALLOCATE
-	err = fallocate(fi->fh, mode, offset, length);
-	if (err < 0)
-		err = errno;
+    err = fallocate(fi->fh, mode, offset, length);
+    if (err < 0) {
+        err = errno;
+    }
 
 #elif defined(HAVE_POSIX_FALLOCATE)
-	if (mode) {
-		fuse_reply_err(req, EOPNOTSUPP);
-		return;
-	}
+    if (mode) {
+        fuse_reply_err(req, EOPNOTSUPP);
+        return;
+    }
 
-	err = posix_fallocate(fi->fh, offset, length);
+    err = posix_fallocate(fi->fh, offset, length);
 #endif
 
-	fuse_reply_err(req, err);
+    fuse_reply_err(req, err);
 }
 
 static void lo_flock(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
-		     int op)
+                     int op)
 {
-	int res;
-	(void) ino;
+    int res;
+    (void)ino;
 
-	res = flock(fi->fh, op);
+    res = flock(fi->fh, op);
 
-	fuse_reply_err(req, res == -1 ? errno : 0);
+    fuse_reply_err(req, res == -1 ? errno : 0);
 }
 
 static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
-			size_t size)
+                        size_t size)
 {
-	char *value = NULL;
-	char procname[64];
-	struct lo_inode *inode = lo_inode(req, ino);
-	ssize_t ret;
-	int saverr;
-
-	saverr = ENOSYS;
-	if (!lo_data(req)->xattr)
-		goto out;
-
-	if (lo_debug(req)) {
-		fuse_log(FUSE_LOG_DEBUG, "lo_getxattr(ino=%" PRIu64 ", name=%s size=%zd)\n",
-			ino, name, size);
-	}
-
-	if (inode->is_symlink) {
-		/* Sorry, no race free way to getxattr on symlink. */
-		saverr = EPERM;
-		goto out;
-	}
-
-	sprintf(procname, "/proc/self/fd/%i", inode->fd);
-
-	if (size) {
-		value = malloc(size);
-		if (!value)
-			goto out_err;
-
-		ret = getxattr(procname, name, value, size);
-		if (ret == -1)
-			goto out_err;
-		saverr = 0;
-		if (ret == 0)
-			goto out;
-
-		fuse_reply_buf(req, value, ret);
-	} else {
-		ret = getxattr(procname, name, NULL, 0);
-		if (ret == -1)
-			goto out_err;
-
-		fuse_reply_xattr(req, ret);
-	}
+    char *value = NULL;
+    char procname[64];
+    struct lo_inode *inode = lo_inode(req, ino);
+    ssize_t ret;
+    int saverr;
+
+    saverr = ENOSYS;
+    if (!lo_data(req)->xattr) {
+        goto out;
+    }
+
+    if (lo_debug(req)) {
+        fuse_log(FUSE_LOG_DEBUG,
+                 "lo_getxattr(ino=%" PRIu64 ", name=%s size=%zd)\n", ino, name,
+                 size);
+    }
+
+    if (inode->is_symlink) {
+        /* Sorry, no race free way to getxattr on symlink. */
+        saverr = EPERM;
+        goto out;
+    }
+
+    sprintf(procname, "/proc/self/fd/%i", inode->fd);
+
+    if (size) {
+        value = malloc(size);
+        if (!value) {
+            goto out_err;
+        }
+
+        ret = getxattr(procname, name, value, size);
+        if (ret == -1) {
+            goto out_err;
+        }
+        saverr = 0;
+        if (ret == 0) {
+            goto out;
+        }
+
+        fuse_reply_buf(req, value, ret);
+    } else {
+        ret = getxattr(procname, name, NULL, 0);
+        if (ret == -1) {
+            goto out_err;
+        }
+
+        fuse_reply_xattr(req, ret);
+    }
 out_free:
-	free(value);
-	return;
+    free(value);
+    return;
 
 out_err:
-	saverr = errno;
+    saverr = errno;
 out:
-	fuse_reply_err(req, saverr);
-	goto out_free;
+    fuse_reply_err(req, saverr);
+    goto out_free;
 }
 
 static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
 {
-	char *value = NULL;
-	char procname[64];
-	struct lo_inode *inode = lo_inode(req, ino);
-	ssize_t ret;
-	int saverr;
-
-	saverr = ENOSYS;
-	if (!lo_data(req)->xattr)
-		goto out;
-
-	if (lo_debug(req)) {
-		fuse_log(FUSE_LOG_DEBUG, "lo_listxattr(ino=%" PRIu64 ", size=%zd)\n",
-			ino, size);
-	}
-
-	if (inode->is_symlink) {
-		/* Sorry, no race free way to listxattr on symlink. */
-		saverr = EPERM;
-		goto out;
-	}
-
-	sprintf(procname, "/proc/self/fd/%i", inode->fd);
-
-	if (size) {
-		value = malloc(size);
-		if (!value)
-			goto out_err;
-
-		ret = listxattr(procname, value, size);
-		if (ret == -1)
-			goto out_err;
-		saverr = 0;
-		if (ret == 0)
-			goto out;
-
-		fuse_reply_buf(req, value, ret);
-	} else {
-		ret = listxattr(procname, NULL, 0);
-		if (ret == -1)
-			goto out_err;
-
-		fuse_reply_xattr(req, ret);
-	}
+    char *value = NULL;
+    char procname[64];
+    struct lo_inode *inode = lo_inode(req, ino);
+    ssize_t ret;
+    int saverr;
+
+    saverr = ENOSYS;
+    if (!lo_data(req)->xattr) {
+        goto out;
+    }
+
+    if (lo_debug(req)) {
+        fuse_log(FUSE_LOG_DEBUG, "lo_listxattr(ino=%" PRIu64 ", size=%zd)\n",
+                 ino, size);
+    }
+
+    if (inode->is_symlink) {
+        /* Sorry, no race free way to listxattr on symlink. */
+        saverr = EPERM;
+        goto out;
+    }
+
+    sprintf(procname, "/proc/self/fd/%i", inode->fd);
+
+    if (size) {
+        value = malloc(size);
+        if (!value) {
+            goto out_err;
+        }
+
+        ret = listxattr(procname, value, size);
+        if (ret == -1) {
+            goto out_err;
+        }
+        saverr = 0;
+        if (ret == 0) {
+            goto out;
+        }
+
+        fuse_reply_buf(req, value, ret);
+    } else {
+        ret = listxattr(procname, NULL, 0);
+        if (ret == -1) {
+            goto out_err;
+        }
+
+        fuse_reply_xattr(req, ret);
+    }
 out_free:
-	free(value);
-	return;
+    free(value);
+    return;
 
 out_err:
-	saverr = errno;
+    saverr = errno;
 out:
-	fuse_reply_err(req, saverr);
-	goto out_free;
+    fuse_reply_err(req, saverr);
+    goto out_free;
 }
 
 static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
-			const char *value, size_t size, int flags)
+                        const char *value, size_t size, int flags)
 {
-	char procname[64];
-	struct lo_inode *inode = lo_inode(req, ino);
-	ssize_t ret;
-	int saverr;
+    char procname[64];
+    struct lo_inode *inode = lo_inode(req, ino);
+    ssize_t ret;
+    int saverr;
 
-	saverr = ENOSYS;
-	if (!lo_data(req)->xattr)
-		goto out;
+    saverr = ENOSYS;
+    if (!lo_data(req)->xattr) {
+        goto out;
+    }
 
-	if (lo_debug(req)) {
-		fuse_log(FUSE_LOG_DEBUG, "lo_setxattr(ino=%" PRIu64 ", name=%s value=%s size=%zd)\n",
-			ino, name, value, size);
-	}
+    if (lo_debug(req)) {
+        fuse_log(FUSE_LOG_DEBUG,
+                 "lo_setxattr(ino=%" PRIu64 ", name=%s value=%s size=%zd)\n",
+                 ino, name, value, size);
+    }
 
-	if (inode->is_symlink) {
-		/* Sorry, no race free way to setxattr on symlink. */
-		saverr = EPERM;
-		goto out;
-	}
+    if (inode->is_symlink) {
+        /* Sorry, no race free way to setxattr on symlink. */
+        saverr = EPERM;
+        goto out;
+    }
 
-	sprintf(procname, "/proc/self/fd/%i", inode->fd);
+    sprintf(procname, "/proc/self/fd/%i", inode->fd);
 
-	ret = setxattr(procname, name, value, size, flags);
-	saverr = ret == -1 ? errno : 0;
+    ret = setxattr(procname, name, value, size, flags);
+    saverr = ret == -1 ? errno : 0;
 
 out:
-	fuse_reply_err(req, saverr);
+    fuse_reply_err(req, saverr);
 }
 
 static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *name)
 {
-	char procname[64];
-	struct lo_inode *inode = lo_inode(req, ino);
-	ssize_t ret;
-	int saverr;
+    char procname[64];
+    struct lo_inode *inode = lo_inode(req, ino);
+    ssize_t ret;
+    int saverr;
 
-	saverr = ENOSYS;
-	if (!lo_data(req)->xattr)
-		goto out;
+    saverr = ENOSYS;
+    if (!lo_data(req)->xattr) {
+        goto out;
+    }
 
-	if (lo_debug(req)) {
-		fuse_log(FUSE_LOG_DEBUG, "lo_removexattr(ino=%" PRIu64 ", name=%s)\n",
-			ino, name);
-	}
+    if (lo_debug(req)) {
+        fuse_log(FUSE_LOG_DEBUG, "lo_removexattr(ino=%" PRIu64 ", name=%s)\n",
+                 ino, name);
+    }
 
-	if (inode->is_symlink) {
-		/* Sorry, no race free way to setxattr on symlink. */
-		saverr = EPERM;
-		goto out;
-	}
+    if (inode->is_symlink) {
+        /* Sorry, no race free way to setxattr on symlink. */
+        saverr = EPERM;
+        goto out;
+    }
 
-	sprintf(procname, "/proc/self/fd/%i", inode->fd);
+    sprintf(procname, "/proc/self/fd/%i", inode->fd);
 
-	ret = removexattr(procname, name);
-	saverr = ret == -1 ? errno : 0;
+    ret = removexattr(procname, name);
+    saverr = ret == -1 ? errno : 0;
 
 out:
-	fuse_reply_err(req, saverr);
+    fuse_reply_err(req, saverr);
 }
 
 #ifdef HAVE_COPY_FILE_RANGE
 static void lo_copy_file_range(fuse_req_t req, fuse_ino_t ino_in, off_t off_in,
-			       struct fuse_file_info *fi_in,
-			       fuse_ino_t ino_out, off_t off_out,
-			       struct fuse_file_info *fi_out, size_t len,
-			       int flags)
+                               struct fuse_file_info *fi_in, fuse_ino_t ino_out,
+                               off_t off_out, struct fuse_file_info *fi_out,
+                               size_t len, int flags)
 {
-	ssize_t res;
-
-	if (lo_debug(req))
-		fuse_log(FUSE_LOG_DEBUG, "lo_copy_file_range(ino=%" PRIu64 "/fd=%lu, "
-				"off=%lu, ino=%" PRIu64 "/fd=%lu, "
-				"off=%lu, size=%zd, flags=0x%x)\n",
-			ino_in, fi_in->fh, off_in, ino_out, fi_out->fh, off_out,
-			len, flags);
-
-	res = copy_file_range(fi_in->fh, &off_in, fi_out->fh, &off_out, len,
-			      flags);
-	if (res < 0)
-		fuse_reply_err(req, -errno);
-	else
-		fuse_reply_write(req, res);
+    ssize_t res;
+
+    if (lo_debug(req))
+        fuse_log(FUSE_LOG_DEBUG,
+                 "lo_copy_file_range(ino=%" PRIu64 "/fd=%lu, "
+                 "off=%lu, ino=%" PRIu64 "/fd=%lu, "
+                 "off=%lu, size=%zd, flags=0x%x)\n",
+                 ino_in, fi_in->fh, off_in, ino_out, fi_out->fh, off_out, len,
+                 flags);
+
+    res = copy_file_range(fi_in->fh, &off_in, fi_out->fh, &off_out, len, flags);
+    if (res < 0) {
+        fuse_reply_err(req, -errno);
+    } else {
+        fuse_reply_write(req, res);
+    }
 }
 #endif
 
 static void lo_lseek(fuse_req_t req, fuse_ino_t ino, off_t off, int whence,
-		     struct fuse_file_info *fi)
+                     struct fuse_file_info *fi)
 {
-	off_t res;
-
-	(void)ino;
-	res = lseek(fi->fh, off, whence);
-	if (res != -1)
-		fuse_reply_lseek(req, res);
-	else
-		fuse_reply_err(req, errno);
+    off_t res;
+
+    (void)ino;
+    res = lseek(fi->fh, off, whence);
+    if (res != -1) {
+        fuse_reply_lseek(req, res);
+    } else {
+        fuse_reply_err(req, errno);
+    }
 }
 
 static struct fuse_lowlevel_ops lo_oper = {
-	.init		= lo_init,
-	.lookup		= lo_lookup,
-	.mkdir		= lo_mkdir,
-	.mknod		= lo_mknod,
-	.symlink	= lo_symlink,
-	.link		= lo_link,
-	.unlink		= lo_unlink,
-	.rmdir		= lo_rmdir,
-	.rename		= lo_rename,
-	.forget		= lo_forget,
-	.forget_multi	= lo_forget_multi,
-	.getattr	= lo_getattr,
-	.setattr	= lo_setattr,
-	.readlink	= lo_readlink,
-	.opendir	= lo_opendir,
-	.readdir	= lo_readdir,
-	.readdirplus	= lo_readdirplus,
-	.releasedir	= lo_releasedir,
-	.fsyncdir	= lo_fsyncdir,
-	.create		= lo_create,
-	.open		= lo_open,
-	.release	= lo_release,
-	.flush		= lo_flush,
-	.fsync		= lo_fsync,
-	.read		= lo_read,
-	.write_buf      = lo_write_buf,
-	.statfs		= lo_statfs,
-	.fallocate	= lo_fallocate,
-	.flock		= lo_flock,
-	.getxattr	= lo_getxattr,
-	.listxattr	= lo_listxattr,
-	.setxattr	= lo_setxattr,
-	.removexattr	= lo_removexattr,
+    .init = lo_init,
+    .lookup = lo_lookup,
+    .mkdir = lo_mkdir,
+    .mknod = lo_mknod,
+    .symlink = lo_symlink,
+    .link = lo_link,
+    .unlink = lo_unlink,
+    .rmdir = lo_rmdir,
+    .rename = lo_rename,
+    .forget = lo_forget,
+    .forget_multi = lo_forget_multi,
+    .getattr = lo_getattr,
+    .setattr = lo_setattr,
+    .readlink = lo_readlink,
+    .opendir = lo_opendir,
+    .readdir = lo_readdir,
+    .readdirplus = lo_readdirplus,
+    .releasedir = lo_releasedir,
+    .fsyncdir = lo_fsyncdir,
+    .create = lo_create,
+    .open = lo_open,
+    .release = lo_release,
+    .flush = lo_flush,
+    .fsync = lo_fsync,
+    .read = lo_read,
+    .write_buf = lo_write_buf,
+    .statfs = lo_statfs,
+    .fallocate = lo_fallocate,
+    .flock = lo_flock,
+    .getxattr = lo_getxattr,
+    .listxattr = lo_listxattr,
+    .setxattr = lo_setxattr,
+    .removexattr = lo_removexattr,
 #ifdef HAVE_COPY_FILE_RANGE
-	.copy_file_range = lo_copy_file_range,
+    .copy_file_range = lo_copy_file_range,
 #endif
-	.lseek		= lo_lseek,
+    .lseek = lo_lseek,
 };
 
 int main(int argc, char *argv[])
 {
-	struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
-	struct fuse_session *se;
-	struct fuse_cmdline_opts opts;
-	struct lo_data lo = { .debug = 0,
-	                      .writeback = 0 };
-	int ret = -1;
-
-	/* Don't mask creation mode, kernel already did that */
-	umask(0);
-
-	pthread_mutex_init(&lo.mutex, NULL);
-	lo.root.next = lo.root.prev = &lo.root;
-	lo.root.fd = -1;
-	lo.cache = CACHE_NORMAL;
-
-	if (fuse_parse_cmdline(&args, &opts) != 0)
-		return 1;
-	if (opts.show_help) {
-		printf("usage: %s [options] <mountpoint>\n\n", argv[0]);
-		fuse_cmdline_help();
-		fuse_lowlevel_help();
-		ret = 0;
-		goto err_out1;
-	} else if (opts.show_version) {
-		printf("FUSE library version %s\n", fuse_pkgversion());
-		fuse_lowlevel_version();
-		ret = 0;
-		goto err_out1;
-	}
-
-	if(opts.mountpoint == NULL) {
-		printf("usage: %s [options] <mountpoint>\n", argv[0]);
-		printf("       %s --help\n", argv[0]);
-		ret = 1;
-		goto err_out1;
-	}
-
-	if (fuse_opt_parse(&args, &lo, lo_opts, NULL)== -1)
-		return 1;
-
-	lo.debug = opts.debug;
-	lo.root.refcount = 2;
-	if (lo.source) {
-		struct stat stat;
-		int res;
-
-		res = lstat(lo.source, &stat);
-		if (res == -1) {
-			fuse_log(FUSE_LOG_ERR, "failed to stat source (\"%s\"): %m\n",
-				 lo.source);
-			exit(1);
-		}
-		if (!S_ISDIR(stat.st_mode)) {
-			fuse_log(FUSE_LOG_ERR, "source is not a directory\n");
-			exit(1);
-		}
-
-	} else {
-		lo.source = "/";
-	}
-	lo.root.is_symlink = false;
-	if (!lo.timeout_set) {
-		switch (lo.cache) {
-		case CACHE_NEVER:
-			lo.timeout = 0.0;
-			break;
-
-		case CACHE_NORMAL:
-			lo.timeout = 1.0;
-			break;
-
-		case CACHE_ALWAYS:
-			lo.timeout = 86400.0;
-			break;
-		}
-	} else if (lo.timeout < 0) {
-		fuse_log(FUSE_LOG_ERR, "timeout is negative (%lf)\n",
-			 lo.timeout);
-		exit(1);
-	}
-
-	lo.root.fd = open(lo.source, O_PATH);
-	if (lo.root.fd == -1) {
-		fuse_log(FUSE_LOG_ERR, "open(\"%s\", O_PATH): %m\n",
-			 lo.source);
-		exit(1);
-	}
-
-	se = fuse_session_new(&args, &lo_oper, sizeof(lo_oper), &lo);
-	if (se == NULL)
-	    goto err_out1;
-
-	if (fuse_set_signal_handlers(se) != 0)
-	    goto err_out2;
-
-	if (fuse_session_mount(se, opts.mountpoint) != 0)
-	    goto err_out3;
-
-	fuse_daemonize(opts.foreground);
-
-	/* Block until ctrl+c or fusermount -u */
-	if (opts.singlethread)
-		ret = fuse_session_loop(se);
-	else
-		ret = fuse_session_loop_mt(se, opts.clone_fd);
-
-	fuse_session_unmount(se);
+    struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
+    struct fuse_session *se;
+    struct fuse_cmdline_opts opts;
+    struct lo_data lo = { .debug = 0, .writeback = 0 };
+    int ret = -1;
+
+    /* Don't mask creation mode, kernel already did that */
+    umask(0);
+
+    pthread_mutex_init(&lo.mutex, NULL);
+    lo.root.next = lo.root.prev = &lo.root;
+    lo.root.fd = -1;
+    lo.cache = CACHE_NORMAL;
+
+    if (fuse_parse_cmdline(&args, &opts) != 0) {
+        return 1;
+    }
+    if (opts.show_help) {
+        printf("usage: %s [options] <mountpoint>\n\n", argv[0]);
+        fuse_cmdline_help();
+        fuse_lowlevel_help();
+        ret = 0;
+        goto err_out1;
+    } else if (opts.show_version) {
+        printf("FUSE library version %s\n", fuse_pkgversion());
+        fuse_lowlevel_version();
+        ret = 0;
+        goto err_out1;
+    }
+
+    if (opts.mountpoint == NULL) {
+        printf("usage: %s [options] <mountpoint>\n", argv[0]);
+        printf("       %s --help\n", argv[0]);
+        ret = 1;
+        goto err_out1;
+    }
+
+    if (fuse_opt_parse(&args, &lo, lo_opts, NULL) == -1) {
+        return 1;
+    }
+
+    lo.debug = opts.debug;
+    lo.root.refcount = 2;
+    if (lo.source) {
+        struct stat stat;
+        int res;
+
+        res = lstat(lo.source, &stat);
+        if (res == -1) {
+            fuse_log(FUSE_LOG_ERR, "failed to stat source (\"%s\"): %m\n",
+                     lo.source);
+            exit(1);
+        }
+        if (!S_ISDIR(stat.st_mode)) {
+            fuse_log(FUSE_LOG_ERR, "source is not a directory\n");
+            exit(1);
+        }
+
+    } else {
+        lo.source = "/";
+    }
+    lo.root.is_symlink = false;
+    if (!lo.timeout_set) {
+        switch (lo.cache) {
+        case CACHE_NEVER:
+            lo.timeout = 0.0;
+            break;
+
+        case CACHE_NORMAL:
+            lo.timeout = 1.0;
+            break;
+
+        case CACHE_ALWAYS:
+            lo.timeout = 86400.0;
+            break;
+        }
+    } else if (lo.timeout < 0) {
+        fuse_log(FUSE_LOG_ERR, "timeout is negative (%lf)\n", lo.timeout);
+        exit(1);
+    }
+
+    lo.root.fd = open(lo.source, O_PATH);
+    if (lo.root.fd == -1) {
+        fuse_log(FUSE_LOG_ERR, "open(\"%s\", O_PATH): %m\n", lo.source);
+        exit(1);
+    }
+
+    se = fuse_session_new(&args, &lo_oper, sizeof(lo_oper), &lo);
+    if (se == NULL) {
+        goto err_out1;
+    }
+
+    if (fuse_set_signal_handlers(se) != 0) {
+        goto err_out2;
+    }
+
+    if (fuse_session_mount(se, opts.mountpoint) != 0) {
+        goto err_out3;
+    }
+
+    fuse_daemonize(opts.foreground);
+
+    /* Block until ctrl+c or fusermount -u */
+    if (opts.singlethread) {
+        ret = fuse_session_loop(se);
+    } else {
+        ret = fuse_session_loop_mt(se, opts.clone_fd);
+    }
+
+    fuse_session_unmount(se);
 err_out3:
-	fuse_remove_signal_handlers(se);
+    fuse_remove_signal_handlers(se);
 err_out2:
-	fuse_session_destroy(se);
+    fuse_session_destroy(se);
 err_out1:
-	free(opts.mountpoint);
-	fuse_opt_free_args(&args);
+    free(opts.mountpoint);
+    fuse_opt_free_args(&args);
 
-	if (lo.root.fd >= 0)
-		close(lo.root.fd);
+    if (lo.root.fd >= 0) {
+        close(lo.root.fd);
+    }
 
-	return ret ? 1 : 0;
+    return ret ? 1 : 0;
 }
-- 
2.23.0




* [PATCH 008/104] virtiofsd: remove mountpoint dummy argument
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (6 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 007/104] virtiofsd: Format imported files to qemu style Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-03 12:12   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 009/104] virtiofsd: remove unused notify reply support Dr. David Alan Gilbert (git)
                   ` (98 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

Classic FUSE file system daemons take a mountpoint argument, but
virtiofsd exposes a vhost-user UNIX domain socket instead.  The
mountpoint argument is not used by virtiofsd, yet the user is still
required to pass a dummy argument on the command line.

Remove the mountpoint argument to clean up the command line.
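
As a rough illustration of the resulting behaviour (any non-option
argument is now rejected instead of being taken as a mountpoint), here
is a minimal standalone sketch.  It does not use the libfuse fuse_opt
API; check_no_positional_args() is a hypothetical helper, and its
treatment of "-o" as the only option that consumes a value is an
assumption made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for the daemon's option handling; this is not
 * the libfuse fuse_opt API.  Returns 0 if every argument is an option,
 * or -1 if a positional argument (such as a mountpoint) is present.
 */
static int check_no_positional_args(int argc, char *argv[])
{
    assert(argv != NULL);

    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-o") == 0) {
            /* Assumption: "-o" consumes the next argument as its value. */
            if (i + 1 >= argc) {
                fprintf(stderr, "missing value for -o\n");
                return -1;
            }
            i++;
            continue;
        }
        if (argv[i][0] != '-') {
            /* As in the patch: a non-option argument is an error. */
            fprintf(stderr, "invalid argument `%s'\n", argv[i]);
            return -1;
        }
    }
    return 0;
}
```

With this sketch, "virtiofsd -o source=/path/to/fs" passes the check,
while appending a trailing "/mnt" makes it fail.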

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/fuse_lowlevel.c  |  2 +-
 tools/virtiofsd/fuse_lowlevel.h  |  4 +---
 tools/virtiofsd/helper.c         | 20 +++-----------------
 tools/virtiofsd/passthrough_ll.c | 12 ++----------
 4 files changed, 7 insertions(+), 31 deletions(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 42feee5c1c..20037eef67 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -2479,7 +2479,7 @@ out1:
     return NULL;
 }
 
-int fuse_session_mount(struct fuse_session *se, const char *mountpoint)
+int fuse_session_mount(struct fuse_session *se)
 {
     int fd;
 
diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
index 85cc027382..17899e012a 100644
--- a/tools/virtiofsd/fuse_lowlevel.h
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -1864,7 +1864,6 @@ struct fuse_cmdline_opts {
     int foreground;
     int debug;
     int nodefault_subtype;
-    char *mountpoint;
     int show_version;
     int show_help;
     int clone_fd;
@@ -1926,12 +1925,11 @@ struct fuse_session *fuse_session_new(struct fuse_args *args,
 /**
  * Mount a FUSE file system.
  *
- * @param mountpoint the mount point path
  * @param se session object
  *
  * @return 0 on success, -1 on failure.
  **/
-int fuse_session_mount(struct fuse_session *se, const char *mountpoint);
+int fuse_session_mount(struct fuse_session *se);
 
 /**
  * Enter a single threaded, blocking event loop.
diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
index e077943558..d8c42401a7 100644
--- a/tools/virtiofsd/helper.c
+++ b/tools/virtiofsd/helper.c
@@ -146,27 +146,13 @@ void fuse_cmdline_help(void)
 static int fuse_helper_opt_proc(void *data, const char *arg, int key,
                                 struct fuse_args *outargs)
 {
+    (void)data;
     (void)outargs;
-    struct fuse_cmdline_opts *opts = data;
 
     switch (key) {
     case FUSE_OPT_KEY_NONOPT:
-        if (!opts->mountpoint) {
-            if (fuse_mnt_parse_fuse_fd(arg) != -1) {
-                return fuse_opt_add_opt(&opts->mountpoint, arg);
-            }
-
-            char mountpoint[PATH_MAX] = "";
-            if (realpath(arg, mountpoint) == NULL) {
-                fuse_log(FUSE_LOG_ERR, "fuse: bad mount point `%s': %s\n", arg,
-                         strerror(errno));
-                return -1;
-            }
-            return fuse_opt_add_opt(&opts->mountpoint, mountpoint);
-        } else {
-            fuse_log(FUSE_LOG_ERR, "fuse: invalid argument `%s'\n", arg);
-            return -1;
-        }
+        fuse_log(FUSE_LOG_ERR, "fuse: invalid argument `%s'\n", arg);
+        return -1;
 
     default:
         /* Pass through unknown options */
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index cd399d5c4b..a79ec2c70d 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1297,7 +1297,7 @@ int main(int argc, char *argv[])
         return 1;
     }
     if (opts.show_help) {
-        printf("usage: %s [options] <mountpoint>\n\n", argv[0]);
+        printf("usage: %s [options]\n\n", argv[0]);
         fuse_cmdline_help();
         fuse_lowlevel_help();
         ret = 0;
@@ -1309,13 +1309,6 @@ int main(int argc, char *argv[])
         goto err_out1;
     }
 
-    if (opts.mountpoint == NULL) {
-        printf("usage: %s [options] <mountpoint>\n", argv[0]);
-        printf("       %s --help\n", argv[0]);
-        ret = 1;
-        goto err_out1;
-    }
-
     if (fuse_opt_parse(&args, &lo, lo_opts, NULL) == -1) {
         return 1;
     }
@@ -1375,7 +1368,7 @@ int main(int argc, char *argv[])
         goto err_out2;
     }
 
-    if (fuse_session_mount(se, opts.mountpoint) != 0) {
+    if (fuse_session_mount(se) != 0) {
         goto err_out3;
     }
 
@@ -1394,7 +1387,6 @@ err_out3:
 err_out2:
     fuse_session_destroy(se);
 err_out1:
-    free(opts.mountpoint);
     fuse_opt_free_args(&args);
 
     if (lo.root.fd >= 0) {
-- 
2.23.0




* [PATCH 009/104] virtiofsd: remove unused notify reply support
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (7 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 008/104] virtiofsd: remove mountpoint dummy argument Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-03 12:14   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 010/104] virtiofsd: Fix fuse_daemonize ignored return values Dr. David Alan Gilbert (git)
                   ` (97 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

Notify reply support is unused by virtiofsd.  The code would need to be
updated to validate input buffer sizes.  Remove this unused code since
changes to it are untestable.
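
For context, the removed fuse_ll_retrieve_reply() subtracts the header
sizes from the buffer size.  One kind of check such code might need,
sketched here purely as an assumption about what "validate input buffer
sizes" could involve, is to confirm that the buffer is at least as large
as the headers before subtracting.  The structures and
demo_payload_size() below are hypothetical stand-ins, not the real FUSE
wire structures or any libfuse function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real FUSE wire structures. */
struct demo_in_header {
    uint32_t len;
    uint32_t opcode;
    uint64_t unique;
};

struct demo_retrieve_in {
    uint64_t offset;
    uint32_t size;
    uint32_t padding;
};

/*
 * Compute the payload length that follows the two headers.
 * Returns 0 and stores the length in *payload on success, or -1 if the
 * buffer is too short.  Without the check, the subtraction would wrap
 * around because size_t is unsigned.
 */
static int demo_payload_size(size_t buf_size, size_t *payload)
{
    const size_t hdr = sizeof(struct demo_in_header) +
                       sizeof(struct demo_retrieve_in);

    assert(payload != NULL);

    if (buf_size < hdr) {
        return -1;
    }
    *payload = buf_size - hdr;
    return 0;
}
```

A caller would reply with an error when demo_payload_size() returns -1
rather than continue with a wrapped-around size.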

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/fuse_lowlevel.c | 147 +-------------------------------
 tools/virtiofsd/fuse_lowlevel.h |  47 ----------
 2 files changed, 1 insertion(+), 193 deletions(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 20037eef67..0d7b2c3dc9 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -32,12 +32,6 @@
 #define PARAM(inarg) (((char *)(inarg)) + sizeof(*(inarg)))
 #define OFFSET_MAX 0x7fffffffffffffffLL
 
-#define container_of(ptr, type, member)                    \
-    ({                                                     \
-        const typeof(((type *)0)->member) *__mptr = (ptr); \
-        (type *)((char *)__mptr - offsetof(type, member)); \
-    })
-
 struct fuse_pollhandle {
     uint64_t kh;
     struct fuse_session *se;
@@ -1864,52 +1858,6 @@ static void do_destroy(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     send_reply_ok(req, NULL, 0);
 }
 
-static void list_del_nreq(struct fuse_notify_req *nreq)
-{
-    struct fuse_notify_req *prev = nreq->prev;
-    struct fuse_notify_req *next = nreq->next;
-    prev->next = next;
-    next->prev = prev;
-}
-
-static void list_add_nreq(struct fuse_notify_req *nreq,
-                          struct fuse_notify_req *next)
-{
-    struct fuse_notify_req *prev = next->prev;
-    nreq->next = next;
-    nreq->prev = prev;
-    prev->next = nreq;
-    next->prev = nreq;
-}
-
-static void list_init_nreq(struct fuse_notify_req *nreq)
-{
-    nreq->next = nreq;
-    nreq->prev = nreq;
-}
-
-static void do_notify_reply(fuse_req_t req, fuse_ino_t nodeid,
-                            const void *inarg, const struct fuse_buf *buf)
-{
-    struct fuse_session *se = req->se;
-    struct fuse_notify_req *nreq;
-    struct fuse_notify_req *head;
-
-    pthread_mutex_lock(&se->lock);
-    head = &se->notify_list;
-    for (nreq = head->next; nreq != head; nreq = nreq->next) {
-        if (nreq->unique == req->unique) {
-            list_del_nreq(nreq);
-            break;
-        }
-    }
-    pthread_mutex_unlock(&se->lock);
-
-    if (nreq != head) {
-        nreq->reply(nreq, req, nodeid, inarg, buf);
-    }
-}
-
 static int send_notify_iov(struct fuse_session *se, int notify_code,
                            struct iovec *iov, int count)
 {
@@ -2061,95 +2009,6 @@ int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
     return res;
 }
 
-struct fuse_retrieve_req {
-    struct fuse_notify_req nreq;
-    void *cookie;
-};
-
-static void fuse_ll_retrieve_reply(struct fuse_notify_req *nreq, fuse_req_t req,
-                                   fuse_ino_t ino, const void *inarg,
-                                   const struct fuse_buf *ibuf)
-{
-    struct fuse_session *se = req->se;
-    struct fuse_retrieve_req *rreq =
-        container_of(nreq, struct fuse_retrieve_req, nreq);
-    const struct fuse_notify_retrieve_in *arg = inarg;
-    struct fuse_bufvec bufv = {
-        .buf[0] = *ibuf,
-        .count = 1,
-    };
-
-    if (!(bufv.buf[0].flags & FUSE_BUF_IS_FD)) {
-        bufv.buf[0].mem = PARAM(arg);
-    }
-
-    bufv.buf[0].size -=
-        sizeof(struct fuse_in_header) + sizeof(struct fuse_notify_retrieve_in);
-
-    if (bufv.buf[0].size < arg->size) {
-        fuse_log(FUSE_LOG_ERR, "fuse: retrieve reply: buffer size too small\n");
-        fuse_reply_none(req);
-        goto out;
-    }
-    bufv.buf[0].size = arg->size;
-
-    if (se->op.retrieve_reply) {
-        se->op.retrieve_reply(req, rreq->cookie, ino, arg->offset, &bufv);
-    } else {
-        fuse_reply_none(req);
-    }
-out:
-    free(rreq);
-}
-
-int fuse_lowlevel_notify_retrieve(struct fuse_session *se, fuse_ino_t ino,
-                                  size_t size, off_t offset, void *cookie)
-{
-    struct fuse_notify_retrieve_out outarg;
-    struct iovec iov[2];
-    struct fuse_retrieve_req *rreq;
-    int err;
-
-    if (!se) {
-        return -EINVAL;
-    }
-
-    if (se->conn.proto_major < 6 || se->conn.proto_minor < 15) {
-        return -ENOSYS;
-    }
-
-    rreq = malloc(sizeof(*rreq));
-    if (rreq == NULL) {
-        return -ENOMEM;
-    }
-
-    pthread_mutex_lock(&se->lock);
-    rreq->cookie = cookie;
-    rreq->nreq.unique = se->notify_ctr++;
-    rreq->nreq.reply = fuse_ll_retrieve_reply;
-    list_add_nreq(&rreq->nreq, &se->notify_list);
-    pthread_mutex_unlock(&se->lock);
-
-    outarg.notify_unique = rreq->nreq.unique;
-    outarg.nodeid = ino;
-    outarg.offset = offset;
-    outarg.size = size;
-    outarg.padding = 0;
-
-    iov[1].iov_base = &outarg;
-    iov[1].iov_len = sizeof(outarg);
-
-    err = send_notify_iov(se, FUSE_NOTIFY_RETRIEVE, iov, 2);
-    if (err) {
-        pthread_mutex_lock(&se->lock);
-        list_del_nreq(&rreq->nreq);
-        pthread_mutex_unlock(&se->lock);
-        free(rreq);
-    }
-
-    return err;
-}
-
 void *fuse_req_userdata(fuse_req_t req)
 {
     return req->se->userdata;
@@ -2228,7 +2087,7 @@ static struct {
     [FUSE_POLL] = { do_poll, "POLL" },
     [FUSE_FALLOCATE] = { do_fallocate, "FALLOCATE" },
     [FUSE_DESTROY] = { do_destroy, "DESTROY" },
-    [FUSE_NOTIFY_REPLY] = { (void *)1, "NOTIFY_REPLY" },
+    [FUSE_NOTIFY_REPLY] = { NULL, "NOTIFY_REPLY" },
     [FUSE_BATCH_FORGET] = { do_batch_forget, "BATCH_FORGET" },
     [FUSE_READDIRPLUS] = { do_readdirplus, "READDIRPLUS" },
     [FUSE_RENAME2] = { do_rename2, "RENAME2" },
@@ -2336,8 +2195,6 @@ void fuse_session_process_buf_int(struct fuse_session *se,
     inarg = (void *)&in[1];
     if (in->opcode == FUSE_WRITE && se->op.write_buf) {
         do_write_buf(req, in->nodeid, inarg, buf);
-    } else if (in->opcode == FUSE_NOTIFY_REPLY) {
-        do_notify_reply(req, in->nodeid, inarg, buf);
     } else {
         fuse_ll_ops[in->opcode].func(req, in->nodeid, inarg);
     }
@@ -2461,8 +2318,6 @@ struct fuse_session *fuse_session_new(struct fuse_args *args,
 
     list_init_req(&se->list);
     list_init_req(&se->interrupts);
-    list_init_nreq(&se->notify_list);
-    se->notify_ctr = 1;
     fuse_mutex_init(&se->lock);
 
     memcpy(&se->op, op, op_size);
diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
index 17899e012a..79929e5541 100644
--- a/tools/virtiofsd/fuse_lowlevel.h
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -1084,21 +1084,6 @@ struct fuse_lowlevel_ops {
     void (*write_buf)(fuse_req_t req, fuse_ino_t ino, struct fuse_bufvec *bufv,
                       off_t off, struct fuse_file_info *fi);
 
-    /**
-     * Callback function for the retrieve request
-     *
-     * Valid replies:
-     *  fuse_reply_none
-     *
-     * @param req request handle
-     * @param cookie user data supplied to fuse_lowlevel_notify_retrieve()
-     * @param ino the inode number supplied to fuse_lowlevel_notify_retrieve()
-     * @param offset the offset supplied to fuse_lowlevel_notify_retrieve()
-     * @param bufv the buffer containing the returned data
-     */
-    void (*retrieve_reply)(fuse_req_t req, void *cookie, fuse_ino_t ino,
-                           off_t offset, struct fuse_bufvec *bufv);
-
     /**
      * Forget about multiple inodes
      *
@@ -1726,38 +1711,6 @@ int fuse_lowlevel_notify_delete(struct fuse_session *se, fuse_ino_t parent,
 int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
                                off_t offset, struct fuse_bufvec *bufv,
                                enum fuse_buf_copy_flags flags);
-/**
- * Retrieve data from the kernel buffers
- *
- * Retrieve data in the kernel buffers belonging to the given inode.
- * If successful then the retrieve_reply() method will be called with
- * the returned data.
- *
- * Only present pages are returned in the retrieve reply.  Retrieving
- * stops when it finds a non-present page and only data prior to that
- * is returned.
- *
- * If this function returns an error, then the retrieve will not be
- * completed and no reply will be sent.
- *
- * This function doesn't change the dirty state of pages in the kernel
- * buffer.  For dirty pages the write() method will be called
- * regardless of having been retrieved previously.
- *
- * Added in FUSE protocol version 7.15. If the kernel does not support
- * this (or a newer) version, the function will return -ENOSYS and do
- * nothing.
- *
- * @param se the session object
- * @param ino the inode number
- * @param size the number of bytes to retrieve
- * @param offset the starting offset into the file to retrieve from
- * @param cookie user data to supply to the reply callback
- * @return zero for success, -errno for failure
- */
-int fuse_lowlevel_notify_retrieve(struct fuse_session *se, fuse_ino_t ino,
-                                  size_t size, off_t offset, void *cookie);
-
 
 /*
  * Utility functions
-- 
2.23.0




* [PATCH 010/104] virtiofsd: Fix fuse_daemonize ignored return values
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (8 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 009/104] virtiofsd: remove unused notify reply support Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-03 12:18   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 011/104] virtiofsd: Fix common header and define for QEMU builds Dr. David Alan Gilbert (git)
                   ` (96 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

QEMU's build enables warnings/errors for ignored return values,
and the (void) cast used in the fuse code isn't enough to silence them.
Propagate those return values through fuse_daemonize()'s own return
value instead.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
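For illustration only (not part of the patch): a minimal sketch of the
"keep the first failure" pattern that the ret/rett pairs implement. The
helper name demo_fold_status() is hypothetical. The sketch assumes each
step's result is first normalised to success/failure, since on success
dup2() returns the new descriptor and write() returns a byte count rather
than 0.

```c
#include <assert.h>

/*
 * Hypothetical helper: fold one step's outcome into 'ret', keeping the
 * first failure. 'ok' is non-zero when the step succeeded.
 */
int demo_fold_status(int ret, int ok)
{
    if (ret != 0) {
        return ret; /* an earlier step already failed */
    }
    return ok ? 0 : -1;
}

/* Usage example: the second step fails and the third cannot clear it. */
void demo_fold_check(void)
{
    int ret = 0;

    ret = demo_fold_status(ret, 1); /* e.g. chdir("/") == 0 */
    assert(ret == 0);
    ret = demo_fold_status(ret, 0); /* e.g. dup2(fd, 1) != 1 */
    assert(ret == -1);
    ret = demo_fold_status(ret, 1); /* a later success is ignored */
    assert(ret == -1);
}
```

A caller would write something like
ret = demo_fold_status(ret, dup2(nullfd, 1) == 1); for each step.
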
 tools/virtiofsd/helper.c | 33 ++++++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
index d8c42401a7..8afccfc15e 100644
--- a/tools/virtiofsd/helper.c
+++ b/tools/virtiofsd/helper.c
@@ -10,12 +10,10 @@
  * See the file COPYING.LIB.
  */
 
-#include "config.h"
 #include "fuse_i.h"
 #include "fuse_lowlevel.h"
 #include "fuse_misc.h"
 #include "fuse_opt.h"
-#include "mount_util.h"
 
 #include <errno.h>
 #include <limits.h>
@@ -177,6 +175,7 @@ int fuse_parse_cmdline(struct fuse_args *args, struct fuse_cmdline_opts *opts)
 
 int fuse_daemonize(int foreground)
 {
+    int ret = 0, rett;
     if (!foreground) {
         int nullfd;
         int waiter[2];
@@ -198,8 +197,8 @@ int fuse_daemonize(int foreground)
         case 0:
             break;
         default:
-            (void)read(waiter[0], &completed, sizeof(completed));
-            _exit(0);
+            _exit(read(waiter[0], &completed,
+                       sizeof(completed)) != sizeof(completed));
         }
 
         if (setsid() == -1) {
@@ -207,13 +206,22 @@ int fuse_daemonize(int foreground)
             return -1;
         }
 
-        (void)chdir("/");
+        ret = chdir("/");
 
         nullfd = open("/dev/null", O_RDWR, 0);
         if (nullfd != -1) {
-            (void)dup2(nullfd, 0);
-            (void)dup2(nullfd, 1);
-            (void)dup2(nullfd, 2);
+            rett = dup2(nullfd, 0);
+            if (!ret) {
+                ret = rett;
+            }
+            rett = dup2(nullfd, 1);
+            if (!ret) {
+                ret = rett;
+            }
+            rett = dup2(nullfd, 2);
+            if (!ret) {
+                ret = rett;
+            }
             if (nullfd > 2) {
                 close(nullfd);
             }
@@ -221,13 +229,16 @@ int fuse_daemonize(int foreground)
 
         /* Propagate completion of daemon initialization */
         completed = 1;
-        (void)write(waiter[1], &completed, sizeof(completed));
+        rett = write(waiter[1], &completed, sizeof(completed));
+        if (!ret) {
+            ret = rett;
+        }
         close(waiter[0]);
         close(waiter[1]);
     } else {
-        (void)chdir("/");
+        ret = chdir("/");
     }
-    return 0;
+    return ret;
 }
 
 void fuse_apply_conn_info_opts(struct fuse_conn_info_opts *opts,
-- 
2.23.0




* [PATCH 011/104] virtiofsd: Fix common header and define for QEMU builds
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (9 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 010/104] virtiofsd: Fix fuse_daemonize ignored return values Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-03 12:22   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 012/104] virtiofsd: Trim out compatibility code Dr. David Alan Gilbert (git)
                   ` (95 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

All of the fuse files include config.h and define _GNU_SOURCE,
neither of which we have under our build; remove them.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/buffer.c         |  3 ---
 tools/virtiofsd/fuse_i.h         |  3 +++
 tools/virtiofsd/fuse_loop_mt.c   |  3 +--
 tools/virtiofsd/fuse_lowlevel.c  | 12 +-----------
 tools/virtiofsd/fuse_opt.c       |  1 -
 tools/virtiofsd/fuse_signals.c   |  1 -
 tools/virtiofsd/passthrough_ll.c |  9 ++-------
 7 files changed, 7 insertions(+), 25 deletions(-)

diff --git a/tools/virtiofsd/buffer.c b/tools/virtiofsd/buffer.c
index 38521f5889..1d7e6d2439 100644
--- a/tools/virtiofsd/buffer.c
+++ b/tools/virtiofsd/buffer.c
@@ -9,9 +9,6 @@
  * See the file COPYING.LIB
  */
 
-#define _GNU_SOURCE
-
-#include "config.h"
 #include "fuse_i.h"
 #include "fuse_lowlevel.h"
 #include <assert.h>
diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
index 1119e85e57..0b5acc8765 100644
--- a/tools/virtiofsd/fuse_i.h
+++ b/tools/virtiofsd/fuse_i.h
@@ -6,6 +6,9 @@
  * See the file COPYING.LIB
  */
 
+#define FUSE_USE_VERSION 31
+
+
 #include "fuse.h"
 #include "fuse_lowlevel.h"
 
diff --git a/tools/virtiofsd/fuse_loop_mt.c b/tools/virtiofsd/fuse_loop_mt.c
index 39e080d9ff..00138b2ab3 100644
--- a/tools/virtiofsd/fuse_loop_mt.c
+++ b/tools/virtiofsd/fuse_loop_mt.c
@@ -8,11 +8,10 @@
  * See the file COPYING.LIB.
  */
 
-#include "config.h"
 #include "fuse_i.h"
-#include "fuse_kernel.h"
 #include "fuse_lowlevel.h"
 #include "fuse_misc.h"
+#include "standard-headers/linux/fuse.h"
 
 #include <assert.h>
 #include <errno.h>
diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 0d7b2c3dc9..497eb25487 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -9,14 +9,10 @@
  * See the file COPYING.LIB
  */
 
-#define _GNU_SOURCE
-
-#include "config.h"
 #include "fuse_i.h"
-#include "fuse_kernel.h"
+#include "standard-headers/linux/fuse.h"
 #include "fuse_misc.h"
 #include "fuse_opt.h"
-#include "mount_util.h"
 
 #include <assert.h>
 #include <errno.h>
@@ -2093,7 +2089,6 @@ static struct {
     [FUSE_RENAME2] = { do_rename2, "RENAME2" },
     [FUSE_COPY_FILE_RANGE] = { do_copy_file_range, "COPY_FILE_RANGE" },
     [FUSE_LSEEK] = { do_lseek, "LSEEK" },
-    [CUSE_INIT] = { cuse_lowlevel_init, "CUSE_INIT" },
 };
 
 #define FUSE_MAXOP (sizeof(fuse_ll_ops) / sizeof(fuse_ll_ops[0]))
@@ -2220,7 +2215,6 @@ void fuse_lowlevel_version(void)
 {
     printf("using FUSE kernel interface version %i.%i\n", FUSE_KERNEL_VERSION,
            FUSE_KERNEL_MINOR_VERSION);
-    fuse_mount_version();
 }
 
 void fuse_lowlevel_help(void)
@@ -2310,10 +2304,6 @@ struct fuse_session *fuse_session_new(struct fuse_args *args,
         goto out4;
     }
 
-    if (se->debug) {
-        fuse_log(FUSE_LOG_DEBUG, "FUSE library version: %s\n", PACKAGE_VERSION);
-    }
-
     se->bufsize = FUSE_MAX_MAX_PAGES * getpagesize() + FUSE_BUFFER_HEADER_SIZE;
 
     list_init_req(&se->list);
diff --git a/tools/virtiofsd/fuse_opt.c b/tools/virtiofsd/fuse_opt.c
index edd36f4a3b..1fee55e266 100644
--- a/tools/virtiofsd/fuse_opt.c
+++ b/tools/virtiofsd/fuse_opt.c
@@ -10,7 +10,6 @@
  */
 
 #include "fuse_opt.h"
-#include "config.h"
 #include "fuse_i.h"
 #include "fuse_misc.h"
 
diff --git a/tools/virtiofsd/fuse_signals.c b/tools/virtiofsd/fuse_signals.c
index 19d6791cb9..10a6f88088 100644
--- a/tools/virtiofsd/fuse_signals.c
+++ b/tools/virtiofsd/fuse_signals.c
@@ -8,7 +8,6 @@
  * See the file COPYING.LIB
  */
 
-#include "config.h"
 #include "fuse_i.h"
 #include "fuse_lowlevel.h"
 
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index a79ec2c70d..0e543353a4 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -35,15 +35,10 @@
  * \include passthrough_ll.c
  */
 
-#define _GNU_SOURCE
-#define FUSE_USE_VERSION 31
-
-#include "config.h"
-
+#include "fuse_lowlevel.h"
 #include <assert.h>
 #include <dirent.h>
 #include <errno.h>
-#include <fuse_lowlevel.h>
 #include <inttypes.h>
 #include <limits.h>
 #include <pthread.h>
@@ -58,6 +53,7 @@
 
 #include "passthrough_helpers.h"
 
+#define HAVE_POSIX_FALLOCATE 1
 /*
  * We are re-using pointers to our `struct lo_inode` and `struct
  * lo_dirp` elements as inodes. This means that we must be able to
@@ -1303,7 +1299,6 @@ int main(int argc, char *argv[])
         ret = 0;
         goto err_out1;
     } else if (opts.show_version) {
-        printf("FUSE library version %s\n", fuse_pkgversion());
         fuse_lowlevel_version();
         ret = 0;
         goto err_out1;
-- 
2.23.0




* [PATCH 012/104] virtiofsd: Trim out compatibility code
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (10 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 011/104] virtiofsd: Fix common header and define for QEMU builds Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-03 12:26   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 013/104] virtiofsd: Make fsync work even if only inode is passed in Dr. David Alan Gilbert (git)
                   ` (94 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

virtiofsd only supports major=7, minor>=31; trim out a lot of
old compatibility code.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
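For illustration only (not part of the patch): a minimal sketch of the
"too old" test implied by the commit message (major 7, minor >= 31). The
demo_* names are hypothetical, and the sketch covers only the rejection
of older versions, not how a newer major version is negotiated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed minimum, taken from the commit message: major 7, minor >= 31. */
#define DEMO_MIN_MAJOR 7u
#define DEMO_MIN_MINOR 31u

/* True when the peer's protocol version is older than the minimum. */
bool demo_proto_too_old(uint32_t major, uint32_t minor)
{
    return major < DEMO_MIN_MAJOR ||
           (major == DEMO_MIN_MAJOR && minor < DEMO_MIN_MINOR);
}

/* Usage example exercising the boundary cases. */
void demo_proto_check(void)
{
    assert(demo_proto_too_old(6, 40));  /* older major, any minor */
    assert(demo_proto_too_old(7, 30));  /* same major, minor too low */
    assert(!demo_proto_too_old(7, 31)); /* exactly the minimum */
    assert(!demo_proto_too_old(8, 0));  /* newer major is not "too old" */
}
```

Checking the major version first matters: a minor number only has
meaning relative to its major, so 6.40 is still older than 7.31.
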
 tools/virtiofsd/fuse_lowlevel.c | 330 ++++++++++++--------------------
 1 file changed, 119 insertions(+), 211 deletions(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 497eb25487..f4bd303a7a 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -387,16 +387,7 @@ static void fill_open(struct fuse_open_out *arg, const struct fuse_file_info *f)
 int fuse_reply_entry(fuse_req_t req, const struct fuse_entry_param *e)
 {
     struct fuse_entry_out arg;
-    size_t size = req->se->conn.proto_minor < 9 ? FUSE_COMPAT_ENTRY_OUT_SIZE :
-                                                  sizeof(arg);
-
-    /*
-     * before ABI 7.4 e->ino == 0 was invalid, only ENOENT meant
-     * negative entry
-     */
-    if (!e->ino && req->se->conn.proto_minor < 4) {
-        return fuse_reply_err(req, ENOENT);
-    }
+    size_t size = sizeof(arg);
 
     memset(&arg, 0, sizeof(arg));
     fill_entry(&arg, e);
@@ -407,9 +398,7 @@ int fuse_reply_create(fuse_req_t req, const struct fuse_entry_param *e,
                       const struct fuse_file_info *f)
 {
     char buf[sizeof(struct fuse_entry_out) + sizeof(struct fuse_open_out)];
-    size_t entrysize = req->se->conn.proto_minor < 9 ?
-                           FUSE_COMPAT_ENTRY_OUT_SIZE :
-                           sizeof(struct fuse_entry_out);
+    size_t entrysize = sizeof(struct fuse_entry_out);
     struct fuse_entry_out *earg = (struct fuse_entry_out *)buf;
     struct fuse_open_out *oarg = (struct fuse_open_out *)(buf + entrysize);
 
@@ -423,8 +412,7 @@ int fuse_reply_attr(fuse_req_t req, const struct stat *attr,
                     double attr_timeout)
 {
     struct fuse_attr_out arg;
-    size_t size =
-        req->se->conn.proto_minor < 9 ? FUSE_COMPAT_ATTR_OUT_SIZE : sizeof(arg);
+    size_t size = sizeof(arg);
 
     memset(&arg, 0, sizeof(arg));
     arg.attr_valid = calc_timeout_sec(attr_timeout);
@@ -521,8 +509,7 @@ int fuse_reply_data(fuse_req_t req, struct fuse_bufvec *bufv,
 int fuse_reply_statfs(fuse_req_t req, const struct statvfs *stbuf)
 {
     struct fuse_statfs_out arg;
-    size_t size =
-        req->se->conn.proto_minor < 4 ? FUSE_COMPAT_STATFS_SIZE : sizeof(arg);
+    size_t size = sizeof(arg);
 
     memset(&arg, 0, sizeof(arg));
     convert_statfs(stbuf, &arg.st);
@@ -606,45 +593,31 @@ int fuse_reply_ioctl_retry(fuse_req_t req, const struct iovec *in_iov,
     iov[count].iov_len = sizeof(arg);
     count++;
 
-    if (req->se->conn.proto_minor < 16) {
-        if (in_count) {
-            iov[count].iov_base = (void *)in_iov;
-            iov[count].iov_len = sizeof(in_iov[0]) * in_count;
-            count++;
-        }
+    /* Can't handle non-compat 64bit ioctls on 32bit */
+    if (sizeof(void *) == 4 && req->ioctl_64bit) {
+        res = fuse_reply_err(req, EINVAL);
+        goto out;
+    }
 
-        if (out_count) {
-            iov[count].iov_base = (void *)out_iov;
-            iov[count].iov_len = sizeof(out_iov[0]) * out_count;
-            count++;
+    if (in_count) {
+        in_fiov = fuse_ioctl_iovec_copy(in_iov, in_count);
+        if (!in_fiov) {
+            goto enomem;
         }
-    } else {
-        /* Can't handle non-compat 64bit ioctls on 32bit */
-        if (sizeof(void *) == 4 && req->ioctl_64bit) {
-            res = fuse_reply_err(req, EINVAL);
-            goto out;
-        }
-
-        if (in_count) {
-            in_fiov = fuse_ioctl_iovec_copy(in_iov, in_count);
-            if (!in_fiov) {
-                goto enomem;
-            }
 
-            iov[count].iov_base = (void *)in_fiov;
-            iov[count].iov_len = sizeof(in_fiov[0]) * in_count;
-            count++;
+        iov[count].iov_base = (void *)in_fiov;
+        iov[count].iov_len = sizeof(in_fiov[0]) * in_count;
+        count++;
+    }
+    if (out_count) {
+        out_fiov = fuse_ioctl_iovec_copy(out_iov, out_count);
+        if (!out_fiov) {
+            goto enomem;
         }
-        if (out_count) {
-            out_fiov = fuse_ioctl_iovec_copy(out_iov, out_count);
-            if (!out_fiov) {
-                goto enomem;
-            }
 
-            iov[count].iov_base = (void *)out_fiov;
-            iov[count].iov_len = sizeof(out_fiov[0]) * out_count;
-            count++;
-        }
+        iov[count].iov_base = (void *)out_fiov;
+        iov[count].iov_len = sizeof(out_fiov[0]) * out_count;
+        count++;
     }
 
     res = send_reply_iov(req, 0, iov, count);
@@ -786,14 +759,12 @@ static void do_getattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     struct fuse_file_info *fip = NULL;
     struct fuse_file_info fi;
 
-    if (req->se->conn.proto_minor >= 9) {
-        struct fuse_getattr_in *arg = (struct fuse_getattr_in *)inarg;
+    struct fuse_getattr_in *arg = (struct fuse_getattr_in *)inarg;
 
-        if (arg->getattr_flags & FUSE_GETATTR_FH) {
-            memset(&fi, 0, sizeof(fi));
-            fi.fh = arg->fh;
-            fip = &fi;
-        }
+    if (arg->getattr_flags & FUSE_GETATTR_FH) {
+        memset(&fi, 0, sizeof(fi));
+        fi.fh = arg->fh;
+        fip = &fi;
     }
 
     if (req->se->op.getattr) {
@@ -858,11 +829,7 @@ static void do_mknod(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     struct fuse_mknod_in *arg = (struct fuse_mknod_in *)inarg;
     char *name = PARAM(arg);
 
-    if (req->se->conn.proto_minor >= 12) {
-        req->ctx.umask = arg->umask;
-    } else {
-        name = (char *)inarg + FUSE_COMPAT_MKNOD_IN_SIZE;
-    }
+    req->ctx.umask = arg->umask;
 
     if (req->se->op.mknod) {
         req->se->op.mknod(req, nodeid, name, arg->mode, arg->rdev);
@@ -875,9 +842,7 @@ static void do_mkdir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 {
     struct fuse_mkdir_in *arg = (struct fuse_mkdir_in *)inarg;
 
-    if (req->se->conn.proto_minor >= 12) {
-        req->ctx.umask = arg->umask;
-    }
+    req->ctx.umask = arg->umask;
 
     if (req->se->op.mkdir) {
         req->se->op.mkdir(req, nodeid, PARAM(arg), arg->mode);
@@ -969,11 +934,7 @@ static void do_create(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
         memset(&fi, 0, sizeof(fi));
         fi.flags = arg->flags;
 
-        if (req->se->conn.proto_minor >= 12) {
-            req->ctx.umask = arg->umask;
-        } else {
-            name = (char *)inarg + sizeof(struct fuse_open_in);
-        }
+        req->ctx.umask = arg->umask;
 
         req->se->op.create(req, nodeid, name, arg->mode, &fi);
     } else {
@@ -1005,10 +966,8 @@ static void do_read(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 
         memset(&fi, 0, sizeof(fi));
         fi.fh = arg->fh;
-        if (req->se->conn.proto_minor >= 9) {
-            fi.lock_owner = arg->lock_owner;
-            fi.flags = arg->flags;
-        }
+        fi.lock_owner = arg->lock_owner;
+        fi.flags = arg->flags;
         req->se->op.read(req, nodeid, arg->size, arg->offset, &fi);
     } else {
         fuse_reply_err(req, ENOSYS);
@@ -1025,13 +984,9 @@ static void do_write(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     fi.fh = arg->fh;
     fi.writepage = (arg->write_flags & FUSE_WRITE_CACHE) != 0;
 
-    if (req->se->conn.proto_minor < 9) {
-        param = ((char *)arg) + FUSE_COMPAT_WRITE_IN_SIZE;
-    } else {
-        fi.lock_owner = arg->lock_owner;
-        fi.flags = arg->flags;
-        param = PARAM(arg);
-    }
+    fi.lock_owner = arg->lock_owner;
+    fi.flags = arg->flags;
+    param = PARAM(arg);
 
     if (req->se->op.write) {
         req->se->op.write(req, nodeid, param, arg->size, arg->offset, &fi);
@@ -1055,21 +1010,14 @@ static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid, const void *inarg,
     fi.fh = arg->fh;
     fi.writepage = arg->write_flags & FUSE_WRITE_CACHE;
 
-    if (se->conn.proto_minor < 9) {
-        bufv.buf[0].mem = ((char *)arg) + FUSE_COMPAT_WRITE_IN_SIZE;
-        bufv.buf[0].size -=
-            sizeof(struct fuse_in_header) + FUSE_COMPAT_WRITE_IN_SIZE;
-        assert(!(bufv.buf[0].flags & FUSE_BUF_IS_FD));
-    } else {
-        fi.lock_owner = arg->lock_owner;
-        fi.flags = arg->flags;
-        if (!(bufv.buf[0].flags & FUSE_BUF_IS_FD)) {
-            bufv.buf[0].mem = PARAM(arg);
-        }
-
-        bufv.buf[0].size -=
-            sizeof(struct fuse_in_header) + sizeof(struct fuse_write_in);
+    fi.lock_owner = arg->lock_owner;
+    fi.flags = arg->flags;
+    if (!(bufv.buf[0].flags & FUSE_BUF_IS_FD)) {
+        bufv.buf[0].mem = PARAM(arg);
     }
+
+    bufv.buf[0].size -=
+        sizeof(struct fuse_in_header) + sizeof(struct fuse_write_in);
     if (bufv.buf[0].size < arg->size) {
         fuse_log(FUSE_LOG_ERR, "fuse: do_write_buf: buffer size too small\n");
         fuse_reply_err(req, EIO);
@@ -1088,9 +1036,7 @@ static void do_flush(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     memset(&fi, 0, sizeof(fi));
     fi.fh = arg->fh;
     fi.flush = 1;
-    if (req->se->conn.proto_minor >= 7) {
-        fi.lock_owner = arg->lock_owner;
-    }
+    fi.lock_owner = arg->lock_owner;
 
     if (req->se->op.flush) {
         req->se->op.flush(req, nodeid, &fi);
@@ -1107,10 +1053,8 @@ static void do_release(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     memset(&fi, 0, sizeof(fi));
     fi.flags = arg->flags;
     fi.fh = arg->fh;
-    if (req->se->conn.proto_minor >= 8) {
-        fi.flush = (arg->release_flags & FUSE_RELEASE_FLUSH) ? 1 : 0;
-        fi.lock_owner = arg->lock_owner;
-    }
+    fi.flush = (arg->release_flags & FUSE_RELEASE_FLUSH) ? 1 : 0;
+    fi.lock_owner = arg->lock_owner;
     if (arg->release_flags & FUSE_RELEASE_FLOCK_UNLOCK) {
         fi.flock_release = 1;
         fi.lock_owner = arg->lock_owner;
@@ -1479,8 +1423,7 @@ static void do_ioctl(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     memset(&fi, 0, sizeof(fi));
     fi.fh = arg->fh;
 
-    if (sizeof(void *) == 4 && req->se->conn.proto_minor >= 16 &&
-        !(flags & FUSE_IOCTL_32BIT)) {
+    if (sizeof(void *) == 4 && !(flags & FUSE_IOCTL_32BIT)) {
         req->ioctl_64bit = 1;
     }
 
@@ -1605,7 +1548,7 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     outarg.major = FUSE_KERNEL_VERSION;
     outarg.minor = FUSE_KERNEL_MINOR_VERSION;
 
-    if (arg->major < 7) {
+    if (arg->major < 7 || (arg->major == 7 && arg->minor < 31)) {
         fuse_log(FUSE_LOG_ERR, "fuse: unsupported protocol version: %u.%u\n",
                  arg->major, arg->minor);
         fuse_reply_err(req, EPROTO);
@@ -1618,81 +1561,71 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
         return;
     }
 
-    if (arg->minor >= 6) {
-        if (arg->max_readahead < se->conn.max_readahead) {
-            se->conn.max_readahead = arg->max_readahead;
-        }
-        if (arg->flags & FUSE_ASYNC_READ) {
-            se->conn.capable |= FUSE_CAP_ASYNC_READ;
-        }
-        if (arg->flags & FUSE_POSIX_LOCKS) {
-            se->conn.capable |= FUSE_CAP_POSIX_LOCKS;
-        }
-        if (arg->flags & FUSE_ATOMIC_O_TRUNC) {
-            se->conn.capable |= FUSE_CAP_ATOMIC_O_TRUNC;
-        }
-        if (arg->flags & FUSE_EXPORT_SUPPORT) {
-            se->conn.capable |= FUSE_CAP_EXPORT_SUPPORT;
-        }
-        if (arg->flags & FUSE_DONT_MASK) {
-            se->conn.capable |= FUSE_CAP_DONT_MASK;
-        }
-        if (arg->flags & FUSE_FLOCK_LOCKS) {
-            se->conn.capable |= FUSE_CAP_FLOCK_LOCKS;
-        }
-        if (arg->flags & FUSE_AUTO_INVAL_DATA) {
-            se->conn.capable |= FUSE_CAP_AUTO_INVAL_DATA;
-        }
-        if (arg->flags & FUSE_DO_READDIRPLUS) {
-            se->conn.capable |= FUSE_CAP_READDIRPLUS;
-        }
-        if (arg->flags & FUSE_READDIRPLUS_AUTO) {
-            se->conn.capable |= FUSE_CAP_READDIRPLUS_AUTO;
-        }
-        if (arg->flags & FUSE_ASYNC_DIO) {
-            se->conn.capable |= FUSE_CAP_ASYNC_DIO;
-        }
-        if (arg->flags & FUSE_WRITEBACK_CACHE) {
-            se->conn.capable |= FUSE_CAP_WRITEBACK_CACHE;
-        }
-        if (arg->flags & FUSE_NO_OPEN_SUPPORT) {
-            se->conn.capable |= FUSE_CAP_NO_OPEN_SUPPORT;
-        }
-        if (arg->flags & FUSE_PARALLEL_DIROPS) {
-            se->conn.capable |= FUSE_CAP_PARALLEL_DIROPS;
-        }
-        if (arg->flags & FUSE_POSIX_ACL) {
-            se->conn.capable |= FUSE_CAP_POSIX_ACL;
-        }
-        if (arg->flags & FUSE_HANDLE_KILLPRIV) {
-            se->conn.capable |= FUSE_CAP_HANDLE_KILLPRIV;
-        }
-        if (arg->flags & FUSE_NO_OPENDIR_SUPPORT) {
-            se->conn.capable |= FUSE_CAP_NO_OPENDIR_SUPPORT;
-        }
-        if (!(arg->flags & FUSE_MAX_PAGES)) {
-            size_t max_bufsize =
-                FUSE_DEFAULT_MAX_PAGES_PER_REQ * getpagesize() +
-                FUSE_BUFFER_HEADER_SIZE;
-            if (bufsize > max_bufsize) {
-                bufsize = max_bufsize;
-            }
+    if (arg->max_readahead < se->conn.max_readahead) {
+        se->conn.max_readahead = arg->max_readahead;
+    }
+    if (arg->flags & FUSE_ASYNC_READ) {
+        se->conn.capable |= FUSE_CAP_ASYNC_READ;
+    }
+    if (arg->flags & FUSE_POSIX_LOCKS) {
+        se->conn.capable |= FUSE_CAP_POSIX_LOCKS;
+    }
+    if (arg->flags & FUSE_ATOMIC_O_TRUNC) {
+        se->conn.capable |= FUSE_CAP_ATOMIC_O_TRUNC;
+    }
+    if (arg->flags & FUSE_EXPORT_SUPPORT) {
+        se->conn.capable |= FUSE_CAP_EXPORT_SUPPORT;
+    }
+    if (arg->flags & FUSE_DONT_MASK) {
+        se->conn.capable |= FUSE_CAP_DONT_MASK;
+    }
+    if (arg->flags & FUSE_FLOCK_LOCKS) {
+        se->conn.capable |= FUSE_CAP_FLOCK_LOCKS;
+    }
+    if (arg->flags & FUSE_AUTO_INVAL_DATA) {
+        se->conn.capable |= FUSE_CAP_AUTO_INVAL_DATA;
+    }
+    if (arg->flags & FUSE_DO_READDIRPLUS) {
+        se->conn.capable |= FUSE_CAP_READDIRPLUS;
+    }
+    if (arg->flags & FUSE_READDIRPLUS_AUTO) {
+        se->conn.capable |= FUSE_CAP_READDIRPLUS_AUTO;
+    }
+    if (arg->flags & FUSE_ASYNC_DIO) {
+        se->conn.capable |= FUSE_CAP_ASYNC_DIO;
+    }
+    if (arg->flags & FUSE_WRITEBACK_CACHE) {
+        se->conn.capable |= FUSE_CAP_WRITEBACK_CACHE;
+    }
+    if (arg->flags & FUSE_NO_OPEN_SUPPORT) {
+        se->conn.capable |= FUSE_CAP_NO_OPEN_SUPPORT;
+    }
+    if (arg->flags & FUSE_PARALLEL_DIROPS) {
+        se->conn.capable |= FUSE_CAP_PARALLEL_DIROPS;
+    }
+    if (arg->flags & FUSE_POSIX_ACL) {
+        se->conn.capable |= FUSE_CAP_POSIX_ACL;
+    }
+    if (arg->flags & FUSE_HANDLE_KILLPRIV) {
+        se->conn.capable |= FUSE_CAP_HANDLE_KILLPRIV;
+    }
+    if (arg->flags & FUSE_NO_OPENDIR_SUPPORT) {
+        se->conn.capable |= FUSE_CAP_NO_OPENDIR_SUPPORT;
+    }
+    if (!(arg->flags & FUSE_MAX_PAGES)) {
+        size_t max_bufsize = FUSE_DEFAULT_MAX_PAGES_PER_REQ * getpagesize() +
+                             FUSE_BUFFER_HEADER_SIZE;
+        if (bufsize > max_bufsize) {
+            bufsize = max_bufsize;
         }
-    } else {
-        se->conn.max_readahead = 0;
     }
-
-    if (se->conn.proto_minor >= 14) {
 #ifdef HAVE_SPLICE
 #ifdef HAVE_VMSPLICE
-        se->conn.capable |= FUSE_CAP_SPLICE_WRITE | FUSE_CAP_SPLICE_MOVE;
+    se->conn.capable |= FUSE_CAP_SPLICE_WRITE | FUSE_CAP_SPLICE_MOVE;
 #endif
-        se->conn.capable |= FUSE_CAP_SPLICE_READ;
+    se->conn.capable |= FUSE_CAP_SPLICE_READ;
 #endif
-    }
-    if (se->conn.proto_minor >= 18) {
-        se->conn.capable |= FUSE_CAP_IOCTL_DIR;
-    }
+    se->conn.capable |= FUSE_CAP_IOCTL_DIR;
 
     /*
      * Default settings for modern filesystems.
@@ -1799,24 +1732,20 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
     outarg.max_readahead = se->conn.max_readahead;
     outarg.max_write = se->conn.max_write;
-    if (se->conn.proto_minor >= 13) {
-        if (se->conn.max_background >= (1 << 16)) {
-            se->conn.max_background = (1 << 16) - 1;
-        }
-        if (se->conn.congestion_threshold > se->conn.max_background) {
-            se->conn.congestion_threshold = se->conn.max_background;
-        }
-        if (!se->conn.congestion_threshold) {
-            se->conn.congestion_threshold = se->conn.max_background * 3 / 4;
-        }
-
-        outarg.max_background = se->conn.max_background;
-        outarg.congestion_threshold = se->conn.congestion_threshold;
+    if (se->conn.max_background >= (1 << 16)) {
+        se->conn.max_background = (1 << 16) - 1;
+    }
+    if (se->conn.congestion_threshold > se->conn.max_background) {
+        se->conn.congestion_threshold = se->conn.max_background;
     }
-    if (se->conn.proto_minor >= 23) {
-        outarg.time_gran = se->conn.time_gran;
+    if (!se->conn.congestion_threshold) {
+        se->conn.congestion_threshold = se->conn.max_background * 3 / 4;
     }
 
+    outarg.max_background = se->conn.max_background;
+    outarg.congestion_threshold = se->conn.congestion_threshold;
+    outarg.time_gran = se->conn.time_gran;
+
     if (se->debug) {
         fuse_log(FUSE_LOG_DEBUG, "   INIT: %u.%u\n", outarg.major,
                  outarg.minor);
@@ -1830,11 +1759,6 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
                  outarg.congestion_threshold);
         fuse_log(FUSE_LOG_DEBUG, "   time_gran=%u\n", outarg.time_gran);
     }
-    if (arg->minor < 5) {
-        outargsize = FUSE_COMPAT_INIT_OUT_SIZE;
-    } else if (arg->minor < 23) {
-        outargsize = FUSE_COMPAT_22_INIT_OUT_SIZE;
-    }
 
     send_reply_ok(req, &outarg, outargsize);
 }
@@ -1898,10 +1822,6 @@ int fuse_lowlevel_notify_inval_inode(struct fuse_session *se, fuse_ino_t ino,
         return -EINVAL;
     }
 
-    if (se->conn.proto_major < 6 || se->conn.proto_minor < 12) {
-        return -ENOSYS;
-    }
-
     outarg.ino = ino;
     outarg.off = off;
     outarg.len = len;
@@ -1922,10 +1842,6 @@ int fuse_lowlevel_notify_inval_entry(struct fuse_session *se, fuse_ino_t parent,
         return -EINVAL;
     }
 
-    if (se->conn.proto_major < 6 || se->conn.proto_minor < 12) {
-        return -ENOSYS;
-    }
-
     outarg.parent = parent;
     outarg.namelen = namelen;
     outarg.padding = 0;
@@ -1949,10 +1865,6 @@ int fuse_lowlevel_notify_delete(struct fuse_session *se, fuse_ino_t parent,
         return -EINVAL;
     }
 
-    if (se->conn.proto_major < 6 || se->conn.proto_minor < 18) {
-        return -ENOSYS;
-    }
-
     outarg.parent = parent;
     outarg.child = child;
     outarg.namelen = namelen;
@@ -1980,10 +1892,6 @@ int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
         return -EINVAL;
     }
 
-    if (se->conn.proto_major < 6 || se->conn.proto_minor < 15) {
-        return -ENOSYS;
-    }
-
     out.unique = 0;
     out.error = FUSE_NOTIFY_STORE;
 
-- 
2.23.0




* [PATCH 013/104] virtiofsd: Make fsync work even if only inode is passed in
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (11 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 012/104] virtiofsd: Trim out compatibility code Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-03 15:13   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 014/104] virtiofsd: Add options for virtio Dr. David Alan Gilbert (git)
                   ` (93 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Vivek Goyal <vgoyal@redhat.com>

If the caller has not sent a file handle in the request, use the inode to
retrieve the fd that was opened with O_PATH, reopen the file through that
fd, and issue fsync on the result. This will be needed when dax_flush()
calls fsync, since at that point we only have the inode (and not the file).
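
For anyone not familiar with the /proc/self/fd trick, here is a minimal
standalone sketch of the idea. It is not the code from this patch; the
helper name fsync_via_proc_fd() is made up for illustration, and it assumes
a Linux /proc is mounted. An O_PATH descriptor cannot be used for I/O-style
calls such as fsync(), so the sketch reopens it through /proc to get a
normal descriptor first.

```c
#define _GNU_SOURCE /* for O_PATH in <fcntl.h> */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of virtiofsd): flush the file behind
 * path_fd, which may have been opened with O_PATH.
 * Returns 0 on success, -1 with errno set on failure.
 */
int fsync_via_proc_fd(int path_fd, int datasync)
{
    char procname[64];
    int fd, res, saved_errno;

    /* Reopen the same file through its /proc/self/fd/N entry */
    snprintf(procname, sizeof(procname), "/proc/self/fd/%d", path_fd);
    fd = open(procname, O_RDWR);
    if (fd == -1) {
        return -1;
    }

    res = datasync ? fdatasync(fd) : fsync(fd);

    /* close() may clobber errno, so preserve the fsync result */
    saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return res;
}
```

A caller would do something like open(path, O_PATH) once, keep that fd
around per inode, and call fsync_via_proc_fd(fd, 0) whenever a flush
arrives without a file handle.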

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 tools/virtiofsd/fuse_lowlevel.c  |  6 +++++-
 tools/virtiofsd/passthrough_ll.c | 28 ++++++++++++++++++++++++++--
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index f4bd303a7a..167701b453 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -1077,7 +1077,11 @@ static void do_fsync(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     fi.fh = arg->fh;
 
     if (req->se->op.fsync) {
-        req->se->op.fsync(req, nodeid, datasync, &fi);
+        if (fi.fh == (uint64_t)-1) {
+            req->se->op.fsync(req, nodeid, datasync, NULL);
+        } else {
+            req->se->op.fsync(req, nodeid, datasync, &fi);
+        }
     } else {
         fuse_reply_err(req, ENOSYS);
     }
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 0e543353a4..cd6a0f4409 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -903,10 +903,34 @@ static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
 {
     int res;
     (void)ino;
+    int fd;
+    char *buf;
+
+    fuse_log(FUSE_LOG_DEBUG, "lo_fsync(ino=%" PRIu64 ", fi=0x%p)\n", ino,
+             (void *)fi);
+
+    if (!fi) {
+        res = asprintf(&buf, "/proc/self/fd/%i", lo_fd(req, ino));
+        if (res == -1) {
+            return (void)fuse_reply_err(req, errno);
+        }
+
+        fd = open(buf, O_RDWR);
+        free(buf);
+        if (fd == -1) {
+            return (void)fuse_reply_err(req, errno);
+        }
+    } else {
+        fd = fi->fh;
+    }
+
     if (datasync) {
-        res = fdatasync(fi->fh);
+        res = fdatasync(fd);
     } else {
-        res = fsync(fi->fh);
+        res = fsync(fd);
+    }
+    if (!fi) {
+        close(fd);
     }
     fuse_reply_err(req, res == -1 ? errno : 0);
 }
-- 
2.23.0




* [PATCH 014/104] virtiofsd: Add options for virtio
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (12 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 013/104] virtiofsd: Make fsync work even if only inode is passed in Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-03 15:18   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 015/104] virtiofsd: add -o source=PATH to help output Dr. David Alan Gilbert (git)
                   ` (92 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add options to specify parameters for virtio-fs paths, e.g.

   ./virtiofsd -o vhost_user_socket=/tmp/vhostqemu
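
Both spellings of the option end up setting the same socket path. As a
rough illustration of that behaviour only (this is not how fuse_opt
matches its templates internally, and socket_path_from_arg() is a made-up
name), a hand-rolled matcher could look like:

```c
#include <string.h>

/*
 * Hypothetical helper: return the PATH part of either accepted spelling,
 * or NULL if arg is some other option.
 */
const char *socket_path_from_arg(const char *arg)
{
    static const char *const prefixes[] = {
        "--socket-path=",
        "vhost_user_socket=",
    };
    size_t i;

    for (i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++) {
        size_t len = strlen(prefixes[i]);

        if (strncmp(arg, prefixes[i], len) == 0) {
            return arg + len; /* points into arg, not a copy */
        }
    }
    return NULL;
}
```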

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/fuse_i.h        |  1 +
 tools/virtiofsd/fuse_lowlevel.c | 17 ++++++++++++-----
 tools/virtiofsd/helper.c        | 22 +++++++++++-----------
 3 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
index 0b5acc8765..f58be71e4b 100644
--- a/tools/virtiofsd/fuse_i.h
+++ b/tools/virtiofsd/fuse_i.h
@@ -63,6 +63,7 @@ struct fuse_session {
     struct fuse_notify_req notify_list;
     size_t bufsize;
     int error;
+    char *vu_socket_path;
 };
 
 struct fuse_chan {
diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 167701b453..da708161e1 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -2118,8 +2118,12 @@ reply_err:
     }
 
 static const struct fuse_opt fuse_ll_opts[] = {
-    LL_OPTION("debug", debug, 1), LL_OPTION("-d", debug, 1),
-    LL_OPTION("--debug", debug, 1), LL_OPTION("allow_root", deny_others, 1),
+    LL_OPTION("debug", debug, 1),
+    LL_OPTION("-d", debug, 1),
+    LL_OPTION("--debug", debug, 1),
+    LL_OPTION("allow_root", deny_others, 1),
+    LL_OPTION("--socket-path=%s", vu_socket_path, 0),
+    LL_OPTION("vhost_user_socket=%s", vu_socket_path, 0),
     FUSE_OPT_END
 };
 
@@ -2135,9 +2139,12 @@ void fuse_lowlevel_help(void)
      * These are not all options, but the ones that are
      * potentially of interest to an end-user
      */
-    printf("    -o allow_other         allow access by all users\n"
-           "    -o allow_root          allow access by root\n"
-           "    -o auto_unmount        auto unmount on process termination\n");
+    printf(
+        "    -o allow_other             allow access by all users\n"
+        "    -o allow_root              allow access by root\n"
+        "    --socket-path=PATH         path for the vhost-user socket\n"
+        "    -o vhost_user_socket=PATH  path for the vhost-user socket\n"
+        "    -o auto_unmount            auto unmount on process termination\n");
 }
 
 void fuse_session_destroy(struct fuse_session *se)
diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
index 8afccfc15e..48e38a7963 100644
--- a/tools/virtiofsd/helper.c
+++ b/tools/virtiofsd/helper.c
@@ -128,17 +128,17 @@ static const struct fuse_opt conn_info_opt_spec[] = {
 
 void fuse_cmdline_help(void)
 {
-    printf(
-        "    -h   --help            print help\n"
-        "    -V   --version         print version\n"
-        "    -d   -o debug          enable debug output (implies -f)\n"
-        "    -f                     foreground operation\n"
-        "    -s                     disable multi-threaded operation\n"
-        "    -o clone_fd            use separate fuse device fd for each "
-        "thread\n"
-        "                           (may improve performance)\n"
-        "    -o max_idle_threads    the maximum number of idle worker threads\n"
-        "                           allowed (default: 10)\n");
+    printf("    -h   --help                print help\n"
+           "    -V   --version             print version\n"
+           "    -d   -o debug              enable debug output (implies -f)\n"
+           "    -f                         foreground operation\n"
+           "    -s                         disable multi-threaded operation\n"
+           "    -o clone_fd                use separate fuse device fd for "
+           "each thread\n"
+           "                               (may improve performance)\n"
+           "    -o max_idle_threads        the maximum number of idle worker "
+           "threads\n"
+           "                               allowed (default: 10)\n");
 }
 
 static int fuse_helper_opt_proc(void *data, const char *arg, int key,
-- 
2.23.0




* [PATCH 015/104] virtiofsd: add -o source=PATH to help output
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (13 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 014/104] virtiofsd: Add options for virtio Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-03 15:18   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 016/104] virtiofsd: Open vhost connection instead of mounting Dr. David Alan Gilbert (git)
                   ` (91 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

The -o source=PATH option will be used by most command-line invocations.
Let's document it!

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index cd6a0f4409..be808e7bb9 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1319,6 +1319,7 @@ int main(int argc, char *argv[])
     if (opts.show_help) {
         printf("usage: %s [options]\n\n", argv[0]);
         fuse_cmdline_help();
+        printf("    -o source=PATH             shared directory tree\n");
         fuse_lowlevel_help();
         ret = 0;
         goto err_out1;
-- 
2.23.0




* [PATCH 016/104] virtiofsd: Open vhost connection instead of mounting
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (14 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 015/104] virtiofsd: add -o source=PATH to help output Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-03 15:21   ` Daniel P. Berrangé
  2020-01-21  6:57   ` Misono Tomohiro
  2019-12-12 16:37 ` [PATCH 017/104] virtiofsd: Start wiring up vhost-user Dr. David Alan Gilbert (git)
                   ` (90 subsequent siblings)
  106 siblings, 2 replies; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

When run with vhost-user options we connect to QEMU via a socket
instead of mounting.  Start this off by creating the socket.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/fuse_i.h        |  7 ++-
 tools/virtiofsd/fuse_lowlevel.c | 55 +++-------------------
 tools/virtiofsd/fuse_virtio.c   | 83 +++++++++++++++++++++++++++++++++
 tools/virtiofsd/fuse_virtio.h   | 23 +++++++++
 4 files changed, 118 insertions(+), 50 deletions(-)
 create mode 100644 tools/virtiofsd/fuse_virtio.c
 create mode 100644 tools/virtiofsd/fuse_virtio.h

diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
index f58be71e4b..df078f2360 100644
--- a/tools/virtiofsd/fuse_i.h
+++ b/tools/virtiofsd/fuse_i.h
@@ -6,9 +6,10 @@
  * See the file COPYING.LIB
  */
 
-#define FUSE_USE_VERSION 31
-
+#ifndef FUSE_I_H
+#define FUSE_I_H
 
+#define FUSE_USE_VERSION 31
 #include "fuse.h"
 #include "fuse_lowlevel.h"
 
@@ -120,3 +121,5 @@ void fuse_session_process_buf_int(struct fuse_session *se,
 
 /* room needed in buffer to accommodate header */
 #define FUSE_BUFFER_HEADER_SIZE 0x1000
+
+#endif
diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index da708161e1..f553102eba 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -13,6 +13,7 @@
 #include "standard-headers/linux/fuse.h"
 #include "fuse_misc.h"
 #include "fuse_opt.h"
+#include "fuse_virtio.h"
 
 #include <assert.h>
 #include <errno.h>
@@ -2223,6 +2224,11 @@ struct fuse_session *fuse_session_new(struct fuse_args *args,
         goto out4;
     }
 
+    if (!se->vu_socket_path) {
+        fprintf(stderr, "fuse: missing -o vhost_user_socket option\n");
+        goto out4;
+    }
+
     se->bufsize = FUSE_MAX_MAX_PAGES * getpagesize() + FUSE_BUFFER_HEADER_SIZE;
 
     list_init_req(&se->list);
@@ -2245,54 +2251,7 @@ out1:
 
 int fuse_session_mount(struct fuse_session *se)
 {
-    int fd;
-
-    /*
-     * Make sure file descriptors 0, 1 and 2 are open, otherwise chaos
-     * would ensue.
-     */
-    do {
-        fd = open("/dev/null", O_RDWR);
-        if (fd > 2) {
-            close(fd);
-        }
-    } while (fd >= 0 && fd <= 2);
-
-    /*
-     * To allow FUSE daemons to run without privileges, the caller may open
-     * /dev/fuse before launching the file system and pass on the file
-     * descriptor by specifying /dev/fd/N as the mount point. Note that the
-     * parent process takes care of performing the mount in this case.
-     */
-    fd = fuse_mnt_parse_fuse_fd(mountpoint);
-    if (fd != -1) {
-        if (fcntl(fd, F_GETFD) == -1) {
-            fuse_log(FUSE_LOG_ERR, "fuse: Invalid file descriptor /dev/fd/%u\n",
-                     fd);
-            return -1;
-        }
-        se->fd = fd;
-        return 0;
-    }
-
-    /* Open channel */
-    fd = fuse_kern_mount(mountpoint, se->mo);
-    if (fd == -1) {
-        return -1;
-    }
-    se->fd = fd;
-
-    /* Save mountpoint */
-    se->mountpoint = strdup(mountpoint);
-    if (se->mountpoint == NULL) {
-        goto error_out;
-    }
-
-    return 0;
-
-error_out:
-    fuse_kern_unmount(mountpoint, fd);
-    return -1;
+    return virtio_session_mount(se);
 }
 
 int fuse_session_fd(struct fuse_session *se)
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
new file mode 100644
index 0000000000..3a77bb8657
--- /dev/null
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -0,0 +1,83 @@
+/*
+ * virtio-fs glue for FUSE
+ * Copyright (C) 2018 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *   Dave Gilbert  <dgilbert@redhat.com>
+ *
+ * Implements the glue between libfuse and libvhost-user
+ *
+ * This program can be distributed under the terms of the GNU LGPLv2.
+ * See the file COPYING.LIB
+ */
+
+#include "fuse_i.h"
+#include "standard-headers/linux/fuse.h"
+#include "fuse_misc.h"
+#include "fuse_opt.h"
+#include "fuse_virtio.h"
+
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <sys/types.h>
+#include <sys/un.h>
+#include <unistd.h>
+
+/* From spec */
+struct virtio_fs_config {
+    char tag[36];
+    uint32_t num_queues;
+};
+
+int virtio_session_mount(struct fuse_session *se)
+{
+    struct sockaddr_un un;
+    mode_t old_umask;
+
+    if (strlen(se->vu_socket_path) >= sizeof(un.sun_path)) {
+        fuse_log(FUSE_LOG_ERR, "Socket path too long\n");
+        return -1;
+    }
+
+    /*
+     * Poison the fuse FD so we spot if we accidentally use it;
+     * DO NOT check for this value, check for fuse_lowlevel_is_virtio()
+     */
+    se->fd = 0xdaff0d11;
+
+    /*
+     * Create the Unix socket to communicate with qemu
+     * based on QEMU's vhost-user-bridge
+     */
+    unlink(se->vu_socket_path);
+    strcpy(un.sun_path, se->vu_socket_path);
+    size_t addr_len = sizeof(un);
+
+    int listen_sock = socket(AF_UNIX, SOCK_STREAM, 0);
+    if (listen_sock == -1) {
+        fuse_log(FUSE_LOG_ERR, "vhost socket creation: %m\n");
+        return -1;
+    }
+    un.sun_family = AF_UNIX;
+
+    /*
+     * Unfortunately bind doesn't let you set the mask on the socket,
+     * so set umask to 077 and restore it later.
+     */
+    old_umask = umask(0077);
+    if (bind(listen_sock, (struct sockaddr *)&un, addr_len) == -1) {
+        fuse_log(FUSE_LOG_ERR, "vhost socket bind: %m\n");
+        umask(old_umask);
+        return -1;
+    }
+    umask(old_umask);
+
+    if (listen(listen_sock, 1) == -1) {
+        fuse_log(FUSE_LOG_ERR, "vhost socket listen: %m\n");
+        return -1;
+    }
+
+    return -1;
+}
diff --git a/tools/virtiofsd/fuse_virtio.h b/tools/virtiofsd/fuse_virtio.h
new file mode 100644
index 0000000000..8f2edb69ca
--- /dev/null
+++ b/tools/virtiofsd/fuse_virtio.h
@@ -0,0 +1,23 @@
+/*
+ * virtio-fs glue for FUSE
+ * Copyright (C) 2018 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *   Dave Gilbert  <dgilbert@redhat.com>
+ *
+ * Implements the glue between libfuse and libvhost-user
+ *
+ * This program can be distributed under the terms of the GNU LGPLv2.
+ *  See the file COPYING.LIB
+ */
+
+#ifndef FUSE_VIRTIO_H
+#define FUSE_VIRTIO_H
+
+#include "fuse_i.h"
+
+struct fuse_session;
+
+int virtio_session_mount(struct fuse_session *se);
+
+#endif
-- 
2.23.0




* [PATCH 017/104] virtiofsd: Start wiring up vhost-user
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (15 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 016/104] virtiofsd: Open vhost connection instead of mounting Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-03 15:25   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 018/104] virtiofsd: Add main virtio loop Dr. David Alan Gilbert (git)
                   ` (89 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Listen on our Unix socket for the connection from QEMU; when we get it,
initialise vhost-user and dive into our own loop variant (currently a
dummy).
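
The patch embeds the VuDev inside struct fv_VuDev so that callbacks, which
only receive the VuDev pointer, can get back to the owning fuse_session.
A minimal sketch of that embedding pattern follows. All the demo_* types
and the demo_container_of() macro are stand-ins I made up so that it
compiles without libvhost-user; they are not the real definitions.

```c
#include <stddef.h>

/* Stand-ins for illustration only, not the real VuDev/fuse_session */
struct demo_dev {
    int sock_fd;
};

struct demo_session {
    int debug;
};

/* Outer container: the library only ever sees &wrapper->dev */
struct demo_wrapper {
    struct demo_dev dev;
    struct demo_session *se;
};

/* Step back from a member pointer to the struct that contains it */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* What a callback would do with the dev pointer it is handed */
struct demo_session *demo_session_from_dev(struct demo_dev *dev)
{
    struct demo_wrapper *w =
        demo_container_of(dev, struct demo_wrapper, dev);

    return w->se;
}
```

Because dev is the first member, a plain cast would also work here, but
going through offsetof() keeps working if the member is ever moved.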

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/fuse_i.h         |  4 ++
 tools/virtiofsd/fuse_loop_mt.c   |  1 +
 tools/virtiofsd/fuse_lowlevel.c  |  5 ++
 tools/virtiofsd/fuse_lowlevel.h  |  7 +++
 tools/virtiofsd/fuse_virtio.c    | 87 +++++++++++++++++++++++++++++++-
 tools/virtiofsd/fuse_virtio.h    |  2 +
 tools/virtiofsd/passthrough_ll.c |  7 +--
 7 files changed, 107 insertions(+), 6 deletions(-)

diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
index df078f2360..76cc968a6e 100644
--- a/tools/virtiofsd/fuse_i.h
+++ b/tools/virtiofsd/fuse_i.h
@@ -13,6 +13,8 @@
 #include "fuse.h"
 #include "fuse_lowlevel.h"
 
+struct fv_VuDev;
+
 struct fuse_req {
     struct fuse_session *se;
     uint64_t unique;
@@ -65,6 +67,8 @@ struct fuse_session {
     size_t bufsize;
     int error;
     char *vu_socket_path;
+    int   vu_socketfd;
+    struct fv_VuDev *virtio_dev;
 };
 
 struct fuse_chan {
diff --git a/tools/virtiofsd/fuse_loop_mt.c b/tools/virtiofsd/fuse_loop_mt.c
index 00138b2ab3..5dfaff35fd 100644
--- a/tools/virtiofsd/fuse_loop_mt.c
+++ b/tools/virtiofsd/fuse_loop_mt.c
@@ -12,6 +12,7 @@
 #include "fuse_lowlevel.h"
 #include "fuse_misc.h"
 #include "standard-headers/linux/fuse.h"
+#include "fuse_virtio.h"
 
 #include <assert.h>
 #include <errno.h>
diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index f553102eba..2bdd1d9d80 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -2263,6 +2263,11 @@ void fuse_session_unmount(struct fuse_session *se)
 {
 }
 
+int fuse_lowlevel_is_virtio(struct fuse_session *se)
+{
+    return se->vu_socket_path != NULL;
+}
+
 #ifdef linux
 int fuse_req_getgroups(fuse_req_t req, int size, gid_t list[])
 {
diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
index 79929e5541..081bb1abb3 100644
--- a/tools/virtiofsd/fuse_lowlevel.h
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -1786,6 +1786,13 @@ void fuse_req_interrupt_func(fuse_req_t req, fuse_interrupt_func_t func,
  */
 int fuse_req_interrupted(fuse_req_t req);
 
+/**
+ * Check if the session is connected via virtio
+ *
+ * @param se session object
+ * @return 1 if the session is a virtio session
+ */
+int fuse_lowlevel_is_virtio(struct fuse_session *se);
 
 /*
  * Inquiry functions
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 3a77bb8657..69ad522323 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -19,18 +19,78 @@
 
 #include <stdint.h>
 #include <stdio.h>
+#include <stdlib.h>
 #include <string.h>
 #include <sys/socket.h>
 #include <sys/types.h>
 #include <sys/un.h>
 #include <unistd.h>
 
+#include "contrib/libvhost-user/libvhost-user.h"
+
+/*
+ * We pass the dev element into libvhost-user
+ * and then use it to get back to the outer
+ * container for other data.
+ */
+struct fv_VuDev {
+    VuDev dev;
+    struct fuse_session *se;
+};
+
 /* From spec */
 struct virtio_fs_config {
     char tag[36];
     uint32_t num_queues;
 };
 
+/*
+ * Callback from libvhost-user if there's a new fd we're supposed to listen
+ * to, typically a queue kick?
+ */
+static void fv_set_watch(VuDev *dev, int fd, int condition, vu_watch_cb cb,
+                         void *data)
+{
+    fuse_log(FUSE_LOG_WARNING, "%s: TODO! fd=%d\n", __func__, fd);
+}
+
+/*
+ * Callback from libvhost-user if we're no longer supposed to listen on an fd
+ */
+static void fv_remove_watch(VuDev *dev, int fd)
+{
+    fuse_log(FUSE_LOG_WARNING, "%s: TODO! fd=%d\n", __func__, fd);
+}
+
+/* Callback from libvhost-user to panic */
+static void fv_panic(VuDev *dev, const char *err)
+{
+    fuse_log(FUSE_LOG_ERR, "%s: libvhost-user: %s\n", __func__, err);
+    /* TODO: Allow reconnects?? */
+    exit(EXIT_FAILURE);
+}
+
+static bool fv_queue_order(VuDev *dev, int qidx)
+{
+    return false;
+}
+
+static const VuDevIface fv_iface = {
+    /* TODO: Add other callbacks */
+    .queue_is_processed_in_order = fv_queue_order,
+};
+
+int virtio_loop(struct fuse_session *se)
+{
+    fuse_log(FUSE_LOG_INFO, "%s: Entry\n", __func__);
+
+    while (1) {
+        /* TODO: Add stuffing */
+    }
+
+    fuse_log(FUSE_LOG_INFO, "%s: Exit\n", __func__);
+}
+
 int virtio_session_mount(struct fuse_session *se)
 {
     struct sockaddr_un un;
@@ -79,5 +139,30 @@ int virtio_session_mount(struct fuse_session *se)
         return -1;
     }
 
-    return -1;
+    fuse_log(FUSE_LOG_INFO, "%s: Waiting for vhost-user socket connection...\n",
+             __func__);
+    int data_sock = accept(listen_sock, NULL, NULL);
+    if (data_sock == -1) {
+        fuse_log(FUSE_LOG_ERR, "vhost socket accept: %m\n");
+        close(listen_sock);
+        return -1;
+    }
+    close(listen_sock);
+    fuse_log(FUSE_LOG_INFO, "%s: Received vhost-user socket connection\n",
+             __func__);
+
+    /* TODO: Some cleanup/deallocation! */
+    se->virtio_dev = calloc(sizeof(struct fv_VuDev), 1);
+    if (!se->virtio_dev) {
+        fuse_log(FUSE_LOG_ERR, "%s: virtio_dev calloc failed\n", __func__);
+        close(data_sock);
+        return -1;
+    }
+
+    se->vu_socketfd = data_sock;
+    se->virtio_dev->se = se;
+    vu_init(&se->virtio_dev->dev, 2, se->vu_socketfd, fv_panic, fv_set_watch,
+            fv_remove_watch, &fv_iface);
+
+    return 0;
 }
diff --git a/tools/virtiofsd/fuse_virtio.h b/tools/virtiofsd/fuse_virtio.h
index 8f2edb69ca..23026d6e4c 100644
--- a/tools/virtiofsd/fuse_virtio.h
+++ b/tools/virtiofsd/fuse_virtio.h
@@ -20,4 +20,6 @@ struct fuse_session;
 
 int virtio_session_mount(struct fuse_session *se);
 
+int virtio_loop(struct fuse_session *se);
+
 #endif
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index be808e7bb9..23531d791d 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -35,6 +35,7 @@
  * \include passthrough_ll.c
  */
 
+#include "fuse_virtio.h"
 #include "fuse_lowlevel.h"
 #include <assert.h>
 #include <dirent.h>
@@ -1395,11 +1396,7 @@ int main(int argc, char *argv[])
     fuse_daemonize(opts.foreground);
 
     /* Block until ctrl+c or fusermount -u */
-    if (opts.singlethread) {
-        ret = fuse_session_loop(se);
-    } else {
-        ret = fuse_session_loop_mt(se, opts.clone_fd);
-    }
+    ret = virtio_loop(se);
 
     fuse_session_unmount(se);
 err_out3:
-- 
2.23.0




* [PATCH 018/104] virtiofsd: Add main virtio loop
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (16 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 017/104] virtiofsd: Start wiring up vhost-user Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-03 15:26   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 019/104] virtiofsd: get/set features callbacks Dr. David Alan Gilbert (git)
                   ` (88 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Process incoming requests on the vhost-user fd.
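
The shape of one wait in such a loop is roughly the following. This is a
sketch under my own assumptions rather than the patch's code: it uses
plain poll() with an infinite timeout where the patch uses ppoll(), and
wait_for_event() is a hypothetical name.

```c
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper: block until fd has something to read.
 * Returns 1 if readable, 0 on hangup/error condition on the fd,
 * and -1 if poll() itself failed (errno set).
 */
int wait_for_event(int fd)
{
    struct pollfd pf;

    for (;;) {
        pf.fd = fd;
        pf.events = POLLIN;
        pf.revents = 0;

        if (poll(&pf, 1, -1) == -1) {
            if (errno == EINTR) {
                continue; /* interrupted by a signal, go around again */
            }
            return -1;
        }
        /* Check the error bits first; they can be set alongside POLLIN */
        if (pf.revents & (POLLERR | POLLHUP | POLLNVAL)) {
            return 0;
        }
        if (pf.revents & POLLIN) {
            return 1;
        }
    }
}
```

The caller would dispatch one message each time this returns 1 and leave
its loop on 0 or -1.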

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/fuse_virtio.c | 42 ++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 69ad522323..2b64d551e8 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -11,12 +11,14 @@
  * See the file COPYING.LIB
  */
 
+#include "fuse_virtio.h"
 #include "fuse_i.h"
 #include "standard-headers/linux/fuse.h"
 #include "fuse_misc.h"
 #include "fuse_opt.h"
-#include "fuse_virtio.h"
 
+#include <assert.h>
+#include <errno.h>
 #include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>
@@ -80,15 +82,49 @@ static const VuDevIface fv_iface = {
     .queue_is_processed_in_order = fv_queue_order,
 };
 
+/*
+ * Main loop; this mostly deals with events on the vhost-user
+ * socket itself, and not actual fuse data.
+ */
 int virtio_loop(struct fuse_session *se)
 {
     fuse_log(FUSE_LOG_INFO, "%s: Entry\n", __func__);
 
-    while (1) {
-        /* TODO: Add stuffing */
+    while (!fuse_session_exited(se)) {
+        struct pollfd pf[1];
+        pf[0].fd = se->vu_socketfd;
+        pf[0].events = POLLIN;
+        pf[0].revents = 0;
+
+        fuse_log(FUSE_LOG_DEBUG, "%s: Waiting for VU event\n", __func__);
+        int poll_res = ppoll(pf, 1, NULL, NULL);
+
+        if (poll_res == -1) {
+            if (errno == EINTR) {
+                fuse_log(FUSE_LOG_INFO, "%s: ppoll interrupted, going around\n",
+                         __func__);
+                continue;
+            }
+            fuse_log(FUSE_LOG_ERR, "virtio_loop ppoll: %m\n");
+            break;
+        }
+        assert(poll_res == 1);
+        if (pf[0].revents & (POLLERR | POLLHUP | POLLNVAL)) {
+            fuse_log(FUSE_LOG_ERR, "%s: Unexpected poll revents %x\n", __func__,
+                     pf[0].revents);
+            break;
+        }
+        assert(pf[0].revents & POLLIN);
+        fuse_log(FUSE_LOG_DEBUG, "%s: Got VU event\n", __func__);
+        if (!vu_dispatch(&se->virtio_dev->dev)) {
+            fuse_log(FUSE_LOG_ERR, "%s: vu_dispatch failed\n", __func__);
+            break;
+        }
     }
 
     fuse_log(FUSE_LOG_INFO, "%s: Exit\n", __func__);
+
+    return 0;
 }
 
 int virtio_session_mount(struct fuse_session *se)
-- 
2.23.0




* [PATCH 019/104] virtiofsd: get/set features callbacks
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (17 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 018/104] virtiofsd: Add main virtio loop Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-03 15:26   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 020/104] virtiofsd: Start queue threads Dr. David Alan Gilbert (git)
                   ` (87 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add the get/set features callbacks.
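
For readers new to virtio feature negotiation: the value returned by the
get_features callback is a 64-bit mask with one bit per feature.  The
standalone sketch below is not part of the patch.  SKETCH_F_VERSION_1 is a
local stand-in for VIRTIO_F_VERSION_1 (bit 32 in the virtio spec) and
sketch_has_feature() is a hypothetical helper added only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for VIRTIO_F_VERSION_1; the virtio spec assigns it bit 32. */
#define SKETCH_F_VERSION_1 32

/* Build the mask a device would offer: only the 'version 1' feature. */
static uint64_t sketch_get_features(void)
{
    /* 1ULL matters: shifting a plain int by 32 is undefined behaviour. */
    return 1ULL << SKETCH_F_VERSION_1;
}

/* Hypothetical helper: test one feature bit in a negotiated mask. */
static bool sketch_has_feature(uint64_t features, unsigned int bit)
{
    assert(bit < 64);
    return (features >> bit) & 1;
}
```

Since the feature sits above bit 31, a 32-bit mask would silently drop it,
which is presumably why the callback returns uint64_t.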

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/fuse_virtio.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 2b64d551e8..1bbbf570ac 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -46,6 +46,17 @@ struct virtio_fs_config {
     uint32_t num_queues;
 };
 
+/* Callback from libvhost-user */
+static uint64_t fv_get_features(VuDev *dev)
+{
+    return 1ULL << VIRTIO_F_VERSION_1;
+}
+
+/* Callback from libvhost-user */
+static void fv_set_features(VuDev *dev, uint64_t features)
+{
+}
+
 /*
  * Callback from libvhost-user if there's a new fd we're supposed to listen
  * to, typically a queue kick?
@@ -78,7 +89,9 @@ static bool fv_queue_order(VuDev *dev, int qidx)
 }
 
 static const VuDevIface fv_iface = {
-    /* TODO: Add other callbacks */
+    .get_features = fv_get_features,
+    .set_features = fv_set_features,
+
     .queue_is_processed_in_order = fv_queue_order,
 };
 
-- 
2.23.0




* [PATCH 020/104] virtiofsd: Start queue threads
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (18 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 019/104] virtiofsd: get/set features callbacks Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-03 15:27   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 021/104] virtiofsd: Poll kick_fd for queue Dr. David Alan Gilbert (git)
                   ` (86 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Start a thread for each queue when we get notified it's been started.
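
The bookkeeping behind this is a table of per-queue pointers that grows on
demand as higher queue indices are started.  The standalone sketch below
shows that pattern only; struct sketch_dev, struct sketch_queue and
sketch_get_queue() are hypothetical names, and unlike the patch (which
asserts) it reports allocation failure to the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-queue state. */
struct sketch_queue {
    int qidx;
    int kick_fd;
};

/* Hypothetical stand-in for the device's queue table. */
struct sketch_dev {
    size_t nqueues;
    struct sketch_queue **qi;
};

/*
 * Return the slot for qidx, growing the table if needed.
 * Returns NULL on allocation failure and leaves the table intact.
 */
static struct sketch_queue *sketch_get_queue(struct sketch_dev *d, size_t qidx)
{
    assert(d);
    if (qidx >= d->nqueues) {
        struct sketch_queue **grown =
            realloc(d->qi, (qidx + 1) * sizeof(grown[0]));
        if (!grown) {
            return NULL;
        }
        /* Zero only the newly added slots so existing entries survive. */
        memset(grown + d->nqueues, 0,
               (qidx + 1 - d->nqueues) * sizeof(grown[0]));
        d->qi = grown;
        d->nqueues = qidx + 1;
    }
    if (!d->qi[qidx]) {
        d->qi[qidx] = calloc(1, sizeof(*d->qi[qidx]));
        if (!d->qi[qidx]) {
            return NULL;
        }
        d->qi[qidx]->qidx = (int)qidx;
        d->qi[qidx]->kick_fd = -1; /* not started yet */
    }
    return d->qi[qidx];
}
```

Assigning the realloc() result to a temporary keeps the old table reachable
if the allocation fails.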

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
With fix by:
Signed-off-by: Jun Piao <piaojun@huawei.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/fuse_virtio.c | 89 +++++++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)

diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 1bbbf570ac..94f9db76df 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -11,6 +11,7 @@
  * See the file COPYING.LIB
  */
 
+#include "qemu/osdep.h"
 #include "fuse_virtio.h"
 #include "fuse_i.h"
 #include "standard-headers/linux/fuse.h"
@@ -30,6 +31,15 @@
 
 #include "contrib/libvhost-user/libvhost-user.h"
 
+struct fv_QueueInfo {
+    pthread_t thread;
+    struct fv_VuDev *virtio_dev;
+
+    /* Our queue index, corresponds to array position */
+    int qidx;
+    int kick_fd;
+};
+
 /*
  * We pass the dev element into libvhost-user
  * and then use it to get back to the outer
@@ -38,6 +48,13 @@
 struct fv_VuDev {
     VuDev dev;
     struct fuse_session *se;
+
+    /*
+     * The following pair of fields are only accessed in the main
+     * virtio_loop
+     */
+    size_t nqueues;
+    struct fv_QueueInfo **qi;
 };
 
 /* From spec */
@@ -83,6 +100,75 @@ static void fv_panic(VuDev *dev, const char *err)
     exit(EXIT_FAILURE);
 }
 
+static void *fv_queue_thread(void *opaque)
+{
+    struct fv_QueueInfo *qi = opaque;
+    fuse_log(FUSE_LOG_INFO, "%s: Start for queue %d kick_fd %d\n", __func__,
+             qi->qidx, qi->kick_fd);
+    while (1) {
+        /* TODO */
+    }
+
+    return NULL;
+}
+
+/* Callback from libvhost-user on start or stop of a queue */
+static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
+{
+    struct fv_VuDev *vud = container_of(dev, struct fv_VuDev, dev);
+    struct fv_QueueInfo *ourqi;
+
+    fuse_log(FUSE_LOG_INFO, "%s: qidx=%d started=%d\n", __func__, qidx,
+             started);
+    assert(qidx >= 0);
+
+    /*
+     * Ignore additional request queues for now.  passthrough_ll.c must be
+     * audited for thread-safety issues first.  It was written with a
+     * well-behaved client in mind and may not protect against all types of
+     * races yet.
+     */
+    if (qidx > 1) {
+        fuse_log(FUSE_LOG_ERR,
+                 "%s: multiple request queues not yet implemented, please only "
+                 "configure 1 request queue\n",
+                 __func__);
+        exit(EXIT_FAILURE);
+    }
+
+    if (started) {
+        /* Fire up a thread to watch this queue */
+        if (qidx >= vud->nqueues) {
+            vud->qi = realloc(vud->qi, (qidx + 1) * sizeof(vud->qi[0]));
+            assert(vud->qi);
+            memset(vud->qi + vud->nqueues, 0,
+                   sizeof(vud->qi[0]) * (1 + (qidx - vud->nqueues)));
+            vud->nqueues = qidx + 1;
+        }
+        if (!vud->qi[qidx]) {
+            vud->qi[qidx] = calloc(1, sizeof(struct fv_QueueInfo));
+            assert(vud->qi[qidx]);
+            vud->qi[qidx]->virtio_dev = vud;
+            vud->qi[qidx]->qidx = qidx;
+        } else {
+            /* Shouldn't have been started */
+            assert(vud->qi[qidx]->kick_fd == -1);
+        }
+        ourqi = vud->qi[qidx];
+        ourqi->kick_fd = dev->vq[qidx].kick_fd;
+        if (pthread_create(&ourqi->thread, NULL, fv_queue_thread, ourqi)) {
+            fuse_log(FUSE_LOG_ERR, "%s: Failed to create thread for queue %d\n",
+                     __func__, qidx);
+            assert(0);
+        }
+    } else {
+        /* TODO: Kill the thread */
+        assert(qidx < vud->nqueues);
+        ourqi = vud->qi[qidx];
+        ourqi->kick_fd = -1;
+    }
+}
+
 static bool fv_queue_order(VuDev *dev, int qidx)
 {
     return false;
@@ -92,6 +178,9 @@ static const VuDevIface fv_iface = {
     .get_features = fv_get_features,
     .set_features = fv_set_features,
 
+    /* Don't need process message, we've not got any at vhost-user level */
+    .queue_set_started = fv_queue_set_started,
+
     .queue_is_processed_in_order = fv_queue_order,
 };
 
-- 
2.23.0




* [PATCH 021/104] virtiofsd: Poll kick_fd for queue
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (19 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 020/104] virtiofsd: Start queue threads Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-03 15:33   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 022/104] virtiofsd: Start reading commands from queue Dr. David Alan Gilbert (git)
                   ` (85 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

In the queue thread poll the kick_fd we're passed.
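
A minimal standalone sketch of the wait-then-read pattern follows.  It is an
illustration under assumptions, not the patch code: sketch_wait_kick() is a
hypothetical name, and it uses plain poll() with an infinite timeout rather
than ppoll() so that no feature-test macro is needed.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>

/*
 * Block until the eventfd is readable, then read and return its counter.
 * Returns 0 on success and -1 on error.
 */
static int sketch_wait_kick(int kick_fd, uint64_t *value)
{
    struct pollfd pf = { .fd = kick_fd, .events = POLLIN, .revents = 0 };
    eventfd_t ev;
    int res;

    assert(value);
    do {
        res = poll(&pf, 1, -1);
    } while (res == -1 && errno == EINTR);

    if (res != 1 || !(pf.revents & POLLIN)) {
        return -1;
    }
    /* Reading an eventfd returns the accumulated count and resets it. */
    if (eventfd_read(kick_fd, &ev)) {
        return -1;
    }
    *value = ev;
    return 0;
}
```

Because the counter accumulates, several writes before the reader wakes show
up as one readable event, so a single wake-up may stand for more than one
kick.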

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/fuse_virtio.c | 40 ++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 94f9db76df..7118def1eb 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -24,6 +24,7 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
+#include <sys/eventfd.h>
 #include <sys/socket.h>
 #include <sys/types.h>
 #include <sys/un.h>
@@ -100,13 +101,50 @@ static void fv_panic(VuDev *dev, const char *err)
     exit(EXIT_FAILURE);
 }
 
+/* Thread function for individual queues, created when a queue is 'started' */
 static void *fv_queue_thread(void *opaque)
 {
     struct fv_QueueInfo *qi = opaque;
     fuse_log(FUSE_LOG_INFO, "%s: Start for queue %d kick_fd %d\n", __func__,
              qi->qidx, qi->kick_fd);
     while (1) {
-        /* TODO */
+        struct pollfd pf[1];
+        pf[0].fd = qi->kick_fd;
+        pf[0].events = POLLIN;
+        pf[0].revents = 0;
+
+        fuse_log(FUSE_LOG_DEBUG, "%s: Waiting for Queue %d event\n", __func__,
+                 qi->qidx);
+        int poll_res = ppoll(pf, 1, NULL, NULL);
+
+        if (poll_res == -1) {
+            if (errno == EINTR) {
+                fuse_log(FUSE_LOG_INFO, "%s: ppoll interrupted, going around\n",
+                         __func__);
+                continue;
+            }
+            fuse_log(FUSE_LOG_ERR, "fv_queue_thread ppoll: %m\n");
+            break;
+        }
+        assert(poll_res == 1);
+        if (pf[0].revents & (POLLERR | POLLHUP | POLLNVAL)) {
+            fuse_log(FUSE_LOG_ERR, "%s: Unexpected poll revents %x Queue %d\n",
+                     __func__, pf[0].revents, qi->qidx);
+            break;
+        }
+        assert(pf[0].revents & POLLIN);
+        fuse_log(FUSE_LOG_DEBUG, "%s: Got queue event on Queue %d\n", __func__,
+                 qi->qidx);
+
+        eventfd_t evalue;
+        if (eventfd_read(qi->kick_fd, &evalue)) {
+            fuse_log(FUSE_LOG_ERR, "Eventfd_read for queue: %m\n");
+            break;
+        }
+        if (qi->virtio_dev->se->debug) {
+            fprintf(stderr, "%s: Queue %d gave evalue: %zx\n", __func__,
+                    qi->qidx, (size_t)evalue);
+        }
     }
 
     return NULL;
-- 
2.23.0




* [PATCH 022/104] virtiofsd: Start reading commands from queue
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (20 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 021/104] virtiofsd: Poll kick_fd for queue Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-03 15:34   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 023/104] virtiofsd: Send replies to messages Dr. David Alan Gilbert (git)
                   ` (84 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Pop queue elements off queues, copy the data from them and
pass that to fuse.

  Note: 'out' in a VuVirtqElement is from QEMU
        'in' in libfuse is into the daemon

  So we read from the 'out' iovecs to get a fuse_in_header.

When we get a kick we've got to read all the elements until the queue
is empty.
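
The copy step can be pictured as flattening a scatter/gather list into one
private buffer before anything parses it.  The standalone sketch below is an
illustration only: sketch_flatten_iov() is a hypothetical name, and it folds
the size check into the copy, whereas the patch checks the total length
first and leaves the copy helper unchecked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Flatten 'num' iovec segments into dest, which holds dest_size bytes.
 * Returns the number of bytes copied, or (size_t)-1 if they would not fit.
 */
static size_t sketch_flatten_iov(void *dest, size_t dest_size,
                                 const struct iovec *sg, size_t num)
{
    unsigned char *p = dest;
    size_t total = 0;

    assert(dest);
    for (size_t i = 0; i < num; i++) {
        if (sg[i].iov_len == 0) {
            continue;
        }
        /* total never exceeds dest_size, so this cannot underflow. */
        if (sg[i].iov_len > dest_size - total) {
            return (size_t)-1;
        }
        memcpy(p + total, sg[i].iov_base, sg[i].iov_len);
        total += sg[i].iov_len;
    }
    return total;
}
```

Copying first and parsing the private copy afterwards means the guest cannot
change a header field between the length check and its use.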

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/fuse_i.h      |   2 +
 tools/virtiofsd/fuse_virtio.c | 100 +++++++++++++++++++++++++++++++++-
 2 files changed, 99 insertions(+), 3 deletions(-)

diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
index 76cc968a6e..3d9a39cc1a 100644
--- a/tools/virtiofsd/fuse_i.h
+++ b/tools/virtiofsd/fuse_i.h
@@ -14,6 +14,7 @@
 #include "fuse_lowlevel.h"
 
 struct fv_VuDev;
+struct fv_QueueInfo;
 
 struct fuse_req {
     struct fuse_session *se;
@@ -75,6 +76,7 @@ struct fuse_chan {
     pthread_mutex_t lock;
     int ctr;
     int fd;
+    struct fv_QueueInfo *qi;
 };
 
 /**
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 7118def1eb..99cced6888 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -12,6 +12,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/iov.h"
 #include "fuse_virtio.h"
 #include "fuse_i.h"
 #include "standard-headers/linux/fuse.h"
@@ -32,6 +33,7 @@
 
 #include "contrib/libvhost-user/libvhost-user.h"
 
+struct fv_VuDev;
 struct fv_QueueInfo {
     pthread_t thread;
     struct fv_VuDev *virtio_dev;
@@ -101,10 +103,42 @@ static void fv_panic(VuDev *dev, const char *err)
     exit(EXIT_FAILURE);
 }
 
+/*
+ * Copy from an iovec into a fuse_buf (memory only)
+ * Caller must ensure there is space
+ */
+static void copy_from_iov(struct fuse_buf *buf, size_t out_num,
+                          const struct iovec *out_sg)
+{
+    void *dest = buf->mem;
+
+    while (out_num) {
+        size_t onelen = out_sg->iov_len;
+        memcpy(dest, out_sg->iov_base, onelen);
+        dest += onelen;
+        out_sg++;
+        out_num--;
+    }
+}
+
 /* Thread function for individual queues, created when a queue is 'started' */
 static void *fv_queue_thread(void *opaque)
 {
     struct fv_QueueInfo *qi = opaque;
+    struct VuDev *dev = &qi->virtio_dev->dev;
+    struct VuVirtq *q = vu_get_queue(dev, qi->qidx);
+    struct fuse_session *se = qi->virtio_dev->se;
+    struct fuse_chan ch;
+    struct fuse_buf fbuf;
+
+    fbuf.mem = NULL;
+    fbuf.flags = 0;
+
+    fuse_mutex_init(&ch.lock);
+    ch.fd = (int)0xdaff0d111;
+    ch.ctr = 1;
+    ch.qi = qi;
+
     fuse_log(FUSE_LOG_INFO, "%s: Start for queue %d kick_fd %d\n", __func__,
              qi->qidx, qi->kick_fd);
     while (1) {
@@ -141,11 +175,71 @@ static void *fv_queue_thread(void *opaque)
             fuse_log(FUSE_LOG_ERR, "Eventfd_read for queue: %m\n");
             break;
         }
-        if (qi->virtio_dev->se->debug) {
-            fprintf(stderr, "%s: Queue %d gave evalue: %zx\n", __func__,
-                    qi->qidx, (size_t)evalue);
+        /* out is from guest, in is to guest */
+        unsigned int in_bytes, out_bytes;
+        vu_queue_get_avail_bytes(dev, q, &in_bytes, &out_bytes, ~0, ~0);
+
+        fuse_log(FUSE_LOG_DEBUG,
+                 "%s: Queue %d gave evalue: %zx available: in: %u out: %u\n",
+                 __func__, qi->qidx, (size_t)evalue, in_bytes, out_bytes);
+
+        while (1) {
+            /*
+             * An element contains one request and the space to send our
+             * response.  They're spread over multiple descriptors in a
+             * scatter/gather set and we can't trust the guest to keep them
+             * still; so copy in/out.
+             */
+            VuVirtqElement *elem = vu_queue_pop(dev, q, sizeof(VuVirtqElement));
+            if (!elem) {
+                break;
+            }
+
+            if (!fbuf.mem) {
+                fbuf.mem = malloc(se->bufsize);
+                assert(fbuf.mem);
+                assert(se->bufsize > sizeof(struct fuse_in_header));
+            }
+            /* The 'out' part of the elem is from qemu */
+            unsigned int out_num = elem->out_num;
+            struct iovec *out_sg = elem->out_sg;
+            size_t out_len = iov_size(out_sg, out_num);
+            fuse_log(FUSE_LOG_DEBUG,
+                     "%s: elem %d: with %d out desc of length %zd\n", __func__,
+                     elem->index, out_num, out_len);
+
+            /*
+             * The elem should contain a 'fuse_in_header' (in to fuse)
+             * plus the data based on the len in the header.
+             */
+            if (out_len < sizeof(struct fuse_in_header)) {
+                fuse_log(FUSE_LOG_ERR, "%s: elem %d too short for in_header\n",
+                         __func__, elem->index);
+                assert(0); /* TODO */
+            }
+            if (out_len > se->bufsize) {
+                fuse_log(FUSE_LOG_ERR, "%s: elem %d too large for buffer\n",
+                         __func__, elem->index);
+                assert(0); /* TODO */
+            }
+            copy_from_iov(&fbuf, out_num, out_sg);
+            fbuf.size = out_len;
+
+            /* TODO! Endianness of header */
+
+            /* TODO: Fixup fuse_send_msg */
+            /* TODO: Add checks for fuse_session_exited */
+            fuse_session_process_buf_int(se, &fbuf, &ch);
+
+            /* TODO: vu_queue_push(dev, q, elem, qi->write_count); */
+            vu_queue_notify(dev, q);
+
+            free(elem);
+            elem = NULL;
         }
     }
+    pthread_mutex_destroy(&ch.lock);
+    free(fbuf.mem);
 
     return NULL;
 }
-- 
2.23.0




* [PATCH 023/104] virtiofsd: Send replies to messages
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (21 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 022/104] virtiofsd: Start reading commands from queue Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-03 15:36   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 024/104] virtiofsd: Keep track of replies Dr. David Alan Gilbert (git)
                   ` (83 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Route fuse out messages back through the same queue elements
that had the command that triggered the request.
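
Sending a reply means copying from the daemon's iovec list into the guest's
iovec list, where segment boundaries on the two sides need not line up.  The
standalone sketch below shows one way to write such a copy.  It is a
variant, not the patch code: sketch_copy_iov() is a hypothetical name, and
it stops and returns a short count when either side runs out, where the
patch asserts that the caller checked sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Copy up to to_copy bytes from the src segments to the dst segments.
 * Returns the number of bytes actually copied.
 */
static size_t sketch_copy_iov(const struct iovec *src, size_t src_count,
                              const struct iovec *dst, size_t dst_count,
                              size_t to_copy)
{
    size_t si = 0, di = 0;     /* current segment on each side */
    size_t soff = 0, doff = 0; /* offset within those segments */
    size_t copied = 0;

    assert(src && dst);
    while (copied < to_copy && si < src_count && di < dst_count) {
        size_t sleft = src[si].iov_len - soff;
        size_t dleft = dst[di].iov_len - doff;
        size_t n = to_copy - copied;

        /* Copy the largest run that fits in both current segments. */
        if (n > sleft) {
            n = sleft;
        }
        if (n > dleft) {
            n = dleft;
        }
        if (n) {
            memcpy((char *)dst[di].iov_base + doff,
                   (const char *)src[si].iov_base + soff, n);
        }
        copied += n;
        soff += n;
        doff += n;
        /* Step to the next segment on whichever side is exhausted. */
        if (soff == src[si].iov_len) {
            si++;
            soff = 0;
        }
        if (doff == dst[di].iov_len) {
            di++;
            doff = 0;
        }
    }
    return copied;
}
```

A caller would compare the return value with to_copy to detect a
destination that was too small.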

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/fuse_lowlevel.c |   4 ++
 tools/virtiofsd/fuse_virtio.c   | 107 ++++++++++++++++++++++++++++++--
 tools/virtiofsd/fuse_virtio.h   |   4 ++
 3 files changed, 111 insertions(+), 4 deletions(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 2bdd1d9d80..c2b114cf5b 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -171,6 +171,10 @@ static int fuse_send_msg(struct fuse_session *se, struct fuse_chan *ch,
         }
     }
 
+    if (fuse_lowlevel_is_virtio(se)) {
+        return virtio_send_msg(se, ch, iov, count);
+    }
+
     abort(); /* virtio should have taken it before here */
     return 0;
 }
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 99cced6888..c38268a1d5 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -41,6 +41,9 @@ struct fv_QueueInfo {
     /* Our queue index, corresponds to array position */
     int qidx;
     int kick_fd;
+
+    /* The element for the command currently being processed */
+    VuVirtqElement *qe;
 };
 
 /*
@@ -121,6 +124,105 @@ static void copy_from_iov(struct fuse_buf *buf, size_t out_num,
     }
 }
 
+/*
+ * Copy from one iov to another, the given number of bytes
+ * The caller must have checked sizes.
+ */
+static void copy_iov(struct iovec *src_iov, int src_count,
+                     struct iovec *dst_iov, int dst_count, size_t to_copy)
+{
+    size_t dst_offset = 0;
+    /* Outer loop copies 'src' elements */
+    while (to_copy) {
+        assert(src_count);
+        size_t src_len = src_iov[0].iov_len;
+        size_t src_offset = 0;
+
+        if (src_len > to_copy) {
+            src_len = to_copy;
+        }
+        /* Inner loop copies contents of one 'src' to maybe multiple dst. */
+        while (src_len) {
+            assert(dst_count);
+            size_t dst_len = dst_iov[0].iov_len - dst_offset;
+            if (dst_len > src_len) {
+                dst_len = src_len;
+            }
+
+            memcpy(dst_iov[0].iov_base + dst_offset,
+                   src_iov[0].iov_base + src_offset, dst_len);
+            src_len -= dst_len;
+            to_copy -= dst_len;
+            src_offset += dst_len;
+            dst_offset += dst_len;
+
+            assert(dst_offset <= dst_iov[0].iov_len);
+            if (dst_offset == dst_iov[0].iov_len) {
+                dst_offset = 0;
+                dst_iov++;
+                dst_count--;
+            }
+        }
+        src_iov++;
+        src_count--;
+    }
+}
+
+/*
+ * Called back by ll whenever it wants to send a reply/message back
+ * The 1st element of the iov starts with the fuse_out_header
+ * 'unique'==0 means it's a notify message.
+ */
+int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
+                    struct iovec *iov, int count)
+{
+    VuVirtqElement *elem;
+    VuVirtq *q;
+
+    assert(count >= 1);
+    assert(iov[0].iov_len >= sizeof(struct fuse_out_header));
+
+    struct fuse_out_header *out = iov[0].iov_base;
+    /* TODO: Endianness! */
+
+    size_t tosend_len = iov_size(iov, count);
+
+    /* unique == 0 is notification, which we don't support */
+    assert(out->unique);
+    /* For virtio we always have ch */
+    assert(ch);
+    elem = ch->qi->qe;
+    q = &ch->qi->virtio_dev->dev.vq[ch->qi->qidx];
+
+    /* The 'in' part of the elem is to qemu */
+    unsigned int in_num = elem->in_num;
+    struct iovec *in_sg = elem->in_sg;
+    size_t in_len = iov_size(in_sg, in_num);
+    fuse_log(FUSE_LOG_DEBUG, "%s: elem %d: with %d in desc of length %zd\n",
+             __func__, elem->index, in_num, in_len);
+
+    /*
+     * The elem should have room for a 'fuse_out_header' (out from fuse)
+     * plus the data based on the len in the header.
+     */
+    if (in_len < sizeof(struct fuse_out_header)) {
+        fuse_log(FUSE_LOG_ERR, "%s: elem %d too short for out_header\n",
+                 __func__, elem->index);
+        return -E2BIG;
+    }
+    if (in_len < tosend_len) {
+        fuse_log(FUSE_LOG_ERR, "%s: elem %d too small for data len %zd\n",
+                 __func__, elem->index, tosend_len);
+        return -E2BIG;
+    }
+
+    copy_iov(iov, count, in_sg, in_num, tosend_len);
+    vu_queue_push(&se->virtio_dev->dev, q, elem, tosend_len);
+    vu_queue_notify(&se->virtio_dev->dev, q);
+
+    return 0;
+}
+
 /* Thread function for individual queues, created when a queue is 'started' */
 static void *fv_queue_thread(void *opaque)
 {
@@ -227,13 +329,10 @@ static void *fv_queue_thread(void *opaque)
 
             /* TODO! Endianness of header */
 
-            /* TODO: Fixup fuse_send_msg */
             /* TODO: Add checks for fuse_session_exited */
             fuse_session_process_buf_int(se, &fbuf, &ch);
 
-            /* TODO: vu_queue_push(dev, q, elem, qi->write_count); */
-            vu_queue_notify(dev, q);
-
+            qi->qe = NULL;
             free(elem);
             elem = NULL;
         }
diff --git a/tools/virtiofsd/fuse_virtio.h b/tools/virtiofsd/fuse_virtio.h
index 23026d6e4c..135a14875a 100644
--- a/tools/virtiofsd/fuse_virtio.h
+++ b/tools/virtiofsd/fuse_virtio.h
@@ -22,4 +22,8 @@ int virtio_session_mount(struct fuse_session *se);
 
 int virtio_loop(struct fuse_session *se);
 
+
+int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
+                    struct iovec *iov, int count);
+
 #endif
-- 
2.23.0




* [PATCH 024/104] virtiofsd: Keep track of replies
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (22 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 023/104] virtiofsd: Send replies to messages Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-03 15:41   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 025/104] virtiofsd: Add Makefile wiring for virtiofsd contrib Dr. David Alan Gilbert (git)
                   ` (82 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Keep track of whether we sent a reply to a request; this is a bit
paranoid but it means:
  a) We should always recycle an element even if there was an error
     in the request
  b) Never try and send two replies on one queue element
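
The two rules above amount to a small per-element state machine.  The
standalone sketch below models it with hypothetical names (struct
sketch_slot, sketch_begin(), sketch_reply(), sketch_finish()); sketch_push()
only records what would be handed back to the guest.  Unlike the patch,
which asserts on a second reply, the sketch returns an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one in-flight queue element. */
struct sketch_slot {
    bool reply_sent;     /* set once a reply has been pushed */
    unsigned int pushes; /* how many times the element went back */
    size_t pushed_len;   /* length reported on the last push */
};

/* Called when an element is popped, before the request is processed. */
static void sketch_begin(struct sketch_slot *s)
{
    assert(s);
    s->reply_sent = false;
}

/* Stand-in for returning the element to the guest. */
static void sketch_push(struct sketch_slot *s, size_t len)
{
    s->pushes++;
    s->pushed_len = len;
}

/* Reply path: refuses to answer the same element twice. */
static int sketch_reply(struct sketch_slot *s, size_t len)
{
    if (s->reply_sent) {
        return -1;
    }
    sketch_push(s, len);
    s->reply_sent = true;
    return 0;
}

/* Called after processing: recycle the element if nobody replied. */
static void sketch_finish(struct sketch_slot *s)
{
    if (!s->reply_sent) {
        sketch_push(s, 0);
    }
}
```

Either way each element goes back to the guest exactly once per request,
with length 0 when no reply was produced.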

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/fuse_virtio.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index c38268a1d5..c33e0f7e8c 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -44,6 +44,7 @@ struct fv_QueueInfo {
 
     /* The element for the command currently being processed */
     VuVirtqElement *qe;
+    bool reply_sent;
 };
 
 /*
@@ -178,6 +179,7 @@ int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
 {
     VuVirtqElement *elem;
     VuVirtq *q;
+    int ret = 0;
 
     assert(count >= 1);
     assert(iov[0].iov_len >= sizeof(struct fuse_out_header));
@@ -191,6 +193,7 @@ int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
     assert(out->unique);
     /* For virtio we always have ch */
     assert(ch);
+    assert(!ch->qi->reply_sent);
     elem = ch->qi->qe;
     q = &ch->qi->virtio_dev->dev.vq[ch->qi->qidx];
 
@@ -208,19 +211,23 @@ int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
     if (in_len < sizeof(struct fuse_out_header)) {
         fuse_log(FUSE_LOG_ERR, "%s: elem %d too short for out_header\n",
                  __func__, elem->index);
-        return -E2BIG;
+        ret = -E2BIG;
+        goto err;
     }
     if (in_len < tosend_len) {
         fuse_log(FUSE_LOG_ERR, "%s: elem %d too small for data len %zd\n",
                  __func__, elem->index, tosend_len);
-        return -E2BIG;
+        ret = -E2BIG;
+        goto err;
     }
 
     copy_iov(iov, count, in_sg, in_num, tosend_len);
     vu_queue_push(&se->virtio_dev->dev, q, elem, tosend_len);
     vu_queue_notify(&se->virtio_dev->dev, q);
+    ch->qi->reply_sent = true;
 
-    return 0;
+err:
+    return ret;
 }
 
 /* Thread function for individual queues, created when a queue is 'started' */
@@ -297,6 +304,9 @@ static void *fv_queue_thread(void *opaque)
                 break;
             }
 
+            qi->qe = elem;
+            qi->reply_sent = false;
+
             if (!fbuf.mem) {
                 fbuf.mem = malloc(se->bufsize);
                 assert(fbuf.mem);
@@ -332,6 +342,13 @@ static void *fv_queue_thread(void *opaque)
             /* TODO: Add checks for fuse_session_exited */
             fuse_session_process_buf_int(se, &fbuf, &ch);
 
+            if (!qi->reply_sent) {
+                fuse_log(FUSE_LOG_DEBUG, "%s: elem %d no reply sent\n",
+                         __func__, elem->index);
+                /* I think we've still got to recycle the element */
+                vu_queue_push(dev, q, elem, 0);
+                vu_queue_notify(dev, q);
+            }
             qi->qe = NULL;
             free(elem);
             elem = NULL;
-- 
2.23.0




* [PATCH 025/104] virtiofsd: Add Makefile wiring for virtiofsd contrib
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (23 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 024/104] virtiofsd: Keep track of replies Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2019-12-13 16:02   ` Liam Merwick
  2020-01-03 15:41   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 026/104] virtiofsd: Fast path for virtio read Dr. David Alan Gilbert (git)
                   ` (81 subsequent siblings)
  106 siblings, 2 replies; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Wire up the building of virtiofsd in tools/virtiofsd.

virtiofsd relies on Linux-specific system calls and seccomp.  Anyone
wishing to port it to other host operating systems should do so
carefully and without reducing security.

Only allow building on Linux hosts.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 Makefile                      |  8 ++++++++
 Makefile.objs                 |  1 +
 tools/virtiofsd/Makefile.objs | 10 ++++++++++
 3 files changed, 19 insertions(+)
 create mode 100644 tools/virtiofsd/Makefile.objs

diff --git a/Makefile b/Makefile
index b437a346d7..b7f7019a50 100644
--- a/Makefile
+++ b/Makefile
@@ -322,6 +322,8 @@ HELPERS-y =
 HELPERS-$(call land,$(CONFIG_SOFTMMU),$(CONFIG_LINUX)) = qemu-bridge-helper$(EXESUF)
 
 ifdef CONFIG_LINUX
+HELPERS-y += virtiofsd$(EXESUF)
+
 ifdef CONFIG_VIRGL
 ifdef CONFIG_GBM
 HELPERS-y += vhost-user-gpu$(EXESUF)
@@ -430,6 +432,7 @@ dummy := $(call unnest-vars,, \
                 elf2dmp-obj-y \
                 ivshmem-client-obj-y \
                 ivshmem-server-obj-y \
+                virtiofsd-obj-y \
                 rdmacm-mux-obj-y \
                 libvhost-user-obj-y \
                 vhost-user-scsi-obj-y \
@@ -674,6 +677,11 @@ rdmacm-mux$(EXESUF): LIBS += "-libumad"
 rdmacm-mux$(EXESUF): $(rdmacm-mux-obj-y) $(COMMON_LDADDS)
 	$(call LINK, $^)
 
+ifdef CONFIG_LINUX # relies on Linux-specific syscalls
+virtiofsd$(EXESUF): $(virtiofsd-obj-y) libvhost-user.a $(COMMON_LDADDS)
+	$(call LINK, $^)
+endif
+
 vhost-user-gpu$(EXESUF): $(vhost-user-gpu-obj-y) $(libvhost-user-obj-y) libqemuutil.a libqemustub.a
 	$(call LINK, $^)
 
diff --git a/Makefile.objs b/Makefile.objs
index 11ba1a36bd..b5f667a4ba 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -125,6 +125,7 @@ vhost-user-blk-obj-y = contrib/vhost-user-blk/
 rdmacm-mux-obj-y = contrib/rdmacm-mux/
 vhost-user-input-obj-y = contrib/vhost-user-input/
 vhost-user-gpu-obj-y = contrib/vhost-user-gpu/
+virtiofsd-obj-y = tools/virtiofsd/
 
 ######################################################################
 trace-events-subdirs =
diff --git a/tools/virtiofsd/Makefile.objs b/tools/virtiofsd/Makefile.objs
new file mode 100644
index 0000000000..67be16332c
--- /dev/null
+++ b/tools/virtiofsd/Makefile.objs
@@ -0,0 +1,10 @@
+virtiofsd-obj-y = buffer.o \
+                  fuse_opt.o \
+                  fuse_log.o \
+                  fuse_loop_mt.o \
+                  fuse_lowlevel.o \
+                  fuse_signals.o \
+                  fuse_virtio.o \
+                  helper.o \
+                  passthrough_ll.o
+
-- 
2.23.0




* [PATCH 026/104] virtiofsd: Fast path for virtio read
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (24 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 025/104] virtiofsd: Add Makefile wiring for virtiofsd contrib Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-17 18:54   ` Masayoshi Mizuma
  2019-12-12 16:37 ` [PATCH 027/104] virtiofsd: add --fd=FDNUM fd passing option Dr. David Alan Gilbert (git)
                   ` (80 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Readv the data straight into the guest's buffer.
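
The idea is that a vectored read lets the kernel scatter file data directly
across several destination segments, so no bounce buffer is needed.  The
standalone sketch below illustrates only that idea with plain readv();
sketch_read_scatter() is a hypothetical name, and the offset handling and
partial-read bookkeeping that a real fast path needs are left out.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Read from fd straight into the given segments, retrying on EINTR.
 * Returns the byte count from readv(), which may be a short read.
 */
static ssize_t sketch_read_scatter(int fd, const struct iovec *sg, int num)
{
    ssize_t n;

    assert(sg && num > 0);
    do {
        n = readv(fd, sg, num);
    } while (n == -1 && errno == EINTR);
    return n;
}
```

Segments are filled in order, so data that overflows the first segment
continues at the start of the second.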

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
With fix by:
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
---
 tools/virtiofsd/fuse_lowlevel.c |   5 +
 tools/virtiofsd/fuse_virtio.c   | 159 ++++++++++++++++++++++++++++++++
 tools/virtiofsd/fuse_virtio.h   |   4 +
 3 files changed, 168 insertions(+)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index c2b114cf5b..5f80625652 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -475,6 +475,11 @@ static int fuse_send_data_iov_fallback(struct fuse_session *se,
         return fuse_send_msg(se, ch, iov, iov_count);
     }
 
+    if (fuse_lowlevel_is_virtio(se) && buf->count == 1 &&
+        buf->buf[0].flags == (FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK)) {
+        return virtio_send_data_iov(se, ch, iov, iov_count, buf, len);
+    }
+
     abort(); /* Will have taken vhost path */
     return 0;
 }
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index c33e0f7e8c..146cd3f702 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -230,6 +230,165 @@ err:
     return ret;
 }
 
+/*
+ * Callback from fuse_send_data_iov_* when it's virtio and the buffer
+ * is a single FD with FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK
+ * We need to send the iov and then the buffer.
+ * Return 0 on success
+ */
+int virtio_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
+                         struct iovec *iov, int count, struct fuse_bufvec *buf,
+                         size_t len)
+{
+    int ret = 0;
+    VuVirtqElement *elem;
+    VuVirtq *q;
+
+    assert(count >= 1);
+    assert(iov[0].iov_len >= sizeof(struct fuse_out_header));
+
+    struct fuse_out_header *out = iov[0].iov_base;
+    /* TODO: Endianness! */
+
+    size_t iov_len = iov_size(iov, count);
+    size_t tosend_len = iov_len + len;
+
+    out->len = tosend_len;
+
+    fuse_log(FUSE_LOG_DEBUG, "%s: count=%d len=%zd iov_len=%zd\n", __func__,
+             count, len, iov_len);
+
+    /* unique == 0 is notification which we don't support */
+    assert(out->unique);
+
+    /* For virtio we always have ch */
+    assert(ch);
+    assert(!ch->qi->reply_sent);
+    elem = ch->qi->qe;
+    q = &ch->qi->virtio_dev->dev.vq[ch->qi->qidx];
+
+    /* The 'in' part of the elem is to qemu */
+    unsigned int in_num = elem->in_num;
+    struct iovec *in_sg = elem->in_sg;
+    size_t in_len = iov_size(in_sg, in_num);
+    fuse_log(FUSE_LOG_DEBUG, "%s: elem %d: with %d in desc of length %zd\n",
+             __func__, elem->index, in_num, in_len);
+
+    /*
+     * The elem should have room for a 'fuse_out_header' (out from fuse)
+     * plus the data based on the len in the header.
+     */
+    if (in_len < sizeof(struct fuse_out_header)) {
+        fuse_log(FUSE_LOG_ERR, "%s: elem %d too short for out_header\n",
+                 __func__, elem->index);
+        ret = -E2BIG;
+        goto err;
+    }
+    if (in_len < tosend_len) {
+        fuse_log(FUSE_LOG_ERR, "%s: elem %d too small for data len %zd\n",
+                 __func__, elem->index, tosend_len);
+        ret = -E2BIG;
+        goto err;
+    }
+
+    /* TODO: Limit to 'len' */
+
+    /* First copy the header data from iov->in_sg */
+    copy_iov(iov, count, in_sg, in_num, iov_len);
+
+    /*
+     * Build a copy of the in_sg iov so we can skip bits in it,
+     * including changing the offsets
+     */
+    struct iovec *in_sg_cpy = calloc(sizeof(struct iovec), in_num);
+    memcpy(in_sg_cpy, in_sg, sizeof(struct iovec) * in_num);
+    /* These get updated as we skip */
+    struct iovec *in_sg_ptr = in_sg_cpy;
+    int in_sg_cpy_count = in_num;
+
+    /* skip over parts of in_sg that contained the header iov */
+    size_t skip_size = iov_len;
+
+    size_t in_sg_left = 0;
+    do {
+        while (skip_size != 0 && in_sg_cpy_count) {
+            if (skip_size >= in_sg_ptr[0].iov_len) {
+                skip_size -= in_sg_ptr[0].iov_len;
+                in_sg_ptr++;
+                in_sg_cpy_count--;
+            } else {
+                in_sg_ptr[0].iov_len -= skip_size;
+                in_sg_ptr[0].iov_base += skip_size;
+                break;
+            }
+        }
+
+        int i;
+        for (i = 0, in_sg_left = 0; i < in_sg_cpy_count; i++) {
+            in_sg_left += in_sg_ptr[i].iov_len;
+        }
+        fuse_log(FUSE_LOG_DEBUG,
+                 "%s: after skip skip_size=%zd in_sg_cpy_count=%d "
+                 "in_sg_left=%zd\n",
+                 __func__, skip_size, in_sg_cpy_count, in_sg_left);
+        ret = preadv(buf->buf[0].fd, in_sg_ptr, in_sg_cpy_count,
+                     buf->buf[0].pos);
+
+        fuse_log(FUSE_LOG_DEBUG, "%s: preadv_res=%d(%m) len=%zd\n",
+                 __func__, ret, len);
+        if (ret == -1) {
+            ret = errno;
+            free(in_sg_cpy);
+            goto err;
+        }
+        if (ret < len && ret) {
+            fuse_log(FUSE_LOG_DEBUG, "%s: ret < len\n", __func__);
+            /* Skip over this much next time around */
+            skip_size = ret;
+            buf->buf[0].pos += ret;
+            len -= ret;
+
+            /* Let's do another read */
+            continue;
+        }
+        if (!ret) {
+            /* EOF case? */
+            fuse_log(FUSE_LOG_DEBUG, "%s: !ret in_sg_left=%zd\n", __func__,
+                     in_sg_left);
+            break;
+        }
+        if (ret != len) {
+            fuse_log(FUSE_LOG_DEBUG, "%s: ret!=len\n", __func__);
+            ret = EIO;
+            free(in_sg_cpy);
+            goto err;
+        }
+        in_sg_left -= ret;
+        len -= ret;
+    } while (in_sg_left);
+    free(in_sg_cpy);
+
+    /* Need to fix out->len on EOF */
+    if (len) {
+        struct fuse_out_header *out_sg = in_sg[0].iov_base;
+
+        tosend_len -= len;
+        out_sg->len = tosend_len;
+    }
+
+    ret = 0;
+
+    vu_queue_push(&se->virtio_dev->dev, q, elem, tosend_len);
+    vu_queue_notify(&se->virtio_dev->dev, q);
+
+err:
+    if (ret == 0) {
+        ch->qi->reply_sent = true;
+    }
+
+    return ret;
+}
+
 /* Thread function for individual queues, created when a queue is 'started' */
 static void *fv_queue_thread(void *opaque)
 {
diff --git a/tools/virtiofsd/fuse_virtio.h b/tools/virtiofsd/fuse_virtio.h
index 135a14875a..cc676b9193 100644
--- a/tools/virtiofsd/fuse_virtio.h
+++ b/tools/virtiofsd/fuse_virtio.h
@@ -26,4 +26,8 @@ int virtio_loop(struct fuse_session *se);
 int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
                     struct iovec *iov, int count);
 
+int virtio_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
+                         struct iovec *iov, int count,
+                         struct fuse_bufvec *buf, size_t len);
+
 #endif
-- 
2.23.0




* [PATCH 027/104] virtiofsd: add --fd=FDNUM fd passing option
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (25 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 026/104] virtiofsd: Fast path for virtio read Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-06 14:12   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 028/104] virtiofsd: make -f (foreground) the default Dr. David Alan Gilbert (git)
                   ` (79 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

Although --socket-path=PATH is useful for manual invocations, management
tools typically create the UNIX domain socket themselves and pass it to
the vhost-user device backend.  This way QEMU can be launched
immediately with a valid socket.  No waiting for the vhost-user device
backend is required when fd passing is used.
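
As an illustration of the management-tool side, the sketch below creates
the listening socket up front and formats the matching --fd=FDNUM
argument.  make_listen_socket() and format_fd_arg() are hypothetical
helper names for this example, not part of virtiofsd or of any
management tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a listening UNIX domain socket at 'path'.
 * Returns the listening fd, or -1 on error.
 */
static int make_listen_socket(const char *path)
{
    struct sockaddr_un un;
    int fd;

    assert(path);
    if (strlen(path) >= sizeof(un.sun_path)) {
        return -1; /* path too long for sun_path */
    }

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1) {
        return -1;
    }

    memset(&un, 0, sizeof(un));
    un.sun_family = AF_UNIX;
    strcpy(un.sun_path, path);

    unlink(path); /* remove a stale socket, if any */
    if (bind(fd, (struct sockaddr *)&un, sizeof(un)) == -1 ||
        listen(fd, 1) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: build the "--fd=FDNUM" argument for the backend */
static int format_fd_arg(char *buf, size_t size, int fd)
{
    int n = snprintf(buf, size, "--fd=%d", fd);

    return (n > 0 && (size_t)n < size) ? 0 : -1;
}
```

Once the socket is listening, the tool can start QEMU against the same
path straight away and separately fork/exec the backend with the
formatted argument.  The fd must stay open across exec, so it must not
be marked close-on-exec.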

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/fuse_i.h        |  1 +
 tools/virtiofsd/fuse_lowlevel.c | 14 +++++++++---
 tools/virtiofsd/fuse_virtio.c   | 39 ++++++++++++++++++++++++---------
 3 files changed, 41 insertions(+), 13 deletions(-)

diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
index 3d9a39cc1a..cb1ca70ffa 100644
--- a/tools/virtiofsd/fuse_i.h
+++ b/tools/virtiofsd/fuse_i.h
@@ -68,6 +68,7 @@ struct fuse_session {
     size_t bufsize;
     int error;
     char *vu_socket_path;
+    int   vu_listen_fd;
     int   vu_socketfd;
     struct fv_VuDev *virtio_dev;
 };
diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 5f80625652..bea092b454 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -2134,6 +2134,7 @@ static const struct fuse_opt fuse_ll_opts[] = {
     LL_OPTION("allow_root", deny_others, 1),
     LL_OPTION("--socket-path=%s", vu_socket_path, 0),
     LL_OPTION("vhost_user_socket=%s", vu_socket_path, 0),
+    LL_OPTION("--fd=%d", vu_listen_fd, 0),
     FUSE_OPT_END
 };
 
@@ -2154,6 +2155,7 @@ void fuse_lowlevel_help(void)
         "    -o allow_root              allow access by root\n"
         "    --socket-path=PATH         path for the vhost-user socket\n"
         "    -o vhost_user_socket=PATH  path for the vhost-user socket\n"
+        "    --fd=FDNUM                 fd number of vhost-user socket\n"
         "    -o auto_unmount            auto unmount on process termination\n");
 }
 
@@ -2198,6 +2200,7 @@ struct fuse_session *fuse_session_new(struct fuse_args *args,
         goto out1;
     }
     se->fd = -1;
+    se->vu_listen_fd = -1;
     se->conn.max_write = UINT_MAX;
     se->conn.max_readahead = UINT_MAX;
 
@@ -2233,8 +2236,13 @@ struct fuse_session *fuse_session_new(struct fuse_args *args,
         goto out4;
     }
 
-    if (!se->vu_socket_path) {
-        fprintf(stderr, "fuse: missing -o vhost_user_socket option\n");
+    if (!se->vu_socket_path && se->vu_listen_fd < 0) {
+        fuse_log(FUSE_LOG_ERR, "fuse: missing --socket-path or --fd option\n");
+        goto out4;
+    }
+    if (se->vu_socket_path && se->vu_listen_fd >= 0) {
+        fuse_log(FUSE_LOG_ERR,
+                 "fuse: --socket-path and --fd cannot be given together\n");
         goto out4;
     }
 
@@ -2274,7 +2282,7 @@ void fuse_session_unmount(struct fuse_session *se)
 
 int fuse_lowlevel_is_virtio(struct fuse_session *se)
 {
-    return se->vu_socket_path != NULL;
+    return !!se->virtio_dev;
 }
 
 #ifdef linux
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 146cd3f702..fa6e53e7d0 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -636,22 +636,21 @@ int virtio_loop(struct fuse_session *se)
     return 0;
 }
 
-int virtio_session_mount(struct fuse_session *se)
+static int fv_create_listen_socket(struct fuse_session *se)
 {
     struct sockaddr_un un;
     mode_t old_umask;
 
+    /* Nothing to do if fd is already initialized */
+    if (se->vu_listen_fd >= 0) {
+        return 0;
+    }
+
     if (strlen(se->vu_socket_path) >= sizeof(un.sun_path)) {
         fuse_log(FUSE_LOG_ERR, "Socket path too long\n");
         return -1;
     }
 
-    /*
-     * Poison the fuse FD so we spot if we accidentally use it;
-     * DO NOT check for this value, check for fuse_lowlevel_is_virtio()
-     */
-    se->fd = 0xdaff0d11;
-
     /*
      * Create the Unix socket to communicate with qemu
      * based on QEMU's vhost-user-bridge
@@ -684,15 +683,35 @@ int virtio_session_mount(struct fuse_session *se)
         return -1;
     }
 
+    se->vu_listen_fd = listen_sock;
+    return 0;
+}
+
+int virtio_session_mount(struct fuse_session *se)
+{
+    int ret;
+
+    ret = fv_create_listen_socket(se);
+    if (ret < 0) {
+        return ret;
+    }
+
+    /*
+     * Poison the fuse FD so we spot if we accidentally use it;
+     * DO NOT check for this value, check fuse_lowlevel_is_virtio()
+     */
+    se->fd = 0xdaff0d11;
+
     fuse_log(FUSE_LOG_INFO, "%s: Waiting for vhost-user socket connection...\n",
              __func__);
-    int data_sock = accept(listen_sock, NULL, NULL);
+    int data_sock = accept(se->vu_listen_fd, NULL, NULL);
     if (data_sock == -1) {
         fuse_log(FUSE_LOG_ERR, "vhost socket accept: %m\n");
-        close(listen_sock);
+        close(se->vu_listen_fd);
         return -1;
     }
-    close(listen_sock);
+    close(se->vu_listen_fd);
+    se->vu_listen_fd = -1;
     fuse_log(FUSE_LOG_INFO, "%s: Received vhost-user socket connection\n",
              __func__);
 
-- 
2.23.0




* [PATCH 028/104] virtiofsd: make -f (foreground) the default
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (26 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 027/104] virtiofsd: add --fd=FDNUM fd passing option Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-06 14:19   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 029/104] virtiofsd: add vhost-user.json file Dr. David Alan Gilbert (git)
                   ` (78 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

According to vhost-user.rst "Backend program conventions", backend
programs should run in the foreground by default.  Follow the
conventions so libvirt and other management tools can control virtiofsd
in a standard way.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/helper.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
index 48e38a7963..d4fff4fa53 100644
--- a/tools/virtiofsd/helper.c
+++ b/tools/virtiofsd/helper.c
@@ -28,6 +28,11 @@
     {                                               \
         t, offsetof(struct fuse_cmdline_opts, p), 1 \
     }
+#define FUSE_HELPER_OPT_VALUE(t, p, v)              \
+    {                                               \
+        t, offsetof(struct fuse_cmdline_opts, p), v \
+    }
+
 
 static const struct fuse_opt fuse_helper_opts[] = {
     FUSE_HELPER_OPT("-h", show_help),
@@ -41,6 +46,7 @@ static const struct fuse_opt fuse_helper_opts[] = {
     FUSE_OPT_KEY("-d", FUSE_OPT_KEY_KEEP),
     FUSE_OPT_KEY("debug", FUSE_OPT_KEY_KEEP),
     FUSE_HELPER_OPT("-f", foreground),
+    FUSE_HELPER_OPT_VALUE("--daemonize", foreground, 0),
     FUSE_HELPER_OPT("-s", singlethread),
     FUSE_HELPER_OPT("fsname=", nodefault_subtype),
     FUSE_OPT_KEY("fsname=", FUSE_OPT_KEY_KEEP),
@@ -132,6 +138,7 @@ void fuse_cmdline_help(void)
            "    -V   --version             print version\n"
            "    -d   -o debug              enable debug output (implies -f)\n"
            "    -f                         foreground operation\n"
+           "    --daemonize                run in background\n"
            "    -s                         disable multi-threaded operation\n"
            "    -o clone_fd                use separate fuse device fd for "
            "each thread\n"
@@ -163,6 +170,7 @@ int fuse_parse_cmdline(struct fuse_args *args, struct fuse_cmdline_opts *opts)
     memset(opts, 0, sizeof(struct fuse_cmdline_opts));
 
     opts->max_idle_threads = 10;
+    opts->foreground = 1;
 
     if (fuse_opt_parse(args, opts, fuse_helper_opts, fuse_helper_opt_proc) ==
         -1) {
-- 
2.23.0




* [PATCH 029/104] virtiofsd: add vhost-user.json file
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (27 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 028/104] virtiofsd: make -f (foreground) the default Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-06 14:19   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 030/104] virtiofsd: add --print-capabilities option Dr. David Alan Gilbert (git)
                   ` (77 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

Install a vhost-user.json file describing virtiofsd.  This allows
libvirt and other management tools to enumerate vhost-user backend
programs.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 .gitignore                                | 1 +
 Makefile                                  | 1 +
 tools/virtiofsd/50-qemu-virtiofsd.json.in | 5 +++++
 3 files changed, 7 insertions(+)
 create mode 100644 tools/virtiofsd/50-qemu-virtiofsd.json.in

diff --git a/.gitignore b/.gitignore
index 7de868d1ea..c56ec1d122 100644
--- a/.gitignore
+++ b/.gitignore
@@ -6,6 +6,7 @@
 /config-target.*
 /config.status
 /config-temp
+/tools/virtiofsd/50-qemu-virtiofsd.json
 /elf2dmp
 /trace-events-all
 /trace/generated-events.h
diff --git a/Makefile b/Makefile
index b7f7019a50..8a5746d8a0 100644
--- a/Makefile
+++ b/Makefile
@@ -323,6 +323,7 @@ HELPERS-$(call land,$(CONFIG_SOFTMMU),$(CONFIG_LINUX)) = qemu-bridge-helper$(EXE
 
 ifdef CONFIG_LINUX
 HELPERS-y += virtiofsd$(EXESUF)
+vhost-user-json-y += tools/virtiofsd/50-qemu-virtiofsd.json
 
 ifdef CONFIG_VIRGL
 ifdef CONFIG_GBM
diff --git a/tools/virtiofsd/50-qemu-virtiofsd.json.in b/tools/virtiofsd/50-qemu-virtiofsd.json.in
new file mode 100644
index 0000000000..9bcd86f8dc
--- /dev/null
+++ b/tools/virtiofsd/50-qemu-virtiofsd.json.in
@@ -0,0 +1,5 @@
+{
+  "description": "QEMU virtiofsd vhost-user-fs",
+  "type": "fs",
+  "binary": "@libexecdir@/virtiofsd"
+}
-- 
2.23.0




* [PATCH 030/104] virtiofsd: add --print-capabilities option
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (28 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 029/104] virtiofsd: add vhost-user.json file Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-06 14:20   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 031/104] virtiofs: Add maintainers entry Dr. David Alan Gilbert (git)
                   ` (76 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

Add the --print-capabilities option as per vhost-user.rst "Backend
program conventions".  Currently there are no advertised features.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 docs/interop/vhost-user.json     |  4 +++-
 tools/virtiofsd/fuse_lowlevel.h  |  1 +
 tools/virtiofsd/helper.c         |  2 ++
 tools/virtiofsd/passthrough_ll.c | 12 ++++++++++++
 4 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/docs/interop/vhost-user.json b/docs/interop/vhost-user.json
index da6aaf51c8..d4ea1f7ac5 100644
--- a/docs/interop/vhost-user.json
+++ b/docs/interop/vhost-user.json
@@ -31,6 +31,7 @@
 # @rproc-serial: virtio remoteproc serial link
 # @scsi: virtio scsi
 # @vsock: virtio vsock transport
+# @fs: virtio fs (since 4.2)
 #
 # Since: 4.0
 ##
@@ -50,7 +51,8 @@
       'rpmsg',
       'rproc-serial',
       'scsi',
-      'vsock'
+      'vsock',
+      'fs'
   ]
 }
 
diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
index 081bb1abb3..6c63cb740c 100644
--- a/tools/virtiofsd/fuse_lowlevel.h
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -1826,6 +1826,7 @@ struct fuse_cmdline_opts {
     int nodefault_subtype;
     int show_version;
     int show_help;
+    int print_capabilities;
     int clone_fd;
     unsigned int max_idle_threads;
 };
diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
index d4fff4fa53..4c9a3b2fc9 100644
--- a/tools/virtiofsd/helper.c
+++ b/tools/virtiofsd/helper.c
@@ -39,6 +39,7 @@ static const struct fuse_opt fuse_helper_opts[] = {
     FUSE_HELPER_OPT("--help", show_help),
     FUSE_HELPER_OPT("-V", show_version),
     FUSE_HELPER_OPT("--version", show_version),
+    FUSE_HELPER_OPT("--print-capabilities", print_capabilities),
     FUSE_HELPER_OPT("-d", debug),
     FUSE_HELPER_OPT("debug", debug),
     FUSE_HELPER_OPT("-d", foreground),
@@ -136,6 +137,7 @@ void fuse_cmdline_help(void)
 {
     printf("    -h   --help                print help\n"
            "    -V   --version             print version\n"
+           "    --print-capabilities       print vhost-user.json\n"
            "    -d   -o debug              enable debug output (implies -f)\n"
            "    -f                         foreground operation\n"
            "    --daemonize                run in background\n"
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 23531d791d..68bacb6fc5 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1298,6 +1298,14 @@ static struct fuse_lowlevel_ops lo_oper = {
     .lseek = lo_lseek,
 };
 
+/* Print vhost-user.json backend program capabilities */
+static void print_capabilities(void)
+{
+    printf("{\n");
+    printf("  \"type\": \"fs\"\n");
+    printf("}\n");
+}
+
 int main(int argc, char *argv[])
 {
     struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
@@ -1328,6 +1336,10 @@ int main(int argc, char *argv[])
         fuse_lowlevel_version();
         ret = 0;
         goto err_out1;
+    } else if (opts.print_capabilities) {
+        print_capabilities();
+        ret = 0;
+        goto err_out1;
     }
 
     if (fuse_opt_parse(&args, &lo, lo_opts, NULL) == -1) {
-- 
2.23.0




* [PATCH 031/104] virtiofs: Add maintainers entry
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (29 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 030/104] virtiofsd: add --print-capabilities option Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-06 14:21   ` Daniel P. Berrangé
  2020-01-15 17:19   ` Philippe Mathieu-Daudé
  2019-12-12 16:37 ` [PATCH 032/104] virtiofsd: passthrough_ll: create new files in caller's context Dr. David Alan Gilbert (git)
                   ` (75 subsequent siblings)
  106 siblings, 2 replies; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 MAINTAINERS | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5e5e3e52d6..d1b3e262d2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1575,6 +1575,14 @@ T: git https://github.com/cohuck/qemu.git s390-next
 T: git https://github.com/borntraeger/qemu.git s390-next
 L: qemu-s390x@nongnu.org
 
+virtiofs
+M: Dr. David Alan Gilbert <dgilbert@redhat.com>
+M: Stefan Hajnoczi <stefanha@redhat.com>
+S: Supported
+F: tools/virtiofsd/*
+F: hw/virtio/vhost-user-fs*
+F: include/hw/virtio/vhost-user-fs.h
+
 virtio-input
 M: Gerd Hoffmann <kraxel@redhat.com>
 S: Maintained
-- 
2.23.0




* [PATCH 032/104] virtiofsd: passthrough_ll: create new files in caller's context
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (30 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 031/104] virtiofs: Add maintainers entry Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-06 14:30   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 033/104] virtiofsd: passthrough_ll: add lo_map for ino/fh indirection Dr. David Alan Gilbert (git)
                   ` (74 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Vivek Goyal <vgoyal@redhat.com>

We need to create files in the caller's context. Otherwise after
creating a file, the caller might not be able to do file operations on
that file.

Change the effective uid/gid to the caller's uid/gid, create the file,
and then switch back to uid/gid 0.

Use syscall(SYS_setresuid, ...) directly; otherwise glibc's wrapper
changes the EUID in all threads, which is not what we want.
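
The difference can be shown with a small sketch.  set_thread_euid() is a
hypothetical name used only for this example; it issues the raw system
call, which on Linux affects just the calling thread, whereas the glibc
setresuid() wrapper propagates the change to every thread in the
process.  The sketch assumes an architecture where SYS_setresuid takes
full-width uids (such as x86_64).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for syscall() */
#endif
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: change the effective uid of the calling thread only.
 * The real and saved uids are left unchanged ((uid_t)-1 means "keep").
 * Returns 0 on success or a positive errno value on failure.
 */
static int set_thread_euid(uid_t euid)
{
    if (syscall(SYS_setresuid, (uid_t)-1, euid, (uid_t)-1) == -1) {
        return errno;
    }
    return 0;
}
```

A request handler would save geteuid(), call set_thread_euid() with the
caller's uid, perform the create, and then call it again with the saved
value; the same pattern applies to the gid via SYS_setresgid.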

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 79 ++++++++++++++++++++++++++++++--
 1 file changed, 74 insertions(+), 5 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 68bacb6fc5..0188cd9ad6 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -49,6 +49,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include <sys/file.h>
+#include <sys/syscall.h>
 #include <sys/xattr.h>
 #include <unistd.h>
 
@@ -83,6 +84,11 @@ struct lo_inode {
     uint64_t refcount; /* protected by lo->mutex */
 };
 
+struct lo_cred {
+    uid_t euid;
+    gid_t egid;
+};
+
 enum {
     CACHE_NEVER,
     CACHE_NORMAL,
@@ -383,6 +389,52 @@ static void lo_lookup(fuse_req_t req, fuse_ino_t parent, const char *name)
     }
 }
 
+/*
+ * Change to uid/gid of caller so that file is created with
+ * ownership of caller.
+ * TODO: What about selinux context?
+ */
+static int lo_change_cred(fuse_req_t req, struct lo_cred *old)
+{
+    int res;
+
+    old->euid = geteuid();
+    old->egid = getegid();
+
+    res = syscall(SYS_setresgid, -1, fuse_req_ctx(req)->gid, -1);
+    if (res == -1) {
+        return errno;
+    }
+
+    res = syscall(SYS_setresuid, -1, fuse_req_ctx(req)->uid, -1);
+    if (res == -1) {
+        int errno_save = errno;
+
+        syscall(SYS_setresgid, -1, old->egid, -1);
+        return errno_save;
+    }
+
+    return 0;
+}
+
+/* Regain Privileges */
+static void lo_restore_cred(struct lo_cred *old)
+{
+    int res;
+
+    res = syscall(SYS_setresuid, -1, old->euid, -1);
+    if (res == -1) {
+        fuse_log(FUSE_LOG_ERR, "seteuid(%u): %m\n", old->euid);
+        exit(1);
+    }
+
+    res = syscall(SYS_setresgid, -1, old->egid, -1);
+    if (res == -1) {
+        fuse_log(FUSE_LOG_ERR, "setegid(%u): %m\n", old->egid);
+        exit(1);
+    }
+}
+
 static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
                              const char *name, mode_t mode, dev_t rdev,
                              const char *link)
@@ -391,12 +443,21 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
     int saverr;
     struct lo_inode *dir = lo_inode(req, parent);
     struct fuse_entry_param e;
+    struct lo_cred old = {};
 
     saverr = ENOMEM;
 
+    saverr = lo_change_cred(req, &old);
+    if (saverr) {
+        goto out;
+    }
+
     res = mknod_wrapper(dir->fd, name, link, mode, rdev);
 
     saverr = errno;
+
+    lo_restore_cred(&old);
+
     if (res == -1) {
         goto out;
     }
@@ -794,26 +855,34 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
     struct lo_data *lo = lo_data(req);
     struct fuse_entry_param e;
     int err;
+    struct lo_cred old = {};
 
     if (lo_debug(req)) {
         fuse_log(FUSE_LOG_DEBUG, "lo_create(parent=%" PRIu64 ", name=%s)\n",
                  parent, name);
     }
 
+    err = lo_change_cred(req, &old);
+    if (err) {
+        goto out;
+    }
+
     fd = openat(lo_fd(req, parent), name, (fi->flags | O_CREAT) & ~O_NOFOLLOW,
                 mode);
-    if (fd == -1) {
-        return (void)fuse_reply_err(req, errno);
-    }
+    err = fd == -1 ? errno : 0;
+    lo_restore_cred(&old);
 
-    fi->fh = fd;
+    if (!err) {
+        fi->fh = fd;
+        err = lo_do_lookup(req, parent, name, &e);
+    }
     if (lo->cache == CACHE_NEVER) {
         fi->direct_io = 1;
     } else if (lo->cache == CACHE_ALWAYS) {
         fi->keep_cache = 1;
     }
 
-    err = lo_do_lookup(req, parent, name, &e);
+out:
     if (err) {
         fuse_reply_err(req, err);
     } else {
-- 
2.23.0




* [PATCH 033/104] virtiofsd: passthrough_ll: add lo_map for ino/fh indirection
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (31 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 032/104] virtiofsd: passthrough_ll: create new files in caller's context Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-17 21:44   ` Masayoshi Mizuma
  2019-12-12 16:37 ` [PATCH 034/104] virtiofsd: passthrough_ll: add ino_map to hide lo_inode pointers Dr. David Alan Gilbert (git)
                   ` (73 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

A layer of indirection is needed because passthrough_ll cannot expose
pointers or file descriptor numbers to untrusted clients.  Malicious
clients could send invalid pointers or file descriptors in order to
crash or exploit the file system daemon.

lo_map provides an integer key->value mapping.  This will be used for
ino and fh fields in the patches that follow.
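
To illustrate the approach, here is a simplified, self-contained sketch
of an integer-keyed map with an embedded free list.  The idmap_* names
and the void * value type are assumptions made for this example; the
lo_map code in this patch differs in detail (for instance it grows by
256 elements and also supports reserving a specific key).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>

struct idmap_elem {
    union {
        void *value;       /* valid while in_use is true */
        ssize_t next_free; /* valid while in_use is false */
    };
    bool in_use;
};

struct idmap {
    struct idmap_elem *elems;
    size_t nelems;
    ssize_t freelist; /* index of first free element, or -1 */
};

static void idmap_init(struct idmap *m)
{
    m->elems = NULL;
    m->nelems = 0;
    m->freelist = -1;
}

/* Store 'value' and return its key, or -1 if out of memory */
static ssize_t idmap_add(struct idmap *m, void *value)
{
    ssize_t key;

    assert(m);
    if (m->freelist == -1) {
        /* No free slot: grow the array and chain the new slots together */
        size_t i, new_n = m->nelems + 4;
        struct idmap_elem *e = realloc(m->elems, new_n * sizeof(*e));

        if (!e) {
            return -1;
        }
        for (i = m->nelems; i < new_n; i++) {
            e[i].next_free = i + 1;
            e[i].in_use = false;
        }
        e[new_n - 1].next_free = -1;
        m->elems = e;
        m->freelist = m->nelems;
        m->nelems = new_n;
    }

    key = m->freelist;
    m->freelist = m->elems[key].next_free;
    m->elems[key].value = value;
    m->elems[key].in_use = true;
    return key;
}

/* Look up a client-supplied key; NULL if out of range or not in use */
static void *idmap_get(struct idmap *m, size_t key)
{
    if (key >= m->nelems || !m->elems[key].in_use) {
        return NULL;
    }
    return m->elems[key].value;
}

/* Release a key so that it can be handed out again */
static void idmap_remove(struct idmap *m, size_t key)
{
    if (key >= m->nelems || !m->elems[key].in_use) {
        return;
    }
    m->elems[key].in_use = false;
    m->elems[key].next_free = m->freelist;
    m->freelist = key;
}
```

Because every lookup is bounds-checked and gated on in_use, a client
that sends an arbitrary key gets NULL back rather than causing the
daemon to dereference an attacker-chosen pointer.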

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 124 +++++++++++++++++++++++++++++++
 1 file changed, 124 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 0188cd9ad6..0a94c3e1f2 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -74,6 +74,21 @@ struct _uintptr_to_must_hold_fuse_ino_t_dummy_struct {
 };
 #endif
 
+struct lo_map_elem {
+    union {
+        /* Element values will go here... */
+        ssize_t freelist;
+    };
+    bool in_use;
+};
+
+/* Maps FUSE fh or ino values to internal objects */
+struct lo_map {
+    struct lo_map_elem *elems;
+    size_t nelems;
+    ssize_t freelist;
+};
+
 struct lo_inode {
     struct lo_inode *next; /* protected by lo->mutex */
     struct lo_inode *prev; /* protected by lo->mutex */
@@ -130,6 +145,115 @@ static struct lo_data *lo_data(fuse_req_t req)
     return (struct lo_data *)fuse_req_userdata(req);
 }
 
+__attribute__((unused)) static void lo_map_init(struct lo_map *map)
+{
+    map->elems = NULL;
+    map->nelems = 0;
+    map->freelist = -1;
+}
+
+__attribute__((unused)) static void lo_map_destroy(struct lo_map *map)
+{
+    free(map->elems);
+}
+
+static int lo_map_grow(struct lo_map *map, size_t new_nelems)
+{
+    struct lo_map_elem *new_elems;
+    size_t i;
+
+    if (new_nelems <= map->nelems) {
+        return 1;
+    }
+
+    new_elems = realloc(map->elems, sizeof(map->elems[0]) * new_nelems);
+    if (!new_elems) {
+        return 0;
+    }
+
+    for (i = map->nelems; i < new_nelems; i++) {
+        new_elems[i].freelist = i + 1;
+        new_elems[i].in_use = false;
+    }
+    new_elems[new_nelems - 1].freelist = -1;
+
+    map->elems = new_elems;
+    map->freelist = map->nelems;
+    map->nelems = new_nelems;
+    return 1;
+}
+
+__attribute__((unused)) static struct lo_map_elem *
+lo_map_alloc_elem(struct lo_map *map)
+{
+    struct lo_map_elem *elem;
+
+    if (map->freelist == -1 && !lo_map_grow(map, map->nelems + 256)) {
+        return NULL;
+    }
+
+    elem = &map->elems[map->freelist];
+    map->freelist = elem->freelist;
+
+    elem->in_use = true;
+
+    return elem;
+}
+
+__attribute__((unused)) static struct lo_map_elem *
+lo_map_reserve(struct lo_map *map, size_t key)
+{
+    ssize_t *prev;
+
+    if (!lo_map_grow(map, key + 1)) {
+        return NULL;
+    }
+
+    for (prev = &map->freelist; *prev != -1;
+         prev = &map->elems[*prev].freelist) {
+        if (*prev == key) {
+            struct lo_map_elem *elem = &map->elems[key];
+
+            *prev = elem->freelist;
+            elem->in_use = true;
+            return elem;
+        }
+    }
+    return NULL;
+}
+
+__attribute__((unused)) static struct lo_map_elem *
+lo_map_get(struct lo_map *map, size_t key)
+{
+    if (key >= map->nelems) {
+        return NULL;
+    }
+    if (!map->elems[key].in_use) {
+        return NULL;
+    }
+    return &map->elems[key];
+}
+
+__attribute__((unused)) static void lo_map_remove(struct lo_map *map,
+                                                  size_t key)
+{
+    struct lo_map_elem *elem;
+
+    if (key >= map->nelems) {
+        return;
+    }
+
+    elem = &map->elems[key];
+    if (!elem->in_use) {
+        return;
+    }
+
+    elem->in_use = false;
+
+    elem->freelist = map->freelist;
+    map->freelist = key;
+}
+
 static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
 {
     if (ino == FUSE_ROOT_ID) {
-- 
2.23.0




* [PATCH 034/104] virtiofsd: passthrough_ll: add ino_map to hide lo_inode pointers
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (32 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 033/104] virtiofsd: passthrough_ll: add lo_map for ino/fh indirection Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-17 21:45   ` Masayoshi Mizuma
  2019-12-12 16:37 ` [PATCH 035/104] virtiofsd: passthrough_ll: add dirp_map to hide lo_dirp pointers Dr. David Alan Gilbert (git)
                   ` (72 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

Do not expose lo_inode pointers to clients.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
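For readers following the indirection: the sketch below is a standalone
illustration of handing out small integer keys instead of pointers. It is
not the code in this patch. The names demo_map, demo_map_add, demo_map_get
and demo_map_remove are made up for the example, the growth step of 4 slots
is arbitrary, and locking is left out (the patch takes lo->mutex around the
real map).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h> /* ssize_t */

/* Hypothetical, simplified stand-in for the map used in this series. */
struct demo_elem {
    union {
        void *value;       /* payload while the slot is in use */
        ssize_t next_free; /* freelist link while the slot is free */
    };
    bool in_use;
};

struct demo_map {
    struct demo_elem *elems;
    size_t nelems;
    ssize_t freelist; /* index of the first free slot, or -1 */
};

static void demo_map_init(struct demo_map *m)
{
    m->elems = NULL;
    m->nelems = 0;
    m->freelist = -1;
}

/* Grow to new_nelems slots and push the new slots onto the freelist. */
static int demo_map_grow(struct demo_map *m, size_t new_nelems)
{
    struct demo_elem *e;
    size_t i;

    if (new_nelems <= m->nelems) {
        return 1;
    }
    e = realloc(m->elems, sizeof(*e) * new_nelems);
    if (!e) {
        return 0;
    }
    for (i = m->nelems; i < new_nelems; i++) {
        e[i].next_free = (ssize_t)i + 1;
        e[i].in_use = false;
    }
    /* Chain the last new slot to whatever was free before. */
    e[new_nelems - 1].next_free = m->freelist;
    m->elems = e;
    m->freelist = (ssize_t)m->nelems;
    m->nelems = new_nelems;
    return 1;
}

/* Store value and return its key, or -1 if memory runs out. */
static ssize_t demo_map_add(struct demo_map *m, void *value)
{
    struct demo_elem *e;
    ssize_t key;

    if (m->freelist == -1 && !demo_map_grow(m, m->nelems + 4)) {
        return -1;
    }
    key = m->freelist;
    assert(key >= 0 && (size_t)key < m->nelems);
    e = &m->elems[key];
    m->freelist = e->next_free;
    e->value = value;
    e->in_use = true;
    return key;
}

/* An out-of-range or stale key yields NULL instead of a wild pointer. */
static void *demo_map_get(const struct demo_map *m, size_t key)
{
    if (key >= m->nelems || !m->elems[key].in_use) {
        return NULL;
    }
    return m->elems[key].value;
}

/* Release a slot; unknown or already-free keys are ignored. */
static void demo_map_remove(struct demo_map *m, size_t key)
{
    if (key >= m->nelems || !m->elems[key].in_use) {
        return;
    }
    m->elems[key].in_use = false;
    m->elems[key].next_free = m->freelist;
    m->freelist = (ssize_t)key;
}
```

The point of the design is that a client-supplied number is only ever used
as an array index that gets bounds-checked and in_use-checked, so a bogus
value results in a NULL lookup (and an error reply) rather than a
dereference of an attacker-chosen address.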
 tools/virtiofsd/passthrough_ll.c | 144 ++++++++++++++++++++++++-------
 1 file changed, 114 insertions(+), 30 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 0a94c3e1f2..fd1d88bddf 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -57,8 +57,8 @@
 
 #define HAVE_POSIX_FALLOCATE 1
 /*
- * We are re-using pointers to our `struct lo_inode` and `struct
- * lo_dirp` elements as inodes. This means that we must be able to
+ * We are re-using pointers to our `struct lo_inode`
+ * elements as inodes. This means that we must be able to
  * store uintptr_t values in a fuse_ino_t variable. The following
  * incantation checks this condition at compile time.
  */
@@ -76,7 +76,7 @@ struct _uintptr_to_must_hold_fuse_ino_t_dummy_struct {
 
 struct lo_map_elem {
     union {
-        /* Element values will go here... */
+        struct lo_inode *inode;
         ssize_t freelist;
     };
     bool in_use;
@@ -97,6 +97,7 @@ struct lo_inode {
     ino_t ino;
     dev_t dev;
     uint64_t refcount; /* protected by lo->mutex */
+    fuse_ino_t fuse_ino;
 };
 
 struct lo_cred {
@@ -121,6 +122,7 @@ struct lo_data {
     int cache;
     int timeout_set;
     struct lo_inode root; /* protected by lo->mutex */
+    struct lo_map ino_map; /* protected by lo->mutex */
 };
 
 static const struct fuse_opt lo_opts[] = {
@@ -145,14 +147,14 @@ static struct lo_data *lo_data(fuse_req_t req)
     return (struct lo_data *)fuse_req_userdata(req);
 }
 
-__attribute__((unused)) static void lo_map_init(struct lo_map *map)
+static void lo_map_init(struct lo_map *map)
 {
     map->elems = NULL;
     map->nelems = 0;
     map->freelist = -1;
 }
 
-__attribute__((unused)) static void lo_map_destroy(struct lo_map *map)
+static void lo_map_destroy(struct lo_map *map)
 {
     free(map->elems);
 }
@@ -183,8 +185,7 @@ static int lo_map_grow(struct lo_map *map, size_t new_nelems)
     return 1;
 }
 
-__attribute__((unused)) static struct lo_map_elem *
-lo_map_alloc_elem(struct lo_map *map)
+static struct lo_map_elem *lo_map_alloc_elem(struct lo_map *map)
 {
     struct lo_map_elem *elem;
 
@@ -200,8 +201,7 @@ lo_map_alloc_elem(struct lo_map *map)
     return elem;
 }
 
-__attribute__((unused)) static struct lo_map_elem *
-lo_map_reserve(struct lo_map *map, size_t key)
+static struct lo_map_elem *lo_map_reserve(struct lo_map *map, size_t key)
 {
     ssize_t *prev;
 
@@ -222,8 +222,7 @@ lo_map_reserve(struct lo_map *map, size_t key)
     return NULL;
 }
 
-__attribute__((unused)) static struct lo_map_elem *
-lo_map_get(struct lo_map *map, size_t key)
+static struct lo_map_elem *lo_map_get(struct lo_map *map, size_t key)
 {
     if (key >= map->nelems) {
         return NULL;
@@ -234,8 +233,7 @@ lo_map_get(struct lo_map *map, size_t key)
     return &map->elems[key];
 }
 
-__attribute__((unused)) static void lo_map_remove(struct lo_map *map,
-                                                  size_t key)
+static void lo_map_remove(struct lo_map *map, size_t key)
 {
     struct lo_map_elem *elem;
 
@@ -254,18 +252,40 @@ __attribute__((unused)) static void lo_map_remove(struct lo_map *map,
     map->freelist = key;
 }
 
+/* Assumes lo->mutex is held */
+static ssize_t lo_add_inode_mapping(fuse_req_t req, struct lo_inode *inode)
+{
+    struct lo_map_elem *elem;
+
+    elem = lo_map_alloc_elem(&lo_data(req)->ino_map);
+    if (!elem) {
+        return -1;
+    }
+
+    elem->inode = inode;
+    return elem - lo_data(req)->ino_map.elems;
+}
+
 static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
 {
-    if (ino == FUSE_ROOT_ID) {
-        return &lo_data(req)->root;
-    } else {
-        return (struct lo_inode *)(uintptr_t)ino;
+    struct lo_data *lo = lo_data(req);
+    struct lo_map_elem *elem;
+
+    pthread_mutex_lock(&lo->mutex);
+    elem = lo_map_get(&lo->ino_map, ino);
+    pthread_mutex_unlock(&lo->mutex);
+
+    if (!elem) {
+        return NULL;
     }
+
+    return elem->inode;
 }
 
 static int lo_fd(fuse_req_t req, fuse_ino_t ino)
 {
-    return lo_inode(req, ino)->fd;
+    struct lo_inode *inode = lo_inode(req, ino);
+    return inode ? inode->fd : -1;
 }
 
 static bool lo_debug(fuse_req_t req)
@@ -337,10 +357,18 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
 {
     int saverr;
     char procname[64];
-    struct lo_inode *inode = lo_inode(req, ino);
-    int ifd = inode->fd;
+    struct lo_inode *inode;
+    int ifd;
     int res;
 
+    inode = lo_inode(req, ino);
+    if (!inode) {
+        fuse_reply_err(req, EBADF);
+        return;
+    }
+
+    ifd = inode->fd;
+
     if (valid & FUSE_SET_ATTR_MODE) {
         if (fi) {
             res = fchmod(fi->fh, attr->st_mode);
@@ -470,6 +498,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         inode->dev = e->attr.st_dev;
 
         pthread_mutex_lock(&lo->mutex);
+        inode->fuse_ino = lo_add_inode_mapping(req, inode);
         prev = &lo->root;
         next = prev->next;
         next->prev = inode;
@@ -478,7 +507,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         prev->next = inode;
         pthread_mutex_unlock(&lo->mutex);
     }
-    e->ino = (uintptr_t)inode;
+    e->ino = inode->fuse_ino;
 
     if (lo_debug(req)) {
         fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n",
@@ -565,10 +594,16 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
 {
     int res;
     int saverr;
-    struct lo_inode *dir = lo_inode(req, parent);
+    struct lo_inode *dir;
     struct fuse_entry_param e;
     struct lo_cred old = {};
 
+    dir = lo_inode(req, parent);
+    if (!dir) {
+        fuse_reply_err(req, EBADF);
+        return;
+    }
+
     saverr = ENOMEM;
 
     saverr = lo_change_cred(req, &old);
@@ -646,10 +681,16 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
 {
     int res;
     struct lo_data *lo = lo_data(req);
-    struct lo_inode *inode = lo_inode(req, ino);
+    struct lo_inode *inode;
     struct fuse_entry_param e;
     int saverr;
 
+    inode = lo_inode(req, ino);
+    if (!inode) {
+        fuse_reply_err(req, EBADF);
+        return;
+    }
+
     memset(&e, 0, sizeof(struct fuse_entry_param));
     e.attr_timeout = lo->timeout;
     e.entry_timeout = lo->timeout;
@@ -667,7 +708,7 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
     pthread_mutex_lock(&lo->mutex);
     inode->refcount++;
     pthread_mutex_unlock(&lo->mutex);
-    e.ino = (uintptr_t)inode;
+    e.ino = inode->fuse_ino;
 
     if (lo_debug(req)) {
         fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n",
@@ -733,10 +774,10 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
         next->prev = prev;
         prev->next = next;
 
+        lo_map_remove(&lo->ino_map, inode->fuse_ino);
         pthread_mutex_unlock(&lo->mutex);
         close(inode->fd);
         free(inode);
-
     } else {
         pthread_mutex_unlock(&lo->mutex);
     }
@@ -745,7 +786,12 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
 static void lo_forget_one(fuse_req_t req, fuse_ino_t ino, uint64_t nlookup)
 {
     struct lo_data *lo = lo_data(req);
-    struct lo_inode *inode = lo_inode(req, ino);
+    struct lo_inode *inode;
+
+    inode = lo_inode(req, ino);
+    if (!inode) {
+        return;
+    }
 
     if (lo_debug(req)) {
         fuse_log(FUSE_LOG_DEBUG, "  forget %lli %lli -%lli\n",
@@ -1227,10 +1273,16 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
 {
     char *value = NULL;
     char procname[64];
-    struct lo_inode *inode = lo_inode(req, ino);
+    struct lo_inode *inode;
     ssize_t ret;
     int saverr;
 
+    inode = lo_inode(req, ino);
+    if (!inode) {
+        fuse_reply_err(req, EBADF);
+        return;
+    }
+
     saverr = ENOSYS;
     if (!lo_data(req)->xattr) {
         goto out;
@@ -1289,10 +1341,16 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
 {
     char *value = NULL;
     char procname[64];
-    struct lo_inode *inode = lo_inode(req, ino);
+    struct lo_inode *inode;
     ssize_t ret;
     int saverr;
 
+    inode = lo_inode(req, ino);
+    if (!inode) {
+        fuse_reply_err(req, EBADF);
+        return;
+    }
+
     saverr = ENOSYS;
     if (!lo_data(req)->xattr) {
         goto out;
@@ -1350,10 +1408,16 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
                         const char *value, size_t size, int flags)
 {
     char procname[64];
-    struct lo_inode *inode = lo_inode(req, ino);
+    struct lo_inode *inode;
     ssize_t ret;
     int saverr;
 
+    inode = lo_inode(req, ino);
+    if (!inode) {
+        fuse_reply_err(req, EBADF);
+        return;
+    }
+
     saverr = ENOSYS;
     if (!lo_data(req)->xattr) {
         goto out;
@@ -1383,10 +1447,16 @@ out:
 static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *name)
 {
     char procname[64];
-    struct lo_inode *inode = lo_inode(req, ino);
+    struct lo_inode *inode;
     ssize_t ret;
     int saverr;
 
+    inode = lo_inode(req, ino);
+    if (!inode) {
+        fuse_reply_err(req, EBADF);
+        return;
+    }
+
     saverr = ENOSYS;
     if (!lo_data(req)->xattr) {
         goto out;
@@ -1505,6 +1575,7 @@ int main(int argc, char *argv[])
     struct fuse_session *se;
     struct fuse_cmdline_opts opts;
     struct lo_data lo = { .debug = 0, .writeback = 0 };
+    struct lo_map_elem *root_elem;
     int ret = -1;
 
     /* Don't mask creation mode, kernel already did that */
@@ -1513,8 +1584,19 @@ int main(int argc, char *argv[])
     pthread_mutex_init(&lo.mutex, NULL);
     lo.root.next = lo.root.prev = &lo.root;
     lo.root.fd = -1;
+    lo.root.fuse_ino = FUSE_ROOT_ID;
     lo.cache = CACHE_NORMAL;
 
+    /*
+     * Set up the ino map like this:
+     * [0] Reserved (will not be used)
+     * [1] Root inode
+     */
+    lo_map_init(&lo.ino_map);
+    lo_map_reserve(&lo.ino_map, 0)->in_use = false;
+    root_elem = lo_map_reserve(&lo.ino_map, lo.root.fuse_ino);
+    root_elem->inode = &lo.root;
+
     if (fuse_parse_cmdline(&args, &opts) != 0) {
         return 1;
     }
@@ -1611,6 +1693,8 @@ err_out2:
 err_out1:
     fuse_opt_free_args(&args);
 
+    lo_map_destroy(&lo.ino_map);
+
     if (lo.root.fd >= 0) {
         close(lo.root.fd);
     }
-- 
2.23.0




* [PATCH 035/104] virtiofsd: passthrough_ll: add dirp_map to hide lo_dirp pointers
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (33 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 034/104] virtiofsd: passthrough_ll: add ino_map to hide lo_inode pointers Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-17 13:58   ` Philippe Mathieu-Daudé
  2019-12-12 16:37 ` [PATCH 036/104] virtiofsd: passthrough_ll: add fd_map to hide file descriptors Dr. David Alan Gilbert (git)
                   ` (71 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

Do not expose lo_dirp pointers to clients.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 103 +++++++++++++++++++++++--------
 1 file changed, 76 insertions(+), 27 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index fd1d88bddf..face8910b0 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -56,27 +56,10 @@
 #include "passthrough_helpers.h"
 
 #define HAVE_POSIX_FALLOCATE 1
-/*
- * We are re-using pointers to our `struct lo_inode`
- * elements as inodes. This means that we must be able to
- * store uintptr_t values in a fuse_ino_t variable. The following
- * incantation checks this condition at compile time.
- */
-#if defined(__GNUC__) &&                                      \
-    (__GNUC__ > 4 || __GNUC__ == 4 && __GNUC_MINOR__ >= 6) && \
-    !defined __cplusplus
-_Static_assert(sizeof(fuse_ino_t) >= sizeof(uintptr_t),
-               "fuse_ino_t too small to hold uintptr_t values!");
-#else
-struct _uintptr_to_must_hold_fuse_ino_t_dummy_struct {
-    unsigned _uintptr_to_must_hold_fuse_ino_t
-        : ((sizeof(fuse_ino_t) >= sizeof(uintptr_t)) ? 1 : -1);
-};
-#endif
-
 struct lo_map_elem {
     union {
         struct lo_inode *inode;
+        struct lo_dirp *dirp;
         ssize_t freelist;
     };
     bool in_use;
@@ -123,6 +106,7 @@ struct lo_data {
     int timeout_set;
     struct lo_inode root; /* protected by lo->mutex */
     struct lo_map ino_map; /* protected by lo->mutex */
+    struct lo_map dirp_map; /* protected by lo->mutex */
 };
 
 static const struct fuse_opt lo_opts[] = {
@@ -252,6 +236,20 @@ static void lo_map_remove(struct lo_map *map, size_t key)
     map->freelist = key;
 }
 
+/* Assumes lo->mutex is held */
+static ssize_t lo_add_dirp_mapping(fuse_req_t req, struct lo_dirp *dirp)
+{
+    struct lo_map_elem *elem;
+
+    elem = lo_map_alloc_elem(&lo_data(req)->dirp_map);
+    if (!elem) {
+        return -1;
+    }
+
+    elem->dirp = dirp;
+    return elem - lo_data(req)->dirp_map.elems;
+}
+
 /* Assumes lo->mutex is held */
 static ssize_t lo_add_inode_mapping(fuse_req_t req, struct lo_inode *inode)
 {
@@ -844,9 +842,19 @@ struct lo_dirp {
     off_t offset;
 };
 
-static struct lo_dirp *lo_dirp(struct fuse_file_info *fi)
+static struct lo_dirp *lo_dirp(fuse_req_t req, struct fuse_file_info *fi)
 {
-    return (struct lo_dirp *)(uintptr_t)fi->fh;
+    struct lo_data *lo = lo_data(req);
+    struct lo_map_elem *elem;
+
+    pthread_mutex_lock(&lo->mutex);
+    elem = lo_map_get(&lo->dirp_map, fi->fh);
+    pthread_mutex_unlock(&lo->mutex);
+    if (!elem) {
+        return NULL;
+    }
+
+    return elem->dirp;
 }
 
 static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
@@ -856,6 +864,7 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
     struct lo_data *lo = lo_data(req);
     struct lo_dirp *d;
     int fd;
+    ssize_t fh;
 
     d = calloc(1, sizeof(struct lo_dirp));
     if (d == NULL) {
@@ -875,7 +884,14 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
     d->offset = 0;
     d->entry = NULL;
 
-    fi->fh = (uintptr_t)d;
+    pthread_mutex_lock(&lo->mutex);
+    fh = lo_add_dirp_mapping(req, d);
+    pthread_mutex_unlock(&lo->mutex);
+    if (fh == -1) {
+        goto out_err;
+    }
+
+    fi->fh = fh;
     if (lo->cache == CACHE_ALWAYS) {
         fi->keep_cache = 1;
     }
@@ -886,6 +902,9 @@ out_errno:
     error = errno;
 out_err:
     if (d) {
+        if (d->dp) {
+            closedir(d->dp);
+        }
         if (fd != -1) {
             close(fd);
         }
@@ -903,17 +922,21 @@ static int is_dot_or_dotdot(const char *name)
 static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
                           off_t offset, struct fuse_file_info *fi, int plus)
 {
-    struct lo_dirp *d = lo_dirp(fi);
-    char *buf;
+    struct lo_dirp *d;
+    char *buf = NULL;
     char *p;
     size_t rem = size;
-    int err;
+    int err = ENOMEM;
 
     (void)ino;
 
+    d = lo_dirp(req, fi);
+    if (!d) {
+        goto error;
+    }
+
     buf = calloc(1, size);
     if (!buf) {
-        err = ENOMEM;
         goto error;
     }
     p = buf;
@@ -1011,8 +1034,21 @@ static void lo_readdirplus(fuse_req_t req, fuse_ino_t ino, size_t size,
 static void lo_releasedir(fuse_req_t req, fuse_ino_t ino,
                           struct fuse_file_info *fi)
 {
-    struct lo_dirp *d = lo_dirp(fi);
+    struct lo_data *lo = lo_data(req);
+    struct lo_dirp *d;
+
     (void)ino;
+
+    d = lo_dirp(req, fi);
+    if (!d) {
+        fuse_reply_err(req, EBADF);
+        return;
+    }
+
+    pthread_mutex_lock(&lo->mutex);
+    lo_map_remove(&lo->dirp_map, fi->fh);
+    pthread_mutex_unlock(&lo->mutex);
+
     closedir(d->dp);
     free(d);
     fuse_reply_err(req, 0);
@@ -1064,8 +1100,18 @@ static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,
                         struct fuse_file_info *fi)
 {
     int res;
-    int fd = dirfd(lo_dirp(fi)->dp);
+    struct lo_dirp *d;
+    int fd;
+
     (void)ino;
+
+    d = lo_dirp(req, fi);
+    if (!d) {
+        fuse_reply_err(req, EBADF);
+        return;
+    }
+
+    fd = dirfd(d->dp);
     if (datasync) {
         res = fdatasync(fd);
     } else {
@@ -1597,6 +1643,8 @@ int main(int argc, char *argv[])
     root_elem = lo_map_reserve(&lo.ino_map, lo.root.fuse_ino);
     root_elem->inode = &lo.root;
 
+    lo_map_init(&lo.dirp_map);
+
     if (fuse_parse_cmdline(&args, &opts) != 0) {
         return 1;
     }
@@ -1693,6 +1741,7 @@ err_out2:
 err_out1:
     fuse_opt_free_args(&args);
 
+    lo_map_destroy(&lo.dirp_map);
     lo_map_destroy(&lo.ino_map);
 
     if (lo.root.fd >= 0) {
-- 
2.23.0




* [PATCH 036/104] virtiofsd: passthrough_ll: add fd_map to hide file descriptors
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (34 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 035/104] virtiofsd: passthrough_ll: add dirp_map to hide lo_dirp pointers Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-17 22:32   ` Masayoshi Mizuma
  2019-12-12 16:37 ` [PATCH 037/104] virtiofsd: passthrough_ll: add fallback for racy ops Dr. David Alan Gilbert (git)
                   ` (70 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

Do not expose file descriptor numbers to clients.  This prevents the
abuse of internal file descriptors (like stdin/stdout).

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
dgilbert:
  Added lseek
---
 tools/virtiofsd/passthrough_ll.c | 114 +++++++++++++++++++++++++------
 1 file changed, 93 insertions(+), 21 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index face8910b0..93e74cce21 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -60,6 +60,7 @@ struct lo_map_elem {
     union {
         struct lo_inode *inode;
         struct lo_dirp *dirp;
+        int fd;
         ssize_t freelist;
     };
     bool in_use;
@@ -107,6 +108,7 @@ struct lo_data {
     struct lo_inode root; /* protected by lo->mutex */
     struct lo_map ino_map; /* protected by lo->mutex */
     struct lo_map dirp_map; /* protected by lo->mutex */
+    struct lo_map fd_map; /* protected by lo->mutex */
 };
 
 static const struct fuse_opt lo_opts[] = {
@@ -236,6 +238,20 @@ static void lo_map_remove(struct lo_map *map, size_t key)
     map->freelist = key;
 }
 
+/* Assumes lo->mutex is held */
+static ssize_t lo_add_fd_mapping(fuse_req_t req, int fd)
+{
+    struct lo_map_elem *elem;
+
+    elem = lo_map_alloc_elem(&lo_data(req)->fd_map);
+    if (!elem) {
+        return -1;
+    }
+
+    elem->fd = fd;
+    return elem - lo_data(req)->fd_map.elems;
+}
+
 /* Assumes lo->mutex is held */
 static ssize_t lo_add_dirp_mapping(fuse_req_t req, struct lo_dirp *dirp)
 {
@@ -350,6 +366,22 @@ static int utimensat_empty_nofollow(struct lo_inode *inode,
     return utimensat(AT_FDCWD, procname, tv, 0);
 }
 
+static int lo_fi_fd(fuse_req_t req, struct fuse_file_info *fi)
+{
+    struct lo_data *lo = lo_data(req);
+    struct lo_map_elem *elem;
+
+    pthread_mutex_lock(&lo->mutex);
+    elem = lo_map_get(&lo->fd_map, fi->fh);
+    pthread_mutex_unlock(&lo->mutex);
+
+    if (!elem) {
+        return -1;
+    }
+
+    return elem->fd;
+}
+
 static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
                        int valid, struct fuse_file_info *fi)
 {
@@ -358,6 +390,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
     struct lo_inode *inode;
     int ifd;
     int res;
+    int fd;
 
     inode = lo_inode(req, ino);
     if (!inode) {
@@ -367,9 +400,14 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
 
     ifd = inode->fd;
 
+    /* If fi->fh is invalid we'll report EBADF later */
+    if (fi) {
+        fd = lo_fi_fd(req, fi);
+    }
+
     if (valid & FUSE_SET_ATTR_MODE) {
         if (fi) {
-            res = fchmod(fi->fh, attr->st_mode);
+            res = fchmod(fd, attr->st_mode);
         } else {
             sprintf(procname, "/proc/self/fd/%i", ifd);
             res = chmod(procname, attr->st_mode);
@@ -389,7 +427,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
     }
     if (valid & FUSE_SET_ATTR_SIZE) {
         if (fi) {
-            res = ftruncate(fi->fh, attr->st_size);
+            res = ftruncate(fd, attr->st_size);
         } else {
             sprintf(procname, "/proc/self/fd/%i", ifd);
             res = truncate(procname, attr->st_size);
@@ -419,7 +457,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         }
 
         if (fi) {
-            res = futimens(fi->fh, tv);
+            res = futimens(fd, tv);
         } else {
             res = utimensat_empty_nofollow(inode, tv);
         }
@@ -1079,7 +1117,18 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
     lo_restore_cred(&old);
 
     if (!err) {
-        fi->fh = fd;
+        ssize_t fh;
+
+        pthread_mutex_lock(&lo->mutex);
+        fh = lo_add_fd_mapping(req, fd);
+        pthread_mutex_unlock(&lo->mutex);
+        if (fh == -1) {
+            close(fd);
+            fuse_reply_err(req, ENOMEM);
+            return;
+        }
+
+        fi->fh = fh;
         err = lo_do_lookup(req, parent, name, &e);
     }
     if (lo->cache == CACHE_NEVER) {
@@ -1123,6 +1172,7 @@ static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,
 static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 {
     int fd;
+    ssize_t fh;
     char buf[64];
     struct lo_data *lo = lo_data(req);
 
@@ -1158,7 +1208,16 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
         return (void)fuse_reply_err(req, errno);
     }
 
-    fi->fh = fd;
+    pthread_mutex_lock(&lo->mutex);
+    fh = lo_add_fd_mapping(req, fd);
+    pthread_mutex_unlock(&lo->mutex);
+    if (fh == -1) {
+        close(fd);
+        fuse_reply_err(req, ENOMEM);
+        return;
+    }
+
+    fi->fh = fh;
     if (lo->cache == CACHE_NEVER) {
         fi->direct_io = 1;
     } else if (lo->cache == CACHE_ALWAYS) {
@@ -1170,9 +1229,18 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 static void lo_release(fuse_req_t req, fuse_ino_t ino,
                        struct fuse_file_info *fi)
 {
+    struct lo_data *lo = lo_data(req);
+    int fd;
+
     (void)ino;
 
-    close(fi->fh);
+    fd = lo_fi_fd(req, fi);
+
+    pthread_mutex_lock(&lo->mutex);
+    lo_map_remove(&lo->fd_map, fi->fh);
+    pthread_mutex_unlock(&lo->mutex);
+
+    close(fd);
     fuse_reply_err(req, 0);
 }
 
@@ -1180,7 +1248,7 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 {
     int res;
     (void)ino;
-    res = close(dup(fi->fh));
+    res = close(dup(lo_fi_fd(req, fi)));
     fuse_reply_err(req, res == -1 ? errno : 0);
 }
 
@@ -1207,7 +1275,7 @@ static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
             return (void)fuse_reply_err(req, errno);
         }
     } else {
-        fd = fi->fh;
+        fd = lo_fi_fd(req, fi);
     }
 
     if (datasync) {
@@ -1234,7 +1302,7 @@ static void lo_read(fuse_req_t req, fuse_ino_t ino, size_t size, off_t offset,
     }
 
     buf.buf[0].flags = FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK;
-    buf.buf[0].fd = fi->fh;
+    buf.buf[0].fd = lo_fi_fd(req, fi);
     buf.buf[0].pos = offset;
 
     fuse_reply_data(req, &buf, FUSE_BUF_SPLICE_MOVE);
@@ -1249,7 +1317,7 @@ static void lo_write_buf(fuse_req_t req, fuse_ino_t ino,
     struct fuse_bufvec out_buf = FUSE_BUFVEC_INIT(fuse_buf_size(in_buf));
 
     out_buf.buf[0].flags = FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK;
-    out_buf.buf[0].fd = fi->fh;
+    out_buf.buf[0].fd = lo_fi_fd(req, fi);
     out_buf.buf[0].pos = off;
 
     if (lo_debug(req)) {
@@ -1297,7 +1365,7 @@ static void lo_fallocate(fuse_req_t req, fuse_ino_t ino, int mode, off_t offset,
         return;
     }
 
-    err = posix_fallocate(fi->fh, offset, length);
+    err = posix_fallocate(lo_fi_fd(req, fi), offset, length);
 #endif
 
     fuse_reply_err(req, err);
@@ -1309,7 +1377,7 @@ static void lo_flock(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
     int res;
     (void)ino;
 
-    res = flock(fi->fh, op);
+    res = flock(lo_fi_fd(req, fi), op);
 
     fuse_reply_err(req, res == -1 ? errno : 0);
 }
@@ -1534,17 +1602,19 @@ static void lo_copy_file_range(fuse_req_t req, fuse_ino_t ino_in, off_t off_in,
                                off_t off_out, struct fuse_file_info *fi_out,
                                size_t len, int flags)
 {
+    int in_fd, out_fd;
     ssize_t res;
 
-    if (lo_debug(req))
-        fuse_log(FUSE_LOG_DEBUG,
-                 "lo_copy_file_range(ino=%" PRIu64 "/fd=%lu, "
-                 "off=%lu, ino=%" PRIu64 "/fd=%lu, "
-                 "off=%lu, size=%zd, flags=0x%x)\n",
-                 ino_in, fi_in->fh, off_in, ino_out, fi_out->fh, off_out, len,
-                 flags);
+    in_fd = lo_fi_fd(req, fi_in);
+    out_fd = lo_fi_fd(req, fi_out);
+
+    fuse_log(FUSE_LOG_DEBUG,
+             "lo_copy_file_range(ino=%" PRIu64 "/fd=%d, "
+             "off=%lu, ino=%" PRIu64 "/fd=%d, "
+             "off=%lu, size=%zd, flags=0x%x)\n",
+             ino_in, in_fd, off_in, ino_out, out_fd, off_out, len, flags);
 
-    res = copy_file_range(fi_in->fh, &off_in, fi_out->fh, &off_out, len, flags);
+    res = copy_file_range(in_fd, &off_in, out_fd, &off_out, len, flags);
     if (res < 0) {
         fuse_reply_err(req, -errno);
     } else {
@@ -1559,7 +1629,7 @@ static void lo_lseek(fuse_req_t req, fuse_ino_t ino, off_t off, int whence,
     off_t res;
 
     (void)ino;
-    res = lseek(fi->fh, off, whence);
+    res = lseek(lo_fi_fd(req, fi), off, whence);
     if (res != -1) {
         fuse_reply_lseek(req, res);
     } else {
@@ -1644,6 +1714,7 @@ int main(int argc, char *argv[])
     root_elem->inode = &lo.root;
 
     lo_map_init(&lo.dirp_map);
+    lo_map_init(&lo.fd_map);
 
     if (fuse_parse_cmdline(&args, &opts) != 0) {
         return 1;
@@ -1741,6 +1812,7 @@ err_out2:
 err_out1:
     fuse_opt_free_args(&args);
 
+    lo_map_destroy(&lo.fd_map);
     lo_map_destroy(&lo.dirp_map);
     lo_map_destroy(&lo.ino_map);
 
-- 
2.23.0




* [PATCH 037/104] virtiofsd: passthrough_ll: add fallback for racy ops
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (35 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 036/104] virtiofsd: passthrough_ll: add fd_map to hide file descriptors Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-18 16:22   ` Masayoshi Mizuma
  2019-12-12 16:37 ` [PATCH 038/104] virtiofsd: validate path components Dr. David Alan Gilbert (git)
                   ` (69 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Miklos Szeredi <mszeredi@redhat.com>

We have two operations that cannot be done race-free on a symlink in
certain cases: utimes and link.

Add racy fallback for these if the race-free method doesn't work.  We do
our best to avoid races even in this case:

  - get absolute path by reading /proc/self/fd/NN symlink

  - lookup parent directory: after this we are safe against renames in
    ancestors

  - lookup name in parent directory, and verify that we got to the original
    inode; if not, retry the whole thing

Both utimes(2) and link(2) hold i_lock on the inode across the operation,
so a racing rename/delete by this fuse instance is not possible; such
races can only come from other entities changing the filesystem.

If the "norace" option is given, then disable the racy fallbacks.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 159 +++++++++++++++++++++++++++----
 1 file changed, 142 insertions(+), 17 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 93e74cce21..1faae2753f 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -98,6 +98,7 @@ enum {
 struct lo_data {
     pthread_mutex_t mutex;
     int debug;
+    int norace;
     int writeback;
     int flock;
     int xattr;
@@ -124,10 +125,15 @@ static const struct fuse_opt lo_opts[] = {
     { "cache=never", offsetof(struct lo_data, cache), CACHE_NEVER },
     { "cache=auto", offsetof(struct lo_data, cache), CACHE_NORMAL },
     { "cache=always", offsetof(struct lo_data, cache), CACHE_ALWAYS },
-
+    { "norace", offsetof(struct lo_data, norace), 1 },
     FUSE_OPT_END
 };
 
+static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n);
+
+static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st);
+
+
 static struct lo_data *lo_data(fuse_req_t req)
 {
     return (struct lo_data *)fuse_req_userdata(req);
@@ -347,23 +353,127 @@ static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
     fuse_reply_attr(req, &buf, lo->timeout);
 }
 
-static int utimensat_empty_nofollow(struct lo_inode *inode,
-                                    const struct timespec *tv)
+static int lo_parent_and_name(struct lo_data *lo, struct lo_inode *inode,
+                              char path[PATH_MAX], struct lo_inode **parent)
 {
-    int res;
     char procname[64];
+    char *last;
+    struct stat stat;
+    struct lo_inode *p;
+    int retries = 2;
+    int res;
+
+retry:
+    sprintf(procname, "/proc/self/fd/%i", inode->fd);
+
+    res = readlink(procname, path, PATH_MAX);
+    if (res < 0) {
+        fuse_log(FUSE_LOG_WARNING, "lo_parent_and_name: readlink failed: %m\n");
+        goto fail_noretry;
+    }
+
+    if (res >= PATH_MAX) {
+        fuse_log(FUSE_LOG_WARNING, "lo_parent_and_name: readlink overflowed\n");
+        goto fail_noretry;
+    }
+    path[res] = '\0';
+
+    last = strrchr(path, '/');
+    if (last == NULL) {
+        /* Shouldn't happen */
+        fuse_log(
+            FUSE_LOG_WARNING,
+            "lo_parent_and_name: INTERNAL ERROR: bad path read from proc\n");
+        goto fail_noretry;
+    }
+    if (last == path) {
+        p = &lo->root;
+        pthread_mutex_lock(&lo->mutex);
+        p->refcount++;
+        pthread_mutex_unlock(&lo->mutex);
+    } else {
+        *last = '\0';
+        res = fstatat(AT_FDCWD, last == path ? "/" : path, &stat, 0);
+        if (res == -1) {
+            if (!retries) {
+                fuse_log(FUSE_LOG_WARNING,
+                         "lo_parent_and_name: failed to stat parent: %m\n");
+            }
+            goto fail;
+        }
+        p = lo_find(lo, &stat);
+        if (p == NULL) {
+            if (!retries) {
+                fuse_log(FUSE_LOG_WARNING,
+                         "lo_parent_and_name: failed to find parent\n");
+            }
+            goto fail;
+        }
+    }
+    last++;
+    res = fstatat(p->fd, last, &stat, AT_SYMLINK_NOFOLLOW);
+    if (res == -1) {
+        if (!retries) {
+            fuse_log(FUSE_LOG_WARNING,
+                     "lo_parent_and_name: failed to stat last\n");
+        }
+        goto fail_unref;
+    }
+    if (stat.st_dev != inode->dev || stat.st_ino != inode->ino) {
+        if (!retries) {
+            fuse_log(FUSE_LOG_WARNING,
+                     "lo_parent_and_name: failed to match last\n");
+        }
+        goto fail_unref;
+    }
+    *parent = p;
+    memmove(path, last, strlen(last) + 1);
+
+    return 0;
+
+fail_unref:
+    unref_inode(lo, p, 1);
+fail:
+    if (retries) {
+        retries--;
+        goto retry;
+    }
+fail_noretry:
+    errno = EIO;
+    return -1;
+}
+
+static int utimensat_empty(struct lo_data *lo, struct lo_inode *inode,
+                           const struct timespec *tv)
+{
+    int res;
+    struct lo_inode *parent;
+    char path[PATH_MAX];
 
     if (inode->is_symlink) {
-        res = utimensat(inode->fd, "", tv, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
+        res = utimensat(inode->fd, "", tv, AT_EMPTY_PATH);
         if (res == -1 && errno == EINVAL) {
             /* Sorry, no race free way to set times on symlink. */
-            errno = EPERM;
+            if (lo->norace) {
+                errno = EPERM;
+            } else {
+                goto fallback;
+            }
         }
         return res;
     }
-    sprintf(procname, "/proc/self/fd/%i", inode->fd);
+    sprintf(path, "/proc/self/fd/%i", inode->fd);
+
+    return utimensat(AT_FDCWD, path, tv, 0);
 
-    return utimensat(AT_FDCWD, procname, tv, 0);
+fallback:
+    res = lo_parent_and_name(lo, inode, path, &parent);
+    if (res != -1) {
+        res = utimensat(parent->fd, path, tv, AT_SYMLINK_NOFOLLOW);
+        unref_inode(lo, parent, 1);
+    }
+
+    return res;
 }
 
 static int lo_fi_fd(fuse_req_t req, struct fuse_file_info *fi)
@@ -387,6 +497,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
 {
     int saverr;
     char procname[64];
+    struct lo_data *lo = lo_data(req);
     struct lo_inode *inode;
     int ifd;
     int res;
@@ -459,7 +570,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         if (fi) {
             res = futimens(fd, tv);
         } else {
-            res = utimensat_empty_nofollow(inode, tv);
+            res = utimensat_empty(lo, inode, tv);
         }
         if (res == -1) {
             goto out_err;
@@ -692,24 +803,38 @@ static void lo_symlink(fuse_req_t req, const char *link, fuse_ino_t parent,
     lo_mknod_symlink(req, parent, name, S_IFLNK, 0, link);
 }
 
-static int linkat_empty_nofollow(struct lo_inode *inode, int dfd,
-                                 const char *name)
+static int linkat_empty_nofollow(struct lo_data *lo, struct lo_inode *inode,
+                                 int dfd, const char *name)
 {
     int res;
-    char procname[64];
+    struct lo_inode *parent;
+    char path[PATH_MAX];
 
     if (inode->is_symlink) {
         res = linkat(inode->fd, "", dfd, name, AT_EMPTY_PATH);
         if (res == -1 && (errno == ENOENT || errno == EINVAL)) {
             /* Sorry, no race free way to hard-link a symlink. */
-            errno = EPERM;
+            if (lo->norace) {
+                errno = EPERM;
+            } else {
+                goto fallback;
+            }
         }
         return res;
     }
 
-    sprintf(procname, "/proc/self/fd/%i", inode->fd);
+    sprintf(path, "/proc/self/fd/%i", inode->fd);
+
+    return linkat(AT_FDCWD, path, dfd, name, AT_SYMLINK_FOLLOW);
+
+fallback:
+    res = lo_parent_and_name(lo, inode, path, &parent);
+    if (res != -1) {
+        res = linkat(parent->fd, path, dfd, name, 0);
+        unref_inode(lo, parent, 1);
+    }
 
-    return linkat(AT_FDCWD, procname, dfd, name, AT_SYMLINK_FOLLOW);
+    return res;
 }
 
 static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
@@ -731,7 +856,7 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
     e.attr_timeout = lo->timeout;
     e.entry_timeout = lo->timeout;
 
-    res = linkat_empty_nofollow(inode, lo_fd(req, parent), name);
+    res = linkat_empty_nofollow(lo, inode, lo_fd(req, parent), name);
     if (res == -1) {
         goto out_err;
     }
@@ -1544,7 +1669,7 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
     }
 
     if (inode->is_symlink) {
-        /* Sorry, no race free way to setxattr on symlink. */
+        /* Sorry, no race free way to removexattr on symlink. */
         saverr = EPERM;
         goto out;
     }
-- 
2.23.0




* [PATCH 038/104] virtiofsd: validate path components
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (36 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 037/104] virtiofsd: passthrough_ll: add fallback for racy ops Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-06 14:32   ` Daniel P. Berrangé
  2019-12-12 16:37 ` [PATCH 039/104] virtiofsd: Plumb fuse_bufvec through to do_write_buf Dr. David Alan Gilbert (git)
                   ` (68 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

Several FUSE requests contain single path components.  A correct FUSE
client sends well-formed path components but there is currently no input
validation in case something went wrong or the client is malicious.

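Refuse ".", "..", and names containing '/' when we expect a single path
component.  lo_lookup() is the exception: it still accepts "." and ".."
for NFS export support and rejects only names containing '/'.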

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
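Illustration only, not part of the patch: a minimal standalone sketch of
the two checks described above.  The first two functions mirror the
helpers added in the diff below; is_lookup_name_ok() is a hypothetical
name for the relaxed test that lo_lookup() performs inline.

```c
#include <assert.h>
#include <string.h>

/* Nonzero if name is exactly "." or "..". */
static int is_dot_or_dotdot(const char *name)
{
    return name[0] == '.' &&
           (name[1] == '\0' || (name[1] == '.' && name[2] == '\0'));
}

/* Strict check: a single component that is neither "." nor "..". */
static int is_safe_path_component(const char *path)
{
    assert(path != NULL);
    if (strchr(path, '/')) {
        return 0;
    }
    return !is_dot_or_dotdot(path);
}

/* Hypothetical helper: relaxed check, only '/' is refused. */
static int is_lookup_name_ok(const char *name)
{
    assert(name != NULL);
    return strchr(name, '/') == NULL;
}
```

Names such as "..." or ".hidden" pass the strict check, since only the
exact strings "." and ".." are refused.  An empty name also passes; the
sketch does not treat it specially.
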
 tools/virtiofsd/passthrough_ll.c | 59 ++++++++++++++++++++++++++++----
 1 file changed, 53 insertions(+), 6 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 1faae2753f..84e9d8916f 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -133,6 +133,21 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n);
 
 static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st);
 
+static int is_dot_or_dotdot(const char *name)
+{
+    return name[0] == '.' &&
+           (name[1] == '\0' || (name[1] == '.' && name[2] == '\0'));
+}
+
+/* Is `path` a single path component that is not "." or ".."? */
+static int is_safe_path_component(const char *path)
+{
+    if (strchr(path, '/')) {
+        return 0;
+    }
+
+    return !is_dot_or_dotdot(path);
+}
 
 static struct lo_data *lo_data(fuse_req_t req)
 {
@@ -681,6 +696,15 @@ static void lo_lookup(fuse_req_t req, fuse_ino_t parent, const char *name)
                  parent, name);
     }
 
+    /*
+     * Don't use is_safe_path_component(), allow "." and ".." for NFS export
+     * support.
+     */
+    if (strchr(name, '/')) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
     err = lo_do_lookup(req, parent, name, &e);
     if (err) {
         fuse_reply_err(req, err);
@@ -745,6 +769,11 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
     struct fuse_entry_param e;
     struct lo_cred old = {};
 
+    if (!is_safe_path_component(name)) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
     dir = lo_inode(req, parent);
     if (!dir) {
         fuse_reply_err(req, EBADF);
@@ -846,6 +875,11 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
     struct fuse_entry_param e;
     int saverr;
 
+    if (!is_safe_path_component(name)) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
     inode = lo_inode(req, ino);
     if (!inode) {
         fuse_reply_err(req, EBADF);
@@ -887,6 +921,10 @@ out_err:
 static void lo_rmdir(fuse_req_t req, fuse_ino_t parent, const char *name)
 {
     int res;
+    if (!is_safe_path_component(name)) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
 
     res = unlinkat(lo_fd(req, parent), name, AT_REMOVEDIR);
 
@@ -899,6 +937,11 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
 {
     int res;
 
+    if (!is_safe_path_component(name) || !is_safe_path_component(newname)) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
     if (flags) {
         fuse_reply_err(req, EINVAL);
         return;
@@ -913,6 +956,11 @@ static void lo_unlink(fuse_req_t req, fuse_ino_t parent, const char *name)
 {
     int res;
 
+    if (!is_safe_path_component(name)) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
     res = unlinkat(lo_fd(req, parent), name, 0);
 
     fuse_reply_err(req, res == -1 ? errno : 0);
@@ -1076,12 +1124,6 @@ out_err:
     fuse_reply_err(req, error);
 }
 
-static int is_dot_or_dotdot(const char *name)
-{
-    return name[0] == '.' &&
-           (name[1] == '\0' || (name[1] == '.' && name[2] == '\0'));
-}
-
 static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
                           off_t offset, struct fuse_file_info *fi, int plus)
 {
@@ -1231,6 +1273,11 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
                  parent, name);
     }
 
+    if (!is_safe_path_component(name)) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
     err = lo_change_cred(req, &old);
     if (err) {
         goto out;
-- 
2.23.0




* [PATCH 039/104] virtiofsd: Plumb fuse_bufvec through to do_write_buf
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (37 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 038/104] virtiofsd: validate path components Dr. David Alan Gilbert (git)
@ 2019-12-12 16:37 ` Dr. David Alan Gilbert (git)
  2020-01-17 21:01   ` Masayoshi Mizuma
  2019-12-12 16:38 ` [PATCH 040/104] virtiofsd: Pass write iov's all the way through Dr. David Alan Gilbert (git)
                   ` (67 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:37 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Let fuse_session_process_buf_int() take a struct fuse_bufvec * instead
of a struct fuse_buf *, and pass it on to do_write_buf().  In the best
case do_write_buf() can hand it straight to op.write_buf without copying
(other than skipping a header).

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
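Illustration only, not part of the patch: a sketch of the two cases
do_write_buf() distinguishes.  struct sk_buf and struct sk_bufvec are
simplified stand-ins for struct fuse_buf and struct fuse_bufvec, and
sk_write_payload() is a hypothetical name.  The check that the buffer is
at least hdr_len bytes long is an addition of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the FUSE buffer types (assumption). */
struct sk_buf {
    size_t size;
    void *mem;
};

struct sk_bufvec {
    size_t count;
    struct sk_buf buf[4];
};

/*
 * Pick the vector to hand to the write handler.
 * hdr_len is the combined length of the request headers and data_len
 * the payload length declared in the write header.
 * Returns NULL if a single-element input is too short.
 */
static struct sk_bufvec *sk_write_payload(struct sk_bufvec *in,
                                          struct sk_bufvec *tmp,
                                          size_t hdr_len, size_t data_len)
{
    assert(in != NULL && tmp != NULL && in->count >= 1);

    if (in->count == 1) {
        /* Headers and data share one buffer: point past the headers. */
        if (in->buf[0].size < hdr_len ||
            in->buf[0].size - hdr_len < data_len) {
            return NULL;
        }
        tmp->count = 1;
        tmp->buf[0].mem = (char *)in->buf[0].mem + hdr_len;
        tmp->buf[0].size = data_len;
        return tmp;
    }

    /* Headers in element 0, data in the rest: hide element 0. */
    in->buf[0].size = 0;
    return in;
}
```

In the multi-element case the input vector itself is returned, so the
data elements are passed on untouched; only the single-element case
needs the temporary vector.
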
 tools/virtiofsd/fuse_i.h        |  2 +-
 tools/virtiofsd/fuse_lowlevel.c | 61 ++++++++++++++++++++++-----------
 tools/virtiofsd/fuse_virtio.c   |  3 +-
 3 files changed, 44 insertions(+), 22 deletions(-)

diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
index cb1ca70ffa..d0679508cd 100644
--- a/tools/virtiofsd/fuse_i.h
+++ b/tools/virtiofsd/fuse_i.h
@@ -119,7 +119,7 @@ int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
 void fuse_free_req(fuse_req_t req);
 
 void fuse_session_process_buf_int(struct fuse_session *se,
-                                  const struct fuse_buf *buf,
+                                  struct fuse_bufvec *bufv,
                                   struct fuse_chan *ch);
 
 
diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index bea092b454..7d33bbf539 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -1006,11 +1006,12 @@ static void do_write(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 }
 
 static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid, const void *inarg,
-                         const struct fuse_buf *ibuf)
+                         struct fuse_bufvec *ibufv)
 {
     struct fuse_session *se = req->se;
-    struct fuse_bufvec bufv = {
-        .buf[0] = *ibuf,
+    struct fuse_bufvec *pbufv = ibufv;
+    struct fuse_bufvec tmpbufv = {
+        .buf[0] = ibufv->buf[0],
         .count = 1,
     };
     struct fuse_write_in *arg = (struct fuse_write_in *)inarg;
@@ -1020,22 +1021,31 @@ static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid, const void *inarg,
     fi.fh = arg->fh;
     fi.writepage = arg->write_flags & FUSE_WRITE_CACHE;
 
-    fi.lock_owner = arg->lock_owner;
-    fi.flags = arg->flags;
-    if (!(bufv.buf[0].flags & FUSE_BUF_IS_FD)) {
-        bufv.buf[0].mem = PARAM(arg);
-    }
-
-    bufv.buf[0].size -=
-        sizeof(struct fuse_in_header) + sizeof(struct fuse_write_in);
-    if (bufv.buf[0].size < arg->size) {
-        fuse_log(FUSE_LOG_ERR, "fuse: do_write_buf: buffer size too small\n");
-        fuse_reply_err(req, EIO);
-        return;
+    if (ibufv->count == 1) {
+        fi.lock_owner = arg->lock_owner;
+        fi.flags = arg->flags;
+        if (!(tmpbufv.buf[0].flags & FUSE_BUF_IS_FD)) {
+            tmpbufv.buf[0].mem = PARAM(arg);
+        }
+        tmpbufv.buf[0].size -=
+            sizeof(struct fuse_in_header) + sizeof(struct fuse_write_in);
+        if (tmpbufv.buf[0].size < arg->size) {
+            fuse_log(FUSE_LOG_ERR,
+                     "fuse: do_write_buf: buffer size too small\n");
+            fuse_reply_err(req, EIO);
+            return;
+        }
+        tmpbufv.buf[0].size = arg->size;
+        pbufv = &tmpbufv;
+    } else {
+        /*
+         * Input bufv contains the headers in the first element
+         * and the data in the rest; we need to skip that first element.
+         */
+        ibufv->buf[0].size = 0;
     }
-    bufv.buf[0].size = arg->size;
 
-    se->op.write_buf(req, nodeid, &bufv, arg->offset, &fi);
+    se->op.write_buf(req, nodeid, pbufv, arg->offset, &fi);
 }
 
 static void do_flush(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
@@ -2027,13 +2037,24 @@ static const char *opname(enum fuse_opcode opcode)
 void fuse_session_process_buf(struct fuse_session *se,
                               const struct fuse_buf *buf)
 {
-    fuse_session_process_buf_int(se, buf, NULL);
+    struct fuse_bufvec bufv = { .buf[0] = *buf, .count = 1 };
+    fuse_session_process_buf_int(se, &bufv, NULL);
 }
 
+/*
+ * Restriction:
+ *   bufv is normally a single-entry buffer.  The exception is a write
+ *   whose data is in memory: then bufv may have multiple entries,
+ *   where the first entry contains all headers and subsequent entries
+ *   contain the data.
+ *   bufv shall not use any offsets etc. to make the data anything
+ *   other than contiguous starting from 0.
+ */
 void fuse_session_process_buf_int(struct fuse_session *se,
-                                  const struct fuse_buf *buf,
+                                  struct fuse_bufvec *bufv,
                                   struct fuse_chan *ch)
 {
+    const struct fuse_buf *buf = bufv->buf;
     struct fuse_in_header *in;
     const void *inarg;
     struct fuse_req *req;
@@ -2111,7 +2132,7 @@ void fuse_session_process_buf_int(struct fuse_session *se,
 
     inarg = (void *)&in[1];
     if (in->opcode == FUSE_WRITE && se->op.write_buf) {
-        do_write_buf(req, in->nodeid, inarg, buf);
+        do_write_buf(req, in->nodeid, inarg, bufv);
     } else {
         fuse_ll_ops[in->opcode].func(req, in->nodeid, inarg);
     }
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index fa6e53e7d0..99c877ea2e 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -499,7 +499,8 @@ static void *fv_queue_thread(void *opaque)
             /* TODO! Endianness of header */
 
             /* TODO: Add checks for fuse_session_exited */
-            fuse_session_process_buf_int(se, &fbuf, &ch);
+            struct fuse_bufvec bufv = { .buf[0] = fbuf, .count = 1 };
+            fuse_session_process_buf_int(se, &bufv, &ch);
 
             if (!qi->reply_sent) {
                 fuse_log(FUSE_LOG_DEBUG, "%s: elem %d no reply sent\n",
-- 
2.23.0




* [PATCH 040/104] virtiofsd: Pass write iov's all the way through
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (38 preceding siblings ...)
  2019-12-12 16:37 ` [PATCH 039/104] virtiofsd: Plumb fuse_bufvec through to do_write_buf Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-19  8:08   ` Xiao Yang
  2019-12-12 16:38 ` [PATCH 041/104] virtiofsd: add fuse_mbuf_iter API Dr. David Alan Gilbert (git)
                   ` (66 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Pass the write iov pointing to guest RAM all the way through rather
than copying the data.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
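Illustration only, not part of the patch: a sketch of building a buffer
vector that copies the two header iovs but merely points at the data
iovs.  The types and sk_bufvec_from_iov() are hypothetical stand-ins.
Unlike struct fuse_bufvec, which the patch's allocation arithmetic
suggests embeds one element, the sketch uses a flexible array member, so
the allocation size is computed slightly differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

struct sk_buf {
    size_t size;
    void *mem;
};

struct sk_bufvec {
    size_t count;
    struct sk_buf buf[]; /* flexible array member (sketch only) */
};

/*
 * Element 0 of the result is a private copy of iov[0] and iov[1]
 * (the headers), stored in hdr_copy.  Elements 1.. alias the memory
 * of iov[2].. without copying.  hdr_copy must have room for
 * iov[0].iov_len + iov[1].iov_len bytes.
 * Returns NULL if there are fewer than 3 iovs or malloc() fails.
 * The caller releases the result with free().
 */
static struct sk_bufvec *sk_bufvec_from_iov(const struct iovec *iov,
                                            size_t iov_num, void *hdr_copy)
{
    struct sk_bufvec *v;
    size_t i;

    assert(iov != NULL && hdr_copy != NULL);
    if (iov_num < 3) {
        return NULL;
    }

    v = malloc(sizeof(*v) + sizeof(struct sk_buf) * (iov_num - 1));
    if (!v) {
        return NULL;
    }

    /* Copy the headers so later changes by the peer cannot affect us. */
    memcpy(hdr_copy, iov[0].iov_base, iov[0].iov_len);
    memcpy((char *)hdr_copy + iov[0].iov_len, iov[1].iov_base,
           iov[1].iov_len);
    v->buf[0].mem = hdr_copy;
    v->buf[0].size = iov[0].iov_len + iov[1].iov_len;
    v->count = 1;

    /* Reference the data in place. */
    for (i = 2; i < iov_num; i++) {
        v->buf[v->count].mem = iov[i].iov_base;
        v->buf[v->count].size = iov[i].iov_len;
        v->count++;
    }
    return v;
}
```

Because only the headers are copied, a later modification of the header
iovs does not change what the daemon parsed, while the payload is read
directly from the original memory.
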
 tools/virtiofsd/fuse_virtio.c | 79 ++++++++++++++++++++++++++++++++---
 1 file changed, 73 insertions(+), 6 deletions(-)

diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 99c877ea2e..3c778b6296 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -452,6 +452,10 @@ static void *fv_queue_thread(void *opaque)
                  __func__, qi->qidx, (size_t)evalue, in_bytes, out_bytes);
 
         while (1) {
+            bool allocated_bufv = false;
+            struct fuse_bufvec bufv;
+            struct fuse_bufvec *pbufv;
+
             /*
              * An element contains one request and the space to send our
              * response They're spread over multiple descriptors in a
@@ -493,14 +497,76 @@ static void *fv_queue_thread(void *opaque)
                          __func__, elem->index);
                 assert(0); /* TODO */
             }
-            copy_from_iov(&fbuf, out_num, out_sg);
-            fbuf.size = out_len;
+            /* Copy just the first element and look at it */
+            copy_from_iov(&fbuf, 1, out_sg);
+
+            if (out_num > 2 &&
+                out_sg[0].iov_len == sizeof(struct fuse_in_header) &&
+                ((struct fuse_in_header *)fbuf.mem)->opcode == FUSE_WRITE &&
+                out_sg[1].iov_len == sizeof(struct fuse_write_in)) {
+                /*
+                 * For a write we don't actually need to copy the
+                 * data, we can just do it straight out of guest memory
+                 * but we must still copy the headers in case the guest
+                 * was nasty and changed them while we were using them.
+                 */
+                fuse_log(FUSE_LOG_DEBUG, "%s: Write special case\n", __func__);
+
+                /* copy the fuse_write_in header after the fuse_in_header */
+                fbuf.mem += out_sg->iov_len;
+                copy_from_iov(&fbuf, 1, out_sg + 1);
+                fbuf.mem -= out_sg->iov_len;
+                fbuf.size = out_sg[0].iov_len + out_sg[1].iov_len;
+
+                /* Allocate the bufv, with space for the rest of the iov */
+                allocated_bufv = true;
+                pbufv = malloc(sizeof(struct fuse_bufvec) +
+                               sizeof(struct fuse_buf) * (out_num - 2));
+                if (!pbufv) {
+                    vu_queue_unpop(dev, q, elem, 0);
+                    free(elem);
+                    fuse_log(FUSE_LOG_ERR, "%s: pbufv malloc failed\n",
+                             __func__);
+                    goto out;
+                }
+
+                pbufv->count = 1;
+                pbufv->buf[0] = fbuf;
+
+                size_t iovindex, pbufvindex;
+                iovindex = 2; /* 2 headers, separate iovs */
+                pbufvindex = 1; /* 2 headers, 1 fusebuf */
+
+                for (; iovindex < out_num; iovindex++, pbufvindex++) {
+                    pbufv->count++;
+                    pbufv->buf[pbufvindex].pos = ~0; /* Dummy */
+                    pbufv->buf[pbufvindex].flags = 0;
+                    pbufv->buf[pbufvindex].mem = out_sg[iovindex].iov_base;
+                    pbufv->buf[pbufvindex].size = out_sg[iovindex].iov_len;
+                }
+            } else {
+                /* Normal (non fast write) path */
+
+                /* Copy the rest of the buffer */
+                fbuf.mem += out_sg->iov_len;
+                copy_from_iov(&fbuf, out_num - 1, out_sg + 1);
+                fbuf.mem -= out_sg->iov_len;
+                fbuf.size = out_len;
 
-            /* TODO! Endianness of header */
+                /* TODO! Endianness of header */
 
-            /* TODO: Add checks for fuse_session_exited */
-            struct fuse_bufvec bufv = { .buf[0] = fbuf, .count = 1 };
-            fuse_session_process_buf_int(se, &bufv, &ch);
+                /* TODO: Add checks for fuse_session_exited */
+                bufv.buf[0] = fbuf;
+                bufv.count = 1;
+                pbufv = &bufv;
+            }
+            pbufv->idx = 0;
+            pbufv->off = 0;
+            fuse_session_process_buf_int(se, pbufv, &ch);
+
+            if (allocated_bufv) {
+                free(pbufv);
+            }
 
             if (!qi->reply_sent) {
                 fuse_log(FUSE_LOG_DEBUG, "%s: elem %d no reply sent\n",
@@ -514,6 +580,7 @@ static void *fv_queue_thread(void *opaque)
             elem = NULL;
         }
     }
+out:
     pthread_mutex_destroy(&ch.lock);
     free(fbuf.mem);
 
-- 
2.23.0




* [PATCH 041/104] virtiofsd: add fuse_mbuf_iter API
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (39 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 040/104] virtiofsd: Pass write iov's all the way through Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-16 14:17   ` Sergio Lopez
  2019-12-12 16:38 ` [PATCH 042/104] virtiofsd: validate input buffer sizes in do_write_buf() Dr. David Alan Gilbert (git)
                   ` (65 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

Introduce an API for consuming bytes from a buffer with size checks.
All FUSE operations will be converted to use this safe API instead of
void *inarg.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
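Illustration only, not part of the patch: a standalone sketch of how a
handler might consume a request with such an iterator.  The mbuf_iter
functions below are local re-implementations modelled on the ones in the
diff (with explicit char * casts instead of void * arithmetic);
struct demo_in and parse_demo() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct mbuf_iter {
    void *mem;   /* start of buffer */
    size_t size; /* total length in bytes */
    size_t pos;  /* bytes consumed so far */
};

/* Consume len bytes, or return NULL if fewer than len remain. */
static void *mbuf_iter_advance(struct mbuf_iter *iter, size_t len)
{
    void *ptr;

    if (len > iter->size - iter->pos) {
        return NULL;
    }
    ptr = (char *)iter->mem + iter->pos;
    iter->pos += len;
    return ptr;
}

/* Consume a NUL-terminated string, or return NULL if no NUL is found. */
static const char *mbuf_iter_advance_str(struct mbuf_iter *iter)
{
    const char *str = (const char *)iter->mem + iter->pos;
    size_t remaining = iter->size - iter->pos;
    size_t i;

    for (i = 0; i < remaining; i++) {
        if (str[i] == '\0') {
            iter->pos += i + 1;
            return str;
        }
    }
    return NULL;
}

/* Hypothetical request layout: fixed header followed by a name. */
struct demo_in {
    uint32_t mode;
};

/* Returns 0 on success, -1 if the buffer is too short or malformed. */
static int parse_demo(void *buf, size_t len, uint32_t *mode,
                      const char **name)
{
    struct mbuf_iter iter = { .mem = buf, .size = len, .pos = 0 };
    struct demo_in in;
    void *raw;

    assert(buf != NULL && mode != NULL && name != NULL);

    raw = mbuf_iter_advance(&iter, sizeof(in));
    if (!raw) {
        return -1;
    }
    memcpy(&in, raw, sizeof(in)); /* avoids unaligned access */

    *name = mbuf_iter_advance_str(&iter);
    if (!*name) {
        return -1;
    }
    *mode = in.mode;
    return 0;
}
```

Every read goes through the iterator, so a truncated request or a name
without a terminating NUL is reported to the caller instead of being
read past the end of the buffer.
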
 tools/virtiofsd/buffer.c      | 28 ++++++++++++++++++++
 tools/virtiofsd/fuse_common.h | 49 ++++++++++++++++++++++++++++++++++-
 2 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/tools/virtiofsd/buffer.c b/tools/virtiofsd/buffer.c
index 1d7e6d2439..f59d8d72eb 100644
--- a/tools/virtiofsd/buffer.c
+++ b/tools/virtiofsd/buffer.c
@@ -338,3 +338,31 @@ ssize_t fuse_buf_copy(struct fuse_bufvec *dstv, struct fuse_bufvec *srcv,
 
     return copied;
 }
+
+void *fuse_mbuf_iter_advance(struct fuse_mbuf_iter *iter, size_t len)
+{
+    void *ptr;
+
+    if (len > iter->size - iter->pos) {
+        return NULL;
+    }
+
+    ptr = iter->mem + iter->pos;
+    iter->pos += len;
+    return ptr;
+}
+
+const char *fuse_mbuf_iter_advance_str(struct fuse_mbuf_iter *iter)
+{
+    const char *str = iter->mem + iter->pos;
+    size_t remaining = iter->size - iter->pos;
+    size_t i;
+
+    for (i = 0; i < remaining; i++) {
+        if (str[i] == '\0') {
+            iter->pos += i + 1;
+            return str;
+        }
+    }
+    return NULL;
+}
diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
index 7bed38b436..147c043bd9 100644
--- a/tools/virtiofsd/fuse_common.h
+++ b/tools/virtiofsd/fuse_common.h
@@ -771,10 +771,57 @@ size_t fuse_buf_size(const struct fuse_bufvec *bufv);
 ssize_t fuse_buf_copy(struct fuse_bufvec *dst, struct fuse_bufvec *src,
                       enum fuse_buf_copy_flags flags);
 
+/**
+ * Memory buffer iterator
+ *
+ */
+struct fuse_mbuf_iter {
+    /**
+     * Data pointer
+     */
+    void *mem;
+
+    /**
+     * Total length, in bytes
+     */
+    size_t size;
+
+    /**
+     * Offset from start of buffer
+     */
+    size_t pos;
+};
+
+/* Initialize memory buffer iterator from a fuse_buf */
+#define FUSE_MBUF_ITER_INIT(fbuf) \
+    ((struct fuse_mbuf_iter){     \
+        .mem = fbuf->mem,         \
+        .size = fbuf->size,       \
+        .pos = 0,                 \
+    })
+
+/**
+ * Consume bytes from a memory buffer iterator
+ *
+ * @param iter memory buffer iterator
+ * @param len number of bytes to consume
+ * @return pointer to start of consumed bytes or
+ *         NULL if advancing beyond end of buffer
+ */
+void *fuse_mbuf_iter_advance(struct fuse_mbuf_iter *iter, size_t len);
+
+/**
+ * Consume a NUL-terminated string from a memory buffer iterator
+ *
+ * @param iter memory buffer iterator
+ * @return pointer to the string or
+ *         NULL if advancing beyond end of buffer or there is no NUL-terminator
+ */
+const char *fuse_mbuf_iter_advance_str(struct fuse_mbuf_iter *iter);
+
 /*
  * Signal handling
  */
-
 /**
  * Exit session on HUP, TERM and INT signals and ignore PIPE signal
  *
-- 
2.23.0




* [PATCH 042/104] virtiofsd: validate input buffer sizes in do_write_buf()
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (40 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 041/104] virtiofsd: add fuse_mbuf_iter API Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-16 14:19   ` Sergio Lopez
  2019-12-12 16:38 ` [PATCH 043/104] virtiofsd: check input buffer size in fuse_lowlevel.c ops Dr. David Alan Gilbert (git)
                   ` (64 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

Use fuse_mbuf_iter to validate the input buffer in do_write_buf().

There is a small change in behavior: if fuse_write_in->size doesn't
match the input buffer size, the request now fails with EIO.  Previously
write requests with 1 fuse_buf element were truncated to
fuse_write_in->size.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
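Illustration only, not part of the patch: a sketch of the stricter size
rule.  The types are simplified stand-ins, sk_buf_size() is a plain sum
standing in for fuse_buf_size(), and sk_check_write_size() is a
hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sk_buf {
    size_t size;
    const void *mem;
};

struct sk_bufvec {
    size_t count;
    struct sk_buf buf[4];
};

/* Total number of payload bytes across all elements. */
static size_t sk_buf_size(const struct sk_bufvec *v)
{
    size_t total = 0;
    size_t i;

    assert(v != NULL && v->count <= 4);
    for (i = 0; i < v->count; i++) {
        total += v->buf[i].size;
    }
    return total;
}

/*
 * Returns 0 if the payload length equals the length declared in the
 * write header, EIO otherwise.  Both a short and an oversized payload
 * are rejected; nothing is truncated.
 */
static int sk_check_write_size(const struct sk_bufvec *v, size_t declared)
{
    return sk_buf_size(v) == declared ? 0 : EIO;
}
```

With a 10-byte payload, a declared size of 8 used to result in an 8-byte
write in the single-buffer case; under the rule sketched here it is an
error, as is a declared size of 12.
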
 tools/virtiofsd/fuse_lowlevel.c | 49 ++++++++++++++++++++-------------
 1 file changed, 30 insertions(+), 19 deletions(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 7d33bbf539..4b1777b991 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -1005,8 +1005,8 @@ static void do_write(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid, const void *inarg,
-                         struct fuse_bufvec *ibufv)
+static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid,
+                         struct fuse_mbuf_iter *iter, struct fuse_bufvec *ibufv)
 {
     struct fuse_session *se = req->se;
     struct fuse_bufvec *pbufv = ibufv;
@@ -1014,28 +1014,27 @@ static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid, const void *inarg,
         .buf[0] = ibufv->buf[0],
         .count = 1,
     };
-    struct fuse_write_in *arg = (struct fuse_write_in *)inarg;
+    struct fuse_write_in *arg;
+    size_t arg_size = sizeof(*arg);
     struct fuse_file_info fi;
 
     memset(&fi, 0, sizeof(fi));
+
+    arg = fuse_mbuf_iter_advance(iter, arg_size);
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
+    fi.lock_owner = arg->lock_owner;
+    fi.flags = arg->flags;
     fi.fh = arg->fh;
     fi.writepage = arg->write_flags & FUSE_WRITE_CACHE;
 
     if (ibufv->count == 1) {
-        fi.lock_owner = arg->lock_owner;
-        fi.flags = arg->flags;
-        if (!(tmpbufv.buf[0].flags & FUSE_BUF_IS_FD)) {
-            tmpbufv.buf[0].mem = PARAM(arg);
-        }
-        tmpbufv.buf[0].size -=
-            sizeof(struct fuse_in_header) + sizeof(struct fuse_write_in);
-        if (tmpbufv.buf[0].size < arg->size) {
-            fuse_log(FUSE_LOG_ERR,
-                     "fuse: do_write_buf: buffer size too small\n");
-            fuse_reply_err(req, EIO);
-            return;
-        }
-        tmpbufv.buf[0].size = arg->size;
+        assert(!(tmpbufv.buf[0].flags & FUSE_BUF_IS_FD));
+        tmpbufv.buf[0].mem = ((char *)arg) + arg_size;
+        tmpbufv.buf[0].size -= sizeof(struct fuse_in_header) + arg_size;
         pbufv = &tmpbufv;
     } else {
         /*
@@ -1045,6 +1044,13 @@ static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid, const void *inarg,
         ibufv->buf[0].size = 0;
     }
 
+    if (fuse_buf_size(pbufv) != arg->size) {
+        fuse_log(FUSE_LOG_ERR,
+                 "fuse: do_write_buf: buffer size doesn't match arg->size\n");
+        fuse_reply_err(req, EIO);
+        return;
+    }
+
     se->op.write_buf(req, nodeid, pbufv, arg->offset, &fi);
 }
 
@@ -2055,12 +2061,17 @@ void fuse_session_process_buf_int(struct fuse_session *se,
                                   struct fuse_chan *ch)
 {
     const struct fuse_buf *buf = bufv->buf;
+    struct fuse_mbuf_iter iter = FUSE_MBUF_ITER_INIT(buf);
     struct fuse_in_header *in;
     const void *inarg;
     struct fuse_req *req;
     int err;
 
-    in = buf->mem;
+    /* The first buffer must be a memory buffer */
+    assert(!(buf->flags & FUSE_BUF_IS_FD));
+
+    in = fuse_mbuf_iter_advance(&iter, sizeof(*in));
+    assert(in); /* caller guarantees the input buffer is large enough */
 
     if (se->debug) {
         fuse_log(FUSE_LOG_DEBUG,
@@ -2132,7 +2143,7 @@ void fuse_session_process_buf_int(struct fuse_session *se,
 
     inarg = (void *)&in[1];
     if (in->opcode == FUSE_WRITE && se->op.write_buf) {
-        do_write_buf(req, in->nodeid, inarg, bufv);
+        do_write_buf(req, in->nodeid, &iter, bufv);
     } else {
         fuse_ll_ops[in->opcode].func(req, in->nodeid, inarg);
     }
-- 
2.23.0




* [PATCH 043/104] virtiofsd: check input buffer size in fuse_lowlevel.c ops
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (41 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 042/104] virtiofsd: validate input buffer sizes in do_write_buf() Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-16 14:25   ` Sergio Lopez
  2019-12-12 16:38 ` [PATCH 044/104] virtiofsd: prevent ".." escape in lo_do_lookup() Dr. David Alan Gilbert (git)
                   ` (63 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

Each FUSE operation involves parsing the input buffer.  Currently the
code assumes the input buffer is large enough for the expected
arguments.  This patch uses fuse_mbuf_iter to check the size.
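
For readers who have not looked at the iterator itself, here is a minimal
sketch of the idea, not the code from this series.  The names mbuf_iter,
mbuf_iter_advance() and mbuf_iter_advance_str() are hypothetical stand-ins
for fuse_mbuf_iter and its helpers, and the layout is an assumption.  Each
call either hands back a pointer to fully in-bounds data and consumes it,
or returns NULL and leaves the position untouched:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct fuse_mbuf_iter (assumed layout). */
struct mbuf_iter {
    const char *mem; /* start of the request buffer, must not be NULL */
    size_t size;     /* total number of bytes in the buffer */
    size_t pos;      /* number of bytes consumed so far */
};

/* Consume len bytes; return a pointer to them, or NULL if too few remain. */
static const void *mbuf_iter_advance(struct mbuf_iter *iter, size_t len)
{
    const void *ptr;

    assert(iter->pos <= iter->size);
    /* Compare against the remainder so that pos + len cannot overflow. */
    if (len > iter->size - iter->pos) {
        return NULL;
    }
    ptr = iter->mem + iter->pos;
    iter->pos += len;
    return ptr;
}

/* Consume one NUL-terminated string; NULL if no terminator is in bounds. */
static const char *mbuf_iter_advance_str(struct mbuf_iter *iter)
{
    const char *str = iter->mem + iter->pos;
    const char *nul;

    assert(iter->pos <= iter->size);
    nul = memchr(str, '\0', iter->size - iter->pos);
    if (!nul) {
        return NULL;
    }
    iter->pos += (size_t)(nul - str) + 1; /* also skip the terminator */
    return str;
}
```

A handler would call these in wire order (fixed-size struct first, then any
strings) and reply with an error as soon as one of them returns NULL.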

Most operations are simple to convert.  Some are more complicated due to
variable-length inputs or different sizes depending on the protocol
version.
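
The two harder cases can be sketched roughly as follows.  This is only an
illustration under stated assumptions: array_bytes() and init_in_size() are
hypothetical helpers, and the 8-byte/16-byte sizes are illustrative values
rather than something taken from the headers.  A variable-length input
needs an overflow check before the element count is multiplied out, and a
version-dependent input needs its expected size picked from the negotiated
protocol version:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute count * elem_size into *out, refusing to
 * overflow size_t.  Widening count into a size_t first mirrors the scount
 * local used for the batch-forget case.
 */
static bool array_bytes(uint32_t count, size_t elem_size, size_t *out)
{
    size_t scount = count;

    if (elem_size != 0 && scount > SIZE_MAX / elem_size) {
        return false;
    }
    *out = scount * elem_size;
    return true;
}

/* Illustrative sizes only (assumption): old fields vs. the full struct. */
enum {
    INIT_COMPAT_SIZE = 8,
    INIT_FULL_SIZE = 16
};

/* Hypothetical helper: bytes an INIT request must carry for a version. */
static size_t init_in_size(uint32_t major, uint32_t minor)
{
    if (major == 7 && minor >= 6) {
        return INIT_FULL_SIZE;
    }
    return INIT_COMPAT_SIZE;
}
```

In both cases the computed size is what gets passed to the iterator, so a
short buffer is rejected before any field beyond it is read.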

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/fuse_lowlevel.c | 581 +++++++++++++++++++++++++-------
 1 file changed, 456 insertions(+), 125 deletions(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 4b1777b991..bd5ca2f157 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -18,6 +18,7 @@
 #include <assert.h>
 #include <errno.h>
 #include <limits.h>
+#include <stdbool.h>
 #include <stddef.h>
 #include <stdio.h>
 #include <stdlib.h>
@@ -26,7 +27,6 @@
 #include <unistd.h>
 
 
-#define PARAM(inarg) (((char *)(inarg)) + sizeof(*(inarg)))
 #define OFFSET_MAX 0x7fffffffffffffffLL
 
 struct fuse_pollhandle {
@@ -708,9 +708,14 @@ int fuse_reply_lseek(fuse_req_t req, off_t off)
     return send_reply_ok(req, &arg, sizeof(arg));
 }
 
-static void do_lookup(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_lookup(fuse_req_t req, fuse_ino_t nodeid,
+                      struct fuse_mbuf_iter *iter)
 {
-    char *name = (char *)inarg;
+    const char *name = fuse_mbuf_iter_advance_str(iter);
+    if (!name) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
 
     if (req->se->op.lookup) {
         req->se->op.lookup(req, nodeid, name);
@@ -719,9 +724,16 @@ static void do_lookup(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_forget(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_forget(fuse_req_t req, fuse_ino_t nodeid,
+                      struct fuse_mbuf_iter *iter)
 {
-    struct fuse_forget_in *arg = (struct fuse_forget_in *)inarg;
+    struct fuse_forget_in *arg;
+
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
 
     if (req->se->op.forget) {
         req->se->op.forget(req, nodeid, arg->nlookup);
@@ -731,20 +743,48 @@ static void do_forget(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 }
 
 static void do_batch_forget(fuse_req_t req, fuse_ino_t nodeid,
-                            const void *inarg)
+                            struct fuse_mbuf_iter *iter)
 {
-    struct fuse_batch_forget_in *arg = (void *)inarg;
-    struct fuse_forget_one *param = (void *)PARAM(arg);
-    unsigned int i;
+    struct fuse_batch_forget_in *arg;
+    struct fuse_forget_data *forgets;
+    size_t scount;
 
     (void)nodeid;
 
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg) {
+        fuse_reply_none(req);
+        return;
+    }
+
+    /*
+     * Prevent integer overflow.  The compiler emits the following warning
+     * unless we use the scount local variable:
+     *
+     * error: comparison is always false due to limited range of data type
+     * [-Werror=type-limits]
+     *
+     * The comparison is indeed always false on 64-bit hosts, but the check
+     * is needed on 32-bit hosts.
+     */
+    scount = arg->count;
+    if (scount > SIZE_MAX / sizeof(forgets[0])) {
+        fuse_reply_none(req);
+        return;
+    }
+
+    forgets = fuse_mbuf_iter_advance(iter, arg->count * sizeof(forgets[0]));
+    if (!forgets) {
+        fuse_reply_none(req);
+        return;
+    }
+
     if (req->se->op.forget_multi) {
-        req->se->op.forget_multi(req, arg->count,
-                                 (struct fuse_forget_data *)param);
+        req->se->op.forget_multi(req, arg->count, forgets);
     } else if (req->se->op.forget) {
+        unsigned int i;
+
         for (i = 0; i < arg->count; i++) {
-            struct fuse_forget_one *forget = &param[i];
             struct fuse_req *dummy_req;
 
             dummy_req = fuse_ll_alloc_req(req->se);
@@ -756,7 +796,7 @@ static void do_batch_forget(fuse_req_t req, fuse_ino_t nodeid,
             dummy_req->ctx = req->ctx;
             dummy_req->ch = NULL;
 
-            req->se->op.forget(dummy_req, forget->nodeid, forget->nlookup);
+            req->se->op.forget(dummy_req, forgets[i].ino, forgets[i].nlookup);
         }
         fuse_reply_none(req);
     } else {
@@ -764,12 +804,19 @@ static void do_batch_forget(fuse_req_t req, fuse_ino_t nodeid,
     }
 }
 
-static void do_getattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_getattr(fuse_req_t req, fuse_ino_t nodeid,
+                       struct fuse_mbuf_iter *iter)
 {
     struct fuse_file_info *fip = NULL;
     struct fuse_file_info fi;
 
-    struct fuse_getattr_in *arg = (struct fuse_getattr_in *)inarg;
+    struct fuse_getattr_in *arg;
+
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
 
     if (arg->getattr_flags & FUSE_GETATTR_FH) {
         memset(&fi, 0, sizeof(fi));
@@ -784,14 +831,21 @@ static void do_getattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_setattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_setattr(fuse_req_t req, fuse_ino_t nodeid,
+                       struct fuse_mbuf_iter *iter)
 {
-    struct fuse_setattr_in *arg = (struct fuse_setattr_in *)inarg;
-
     if (req->se->op.setattr) {
+        struct fuse_setattr_in *arg;
         struct fuse_file_info *fi = NULL;
         struct fuse_file_info fi_store;
         struct stat stbuf;
+
+        arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+        if (!arg) {
+            fuse_reply_err(req, EINVAL);
+            return;
+        }
+
         memset(&stbuf, 0, sizeof(stbuf));
         convert_attr(arg, &stbuf);
         if (arg->valid & FATTR_FH) {
@@ -812,9 +866,16 @@ static void do_setattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_access(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_access(fuse_req_t req, fuse_ino_t nodeid,
+                      struct fuse_mbuf_iter *iter)
 {
-    struct fuse_access_in *arg = (struct fuse_access_in *)inarg;
+    struct fuse_access_in *arg;
+
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
 
     if (req->se->op.access) {
         req->se->op.access(req, nodeid, arg->mask);
@@ -823,9 +884,10 @@ static void do_access(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_readlink(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_readlink(fuse_req_t req, fuse_ino_t nodeid,
+                        struct fuse_mbuf_iter *iter)
 {
-    (void)inarg;
+    (void)iter;
 
     if (req->se->op.readlink) {
         req->se->op.readlink(req, nodeid);
@@ -834,10 +896,18 @@ static void do_readlink(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_mknod(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_mknod(fuse_req_t req, fuse_ino_t nodeid,
+                     struct fuse_mbuf_iter *iter)
 {
-    struct fuse_mknod_in *arg = (struct fuse_mknod_in *)inarg;
-    char *name = PARAM(arg);
+    struct fuse_mknod_in *arg;
+    const char *name;
+
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    name = fuse_mbuf_iter_advance_str(iter);
+    if (!arg || !name) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
 
     req->ctx.umask = arg->umask;
 
@@ -848,22 +918,37 @@ static void do_mknod(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_mkdir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_mkdir(fuse_req_t req, fuse_ino_t nodeid,
+                     struct fuse_mbuf_iter *iter)
 {
-    struct fuse_mkdir_in *arg = (struct fuse_mkdir_in *)inarg;
+    struct fuse_mkdir_in *arg;
+    const char *name;
+
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    name = fuse_mbuf_iter_advance_str(iter);
+    if (!arg || !name) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
 
     req->ctx.umask = arg->umask;
 
     if (req->se->op.mkdir) {
-        req->se->op.mkdir(req, nodeid, PARAM(arg), arg->mode);
+        req->se->op.mkdir(req, nodeid, name, arg->mode);
     } else {
         fuse_reply_err(req, ENOSYS);
     }
 }
 
-static void do_unlink(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_unlink(fuse_req_t req, fuse_ino_t nodeid,
+                      struct fuse_mbuf_iter *iter)
 {
-    char *name = (char *)inarg;
+    const char *name = fuse_mbuf_iter_advance_str(iter);
+
+    if (!name) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
 
     if (req->se->op.unlink) {
         req->se->op.unlink(req, nodeid, name);
@@ -872,9 +957,15 @@ static void do_unlink(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_rmdir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_rmdir(fuse_req_t req, fuse_ino_t nodeid,
+                     struct fuse_mbuf_iter *iter)
 {
-    char *name = (char *)inarg;
+    const char *name = fuse_mbuf_iter_advance_str(iter);
+
+    if (!name) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
 
     if (req->se->op.rmdir) {
         req->se->op.rmdir(req, nodeid, name);
@@ -883,10 +974,16 @@ static void do_rmdir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_symlink(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_symlink(fuse_req_t req, fuse_ino_t nodeid,
+                       struct fuse_mbuf_iter *iter)
 {
-    char *name = (char *)inarg;
-    char *linkname = ((char *)inarg) + strlen((char *)inarg) + 1;
+    const char *name = fuse_mbuf_iter_advance_str(iter);
+    const char *linkname = fuse_mbuf_iter_advance_str(iter);
+
+    if (!name || !linkname) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
 
     if (req->se->op.symlink) {
         req->se->op.symlink(req, linkname, nodeid, name);
@@ -895,11 +992,20 @@ static void do_symlink(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_rename(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_rename(fuse_req_t req, fuse_ino_t nodeid,
+                      struct fuse_mbuf_iter *iter)
 {
-    struct fuse_rename_in *arg = (struct fuse_rename_in *)inarg;
-    char *oldname = PARAM(arg);
-    char *newname = oldname + strlen(oldname) + 1;
+    struct fuse_rename_in *arg;
+    const char *oldname;
+    const char *newname;
+
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    oldname = fuse_mbuf_iter_advance_str(iter);
+    newname = fuse_mbuf_iter_advance_str(iter);
+    if (!arg || !oldname || !newname) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
 
     if (req->se->op.rename) {
         req->se->op.rename(req, nodeid, oldname, arg->newdir, newname, 0);
@@ -908,11 +1014,20 @@ static void do_rename(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_rename2(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_rename2(fuse_req_t req, fuse_ino_t nodeid,
+                       struct fuse_mbuf_iter *iter)
 {
-    struct fuse_rename2_in *arg = (struct fuse_rename2_in *)inarg;
-    char *oldname = PARAM(arg);
-    char *newname = oldname + strlen(oldname) + 1;
+    struct fuse_rename2_in *arg;
+    const char *oldname;
+    const char *newname;
+
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    oldname = fuse_mbuf_iter_advance_str(iter);
+    newname = fuse_mbuf_iter_advance_str(iter);
+    if (!arg || !oldname || !newname) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
 
     if (req->se->op.rename) {
         req->se->op.rename(req, nodeid, oldname, arg->newdir, newname,
@@ -922,24 +1037,38 @@ static void do_rename2(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_link(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_link(fuse_req_t req, fuse_ino_t nodeid,
+                    struct fuse_mbuf_iter *iter)
 {
-    struct fuse_link_in *arg = (struct fuse_link_in *)inarg;
+    struct fuse_link_in *arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    const char *name = fuse_mbuf_iter_advance_str(iter);
+
+    if (!arg || !name) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
 
     if (req->se->op.link) {
-        req->se->op.link(req, arg->oldnodeid, nodeid, PARAM(arg));
+        req->se->op.link(req, arg->oldnodeid, nodeid, name);
     } else {
         fuse_reply_err(req, ENOSYS);
     }
 }
 
-static void do_create(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_create(fuse_req_t req, fuse_ino_t nodeid,
+                      struct fuse_mbuf_iter *iter)
 {
-    struct fuse_create_in *arg = (struct fuse_create_in *)inarg;
-
     if (req->se->op.create) {
+        struct fuse_create_in *arg;
         struct fuse_file_info fi;
-        char *name = PARAM(arg);
+        const char *name;
+
+        arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+        name = fuse_mbuf_iter_advance_str(iter);
+        if (!arg || !name) {
+            fuse_reply_err(req, EINVAL);
+            return;
+        }
 
         memset(&fi, 0, sizeof(fi));
         fi.flags = arg->flags;
@@ -952,11 +1081,18 @@ static void do_create(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_open(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_open(fuse_req_t req, fuse_ino_t nodeid,
+                    struct fuse_mbuf_iter *iter)
 {
-    struct fuse_open_in *arg = (struct fuse_open_in *)inarg;
+    struct fuse_open_in *arg;
     struct fuse_file_info fi;
 
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
     memset(&fi, 0, sizeof(fi));
     fi.flags = arg->flags;
 
@@ -967,13 +1103,19 @@ static void do_open(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_read(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_read(fuse_req_t req, fuse_ino_t nodeid,
+                    struct fuse_mbuf_iter *iter)
 {
-    struct fuse_read_in *arg = (struct fuse_read_in *)inarg;
-
     if (req->se->op.read) {
+        struct fuse_read_in *arg;
         struct fuse_file_info fi;
 
+        arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+        if (!arg) {
+            fuse_reply_err(req, EINVAL);
+            return;
+        }
+
         memset(&fi, 0, sizeof(fi));
         fi.fh = arg->fh;
         fi.lock_owner = arg->lock_owner;
@@ -984,11 +1122,24 @@ static void do_read(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_write(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_write(fuse_req_t req, fuse_ino_t nodeid,
+                     struct fuse_mbuf_iter *iter)
 {
-    struct fuse_write_in *arg = (struct fuse_write_in *)inarg;
+    struct fuse_write_in *arg;
     struct fuse_file_info fi;
-    char *param;
+    const char *param;
+
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
+    param = fuse_mbuf_iter_advance(iter, arg->size);
+    if (!param) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
 
     memset(&fi, 0, sizeof(fi));
     fi.fh = arg->fh;
@@ -996,7 +1147,6 @@ static void do_write(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 
     fi.lock_owner = arg->lock_owner;
     fi.flags = arg->flags;
-    param = PARAM(arg);
 
     if (req->se->op.write) {
         req->se->op.write(req, nodeid, param, arg->size, arg->offset, &fi);
@@ -1054,11 +1204,18 @@ static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid,
     se->op.write_buf(req, nodeid, pbufv, arg->offset, &fi);
 }
 
-static void do_flush(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_flush(fuse_req_t req, fuse_ino_t nodeid,
+                     struct fuse_mbuf_iter *iter)
 {
-    struct fuse_flush_in *arg = (struct fuse_flush_in *)inarg;
+    struct fuse_flush_in *arg;
     struct fuse_file_info fi;
 
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
     memset(&fi, 0, sizeof(fi));
     fi.fh = arg->fh;
     fi.flush = 1;
@@ -1071,19 +1228,26 @@ static void do_flush(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_release(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_release(fuse_req_t req, fuse_ino_t nodeid,
+                       struct fuse_mbuf_iter *iter)
 {
-    struct fuse_release_in *arg = (struct fuse_release_in *)inarg;
+    struct fuse_release_in *arg;
     struct fuse_file_info fi;
 
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
     memset(&fi, 0, sizeof(fi));
     fi.flags = arg->flags;
     fi.fh = arg->fh;
     fi.flush = (arg->release_flags & FUSE_RELEASE_FLUSH) ? 1 : 0;
     fi.lock_owner = arg->lock_owner;
+
     if (arg->release_flags & FUSE_RELEASE_FLOCK_UNLOCK) {
         fi.flock_release = 1;
-        fi.lock_owner = arg->lock_owner;
     }
 
     if (req->se->op.release) {
@@ -1093,11 +1257,19 @@ static void do_release(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_fsync(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_fsync(fuse_req_t req, fuse_ino_t nodeid,
+                     struct fuse_mbuf_iter *iter)
 {
-    struct fuse_fsync_in *arg = (struct fuse_fsync_in *)inarg;
+    struct fuse_fsync_in *arg;
     struct fuse_file_info fi;
-    int datasync = arg->fsync_flags & 1;
+    int datasync;
+
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+    datasync = arg->fsync_flags & 1;
 
     memset(&fi, 0, sizeof(fi));
     fi.fh = arg->fh;
@@ -1113,11 +1285,18 @@ static void do_fsync(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_opendir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_opendir(fuse_req_t req, fuse_ino_t nodeid,
+                       struct fuse_mbuf_iter *iter)
 {
-    struct fuse_open_in *arg = (struct fuse_open_in *)inarg;
+    struct fuse_open_in *arg;
     struct fuse_file_info fi;
 
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
     memset(&fi, 0, sizeof(fi));
     fi.flags = arg->flags;
 
@@ -1128,11 +1307,18 @@ static void do_opendir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_readdir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_readdir(fuse_req_t req, fuse_ino_t nodeid,
+                       struct fuse_mbuf_iter *iter)
 {
-    struct fuse_read_in *arg = (struct fuse_read_in *)inarg;
+    struct fuse_read_in *arg;
     struct fuse_file_info fi;
 
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
     memset(&fi, 0, sizeof(fi));
     fi.fh = arg->fh;
 
@@ -1143,11 +1329,18 @@ static void do_readdir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_readdirplus(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_readdirplus(fuse_req_t req, fuse_ino_t nodeid,
+                           struct fuse_mbuf_iter *iter)
 {
-    struct fuse_read_in *arg = (struct fuse_read_in *)inarg;
+    struct fuse_read_in *arg;
     struct fuse_file_info fi;
 
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
     memset(&fi, 0, sizeof(fi));
     fi.fh = arg->fh;
 
@@ -1158,11 +1351,18 @@ static void do_readdirplus(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_releasedir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_releasedir(fuse_req_t req, fuse_ino_t nodeid,
+                          struct fuse_mbuf_iter *iter)
 {
-    struct fuse_release_in *arg = (struct fuse_release_in *)inarg;
+    struct fuse_release_in *arg;
     struct fuse_file_info fi;
 
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
     memset(&fi, 0, sizeof(fi));
     fi.flags = arg->flags;
     fi.fh = arg->fh;
@@ -1174,11 +1374,19 @@ static void do_releasedir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_fsyncdir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_fsyncdir(fuse_req_t req, fuse_ino_t nodeid,
+                        struct fuse_mbuf_iter *iter)
 {
-    struct fuse_fsync_in *arg = (struct fuse_fsync_in *)inarg;
+    struct fuse_fsync_in *arg;
     struct fuse_file_info fi;
-    int datasync = arg->fsync_flags & 1;
+    int datasync;
+
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+    datasync = arg->fsync_flags & 1;
 
     memset(&fi, 0, sizeof(fi));
     fi.fh = arg->fh;
@@ -1190,10 +1398,11 @@ static void do_fsyncdir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_statfs(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_statfs(fuse_req_t req, fuse_ino_t nodeid,
+                      struct fuse_mbuf_iter *iter)
 {
     (void)nodeid;
-    (void)inarg;
+    (void)iter;
 
     if (req->se->op.statfs) {
         req->se->op.statfs(req, nodeid);
@@ -1206,11 +1415,25 @@ static void do_statfs(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_setxattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_setxattr(fuse_req_t req, fuse_ino_t nodeid,
+                        struct fuse_mbuf_iter *iter)
 {
-    struct fuse_setxattr_in *arg = (struct fuse_setxattr_in *)inarg;
-    char *name = PARAM(arg);
-    char *value = name + strlen(name) + 1;
+    struct fuse_setxattr_in *arg;
+    const char *name;
+    const char *value;
+
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    name = fuse_mbuf_iter_advance_str(iter);
+    if (!arg || !name) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
+    value = fuse_mbuf_iter_advance(iter, arg->size);
+    if (!value) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
 
     if (req->se->op.setxattr) {
         req->se->op.setxattr(req, nodeid, name, value, arg->size, arg->flags);
@@ -1219,20 +1442,36 @@ static void do_setxattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_getxattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_getxattr(fuse_req_t req, fuse_ino_t nodeid,
+                        struct fuse_mbuf_iter *iter)
 {
-    struct fuse_getxattr_in *arg = (struct fuse_getxattr_in *)inarg;
+    struct fuse_getxattr_in *arg;
+    const char *name;
+
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    name = fuse_mbuf_iter_advance_str(iter);
+    if (!arg || !name) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
 
     if (req->se->op.getxattr) {
-        req->se->op.getxattr(req, nodeid, PARAM(arg), arg->size);
+        req->se->op.getxattr(req, nodeid, name, arg->size);
     } else {
         fuse_reply_err(req, ENOSYS);
     }
 }
 
-static void do_listxattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_listxattr(fuse_req_t req, fuse_ino_t nodeid,
+                         struct fuse_mbuf_iter *iter)
 {
-    struct fuse_getxattr_in *arg = (struct fuse_getxattr_in *)inarg;
+    struct fuse_getxattr_in *arg;
+
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
 
     if (req->se->op.listxattr) {
         req->se->op.listxattr(req, nodeid, arg->size);
@@ -1241,9 +1480,15 @@ static void do_listxattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_removexattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_removexattr(fuse_req_t req, fuse_ino_t nodeid,
+                           struct fuse_mbuf_iter *iter)
 {
-    char *name = (char *)inarg;
+    const char *name = fuse_mbuf_iter_advance_str(iter);
+
+    if (!name) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
 
     if (req->se->op.removexattr) {
         req->se->op.removexattr(req, nodeid, name);
@@ -1267,12 +1512,19 @@ static void convert_fuse_file_lock(struct fuse_file_lock *fl,
     flock->l_pid = fl->pid;
 }
 
-static void do_getlk(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_getlk(fuse_req_t req, fuse_ino_t nodeid,
+                     struct fuse_mbuf_iter *iter)
 {
-    struct fuse_lk_in *arg = (struct fuse_lk_in *)inarg;
+    struct fuse_lk_in *arg;
     struct fuse_file_info fi;
     struct flock flock;
 
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
     memset(&fi, 0, sizeof(fi));
     fi.fh = arg->fh;
     fi.lock_owner = arg->owner;
@@ -1286,12 +1538,18 @@ static void do_getlk(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 }
 
 static void do_setlk_common(fuse_req_t req, fuse_ino_t nodeid,
-                            const void *inarg, int sleep)
+                            struct fuse_mbuf_iter *iter, int sleep)
 {
-    struct fuse_lk_in *arg = (struct fuse_lk_in *)inarg;
+    struct fuse_lk_in *arg;
     struct fuse_file_info fi;
     struct flock flock;
 
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
     memset(&fi, 0, sizeof(fi));
     fi.fh = arg->fh;
     fi.lock_owner = arg->owner;
@@ -1329,14 +1587,16 @@ static void do_setlk_common(fuse_req_t req, fuse_ino_t nodeid,
     }
 }
 
-static void do_setlk(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_setlk(fuse_req_t req, fuse_ino_t nodeid,
+                     struct fuse_mbuf_iter *iter)
 {
-    do_setlk_common(req, nodeid, inarg, 0);
+    do_setlk_common(req, nodeid, iter, 0);
 }
 
-static void do_setlkw(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_setlkw(fuse_req_t req, fuse_ino_t nodeid,
+                      struct fuse_mbuf_iter *iter)
 {
-    do_setlk_common(req, nodeid, inarg, 1);
+    do_setlk_common(req, nodeid, iter, 1);
 }
 
 static int find_interrupted(struct fuse_session *se, struct fuse_req *req)
@@ -1381,12 +1641,20 @@ static int find_interrupted(struct fuse_session *se, struct fuse_req *req)
     return 0;
 }
 
-static void do_interrupt(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_interrupt(fuse_req_t req, fuse_ino_t nodeid,
+                         struct fuse_mbuf_iter *iter)
 {
-    struct fuse_interrupt_in *arg = (struct fuse_interrupt_in *)inarg;
+    struct fuse_interrupt_in *arg;
     struct fuse_session *se = req->se;
 
     (void)nodeid;
+
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
     if (se->debug) {
         fuse_log(FUSE_LOG_DEBUG, "INTERRUPT: %llu\n",
                  (unsigned long long)arg->unique);
@@ -1427,9 +1695,15 @@ static struct fuse_req *check_interrupt(struct fuse_session *se,
     }
 }
 
-static void do_bmap(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_bmap(fuse_req_t req, fuse_ino_t nodeid,
+                    struct fuse_mbuf_iter *iter)
 {
-    struct fuse_bmap_in *arg = (struct fuse_bmap_in *)inarg;
+    struct fuse_bmap_in *arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
 
     if (req->se->op.bmap) {
         req->se->op.bmap(req, nodeid, arg->blocksize, arg->block);
@@ -1438,18 +1712,34 @@ static void do_bmap(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_ioctl(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_ioctl(fuse_req_t req, fuse_ino_t nodeid,
+                     struct fuse_mbuf_iter *iter)
 {
-    struct fuse_ioctl_in *arg = (struct fuse_ioctl_in *)inarg;
-    unsigned int flags = arg->flags;
-    void *in_buf = arg->in_size ? PARAM(arg) : NULL;
+    struct fuse_ioctl_in *arg;
+    unsigned int flags;
+    void *in_buf = NULL;
     struct fuse_file_info fi;
 
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
+    flags = arg->flags;
     if (flags & FUSE_IOCTL_DIR && !(req->se->conn.want & FUSE_CAP_IOCTL_DIR)) {
         fuse_reply_err(req, ENOTTY);
         return;
     }
 
+    if (arg->in_size) {
+        in_buf = fuse_mbuf_iter_advance(iter, arg->in_size);
+        if (!in_buf) {
+            fuse_reply_err(req, EINVAL);
+            return;
+        }
+    }
+
     memset(&fi, 0, sizeof(fi));
     fi.fh = arg->fh;
 
@@ -1470,11 +1760,18 @@ void fuse_pollhandle_destroy(struct fuse_pollhandle *ph)
     free(ph);
 }
 
-static void do_poll(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_poll(fuse_req_t req, fuse_ino_t nodeid,
+                    struct fuse_mbuf_iter *iter)
 {
-    struct fuse_poll_in *arg = (struct fuse_poll_in *)inarg;
+    struct fuse_poll_in *arg;
     struct fuse_file_info fi;
 
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
     memset(&fi, 0, sizeof(fi));
     fi.fh = arg->fh;
     fi.poll_events = arg->events;
@@ -1498,11 +1795,18 @@ static void do_poll(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_fallocate(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_fallocate(fuse_req_t req, fuse_ino_t nodeid,
+                         struct fuse_mbuf_iter *iter)
 {
-    struct fuse_fallocate_in *arg = (struct fuse_fallocate_in *)inarg;
+    struct fuse_fallocate_in *arg;
     struct fuse_file_info fi;
 
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
     memset(&fi, 0, sizeof(fi));
     fi.fh = arg->fh;
 
@@ -1515,12 +1819,17 @@ static void do_fallocate(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 }
 
 static void do_copy_file_range(fuse_req_t req, fuse_ino_t nodeid_in,
-                               const void *inarg)
+                               struct fuse_mbuf_iter *iter)
 {
-    struct fuse_copy_file_range_in *arg =
-        (struct fuse_copy_file_range_in *)inarg;
+    struct fuse_copy_file_range_in *arg;
     struct fuse_file_info fi_in, fi_out;
 
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
     memset(&fi_in, 0, sizeof(fi_in));
     fi_in.fh = arg->fh_in;
 
@@ -1537,11 +1846,17 @@ static void do_copy_file_range(fuse_req_t req, fuse_ino_t nodeid_in,
     }
 }
 
-static void do_lseek(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_lseek(fuse_req_t req, fuse_ino_t nodeid,
+                     struct fuse_mbuf_iter *iter)
 {
-    struct fuse_lseek_in *arg = (struct fuse_lseek_in *)inarg;
+    struct fuse_lseek_in *arg;
     struct fuse_file_info fi;
 
+    arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
     memset(&fi, 0, sizeof(fi));
     fi.fh = arg->fh;
 
@@ -1552,15 +1867,33 @@ static void do_lseek(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     }
 }
 
-static void do_init(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_init(fuse_req_t req, fuse_ino_t nodeid,
+                    struct fuse_mbuf_iter *iter)
 {
-    struct fuse_init_in *arg = (struct fuse_init_in *)inarg;
+    size_t compat_size = offsetof(struct fuse_init_in, max_readahead);
+    struct fuse_init_in *arg;
     struct fuse_init_out outarg;
     struct fuse_session *se = req->se;
     size_t bufsize = se->bufsize;
     size_t outargsize = sizeof(outarg);
 
     (void)nodeid;
+
+    /* First consume the old fields... */
+    arg = fuse_mbuf_iter_advance(iter, compat_size);
+    if (!arg) {
+        fuse_reply_err(req, EINVAL);
+        return;
+    }
+
+    /* ...and now consume the new fields. */
+    if (arg->major == 7 && arg->minor >= 6) {
+        if (!fuse_mbuf_iter_advance(iter, sizeof(*arg) - compat_size)) {
+            fuse_reply_err(req, EINVAL);
+            return;
+        }
+    }
+
     if (se->debug) {
         fuse_log(FUSE_LOG_DEBUG, "INIT: %u.%u\n", arg->major, arg->minor);
         if (arg->major == 7 && arg->minor >= 6) {
@@ -1793,12 +2126,13 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
     send_reply_ok(req, &outarg, outargsize);
 }
 
-static void do_destroy(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_destroy(fuse_req_t req, fuse_ino_t nodeid,
+                       struct fuse_mbuf_iter *iter)
 {
     struct fuse_session *se = req->se;
 
     (void)nodeid;
-    (void)inarg;
+    (void)iter;
 
     se->got_destroy = 1;
     if (se->op.destroy) {
@@ -1979,7 +2313,7 @@ int fuse_req_interrupted(fuse_req_t req)
 }
 
 static struct {
-    void (*func)(fuse_req_t, fuse_ino_t, const void *);
+    void (*func)(fuse_req_t, fuse_ino_t, struct fuse_mbuf_iter *);
     const char *name;
 } fuse_ll_ops[] = {
     [FUSE_LOOKUP] = { do_lookup, "LOOKUP" },
@@ -2063,7 +2397,6 @@ void fuse_session_process_buf_int(struct fuse_session *se,
     const struct fuse_buf *buf = bufv->buf;
     struct fuse_mbuf_iter iter = FUSE_MBUF_ITER_INIT(buf);
     struct fuse_in_header *in;
-    const void *inarg;
     struct fuse_req *req;
     int err;
 
@@ -2141,13 +2474,11 @@ void fuse_session_process_buf_int(struct fuse_session *se,
         }
     }
 
-    inarg = (void *)&in[1];
     if (in->opcode == FUSE_WRITE && se->op.write_buf) {
         do_write_buf(req, in->nodeid, &iter, bufv);
     } else {
-        fuse_ll_ops[in->opcode].func(req, in->nodeid, inarg);
+        fuse_ll_ops[in->opcode].func(req, in->nodeid, &iter);
     }
-
     return;
 
 reply_err:
-- 
2.23.0
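
For readers following the conversion above: each handler now asks the
iterator for exactly sizeof(*arg) bytes and replies EINVAL when the request
is shorter than that. The following is only a minimal sketch of that kind of
bounds-checked cursor. struct demo_iter and demo_iter_advance() are
hypothetical names used for illustration; they are not the fuse_mbuf_iter
implementation from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cursor over a single contiguous request buffer. */
struct demo_iter {
    const uint8_t *mem; /* start of the buffer */
    size_t size;        /* total bytes available */
    size_t pos;         /* bytes consumed so far, always <= size */
};

/*
 * Return a pointer to the next len bytes and step over them, or NULL
 * (leaving the cursor untouched) if fewer than len bytes remain.
 */
static const void *demo_iter_advance(struct demo_iter *it, size_t len)
{
    const void *p;

    assert(it != NULL && it->pos <= it->size);

    /* Compare against the remainder so the check cannot overflow. */
    if (len > it->size - it->pos) {
        return NULL;
    }

    p = it->mem + it->pos;
    it->pos += len;
    return p;
}
```

A handler that needs an old and a new part of a structure, as do_init() does,
would call such a function twice: once for the compat-sized prefix and once
for the remainder.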




* [PATCH 044/104] virtiofsd: prevent ".." escape in lo_do_lookup()
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (42 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 043/104] virtiofsd: check input buffer size in fuse_lowlevel.c ops Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-16 14:33   ` Sergio Lopez
  2019-12-12 16:38 ` [PATCH 045/104] virtiofsd: prevent ".." escape in lo_do_readdir() Dr. David Alan Gilbert (git)
                   ` (62 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 84e9d8916f..cf5b43531a 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -624,12 +624,17 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
     int res;
     int saverr;
     struct lo_data *lo = lo_data(req);
-    struct lo_inode *inode;
+    struct lo_inode *inode, *dir = lo_inode(req, parent);
 
     memset(e, 0, sizeof(*e));
     e->attr_timeout = lo->timeout;
     e->entry_timeout = lo->timeout;
 
+    /* Do not allow escaping root directory */
+    if (dir == &lo->root && strcmp(name, "..") == 0) {
+        name = ".";
+    }
+
     newfd = openat(lo_fd(req, parent), name, O_PATH | O_NOFOLLOW);
     if (newfd == -1) {
         goto out_err;
-- 
2.23.0




* [PATCH 045/104] virtiofsd: prevent ".." escape in lo_do_readdir()
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (43 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 044/104] virtiofsd: prevent ".." escape in lo_do_lookup() Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-16 14:35   ` Sergio Lopez
  2019-12-12 16:38 ` [PATCH 046/104] virtiofsd: use /proc/self/fd/ O_PATH file descriptor Dr. David Alan Gilbert (git)
                   ` (61 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

Construct a fake dirent for the root directory's ".." entry.  This hides
the parent directory from the FUSE client.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 36 +++++++++++++++++++-------------
 1 file changed, 22 insertions(+), 14 deletions(-)
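
The fake dirent described above can be sketched roughly as follows.
demo_dirent_attr() and its parameters are hypothetical and only mirror the
two fields the patch fills in (st_ino and st_mode); the real code builds a
struct fuse_entry_param inside lo_do_readdir().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: attributes reported for one directory entry.
 * For ".." in the shared root, report the root itself instead of the
 * real parent directory.
 */
static struct stat demo_dirent_attr(int dir_is_root, ino_t root_ino,
                                    const char *name, ino_t d_ino,
                                    unsigned char d_type)
{
    struct stat st;

    assert(name != NULL);

    memset(&st, 0, sizeof(st));
    st.st_ino = d_ino;
    /* On Linux, DT_* shifted left by 12 gives the matching S_IF* bits. */
    st.st_mode = (mode_t)d_type << 12;

    if (dir_is_root && strcmp(name, "..") == 0) {
        st.st_ino = root_ino;
        st.st_mode = (mode_t)DT_DIR << 12;
    }

    return st;
}
```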

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index cf5b43531a..123f095990 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1132,19 +1132,25 @@ out_err:
 static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
                           off_t offset, struct fuse_file_info *fi, int plus)
 {
+    struct lo_data *lo = lo_data(req);
     struct lo_dirp *d;
+    struct lo_inode *dinode;
     char *buf = NULL;
     char *p;
     size_t rem = size;
-    int err = ENOMEM;
+    int err = EBADF;
 
-    (void)ino;
+    dinode = lo_inode(req, ino);
+    if (!dinode) {
+        goto error;
+    }
 
     d = lo_dirp(req, fi);
     if (!d) {
         goto error;
     }
 
+    err = ENOMEM;
     buf = calloc(1, size);
     if (!buf) {
         goto error;
@@ -1175,15 +1181,21 @@ static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
         }
         nextoff = d->entry->d_off;
         name = d->entry->d_name;
+
         fuse_ino_t entry_ino = 0;
+        struct fuse_entry_param e = (struct fuse_entry_param){
+            .attr.st_ino = d->entry->d_ino,
+            .attr.st_mode = d->entry->d_type << 12,
+        };
+
+        /* Hide root's parent directory */
+        if (dinode == &lo->root && strcmp(name, "..") == 0) {
+            e.attr.st_ino = lo->root.ino;
+            e.attr.st_mode = DT_DIR << 12;
+        }
+
         if (plus) {
-            struct fuse_entry_param e;
-            if (is_dot_or_dotdot(name)) {
-                e = (struct fuse_entry_param){
-                    .attr.st_ino = d->entry->d_ino,
-                    .attr.st_mode = d->entry->d_type << 12,
-                };
-            } else {
+            if (!is_dot_or_dotdot(name)) {
                 err = lo_do_lookup(req, ino, name, &e);
                 if (err) {
                     goto error;
@@ -1193,11 +1205,7 @@ static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
 
             entsize = fuse_add_direntry_plus(req, p, rem, name, &e, nextoff);
         } else {
-            struct stat st = {
-                .st_ino = d->entry->d_ino,
-                .st_mode = d->entry->d_type << 12,
-            };
-            entsize = fuse_add_direntry(req, p, rem, name, &st, nextoff);
+            entsize = fuse_add_direntry(req, p, rem, name, &e.attr, nextoff);
         }
         if (entsize > rem) {
             if (entry_ino != 0) {
-- 
2.23.0




* [PATCH 046/104] virtiofsd: use /proc/self/fd/ O_PATH file descriptor
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (44 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 045/104] virtiofsd: prevent ".." escape in lo_do_readdir() Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-15 18:09   ` Philippe Mathieu-Daudé
  2019-12-12 16:38 ` [PATCH 047/104] virtiofsd: sandbox mount namespace Dr. David Alan Gilbert (git)
                   ` (60 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

Sandboxing will remove /proc from the mount namespace so we can no
longer build string paths into "/proc/self/fd/...".

Keep an O_PATH file descriptor so we can still re-open fds via
/proc/self/fd.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 129 ++++++++++++++++++++++++-------
 1 file changed, 102 insertions(+), 27 deletions(-)
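
As an illustration of the approach described above, here is a small sketch
that opens /proc/self/fd once with O_PATH and later re-opens another
descriptor relative to it. The demo_* function names are hypothetical; the
patch stores the descriptor in lo->proc_self_fd and makes the equivalent
openat() calls inline.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Open the directory once, while /proc is still reachable by path. */
static int demo_open_proc_self_fd(void)
{
    return open("/proc/self/fd", O_PATH);
}

/*
 * Re-open path_fd (typically an O_PATH descriptor) with real access flags.
 * Only the decimal fd number is looked up, relative to proc_self_fd, so no
 * absolute "/proc/..." path has to resolve at this point.
 */
static int demo_reopen(int proc_self_fd, int path_fd, int flags)
{
    char name[32];

    assert(proc_self_fd >= 0 && path_fd >= 0);

    snprintf(name, sizeof(name), "%d", path_fd);
    return openat(proc_self_fd, name, flags);
}
```

The entries under /proc/self/fd are symbolic links to the open files, so the
flags passed to demo_reopen() must not include O_NOFOLLOW; the patch masks
that flag out in lo_open() for the same reason.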

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 123f095990..006908f25a 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -110,6 +110,9 @@ struct lo_data {
     struct lo_map ino_map; /* protected by lo->mutex */
     struct lo_map dirp_map; /* protected by lo->mutex */
     struct lo_map fd_map; /* protected by lo->mutex */
+
+    /* An O_PATH file descriptor to /proc/self/fd/ */
+    int proc_self_fd;
 };
 
 static const struct fuse_opt lo_opts[] = {
@@ -379,9 +382,9 @@ static int lo_parent_and_name(struct lo_data *lo, struct lo_inode *inode,
     int res;
 
 retry:
-    sprintf(procname, "/proc/self/fd/%i", inode->fd);
+    sprintf(procname, "%i", inode->fd);
 
-    res = readlink(procname, path, PATH_MAX);
+    res = readlinkat(lo->proc_self_fd, procname, path, PATH_MAX);
     if (res < 0) {
         fuse_log(FUSE_LOG_WARNING, "lo_parent_and_name: readlink failed: %m\n");
         goto fail_noretry;
@@ -477,9 +480,9 @@ static int utimensat_empty(struct lo_data *lo, struct lo_inode *inode,
         }
         return res;
     }
-    sprintf(path, "/proc/self/fd/%i", inode->fd);
+    sprintf(path, "%i", inode->fd);
 
-    return utimensat(AT_FDCWD, path, tv, 0);
+    return utimensat(lo->proc_self_fd, path, tv, 0);
 
 fallback:
     res = lo_parent_and_name(lo, inode, path, &parent);
@@ -535,8 +538,8 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         if (fi) {
             res = fchmod(fd, attr->st_mode);
         } else {
-            sprintf(procname, "/proc/self/fd/%i", ifd);
-            res = chmod(procname, attr->st_mode);
+            sprintf(procname, "%i", ifd);
+            res = fchmodat(lo->proc_self_fd, procname, attr->st_mode, 0);
         }
         if (res == -1) {
             goto out_err;
@@ -552,11 +555,23 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         }
     }
     if (valid & FUSE_SET_ATTR_SIZE) {
+        int truncfd;
+
         if (fi) {
-            res = ftruncate(fd, attr->st_size);
+            truncfd = fd;
         } else {
-            sprintf(procname, "/proc/self/fd/%i", ifd);
-            res = truncate(procname, attr->st_size);
+            sprintf(procname, "%i", ifd);
+            truncfd = openat(lo->proc_self_fd, procname, O_RDWR);
+            if (truncfd < 0) {
+                goto out_err;
+            }
+        }
+
+        res = ftruncate(truncfd, attr->st_size);
+        if (!fi) {
+            saverr = errno;
+            close(truncfd);
+            errno = saverr;
         }
         if (res == -1) {
             goto out_err;
@@ -857,9 +872,9 @@ static int linkat_empty_nofollow(struct lo_data *lo, struct lo_inode *inode,
         return res;
     }
 
-    sprintf(path, "/proc/self/fd/%i", inode->fd);
+    sprintf(path, "%i", inode->fd);
 
-    return linkat(AT_FDCWD, path, dfd, name, AT_SYMLINK_FOLLOW);
+    return linkat(lo->proc_self_fd, path, dfd, name, AT_SYMLINK_FOLLOW);
 
 fallback:
     res = lo_parent_and_name(lo, inode, path, &parent);
@@ -1387,8 +1402,8 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
         fi->flags &= ~O_APPEND;
     }
 
-    sprintf(buf, "/proc/self/fd/%i", lo_fd(req, ino));
-    fd = open(buf, fi->flags & ~O_NOFOLLOW);
+    sprintf(buf, "%i", lo_fd(req, ino));
+    fd = openat(lo->proc_self_fd, buf, fi->flags & ~O_NOFOLLOW);
     if (fd == -1) {
         return (void)fuse_reply_err(req, errno);
     }
@@ -1440,8 +1455,8 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
                      struct fuse_file_info *fi)
 {
+    struct lo_data *lo = lo_data(req);
     int res;
-    (void)ino;
     int fd;
     char *buf;
 
@@ -1449,12 +1464,12 @@ static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
              (void *)fi);
 
     if (!fi) {
-        res = asprintf(&buf, "/proc/self/fd/%i", lo_fd(req, ino));
+        res = asprintf(&buf, "%i", lo_fd(req, ino));
         if (res == -1) {
             return (void)fuse_reply_err(req, errno);
         }
 
-        fd = open(buf, O_RDWR);
+        fd = openat(lo->proc_self_fd, buf, O_RDWR);
         free(buf);
         if (fd == -1) {
             return (void)fuse_reply_err(req, errno);
@@ -1570,11 +1585,13 @@ static void lo_flock(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
 static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
                         size_t size)
 {
+    struct lo_data *lo = lo_data(req);
     char *value = NULL;
     char procname[64];
     struct lo_inode *inode;
     ssize_t ret;
     int saverr;
+    int fd = -1;
 
     inode = lo_inode(req, ino);
     if (!inode) {
@@ -1599,7 +1616,11 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
         goto out;
     }
 
-    sprintf(procname, "/proc/self/fd/%i", inode->fd);
+    sprintf(procname, "%i", inode->fd);
+    fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+    if (fd < 0) {
+        goto out_err;
+    }
 
     if (size) {
         value = malloc(size);
@@ -1607,7 +1628,7 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
             goto out_err;
         }
 
-        ret = getxattr(procname, name, value, size);
+        ret = fgetxattr(fd, name, value, size);
         if (ret == -1) {
             goto out_err;
         }
@@ -1618,7 +1639,7 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
 
         fuse_reply_buf(req, value, ret);
     } else {
-        ret = getxattr(procname, name, NULL, 0);
+        ret = fgetxattr(fd, name, NULL, 0);
         if (ret == -1) {
             goto out_err;
         }
@@ -1627,6 +1648,10 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
     }
 out_free:
     free(value);
+
+    if (fd >= 0) {
+        close(fd);
+    }
     return;
 
 out_err:
@@ -1638,11 +1663,13 @@ out:
 
 static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
 {
+    struct lo_data *lo = lo_data(req);
     char *value = NULL;
     char procname[64];
     struct lo_inode *inode;
     ssize_t ret;
     int saverr;
+    int fd = -1;
 
     inode = lo_inode(req, ino);
     if (!inode) {
@@ -1666,7 +1693,11 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
         goto out;
     }
 
-    sprintf(procname, "/proc/self/fd/%i", inode->fd);
+    sprintf(procname, "%i", inode->fd);
+    fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+    if (fd < 0) {
+        goto out_err;
+    }
 
     if (size) {
         value = malloc(size);
@@ -1674,7 +1705,7 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
             goto out_err;
         }
 
-        ret = listxattr(procname, value, size);
+        ret = flistxattr(fd, value, size);
         if (ret == -1) {
             goto out_err;
         }
@@ -1685,7 +1716,7 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
 
         fuse_reply_buf(req, value, ret);
     } else {
-        ret = listxattr(procname, NULL, 0);
+        ret = flistxattr(fd, NULL, 0);
         if (ret == -1) {
             goto out_err;
         }
@@ -1694,6 +1725,10 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
     }
 out_free:
     free(value);
+
+    if (fd >= 0) {
+        close(fd);
+    }
     return;
 
 out_err:
@@ -1707,9 +1742,11 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
                         const char *value, size_t size, int flags)
 {
     char procname[64];
+    struct lo_data *lo = lo_data(req);
     struct lo_inode *inode;
     ssize_t ret;
     int saverr;
+    int fd = -1;
 
     inode = lo_inode(req, ino);
     if (!inode) {
@@ -1734,21 +1771,31 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
         goto out;
     }
 
-    sprintf(procname, "/proc/self/fd/%i", inode->fd);
+    sprintf(procname, "%i", inode->fd);
+    fd = openat(lo->proc_self_fd, procname, O_RDWR);
+    if (fd < 0) {
+        saverr = errno;
+        goto out;
+    }
 
-    ret = setxattr(procname, name, value, size, flags);
+    ret = fsetxattr(fd, name, value, size, flags);
     saverr = ret == -1 ? errno : 0;
 
 out:
+    if (fd >= 0) {
+        close(fd);
+    }
     fuse_reply_err(req, saverr);
 }
 
 static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *name)
 {
     char procname[64];
+    struct lo_data *lo = lo_data(req);
     struct lo_inode *inode;
     ssize_t ret;
     int saverr;
+    int fd = -1;
 
     inode = lo_inode(req, ino);
     if (!inode) {
@@ -1772,12 +1819,20 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *name)
         goto out;
     }
 
-    sprintf(procname, "/proc/self/fd/%i", inode->fd);
+    sprintf(procname, "%i", inode->fd);
+    fd = openat(lo->proc_self_fd, procname, O_RDWR);
+    if (fd < 0) {
+        saverr = errno;
+        goto out;
+    }
 
-    ret = removexattr(procname, name);
+    ret = fremovexattr(fd, name);
     saverr = ret == -1 ? errno : 0;
 
 out:
+    if (fd >= 0) {
+        close(fd);
+    }
     fuse_reply_err(req, saverr);
 }
 
@@ -1870,12 +1925,25 @@ static void print_capabilities(void)
     printf("}\n");
 }
 
+static void setup_proc_self_fd(struct lo_data *lo)
+{
+    lo->proc_self_fd = open("/proc/self/fd", O_PATH);
+    if (lo->proc_self_fd == -1) {
+        fuse_log(FUSE_LOG_ERR, "open(/proc/self/fd, O_PATH): %m\n");
+        exit(1);
+    }
+}
+
 int main(int argc, char *argv[])
 {
     struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
     struct fuse_session *se;
     struct fuse_cmdline_opts opts;
-    struct lo_data lo = { .debug = 0, .writeback = 0 };
+    struct lo_data lo = {
+        .debug = 0,
+        .writeback = 0,
+        .proc_self_fd = -1,
+    };
     struct lo_map_elem *root_elem;
     int ret = -1;
 
@@ -1986,6 +2054,9 @@ int main(int argc, char *argv[])
 
     fuse_daemonize(opts.foreground);
 
+    /* Must be after daemonize to get the right /proc/self/fd */
+    setup_proc_self_fd(&lo);
+
     /* Block until ctrl+c or fusermount -u */
     ret = virtio_loop(se);
 
@@ -2001,6 +2072,10 @@ err_out1:
     lo_map_destroy(&lo.dirp_map);
     lo_map_destroy(&lo.ino_map);
 
+    if (lo.proc_self_fd >= 0) {
+        close(lo.proc_self_fd);
+    }
+
     if (lo.root.fd >= 0) {
         close(lo.root.fd);
     }
-- 
2.23.0




* [PATCH 047/104] virtiofsd: sandbox mount namespace
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (45 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 046/104] virtiofsd: use /proc/self/fd/ O_PATH file descriptor Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-06 14:36   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 048/104] virtiofsd: move to an empty network namespace Dr. David Alan Gilbert (git)
                   ` (59 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

Use a mount namespace with the shared directory tree mounted at "/" and
no other mounts.

This prevents symlink escape attacks because symlink targets are
resolved only against the shared directory and cannot go outside it.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Peng Tao <tao.peng@linux.alibaba.com>
---
 tools/virtiofsd/passthrough_ll.c | 89 ++++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)
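
The sequence described above (new mount namespace, bind mount of the shared
directory, pivot_root) can be condensed into one function for reference.
This is a sketch only: demo_enter_private_root() is a hypothetical name, it
returns a negative errno value where the patch logs and calls exit(1), and
the up-front stat() check is an addition that is not in the patch. It needs
CAP_SYS_ADMIN to get past unshare().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: make source the root of a private mount namespace. */
static int demo_enter_private_root(const char *source)
{
    struct stat st;
    int oldroot;
    int newroot;
    int ret = 0;

    assert(source != NULL);

    /* Validate the argument before touching any namespace state. */
    if (stat(source, &st) < 0) {
        return -errno;
    }
    if (!S_ISDIR(st.st_mode)) {
        return -ENOTDIR;
    }

    if (unshare(CLONE_NEWNS) < 0) {
        return -errno;
    }
    /* Keep our mount changes from propagating to the parent namespace. */
    if (mount(NULL, "/", NULL, MS_REC | MS_SLAVE, NULL) < 0) {
        return -errno;
    }
    /* pivot_root() requires the new root to be a mount point. */
    if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
        return -errno;
    }

    oldroot = open("/", O_DIRECTORY | O_RDONLY | O_CLOEXEC);
    if (oldroot < 0) {
        return -errno;
    }
    newroot = open(source, O_DIRECTORY | O_RDONLY | O_CLOEXEC);
    if (newroot < 0) {
        ret = -errno;
        goto out_old;
    }

    /*
     * pivot_root(".", ".") stacks the old root on top of the new one;
     * the old root is then detached through the saved descriptor.
     */
    if (fchdir(newroot) < 0 ||
        syscall(SYS_pivot_root, ".", ".") < 0 ||
        fchdir(oldroot) < 0 ||
        mount("", ".", "", MS_SLAVE | MS_REC, NULL) < 0 ||
        umount2(".", MNT_DETACH) < 0 ||
        fchdir(newroot) < 0) {
        ret = -errno;
    }

    close(newroot);
out_old:
    close(oldroot);
    return ret;
}
```

Once this has succeeded, an absolute symlink target such as "/etc/passwd"
resolves inside the shared directory, because that directory is now "/".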

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 006908f25a..606824f002 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -49,6 +49,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include <sys/file.h>
+#include <sys/mount.h>
 #include <sys/syscall.h>
 #include <sys/xattr.h>
 #include <unistd.h>
@@ -1925,6 +1926,58 @@ static void print_capabilities(void)
     printf("}\n");
 }
 
+/* This magic is based on lxc's lxc_pivot_root() */
+static void setup_pivot_root(const char *source)
+{
+    int oldroot;
+    int newroot;
+
+    oldroot = open("/", O_DIRECTORY | O_RDONLY | O_CLOEXEC);
+    if (oldroot < 0) {
+        fuse_log(FUSE_LOG_ERR, "open(/): %m\n");
+        exit(1);
+    }
+
+    newroot = open(source, O_DIRECTORY | O_RDONLY | O_CLOEXEC);
+    if (newroot < 0) {
+        fuse_log(FUSE_LOG_ERR, "open(%s): %m\n", source);
+        exit(1);
+    }
+
+    if (fchdir(newroot) < 0) {
+        fuse_log(FUSE_LOG_ERR, "fchdir(newroot): %m\n");
+        exit(1);
+    }
+
+    if (syscall(__NR_pivot_root, ".", ".") < 0) {
+        fuse_log(FUSE_LOG_ERR, "pivot_root(., .): %m\n");
+        exit(1);
+    }
+
+    if (fchdir(oldroot) < 0) {
+        fuse_log(FUSE_LOG_ERR, "fchdir(oldroot): %m\n");
+        exit(1);
+    }
+
+    if (mount("", ".", "", MS_SLAVE | MS_REC, NULL) < 0) {
+        fuse_log(FUSE_LOG_ERR, "mount(., MS_SLAVE | MS_REC): %m\n");
+        exit(1);
+    }
+
+    if (umount2(".", MNT_DETACH) < 0) {
+        fuse_log(FUSE_LOG_ERR, "umount2(., MNT_DETACH): %m\n");
+        exit(1);
+    }
+
+    if (fchdir(newroot) < 0) {
+        fuse_log(FUSE_LOG_ERR, "fchdir(newroot): %m\n");
+        exit(1);
+    }
+
+    close(newroot);
+    close(oldroot);
+}
+
 static void setup_proc_self_fd(struct lo_data *lo)
 {
     lo->proc_self_fd = open("/proc/self/fd", O_PATH);
@@ -1934,6 +1987,39 @@ static void setup_proc_self_fd(struct lo_data *lo)
     }
 }
 
+/*
+ * Make the source directory our root so symlinks cannot escape and no other
+ * files are accessible.
+ */
+static void setup_mount_namespace(const char *source)
+{
+    if (unshare(CLONE_NEWNS) != 0) {
+        fuse_log(FUSE_LOG_ERR, "unshare(CLONE_NEWNS): %m\n");
+        exit(1);
+    }
+
+    if (mount(NULL, "/", NULL, MS_REC | MS_SLAVE, NULL) < 0) {
+        fuse_log(FUSE_LOG_ERR, "mount(/, MS_REC|MS_PRIVATE): %m\n");
+        exit(1);
+    }
+
+    if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
+        fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source);
+        exit(1);
+    }
+
+    setup_pivot_root(source);
+}
+
+/*
+ * Lock down this process to prevent access to other processes or files outside
+ * source directory.  This reduces the impact of arbitrary code execution bugs.
+ */
+static void setup_sandbox(struct lo_data *lo)
+{
+    setup_mount_namespace(lo->source);
+}
+
 int main(int argc, char *argv[])
 {
     struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
@@ -2034,6 +2120,7 @@ int main(int argc, char *argv[])
     }
 
     lo.root.fd = open(lo.source, O_PATH);
+
     if (lo.root.fd == -1) {
         fuse_log(FUSE_LOG_ERR, "open(\"%s\", O_PATH): %m\n", lo.source);
         exit(1);
@@ -2057,6 +2144,8 @@ int main(int argc, char *argv[])
     /* Must be after daemonize to get the right /proc/self/fd */
     setup_proc_self_fd(&lo);
 
+    setup_sandbox(&lo);
+
     /* Block until ctrl+c or fusermount -u */
     ret = virtio_loop(se);
 
-- 
2.23.0




* [PATCH 048/104] virtiofsd: move to an empty network namespace
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (46 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 047/104] virtiofsd: sandbox mount namespace Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-06 14:37   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 049/104] virtiofsd: move to a new pid namespace Dr. David Alan Gilbert (git)
                   ` (58 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

If the process is compromised there should be no network access.  Use an
empty network namespace to sandbox networking.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
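
A small sketch of the ordering this relies on: descriptors created before
unshare(CLONE_NEWNET) remain usable afterwards, and only new network
activity is confined to the empty namespace. demo_isolate_network() is a
hypothetical name, and the socketpair() here merely stands in for the
vhost-user socket that virtiofsd has already set up by this point.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a connected AF_UNIX pair first, then move
 * to an empty network namespace.  Returns 0 or a negative errno value;
 * unshare() typically fails with EPERM without CAP_SYS_ADMIN.
 */
static int demo_isolate_network(int sv[2])
{
    assert(sv != NULL);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -errno;
    }

    if (unshare(CLONE_NEWNET) < 0) {
        return -errno;
    }

    return 0;
}
```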

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 606824f002..135a99f2fd 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1926,6 +1926,19 @@ static void print_capabilities(void)
     printf("}\n");
 }
 
+/*
+ * Called after our UNIX domain sockets have been created, now we can move to
+ * an empty network namespace to prevent TCP/IP and other network activity in
+ * case this process is compromised.
+ */
+static void setup_net_namespace(void)
+{
+    if (unshare(CLONE_NEWNET) != 0) {
+        fuse_log(FUSE_LOG_ERR, "unshare(CLONE_NEWNET): %m\n");
+        exit(1);
+    }
+}
+
 /* This magic is based on lxc's lxc_pivot_root() */
 static void setup_pivot_root(const char *source)
 {
@@ -2017,6 +2030,7 @@ static void setup_mount_namespace(const char *source)
  */
 static void setup_sandbox(struct lo_data *lo)
 {
+    setup_net_namespace();
     setup_mount_namespace(lo->source);
 }
 
-- 
2.23.0




* [PATCH 049/104] virtiofsd: move to a new pid namespace
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (47 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 048/104] virtiofsd: move to an empty network namespace Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-06 14:40   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 050/104] virtiofsd: add seccomp whitelist Dr. David Alan Gilbert (git)
                   ` (57 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

virtiofsd needs access to /proc/self/fd.  Let's move to a new pid
namespace so that a compromised process cannot see any other
processes running on the system.

One wrinkle in this approach: unshare(CLONE_NEWPID) affects *child*
processes and not the current process.  Therefore we need to fork the
pid 1 process that will actually run virtiofsd and leave a parent in
waitpid(2).  This is not the same thing as daemonization and parent
processes should not notice a difference.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 134 ++++++++++++++++++++-----------
 1 file changed, 86 insertions(+), 48 deletions(-)
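
The fork-and-wait arrangement described above, reduced to a sketch.
demo_fork_and_wait() and demo_child_body() are hypothetical names; the patch
does this inline in setup_namespaces(), exits instead of returning, and also
checks se->exited in the wait loop. Passing 0 for unshare_flags skips the
unshare() call, which lets the pattern run without privileges.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: optionally unshare namespaces, then run child_fn in
 * a forked child (pid 1 of the new pid namespace if CLONE_NEWPID was given)
 * while the parent just waits.  Returns the child's exit status, 1 if it
 * did not exit normally, or a negative errno value on failure.
 */
static int demo_fork_and_wait(int unshare_flags, int (*child_fn)(void))
{
    pid_t child;
    pid_t waited;
    int wstatus = 0;

    assert(child_fn != NULL);

    /* CLONE_NEWPID only takes effect for children created after this. */
    if (unshare_flags != 0 && unshare(unshare_flags) < 0) {
        return -errno;
    }

    child = fork();
    if (child < 0) {
        return -errno;
    }
    if (child == 0) {
        /* Ask for SIGTERM if the parent goes away, see prctl(2). */
        prctl(PR_SET_PDEATHSIG, SIGTERM);
        _exit(child_fn());
    }

    /* The parent only waits, retrying if interrupted by a signal. */
    do {
        waited = waitpid(child, &wstatus, 0);
    } while (waited < 0 && errno == EINTR);

    if (waited < 0) {
        return -errno;
    }
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : 1;
}

/* Example child body used to exercise the helper. */
static int demo_child_body(void)
{
    return 42;
}
```

With CLONE_NEWPID | CLONE_NEWNS in unshare_flags the child would still have
to remount /proc, as the patch does, before /proc/self/fd reflects the new
pid namespace.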

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 135a99f2fd..754ef2618b 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -50,7 +50,10 @@
 #include <string.h>
 #include <sys/file.h>
 #include <sys/mount.h>
+#include <sys/prctl.h>
 #include <sys/syscall.h>
+#include <sys/types.h>
+#include <sys/wait.h>
 #include <sys/xattr.h>
 #include <unistd.h>
 
@@ -1927,24 +1930,95 @@ static void print_capabilities(void)
 }
 
 /*
- * Called after our UNIX domain sockets have been created, now we can move to
- * an empty network namespace to prevent TCP/IP and other network activity in
- * case this process is compromised.
+ * Move to new mount, net, and pid namespaces to isolate this process.
  */
-static void setup_net_namespace(void)
+static void setup_namespaces(struct lo_data *lo, struct fuse_session *se)
 {
-    if (unshare(CLONE_NEWNET) != 0) {
-        fuse_log(FUSE_LOG_ERR, "unshare(CLONE_NEWNET): %m\n");
+    pid_t child;
+
+    /*
+     * Create a new pid namespace for *child* processes.  We'll have to
+     * fork in order to enter the new pid namespace.  A new mount namespace
+     * is also needed so that we can remount /proc for the new pid
+     * namespace.
+     *
+     * Our UNIX domain sockets have been created.  Now we can move to
+     * an empty network namespace to prevent TCP/IP and other network
+     * activity in case this process is compromised.
+     */
+    if (unshare(CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWNET) != 0) {
+        fuse_log(FUSE_LOG_ERR, "unshare(CLONE_NEWPID|CLONE_NEWNS|CLONE_NEWNET): %m\n");
+        exit(1);
+    }
+
+    child = fork();
+    if (child < 0) {
+        fuse_log(FUSE_LOG_ERR, "fork() failed: %m\n");
+        exit(1);
+    }
+    if (child > 0) {
+        pid_t waited;
+        int wstatus;
+
+        /* The parent waits for the child */
+        do {
+            waited = waitpid(child, &wstatus, 0);
+        } while (waited < 0 && errno == EINTR && !se->exited);
+
+        /* We were terminated by a signal, see fuse_signals.c */
+        if (se->exited) {
+            exit(0);
+        }
+
+        if (WIFEXITED(wstatus)) {
+            exit(WEXITSTATUS(wstatus));
+        }
+
+        exit(1);
+    }
+
+    /* Send us SIGTERM when the parent thread terminates, see prctl(2) */
+    prctl(PR_SET_PDEATHSIG, SIGTERM);
+
+    /*
+     * If the mounts have shared propagation then we want to opt out so our
+     * mount changes don't affect the parent mount namespace.
+     */
+    if (mount(NULL, "/", NULL, MS_REC | MS_SLAVE, NULL) < 0) {
+        fuse_log(FUSE_LOG_ERR, "mount(/, MS_REC|MS_SLAVE): %m\n");
+        exit(1);
+    }
+
+    /* The child must remount /proc to use the new pid namespace */
+    if (mount("proc", "/proc", "proc",
+              MS_NODEV | MS_NOEXEC | MS_NOSUID | MS_RELATIME, NULL) < 0) {
+        fuse_log(FUSE_LOG_ERR, "mount(/proc): %m\n");
+        exit(1);
+    }
+
+    /* Now we can get our /proc/self/fd directory file descriptor */
+    lo->proc_self_fd = open("/proc/self/fd", O_PATH);
+    if (lo->proc_self_fd == -1) {
+        fuse_log(FUSE_LOG_ERR, "open(/proc/self/fd, O_PATH): %m\n");
         exit(1);
     }
 }
 
-/* This magic is based on lxc's lxc_pivot_root() */
-static void setup_pivot_root(const char *source)
+/*
+ * Make the source directory our root so symlinks cannot escape and no other
+ * files are accessible.  Assumes unshare(CLONE_NEWNS) was already called.
+ */
+static void setup_mounts(const char *source)
 {
     int oldroot;
     int newroot;
 
+    if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
+        fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source);
+        exit(1);
+    }
+
+    /* This magic is based on lxc's lxc_pivot_root() */
     oldroot = open("/", O_DIRECTORY | O_RDONLY | O_CLOEXEC);
     if (oldroot < 0) {
         fuse_log(FUSE_LOG_ERR, "open(/): %m\n");
@@ -1991,47 +2065,14 @@ static void setup_pivot_root(const char *source)
     close(oldroot);
 }
 
-static void setup_proc_self_fd(struct lo_data *lo)
-{
-    lo->proc_self_fd = open("/proc/self/fd", O_PATH);
-    if (lo->proc_self_fd == -1) {
-        fuse_log(FUSE_LOG_ERR, "open(/proc/self/fd, O_PATH): %m\n");
-        exit(1);
-    }
-}
-
-/*
- * Make the source directory our root so symlinks cannot escape and no other
- * files are accessible.
- */
-static void setup_mount_namespace(const char *source)
-{
-    if (unshare(CLONE_NEWNS) != 0) {
-        fuse_log(FUSE_LOG_ERR, "unshare(CLONE_NEWNS): %m\n");
-        exit(1);
-    }
-
-    if (mount(NULL, "/", NULL, MS_REC | MS_SLAVE, NULL) < 0) {
-        fuse_log(FUSE_LOG_ERR, "mount(/, MS_REC|MS_PRIVATE): %m\n");
-        exit(1);
-    }
-
-    if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
-        fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source);
-        exit(1);
-    }
-
-    setup_pivot_root(source);
-}
-
 /*
  * Lock down this process to prevent access to other processes or files outside
  * source directory.  This reduces the impact of arbitrary code execution bugs.
  */
-static void setup_sandbox(struct lo_data *lo)
+static void setup_sandbox(struct lo_data *lo, struct fuse_session *se)
 {
-    setup_net_namespace();
-    setup_mount_namespace(lo->source);
+    setup_namespaces(lo, se);
+    setup_mounts(lo->source);
 }
 
 int main(int argc, char *argv[])
@@ -2155,10 +2196,7 @@ int main(int argc, char *argv[])
 
     fuse_daemonize(opts.foreground);
 
-    /* Must be after daemonize to get the right /proc/self/fd */
-    setup_proc_self_fd(&lo);
-
-    setup_sandbox(&lo);
+    setup_sandbox(&lo, se);
 
     /* Block until ctrl+c or fusermount -u */
     ret = virtio_loop(se);
-- 
2.23.0




* [PATCH 050/104] virtiofsd: add seccomp whitelist
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (48 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 049/104] virtiofsd: move to a new pid namespace Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-06 14:56   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 051/104] virtiofsd: Parse flag FUSE_WRITE_KILL_PRIV Dr. David Alan Gilbert (git)
                   ` (56 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

Only allow system calls that are needed by virtiofsd.  All other system
calls cause SIGSYS to be directed at the thread and the process will
coredump.

Restricting system calls reduces the kernel attack surface and limits
what the process can do when compromised.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
with additional entries by:
Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: piaojun <piaojun@huawei.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
---
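Illustration only, not part of the patch: the userfaultfd rule in
setup_seccomp() makes one syscall fail with ENOSYS instead of killing the
process.  A minimal sketch of the same idea follows, using raw seccomp-BPF
through prctl(2) rather than libseccomp so that it builds without extra
libraries.  demo_errno_rule() and the choice of getppid() as the denied
call are assumptions made for the demo.  The sketch also omits the
seccomp_data.arch check that libseccomp generates, so it should not be
copied into a production filter.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: fork a child, make getppid() fail with ENOSYS there
 * and allow every other syscall.  Returns 0 if the rule took effect, 1 if
 * it did not, 2 if the filter could not be installed, -1 on fork/wait error.
 */
static int demo_errno_rule(void)
{
    struct sock_filter insns[] = {
        /* A = syscall number */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                 offsetof(struct seccomp_data, nr)),
        /* getppid: fall through to the ERRNO return; others: skip it */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_getppid, 0, 1),
        BPF_STMT(BPF_RET | BPF_K,
                 SECCOMP_RET_ERRNO | (ENOSYS & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = sizeof(insns) / sizeof(insns[0]),
        .filter = insns,
    };
    int wstatus;
    pid_t child = fork();

    if (child < 0) {
        return -1;
    }
    if (child == 0) {
        /* Needed so an unprivileged process may load a filter */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) < 0 ||
            prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) < 0) {
            _exit(2);
        }
        errno = 0;
        _exit(syscall(SYS_getppid) == -1 && errno == ENOSYS ? 0 : 1);
    }
    if (waitpid(child, &wstatus, 0) != child || !WIFEXITED(wstatus)) {
        return -1;
    }
    return WEXITSTATUS(wstatus);
}
```

The filter is loaded in a forked child because a seccomp filter cannot be
removed once it is installed.
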
 Makefile                         |   2 +
 tools/virtiofsd/Makefile.objs    |   5 +-
 tools/virtiofsd/passthrough_ll.c |   2 +
 tools/virtiofsd/seccomp.c        | 141 +++++++++++++++++++++++++++++++
 tools/virtiofsd/seccomp.h        |  14 +++
 5 files changed, 163 insertions(+), 1 deletion(-)
 create mode 100644 tools/virtiofsd/seccomp.c
 create mode 100644 tools/virtiofsd/seccomp.h

diff --git a/Makefile b/Makefile
index 8a5746d8a0..3f5d04e1f7 100644
--- a/Makefile
+++ b/Makefile
@@ -322,8 +322,10 @@ HELPERS-y =
 HELPERS-$(call land,$(CONFIG_SOFTMMU),$(CONFIG_LINUX)) = qemu-bridge-helper$(EXESUF)
 
 ifdef CONFIG_LINUX
+ifdef CONFIG_SECCOMP
 HELPERS-y += virtiofsd$(EXESUF)
 vhost-user-json-y += tools/virtiofsd/50-qemu-virtiofsd.json
+endif
 
 ifdef CONFIG_VIRGL
 ifdef CONFIG_GBM
diff --git a/tools/virtiofsd/Makefile.objs b/tools/virtiofsd/Makefile.objs
index 67be16332c..941b19f18e 100644
--- a/tools/virtiofsd/Makefile.objs
+++ b/tools/virtiofsd/Makefile.objs
@@ -6,5 +6,8 @@ virtiofsd-obj-y = buffer.o \
                   fuse_signals.o \
                   fuse_virtio.o \
                   helper.o \
-                  passthrough_ll.o
+                  passthrough_ll.o \
+                  seccomp.o
 
+seccomp.o-cflags := $(SECCOMP_CFLAGS)
+seccomp.o-libs := $(SECCOMP_LIBS)
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 754ef2618b..701608c6df 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -58,6 +58,7 @@
 #include <unistd.h>
 
 #include "passthrough_helpers.h"
+#include "seccomp.h"
 
 #define HAVE_POSIX_FALLOCATE 1
 struct lo_map_elem {
@@ -2073,6 +2074,7 @@ static void setup_sandbox(struct lo_data *lo, struct fuse_session *se)
 {
     setup_namespaces(lo, se);
     setup_mounts(lo->source);
+    setup_seccomp();
 }
 
 int main(int argc, char *argv[])
diff --git a/tools/virtiofsd/seccomp.c b/tools/virtiofsd/seccomp.c
new file mode 100644
index 0000000000..6359bb55bb
--- /dev/null
+++ b/tools/virtiofsd/seccomp.c
@@ -0,0 +1,141 @@
+/*
+ * Seccomp sandboxing for virtiofsd
+ *
+ * Copyright (C) 2019 Red Hat, Inc.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "seccomp.h"
+#include "fuse_i.h"
+#include "fuse_log.h"
+#include <errno.h>
+#include <glib.h>
+#include <seccomp.h>
+#include <stdlib.h>
+
+/* Bodge for libseccomp 2.4.2 which broke ppoll */
+#if !defined(__SNR_ppoll) && defined(__SNR_brk)
+#ifdef __NR_ppoll
+#define __SNR_ppoll __NR_ppoll
+#else
+#define __SNR_ppoll __PNR_ppoll
+#endif
+#endif
+
+static const int syscall_whitelist[] = {
+    /* TODO ireg sem*() syscalls */
+    SCMP_SYS(brk),
+    SCMP_SYS(capget), /* For CAP_FSETID */
+    SCMP_SYS(capset),
+    SCMP_SYS(clock_gettime),
+    SCMP_SYS(clone),
+    SCMP_SYS(close),
+    SCMP_SYS(copy_file_range),
+    SCMP_SYS(dup),
+    SCMP_SYS(eventfd2),
+    SCMP_SYS(exit),
+    SCMP_SYS(exit_group),
+    SCMP_SYS(fallocate),
+    SCMP_SYS(fchmodat),
+    SCMP_SYS(fchownat),
+    SCMP_SYS(fcntl),
+    SCMP_SYS(fdatasync),
+    SCMP_SYS(fgetxattr),
+    SCMP_SYS(flistxattr),
+    SCMP_SYS(flock),
+    SCMP_SYS(fremovexattr),
+    SCMP_SYS(fsetxattr),
+    SCMP_SYS(fstat),
+    SCMP_SYS(fstatfs),
+    SCMP_SYS(fsync),
+    SCMP_SYS(ftruncate),
+    SCMP_SYS(futex),
+    SCMP_SYS(getdents),
+    SCMP_SYS(getdents64),
+    SCMP_SYS(getegid),
+    SCMP_SYS(geteuid),
+    SCMP_SYS(getpid),
+    SCMP_SYS(gettid),
+    SCMP_SYS(gettimeofday),
+    SCMP_SYS(linkat),
+    SCMP_SYS(lseek),
+    SCMP_SYS(madvise),
+    SCMP_SYS(mkdirat),
+    SCMP_SYS(mknodat),
+    SCMP_SYS(mmap),
+    SCMP_SYS(mprotect),
+    SCMP_SYS(mremap),
+    SCMP_SYS(munmap),
+    SCMP_SYS(newfstatat),
+    SCMP_SYS(open),
+    SCMP_SYS(openat),
+    SCMP_SYS(ppoll),
+    SCMP_SYS(prctl), /* TODO restrict to just PR_SET_NAME? */
+    SCMP_SYS(preadv),
+    SCMP_SYS(pread64),
+    SCMP_SYS(pwritev),
+    SCMP_SYS(pwrite64),
+    SCMP_SYS(read),
+    SCMP_SYS(readlinkat),
+    SCMP_SYS(recvmsg),
+    SCMP_SYS(renameat),
+    SCMP_SYS(renameat2),
+    SCMP_SYS(rt_sigaction),
+    SCMP_SYS(rt_sigprocmask),
+    SCMP_SYS(rt_sigreturn),
+    SCMP_SYS(sendmsg),
+    SCMP_SYS(setresgid),
+    SCMP_SYS(setresuid),
+    SCMP_SYS(set_robust_list),
+    SCMP_SYS(symlinkat),
+    SCMP_SYS(time), /* Rarely needed, except on static builds */
+    SCMP_SYS(tgkill),
+    SCMP_SYS(unlinkat),
+    SCMP_SYS(utimensat),
+    SCMP_SYS(write),
+    SCMP_SYS(writev),
+};
+
+void setup_seccomp(void)
+{
+    scmp_filter_ctx ctx;
+    size_t i;
+
+#ifdef SCMP_ACT_KILL_PROCESS
+    ctx = seccomp_init(SCMP_ACT_KILL_PROCESS);
+    /* Handle a newer libseccomp but an older kernel */
+    if (!ctx && errno == EOPNOTSUPP) {
+        ctx = seccomp_init(SCMP_ACT_TRAP);
+    }
+#else
+    ctx = seccomp_init(SCMP_ACT_TRAP);
+#endif
+    if (!ctx) {
+        fuse_log(FUSE_LOG_ERR, "seccomp_init() failed\n");
+        exit(1);
+    }
+
+    for (i = 0; i < G_N_ELEMENTS(syscall_whitelist); i++) {
+        if (seccomp_rule_add(ctx, SCMP_ACT_ALLOW,
+                             syscall_whitelist[i], 0) != 0) {
+            fuse_log(FUSE_LOG_ERR, "seccomp_rule_add syscall %d failed\n",
+                     syscall_whitelist[i]);
+            exit(1);
+        }
+    }
+
+    /* libvhost-user calls this for post-copy migration, we don't need it */
+    if (seccomp_rule_add(ctx, SCMP_ACT_ERRNO(ENOSYS),
+                         SCMP_SYS(userfaultfd), 0) != 0) {
+        fuse_log(FUSE_LOG_ERR, "seccomp_rule_add userfaultfd failed\n");
+        exit(1);
+    }
+
+    if (seccomp_load(ctx) < 0) {
+        fuse_log(FUSE_LOG_ERR, "seccomp_load() failed\n");
+        exit(1);
+    }
+
+    seccomp_release(ctx);
+}
diff --git a/tools/virtiofsd/seccomp.h b/tools/virtiofsd/seccomp.h
new file mode 100644
index 0000000000..86bce72652
--- /dev/null
+++ b/tools/virtiofsd/seccomp.h
@@ -0,0 +1,14 @@
+/*
+ * Seccomp sandboxing for virtiofsd
+ *
+ * Copyright (C) 2019 Red Hat, Inc.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef VIRTIOFSD_SECCOMP_H
+#define VIRTIOFSD_SECCOMP_H
+
+void setup_seccomp(void);
+
+#endif /* VIRTIOFSD_SECCOMP_H */
-- 
2.23.0




* [PATCH 051/104] virtiofsd: Parse flag FUSE_WRITE_KILL_PRIV
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (49 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 050/104] virtiofsd: add seccomp whitelist Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-15 12:06   ` Misono Tomohiro
  2020-01-16 14:37   ` Sergio Lopez
  2019-12-12 16:38 ` [PATCH 052/104] virtiofsd: cap-ng helpers Dr. David Alan Gilbert (git)
                   ` (55 subsequent siblings)
  106 siblings, 2 replies; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Vivek Goyal <vgoyal@redhat.com>

The caller can set FUSE_WRITE_KILL_PRIV in write_flags.  Parse it and pass
it to the filesystem.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
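Illustration only, not part of the patch: a minimal sketch of why the !!
matters.  A 1-bit unsigned bit-field keeps only the lowest bit of the value
assigned to it, so a flag that lives in a higher bit has to be normalised
to 0 or 1 first.  The DEMO_* constants and struct demo_file_info are
hypothetical stand-ins; the flag positions are assumed to mirror
FUSE_WRITE_CACHE (bit 0) and FUSE_WRITE_KILL_PRIV (bit 2).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the FUSE write flags; values are assumed */
#define DEMO_WRITE_CACHE     (1u << 0)
#define DEMO_WRITE_KILL_PRIV (1u << 2)

/* The flag is above bit 0, so it cannot be stored in 1 bit unnormalised */
static_assert(DEMO_WRITE_KILL_PRIV > 1, "flag must be normalised with !!");

/* Hypothetical cut-down version of struct fuse_file_info */
struct demo_file_info {
    unsigned int writepage:1;
    unsigned int kill_priv:1;
};

/* Parse write_flags with !!, so each field ends up as 0 or 1 */
static struct demo_file_info demo_parse(uint32_t write_flags)
{
    struct demo_file_info fi = { 0 };

    fi.writepage = !!(write_flags & DEMO_WRITE_CACHE);
    fi.kill_priv = !!(write_flags & DEMO_WRITE_KILL_PRIV);
    return fi;
}

/* Same parse without !!: the value 4 is reduced modulo 2 and is lost */
static struct demo_file_info demo_parse_truncating(uint32_t write_flags)
{
    struct demo_file_info fi = { 0 };

    fi.writepage = write_flags & DEMO_WRITE_CACHE;
    fi.kill_priv = write_flags & DEMO_WRITE_KILL_PRIV;
    return fi;
}
```

The truncating variant happens to work for a flag in bit 0, which is why
the missing !! on writepage went unnoticed.
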
 tools/virtiofsd/fuse_common.h   | 6 +++++-
 tools/virtiofsd/fuse_lowlevel.c | 4 +++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
index 147c043bd9..1e8191b7a6 100644
--- a/tools/virtiofsd/fuse_common.h
+++ b/tools/virtiofsd/fuse_common.h
@@ -93,8 +93,12 @@ struct fuse_file_info {
      */
     unsigned int cache_readdir:1;
 
+    /* Indicates that suid/sgid bits should be removed upon write */
+    unsigned int kill_priv:1;
+
+
     /** Padding.  Reserved for future use*/
-    unsigned int padding:25;
+    unsigned int padding:24;
     unsigned int padding2:32;
 
     /*
diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index bd5ca2f157..c8a3b1597a 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -1144,6 +1144,7 @@ static void do_write(fuse_req_t req, fuse_ino_t nodeid,
     memset(&fi, 0, sizeof(fi));
     fi.fh = arg->fh;
     fi.writepage = (arg->write_flags & FUSE_WRITE_CACHE) != 0;
+    fi.kill_priv = !!(arg->write_flags & FUSE_WRITE_KILL_PRIV);
 
     fi.lock_owner = arg->lock_owner;
     fi.flags = arg->flags;
@@ -1179,7 +1180,8 @@ static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid,
     fi.lock_owner = arg->lock_owner;
     fi.flags = arg->flags;
     fi.fh = arg->fh;
-    fi.writepage = arg->write_flags & FUSE_WRITE_CACHE;
+    fi.writepage = !!(arg->write_flags & FUSE_WRITE_CACHE);
+    fi.kill_priv = !!(arg->write_flags & FUSE_WRITE_KILL_PRIV);
 
     if (ibufv->count == 1) {
         assert(!(tmpbufv.buf[0].flags & FUSE_BUF_IS_FD));
-- 
2.23.0




* [PATCH 052/104] virtiofsd: cap-ng helpers
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (50 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 051/104] virtiofsd: Parse flag FUSE_WRITE_KILL_PRIV Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-06 14:58   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 053/104] virtiofsd: Drop CAP_FSETID if client asked for it Dr. David Alan Gilbert (git)
                   ` (54 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

libcap-ng reads /proc during capng_get_caps_process, and virtiofsd's
sandboxing doesn't have /proc mounted; thus we have to do the
caps read before we sandbox it and save/restore the state.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 Makefile                         |  2 +
 tools/virtiofsd/passthrough_ll.c | 72 ++++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+)

diff --git a/Makefile b/Makefile
index 3f5d04e1f7..fa15174ba0 100644
--- a/Makefile
+++ b/Makefile
@@ -323,9 +323,11 @@ HELPERS-$(call land,$(CONFIG_SOFTMMU),$(CONFIG_LINUX)) = qemu-bridge-helper$(EXE
 
 ifdef CONFIG_LINUX
 ifdef CONFIG_SECCOMP
+ifdef CONFIG_LIBCAP_NG
 HELPERS-y += virtiofsd$(EXESUF)
 vhost-user-json-y += tools/virtiofsd/50-qemu-virtiofsd.json
 endif
+endif
 
 ifdef CONFIG_VIRGL
 ifdef CONFIG_GBM
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 701608c6df..6a09b28608 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -38,6 +38,7 @@
 #include "fuse_virtio.h"
 #include "fuse_lowlevel.h"
 #include <assert.h>
+#include <cap-ng.h>
 #include <dirent.h>
 #include <errno.h>
 #include <inttypes.h>
@@ -139,6 +140,13 @@ static const struct fuse_opt lo_opts[] = {
 
 static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n);
 
+static struct {
+    pthread_mutex_t mutex;
+    void *saved;
+} cap;
+/* True once this thread has loaded cap-ng state from the saved copy */
+static __thread bool cap_loaded = false;
+
 static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st);
 
 static int is_dot_or_dotdot(const char *name)
@@ -162,6 +170,37 @@ static struct lo_data *lo_data(fuse_req_t req)
     return (struct lo_data *)fuse_req_userdata(req);
 }
 
+/*
+ * Load capng's state from our saved state if the current thread
+ * has not loaded it yet.
+ * Returns 0 on success.
+ */
+static int load_capng(void)
+{
+    if (!cap_loaded) {
+        pthread_mutex_lock(&cap.mutex);
+        capng_restore_state(&cap.saved);
+        /*
+         * restore_state frees the saved copy
+         * so make another.
+         */
+        cap.saved = capng_save_state();
+        if (!cap.saved) {
+            pthread_mutex_unlock(&cap.mutex);
+            fuse_log(FUSE_LOG_ERR, "capng_save_state (thread)\n");
+            return -EINVAL;
+        }
+        pthread_mutex_unlock(&cap.mutex);
+        /*
+         * We want to use the loaded state for our pid,
+         * not the original
+         */
+        capng_setpid(syscall(SYS_gettid));
+        cap_loaded = true;
+    }
+    return 0;
+}
+
 static void lo_map_init(struct lo_map *map)
 {
     map->elems = NULL;
@@ -2005,6 +2044,35 @@ static void setup_namespaces(struct lo_data *lo, struct fuse_session *se)
     }
 }
 
+/*
+ * Capture the capability state; we'll need to restore it in individual
+ * threads later, see load_capng().
+ */
+static void setup_capng(void)
+{
+    /* Note this accesses /proc so has to happen before the sandbox */
+    if (capng_get_caps_process()) {
+        fuse_log(FUSE_LOG_ERR, "capng_get_caps_process\n");
+        exit(1);
+    }
+    pthread_mutex_init(&cap.mutex, NULL);
+    pthread_mutex_lock(&cap.mutex);
+    cap.saved = capng_save_state();
+    if (!cap.saved) {
+        fuse_log(FUSE_LOG_ERR, "capng_save_state\n");
+        exit(1);
+    }
+    pthread_mutex_unlock(&cap.mutex);
+}
+
+static void cleanup_capng(void)
+{
+    free(cap.saved);
+    cap.saved = NULL;
+    pthread_mutex_destroy(&cap.mutex);
+}
+
+
 /*
  * Make the source directory our root so symlinks cannot escape and no other
  * files are accessible.  Assumes unshare(CLONE_NEWNS) was already called.
@@ -2198,12 +2266,16 @@ int main(int argc, char *argv[])
 
     fuse_daemonize(opts.foreground);
 
+    /* Must be before sandbox since it wants /proc */
+    setup_capng();
+
     setup_sandbox(&lo, se);
 
     /* Block until ctrl+c or fusermount -u */
     ret = virtio_loop(se);
 
     fuse_session_unmount(se);
+    cleanup_capng();
 err_out3:
     fuse_remove_signal_handlers(se);
 err_out2:
-- 
2.23.0




* [PATCH 053/104] virtiofsd: Drop CAP_FSETID if client asked for it
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (51 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 052/104] virtiofsd: cap-ng helpers Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-16  4:41   ` Misono Tomohiro
  2020-01-16 15:21   ` Sergio Lopez
  2019-12-12 16:38 ` [PATCH 054/104] virtiofsd: set maximum RLIMIT_NOFILE limit Dr. David Alan Gilbert (git)
                   ` (53 subsequent siblings)
  106 siblings, 2 replies; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Vivek Goyal <vgoyal@redhat.com>

If the client requested that the setuid/setgid bits be killed on the file
being written, drop the CAP_FSETID capability for the duration of the write
so that the kernel clears those bits automatically.

pjdfstest chown/12.t needs this.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
  dgilbert: reworked for libcap-ng
---
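Illustration only, not part of the patch: the drop/regain bracket around
the write can be sketched without libcap-ng by calling capget(2) and
capset(2) directly.  This is a swapped-in technique, not what the patch
does, and demo_set_effective() is a hypothetical helper.  It only changes
the effective set, so a dropped capability can be regained as long as it
is still in the permitted set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/capability.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Set or clear one capability in the calling thread's effective set.
 * *changed becomes true only if the set was actually modified.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int demo_set_effective(int cap, bool enable, bool *changed)
{
    struct __user_cap_header_struct hdr = {
        .version = _LINUX_CAPABILITY_VERSION_3,
        .pid = 0, /* 0 selects the calling thread */
    };
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];
    unsigned int idx;
    __u32 bit;

    assert(cap >= 0 && cap < 32 * _LINUX_CAPABILITY_U32S_3);
    idx = (unsigned int)cap / 32;
    bit = 1u << ((unsigned int)cap % 32);
    *changed = false;

    if (syscall(SYS_capget, &hdr, data) < 0) {
        return -1;
    }
    if (!!(data[idx].effective & bit) == enable) {
        return 0; /* already in the requested state */
    }
    if (enable) {
        data[idx].effective |= bit;
    } else {
        data[idx].effective &= ~bit;
    }
    if (syscall(SYS_capset, &hdr, data) < 0) {
        return -1;
    }
    *changed = true;
    return 0;
}
```

A caller would use demo_set_effective(CAP_FSETID, false, &dropped) before
the write and, only if dropped is true, demo_set_effective(CAP_FSETID,
true, &regained) afterwards, which matches the cap_fsetid_dropped logic in
lo_write_buf().
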
 tools/virtiofsd/passthrough_ll.c | 105 +++++++++++++++++++++++++++++++
 1 file changed, 105 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 6a09b28608..ab318a6f36 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -201,6 +201,91 @@ static int load_capng(void)
     return 0;
 }
 
+/*
+ * Helpers for dropping and regaining effective capabilities. Returns 0
+ * on success, error otherwise
+ */
+static int drop_effective_cap(const char *cap_name, bool *cap_dropped)
+{
+    int cap, ret;
+
+    cap = capng_name_to_capability(cap_name);
+    if (cap < 0) {
+        ret = errno;
+        fuse_log(FUSE_LOG_ERR, "capng_name_to_capability(%s) failed:%s\n",
+                 cap_name, strerror(errno));
+        goto out;
+    }
+
+    if (load_capng()) {
+        ret = errno;
+        fuse_log(FUSE_LOG_ERR, "load_capng() failed\n");
+        goto out;
+    }
+
+    /* The capability is not in the effective set; nothing to drop. */
+    if (!capng_have_capability(CAPNG_EFFECTIVE, cap)) {
+        ret = 0;
+        goto out;
+    }
+
+    if (capng_update(CAPNG_DROP, CAPNG_EFFECTIVE, cap)) {
+        ret = errno;
+        fuse_log(FUSE_LOG_ERR, "capng_update(DROP,) failed\n");
+        goto out;
+    }
+
+    if (capng_apply(CAPNG_SELECT_CAPS)) {
+        ret = errno;
+        fuse_log(FUSE_LOG_ERR, "drop:capng_apply() failed\n");
+        goto out;
+    }
+
+    ret = 0;
+    if (cap_dropped) {
+        *cap_dropped = true;
+    }
+
+out:
+    return ret;
+}
+
+static int gain_effective_cap(const char *cap_name)
+{
+    int cap;
+    int ret = 0;
+
+    cap = capng_name_to_capability(cap_name);
+    if (cap < 0) {
+        ret = errno;
+        fuse_log(FUSE_LOG_ERR, "capng_name_to_capability(%s) failed:%s\n",
+                 cap_name, strerror(errno));
+        goto out;
+    }
+
+    if (load_capng()) {
+        ret = errno;
+        fuse_log(FUSE_LOG_ERR, "load_capng() failed\n");
+        goto out;
+    }
+
+    if (capng_update(CAPNG_ADD, CAPNG_EFFECTIVE, cap)) {
+        ret = errno;
+        fuse_log(FUSE_LOG_ERR, "capng_update(ADD,) failed\n");
+        goto out;
+    }
+
+    if (capng_apply(CAPNG_SELECT_CAPS)) {
+        ret = errno;
+        fuse_log(FUSE_LOG_ERR, "gain:capng_apply() failed\n");
+        goto out;
+    }
+    ret = 0;
+
+out:
+    return ret;
+}
+
 static void lo_map_init(struct lo_map *map)
 {
     map->elems = NULL;
@@ -1559,6 +1644,7 @@ static void lo_write_buf(fuse_req_t req, fuse_ino_t ino,
     (void)ino;
     ssize_t res;
     struct fuse_bufvec out_buf = FUSE_BUFVEC_INIT(fuse_buf_size(in_buf));
+    bool cap_fsetid_dropped = false;
 
     out_buf.buf[0].flags = FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK;
     out_buf.buf[0].fd = lo_fi_fd(req, fi);
@@ -1570,12 +1656,31 @@ static void lo_write_buf(fuse_req_t req, fuse_ino_t ino,
                  out_buf.buf[0].size, (unsigned long)off);
     }
 
+    /*
+     * If kill_priv is set, drop CAP_FSETID which should lead to kernel
+     * clearing setuid/setgid on file.
+     */
+    if (fi->kill_priv) {
+        res = drop_effective_cap("FSETID", &cap_fsetid_dropped);
+        if (res != 0) {
+            fuse_reply_err(req, res);
+            return;
+        }
+    }
+
     res = fuse_buf_copy(&out_buf, in_buf, 0);
     if (res < 0) {
         fuse_reply_err(req, -res);
     } else {
         fuse_reply_write(req, (size_t)res);
     }
+
+    if (cap_fsetid_dropped) {
+        res = gain_effective_cap("FSETID");
+        if (res) {
+            fuse_log(FUSE_LOG_ERR, "Failed to gain CAP_FSETID\n");
+        }
+    }
 }
 
 static void lo_statfs(fuse_req_t req, fuse_ino_t ino)
-- 
2.23.0




* [PATCH 054/104] virtiofsd: set maximum RLIMIT_NOFILE limit
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (52 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 053/104] virtiofsd: Drop CAP_FSETID if client asked for it Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-06 15:00   ` Daniel P. Berrangé
  2020-01-15 17:09   ` Philippe Mathieu-Daudé
  2019-12-12 16:38 ` [PATCH 055/104] virtiofsd: fix libfuse information leaks Dr. David Alan Gilbert (git)
                   ` (52 subsequent siblings)
  106 siblings, 2 replies; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

virtiofsd can exceed the default open file descriptor limit easily on
most systems.  Take advantage of the fact that it runs as root to raise
the limit.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
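Illustration only, not part of the patch: setup_nofile_rlimit() raises
both the soft and the hard limit, which needs root.  A minimal sketch of an
unprivileged variant follows; it is an assumption about how one might
degrade gracefully, not something this series does.  demo_raise_nofile()
is a hypothetical name, and it only moves the soft limit, clamped to the
existing hard limit.

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Raise the soft RLIMIT_NOFILE towards `wanted` without touching the hard
 * limit.  Returns 0 on success, -1 with errno set on failure.
 */
static int demo_raise_nofile(rlim_t wanted)
{
    struct rlimit rlim;

    assert(wanted > 0);

    if (getrlimit(RLIMIT_NOFILE, &rlim) < 0) {
        return -1;
    }
    if (rlim.rlim_cur >= wanted) {
        return 0; /* nothing to do */
    }

    /* Without privileges the soft limit may only go up to the hard one */
    rlim.rlim_cur = wanted < rlim.rlim_max ? wanted : rlim.rlim_max;
    return setrlimit(RLIMIT_NOFILE, &rlim);
}
```

Calling demo_raise_nofile(1000000) would request the same ceiling as the
patch while leaving rlim_max as it was.
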
 tools/virtiofsd/passthrough_ll.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index ab318a6f36..139bf08f4c 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -52,6 +52,7 @@
 #include <sys/file.h>
 #include <sys/mount.h>
 #include <sys/prctl.h>
+#include <sys/resource.h>
 #include <sys/syscall.h>
 #include <sys/types.h>
 #include <sys/wait.h>
@@ -2250,6 +2251,35 @@ static void setup_sandbox(struct lo_data *lo, struct fuse_session *se)
     setup_seccomp();
 }
 
+/* Raise the maximum number of open file descriptors */
+static void setup_nofile_rlimit(void)
+{
+    const rlim_t max_fds = 1000000;
+    struct rlimit rlim;
+
+    if (getrlimit(RLIMIT_NOFILE, &rlim) < 0) {
+        fuse_log(FUSE_LOG_ERR, "getrlimit(RLIMIT_NOFILE): %m\n");
+        exit(1);
+    }
+
+    if (rlim.rlim_cur >= max_fds) {
+        return; /* nothing to do */
+    }
+
+    rlim.rlim_cur = max_fds;
+    rlim.rlim_max = max_fds;
+
+    if (setrlimit(RLIMIT_NOFILE, &rlim) < 0) {
+        /* Ignore SELinux denials */
+        if (errno == EPERM) {
+            return;
+        }
+
+        fuse_log(FUSE_LOG_ERR, "setrlimit(RLIMIT_NOFILE): %m\n");
+        exit(1);
+    }
+}
+
 int main(int argc, char *argv[])
 {
     struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
@@ -2371,6 +2401,8 @@ int main(int argc, char *argv[])
 
     fuse_daemonize(opts.foreground);
 
+    setup_nofile_rlimit();
+
     /* Must be before sandbox since it wants /proc */
     setup_capng();
 
-- 
2.23.0




* [PATCH 055/104] virtiofsd: fix libfuse information leaks
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (53 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 054/104] virtiofsd: set maximum RLIMIT_NOFILE limit Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-06 15:01   ` Daniel P. Berrangé
  2020-01-15 17:07   ` Philippe Mathieu-Daudé
  2019-12-12 16:38 ` [PATCH 056/104] virtiofsd: add security guide document Dr. David Alan Gilbert (git)
                   ` (51 subsequent siblings)
  106 siblings, 2 replies; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

Some FUSE message replies contain padding fields that are not
initialized by libfuse.  This is fine in traditional FUSE applications
because the kernel is trusted.  virtiofsd does not trust the guest and
must not expose uninitialized memory.

Use C struct initializers to automatically zero out memory.  Not all of
these code changes are strictly necessary but they will prevent future
information leaks if the structs are extended.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
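Illustration only, not part of the patch: a minimal sketch of the
difference between field-by-field assignment and a struct initializer.
struct demo_reply and the demo_* helpers are hypothetical; the buffer is
poisoned with memset() to stand in for stale stack contents.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reply layout with an explicit padding member */
struct demo_reply {
    uint64_t unique;
    int32_t error;
    uint32_t padding;
};

typedef void (*demo_fill_fn)(struct demo_reply *, uint64_t, int32_t);

/* Old style: only the named fields are written, padding keeps old bytes */
static void demo_fill_fieldwise(struct demo_reply *out, uint64_t unique,
                                int32_t error)
{
    out->unique = unique;
    out->error = error;
}

/* New style: members not named in the initializer are set to zero */
static void demo_fill_initializer(struct demo_reply *out, uint64_t unique,
                                  int32_t error)
{
    *out = (struct demo_reply){
        .unique = unique,
        .error = error,
    };
}

/* Poison a reply, fill it with `fill`, and report what padding contains */
static uint32_t demo_padding_after(demo_fill_fn fill)
{
    struct demo_reply out;

    assert(fill != NULL);
    memset(&out, 0xa5, sizeof(out)); /* simulate stale stack contents */
    fill(&out, 1, 0);
    return out.padding;
}
```

C guarantees zeroing only for members omitted from the initializer, not for
padding bytes the compiler inserts between members, so the approach relies
on padding being declared explicitly, as with the padding member of
struct fuse_notify_inval_entry_out.
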
 tools/virtiofsd/fuse_lowlevel.c | 150 ++++++++++++++++----------------
 1 file changed, 76 insertions(+), 74 deletions(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index c8a3b1597a..f3c8bdf7cb 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -43,21 +43,23 @@ static __attribute__((constructor)) void fuse_ll_init_pagesize(void)
 
 static void convert_stat(const struct stat *stbuf, struct fuse_attr *attr)
 {
-    attr->ino = stbuf->st_ino;
-    attr->mode = stbuf->st_mode;
-    attr->nlink = stbuf->st_nlink;
-    attr->uid = stbuf->st_uid;
-    attr->gid = stbuf->st_gid;
-    attr->rdev = stbuf->st_rdev;
-    attr->size = stbuf->st_size;
-    attr->blksize = stbuf->st_blksize;
-    attr->blocks = stbuf->st_blocks;
-    attr->atime = stbuf->st_atime;
-    attr->mtime = stbuf->st_mtime;
-    attr->ctime = stbuf->st_ctime;
-    attr->atimensec = ST_ATIM_NSEC(stbuf);
-    attr->mtimensec = ST_MTIM_NSEC(stbuf);
-    attr->ctimensec = ST_CTIM_NSEC(stbuf);
+    *attr = (struct fuse_attr){
+        .ino = stbuf->st_ino,
+        .mode = stbuf->st_mode,
+        .nlink = stbuf->st_nlink,
+        .uid = stbuf->st_uid,
+        .gid = stbuf->st_gid,
+        .rdev = stbuf->st_rdev,
+        .size = stbuf->st_size,
+        .blksize = stbuf->st_blksize,
+        .blocks = stbuf->st_blocks,
+        .atime = stbuf->st_atime,
+        .mtime = stbuf->st_mtime,
+        .ctime = stbuf->st_ctime,
+        .atimensec = ST_ATIM_NSEC(stbuf),
+        .mtimensec = ST_MTIM_NSEC(stbuf),
+        .ctimensec = ST_CTIM_NSEC(stbuf),
+    };
 }
 
 static void convert_attr(const struct fuse_setattr_in *attr, struct stat *stbuf)
@@ -183,16 +185,16 @@ static int fuse_send_msg(struct fuse_session *se, struct fuse_chan *ch,
 int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
                                int count)
 {
-    struct fuse_out_header out;
+    struct fuse_out_header out = {
+        .unique = req->unique,
+        .error = error,
+    };
 
     if (error <= -1000 || error > 0) {
         fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n", error);
-        error = -ERANGE;
+        out.error = -ERANGE;
     }
 
-    out.unique = req->unique;
-    out.error = error;
-
     iov[0].iov_base = &out;
     iov[0].iov_len = sizeof(struct fuse_out_header);
 
@@ -277,14 +279,16 @@ size_t fuse_add_direntry(fuse_req_t req, char *buf, size_t bufsize,
 static void convert_statfs(const struct statvfs *stbuf,
                            struct fuse_kstatfs *kstatfs)
 {
-    kstatfs->bsize = stbuf->f_bsize;
-    kstatfs->frsize = stbuf->f_frsize;
-    kstatfs->blocks = stbuf->f_blocks;
-    kstatfs->bfree = stbuf->f_bfree;
-    kstatfs->bavail = stbuf->f_bavail;
-    kstatfs->files = stbuf->f_files;
-    kstatfs->ffree = stbuf->f_ffree;
-    kstatfs->namelen = stbuf->f_namemax;
+    *kstatfs = (struct fuse_kstatfs){
+        .bsize = stbuf->f_bsize,
+        .frsize = stbuf->f_frsize,
+        .blocks = stbuf->f_blocks,
+        .bfree = stbuf->f_bfree,
+        .bavail = stbuf->f_bavail,
+        .files = stbuf->f_files,
+        .ffree = stbuf->f_ffree,
+        .namelen = stbuf->f_namemax,
+    };
 }
 
 static int send_reply_ok(fuse_req_t req, const void *arg, size_t argsize)
@@ -328,12 +332,14 @@ static unsigned int calc_timeout_nsec(double t)
 static void fill_entry(struct fuse_entry_out *arg,
                        const struct fuse_entry_param *e)
 {
-    arg->nodeid = e->ino;
-    arg->generation = e->generation;
-    arg->entry_valid = calc_timeout_sec(e->entry_timeout);
-    arg->entry_valid_nsec = calc_timeout_nsec(e->entry_timeout);
-    arg->attr_valid = calc_timeout_sec(e->attr_timeout);
-    arg->attr_valid_nsec = calc_timeout_nsec(e->attr_timeout);
+    *arg = (struct fuse_entry_out){
+        .nodeid = e->ino,
+        .generation = e->generation,
+        .entry_valid = calc_timeout_sec(e->entry_timeout),
+        .entry_valid_nsec = calc_timeout_nsec(e->entry_timeout),
+        .attr_valid = calc_timeout_sec(e->attr_timeout),
+        .attr_valid_nsec = calc_timeout_nsec(e->attr_timeout),
+    };
     convert_stat(&e->attr, &arg->attr);
 }
 
@@ -362,10 +368,12 @@ size_t fuse_add_direntry_plus(fuse_req_t req, char *buf, size_t bufsize,
     fill_entry(&dp->entry_out, e);
 
     struct fuse_dirent *dirent = &dp->dirent;
-    dirent->ino = e->attr.st_ino;
-    dirent->off = off;
-    dirent->namelen = namelen;
-    dirent->type = (e->attr.st_mode & S_IFMT) >> 12;
+    *dirent = (struct fuse_dirent){
+        .ino = e->attr.st_ino,
+        .off = off,
+        .namelen = namelen,
+        .type = (e->attr.st_mode & S_IFMT) >> 12,
+    };
     memcpy(dirent->name, name, namelen);
     memset(dirent->name + namelen, 0, entlen_padded - entlen);
 
@@ -498,15 +506,14 @@ int fuse_reply_data(fuse_req_t req, struct fuse_bufvec *bufv,
                     enum fuse_buf_copy_flags flags)
 {
     struct iovec iov[2];
-    struct fuse_out_header out;
+    struct fuse_out_header out = {
+        .unique = req->unique,
+    };
     int res;
 
     iov[0].iov_base = &out;
     iov[0].iov_len = sizeof(struct fuse_out_header);
 
-    out.unique = req->unique;
-    out.error = 0;
-
     res = fuse_send_data_iov(req->se, req->ch, iov, 1, bufv, flags);
     if (res <= 0) {
         fuse_free_req(req);
@@ -2147,14 +2154,14 @@ static void do_destroy(fuse_req_t req, fuse_ino_t nodeid,
 static int send_notify_iov(struct fuse_session *se, int notify_code,
                            struct iovec *iov, int count)
 {
-    struct fuse_out_header out;
+    struct fuse_out_header out = {
+        .error = notify_code,
+    };
 
     if (!se->got_init) {
         return -ENOTCONN;
     }
 
-    out.unique = 0;
-    out.error = notify_code;
     iov[0].iov_base = &out;
     iov[0].iov_len = sizeof(struct fuse_out_header);
 
@@ -2164,11 +2171,11 @@ static int send_notify_iov(struct fuse_session *se, int notify_code,
 int fuse_lowlevel_notify_poll(struct fuse_pollhandle *ph)
 {
     if (ph != NULL) {
-        struct fuse_notify_poll_wakeup_out outarg;
+        struct fuse_notify_poll_wakeup_out outarg = {
+            .kh = ph->kh,
+        };
         struct iovec iov[2];
 
-        outarg.kh = ph->kh;
-
         iov[1].iov_base = &outarg;
         iov[1].iov_len = sizeof(outarg);
 
@@ -2181,17 +2188,17 @@ int fuse_lowlevel_notify_poll(struct fuse_pollhandle *ph)
 int fuse_lowlevel_notify_inval_inode(struct fuse_session *se, fuse_ino_t ino,
                                      off_t off, off_t len)
 {
-    struct fuse_notify_inval_inode_out outarg;
+    struct fuse_notify_inval_inode_out outarg = {
+        .ino = ino,
+        .off = off,
+        .len = len,
+    };
     struct iovec iov[2];
 
     if (!se) {
         return -EINVAL;
     }
 
-    outarg.ino = ino;
-    outarg.off = off;
-    outarg.len = len;
-
     iov[1].iov_base = &outarg;
     iov[1].iov_len = sizeof(outarg);
 
@@ -2201,17 +2208,16 @@ int fuse_lowlevel_notify_inval_inode(struct fuse_session *se, fuse_ino_t ino,
 int fuse_lowlevel_notify_inval_entry(struct fuse_session *se, fuse_ino_t parent,
                                      const char *name, size_t namelen)
 {
-    struct fuse_notify_inval_entry_out outarg;
+    struct fuse_notify_inval_entry_out outarg = {
+        .parent = parent,
+        .namelen = namelen,
+    };
     struct iovec iov[3];
 
     if (!se) {
         return -EINVAL;
     }
 
-    outarg.parent = parent;
-    outarg.namelen = namelen;
-    outarg.padding = 0;
-
     iov[1].iov_base = &outarg;
     iov[1].iov_len = sizeof(outarg);
     iov[2].iov_base = (void *)name;
@@ -2224,18 +2230,17 @@ int fuse_lowlevel_notify_delete(struct fuse_session *se, fuse_ino_t parent,
                                 fuse_ino_t child, const char *name,
                                 size_t namelen)
 {
-    struct fuse_notify_delete_out outarg;
+    struct fuse_notify_delete_out outarg = {
+        .parent = parent,
+        .child = child,
+        .namelen = namelen,
+    };
     struct iovec iov[3];
 
     if (!se) {
         return -EINVAL;
     }
 
-    outarg.parent = parent;
-    outarg.child = child;
-    outarg.namelen = namelen;
-    outarg.padding = 0;
-
     iov[1].iov_base = &outarg;
     iov[1].iov_len = sizeof(outarg);
     iov[2].iov_base = (void *)name;
@@ -2248,24 +2253,21 @@ int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
                                off_t offset, struct fuse_bufvec *bufv,
                                enum fuse_buf_copy_flags flags)
 {
-    struct fuse_out_header out;
-    struct fuse_notify_store_out outarg;
+    struct fuse_out_header out = {
+        .error = FUSE_NOTIFY_STORE,
+    };
+    struct fuse_notify_store_out outarg = {
+        .nodeid = ino,
+        .offset = offset,
+        .size = fuse_buf_size(bufv),
+    };
     struct iovec iov[3];
-    size_t size = fuse_buf_size(bufv);
     int res;
 
     if (!se) {
         return -EINVAL;
     }
 
-    out.unique = 0;
-    out.error = FUSE_NOTIFY_STORE;
-
-    outarg.nodeid = ino;
-    outarg.offset = offset;
-    outarg.size = size;
-    outarg.padding = 0;
-
     iov[0].iov_base = &out;
     iov[0].iov_len = sizeof(out);
     iov[1].iov_base = &outarg;
-- 
2.23.0
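
The conversions in the diff above replace field-by-field assignment with designated initializers. A minimal sketch of why that closes the leak, assuming a hypothetical struct `demo_notify_out` that only mimics the shape of the FUSE reply structs (it is not a libfuse type): C zero-initializes every member that the initializer does not name, so an explicit `padding` member can no longer carry stale stack contents to the guest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a FUSE wire struct with an explicit padding member */
struct demo_notify_out {
    uint64_t parent;
    uint32_t namelen;
    uint32_t padding;
};

/*
 * Overwrite *out using a designated initializer.  Members that are not
 * named (here: padding) are zero-initialized by the language, so no
 * separate "out->padding = 0" assignment is needed.
 */
void demo_fill(struct demo_notify_out *out, uint64_t parent, uint32_t namelen)
{
    assert(out != NULL);
    *out = (struct demo_notify_out){
        .parent = parent,
        .namelen = namelen,
    };
}
```

The C standard guarantees this for members omitted from the initializer. It makes no such promise for compiler-inserted padding bytes, so the sketch deliberately uses a layout with none.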




* [PATCH 056/104] virtiofsd: add security guide document
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (54 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 055/104] virtiofsd: fix libfuse information leaks Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-06 15:03   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 057/104] virtiofsd: add --syslog command-line option Dr. David Alan Gilbert (git)
                   ` (50 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

Many people want to know: what's up with virtiofsd and security?  This
document provides the answers!

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/security.rst | 118 +++++++++++++++++++++++++++++++++++
 1 file changed, 118 insertions(+)
 create mode 100644 tools/virtiofsd/security.rst

diff --git a/tools/virtiofsd/security.rst b/tools/virtiofsd/security.rst
new file mode 100644
index 0000000000..61ce551344
--- /dev/null
+++ b/tools/virtiofsd/security.rst
@@ -0,0 +1,118 @@
+========================
+Virtiofsd Security Guide
+========================
+
+Introduction
+============
+This document covers security topics for users of virtiofsd, the daemon that
+implements host<->guest file system sharing.  Sharing files between one or more
+guests and the host raises questions about the trust relationships between
+these entities.  By understanding these topics users can safely deploy
+virtiofsd and control access to their data.
+
+Architecture
+============
+The virtiofsd daemon process acts as a vhost-user device backend, implementing
+the virtio-fs device that the corresponding device driver inside the guest
+interacts with.
+
+There is one virtiofsd process per virtio-fs device instance.  For example,
+when two guests have access to the same shared directory there are still two
+virtiofsd processes since there are two virtio-fs device instances.  Similarly,
+if one guest has access to two shared directories, there are two virtiofsd
+processes since there are two virtio-fs device instances.
+
+Files are created on the host with uid/gid values provided by the guest.
+Furthermore, virtiofsd is unable to enforce file permissions since guests have
+the ability to access any file within the shared directory.  File permissions
+are implemented in the guest, just like with traditional local file systems.
+
+Security Requirements
+=====================
+Guests have root access to the shared directory.  This is necessary for root
+file systems on virtio-fs and similar use cases.
+
+When multiple guests have access to the same shared directory, the guests have
+a trust relationship.  A broken or malicious guest could delete or corrupt
+files.  It could exploit symlink or time-of-check-to-time-of-use (TOCTOU) race
+conditions against applications in other guests.  It could plant device nodes
+or setuid executables to gain privileges in other guests.  It could perform
+denial-of-service (DoS) attacks by consuming available space or making the file
+system unavailable to other guests.
+
+Guests are restricted to the shared directory and cannot access other files on
+the host.
+
+Guests should not be able to gain arbitrary code execution inside the virtiofsd
+process.  If they do, the process is sandboxed to prevent escaping into other
+parts of the host.
+
+Daemon Sandboxing
+=================
+The virtiofsd process handles virtio-fs FUSE requests from the untrusted guest.
+This attack surface could give the guest access to host resources and must
+therefore be protected.  Sandboxing mechanisms are integrated into virtiofsd to
+reduce the impact in the event that an attacker gains control of the process.
+
+As a general rule, virtiofsd does not trust inputs from the guest, aside from
+uid/gid values.  Input validation is performed so that the guest cannot corrupt
+memory or otherwise gain arbitrary code execution in the virtiofsd process.
+
+Sandboxing adds restrictions on virtiofsd so that even if an attacker is
+able to exploit a bug, they will be constrained to the virtiofsd process and
+unable to cause damage on the host.
+
+Seccomp Whitelist
+-----------------
+Many system calls are not required by virtiofsd to perform its function.  For
+example, ptrace(2) and execve(2) are not necessary and attackers are likely to
+use them to further compromise the system.  This is prevented using a seccomp
+whitelist in virtiofsd.
+
+During startup virtiofsd installs a whitelist of allowed system calls.  All
+other system calls are forbidden for the remaining lifetime of the process.
+This list has been built through experience of running virtiofsd on several
+flavors of Linux and observing which system calls were encountered.
+
+It is possible that previously unexplored code paths or newer library versions
+will invoke system calls that have not been whitelisted yet.  In this case the
+process terminates and a seccomp error is captured in the audit log.  The log
+can typically be viewed using ``journalctl -xe`` and searching for ``SECCOMP``.
+
+Should it be necessary to extend the whitelist, system call numbers from the
+audit log can be translated to names through a CPU architecture-specific
+``.tbl`` file in the Linux source tree.  They can then be added to the
+whitelist in ``seccomp.c`` in the virtiofsd source tree.
+
+Mount Namespace
+---------------
+During startup virtiofsd enters a new mount namespace and releases all mounts
+except for the shared directory.  This makes the file system root ``/`` the
+shared directory.  It is impossible to access files outside the shared
+directory since they cannot be looked up by path resolution.
+
+Several attacks, including ``..`` traversal and symlink escapes, are prevented
+by the mount namespace.
+
+The current virtiofsd implementation keeps a directory file descriptor to
+/proc/self/fd open in order to implement several FUSE requests.  This file
+descriptor could be used by attackers to access files outside the shared
+directory.  This limitation will be addressed in a future release of virtiofsd.
+
+Other Namespaces
+----------------
+Virtiofsd enters new pid and network namespaces during startup.  The pid
+namespace prevents the process from seeing other processes running on the host.
+The network namespace removes network connectivity from the process.
+
+Deployment Best Practices
+=========================
+The shared directory should be a separate file system so that untrusted guests
+cannot cause a denial-of-service by using up all available inodes or exhausting
+free space.
+
+If the shared directory is also accessible from a host mount namespace, it is
+recommended to keep a parent directory with rwx------ permissions so that other
+users on the host are unable to access any setuid executables or device nodes
+in the shared directory.  The ``nosuid`` and ``nodev`` mount options can also
+be used to prevent this issue.
-- 
2.23.0
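
The deployment advice in the guide above asks for the shared directory to be a separate file system. A minimal sketch of how a wrapper could check this before starting the daemon, assuming that comparing `st_dev` of the directory and of its parent is a good enough test. The helper `demo_is_separate_filesystem` is hypothetical and not part of virtiofsd, and the check cannot detect a bind mount taken from the same file system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Returns 1 if dir is on a different device than its parent directory,
 * 0 if both are on the same device, and -1 on error (path too long,
 * stat failure, or dir is not a directory).
 */
int demo_is_separate_filesystem(const char *dir)
{
    char parent[4096];
    struct stat st_dir, st_parent;
    int n;

    assert(dir != NULL);

    /* Build "<dir>/.." so the kernel resolves the parent for us */
    n = snprintf(parent, sizeof(parent), "%s/..", dir);
    if (n < 0 || (size_t)n >= sizeof(parent)) {
        return -1;
    }
    if (stat(dir, &st_dir) != 0 || stat(parent, &st_parent) != 0) {
        return -1;
    }
    if (!S_ISDIR(st_dir.st_mode)) {
        return -1;
    }
    return st_dir.st_dev != st_parent.st_dev;
}
```

A launcher could call this on the directory passed to `-o source=` and warn when the result is 0, since a guest could then exhaust space or inodes shared with other host data.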




* [PATCH 057/104] virtiofsd: add --syslog command-line option
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (55 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 056/104] virtiofsd: add security guide document Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-06 15:05   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 058/104] virtiofsd: print log only when priority is high enough Dr. David Alan Gilbert (git)
                   ` (49 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

Sometimes collecting output from stderr is inconvenient or does not fit
within the overall logging architecture.  Add syslog(3) support for
cases where stderr cannot be used.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
dgilbert: Reworked as a logging function
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/fuse_lowlevel.h  |  1 +
 tools/virtiofsd/helper.c         |  2 ++
 tools/virtiofsd/passthrough_ll.c | 50 ++++++++++++++++++++++++++++++--
 tools/virtiofsd/seccomp.c        | 32 ++++++++++++++------
 tools/virtiofsd/seccomp.h        |  4 ++-
 5 files changed, 76 insertions(+), 13 deletions(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
index 6c63cb740c..c67677732b 100644
--- a/tools/virtiofsd/fuse_lowlevel.h
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -1828,6 +1828,7 @@ struct fuse_cmdline_opts {
     int show_help;
     int print_capabilities;
     int clone_fd;
+    int syslog;
     unsigned int max_idle_threads;
 };
 
diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
index 4c9a3b2fc9..2c64b10ebf 100644
--- a/tools/virtiofsd/helper.c
+++ b/tools/virtiofsd/helper.c
@@ -55,6 +55,7 @@ static const struct fuse_opt fuse_helper_opts[] = {
     FUSE_OPT_KEY("subtype=", FUSE_OPT_KEY_KEEP),
     FUSE_HELPER_OPT("clone_fd", clone_fd),
     FUSE_HELPER_OPT("max_idle_threads=%u", max_idle_threads),
+    FUSE_HELPER_OPT("--syslog", syslog),
     FUSE_OPT_END
 };
 
@@ -139,6 +140,7 @@ void fuse_cmdline_help(void)
            "    -V   --version             print version\n"
            "    --print-capabilities       print vhost-user.json\n"
            "    -d   -o debug              enable debug output (implies -f)\n"
+           "    --syslog                   log to syslog (default stderr)\n"
            "    -f                         foreground operation\n"
            "    --daemonize                run in background\n"
            "    -s                         disable multi-threaded operation\n"
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 139bf08f4c..9ede80af94 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -57,6 +57,7 @@
 #include <sys/types.h>
 #include <sys/wait.h>
 #include <sys/xattr.h>
+#include <syslog.h>
 #include <unistd.h>
 
 #include "passthrough_helpers.h"
@@ -138,6 +139,7 @@ static const struct fuse_opt lo_opts[] = {
     { "norace", offsetof(struct lo_data, norace), 1 },
     FUSE_OPT_END
 };
+static bool use_syslog = false;
 
 static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n);
 
@@ -2244,11 +2246,12 @@ static void setup_mounts(const char *source)
  * Lock down this process to prevent access to other processes or files outside
  * source directory.  This reduces the impact of arbitrary code execution bugs.
  */
-static void setup_sandbox(struct lo_data *lo, struct fuse_session *se)
+static void setup_sandbox(struct lo_data *lo, struct fuse_session *se,
+                          bool enable_syslog)
 {
     setup_namespaces(lo, se);
     setup_mounts(lo->source);
-    setup_seccomp();
+    setup_seccomp(enable_syslog);
 }
 
 /* Raise the maximum number of open file descriptors */
@@ -2280,6 +2283,42 @@ static void setup_nofile_rlimit(void)
     }
 }
 
+static void log_func(enum fuse_log_level level, const char *fmt, va_list ap)
+{
+    if (use_syslog) {
+        int priority = LOG_ERR;
+        switch (level) {
+        case FUSE_LOG_EMERG:
+            priority = LOG_EMERG;
+            break;
+        case FUSE_LOG_ALERT:
+            priority = LOG_ALERT;
+            break;
+        case FUSE_LOG_CRIT:
+            priority = LOG_CRIT;
+            break;
+        case FUSE_LOG_ERR:
+            priority = LOG_ERR;
+            break;
+        case FUSE_LOG_WARNING:
+            priority = LOG_WARNING;
+            break;
+        case FUSE_LOG_NOTICE:
+            priority = LOG_NOTICE;
+            break;
+        case FUSE_LOG_INFO:
+            priority = LOG_INFO;
+            break;
+        case FUSE_LOG_DEBUG:
+            priority = LOG_DEBUG;
+            break;
+        }
+        vsyslog(priority, fmt, ap);
+    } else {
+        vfprintf(stderr, fmt, ap);
+    }
+}
+
 int main(int argc, char *argv[])
 {
     struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
@@ -2318,6 +2357,11 @@ int main(int argc, char *argv[])
     if (fuse_parse_cmdline(&args, &opts) != 0) {
         return 1;
     }
+    fuse_set_log_func(log_func);
+    use_syslog = opts.syslog;
+    if (use_syslog) {
+        openlog("virtiofsd", LOG_PID, LOG_DAEMON);
+    }
     if (opts.show_help) {
         printf("usage: %s [options]\n\n", argv[0]);
         fuse_cmdline_help();
@@ -2406,7 +2450,7 @@ int main(int argc, char *argv[])
     /* Must be before sandbox since it wants /proc */
     setup_capng();
 
-    setup_sandbox(&lo, se);
+    setup_sandbox(&lo, se, opts.syslog);
 
     /* Block until ctrl+c or fusermount -u */
     ret = virtio_loop(se);
diff --git a/tools/virtiofsd/seccomp.c b/tools/virtiofsd/seccomp.c
index 6359bb55bb..33ec309fc8 100644
--- a/tools/virtiofsd/seccomp.c
+++ b/tools/virtiofsd/seccomp.c
@@ -97,11 +97,28 @@ static const int syscall_whitelist[] = {
     SCMP_SYS(writev),
 };
 
-void setup_seccomp(void)
+/* Syscalls used when --syslog is enabled */
+static const int syscall_whitelist_syslog[] = {
+    SCMP_SYS(sendto),
+};
+
+static void add_whitelist(scmp_filter_ctx ctx, const int syscalls[], size_t len)
 {
-    scmp_filter_ctx ctx;
     size_t i;
 
+    for (i = 0; i < len; i++) {
+        if (seccomp_rule_add(ctx, SCMP_ACT_ALLOW, syscalls[i], 0) != 0) {
+            fuse_log(FUSE_LOG_ERR, "seccomp_rule_add syscall %d failed\n",
+                     syscalls[i]);
+            exit(1);
+        }
+    }
+}
+
+void setup_seccomp(bool enable_syslog)
+{
+    scmp_filter_ctx ctx;
+
 #ifdef SCMP_ACT_KILL_PROCESS
     ctx = seccomp_init(SCMP_ACT_KILL_PROCESS);
     /* Handle a newer libseccomp but an older kernel */
@@ -116,13 +133,10 @@ void setup_seccomp(void)
         exit(1);
     }
 
-    for (i = 0; i < G_N_ELEMENTS(syscall_whitelist); i++) {
-        if (seccomp_rule_add(ctx, SCMP_ACT_ALLOW,
-                             syscall_whitelist[i], 0) != 0) {
-            fuse_log(FUSE_LOG_ERR, "seccomp_rule_add syscall %d",
-                     syscall_whitelist[i]);
-            exit(1);
-        }
+    add_whitelist(ctx, syscall_whitelist, G_N_ELEMENTS(syscall_whitelist));
+    if (enable_syslog) {
+        add_whitelist(ctx, syscall_whitelist_syslog,
+                      G_N_ELEMENTS(syscall_whitelist_syslog));
     }
 
     /* libvhost-user calls this for post-copy migration, we don't need it */
diff --git a/tools/virtiofsd/seccomp.h b/tools/virtiofsd/seccomp.h
index 86bce72652..d47c8eade6 100644
--- a/tools/virtiofsd/seccomp.h
+++ b/tools/virtiofsd/seccomp.h
@@ -9,6 +9,8 @@
 #ifndef VIRTIOFSD_SECCOMP_H
 #define VIRTIOFSD_SECCOMP_H
 
-void setup_seccomp(void);
+#include <stdbool.h>
+
+void setup_seccomp(bool enable_syslog);
 
 #endif /* VIRTIOFSD_SECCOMP_H */
-- 
2.23.0




* [PATCH 058/104] virtiofsd: print log only when priority is high enough
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (56 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 057/104] virtiofsd: add --syslog command-line option Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-06 15:10   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 059/104] virtiofsd: Add ID to the log with FUSE_LOG_DEBUG level Dr. David Alan Gilbert (git)
                   ` (48 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Eryu Guan <eguan@linux.alibaba.com>

Introduce a "-o log_level=" command-line option to specify the current
log level (priority).  Valid values are "debug", "info", "warn" and
"err", e.g.

    ./virtiofsd -o log_level=debug ...

Only messages whose priority is at or above the selected level are
printed to stderr/syslog.  The default level is info.

The "-o debug"/"-d" options are kept, and imply debug log level.
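
The filtering described above can be sketched as follows. This is an illustration only, not the code in this patch: the `demo_*` names and the enum are hypothetical, and the sketch assumes syslog-style numbering in which a smaller value means a higher priority.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical levels, numbered like syslog: smaller value = higher priority */
enum demo_log_level {
    DEMO_LOG_ERR = 3,
    DEMO_LOG_WARNING = 4,
    DEMO_LOG_INFO = 6,
    DEMO_LOG_DEBUG = 7,
};

/* Map the value of "-o log_level=<name>" to a level; -1 if unknown */
int demo_parse_level(const char *name)
{
    assert(name != NULL);
    if (strcmp(name, "debug") == 0) {
        return DEMO_LOG_DEBUG;
    } else if (strcmp(name, "info") == 0) {
        return DEMO_LOG_INFO;
    } else if (strcmp(name, "warn") == 0) {
        return DEMO_LOG_WARNING;
    } else if (strcmp(name, "err") == 0) {
        return DEMO_LOG_ERR;
    }
    return -1;
}

/* A message passes when its priority is at or above the current level */
bool demo_should_log(int current_level, int msg_level)
{
    return msg_level <= current_level;
}

/* Print to stderr if the message passes; returns bytes written, 0 if dropped */
int demo_log(int current_level, int msg_level, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (!demo_should_log(current_level, msg_level)) {
        return 0;
    }
    va_start(ap, fmt);
    n = vfprintf(stderr, fmt, ap);
    va_end(ap);
    return n;
}
```

With the default level of info, debug messages are dropped while warnings and errors are printed. Selecting debug lets everything through, which matches the behaviour of "-o debug"/"-d".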

Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
dgilbert: Reworked for libfuse's log_func
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/fuse_log.c       |   4 ++
 tools/virtiofsd/fuse_lowlevel.c  |  75 ++++++++------------
 tools/virtiofsd/fuse_lowlevel.h  |   1 +
 tools/virtiofsd/helper.c         |  10 ++-
 tools/virtiofsd/passthrough_ll.c | 118 +++++++++++++------------------
 5 files changed, 92 insertions(+), 116 deletions(-)

diff --git a/tools/virtiofsd/fuse_log.c b/tools/virtiofsd/fuse_log.c
index 11345f9ec8..79a18a7aaa 100644
--- a/tools/virtiofsd/fuse_log.c
+++ b/tools/virtiofsd/fuse_log.c
@@ -8,6 +8,10 @@
  * See the file COPYING.LIB
  */
 
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdarg.h>
+#include <syslog.h>
 #include "fuse_log.h"
 
 #include <stdarg.h>
diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index f3c8bdf7cb..0abb369b3d 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -158,19 +158,17 @@ static int fuse_send_msg(struct fuse_session *se, struct fuse_chan *ch,
     struct fuse_out_header *out = iov[0].iov_base;
 
     out->len = iov_length(iov, count);
-    if (se->debug) {
-        if (out->unique == 0) {
-            fuse_log(FUSE_LOG_DEBUG, "NOTIFY: code=%d length=%u\n", out->error,
-                     out->len);
-        } else if (out->error) {
-            fuse_log(FUSE_LOG_DEBUG,
-                     "   unique: %llu, error: %i (%s), outsize: %i\n",
-                     (unsigned long long)out->unique, out->error,
-                     strerror(-out->error), out->len);
-        } else {
-            fuse_log(FUSE_LOG_DEBUG, "   unique: %llu, success, outsize: %i\n",
-                     (unsigned long long)out->unique, out->len);
-        }
+    if (out->unique == 0) {
+        fuse_log(FUSE_LOG_DEBUG, "NOTIFY: code=%d length=%u\n", out->error,
+                 out->len);
+    } else if (out->error) {
+        fuse_log(FUSE_LOG_DEBUG,
+                 "   unique: %llu, error: %i (%s), outsize: %i\n",
+                 (unsigned long long)out->unique, out->error,
+                 strerror(-out->error), out->len);
+    } else {
+        fuse_log(FUSE_LOG_DEBUG, "   unique: %llu, success, outsize: %i\n",
+                 (unsigned long long)out->unique, out->len);
     }
 
     if (fuse_lowlevel_is_virtio(se)) {
@@ -1664,10 +1662,8 @@ static void do_interrupt(fuse_req_t req, fuse_ino_t nodeid,
         return;
     }
 
-    if (se->debug) {
-        fuse_log(FUSE_LOG_DEBUG, "INTERRUPT: %llu\n",
-                 (unsigned long long)arg->unique);
-    }
+    fuse_log(FUSE_LOG_DEBUG, "INTERRUPT: %llu\n",
+             (unsigned long long)arg->unique);
 
     req->u.i.unique = arg->unique;
 
@@ -1903,13 +1899,10 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
         }
     }
 
-    if (se->debug) {
-        fuse_log(FUSE_LOG_DEBUG, "INIT: %u.%u\n", arg->major, arg->minor);
-        if (arg->major == 7 && arg->minor >= 6) {
-            fuse_log(FUSE_LOG_DEBUG, "flags=0x%08x\n", arg->flags);
-            fuse_log(FUSE_LOG_DEBUG, "max_readahead=0x%08x\n",
-                     arg->max_readahead);
-        }
+    fuse_log(FUSE_LOG_DEBUG, "INIT: %u.%u\n", arg->major, arg->minor);
+    if (arg->major == 7 && arg->minor >= 6) {
+        fuse_log(FUSE_LOG_DEBUG, "flags=0x%08x\n", arg->flags);
+        fuse_log(FUSE_LOG_DEBUG, "max_readahead=0x%08x\n", arg->max_readahead);
     }
     se->conn.proto_major = arg->major;
     se->conn.proto_minor = arg->minor;
@@ -2118,19 +2111,14 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
     outarg.congestion_threshold = se->conn.congestion_threshold;
     outarg.time_gran = se->conn.time_gran;
 
-    if (se->debug) {
-        fuse_log(FUSE_LOG_DEBUG, "   INIT: %u.%u\n", outarg.major,
-                 outarg.minor);
-        fuse_log(FUSE_LOG_DEBUG, "   flags=0x%08x\n", outarg.flags);
-        fuse_log(FUSE_LOG_DEBUG, "   max_readahead=0x%08x\n",
-                 outarg.max_readahead);
-        fuse_log(FUSE_LOG_DEBUG, "   max_write=0x%08x\n", outarg.max_write);
-        fuse_log(FUSE_LOG_DEBUG, "   max_background=%i\n",
-                 outarg.max_background);
-        fuse_log(FUSE_LOG_DEBUG, "   congestion_threshold=%i\n",
-                 outarg.congestion_threshold);
-        fuse_log(FUSE_LOG_DEBUG, "   time_gran=%u\n", outarg.time_gran);
-    }
+    fuse_log(FUSE_LOG_DEBUG, "   INIT: %u.%u\n", outarg.major, outarg.minor);
+    fuse_log(FUSE_LOG_DEBUG, "   flags=0x%08x\n", outarg.flags);
+    fuse_log(FUSE_LOG_DEBUG, "   max_readahead=0x%08x\n", outarg.max_readahead);
+    fuse_log(FUSE_LOG_DEBUG, "   max_write=0x%08x\n", outarg.max_write);
+    fuse_log(FUSE_LOG_DEBUG, "   max_background=%i\n", outarg.max_background);
+    fuse_log(FUSE_LOG_DEBUG, "   congestion_threshold=%i\n",
+             outarg.congestion_threshold);
+    fuse_log(FUSE_LOG_DEBUG, "   time_gran=%u\n", outarg.time_gran);
 
     send_reply_ok(req, &outarg, outargsize);
 }
@@ -2410,14 +2398,11 @@ void fuse_session_process_buf_int(struct fuse_session *se,
     in = fuse_mbuf_iter_advance(&iter, sizeof(*in));
     assert(in); /* caller guarantees the input buffer is large enough */
 
-    if (se->debug) {
-        fuse_log(FUSE_LOG_DEBUG,
-                 "unique: %llu, opcode: %s (%i), nodeid: %llu, insize: %zu, "
-                 "pid: %u\n",
-                 (unsigned long long)in->unique,
-                 opname((enum fuse_opcode)in->opcode), in->opcode,
-                 (unsigned long long)in->nodeid, buf->size, in->pid);
-    }
+    fuse_log(
+        FUSE_LOG_DEBUG,
+        "unique: %llu, opcode: %s (%i), nodeid: %llu, insize: %zu, pid: %u\n",
+        (unsigned long long)in->unique, opname((enum fuse_opcode)in->opcode),
+        in->opcode, (unsigned long long)in->nodeid, buf->size, in->pid);
 
     req = fuse_ll_alloc_req(se);
     if (req == NULL) {
diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
index c67677732b..27f631d9fc 100644
--- a/tools/virtiofsd/fuse_lowlevel.h
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -1829,6 +1829,7 @@ struct fuse_cmdline_opts {
     int print_capabilities;
     int clone_fd;
     int syslog;
+    int log_level;
     unsigned int max_idle_threads;
 };
 
diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
index 2c64b10ebf..7b28507a38 100644
--- a/tools/virtiofsd/helper.c
+++ b/tools/virtiofsd/helper.c
@@ -33,7 +33,6 @@
         t, offsetof(struct fuse_cmdline_opts, p), v \
     }
 
-
 static const struct fuse_opt fuse_helper_opts[] = {
     FUSE_HELPER_OPT("-h", show_help),
     FUSE_HELPER_OPT("--help", show_help),
@@ -56,6 +55,10 @@ static const struct fuse_opt fuse_helper_opts[] = {
     FUSE_HELPER_OPT("clone_fd", clone_fd),
     FUSE_HELPER_OPT("max_idle_threads=%u", max_idle_threads),
     FUSE_HELPER_OPT("--syslog", syslog),
+    FUSE_HELPER_OPT_VALUE("log_level=debug", log_level, FUSE_LOG_DEBUG),
+    FUSE_HELPER_OPT_VALUE("log_level=info", log_level, FUSE_LOG_INFO),
+    FUSE_HELPER_OPT_VALUE("log_level=warn", log_level, FUSE_LOG_WARNING),
+    FUSE_HELPER_OPT_VALUE("log_level=err", log_level, FUSE_LOG_ERR),
     FUSE_OPT_END
 };
 
@@ -149,7 +152,10 @@ void fuse_cmdline_help(void)
            "                               (may improve performance)\n"
            "    -o max_idle_threads        the maximum number of idle worker "
            "threads\n"
-           "                               allowed (default: 10)\n");
+           "                               allowed (default: 10)\n"
+           "    -o log_level=<level>       log level, default to \"info\"\n"
+           "                               level could be one of \"debug, "
+           "info, warn, err\"\n");
 }
 
 static int fuse_helper_opt_proc(void *data, const char *arg, int key,
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 9ede80af94..6f398a7ff2 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -36,6 +36,7 @@
  */
 
 #include "fuse_virtio.h"
+#include "fuse_log.h"
 #include "fuse_lowlevel.h"
 #include <assert.h>
 #include <cap-ng.h>
@@ -140,6 +141,7 @@ static const struct fuse_opt lo_opts[] = {
     FUSE_OPT_END
 };
 static bool use_syslog = false;
+static int current_log_level;
 
 static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n);
 
@@ -458,11 +460,6 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino)
     return inode ? inode->fd : -1;
 }
 
-static bool lo_debug(fuse_req_t req)
-{
-    return lo_data(req)->debug != 0;
-}
-
 static void lo_init(void *userdata, struct fuse_conn_info *conn)
 {
     struct lo_data *lo = (struct lo_data *)userdata;
@@ -472,15 +469,11 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
     }
 
     if (lo->writeback && conn->capable & FUSE_CAP_WRITEBACK_CACHE) {
-        if (lo->debug) {
-            fuse_log(FUSE_LOG_DEBUG, "lo_init: activating writeback\n");
-        }
+        fuse_log(FUSE_LOG_DEBUG, "lo_init: activating writeback\n");
         conn->want |= FUSE_CAP_WRITEBACK_CACHE;
     }
     if (lo->flock && conn->capable & FUSE_CAP_FLOCK_LOCKS) {
-        if (lo->debug) {
-            fuse_log(FUSE_LOG_DEBUG, "lo_init: activating flock locks\n");
-        }
+        fuse_log(FUSE_LOG_DEBUG, "lo_init: activating flock locks\n");
         conn->want |= FUSE_CAP_FLOCK_LOCKS;
     }
 }
@@ -823,10 +816,8 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
     }
     e->ino = inode->fuse_ino;
 
-    if (lo_debug(req)) {
-        fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n",
-                 (unsigned long long)parent, name, (unsigned long long)e->ino);
-    }
+    fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n", (unsigned long long)parent,
+             name, (unsigned long long)e->ino);
 
     return 0;
 
@@ -843,10 +834,8 @@ static void lo_lookup(fuse_req_t req, fuse_ino_t parent, const char *name)
     struct fuse_entry_param e;
     int err;
 
-    if (lo_debug(req)) {
-        fuse_log(FUSE_LOG_DEBUG, "lo_lookup(parent=%" PRIu64 ", name=%s)\n",
-                 parent, name);
-    }
+    fuse_log(FUSE_LOG_DEBUG, "lo_lookup(parent=%" PRIu64 ", name=%s)\n", parent,
+             name);
 
     /*
      * Don't use is_safe_path_component(), allow "." and ".." for NFS export
@@ -954,10 +943,8 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
         goto out;
     }
 
-    if (lo_debug(req)) {
-        fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n",
-                 (unsigned long long)parent, name, (unsigned long long)e.ino);
-    }
+    fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n", (unsigned long long)parent,
+             name, (unsigned long long)e.ino);
 
     fuse_reply_entry(req, &e);
     return;
@@ -1057,10 +1044,8 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
     pthread_mutex_unlock(&lo->mutex);
     e.ino = inode->fuse_ino;
 
-    if (lo_debug(req)) {
-        fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n",
-                 (unsigned long long)parent, name, (unsigned long long)e.ino);
-    }
+    fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n", (unsigned long long)parent,
+             name, (unsigned long long)e.ino);
 
     fuse_reply_entry(req, &e);
     return;
@@ -1154,11 +1139,9 @@ static void lo_forget_one(fuse_req_t req, fuse_ino_t ino, uint64_t nlookup)
         return;
     }
 
-    if (lo_debug(req)) {
-        fuse_log(FUSE_LOG_DEBUG, "  forget %lli %lli -%lli\n",
-                 (unsigned long long)ino, (unsigned long long)inode->refcount,
-                 (unsigned long long)nlookup);
-    }
+    fuse_log(FUSE_LOG_DEBUG, "  forget %lli %lli -%lli\n",
+             (unsigned long long)ino, (unsigned long long)inode->refcount,
+             (unsigned long long)nlookup);
 
     unref_inode(lo, inode, nlookup);
 }
@@ -1428,10 +1411,8 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
     int err;
     struct lo_cred old = {};
 
-    if (lo_debug(req)) {
-        fuse_log(FUSE_LOG_DEBUG, "lo_create(parent=%" PRIu64 ", name=%s)\n",
-                 parent, name);
-    }
+    fuse_log(FUSE_LOG_DEBUG, "lo_create(parent=%" PRIu64 ", name=%s)\n", parent,
+             name);
 
     if (!is_safe_path_component(name)) {
         fuse_reply_err(req, EINVAL);
@@ -1508,10 +1489,8 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
     char buf[64];
     struct lo_data *lo = lo_data(req);
 
-    if (lo_debug(req)) {
-        fuse_log(FUSE_LOG_DEBUG, "lo_open(ino=%" PRIu64 ", flags=%d)\n", ino,
-                 fi->flags);
-    }
+    fuse_log(FUSE_LOG_DEBUG, "lo_open(ino=%" PRIu64 ", flags=%d)\n", ino,
+             fi->flags);
 
     /*
      * With writeback cache, kernel may send read requests even
@@ -1626,12 +1605,10 @@ static void lo_read(fuse_req_t req, fuse_ino_t ino, size_t size, off_t offset,
 {
     struct fuse_bufvec buf = FUSE_BUFVEC_INIT(size);
 
-    if (lo_debug(req)) {
-        fuse_log(FUSE_LOG_DEBUG,
-                 "lo_read(ino=%" PRIu64 ", size=%zd, "
-                 "off=%lu)\n",
-                 ino, size, (unsigned long)offset);
-    }
+    fuse_log(FUSE_LOG_DEBUG,
+             "lo_read(ino=%" PRIu64 ", size=%zd, "
+             "off=%lu)\n",
+             ino, size, (unsigned long)offset);
 
     buf.buf[0].flags = FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK;
     buf.buf[0].fd = lo_fi_fd(req, fi);
@@ -1653,11 +1630,9 @@ static void lo_write_buf(fuse_req_t req, fuse_ino_t ino,
     out_buf.buf[0].fd = lo_fi_fd(req, fi);
     out_buf.buf[0].pos = off;
 
-    if (lo_debug(req)) {
-        fuse_log(FUSE_LOG_DEBUG,
-                 "lo_write(ino=%" PRIu64 ", size=%zd, off=%lu)\n", ino,
-                 out_buf.buf[0].size, (unsigned long)off);
-    }
+    fuse_log(FUSE_LOG_DEBUG,
+             "lo_write_buf(ino=%" PRIu64 ", size=%zd, off=%lu)\n", ino,
+             out_buf.buf[0].size, (unsigned long)off);
 
     /*
      * If kill_priv is set, drop CAP_FSETID which should lead to kernel
@@ -1756,11 +1731,8 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
         goto out;
     }
 
-    if (lo_debug(req)) {
-        fuse_log(FUSE_LOG_DEBUG,
-                 "lo_getxattr(ino=%" PRIu64 ", name=%s size=%zd)\n", ino, name,
-                 size);
-    }
+    fuse_log(FUSE_LOG_DEBUG, "lo_getxattr(ino=%" PRIu64 ", name=%s size=%zd)\n",
+             ino, name, size);
 
     if (inode->is_symlink) {
         /* Sorry, no race free way to getxattr on symlink. */
@@ -1834,10 +1806,8 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
         goto out;
     }
 
-    if (lo_debug(req)) {
-        fuse_log(FUSE_LOG_DEBUG, "lo_listxattr(ino=%" PRIu64 ", size=%zd)\n",
-                 ino, size);
-    }
+    fuse_log(FUSE_LOG_DEBUG, "lo_listxattr(ino=%" PRIu64 ", size=%zd)\n", ino,
+             size);
 
     if (inode->is_symlink) {
         /* Sorry, no race free way to listxattr on symlink. */
@@ -1911,11 +1881,8 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
         goto out;
     }
 
-    if (lo_debug(req)) {
-        fuse_log(FUSE_LOG_DEBUG,
-                 "lo_setxattr(ino=%" PRIu64 ", name=%s value=%s size=%zd)\n",
-                 ino, name, value, size);
-    }
+    fuse_log(FUSE_LOG_DEBUG, "lo_setxattr(ino=%" PRIu64
+             ", name=%s value=%s size=%zd)\n", ino, name, value, size);
 
     if (inode->is_symlink) {
         /* Sorry, no race free way to removexattr on symlink. */
@@ -1960,10 +1927,8 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *name)
         goto out;
     }
 
-    if (lo_debug(req)) {
-        fuse_log(FUSE_LOG_DEBUG, "lo_removexattr(ino=%" PRIu64 ", name=%s)\n",
-                 ino, name);
-    }
+    fuse_log(FUSE_LOG_DEBUG, "lo_removexattr(ino=%" PRIu64 ", name=%s)\n", ino,
+             name);
 
     if (inode->is_symlink) {
         /* Sorry, no race free way to setxattr on symlink. */
@@ -2285,6 +2250,10 @@ static void setup_nofile_rlimit(void)
 
 static void log_func(enum fuse_log_level level, const char *fmt, va_list ap)
 {
+    if (current_log_level < level) {
+        return;
+    }
+
     if (use_syslog) {
         int priority = LOG_ERR;
         switch (level) {
@@ -2383,8 +2352,19 @@ int main(int argc, char *argv[])
         return 1;
     }
 
+    /*
+     * log_level is 0 if not configured via cmd options (0 is LOG_EMERG,
+     * and we don't use this log level).
+     */
+    if (opts.log_level != 0) {
+        current_log_level = opts.log_level;
+    }
     lo.debug = opts.debug;
+    if (lo.debug) {
+        current_log_level = FUSE_LOG_DEBUG;
+    }
     lo.root.refcount = 2;
+
     if (lo.source) {
         struct stat stat;
         int res;
-- 
2.23.0




* [PATCH 059/104] virtiofsd: Add ID to the log with FUSE_LOG_DEBUG level
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (57 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 058/104] virtiofsd: print log only when priority is high enough Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-06 15:18   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 060/104] virtiofsd: Add timestamp " Dr. David Alan Gilbert (git)
                   ` (47 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>

virtiofsd runs several threads, so the debug option produces a lot of
interleaved log output.  Being able to identify the thread that emitted
each line makes debugging easier.

Add the thread ID, obtained with gettid(), to log messages when the log
level is FUSE_LOG_DEBUG, so that the output of a specific thread can be
picked out with grep.

The log then looks like this:

  ]# ./virtiofsd -d -o vhost_user_socket=/tmp/vhostqemu0 -o source=/tmp/share0 -o cache=auto
  ...
  [ID: 00000097]    unique: 12696, success, outsize: 120
  [ID: 00000097] virtio_send_msg: elem 18: with 2 in desc of length 120
  [ID: 00000003] fv_queue_thread: Got queue event on Queue 1
  [ID: 00000003] fv_queue_thread: Queue 1 gave evalue: 1 available: in: 65552 out: 80
  [ID: 00000003] fv_queue_thread: Waiting for Queue 1 event
  [ID: 00000071] fv_queue_worker: elem 33: with 2 out desc of length 80 bad_in_num=0 bad_out_num=0
  [ID: 00000071] unique: 12694, opcode: READ (15), nodeid: 2, insize: 80, pid: 2014
  [ID: 00000071] lo_read(ino=2, size=65536, off=131072)

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
---
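The prefixing described in the commit message can be sketched outside
of virtiofsd as follows.  This is only an illustration under stated
assumptions: prefix_with_tid() and log_with_tid() are hypothetical
names, plain malloc()/snprintf() stand in for g_strdup_printf(), and
the prefix is always applied rather than only at debug level.  The
original format string is spliced in with %s, so its own conversion
specifiers survive and are expanded later by vfprintf().

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: return a heap copy of fmt with "[ID: <tid>] "
 * in front of it.  Returns NULL on failure; the caller frees the result.
 */
static char *prefix_with_tid(const char *fmt)
{
    long tid = syscall(SYS_gettid);
    size_t len;
    char *out;
    int n;

    assert(fmt != NULL);

    /* Measure the prefix first so the buffer is sized exactly. */
    n = snprintf(NULL, 0, "[ID: %08ld] ", tid);
    if (n < 0) {
        return NULL;
    }
    len = (size_t)n + strlen(fmt) + 1;

    out = malloc(len);
    if (out) {
        /* fmt is copied verbatim; its % directives are not expanded here. */
        snprintf(out, len, "[ID: %08ld] %s", tid, fmt);
    }
    return out;
}

/* Hypothetical front end: log to stderr with the thread ID prefix. */
static void log_with_tid(const char *fmt, ...)
{
    char *full = prefix_with_tid(fmt);
    va_list ap;

    va_start(ap, fmt);
    /* Fall back to the unprefixed format if allocation failed. */
    vfprintf(stderr, full ? full : fmt, ap);
    va_end(ap);
    free(full);
}
```

A call such as log_with_tid("lo_read(ino=%d)\n", 2) then emits one
line that starts with the calling thread's ID.
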
 tools/virtiofsd/passthrough_ll.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 6f398a7ff2..8e00a90e6f 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -42,6 +42,7 @@
 #include <cap-ng.h>
 #include <dirent.h>
 #include <errno.h>
+#include <glib.h>
 #include <inttypes.h>
 #include <limits.h>
 #include <pthread.h>
@@ -2248,12 +2249,18 @@ static void setup_nofile_rlimit(void)
     }
 }
 
-static void log_func(enum fuse_log_level level, const char *fmt, va_list ap)
+static void log_func(enum fuse_log_level level, const char *_fmt, va_list ap)
 {
+    char *fmt = (char *)_fmt;
+
     if (current_log_level < level) {
         return;
     }
 
+    if (current_log_level == FUSE_LOG_DEBUG) {
+        fmt = g_strdup_printf("[ID: %08ld] %s", syscall(__NR_gettid), _fmt);
+    }
+
     if (use_syslog) {
         int priority = LOG_ERR;
         switch (level) {
@@ -2286,6 +2293,10 @@ static void log_func(enum fuse_log_level level, const char *fmt, va_list ap)
     } else {
         vfprintf(stderr, fmt, ap);
     }
+
+    if (current_log_level == FUSE_LOG_DEBUG) {
+        g_free(fmt);
+    }
 }
 
 int main(int argc, char *argv[])
-- 
2.23.0




* [PATCH 060/104] virtiofsd: Add timestamp to the log with FUSE_LOG_DEBUG level
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (58 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 059/104] virtiofsd: Add ID to the log with FUSE_LOG_DEBUG level Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 11:11   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 061/104] virtiofsd: Handle reinit Dr. David Alan Gilbert (git)
                   ` (46 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>

virtiofsd runs several threads, so the debug option produces a lot of
log output.  A timestamp on each line makes that output easier to
follow.

Add a nanosecond timestamp, obtained with get_clock(), to log messages
when the log level is FUSE_LOG_DEBUG and the syslog option isn't set.

The log then looks like this:

  ]# ./virtiofsd -d -o vhost_user_socket=/tmp/vhostqemu0 -o source=/tmp/share0 -o cache=auto
  ...
  [5365943125463727] [ID: 00000002] fv_queue_thread: Start for queue 0 kick_fd 9
  [5365943125568644] [ID: 00000002] fv_queue_thread: Waiting for Queue 0 event
  [5365943125573561] [ID: 00000002] fv_queue_thread: Got queue event on Queue 0

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
---
 tools/virtiofsd/passthrough_ll.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 8e00a90e6f..91d3120033 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -35,6 +35,8 @@
  * \include passthrough_ll.c
  */
 
+#include "qemu/osdep.h"
+#include "qemu/timer.h"
 #include "fuse_virtio.h"
 #include "fuse_log.h"
 #include "fuse_lowlevel.h"
@@ -2258,7 +2260,12 @@ static void log_func(enum fuse_log_level level, const char *_fmt, va_list ap)
     }
 
     if (current_log_level == FUSE_LOG_DEBUG) {
-        fmt = g_strdup_printf("[ID: %08ld] %s", syscall(__NR_gettid), _fmt);
+        if (!use_syslog) {
+            fmt = g_strdup_printf("[%" PRId64 "] [ID: %08ld] %s", get_clock(),
+                                  syscall(__NR_gettid), _fmt);
+        } else {
+            fmt = g_strdup_printf("[ID: %08ld] %s", syscall(__NR_gettid), _fmt);
+        }
     }
 
     if (use_syslog) {
-- 
2.23.0




* [PATCH 061/104] virtiofsd: Handle reinit
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (59 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 060/104] virtiofsd: Add timestamp " Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 11:12   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 062/104] virtiofsd: Handle hard reboot Dr. David Alan Gilbert (git)
                   ` (45 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Allow an init->destroy->init sequence, as produced by
mount->umount->mount.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/fuse_lowlevel.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 0abb369b3d..2d1d1a2e59 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -2030,6 +2030,7 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
     }
 
     se->got_init = 1;
+    se->got_destroy = 0;
     if (se->op.init) {
         se->op.init(se->userdata, &se->conn);
     }
@@ -2132,6 +2133,7 @@ static void do_destroy(fuse_req_t req, fuse_ino_t nodeid,
     (void)iter;
 
     se->got_destroy = 1;
+    se->got_init = 0;
     if (se->op.destroy) {
         se->op.destroy(se->userdata);
     }
-- 
2.23.0




* [PATCH 062/104] virtiofsd: Handle hard reboot
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (60 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 061/104] virtiofsd: Handle reinit Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 11:14   ` Daniel P. Berrangé
  2020-01-20  6:46   ` Misono Tomohiro
  2019-12-12 16:38 ` [PATCH 063/104] virtiofsd: Kill threads when queues are stopped Dr. David Alan Gilbert (git)
                   ` (44 subsequent siblings)
  106 siblings, 2 replies; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Handle the sequence:
  mount
  hard reboot (without unmount)
  mount

The second mount sends another 'init', which FUSE doesn't normally
expect.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
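The flag handling for both the clean remount and the hard reboot case
can be summarised with a reduced model.  Everything below is an
assumption made for illustration: struct session_model and the
model_*() functions are hypothetical and only mirror the got_init /
got_destroy bookkeeping, not the real struct fuse_session or request
dispatch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced model of the session state. */
struct session_model {
    bool got_init;
    bool got_destroy;
    bool is_virtio;
    int destroy_calls; /* how often the destroy callback would have run */
};

/* Models a DESTROY request: a later INIT is allowed again. */
static void model_handle_destroy(struct session_model *se)
{
    assert(se != NULL);
    se->got_destroy = true;
    se->got_init = false;
    se->destroy_calls++;
}

/* Models an INIT request. Returns 0 if accepted, -1 if rejected. */
static int model_handle_init(struct session_model *se)
{
    assert(se != NULL);
    if (se->got_init) {
        if (!se->is_virtio) {
            return -1; /* a second INIT stays an error elsewhere */
        }
        /* Hard reboot: the guest never sent DESTROY, so imply one. */
        model_handle_destroy(se);
    }
    se->got_init = true;
    se->got_destroy = false;
    return 0;
}
```
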
 tools/virtiofsd/fuse_lowlevel.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 2d1d1a2e59..45125ef66a 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -2436,7 +2436,21 @@ void fuse_session_process_buf_int(struct fuse_session *se,
             goto reply_err;
         }
     } else if (in->opcode == FUSE_INIT || in->opcode == CUSE_INIT) {
-        goto reply_err;
+        if (fuse_lowlevel_is_virtio(se)) {
+            /*
+             * TODO: This is after a hard reboot typically, we need to do
+             * a destroy, but we can't reply to this request yet so
+             * we can't use do_destroy
+             */
+            fuse_log(FUSE_LOG_DEBUG, "%s: reinit\n", __func__);
+            se->got_destroy = 1;
+            se->got_init = 0;
+            if (se->op.destroy) {
+                se->op.destroy(se->userdata);
+            }
+        } else {
+            goto reply_err;
+        }
     }
 
     err = EACCES;
-- 
2.23.0




* [PATCH 063/104] virtiofsd: Kill threads when queues are stopped
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (61 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 062/104] virtiofsd: Handle hard reboot Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 11:16   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 064/104] vhost-user: Print unexpected slave message types Dr. David Alan Gilbert (git)
                   ` (43 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Kill the threads we've started when the queues get stopped.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
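The wake-up mechanism is a second eventfd polled alongside the kick
fd.  A minimal sketch of one wait step, assuming hypothetical names
(wait_for_event(), request_stop(), enum wake_reason) and plain poll()
in place of ppoll(); in the patch the same logic sits inside the queue
thread's loop and the stopping side follows the write with
pthread_join().

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/eventfd.h>

enum wake_reason { WAKE_KICK, WAKE_KILL, WAKE_ERROR };

/* Block until the kick fd or the kill fd becomes readable. */
static enum wake_reason wait_for_event(int kick_fd, int kill_fd)
{
    struct pollfd pf[2] = {
        { .fd = kick_fd, .events = POLLIN, .revents = 0 },
        { .fd = kill_fd, .events = POLLIN, .revents = 0 },
    };
    int n;

    assert(kick_fd >= 0 && kill_fd >= 0);

    /* Retry if a signal interrupts the wait. */
    do {
        n = poll(pf, 2, -1);
    } while (n == -1 && errno == EINTR);

    if (n == -1) {
        return WAKE_ERROR;
    }
    if ((pf[0].revents | pf[1].revents) & (POLLERR | POLLHUP | POLLNVAL)) {
        return WAKE_ERROR;
    }
    /* The kill request wins even when a kick is pending as well. */
    if (pf[1].revents & POLLIN) {
        return WAKE_KILL;
    }
    return WAKE_KICK;
}

/* Ask the waiting thread to exit. Returns 0 on success, -1 on error. */
static int request_stop(int kill_fd)
{
    return eventfd_write(kill_fd, 1);
}
```

Checking the kill fd before the kick fd means a stop request is not
delayed by queue work that arrives at the same time.
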
 tools/virtiofsd/fuse_virtio.c | 37 +++++++++++++++++++++++++++++++----
 1 file changed, 33 insertions(+), 4 deletions(-)

diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 3c778b6296..2f11fee46d 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -41,6 +41,7 @@ struct fv_QueueInfo {
     /* Our queue index, corresponds to array position */
     int qidx;
     int kick_fd;
+    int kill_fd; /* For killing the thread */
 
     /* The element for the command currently being processed */
     VuVirtqElement *qe;
@@ -410,14 +411,17 @@ static void *fv_queue_thread(void *opaque)
     fuse_log(FUSE_LOG_INFO, "%s: Start for queue %d kick_fd %d\n", __func__,
              qi->qidx, qi->kick_fd);
     while (1) {
-        struct pollfd pf[1];
+        struct pollfd pf[2];
         pf[0].fd = qi->kick_fd;
         pf[0].events = POLLIN;
         pf[0].revents = 0;
+        pf[1].fd = qi->kill_fd;
+        pf[1].events = POLLIN;
+        pf[1].revents = 0;
 
         fuse_log(FUSE_LOG_DEBUG, "%s: Waiting for Queue %d event\n", __func__,
                  qi->qidx);
-        int poll_res = ppoll(pf, 1, NULL, NULL);
+        int poll_res = ppoll(pf, 2, NULL, NULL);
 
         if (poll_res == -1) {
             if (errno == EINTR) {
@@ -428,12 +432,23 @@ static void *fv_queue_thread(void *opaque)
             fuse_log(FUSE_LOG_ERR, "fv_queue_thread ppoll: %m\n");
             break;
         }
-        assert(poll_res == 1);
+        assert(poll_res >= 1);
         if (pf[0].revents & (POLLERR | POLLHUP | POLLNVAL)) {
             fuse_log(FUSE_LOG_ERR, "%s: Unexpected poll revents %x Queue %d\n",
                      __func__, pf[0].revents, qi->qidx);
             break;
         }
+        if (pf[1].revents & (POLLERR | POLLHUP | POLLNVAL)) {
+            fuse_log(FUSE_LOG_ERR,
+                     "%s: Unexpected poll revents %x Queue %d killfd\n",
+                     __func__, pf[1].revents, qi->qidx);
+            break;
+        }
+        if (pf[1].revents) {
+            fuse_log(FUSE_LOG_INFO, "%s: kill event on queue %d - quitting\n",
+                     __func__, qi->qidx);
+            break;
+        }
         assert(pf[0].revents & POLLIN);
         fuse_log(FUSE_LOG_DEBUG, "%s: Got queue event on Queue %d\n", __func__,
                  qi->qidx);
@@ -631,15 +646,29 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
         }
         ourqi = vud->qi[qidx];
         ourqi->kick_fd = dev->vq[qidx].kick_fd;
+
+        ourqi->kill_fd = eventfd(0, EFD_CLOEXEC | EFD_SEMAPHORE);
+        assert(ourqi->kill_fd != -1);
         if (pthread_create(&ourqi->thread, NULL, fv_queue_thread, ourqi)) {
             fuse_log(FUSE_LOG_ERR, "%s: Failed to create thread for queue %d\n",
                      __func__, qidx);
             assert(0);
         }
     } else {
-        /* TODO: Kill the thread */
+        int ret;
         assert(qidx < vud->nqueues);
         ourqi = vud->qi[qidx];
+
+        /* Kill the thread */
+        if (eventfd_write(ourqi->kill_fd, 1)) {
+            fuse_log(FUSE_LOG_ERR, "Eventfd_write for queue: %m\n");
+        }
+        ret = pthread_join(ourqi->thread, NULL);
+        if (ret) {
+            fuse_log(FUSE_LOG_ERR, "%s: Failed to join thread idx %d err %d\n",
+                     __func__, qidx, ret);
+        }
+        close(ourqi->kill_fd);
         ourqi->kick_fd = -1;
     }
 }
-- 
2.23.0




* [PATCH 064/104] vhost-user: Print unexpected slave message types
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (62 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 063/104] virtiofsd: Kill threads when queues are stopped Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 11:18   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 065/104] contrib/libvhost-user: Protect slave fd with mutex Dr. David Alan Gilbert (git)
                   ` (42 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

When we receive an unexpected message type on the slave fd, print
the type.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hw/virtio/vhost-user.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 02a9b25199..33470c14b0 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1055,7 +1055,7 @@ static void slave_read(void *opaque)
                                                           fd[0]);
         break;
     default:
-        error_report("Received unexpected msg type.");
+        error_report("Received unexpected msg type. (%d)", hdr.request);
         ret = -EINVAL;
     }
 
-- 
2.23.0




* [PATCH 065/104] contrib/libvhost-user: Protect slave fd with mutex
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (63 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 064/104] vhost-user: Print unexpected slave message types Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 11:19   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 066/104] virtiofsd: passthrough_ll: add renameat2 support Dr. David Alan Gilbert (git)
                   ` (41 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

In future patches we'll be performing commands on the slave-fd driven
by commands on queues.  Since those queues will be driven by individual
threads, we need to make sure they don't attempt to use the slave-fd
for multiple commands in parallel.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
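The invariant is that a request and its reply on the shared fd form
one critical section.  A minimal sketch of that idea, assuming a
hypothetical struct chan with one-byte messages instead of
VhostUserMsg; unlike the patch, which takes the mutex in the caller
and drops it in vu_process_message_reply(), the sketch locks and
unlocks in the same function purely to keep the example short.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical channel: a socket shared by several threads. */
struct chan {
    pthread_mutex_t lock; /* must be held while using fd */
    int fd;
};

static int chan_init(struct chan *c, int fd)
{
    assert(c != NULL);
    c->fd = fd;
    return pthread_mutex_init(&c->lock, NULL);
}

static void chan_destroy(struct chan *c)
{
    assert(c != NULL);
    close(c->fd);
    pthread_mutex_destroy(&c->lock);
}

/*
 * Send one request byte and wait for the one-byte reply without letting
 * another thread interleave its own exchange.
 * Returns true if the peer answered with 0 (success).
 */
static bool chan_transact(struct chan *c, uint8_t req)
{
    uint8_t reply = 1;
    bool ok = false;

    assert(c != NULL);
    pthread_mutex_lock(&c->lock);
    if (send(c->fd, &req, 1, 0) == 1 && recv(c->fd, &reply, 1, 0) == 1) {
        ok = (reply == 0);
    }
    pthread_mutex_unlock(&c->lock);
    return ok;
}
```
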
 contrib/libvhost-user/libvhost-user.c | 24 ++++++++++++++++++++----
 contrib/libvhost-user/libvhost-user.h |  3 +++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index ec27b78ff1..63e41062a4 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -392,26 +392,37 @@ vu_send_reply(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
     return vu_message_write(dev, conn_fd, vmsg);
 }
 
+/*
+ * Processes a reply on the slave channel.
+ * Entered with slave_mutex held and releases it before exit.
+ * Returns true on success.
+ */
 static bool
 vu_process_message_reply(VuDev *dev, const VhostUserMsg *vmsg)
 {
     VhostUserMsg msg_reply;
+    bool result = false;
 
     if ((vmsg->flags & VHOST_USER_NEED_REPLY_MASK) == 0) {
-        return true;
+        result = true;
+        goto out;
     }
 
     if (!vu_message_read(dev, dev->slave_fd, &msg_reply)) {
-        return false;
+        goto out;
     }
 
     if (msg_reply.request != vmsg->request) {
         DPRINT("Received unexpected msg type. Expected %d received %d",
                vmsg->request, msg_reply.request);
-        return false;
+        goto out;
     }
 
-    return msg_reply.payload.u64 == 0;
+    result = msg_reply.payload.u64 == 0;
+
+out:
+    pthread_mutex_unlock(&dev->slave_mutex);
+    return result;
 }
 
 /* Kick the log_call_fd if required. */
@@ -1105,10 +1116,13 @@ bool vu_set_queue_host_notifier(VuDev *dev, VuVirtq *vq, int fd,
         return false;
     }
 
+    pthread_mutex_lock(&dev->slave_mutex);
     if (!vu_message_write(dev, dev->slave_fd, &vmsg)) {
+        pthread_mutex_unlock(&dev->slave_mutex);
         return false;
     }
 
+    /* Also unlocks the slave_mutex */
     return vu_process_message_reply(dev, &vmsg);
 }
 
@@ -1628,6 +1642,7 @@ vu_deinit(VuDev *dev)
         close(dev->slave_fd);
         dev->slave_fd = -1;
     }
+    pthread_mutex_destroy(&dev->slave_mutex);
 
     if (dev->sock != -1) {
         close(dev->sock);
@@ -1663,6 +1678,7 @@ vu_init(VuDev *dev,
     dev->remove_watch = remove_watch;
     dev->iface = iface;
     dev->log_call_fd = -1;
+    pthread_mutex_init(&dev->slave_mutex, NULL);
     dev->slave_fd = -1;
     dev->max_queues = max_queues;
 
diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index 46b600799b..1844b6f8d4 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -19,6 +19,7 @@
 #include <stddef.h>
 #include <sys/poll.h>
 #include <linux/vhost.h>
+#include <pthread.h>
 #include "standard-headers/linux/virtio_ring.h"
 
 /* Based on qemu/hw/virtio/vhost-user.c */
@@ -355,6 +356,8 @@ struct VuDev {
     VuVirtq *vq;
     VuDevInflightInfo inflight_info;
     int log_call_fd;
+    /* Must be held while using slave_fd */
+    pthread_mutex_t slave_mutex;
     int slave_fd;
     uint64_t log_size;
     uint8_t *log_table;
-- 
2.23.0




* [PATCH 066/104] virtiofsd: passthrough_ll: add renameat2 support
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (64 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 065/104] contrib/libvhost-user: Protect slave fd with mutex Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 11:21   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 067/104] virtiofsd: passthrough_ll: disable readdirplus on cache=never Dr. David Alan Gilbert (git)
                   ` (40 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Miklos Szeredi <mszeredi@redhat.com>

No glibc support yet, so use syscall().

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
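The direct syscall approach can be tried in isolation with a helper
like the one below.  rename_with_flags() is a hypothetical name and
the errno-style return value is an assumption for the sketch; the
ENOSYS to EINVAL mapping follows what the patch does for kernels
without renameat2.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>       /* AT_FDCWD for callers */
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: rename with flags via the raw system call.
 * Returns 0 on success or a positive errno value on failure.
 */
static int rename_with_flags(int olddirfd, const char *oldname,
                             int newdirfd, const char *newname,
                             unsigned int flags)
{
    assert(oldname != NULL && newname != NULL);
#ifndef SYS_renameat2
    /* Headers too old to know the syscall number: flags unsupported. */
    (void)olddirfd;
    (void)newdirfd;
    (void)flags;
    return EINVAL;
#else
    long res = syscall(SYS_renameat2, olddirfd, oldname, newdirfd, newname,
                       flags);

    if (res == -1) {
        /* A kernel without renameat2 is reported like an invalid flag. */
        return errno == ENOSYS ? EINVAL : errno;
    }
    return 0;
#endif
}
```
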
 tools/virtiofsd/passthrough_ll.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 91d3120033..bed2270141 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1083,7 +1083,17 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
     }
 
     if (flags) {
+#ifndef SYS_renameat2
         fuse_reply_err(req, EINVAL);
+#else
+        res = syscall(SYS_renameat2, lo_fd(req, parent), name,
+                      lo_fd(req, newparent), newname, flags);
+        if (res == -1 && errno == ENOSYS) {
+            fuse_reply_err(req, EINVAL);
+        } else {
+            fuse_reply_err(req, res == -1 ? errno : 0);
+        }
+#endif
         return;
     }
 
-- 
2.23.0




* [PATCH 067/104] virtiofsd: passthrough_ll: disable readdirplus on cache=never
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (65 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 066/104] virtiofsd: passthrough_ll: add renameat2 support Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 11:22   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 068/104] virtiofsd: passthrough_ll: control readdirplus Dr. David Alan Gilbert (git)
                   ` (39 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Miklos Szeredi <mszeredi@redhat.com>

With cache=never the attributes sent in the READDIRPLUS reply would be
discarded anyway, so disable readdirplus in that mode.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index bed2270141..0d70a367bd 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -479,6 +479,10 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
         fuse_log(FUSE_LOG_DEBUG, "lo_init: activating flock locks\n");
         conn->want |= FUSE_CAP_FLOCK_LOCKS;
     }
+    if (lo->cache == CACHE_NEVER) {
+        fuse_log(FUSE_LOG_DEBUG, "lo_init: disabling readdirplus\n");
+        conn->want &= ~FUSE_CAP_READDIRPLUS;
+    }
 }
 
 static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
-- 
2.23.0




* [PATCH 068/104] virtiofsd: passthrough_ll: control readdirplus
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (66 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 067/104] virtiofsd: passthrough_ll: disable readdirplus on cache=never Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 11:23   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 069/104] virtiofsd: rename unref_inode() to unref_inode_lolocked() Dr. David Alan Gilbert (git)
                   ` (38 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
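The condition in lo_init() reduces to a small predicate over the cache
mode and the two new options.  The sketch below restates it as a
standalone function; readdirplus_disabled() is a hypothetical name and
the enum is redefined locally as an assumption so the example
compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for the cache modes used by passthrough_ll.c. */
enum cache_mode { CACHE_NEVER, CACHE_NORMAL, CACHE_ALWAYS };

/*
 * Returns true when FUSE_CAP_READDIRPLUS would be cleared:
 * - "no_readdirplus" always disables it;
 * - cache=never disables it unless "readdirplus" was given.
 */
static bool readdirplus_disabled(enum cache_mode cache, bool readdirplus_set,
                                 bool readdirplus_clear)
{
    assert(cache >= CACHE_NEVER && cache <= CACHE_ALWAYS);
    return (cache == CACHE_NEVER && !readdirplus_set) || readdirplus_clear;
}
```

Note that when both options are given, no_readdirplus takes
precedence.
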
 tools/virtiofsd/passthrough_ll.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 0d70a367bd..c3e8bde5cf 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -118,6 +118,8 @@ struct lo_data {
     double timeout;
     int cache;
     int timeout_set;
+    int readdirplus_set;
+    int readdirplus_clear;
     struct lo_inode root; /* protected by lo->mutex */
     struct lo_map ino_map; /* protected by lo->mutex */
     struct lo_map dirp_map; /* protected by lo->mutex */
@@ -141,6 +143,8 @@ static const struct fuse_opt lo_opts[] = {
     { "cache=auto", offsetof(struct lo_data, cache), CACHE_NORMAL },
     { "cache=always", offsetof(struct lo_data, cache), CACHE_ALWAYS },
     { "norace", offsetof(struct lo_data, norace), 1 },
+    { "readdirplus", offsetof(struct lo_data, readdirplus_set), 1 },
+    { "no_readdirplus", offsetof(struct lo_data, readdirplus_clear), 1 },
     FUSE_OPT_END
 };
 static bool use_syslog = false;
@@ -479,7 +483,8 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
         fuse_log(FUSE_LOG_DEBUG, "lo_init: activating flock locks\n");
         conn->want |= FUSE_CAP_FLOCK_LOCKS;
     }
-    if (lo->cache == CACHE_NEVER) {
+    if ((lo->cache == CACHE_NEVER && !lo->readdirplus_set) ||
+        lo->readdirplus_clear) {
         fuse_log(FUSE_LOG_DEBUG, "lo_init: disabling readdirplus\n");
         conn->want &= ~FUSE_CAP_READDIRPLUS;
     }
-- 
2.23.0




* [PATCH 069/104] virtiofsd: rename unref_inode() to unref_inode_lolocked()
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (67 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 068/104] virtiofsd: passthrough_ll: control readdirplus Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 11:23   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 070/104] virtiofsd: fail when parent inode isn't known in lo_do_lookup() Dr. David Alan Gilbert (git)
                   ` (37 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index c3e8bde5cf..1618db5a92 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -149,8 +149,8 @@ static const struct fuse_opt lo_opts[] = {
 };
 static bool use_syslog = false;
 static int current_log_level;
-
-static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n);
+static void unref_inode_lolocked(struct lo_data *lo, struct lo_inode *inode,
+                                 uint64_t n);
 
 static struct {
     pthread_mutex_t mutex;
@@ -587,7 +587,7 @@ retry:
     return 0;
 
 fail_unref:
-    unref_inode(lo, p, 1);
+    unref_inode_lolocked(lo, p, 1);
 fail:
     if (retries) {
         retries--;
@@ -625,7 +625,7 @@ fallback:
     res = lo_parent_and_name(lo, inode, path, &parent);
     if (res != -1) {
         res = utimensat(parent->fd, path, tv, AT_SYMLINK_NOFOLLOW);
-        unref_inode(lo, parent, 1);
+        unref_inode_lolocked(lo, parent, 1);
     }
 
     return res;
@@ -1011,7 +1011,7 @@ fallback:
     res = lo_parent_and_name(lo, inode, path, &parent);
     if (res != -1) {
         res = linkat(parent->fd, path, dfd, name, 0);
-        unref_inode(lo, parent, 1);
+        unref_inode_lolocked(lo, parent, 1);
     }
 
     return res;
@@ -1125,7 +1125,8 @@ static void lo_unlink(fuse_req_t req, fuse_ino_t parent, const char *name)
     fuse_reply_err(req, res == -1 ? errno : 0);
 }
 
-static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
+static void unref_inode_lolocked(struct lo_data *lo, struct lo_inode *inode,
+                                 uint64_t n)
 {
     if (!inode) {
         return;
@@ -1165,7 +1166,7 @@ static void lo_forget_one(fuse_req_t req, fuse_ino_t ino, uint64_t nlookup)
              (unsigned long long)ino, (unsigned long long)inode->refcount,
              (unsigned long long)nlookup);
 
-    unref_inode(lo, inode, nlookup);
+    unref_inode_lolocked(lo, inode, nlookup);
 }
 
 static void lo_forget(fuse_req_t req, fuse_ino_t ino, uint64_t nlookup)
-- 
2.23.0




* [PATCH 070/104] virtiofsd: fail when parent inode isn't known in lo_do_lookup()
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (68 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 069/104] virtiofsd: rename unref_inode() to unref_inode_lolocked() Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-16  7:17   ` Misono Tomohiro
  2020-01-20 10:08   ` Sergio Lopez
  2019-12-12 16:38 ` [PATCH 071/104] virtiofsd: extract root inode init into setup_root() Dr. David Alan Gilbert (git)
                   ` (36 subsequent siblings)
  106 siblings, 2 replies; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Miklos Szeredi <mszeredi@redhat.com>

The Linux file handle APIs (struct export_operations) can access inodes
that are not attached to parents because path name traversal is not
performed.  Refuse if there is no parent in lo_do_lookup().

Also clean up lo_do_lookup() while we're here.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 1618db5a92..ef8b88e3d1 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -778,6 +778,15 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
     struct lo_data *lo = lo_data(req);
     struct lo_inode *inode, *dir = lo_inode(req, parent);
 
+    /*
+     * name_to_handle_at() and open_by_handle_at() can reach here with fuse
+     * mount point in guest, but we don't have its inode info in the
+     * ino_map.
+     */
+    if (!dir) {
+        return ENOENT;
+    }
+
     memset(e, 0, sizeof(*e));
     e->attr_timeout = lo->timeout;
     e->entry_timeout = lo->timeout;
@@ -787,7 +796,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         name = ".";
     }
 
-    newfd = openat(lo_fd(req, parent), name, O_PATH | O_NOFOLLOW);
+    newfd = openat(dir->fd, name, O_PATH | O_NOFOLLOW);
     if (newfd == -1) {
         goto out_err;
     }
@@ -797,7 +806,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         goto out_err;
     }
 
-    inode = lo_find(lo_data(req), &e->attr);
+    inode = lo_find(lo, &e->attr);
     if (inode) {
         close(newfd);
         newfd = -1;
@@ -813,6 +822,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         inode->is_symlink = S_ISLNK(e->attr.st_mode);
         inode->refcount = 1;
         inode->fd = newfd;
+        newfd = -1;
         inode->ino = e->attr.st_ino;
         inode->dev = e->attr.st_dev;
 
-- 
2.23.0
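
The guard this patch adds to lo_do_lookup() can be illustrated outside
virtiofsd. The following is a minimal sketch, not the daemon's code: the
demo_* names and the fixed-size map are hypothetical stand-ins for
lo_inode and the ino_map. It only shows the pattern of refusing a lookup
with ENOENT when the parent is not known.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for lo_inode and the ino_map. */
struct demo_inode {
    int fd;
};

struct demo_map {
    struct demo_inode *slots[8];
};

/* Return the inode for a guest-supplied number, or NULL if it is unknown. */
static struct demo_inode *demo_inode_get(struct demo_map *map, size_t ino)
{
    if (ino >= sizeof(map->slots) / sizeof(map->slots[0])) {
        return NULL;
    }
    return map->slots[ino];
}

/* Return 0 and store the parent's fd, or an errno value on failure. */
static int demo_lookup(struct demo_map *map, size_t parent, int *dirfd)
{
    struct demo_inode *dir;

    assert(map && dirfd);
    dir = demo_inode_get(map, parent);
    if (!dir) {
        /* The parent was never looked up by name, so refuse. */
        return ENOENT;
    }
    *dirfd = dir->fd;
    return 0;
}
```

Checking the parent once at the top also lets the rest of the function
use dir->fd directly, which is what the patch does when it replaces
lo_fd(req, parent) with dir->fd.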




* [PATCH 071/104] virtiofsd: extract root inode init into setup_root()
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (69 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 070/104] virtiofsd: fail when parent inode isn't known in lo_do_lookup() Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-16  7:20   ` Misono Tomohiro
  2020-01-20 10:09   ` Sergio Lopez
  2019-12-12 16:38 ` [PATCH 072/104] virtiofsd: passthrough_ll: fix refcounting on remove/rename Dr. David Alan Gilbert (git)
                   ` (35 subsequent siblings)
  106 siblings, 2 replies; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Miklos Szeredi <mszeredi@redhat.com>

Initialize the root inode in a single place.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index ef8b88e3d1..0f33c3c5e9 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -2336,6 +2336,29 @@ static void log_func(enum fuse_log_level level, const char *_fmt, va_list ap)
     }
 }
 
+static void setup_root(struct lo_data *lo, struct lo_inode *root)
+{
+    int fd, res;
+    struct stat stat;
+
+    fd = open("/", O_PATH);
+    if (fd == -1) {
+        fuse_log(FUSE_LOG_ERR, "open(%s, O_PATH): %m\n", lo->source);
+        exit(1);
+    }
+
+    res = fstatat(fd, "", &stat, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
+    if (res == -1) {
+        fuse_log(FUSE_LOG_ERR, "fstatat(%s): %m\n", lo->source);
+        exit(1);
+    }
+
+    root->fd = fd;
+    root->ino = stat.st_ino;
+    root->dev = stat.st_dev;
+    root->refcount = 2;
+}
+
 int main(int argc, char *argv[])
 {
     struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
@@ -2411,8 +2434,6 @@ int main(int argc, char *argv[])
     if (lo.debug) {
         current_log_level = FUSE_LOG_DEBUG;
     }
-    lo.root.refcount = 2;
-
     if (lo.source) {
         struct stat stat;
         int res;
@@ -2480,6 +2501,7 @@ int main(int argc, char *argv[])
 
     setup_sandbox(&lo, se, opts.syslog);
 
+    setup_root(&lo, &lo.root);
     /* Block until ctrl+c or fusermount -u */
     ret = virtio_loop(se);
 
-- 
2.23.0




* [PATCH 072/104] virtiofsd: passthrough_ll: fix refcounting on remove/rename
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (70 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 071/104] virtiofsd: extract root inode init into setup_root() Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-16 11:56   ` Misono Tomohiro
  2020-01-20 10:17   ` Sergio Lopez
  2019-12-12 16:38 ` [PATCH 073/104] virtiofsd: passthrough_ll: clean up cache related options Dr. David Alan Gilbert (git)
                   ` (34 subsequent siblings)
  106 siblings, 2 replies; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 50 +++++++++++++++++++++++++++++++-
 1 file changed, 49 insertions(+), 1 deletion(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 0f33c3c5e9..1b84d4f313 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1077,17 +1077,42 @@ out_err:
     fuse_reply_err(req, saverr);
 }
 
+static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
+                                    const char *name)
+{
+    int res;
+    struct stat attr;
+
+    res = fstatat(lo_fd(req, parent), name, &attr,
+                  AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
+    if (res == -1) {
+        return NULL;
+    }
+
+    return lo_find(lo_data(req), &attr);
+}
+
 static void lo_rmdir(fuse_req_t req, fuse_ino_t parent, const char *name)
 {
     int res;
+    struct lo_inode *inode;
+    struct lo_data *lo = lo_data(req);
+
     if (!is_safe_path_component(name)) {
         fuse_reply_err(req, EINVAL);
         return;
     }
 
+    inode = lookup_name(req, parent, name);
+    if (!inode) {
+        fuse_reply_err(req, EIO);
+        return;
+    }
+
     res = unlinkat(lo_fd(req, parent), name, AT_REMOVEDIR);
 
     fuse_reply_err(req, res == -1 ? errno : 0);
+    unref_inode_lolocked(lo, inode, 1);
 }
 
 static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
@@ -1095,12 +1120,23 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
                       unsigned int flags)
 {
     int res;
+    struct lo_inode *oldinode;
+    struct lo_inode *newinode;
+    struct lo_data *lo = lo_data(req);
 
     if (!is_safe_path_component(name) || !is_safe_path_component(newname)) {
         fuse_reply_err(req, EINVAL);
         return;
     }
 
+    oldinode = lookup_name(req, parent, name);
+    newinode = lookup_name(req, newparent, newname);
+
+    if (!oldinode) {
+        fuse_reply_err(req, EIO);
+        goto out;
+    }
+
     if (flags) {
 #ifndef SYS_renameat2
         fuse_reply_err(req, EINVAL);
@@ -1113,26 +1149,38 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
             fuse_reply_err(req, res == -1 ? errno : 0);
         }
 #endif
-        return;
+        goto out;
     }
 
     res = renameat(lo_fd(req, parent), name, lo_fd(req, newparent), newname);
 
     fuse_reply_err(req, res == -1 ? errno : 0);
+out:
+    unref_inode_lolocked(lo, oldinode, 1);
+    unref_inode_lolocked(lo, newinode, 1);
 }
 
 static void lo_unlink(fuse_req_t req, fuse_ino_t parent, const char *name)
 {
     int res;
+    struct lo_inode *inode;
+    struct lo_data *lo = lo_data(req);
 
     if (!is_safe_path_component(name)) {
         fuse_reply_err(req, EINVAL);
         return;
     }
 
+    inode = lookup_name(req, parent, name);
+    if (!inode) {
+        fuse_reply_err(req, EIO);
+        return;
+    }
+
     res = unlinkat(lo_fd(req, parent), name, 0);
 
     fuse_reply_err(req, res == -1 ? errno : 0);
+    unref_inode_lolocked(lo, inode, 1);
 }
 
 static void unref_inode_lolocked(struct lo_data *lo, struct lo_inode *inode,
-- 
2.23.0




* [PATCH 073/104] virtiofsd: passthrough_ll: clean up cache related options
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (71 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 072/104] virtiofsd: passthrough_ll: fix refcounting on remove/rename Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 11:24   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 074/104] virtiofsd: passthrough_ll: use hashtable Dr. David Alan Gilbert (git)
                   ` (33 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Miklos Szeredi <mszeredi@redhat.com>

 - Rename "cache=never" to "cache=none" to match 9p's similar option.

 - Rename CACHE_NORMAL constant to CACHE_AUTO to match the "cache=auto"
   option.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 1b84d4f313..cd26db74cf 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -102,8 +102,8 @@ struct lo_cred {
 };
 
 enum {
-    CACHE_NEVER,
-    CACHE_NORMAL,
+    CACHE_NONE,
+    CACHE_AUTO,
     CACHE_ALWAYS,
 };
 
@@ -139,8 +139,8 @@ static const struct fuse_opt lo_opts[] = {
     { "no_xattr", offsetof(struct lo_data, xattr), 0 },
     { "timeout=%lf", offsetof(struct lo_data, timeout), 0 },
     { "timeout=", offsetof(struct lo_data, timeout_set), 1 },
-    { "cache=never", offsetof(struct lo_data, cache), CACHE_NEVER },
-    { "cache=auto", offsetof(struct lo_data, cache), CACHE_NORMAL },
+    { "cache=none", offsetof(struct lo_data, cache), CACHE_NONE },
+    { "cache=auto", offsetof(struct lo_data, cache), CACHE_AUTO },
     { "cache=always", offsetof(struct lo_data, cache), CACHE_ALWAYS },
     { "norace", offsetof(struct lo_data, norace), 1 },
     { "readdirplus", offsetof(struct lo_data, readdirplus_set), 1 },
@@ -483,7 +483,7 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
         fuse_log(FUSE_LOG_DEBUG, "lo_init: activating flock locks\n");
         conn->want |= FUSE_CAP_FLOCK_LOCKS;
     }
-    if ((lo->cache == CACHE_NEVER && !lo->readdirplus_set) ||
+    if ((lo->cache == CACHE_NONE && !lo->readdirplus_set) ||
         lo->readdirplus_clear) {
         fuse_log(FUSE_LOG_DEBUG, "lo_init: disabling readdirplus\n");
         conn->want &= ~FUSE_CAP_READDIRPLUS;
@@ -1525,7 +1525,7 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
         fi->fh = fh;
         err = lo_do_lookup(req, parent, name, &e);
     }
-    if (lo->cache == CACHE_NEVER) {
+    if (lo->cache == CACHE_NONE) {
         fi->direct_io = 1;
     } else if (lo->cache == CACHE_ALWAYS) {
         fi->keep_cache = 1;
@@ -1610,7 +1610,7 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
     }
 
     fi->fh = fh;
-    if (lo->cache == CACHE_NEVER) {
+    if (lo->cache == CACHE_NONE) {
         fi->direct_io = 1;
     } else if (lo->cache == CACHE_ALWAYS) {
         fi->keep_cache = 1;
@@ -2427,7 +2427,7 @@ int main(int argc, char *argv[])
     lo.root.next = lo.root.prev = &lo.root;
     lo.root.fd = -1;
     lo.root.fuse_ino = FUSE_ROOT_ID;
-    lo.cache = CACHE_NORMAL;
+    lo.cache = CACHE_AUTO;
 
     /*
      * Set up the ino map like this:
@@ -2503,11 +2503,11 @@ int main(int argc, char *argv[])
     lo.root.is_symlink = false;
     if (!lo.timeout_set) {
         switch (lo.cache) {
-        case CACHE_NEVER:
+        case CACHE_NONE:
             lo.timeout = 0.0;
             break;
 
-        case CACHE_NORMAL:
+        case CACHE_AUTO:
             lo.timeout = 1.0;
             break;
 
-- 
2.23.0
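
As an illustration of the renamed option values, here is a small
hand-rolled sketch that maps the three "cache=" strings to an enum. It
does not use the fuse_opt machinery that the patch touches; the demo_*
names are hypothetical and the function exists only to show the mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the CACHE_* constants after the rename. */
enum demo_cache {
    DEMO_CACHE_NONE,
    DEMO_CACHE_AUTO,
    DEMO_CACHE_ALWAYS,
};

/* Map an option string to a mode; return 0 on success, -1 if unknown. */
static int demo_parse_cache(const char *opt, enum demo_cache *mode)
{
    static const struct {
        const char *name;
        enum demo_cache mode;
    } table[] = {
        { "cache=none", DEMO_CACHE_NONE },
        { "cache=auto", DEMO_CACHE_AUTO },
        { "cache=always", DEMO_CACHE_ALWAYS },
    };
    size_t i;

    assert(opt && mode);
    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (strcmp(opt, table[i].name) == 0) {
            *mode = table[i].mode;
            return 0;
        }
    }
    return -1;
}
```

In this sketch the old spelling "cache=never" is simply not in the table,
so demo_parse_cache() returns -1 for it.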




* [PATCH 074/104] virtiofsd: passthrough_ll: use hashtable
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (72 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 073/104] virtiofsd: passthrough_ll: clean up cache related options Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 11:28   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 075/104] virtiofsd: Clean up inodes on destroy Dr. David Alan Gilbert (git)
                   ` (32 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Miklos Szeredi <mszeredi@redhat.com>

Improve performance of inode lookup by using a hash table.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
---
 tools/virtiofsd/passthrough_ll.c | 81 ++++++++++++++++++--------------
 1 file changed, 45 insertions(+), 36 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index cd26db74cf..bbc5f0981e 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -85,13 +85,15 @@ struct lo_map {
     ssize_t freelist;
 };
 
+struct lo_key {
+    ino_t ino;
+    dev_t dev;
+};
+
 struct lo_inode {
-    struct lo_inode *next; /* protected by lo->mutex */
-    struct lo_inode *prev; /* protected by lo->mutex */
     int fd;
     bool is_symlink;
-    ino_t ino;
-    dev_t dev;
+    struct lo_key key;
     uint64_t refcount; /* protected by lo->mutex */
     fuse_ino_t fuse_ino;
 };
@@ -120,7 +122,8 @@ struct lo_data {
     int timeout_set;
     int readdirplus_set;
     int readdirplus_clear;
-    struct lo_inode root; /* protected by lo->mutex */
+    struct lo_inode root;
+    GHashTable *inodes; /* protected by lo->mutex */
     struct lo_map ino_map; /* protected by lo->mutex */
     struct lo_map dirp_map; /* protected by lo->mutex */
     struct lo_map fd_map; /* protected by lo->mutex */
@@ -574,7 +577,7 @@ retry:
         }
         goto fail_unref;
     }
-    if (stat.st_dev != inode->dev || stat.st_ino != inode->ino) {
+    if (stat.st_dev != inode->key.dev || stat.st_ino != inode->key.ino) {
         if (!retries) {
             fuse_log(FUSE_LOG_WARNING,
                      "lo_parent_and_name: failed to match last\n");
@@ -754,19 +757,20 @@ out_err:
 static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st)
 {
     struct lo_inode *p;
-    struct lo_inode *ret = NULL;
+    struct lo_key key = {
+        .ino = st->st_ino,
+        .dev = st->st_dev,
+    };
 
     pthread_mutex_lock(&lo->mutex);
-    for (p = lo->root.next; p != &lo->root; p = p->next) {
-        if (p->ino == st->st_ino && p->dev == st->st_dev) {
-            assert(p->refcount > 0);
-            ret = p;
-            ret->refcount++;
-            break;
-        }
+    p = g_hash_table_lookup(lo->inodes, &key);
+    if (p) {
+        assert(p->refcount > 0);
+        p->refcount++;
     }
     pthread_mutex_unlock(&lo->mutex);
-    return ret;
+
+    return p;
 }
 
 static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
@@ -811,8 +815,6 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         close(newfd);
         newfd = -1;
     } else {
-        struct lo_inode *prev, *next;
-
         saverr = ENOMEM;
         inode = calloc(1, sizeof(struct lo_inode));
         if (!inode) {
@@ -823,17 +825,12 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         inode->refcount = 1;
         inode->fd = newfd;
         newfd = -1;
-        inode->ino = e->attr.st_ino;
-        inode->dev = e->attr.st_dev;
+        inode->key.ino = e->attr.st_ino;
+        inode->key.dev = e->attr.st_dev;
 
         pthread_mutex_lock(&lo->mutex);
         inode->fuse_ino = lo_add_inode_mapping(req, inode);
-        prev = &lo->root;
-        next = prev->next;
-        next->prev = inode;
-        inode->next = next;
-        inode->prev = prev;
-        prev->next = inode;
+        g_hash_table_insert(lo->inodes, &inode->key, inode);
         pthread_mutex_unlock(&lo->mutex);
     }
     e->ino = inode->fuse_ino;
@@ -1194,14 +1191,8 @@ static void unref_inode_lolocked(struct lo_data *lo, struct lo_inode *inode,
     assert(inode->refcount >= n);
     inode->refcount -= n;
     if (!inode->refcount) {
-        struct lo_inode *prev, *next;
-
-        prev = inode->prev;
-        next = inode->next;
-        next->prev = prev;
-        prev->next = next;
-
         lo_map_remove(&lo->ino_map, inode->fuse_ino);
+        g_hash_table_remove(lo->inodes, &inode->key);
         pthread_mutex_unlock(&lo->mutex);
         close(inode->fd);
         free(inode);
@@ -1401,7 +1392,7 @@ static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
 
         /* Hide root's parent directory */
         if (dinode == &lo->root && strcmp(name, "..") == 0) {
-            e.attr.st_ino = lo->root.ino;
+            e.attr.st_ino = lo->root.key.ino;
             e.attr.st_mode = DT_DIR << 12;
         }
 
@@ -2402,11 +2393,26 @@ static void setup_root(struct lo_data *lo, struct lo_inode *root)
     }
 
     root->fd = fd;
-    root->ino = stat.st_ino;
-    root->dev = stat.st_dev;
+    root->key.ino = stat.st_ino;
+    root->key.dev = stat.st_dev;
     root->refcount = 2;
 }
 
+static guint lo_key_hash(gconstpointer key)
+{
+    const struct lo_key *lkey = key;
+
+    return (guint)lkey->ino + (guint)lkey->dev;
+}
+
+static gboolean lo_key_equal(gconstpointer a, gconstpointer b)
+{
+    const struct lo_key *la = a;
+    const struct lo_key *lb = b;
+
+    return la->ino == lb->ino && la->dev == lb->dev;
+}
+
 int main(int argc, char *argv[])
 {
     struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
@@ -2424,7 +2430,7 @@ int main(int argc, char *argv[])
     umask(0);
 
     pthread_mutex_init(&lo.mutex, NULL);
-    lo.root.next = lo.root.prev = &lo.root;
+    lo.inodes = g_hash_table_new(lo_key_hash, lo_key_equal);
     lo.root.fd = -1;
     lo.root.fuse_ino = FUSE_ROOT_ID;
     lo.cache = CACHE_AUTO;
@@ -2562,6 +2568,9 @@ err_out2:
 err_out1:
     fuse_opt_free_args(&args);
 
+    if (lo.inodes) {
+        g_hash_table_destroy(lo.inodes);
+    }
     lo_map_destroy(&lo.fd_map);
     lo_map_destroy(&lo.dirp_map);
     lo_map_destroy(&lo.ino_map);
-- 
2.23.0
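
To show why keying on (ino, dev) helps lookups, the sketch below builds a
tiny chained hash table in plain C. This is a swapped-in technique for
illustration only: the patch itself uses GLib's GHashTable, and every
demo_* name and the bucket count here are assumptions. The hash and
equality functions follow the shape of lo_key_hash() and lo_key_equal().

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

#define DEMO_BUCKETS 64 /* arbitrary size chosen for the sketch */

struct demo_key {
    ino_t ino;
    dev_t dev;
};

struct demo_node {
    struct demo_key key;
    struct demo_node *next; /* chain of nodes sharing a bucket */
};

struct demo_table {
    struct demo_node *buckets[DEMO_BUCKETS];
};

/* Same mixing as lo_key_hash() in the patch: add the two fields. */
static unsigned int demo_hash(const struct demo_key *k)
{
    return (unsigned int)k->ino + (unsigned int)k->dev;
}

static int demo_equal(const struct demo_key *a, const struct demo_key *b)
{
    return a->ino == b->ino && a->dev == b->dev;
}

/* Insert a caller-owned node; the caller guarantees the key is new. */
static void demo_insert(struct demo_table *t, struct demo_node *n)
{
    unsigned int b;

    assert(t && n);
    b = demo_hash(&n->key) % DEMO_BUCKETS;
    n->next = t->buckets[b];
    t->buckets[b] = n;
}

/* Walk only one bucket instead of every known inode. */
static struct demo_node *demo_find(const struct demo_table *t,
                                   const struct demo_key *k)
{
    struct demo_node *n;

    assert(t && k);
    for (n = t->buckets[demo_hash(k) % DEMO_BUCKETS]; n; n = n->next) {
        if (demo_equal(&n->key, k)) {
            return n;
        }
    }
    return NULL;
}
```

Note that an additive hash gives (ino=10, dev=1) and (ino=1, dev=10) the
same value, so the equality check on both fields is what keeps such keys
apart.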




* [PATCH 075/104] virtiofsd: Clean up inodes on destroy
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (73 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 074/104] virtiofsd: passthrough_ll: use hashtable Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 11:29   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 076/104] virtiofsd: support nanosecond resolution for file timestamp Dr. David Alan Gilbert (git)
                   ` (31 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Clear out our inodes and fd's on a 'destroy' - so we get rid
of them if we reboot the guest.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index bbc5f0981e..c526d6f4f6 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1201,6 +1201,25 @@ static void unref_inode_lolocked(struct lo_data *lo, struct lo_inode *inode,
     }
 }
 
+static int unref_all_inodes_cb(gpointer key, gpointer value, gpointer user_data)
+{
+    struct lo_inode *inode = value;
+    struct lo_data *lo = user_data;
+
+    inode->refcount = 0;
+    lo_map_remove(&lo->ino_map, inode->fuse_ino);
+    close(inode->fd);
+
+    return TRUE;
+}
+
+static void unref_all_inodes(struct lo_data *lo)
+{
+    pthread_mutex_lock(&lo->mutex);
+    g_hash_table_foreach_remove(lo->inodes, unref_all_inodes_cb, lo);
+    pthread_mutex_unlock(&lo->mutex);
+}
+
 static void lo_forget_one(fuse_req_t req, fuse_ino_t ino, uint64_t nlookup)
 {
     struct lo_data *lo = lo_data(req);
@@ -2066,6 +2085,12 @@ static void lo_lseek(fuse_req_t req, fuse_ino_t ino, off_t off, int whence,
     }
 }
 
+static void lo_destroy(void *userdata)
+{
+    struct lo_data *lo = (struct lo_data *)userdata;
+    unref_all_inodes(lo);
+}
+
 static struct fuse_lowlevel_ops lo_oper = {
     .init = lo_init,
     .lookup = lo_lookup,
@@ -2104,6 +2129,7 @@ static struct fuse_lowlevel_ops lo_oper = {
     .copy_file_range = lo_copy_file_range,
 #endif
     .lseek = lo_lseek,
+    .destroy = lo_destroy,
 };
 
 /* Print vhost-user.json backend program capabilities */
-- 
2.23.0




* [PATCH 076/104] virtiofsd: support nanosecond resolution for file timestamp
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (74 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 075/104] virtiofsd: Clean up inodes on destroy Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 11:30   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 077/104] virtiofsd: fix error handling in main() Dr. David Alan Gilbert (git)
                   ` (30 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Jiufei Xue <jiufei.xue@linux.alibaba.com>

Define HAVE_STRUCT_STAT_ST_ATIM to 1 if `st_atim' is a member of `struct
stat', which means that the file timestamp fields support nanosecond
resolution.

Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
---
 configure                   | 16 ++++++++++++++++
 tools/virtiofsd/fuse_misc.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/configure b/configure
index afe9393f04..dd50b03b01 100755
--- a/configure
+++ b/configure
@@ -5217,6 +5217,19 @@ if compile_prog "" "" ; then
     strchrnul=yes
 fi
 
+#########################################
+# check if we have st_atim
+
+st_atim=no
+cat > $TMPC << EOF
+#include <sys/stat.h>
+#include <stddef.h>
+int main(void) { return offsetof(struct stat, st_atim); }
+EOF
+if compile_prog "" "" ; then
+    st_atim=yes
+fi
+
 ##########################################
 # check if trace backend exists
 
@@ -6918,6 +6931,9 @@ fi
 if test "$strchrnul" = "yes" ; then
   echo "HAVE_STRCHRNUL=y" >> $config_host_mak
 fi
+if test "$st_atim" = "yes" ; then
+  echo "HAVE_STRUCT_STAT_ST_ATIM=y" >> $config_host_mak
+fi
 if test "$byteswap_h" = "yes" ; then
   echo "CONFIG_BYTESWAP_H=y" >> $config_host_mak
 fi
diff --git a/tools/virtiofsd/fuse_misc.h b/tools/virtiofsd/fuse_misc.h
index f252baa752..5c618ce21f 100644
--- a/tools/virtiofsd/fuse_misc.h
+++ b/tools/virtiofsd/fuse_misc.h
@@ -7,6 +7,7 @@
  */
 
 #include <pthread.h>
+#include "config-host.h"
 
 /*
  * Versioned symbols cannot be used in some cases because it
-- 
2.23.0
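
A minimal sketch of how code can consume such a feature macro follows.
It assumes a Linux/glibc-style `struct stat'; demo_atime_nsec() is a
hypothetical helper, and the macro is defined by hand here only because
the sketch has no configure step.

```c
/* Ask for POSIX.1-2008 so that st_atim is visible under strict modes. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <sys/stat.h>

/*
 * Assumption: a real build gets this from the configure probe; it is
 * defined manually here so the sketch compiles on its own.
 */
#ifndef HAVE_STRUCT_STAT_ST_ATIM
#define HAVE_STRUCT_STAT_ST_ATIM 1
#endif

/* Return the nanosecond part of the access time, or 0 without st_atim. */
static long demo_atime_nsec(const struct stat *st)
{
    assert(st);
#ifdef HAVE_STRUCT_STAT_ST_ATIM
    return st->st_atim.tv_nsec;
#else
    return 0;
#endif
}
```

Without the macro the helper falls back to reporting whole seconds only,
which is the behaviour this patch is meant to avoid.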




* [PATCH 077/104] virtiofsd: fix error handling in main()
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (75 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 076/104] virtiofsd: support nanosecond resolution for file timestamp Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 11:30   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 078/104] virtiofsd: cleanup allocated resource in se Dr. David Alan Gilbert (git)
                   ` (29 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Liu Bo <bo.liu@linux.alibaba.com>

Neither the fuse_parse_cmdline() failure path nor the fuse_opt_parse()
failure path goes to the right place to do cleanup; both return directly.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
---
 tools/virtiofsd/passthrough_ll.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index c526d6f4f6..33092de65a 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -2475,13 +2475,14 @@ int main(int argc, char *argv[])
     lo_map_init(&lo.fd_map);
 
     if (fuse_parse_cmdline(&args, &opts) != 0) {
-        return 1;
+        goto err_out1;
     }
     fuse_set_log_func(log_func);
     use_syslog = opts.syslog;
     if (use_syslog) {
         openlog("virtiofsd", LOG_PID, LOG_DAEMON);
     }
+
     if (opts.show_help) {
         printf("usage: %s [options]\n\n", argv[0]);
         fuse_cmdline_help();
@@ -2500,7 +2501,7 @@ int main(int argc, char *argv[])
     }
 
     if (fuse_opt_parse(&args, &lo, lo_opts, NULL) == -1) {
-        return 1;
+        goto err_out1;
     }
 
     /*
-- 
2.23.0
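
The pattern behind this fix, jumping to a shared cleanup label instead of
returning early, can be sketched in isolation. Everything below is
hypothetical: demo_main() stands in for main(), and the allocation
counter exists only so the sketch can show that cleanup always runs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counter tracking allocations that are still live. */
static int demo_live_allocs;

static void *demo_alloc(void)
{
    void *p = malloc(16);

    if (p) {
        demo_live_allocs++;
    }
    return p;
}

static void demo_free(void *p)
{
    if (p) {
        assert(demo_live_allocs > 0);
        demo_live_allocs--;
        free(p);
    }
}

/* Return 0 on success, 1 on failure; fail_parse simulates a parse error. */
static int demo_main(int fail_parse)
{
    int ret = 1;
    void *args = demo_alloc();

    if (!args) {
        return 1; /* nothing has been acquired yet */
    }
    if (fail_parse) {
        goto err_out; /* not "return 1": args must still be released */
    }
    ret = 0;

err_out:
    demo_free(args);
    return ret;
}
```

Both the success path and the failure path leave demo_live_allocs at
zero, which is the property the early "return 1" statements broke.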




* [PATCH 078/104] virtiofsd: cleanup allocated resource in se
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (76 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 077/104] virtiofsd: fix error handling in main() Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 11:34   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 079/104] virtiofsd: fix memory leak on lo.source Dr. David Alan Gilbert (git)
                   ` (28 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Liu Bo <bo.liu@linux.alibaba.com>

This cleans up unfreed resources in se on quitting, including
se->virtio_dev, se->vu_socket_path, se->vu_socketfd.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
---
 tools/virtiofsd/fuse_lowlevel.c | 7 +++++++
 tools/virtiofsd/fuse_virtio.c   | 7 +++++++
 tools/virtiofsd/fuse_virtio.h   | 2 +-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 45125ef66a..14c9d99374 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -2539,6 +2539,13 @@ void fuse_session_destroy(struct fuse_session *se)
     if (se->fd != -1) {
         close(se->fd);
     }
+
+    if (se->vu_socket_path) {
+        virtio_session_close(se);
+        free(se->vu_socket_path);
+        se->vu_socket_path = NULL;
+    }
+
     free(se);
 }
 
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 2f11fee46d..1b5d27fe16 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -827,3 +827,10 @@ int virtio_session_mount(struct fuse_session *se)
 
     return 0;
 }
+
+void virtio_session_close(struct fuse_session *se)
+{
+    close(se->vu_socketfd);
+    free(se->virtio_dev);
+    se->virtio_dev = NULL;
+}
diff --git a/tools/virtiofsd/fuse_virtio.h b/tools/virtiofsd/fuse_virtio.h
index cc676b9193..111684032c 100644
--- a/tools/virtiofsd/fuse_virtio.h
+++ b/tools/virtiofsd/fuse_virtio.h
@@ -19,7 +19,7 @@
 struct fuse_session;
 
 int virtio_session_mount(struct fuse_session *se);
-
+void virtio_session_close(struct fuse_session *se);
 int virtio_loop(struct fuse_session *se);
 
 
-- 
2.23.0




* [PATCH 079/104] virtiofsd: fix memory leak on lo.source
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (77 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 078/104] virtiofsd: cleanup allocated resource in se Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 11:37   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 080/104] virtiofsd: add helper for lo_data cleanup Dr. David Alan Gilbert (git)
                   ` (27 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Liu Bo <bo.liu@linux.alibaba.com>

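valgrind reported that lo.source is leaked on quitting.  It could not
simply be freed because it was defined as (const char *) and may point to
the constant string "/".  Duplicate the default with strdup() so that
lo.source can always be freed.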

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
---
 tools/virtiofsd/passthrough_ll.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 33092de65a..45cf466178 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -2529,9 +2529,8 @@ int main(int argc, char *argv[])
             fuse_log(FUSE_LOG_ERR, "source is not a directory\n");
             exit(1);
         }
-
     } else {
-        lo.source = "/";
+        lo.source = strdup("/");
     }
     lo.root.is_symlink = false;
     if (!lo.timeout_set) {
@@ -2610,5 +2609,7 @@ err_out1:
         close(lo.root.fd);
     }
 
+    free((char *)lo.source);
+
     return ret ? 1 : 0;
 }
-- 
2.23.0




* [PATCH 080/104] virtiofsd: add helper for lo_data cleanup
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (78 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 079/104] virtiofsd: fix memory leak on lo.source Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 11:40   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 081/104] virtiofsd: Prevent multiply running with same vhost_user_socket Dr. David Alan Gilbert (git)
                   ` (26 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Liu Bo <bo.liu@linux.alibaba.com>

This adds a helper function for lo_data's cleanup.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
---
 tools/virtiofsd/passthrough_ll.c | 37 ++++++++++++++++++--------------
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 45cf466178..097033aa00 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -2439,6 +2439,26 @@ static gboolean lo_key_equal(gconstpointer a, gconstpointer b)
     return la->ino == lb->ino && la->dev == lb->dev;
 }
 
+static void fuse_lo_data_cleanup(struct lo_data *lo)
+{
+    if (lo->inodes) {
+        g_hash_table_destroy(lo->inodes);
+    }
+    lo_map_destroy(&lo->fd_map);
+    lo_map_destroy(&lo->dirp_map);
+    lo_map_destroy(&lo->ino_map);
+
+    if (lo->proc_self_fd >= 0) {
+        close(lo->proc_self_fd);
+    }
+
+    if (lo->root.fd >= 0) {
+        close(lo->root.fd);
+    }
+
+    free((char *)lo->source);
+}
+
 int main(int argc, char *argv[])
 {
     struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
@@ -2594,22 +2614,7 @@ err_out2:
 err_out1:
     fuse_opt_free_args(&args);
 
-    if (lo.inodes) {
-        g_hash_table_destroy(lo.inodes);
-    }
-    lo_map_destroy(&lo.fd_map);
-    lo_map_destroy(&lo.dirp_map);
-    lo_map_destroy(&lo.ino_map);
-
-    if (lo.proc_self_fd >= 0) {
-        close(lo.proc_self_fd);
-    }
-
-    if (lo.root.fd >= 0) {
-        close(lo.root.fd);
-    }
-
-    free((char *)lo.source);
+    fuse_lo_data_cleanup(&lo);
 
     return ret ? 1 : 0;
 }
-- 
2.23.0




* [PATCH 081/104] virtiofsd: Prevent multiply running with same vhost_user_socket
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (79 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 080/104] virtiofsd: add helper for lo_data cleanup Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 11:43   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 082/104] virtiofsd: enable PARALLEL_DIROPS during INIT Dr. David Alan Gilbert (git)
                   ` (25 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>

virtiofsd can be started multiple times even when the vhost_user_socket
path is the same.

  ]# ./virtiofsd -o vhost_user_socket=/tmp/vhostqemu -o source=/tmp/share &
  [1] 244965
  virtio_session_mount: Waiting for vhost-user socket connection...
  ]# ./virtiofsd -o vhost_user_socket=/tmp/vhostqemu -o source=/tmp/share &
  [2] 244966
  virtio_session_mount: Waiting for vhost-user socket connection...
  ]#

This is confusing for the user and may cause unexpected problems, so
it's better to prevent multiple instances from running on one socket path.

Create a regular file under the localstatedir directory to claim
exclusive use of the vhost_user_socket path. To create and lock the file,
use qemu_write_pidfile(), because that API already does the sanity checks
and the file locking.
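
A rough sketch of the idea, assuming a plain POSIX/Linux environment
rather than QEMU's helpers: the lock file name is derived from the socket
path by replacing '/' with '.', and a non-blocking exclusive lock on that
file tells a second instance that the path is taken.  The function names
below are hypothetical, and flock(2) is used only as a stand-in; it is not
a claim about how qemu_write_pidfile() takes its lock.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Build "<dir>/<socket path with '/' replaced by '.'>.pid" in out.
 * Returns 0 on success, -1 if the result does not fit in outlen bytes.
 */
static int lock_name_for_socket(const char *dir, const char *sock,
                                char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s/%s.pid", dir, sock);

    if (n < 0 || (size_t)n >= outlen) {
        return -1;
    }
    /* Flatten only the socket part, not the directory prefix. */
    for (char *p = out + strlen(dir) + 1; *p; p++) {
        if (*p == '/') {
            *p = '.';
        }
    }
    return 0;
}

/*
 * Try to take an exclusive lock on path without blocking.  Returns an fd
 * that holds the lock until it is closed, or -1 if the file is already
 * locked by someone else or cannot be opened.
 */
static int try_lock_file(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd < 0) {
        return -1;
    }
    if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A daemon would keep the returned fd open for its whole lifetime, so the
lock goes away automatically when the process exits.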

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  Applied fixes from Stefan's review and moved osdep include
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/fuse_lowlevel.c |  1 +
 tools/virtiofsd/fuse_virtio.c   | 49 ++++++++++++++++++++++++++++++++-
 2 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 14c9d99374..b1ff684de9 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -17,6 +17,7 @@
 
 #include <assert.h>
 #include <errno.h>
+#include <glib.h>
 #include <limits.h>
 #include <stdbool.h>
 #include <stddef.h>
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 1b5d27fe16..7b22ae8d4f 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -13,11 +13,12 @@
 
 #include "qemu/osdep.h"
 #include "qemu/iov.h"
-#include "fuse_virtio.h"
+#include "qapi/error.h"
 #include "fuse_i.h"
 #include "standard-headers/linux/fuse.h"
 #include "fuse_misc.h"
 #include "fuse_opt.h"
+#include "fuse_virtio.h"
 
 #include <assert.h>
 #include <errno.h>
@@ -733,6 +734,42 @@ int virtio_loop(struct fuse_session *se)
     return 0;
 }
 
+static void strreplace(char *s, char old, char new)
+{
+    for (; *s; ++s) {
+        if (*s == old) {
+            *s = new;
+        }
+    }
+}
+
+static bool fv_socket_lock(struct fuse_session *se)
+{
+    g_autofree gchar *sk_name = NULL;
+    g_autofree gchar *pidfile = NULL;
+    g_autofree gchar *dir = NULL;
+    Error *local_err = NULL;
+
+    dir = qemu_get_local_state_pathname("run/virtiofsd");
+
+    if (g_mkdir_with_parents(dir, S_IRWXU) < 0) {
+        fuse_log(FUSE_LOG_ERR, "%s: Failed to create directory %s: %s\n",
+                 __func__, dir, strerror(errno));
+        return false;
+    }
+
+    sk_name = g_strdup(se->vu_socket_path);
+    strreplace(sk_name, '/', '.');
+    pidfile = g_strdup_printf("%s/%s.pid", dir, sk_name);
+
+    if (!qemu_write_pidfile(pidfile, &local_err)) {
+        error_report_err(local_err);
+        return false;
+    }
+
+    return true;
+}
+
 static int fv_create_listen_socket(struct fuse_session *se)
 {
     struct sockaddr_un un;
@@ -748,6 +785,16 @@ static int fv_create_listen_socket(struct fuse_session *se)
         return -1;
     }
 
+    if (!strlen(se->vu_socket_path)) {
+        fuse_log(FUSE_LOG_ERR, "Socket path is empty\n");
+        return -1;
+    }
+
+    /* Check the vu_socket_path is already used */
+    if (!fv_socket_lock(se)) {
+        return -1;
+    }
+
     /*
      * Create the Unix socket to communicate with qemu
      * based on QEMU's vhost-user-bridge
-- 
2.23.0




* [PATCH 082/104] virtiofsd: enable PARALLEL_DIROPS during INIT
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (80 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 081/104] virtiofsd: Prevent multiply running with same vhost_user_socket Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 11:44   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 083/104] virtiofsd: fix incorrect error handling in lo_do_lookup Dr. David Alan Gilbert (git)
                   ` (24 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Liu Bo <bo.liu@linux.alibaba.com>

lookup is a read-only operation, so PARALLEL_DIROPS can be enabled.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
---
 tools/virtiofsd/fuse_lowlevel.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index b1ff684de9..4b5fe1d7a1 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -2064,6 +2064,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
     if (se->conn.want & FUSE_CAP_ASYNC_READ) {
         outarg.flags |= FUSE_ASYNC_READ;
     }
+    if (se->conn.want & FUSE_CAP_PARALLEL_DIROPS) {
+        outarg.flags |= FUSE_PARALLEL_DIROPS;
+    }
     if (se->conn.want & FUSE_CAP_POSIX_LOCKS) {
         outarg.flags |= FUSE_POSIX_LOCKS;
     }
-- 
2.23.0




* [PATCH 083/104] virtiofsd: fix incorrect error handling in lo_do_lookup
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (81 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 082/104] virtiofsd: enable PARALLEL_DIROPS during INIT Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 11:45   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 084/104] Virtiofsd: fix memory leak on fuse queueinfo Dr. David Alan Gilbert (git)
                   ` (23 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Eric Ren <renzhen@linux.alibaba.com>

Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
---
 tools/virtiofsd/passthrough_ll.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 097033aa00..fbcc222860 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -815,7 +815,6 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         close(newfd);
         newfd = -1;
     } else {
-        saverr = ENOMEM;
         inode = calloc(1, sizeof(struct lo_inode));
         if (!inode) {
             goto out_err;
-- 
2.23.0




* [PATCH 084/104] Virtiofsd: fix memory leak on fuse queueinfo
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (82 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 083/104] virtiofsd: fix incorrect error handling in lo_do_lookup Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-15 11:20   ` Misono Tomohiro
  2020-01-20 10:24   ` Sergio Lopez
  2019-12-12 16:38 ` [PATCH 085/104] virtiofsd: Support remote posix locks Dr. David Alan Gilbert (git)
                   ` (22 subsequent siblings)
  106 siblings, 2 replies; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Liu Bo <bo.liu@linux.alibaba.com>

For fuse's queueinfo, both the queueinfo array and the individual
queueinfos are allocated in fv_queue_set_started() but are not cleaned up
when the daemon process quits.

This frees them in the proper places.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
---
 tools/virtiofsd/fuse_virtio.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 7b22ae8d4f..a364f23d5d 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -671,6 +671,8 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
         }
         close(ourqi->kill_fd);
         ourqi->kick_fd = -1;
+        free(vud->qi[qidx]);
+        vud->qi[qidx] = NULL;
     }
 }
 
@@ -878,6 +880,13 @@ int virtio_session_mount(struct fuse_session *se)
 void virtio_session_close(struct fuse_session *se)
 {
     close(se->vu_socketfd);
+
+    if (!se->virtio_dev) {
+        return;
+    }
+
+    close(se->vu_socketfd);
+    free(se->virtio_dev->qi);
     free(se->virtio_dev);
     se->virtio_dev = NULL;
 }
-- 
2.23.0




* [PATCH 085/104] virtiofsd: Support remote posix locks
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (83 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 084/104] Virtiofsd: fix memory leak on fuse queueinfo Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-15 23:38   ` Masayoshi Mizuma
  2019-12-12 16:38 ` [PATCH 086/104] virtiofsd: use fuse_lowlevel_is_virtio() in fuse_session_destroy() Dr. David Alan Gilbert (git)
                   ` (21 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Vivek Goyal <vgoyal@redhat.com>

Doing posix locks within the guest kernel is not sufficient if a file/dir
is being shared by multiple guests. So we need the daemon to take the
locks, so that they are visible to the rest of the guests.

Given posix locks are per process, one cannot call the posix lock API on
the host, otherwise a bunch of basic posix lock properties are broken. For
example, if two processes (A and B) in the guest open the file and take
locks on different sections of the file, and one of the processes closes
its fd, the fd on virtiofsd is closed and all posix locks on the file go
away. This means if process A closes the fd, then the locks of process B
go away too.

Other similar problems exist too.

This patch emulates posix locks using the open file description (OFD)
locks provided by Linux.
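
To illustrate the difference the emulation depends on, here is a small
standalone sketch (not taken from the patch; the function names and the
/tmp path are made up): an OFD lock belongs to the open file description
it was taken on, so opening and closing another descriptor for the same
file does not drop it.  It assumes Linux 3.15 or later.

```c
#define _GNU_SOURCE /* for F_OFD_SETLK and F_OFD_GETLK */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef F_OFD_SETLK /* fallback, values from Linux asm-generic/fcntl.h */
#define F_OFD_GETLK 36
#define F_OFD_SETLK 37
#endif

/* Describe a write lock covering the whole file. */
static void whole_file_wrlock(struct flock *lk)
{
    memset(lk, 0, sizeof(*lk)); /* l_pid must be 0 for OFD commands */
    lk->l_type = F_WRLCK;
    lk->l_whence = SEEK_SET;
    lk->l_start = 0;
    lk->l_len = 0; /* 0 means "to end of file" */
}

/*
 * Take an OFD lock on one descriptor, open and close an unrelated
 * descriptor on the same file, then probe from a third one.
 * Returns 1 if the lock is still held, 0 if it was lost, -1 on error.
 */
static int ofd_lock_survives_other_close(void)
{
    char path[] = "/tmp/ofd-demo-XXXXXX";
    struct flock lk;
    int owner, other, probe, held = -1;

    owner = mkstemp(path);
    if (owner < 0) {
        return -1;
    }

    whole_file_wrlock(&lk);
    if (fcntl(owner, F_OFD_SETLK, &lk) == 0) {
        /* With classic posix locks this close() would drop our locks. */
        other = open(path, O_RDWR);
        if (other >= 0) {
            close(other);
            probe = open(path, O_RDWR);
            if (probe >= 0) {
                whole_file_wrlock(&lk);
                if (fcntl(probe, F_OFD_GETLK, &lk) == 0) {
                    /* F_UNLCK means no conflicting lock was found. */
                    held = (lk.l_type != F_UNLCK);
                }
                close(probe);
            }
        }
    }

    close(owner);
    unlink(path);
    return held;
}
```

This is why the daemon can keep one fd per lock owner: closing the fd of
one owner releases only the locks taken through that fd.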

The daemon provides two options (-o posix_lock, -o no_posix_lock) to
enable or disable posix locking. It is enabled by default.

There are a few issues though.

- GETLK() returns the pid of the process holding the lock. As we are
  emulating locks using OFD locks, which are not per process and do not
  report a pid, GETLK() in the guest does not return the process pid.

- As of now only F_SETLK is supported and not F_SETLKW. We can't block
  the thread in virtiofsd for an arbitrarily long duration as there is
  only one thread serving the queue. That means the unlock request would
  not make it to the daemon, and F_SETLKW would block forever and bring
  virtio-fs to a halt. This is a solvable problem, but it will require
  significant changes in virtiofsd and the kernel. Left as a TODO item
  for now.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 190 +++++++++++++++++++++++++++++++
 1 file changed, 190 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index fbcc222860..fc79d5ac43 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -68,6 +68,13 @@
 #include "seccomp.h"
 
 #define HAVE_POSIX_FALLOCATE 1
+
+/* Keep track of inode posix locks for each owner. */
+struct lo_inode_plock {
+    uint64_t lock_owner;
+    int fd; /* fd for OFD locks */
+};
+
 struct lo_map_elem {
     union {
         struct lo_inode *inode;
@@ -96,6 +103,8 @@ struct lo_inode {
     struct lo_key key;
     uint64_t refcount; /* protected by lo->mutex */
     fuse_ino_t fuse_ino;
+    pthread_mutex_t plock_mutex;
+    GHashTable *posix_locks; /* protected by lo_inode->plock_mutex */
 };
 
 struct lo_cred {
@@ -115,6 +124,7 @@ struct lo_data {
     int norace;
     int writeback;
     int flock;
+    int posix_lock;
     int xattr;
     const char *source;
     double timeout;
@@ -138,6 +148,8 @@ static const struct fuse_opt lo_opts[] = {
     { "source=%s", offsetof(struct lo_data, source), 0 },
     { "flock", offsetof(struct lo_data, flock), 1 },
     { "no_flock", offsetof(struct lo_data, flock), 0 },
+    { "posix_lock", offsetof(struct lo_data, posix_lock), 1 },
+    { "no_posix_lock", offsetof(struct lo_data, posix_lock), 0 },
     { "xattr", offsetof(struct lo_data, xattr), 1 },
     { "no_xattr", offsetof(struct lo_data, xattr), 0 },
     { "timeout=%lf", offsetof(struct lo_data, timeout), 0 },
@@ -486,6 +498,17 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
         fuse_log(FUSE_LOG_DEBUG, "lo_init: activating flock locks\n");
         conn->want |= FUSE_CAP_FLOCK_LOCKS;
     }
+
+    if (conn->capable & FUSE_CAP_POSIX_LOCKS) {
+        if (lo->posix_lock) {
+            fuse_log(FUSE_LOG_DEBUG, "lo_init: activating posix locks\n");
+            conn->want |= FUSE_CAP_POSIX_LOCKS;
+        } else {
+            fuse_log(FUSE_LOG_DEBUG, "lo_init: disabling posix locks\n");
+            conn->want &= ~FUSE_CAP_POSIX_LOCKS;
+        }
+    }
+
     if ((lo->cache == CACHE_NONE && !lo->readdirplus_set) ||
         lo->readdirplus_clear) {
         fuse_log(FUSE_LOG_DEBUG, "lo_init: disabling readdirplus\n");
@@ -773,6 +796,19 @@ static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st)
     return p;
 }
 
+/* value_destroy_func for posix_locks GHashTable */
+static void posix_locks_value_destroy(gpointer data)
+{
+    struct lo_inode_plock *plock = data;
+
+    /*
+     * We had used open() for locks and had only one fd. So
+     * closing this fd should release all OFD locks.
+     */
+    close(plock->fd);
+    free(plock);
+}
+
 static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
                         struct fuse_entry_param *e)
 {
@@ -826,6 +862,9 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         newfd = -1;
         inode->key.ino = e->attr.st_ino;
         inode->key.dev = e->attr.st_dev;
+        pthread_mutex_init(&inode->plock_mutex, NULL);
+        inode->posix_locks = g_hash_table_new_full(
+            g_direct_hash, g_direct_equal, NULL, posix_locks_value_destroy);
 
         pthread_mutex_lock(&lo->mutex);
         inode->fuse_ino = lo_add_inode_mapping(req, inode);
@@ -1192,6 +1231,11 @@ static void unref_inode_lolocked(struct lo_data *lo, struct lo_inode *inode,
     if (!inode->refcount) {
         lo_map_remove(&lo->ino_map, inode->fuse_ino);
         g_hash_table_remove(lo->inodes, &inode->key);
+        if (g_hash_table_size(inode->posix_locks)) {
+            fuse_log(FUSE_LOG_WARNING, "Hash table is not empty\n");
+        }
+        g_hash_table_destroy(inode->posix_locks);
+        pthread_mutex_destroy(&inode->plock_mutex);
         pthread_mutex_unlock(&lo->mutex);
         close(inode->fd);
         free(inode);
@@ -1548,6 +1592,136 @@ out:
     }
 }
 
+/* Should be called with inode->plock_mutex held */
+static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
+                                                      struct lo_inode *inode,
+                                                      uint64_t lock_owner,
+                                                      pid_t pid, int *err)
+{
+    struct lo_inode_plock *plock;
+    char procname[64];
+    int fd;
+
+    plock =
+        g_hash_table_lookup(inode->posix_locks, GUINT_TO_POINTER(lock_owner));
+
+    if (plock) {
+        return plock;
+    }
+
+    plock = malloc(sizeof(struct lo_inode_plock));
+    if (!plock) {
+        *err = ENOMEM;
+        return NULL;
+    }
+
+    /* Open another instance of file which can be used for ofd locks. */
+    sprintf(procname, "%i", inode->fd);
+
+    /* TODO: What if file is not writable? */
+    fd = openat(lo->proc_self_fd, procname, O_RDWR);
+    if (fd == -1) {
+        *err = errno;
+        free(plock);
+        return NULL;
+    }
+
+    plock->lock_owner = lock_owner;
+    plock->fd = fd;
+    g_hash_table_insert(inode->posix_locks, GUINT_TO_POINTER(plock->lock_owner),
+                        plock);
+    return plock;
+}
+
+static void lo_getlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
+                     struct flock *lock)
+{
+    struct lo_data *lo = lo_data(req);
+    struct lo_inode *inode;
+    struct lo_inode_plock *plock;
+    int ret, saverr = 0;
+
+    fuse_log(FUSE_LOG_DEBUG,
+             "lo_getlk(ino=%" PRIu64 ", flags=%d)"
+             " owner=0x%lx, l_type=%d l_start=0x%lx"
+             " l_len=0x%lx\n",
+             ino, fi->flags, fi->lock_owner, lock->l_type, lock->l_start,
+             lock->l_len);
+
+    inode = lo_inode(req, ino);
+    if (!inode) {
+        fuse_reply_err(req, EBADF);
+        return;
+    }
+
+    pthread_mutex_lock(&inode->plock_mutex);
+    plock =
+        lookup_create_plock_ctx(lo, inode, fi->lock_owner, lock->l_pid, &ret);
+    if (!plock) {
+        pthread_mutex_unlock(&inode->plock_mutex);
+        fuse_reply_err(req, ret);
+        return;
+    }
+
+    ret = fcntl(plock->fd, F_OFD_GETLK, lock);
+    if (ret == -1) {
+        saverr = errno;
+    }
+    pthread_mutex_unlock(&inode->plock_mutex);
+
+    if (saverr) {
+        fuse_reply_err(req, saverr);
+    } else {
+        fuse_reply_lock(req, lock);
+    }
+}
+
+static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
+                     struct flock *lock, int sleep)
+{
+    struct lo_data *lo = lo_data(req);
+    struct lo_inode *inode;
+    struct lo_inode_plock *plock;
+    int ret, saverr = 0;
+
+    fuse_log(FUSE_LOG_DEBUG,
+             "lo_setlk(ino=%" PRIu64 ", flags=%d)"
+             " cmd=%d pid=%d owner=0x%lx sleep=%d l_whence=%d"
+             " l_start=0x%lx l_len=0x%lx\n",
+             ino, fi->flags, lock->l_type, lock->l_pid, fi->lock_owner, sleep,
+             lock->l_whence, lock->l_start, lock->l_len);
+
+    if (sleep) {
+        fuse_reply_err(req, EOPNOTSUPP);
+        return;
+    }
+
+    inode = lo_inode(req, ino);
+    if (!inode) {
+        fuse_reply_err(req, EBADF);
+        return;
+    }
+
+    pthread_mutex_lock(&inode->plock_mutex);
+    plock =
+        lookup_create_plock_ctx(lo, inode, fi->lock_owner, lock->l_pid, &ret);
+
+    if (!plock) {
+        pthread_mutex_unlock(&inode->plock_mutex);
+        fuse_reply_err(req, ret);
+        return;
+    }
+
+    /* TODO: Is it alright to modify flock? */
+    lock->l_pid = 0;
+    ret = fcntl(plock->fd, F_OFD_SETLK, lock);
+    if (ret == -1) {
+        saverr = errno;
+    }
+    pthread_mutex_unlock(&inode->plock_mutex);
+    fuse_reply_err(req, saverr);
+}
+
 static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,
                         struct fuse_file_info *fi)
 {
@@ -1649,6 +1823,19 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 {
     int res;
     (void)ino;
+    struct lo_inode *inode;
+
+    inode = lo_inode(req, ino);
+    if (!inode) {
+        fuse_reply_err(req, EBADF);
+        return;
+    }
+
+    /* An fd is going away. Cleanup associated posix locks */
+    pthread_mutex_lock(&inode->plock_mutex);
+    g_hash_table_remove(inode->posix_locks, GUINT_TO_POINTER(fi->lock_owner));
+    pthread_mutex_unlock(&inode->plock_mutex);
+
     res = close(dup(lo_fi_fd(req, fi)));
     fuse_reply_err(req, res == -1 ? errno : 0);
 }
@@ -2111,6 +2298,8 @@ static struct fuse_lowlevel_ops lo_oper = {
     .releasedir = lo_releasedir,
     .fsyncdir = lo_fsyncdir,
     .create = lo_create,
+    .getlk = lo_getlk,
+    .setlk = lo_setlk,
     .open = lo_open,
     .release = lo_release,
     .flush = lo_flush,
@@ -2466,6 +2655,7 @@ int main(int argc, char *argv[])
     struct lo_data lo = {
         .debug = 0,
         .writeback = 0,
+        .posix_lock = 1,
         .proc_self_fd = -1,
     };
     struct lo_map_elem *root_elem;
-- 
2.23.0




* [PATCH 086/104] virtiofsd: use fuse_lowlevel_is_virtio() in fuse_session_destroy()
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (84 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 085/104] virtiofsd: Support remote posix locks Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 12:01   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 087/104] virtiofsd: prevent fv_queue_thread() vs virtio_loop() races Dr. David Alan Gilbert (git)
                   ` (20 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

vu_socket_path is NULL when --fd=FDNUM was used.  Use
fuse_lowlevel_is_virtio() instead.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
  pull request 10
---
 tools/virtiofsd/fuse_lowlevel.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 4b5fe1d7a1..10f478b00c 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -2544,12 +2544,13 @@ void fuse_session_destroy(struct fuse_session *se)
         close(se->fd);
     }
 
-    if (se->vu_socket_path) {
+    if (fuse_lowlevel_is_virtio(se)) {
         virtio_session_close(se);
-        free(se->vu_socket_path);
-        se->vu_socket_path = NULL;
     }
 
+    free(se->vu_socket_path);
+    se->vu_socket_path = NULL;
+
     free(se);
 }
 
-- 
2.23.0




* [PATCH 087/104] virtiofsd: prevent fv_queue_thread() vs virtio_loop() races
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (85 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 086/104] virtiofsd: use fuse_lowlevel_is_virtio() in fuse_session_destroy() Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 12:02   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 088/104] virtiofsd: make lo_release() atomic Dr. David Alan Gilbert (git)
                   ` (19 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

We call into libvhost-user from the virtqueue handler thread and the
vhost-user message processing thread without a lock.  There is nothing
protecting the virtqueue handler thread if the vhost-user message
processing thread changes the virtqueue or memory table while it is
running.

This patch introduces a read-write lock.  Virtqueue handler threads are
readers.  The vhost-user message processing thread is a writer.  This
will allow concurrency for multiqueue in the future while protecting
against fv_queue_thread() vs virtio_loop() races.

Note that the critical sections could be made smaller but it would be
more invasive and require libvhost-user changes.  Let's start simple and
improve performance later, if necessary.  Another option would be an
RCU-style approach with lighter-weight primitives.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
  Merged with log changes
  pull request 12
---
 tools/virtiofsd/fuse_virtio.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index a364f23d5d..2c1e524852 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -58,6 +58,18 @@ struct fv_VuDev {
     VuDev dev;
     struct fuse_session *se;
 
+    /*
+     * Either handle virtqueues or vhost-user protocol messages.  Don't do
+     * both at the same time since that could lead to race conditions if
+     * virtqueues or memory tables change while another thread is accessing
+     * them.
+     *
+     * The assumptions are:
+     * 1. fv_queue_thread() reads/writes to virtqueues and only reads VuDev.
+     * 2. virtio_loop() reads/writes virtqueues and VuDev.
+     */
+    pthread_rwlock_t vu_dispatch_rwlock;
+
     /*
      * The following pair of fields are only accessed in the main
      * virtio_loop
@@ -413,6 +425,8 @@ static void *fv_queue_thread(void *opaque)
              qi->qidx, qi->kick_fd);
     while (1) {
         struct pollfd pf[2];
+        int ret;
+
         pf[0].fd = qi->kick_fd;
         pf[0].events = POLLIN;
         pf[0].revents = 0;
@@ -459,6 +473,9 @@ static void *fv_queue_thread(void *opaque)
             fuse_log(FUSE_LOG_ERR, "Eventfd_read for queue: %m\n");
             break;
         }
+        /* Mutual exclusion with virtio_loop() */
+        ret = pthread_rwlock_rdlock(&qi->virtio_dev->vu_dispatch_rwlock);
+        assert(ret == 0); /* there is no possible error case */
         /* out is from guest, in is too guest */
         unsigned int in_bytes, out_bytes;
         vu_queue_get_avail_bytes(dev, q, &in_bytes, &out_bytes, ~0, ~0);
@@ -467,6 +484,7 @@ static void *fv_queue_thread(void *opaque)
                  "%s: Queue %d gave evalue: %zx available: in: %u out: %u\n",
                  __func__, qi->qidx, (size_t)evalue, in_bytes, out_bytes);
 
+
         while (1) {
             bool allocated_bufv = false;
             struct fuse_bufvec bufv;
@@ -595,6 +613,8 @@ static void *fv_queue_thread(void *opaque)
             free(elem);
             elem = NULL;
         }
+
+        pthread_rwlock_unlock(&qi->virtio_dev->vu_dispatch_rwlock);
     }
 out:
     pthread_mutex_destroy(&ch.lock);
@@ -701,6 +721,8 @@ int virtio_loop(struct fuse_session *se)
 
     while (!fuse_session_exited(se)) {
         struct pollfd pf[1];
+        bool ok;
+        int ret;
         pf[0].fd = se->vu_socketfd;
         pf[0].events = POLLIN;
         pf[0].revents = 0;
@@ -725,7 +747,15 @@ int virtio_loop(struct fuse_session *se)
         }
         assert(pf[0].revents & POLLIN);
         fuse_log(FUSE_LOG_DEBUG, "%s: Got VU event\n", __func__);
-        if (!vu_dispatch(&se->virtio_dev->dev)) {
+        /* Mutual exclusion with fv_queue_thread() */
+        ret = pthread_rwlock_wrlock(&se->virtio_dev->vu_dispatch_rwlock);
+        assert(ret == 0); /* there is no possible error case */
+
+        ok = vu_dispatch(&se->virtio_dev->dev);
+
+        pthread_rwlock_unlock(&se->virtio_dev->vu_dispatch_rwlock);
+
+        if (!ok) {
             fuse_log(FUSE_LOG_ERR, "%s: vu_dispatch failed\n", __func__);
             break;
         }
@@ -871,6 +901,7 @@ int virtio_session_mount(struct fuse_session *se)
 
     se->vu_socketfd = data_sock;
     se->virtio_dev->se = se;
+    pthread_rwlock_init(&se->virtio_dev->vu_dispatch_rwlock, NULL);
     vu_init(&se->virtio_dev->dev, 2, se->vu_socketfd, fv_panic, fv_set_watch,
             fv_remove_watch, &fv_iface);
 
@@ -887,6 +918,7 @@ void virtio_session_close(struct fuse_session *se)
 
     close(se->vu_socketfd);
     free(se->virtio_dev->qi);
+    pthread_rwlock_destroy(&se->virtio_dev->vu_dispatch_rwlock);
     free(se->virtio_dev);
     se->virtio_dev = NULL;
 }
-- 
2.23.0




* [PATCH 088/104] virtiofsd: make lo_release() atomic
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (86 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 087/104] virtiofsd: prevent fv_queue_thread() vs virtio_loop() races Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 12:03   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 089/104] virtiofsd: prevent races with lo_dirp_put() Dr. David Alan Gilbert (git)
                   ` (18 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

Hold the lock across both lo_map_get() and lo_map_remove() to prevent
races between two FUSE_RELEASE requests.  In this case I don't see a
serious bug but it's safer to do things atomically.
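
A minimal sketch of the idea, not the virtiofsd code: the lookup and the
removal happen inside one critical section, so two concurrent releases of
the same handle cannot both obtain the fd.  The names demo_map,
demo_map_init() and demo_map_take() are hypothetical stand-ins for
lo->fd_map and the lo_map_get()/lo_map_remove() pair.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DEMO_MAP_SIZE 8

/* Hypothetical stand-in for the fd map; not the virtiofsd lo_map. */
struct demo_map {
    pthread_mutex_t mutex;
    int fds[DEMO_MAP_SIZE]; /* -1 marks a free slot */
};

static void demo_map_init(struct demo_map *m)
{
    int ret = pthread_mutex_init(&m->mutex, NULL);

    assert(ret == 0);
    (void)ret;
    for (size_t i = 0; i < DEMO_MAP_SIZE; i++) {
        m->fds[i] = -1;
    }
}

/*
 * Look up and remove the entry in one critical section.  Returns the fd
 * that was stored in the slot, or -1 if the slot was empty or the key is
 * out of range.  Only one caller can ever get a given fd back.
 */
static int demo_map_take(struct demo_map *m, size_t key)
{
    int fd = -1;

    pthread_mutex_lock(&m->mutex);
    if (key < DEMO_MAP_SIZE) {
        fd = m->fds[key];
        m->fds[key] = -1;
    }
    pthread_mutex_unlock(&m->mutex);

    return fd;
}
```

The caller closes the returned fd outside the lock, and a second caller
racing on the same key sees -1.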

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index fc79d5ac43..eadd568435 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1805,14 +1805,18 @@ static void lo_release(fuse_req_t req, fuse_ino_t ino,
                        struct fuse_file_info *fi)
 {
     struct lo_data *lo = lo_data(req);
-    int fd;
+    struct lo_map_elem *elem;
+    int fd = -1;
 
     (void)ino;
 
-    fd = lo_fi_fd(req, fi);
-
     pthread_mutex_lock(&lo->mutex);
-    lo_map_remove(&lo->fd_map, fi->fh);
+    elem = lo_map_get(&lo->fd_map, fi->fh);
+    if (elem) {
+        fd = elem->fd;
+        elem = NULL;
+        lo_map_remove(&lo->fd_map, fi->fh);
+    }
     pthread_mutex_unlock(&lo->mutex);
 
     close(fd);
-- 
2.23.0




* [PATCH 089/104] virtiofsd: prevent races with lo_dirp_put()
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (87 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 088/104] virtiofsd: make lo_release() atomic Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-17 13:52   ` Philippe Mathieu-Daudé
  2019-12-12 16:38 ` [PATCH 090/104] virtiofsd: rename inode->refcount to inode->nlookup Dr. David Alan Gilbert (git)
                   ` (17 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

Introduce lo_dirp_put() so that FUSE_RELEASEDIR does not cause
use-after-free races with other threads that are accessing lo_dirp.

Also make lo_releasedir() atomic to prevent FUSE_RELEASEDIR racing with
itself.  This prevents double-frees.
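
The get/put pattern can be sketched roughly as follows.  This is an
illustration only: it uses C11 <stdatomic.h> instead of the GLib atomics in
the patch, and demo_obj, demo_obj_new(), demo_obj_get() and demo_obj_put()
are hypothetical names.  In the patch the increment is also done while
lo->mutex is held, so that the map lookup and the reference are taken
together; that locking is left out here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted object; lo_dirp uses gint and GLib atomics. */
struct demo_obj {
    atomic_int refcount;
    int *freed_flag; /* set to 1 on destruction, for demonstration only */
};

static struct demo_obj *demo_obj_new(int *freed_flag)
{
    struct demo_obj *o = malloc(sizeof(*o));

    if (!o) {
        return NULL;
    }
    atomic_init(&o->refcount, 1); /* the creator's reference */
    o->freed_flag = freed_flag;
    return o;
}

/* Take an extra reference; pair each call with demo_obj_put() */
static struct demo_obj *demo_obj_get(struct demo_obj *o)
{
    assert(o != NULL);
    atomic_fetch_add(&o->refcount, 1);
    return o;
}

/* Drop a reference and clear the caller's pointer; frees on the last put */
static void demo_obj_put(struct demo_obj **op)
{
    struct demo_obj *o = *op;

    if (!o) {
        return;
    }
    *op = NULL;

    /* atomic_fetch_sub() returns the old value; 1 means this was the last */
    if (atomic_fetch_sub(&o->refcount, 1) == 1) {
        if (o->freed_flag) {
            *o->freed_flag = 1;
        }
        free(o);
    }
}
```

Clearing the caller's pointer in the put function makes a repeated put on
the same variable a no-op rather than a double-free.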

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 41 +++++++++++++++++++++++++++-----
 1 file changed, 35 insertions(+), 6 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index eadd568435..7663e574d8 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1317,11 +1317,28 @@ static void lo_readlink(fuse_req_t req, fuse_ino_t ino)
 }
 
 struct lo_dirp {
+    gint refcount;
     DIR *dp;
     struct dirent *entry;
     off_t offset;
 };
 
+static void lo_dirp_put(struct lo_dirp **dp)
+{
+    struct lo_dirp *d = *dp;
+
+    if (!d) {
+        return;
+    }
+    *dp = NULL;
+
+    if (g_atomic_int_dec_and_test(&d->refcount)) {
+        closedir(d->dp);
+        free(d);
+    }
+}
+
+/* Call lo_dirp_put() on the return value when no longer needed */
 static struct lo_dirp *lo_dirp(fuse_req_t req, struct fuse_file_info *fi)
 {
     struct lo_data *lo = lo_data(req);
@@ -1329,6 +1346,9 @@ static struct lo_dirp *lo_dirp(fuse_req_t req, struct fuse_file_info *fi)
 
     pthread_mutex_lock(&lo->mutex);
     elem = lo_map_get(&lo->dirp_map, fi->fh);
+    if (elem) {
+        g_atomic_int_inc(&elem->dirp->refcount);
+    }
     pthread_mutex_unlock(&lo->mutex);
     if (!elem) {
         return NULL;
@@ -1364,6 +1384,7 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
     d->offset = 0;
     d->entry = NULL;
 
+    g_atomic_int_set(&d->refcount, 1); /* paired with lo_releasedir() */
     pthread_mutex_lock(&lo->mutex);
     fh = lo_add_dirp_mapping(req, d);
     pthread_mutex_unlock(&lo->mutex);
@@ -1397,7 +1418,7 @@ static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
                           off_t offset, struct fuse_file_info *fi, int plus)
 {
     struct lo_data *lo = lo_data(req);
-    struct lo_dirp *d;
+    struct lo_dirp *d = NULL;
     struct lo_inode *dinode;
     char *buf = NULL;
     char *p;
@@ -1487,6 +1508,8 @@ static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
 
     err = 0;
 error:
+    lo_dirp_put(&d);
+
     /*
      * If there's an error, we can only signal it if we haven't stored
      * any entries yet - otherwise we'd end up with wrong lookup
@@ -1517,22 +1540,25 @@ static void lo_releasedir(fuse_req_t req, fuse_ino_t ino,
                           struct fuse_file_info *fi)
 {
     struct lo_data *lo = lo_data(req);
+    struct lo_map_elem *elem;
     struct lo_dirp *d;
 
     (void)ino;
 
-    d = lo_dirp(req, fi);
-    if (!d) {
+    pthread_mutex_lock(&lo->mutex);
+    elem = lo_map_get(&lo->dirp_map, fi->fh);
+    if (!elem) {
+        pthread_mutex_unlock(&lo->mutex);
         fuse_reply_err(req, EBADF);
         return;
     }
 
-    pthread_mutex_lock(&lo->mutex);
+    d = elem->dirp;
     lo_map_remove(&lo->dirp_map, fi->fh);
     pthread_mutex_unlock(&lo->mutex);
 
-    closedir(d->dp);
-    free(d);
+    lo_dirp_put(&d); /* paired with lo_opendir() */
+
     fuse_reply_err(req, 0);
 }
 
@@ -1743,6 +1769,9 @@ static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,
     } else {
         res = fsync(fd);
     }
+
+    lo_dirp_put(&d);
+
     fuse_reply_err(req, res == -1 ? errno : 0);
 }
 
-- 
2.23.0




* [PATCH 090/104] virtiofsd: rename inode->refcount to inode->nlookup
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (88 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 089/104] virtiofsd: prevent races with lo_dirp_put() Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-17 13:54   ` Philippe Mathieu-Daudé
  2019-12-12 16:38 ` [PATCH 091/104] libvhost-user: Fix some memtable remap cases Dr. David Alan Gilbert (git)
                   ` (16 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

This reference counter plays a specific role in the FUSE protocol.  It's
not a generic object reference counter and the FUSE kernel code calls it
"nlookup".

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 37 +++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 7663e574d8..b19c9ee328 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -101,7 +101,20 @@ struct lo_inode {
     int fd;
     bool is_symlink;
     struct lo_key key;
-    uint64_t refcount; /* protected by lo->mutex */
+
+    /*
+     * This counter keeps the inode alive during the FUSE session.
+     * Incremented when the FUSE inode number is sent in a reply
+     * (FUSE_LOOKUP, FUSE_READDIRPLUS, etc).  Decremented when an inode is
+     * released by requests like FUSE_FORGET, FUSE_RMDIR, FUSE_RENAME, etc.
+     *
+     * Note that this value is untrusted because the client can manipulate
+     * it arbitrarily using FUSE_FORGET requests.
+     *
+     * Protected by lo->mutex.
+     */
+    uint64_t nlookup;
+
     fuse_ino_t fuse_ino;
     pthread_mutex_t plock_mutex;
     GHashTable *posix_locks; /* protected by lo_inode->plock_mutex */
@@ -570,7 +583,7 @@ retry:
     if (last == path) {
         p = &lo->root;
         pthread_mutex_lock(&lo->mutex);
-        p->refcount++;
+        p->nlookup++;
         pthread_mutex_unlock(&lo->mutex);
     } else {
         *last = '\0';
@@ -788,8 +801,8 @@ static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st)
     pthread_mutex_lock(&lo->mutex);
     p = g_hash_table_lookup(lo->inodes, &key);
     if (p) {
-        assert(p->refcount > 0);
-        p->refcount++;
+        assert(p->nlookup > 0);
+        p->nlookup++;
     }
     pthread_mutex_unlock(&lo->mutex);
 
@@ -857,7 +870,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         }
 
         inode->is_symlink = S_ISLNK(e->attr.st_mode);
-        inode->refcount = 1;
+        inode->nlookup = 1;
         inode->fd = newfd;
         newfd = -1;
         inode->key.ino = e->attr.st_ino;
@@ -1097,7 +1110,7 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
     }
 
     pthread_mutex_lock(&lo->mutex);
-    inode->refcount++;
+    inode->nlookup++;
     pthread_mutex_unlock(&lo->mutex);
     e.ino = inode->fuse_ino;
 
@@ -1226,9 +1239,9 @@ static void unref_inode_lolocked(struct lo_data *lo, struct lo_inode *inode,
     }
 
     pthread_mutex_lock(&lo->mutex);
-    assert(inode->refcount >= n);
-    inode->refcount -= n;
-    if (!inode->refcount) {
+    assert(inode->nlookup >= n);
+    inode->nlookup -= n;
+    if (!inode->nlookup) {
         lo_map_remove(&lo->ino_map, inode->fuse_ino);
         g_hash_table_remove(lo->inodes, &inode->key);
         if (g_hash_table_size(inode->posix_locks)) {
@@ -1249,7 +1262,7 @@ static int unref_all_inodes_cb(gpointer key, gpointer value, gpointer user_data)
     struct lo_inode *inode = value;
     struct lo_data *lo = user_data;
 
-    inode->refcount = 0;
+    inode->nlookup = 0;
     lo_map_remove(&lo->ino_map, inode->fuse_ino);
     close(inode->fd);
 
@@ -1274,7 +1287,7 @@ static void lo_forget_one(fuse_req_t req, fuse_ino_t ino, uint64_t nlookup)
     }
 
     fuse_log(FUSE_LOG_DEBUG, "  forget %lli %lli -%lli\n",
-             (unsigned long long)ino, (unsigned long long)inode->refcount,
+             (unsigned long long)ino, (unsigned long long)inode->nlookup,
              (unsigned long long)nlookup);
 
     unref_inode_lolocked(lo, inode, nlookup);
@@ -2642,7 +2655,7 @@ static void setup_root(struct lo_data *lo, struct lo_inode *root)
     root->fd = fd;
     root->key.ino = stat.st_ino;
     root->key.dev = stat.st_dev;
-    root->refcount = 2;
+    root->nlookup = 2;
 }
 
 static guint lo_key_hash(gconstpointer key)
-- 
2.23.0




* [PATCH 091/104] libvhost-user: Fix some memtable remap cases
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (89 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 090/104] virtiofsd: rename inode->refcount to inode->nlookup Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-17 13:58   ` Marc-André Lureau
  2019-12-12 16:38 ` [PATCH 092/104] virtiofsd: add man page Dr. David Alan Gilbert (git)
                   ` (15 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

If a new setmemtable command comes in once the vhost threads are
running, it will remap the guest's address space and the threads
will now be looking in the wrong place.

Fortunately we're running this command under lock, so we can
update the queue mappings so that threads will look in the new,
correct place.

Note: This doesn't fix things that the threads might be doing
without a lock (e.g. a readv/writev!)  That's for another time.
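
As a rough sketch of the approach (not the libvhost-user code): keep the
guest-side address of each ring alongside the translated pointer, so the
pointer can be recomputed whenever the memory table changes.  The types
demo_region and demo_queue and the functions demo_translate() and
demo_map_queue() are hypothetical; a single region is assumed for brevity,
and unlike map_ring() in the patch, demo_map_queue() returns 0 on success.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-region memory table. */
struct demo_region {
    uint64_t guest_base; /* start of the region in guest addresses */
    uint64_t size;       /* length of the region in bytes */
    uint8_t *local_base; /* where the region is mapped locally */
};

struct demo_queue {
    uint64_t desc_guest_addr; /* saved so it can be re-translated later */
    void *desc;               /* current local mapping, NULL if unmapped */
};

/* Translate a guest address, or return NULL if it is outside the region */
static void *demo_translate(const struct demo_region *r, uint64_t guest_addr)
{
    if (guest_addr < r->guest_base ||
        guest_addr - r->guest_base >= r->size) {
        return NULL;
    }
    return r->local_base + (guest_addr - r->guest_base);
}

/*
 * (Re)compute the local pointer from the saved guest address.  Call this
 * when the ring address is first set and again after every remap.
 * Returns 0 on success, -1 if the address no longer translates.
 */
static int demo_map_queue(const struct demo_region *r, struct demo_queue *q)
{
    q->desc = demo_translate(r, q->desc_guest_addr);
    return q->desc ? 0 : -1;
}
```

After the region's local mapping changes, calling demo_map_queue() again
moves q->desc to the same offset in the new mapping.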

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 contrib/libvhost-user/libvhost-user.c | 33 ++++++++++++++++++++-------
 contrib/libvhost-user/libvhost-user.h |  3 +++
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index 63e41062a4..b89bf18501 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -564,6 +564,21 @@ vu_reset_device_exec(VuDev *dev, VhostUserMsg *vmsg)
     return false;
 }
 
+static bool
+map_ring(VuDev *dev, VuVirtq *vq)
+{
+    vq->vring.desc = qva_to_va(dev, vq->vra.desc_user_addr);
+    vq->vring.used = qva_to_va(dev, vq->vra.used_user_addr);
+    vq->vring.avail = qva_to_va(dev, vq->vra.avail_user_addr);
+
+    DPRINT("Setting virtq addresses:\n");
+    DPRINT("    vring_desc  at %p\n", vq->vring.desc);
+    DPRINT("    vring_used  at %p\n", vq->vring.used);
+    DPRINT("    vring_avail at %p\n", vq->vring.avail);
+
+    return !(vq->vring.desc && vq->vring.used && vq->vring.avail);
+}
+
 static bool
 vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
 {
@@ -767,6 +782,14 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
         close(vmsg->fds[i]);
     }
 
+    for (i = 0; i < dev->max_queues; i++) {
+        if (dev->vq[i].vring.desc) {
+            if (map_ring(dev, &dev->vq[i])) {
+                vu_panic(dev, "remapping queue %d during setmemtable", i);
+            }
+        }
+    }
+
     return false;
 }
 
@@ -853,18 +876,12 @@ vu_set_vring_addr_exec(VuDev *dev, VhostUserMsg *vmsg)
     DPRINT("    avail_user_addr:  0x%016" PRIx64 "\n", vra->avail_user_addr);
     DPRINT("    log_guest_addr:   0x%016" PRIx64 "\n", vra->log_guest_addr);
 
+    vq->vra = *vra;
     vq->vring.flags = vra->flags;
-    vq->vring.desc = qva_to_va(dev, vra->desc_user_addr);
-    vq->vring.used = qva_to_va(dev, vra->used_user_addr);
-    vq->vring.avail = qva_to_va(dev, vra->avail_user_addr);
     vq->vring.log_guest_addr = vra->log_guest_addr;
 
-    DPRINT("Setting virtq addresses:\n");
-    DPRINT("    vring_desc  at %p\n", vq->vring.desc);
-    DPRINT("    vring_used  at %p\n", vq->vring.used);
-    DPRINT("    vring_avail at %p\n", vq->vring.avail);
 
-    if (!(vq->vring.desc && vq->vring.used && vq->vring.avail)) {
+    if (map_ring(dev, vq)) {
         vu_panic(dev, "Invalid vring_addr message");
         return false;
     }
diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index 1844b6f8d4..5cb7708559 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -327,6 +327,9 @@ typedef struct VuVirtq {
     int err_fd;
     unsigned int enable;
     bool started;
+
+    /* Guest addresses of our ring */
+    struct vhost_vring_addr vra;
 } VuVirtq;
 
 enum VuWatchCondtion {
-- 
2.23.0




* [PATCH 092/104] virtiofsd: add man page
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (90 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 091/104] libvhost-user: Fix some memtable remap cases Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2019-12-13 14:33   ` Liam Merwick
  2020-01-07 12:13   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 093/104] virtiofsd: introduce inode refcount to prevent use-after-free Dr. David Alan Gilbert (git)
                   ` (14 subsequent siblings)
  106 siblings, 2 replies; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 Makefile                       |  7 +++
 tools/virtiofsd/virtiofsd.texi | 85 ++++++++++++++++++++++++++++++++++
 2 files changed, 92 insertions(+)
 create mode 100644 tools/virtiofsd/virtiofsd.texi

diff --git a/Makefile b/Makefile
index fa15174ba0..53d175d12f 100644
--- a/Makefile
+++ b/Makefile
@@ -357,6 +357,9 @@ DOCS+=docs/qemu-cpu-models.7
 ifdef CONFIG_VIRTFS
 DOCS+=fsdev/virtfs-proxy-helper.1
 endif
+ifdef CONFIG_LINUX
+DOCS+=tools/virtiofsd/virtiofsd.1
+endif
 ifdef CONFIG_TRACE_SYSTEMTAP
 DOCS+=scripts/qemu-trace-stap.1
 endif
@@ -863,6 +866,9 @@ ifdef CONFIG_VIRTFS
 	$(INSTALL_DIR) "$(DESTDIR)$(mandir)/man1"
 	$(INSTALL_DATA) fsdev/virtfs-proxy-helper.1 "$(DESTDIR)$(mandir)/man1"
 endif
+ifdef CONFIG_LINUX
+	$(INSTALL_DATA) tools/virtiofsd/virtiofsd.1 "$(DESTDIR)$(mandir)/man1"
+endif
 
 install-datadir:
 	$(INSTALL_DIR) "$(DESTDIR)$(qemu_datadir)"
@@ -1061,6 +1067,7 @@ qemu.1: qemu-doc.texi qemu-options.texi qemu-monitor.texi qemu-monitor-info.texi
 qemu.1: qemu-option-trace.texi
 qemu-img.1: qemu-img.texi qemu-option-trace.texi qemu-img-cmds.texi
 fsdev/virtfs-proxy-helper.1: fsdev/virtfs-proxy-helper.texi
+tools/virtiofsd/virtiofsd.1: tools/virtiofsd/virtiofsd.texi
 qemu-nbd.8: qemu-nbd.texi qemu-option-trace.texi
 docs/qemu-block-drivers.7: docs/qemu-block-drivers.texi
 docs/qemu-cpu-models.7: docs/qemu-cpu-models.texi
diff --git a/tools/virtiofsd/virtiofsd.texi b/tools/virtiofsd/virtiofsd.texi
new file mode 100644
index 0000000000..eec7fbf4e6
--- /dev/null
+++ b/tools/virtiofsd/virtiofsd.texi
@@ -0,0 +1,85 @@
+@example
+@c man begin SYNOPSIS
+@command{virtiofsd} [OPTION] @option{--socket-path=}@var{path}|@option{--fd=}@var{fdnum} @option{-o source=}@var{path}
+@c man end
+@end example
+
+@c man begin DESCRIPTION
+
+Share a host directory tree with a guest through a virtio-fs device.  This
+program is a vhost-user backend that implements the virtio-fs device.  Each
+virtio-fs device instance requires its own virtiofsd process.
+
+This program is designed to work with QEMU's @code{--device vhost-user-fs-pci}
+but should work with any virtual machine monitor (VMM) that supports
+vhost-user.  See the EXAMPLES section below.
+
+This program must be run as the root user.  Upon startup the program will
+switch into a new file system namespace with the shared directory tree as its
+root.  This prevents "file system escapes" due to symlinks and other file
+system objects that might lead to files outside the shared directory.  The
+program also sandboxes itself using seccomp(2) to prevent ptrace(2) and other
+vectors that could allow an attacker to compromise the system after gaining
+control of the virtiofsd process.
+
+@c man end
+
+@c man begin OPTIONS
+@table @option
+@item -h, --help
+Print help.
+@item -V, --version
+Print version.
+@item -d, -o debug
+Enable debug output.
+@item --syslog
+Print log messages to syslog instead of stderr.
+@item -o log_level=@var{level}
+Print only log messages matching @var{level} or more severe.  @var{level} is
+one of @code{err}, @code{warn}, @code{info}, or @code{debug}.  The default is
+@var{info}.
+@item -o source=@var{path}
+Share host directory tree located at @var{path}.  This option is required.
+@item --socket-path=@var{path}, -o vhost_user_socket=@var{path}
+Listen on vhost-user UNIX domain socket at @var{path}.
+@item --fd=@var{fdnum}
+Accept connections from vhost-user UNIX domain socket file descriptor @var{fdnum}.  The file descriptor must already be listening for connections.
+@item --thread-pool-size=@var{num}
+Restrict the number of worker threads per request queue to @var{num}.  The default is 64.
+@item --cache=@code{none}|@code{auto}|@code{always}
+Select the desired trade-off between coherency and performance.  @code{none}
+forbids the FUSE client from caching to achieve best coherency at the cost of
+performance.  @code{auto} acts similar to NFS with a 1 second metadata cache
+timeout.  @code{always} sets a long cache lifetime at the expense of coherency.
+@item --writeback
+Enable writeback cache, allowing the FUSE client to buffer and merge write requests.
+@end table
+@c man end
+
+@c man begin EXAMPLES
+Export @code{/var/lib/fs/vm001/} on vhost-user UNIX domain socket @code{/var/run/vm001-vhost-fs.sock}:
+
+@example
+host# virtiofsd --socket-path=/var/run/vm001-vhost-fs.sock -o source=/var/lib/fs/vm001
+host# qemu-system-x86_64 \
+    -chardev socket,id=char0,path=/var/run/vm001-vhost-fs.sock \
+    -device vhost-user-fs-pci,chardev=char0,tag=myfs \
+    -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on \
+    -numa node,memdev=mem \
+    ...
+guest# mount -t virtio_fs \
+    -o default_permissions,allow_other,user_id=0,group_id=0,rootmode=040000,dax \
+    myfs /mnt
+@end example
+@c man end
+
+@ignore
+@setfilename virtiofsd
+@settitle QEMU virtio-fs shared file system daemon
+
+@c man begin AUTHOR
+Copyright (C) 2019 Red Hat, Inc.
+This is free software; see the source for copying conditions.  There is NO
+warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+@c man end
+@end ignore
-- 
2.23.0




* [PATCH 093/104] virtiofsd: introduce inode refcount to prevent use-after-free
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (91 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 092/104] virtiofsd: add man page Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-16 12:25   ` Misono Tomohiro
  2020-01-20 10:28   ` Sergio Lopez
  2019-12-12 16:38 ` [PATCH 094/104] virtiofsd: do not always set FUSE_FLOCK_LOCKS Dr. David Alan Gilbert (git)
                   ` (13 subsequent siblings)
  106 siblings, 2 replies; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

If thread A is using an inode it must not be deleted by thread B when
processing a FUSE_FORGET request.

The FUSE protocol itself already has a counter called nlookup that is
used in FUSE_FORGET messages.  We cannot trust this counter since the
untrusted client can manipulate it via FUSE_FORGET messages.

Introduce a new refcount to keep inodes alive for the required lifespan.
lo_inode_put() must be called to release a reference.  FUSE's nlookup
counter holds exactly one reference so that the inode stays alive as
long as the client still wants to remember it.

Note that the lo_inode->is_symlink field is moved to avoid creating a
hole in the struct due to struct field alignment.
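
The relationship between the two counters can be sketched as below.  This
is an illustration under stated assumptions, not the patch itself: it uses
C11 atomics rather than GLib, the demo_inode names are hypothetical, the
lock that protects nlookup is omitted, and an over-large forget count is
clamped here whereas the patch asserts on it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_inode {
    atomic_int refcount; /* object lifetime, controlled by the daemon */
    uint64_t nlookup;    /* protocol counter, driven by the client */
    int *freed_flag;     /* set to 1 on destruction, for demonstration */
};

/* New inode: one reference for the caller, one held by nlookup */
static struct demo_inode *demo_inode_new(int *freed_flag)
{
    struct demo_inode *inode = malloc(sizeof(*inode));

    if (!inode) {
        return NULL;
    }
    atomic_init(&inode->refcount, 2);
    inode->nlookup = 1;
    inode->freed_flag = freed_flag;
    return inode;
}

static void demo_inode_put(struct demo_inode **inodep)
{
    struct demo_inode *inode = *inodep;

    if (!inode) {
        return;
    }
    *inodep = NULL;
    if (atomic_fetch_sub(&inode->refcount, 1) == 1) {
        if (inode->freed_flag) {
            *inode->freed_flag = 1;
        }
        free(inode);
    }
}

/*
 * Forget n lookups.  However large n is, at most one reference is
 * dropped, so the client cannot free an inode that another holder
 * still references.
 */
static void demo_inode_forget(struct demo_inode *inode, uint64_t n)
{
    struct demo_inode *ref = inode;

    assert(inode != NULL);
    if (inode->nlookup == 0) {
        return; /* the nlookup reference was already dropped */
    }
    if (n > inode->nlookup) {
        n = inode->nlookup; /* assumption: clamp instead of assert */
    }
    inode->nlookup -= n;
    if (inode->nlookup == 0) {
        demo_inode_put(&ref);
    }
}
```

A caller that still holds its own reference keeps the inode alive across
any number of forget requests; the memory is released only when that last
reference is put.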

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 168 ++++++++++++++++++++++++++-----
 1 file changed, 145 insertions(+), 23 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index b19c9ee328..8f4ab8351c 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -99,7 +99,13 @@ struct lo_key {
 
 struct lo_inode {
     int fd;
-    bool is_symlink;
+
+    /*
+     * Atomic reference count for this object.  The nlookup field holds a
+     * reference and releases it when nlookup reaches 0.
+     */
+    gint refcount;
+
     struct lo_key key;
 
     /*
@@ -118,6 +124,8 @@ struct lo_inode {
     fuse_ino_t fuse_ino;
     pthread_mutex_t plock_mutex;
     GHashTable *posix_locks; /* protected by lo_inode->plock_mutex */
+
+    bool is_symlink;
 };
 
 struct lo_cred {
@@ -473,6 +481,23 @@ static ssize_t lo_add_inode_mapping(fuse_req_t req, struct lo_inode *inode)
     return elem - lo_data(req)->ino_map.elems;
 }
 
+static void lo_inode_put(struct lo_data *lo, struct lo_inode **inodep)
+{
+    struct lo_inode *inode = *inodep;
+
+    if (!inode) {
+        return;
+    }
+
+    *inodep = NULL;
+
+    if (g_atomic_int_dec_and_test(&inode->refcount)) {
+        close(inode->fd);
+        free(inode);
+    }
+}
+
+/* Caller must release refcount using lo_inode_put() */
 static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
 {
     struct lo_data *lo = lo_data(req);
@@ -480,6 +505,9 @@ static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
 
     pthread_mutex_lock(&lo->mutex);
     elem = lo_map_get(&lo->ino_map, ino);
+    if (elem) {
+        g_atomic_int_inc(&elem->inode->refcount);
+    }
     pthread_mutex_unlock(&lo->mutex);
 
     if (!elem) {
@@ -489,10 +517,23 @@ static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
     return elem->inode;
 }
 
+/*
+ * TODO Remove this helper and force callers to hold an inode refcount until
+ * they are done with the fd.  This will be done in a later patch to make
+ * review easier.
+ */
 static int lo_fd(fuse_req_t req, fuse_ino_t ino)
 {
     struct lo_inode *inode = lo_inode(req, ino);
-    return inode ? inode->fd : -1;
+    int fd;
+
+    if (!inode) {
+        return -1;
+    }
+
+    fd = inode->fd;
+    lo_inode_put(lo_data(req), &inode);
+    return fd;
 }
 
 static void lo_init(void *userdata, struct fuse_conn_info *conn)
@@ -547,6 +588,10 @@ static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
     fuse_reply_attr(req, &buf, lo->timeout);
 }
 
+/*
+ * Increments parent->nlookup and caller must release refcount using
+ * lo_inode_put(&parent).
+ */
 static int lo_parent_and_name(struct lo_data *lo, struct lo_inode *inode,
                               char path[PATH_MAX], struct lo_inode **parent)
 {
@@ -584,6 +629,7 @@ retry:
         p = &lo->root;
         pthread_mutex_lock(&lo->mutex);
         p->nlookup++;
+        g_atomic_int_inc(&p->refcount);
         pthread_mutex_unlock(&lo->mutex);
     } else {
         *last = '\0';
@@ -665,6 +711,7 @@ fallback:
     if (res != -1) {
         res = utimensat(parent->fd, path, tv, AT_SYMLINK_NOFOLLOW);
         unref_inode_lolocked(lo, parent, 1);
+        lo_inode_put(lo, &parent);
     }
 
     return res;
@@ -782,11 +829,13 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
             goto out_err;
         }
     }
+    lo_inode_put(lo, &inode);
 
     return lo_getattr(req, ino, fi);
 
 out_err:
     saverr = errno;
+    lo_inode_put(lo, &inode);
     fuse_reply_err(req, saverr);
 }
 
@@ -803,6 +852,7 @@ static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st)
     if (p) {
         assert(p->nlookup > 0);
         p->nlookup++;
+        g_atomic_int_inc(&p->refcount);
     }
     pthread_mutex_unlock(&lo->mutex);
 
@@ -822,6 +872,10 @@ static void posix_locks_value_destroy(gpointer data)
     free(plock);
 }
 
+/*
+ * Increments nlookup and caller must release refcount using
+ * lo_inode_put(&parent).
+ */
 static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
                         struct fuse_entry_param *e)
 {
@@ -829,7 +883,8 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
     int res;
     int saverr;
     struct lo_data *lo = lo_data(req);
-    struct lo_inode *inode, *dir = lo_inode(req, parent);
+    struct lo_inode *inode = NULL;
+    struct lo_inode *dir = lo_inode(req, parent);
 
     /*
      * name_to_handle_at() and open_by_handle_at() can reach here with fuse
@@ -870,6 +925,13 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         }
 
         inode->is_symlink = S_ISLNK(e->attr.st_mode);
+
+        /*
+         * One for the caller and one for nlookup (released in
+         * unref_inode_lolocked())
+         */
+        g_atomic_int_set(&inode->refcount, 2);
+
         inode->nlookup = 1;
         inode->fd = newfd;
         newfd = -1;
@@ -885,6 +947,8 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         pthread_mutex_unlock(&lo->mutex);
     }
     e->ino = inode->fuse_ino;
+    lo_inode_put(lo, &inode);
+    lo_inode_put(lo, &dir);
 
     fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n", (unsigned long long)parent,
              name, (unsigned long long)e->ino);
@@ -896,6 +960,8 @@ out_err:
     if (newfd != -1) {
         close(newfd);
     }
+    lo_inode_put(lo, &inode);
+    lo_inode_put(lo, &dir);
     return saverr;
 }
 
@@ -976,6 +1042,7 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
 {
     int res;
     int saverr;
+    struct lo_data *lo = lo_data(req);
     struct lo_inode *dir;
     struct fuse_entry_param e;
     struct lo_cred old = {};
@@ -1017,9 +1084,11 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
              name, (unsigned long long)e.ino);
 
     fuse_reply_entry(req, &e);
+    lo_inode_put(lo, &dir);
     return;
 
 out:
+    lo_inode_put(lo, &dir);
     fuse_reply_err(req, saverr);
 }
 
@@ -1070,6 +1139,7 @@ fallback:
     if (res != -1) {
         res = linkat(parent->fd, path, dfd, name, 0);
         unref_inode_lolocked(lo, parent, 1);
+        lo_inode_put(lo, &parent);
     }
 
     return res;
@@ -1080,6 +1150,7 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
 {
     int res;
     struct lo_data *lo = lo_data(req);
+    struct lo_inode *parent_inode;
     struct lo_inode *inode;
     struct fuse_entry_param e;
     int saverr;
@@ -1089,17 +1160,18 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
         return;
     }
 
+    parent_inode = lo_inode(req, parent);
     inode = lo_inode(req, ino);
-    if (!inode) {
-        fuse_reply_err(req, EBADF);
-        return;
+    if (!parent_inode || !inode) {
+        errno = EBADF;
+        goto out_err;
     }
 
     memset(&e, 0, sizeof(struct fuse_entry_param));
     e.attr_timeout = lo->timeout;
     e.entry_timeout = lo->timeout;
 
-    res = linkat_empty_nofollow(lo, inode, lo_fd(req, parent), name);
+    res = linkat_empty_nofollow(lo, inode, parent_inode->fd, name);
     if (res == -1) {
         goto out_err;
     }
@@ -1118,13 +1190,18 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
              name, (unsigned long long)e.ino);
 
     fuse_reply_entry(req, &e);
+    lo_inode_put(lo, &parent_inode);
+    lo_inode_put(lo, &inode);
     return;
 
 out_err:
     saverr = errno;
+    lo_inode_put(lo, &parent_inode);
+    lo_inode_put(lo, &inode);
     fuse_reply_err(req, saverr);
 }
 
+/* Increments nlookup and caller must release refcount using lo_inode_put() */
 static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
                                     const char *name)
 {
@@ -1161,6 +1238,7 @@ static void lo_rmdir(fuse_req_t req, fuse_ino_t parent, const char *name)
 
     fuse_reply_err(req, res == -1 ? errno : 0);
     unref_inode_lolocked(lo, inode, 1);
+    lo_inode_put(lo, &inode);
 }
 
 static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
@@ -1168,8 +1246,10 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
                       unsigned int flags)
 {
     int res;
-    struct lo_inode *oldinode;
-    struct lo_inode *newinode;
+    struct lo_inode *parent_inode;
+    struct lo_inode *newparent_inode;
+    struct lo_inode *oldinode = NULL;
+    struct lo_inode *newinode = NULL;
     struct lo_data *lo = lo_data(req);
 
     if (!is_safe_path_component(name) || !is_safe_path_component(newname)) {
@@ -1177,6 +1257,13 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
         return;
     }
 
+    parent_inode = lo_inode(req, parent);
+    newparent_inode = lo_inode(req, newparent);
+    if (!parent_inode || !newparent_inode) {
+        fuse_reply_err(req, EBADF);
+        goto out;
+    }
+
     oldinode = lookup_name(req, parent, name);
     newinode = lookup_name(req, newparent, newname);
 
@@ -1189,8 +1276,8 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
 #ifndef SYS_renameat2
         fuse_reply_err(req, EINVAL);
 #else
-        res = syscall(SYS_renameat2, lo_fd(req, parent), name,
-                      lo_fd(req, newparent), newname, flags);
+        res = syscall(SYS_renameat2, parent_inode->fd, name,
+                      newparent_inode->fd, newname, flags);
         if (res == -1 && errno == ENOSYS) {
             fuse_reply_err(req, EINVAL);
         } else {
@@ -1200,12 +1287,16 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
         goto out;
     }
 
-    res = renameat(lo_fd(req, parent), name, lo_fd(req, newparent), newname);
+    res = renameat(parent_inode->fd, name, newparent_inode->fd, newname);
 
     fuse_reply_err(req, res == -1 ? errno : 0);
 out:
     unref_inode_lolocked(lo, oldinode, 1);
     unref_inode_lolocked(lo, newinode, 1);
+    lo_inode_put(lo, &oldinode);
+    lo_inode_put(lo, &newinode);
+    lo_inode_put(lo, &parent_inode);
+    lo_inode_put(lo, &newparent_inode);
 }
 
 static void lo_unlink(fuse_req_t req, fuse_ino_t parent, const char *name)
@@ -1229,6 +1320,7 @@ static void lo_unlink(fuse_req_t req, fuse_ino_t parent, const char *name)
 
     fuse_reply_err(req, res == -1 ? errno : 0);
     unref_inode_lolocked(lo, inode, 1);
+    lo_inode_put(lo, &inode);
 }
 
 static void unref_inode_lolocked(struct lo_data *lo, struct lo_inode *inode,
@@ -1250,8 +1342,9 @@ static void unref_inode_lolocked(struct lo_data *lo, struct lo_inode *inode,
         g_hash_table_destroy(inode->posix_locks);
         pthread_mutex_destroy(&inode->plock_mutex);
         pthread_mutex_unlock(&lo->mutex);
-        close(inode->fd);
-        free(inode);
+
+        /* Drop our refcount from lo_do_lookup() */
+        lo_inode_put(lo, &inode);
     } else {
         pthread_mutex_unlock(&lo->mutex);
     }
@@ -1265,6 +1358,7 @@ static int unref_all_inodes_cb(gpointer key, gpointer value, gpointer user_data)
     inode->nlookup = 0;
     lo_map_remove(&lo->ino_map, inode->fuse_ino);
     close(inode->fd);
+    lo_inode_put(lo, &inode); /* Drop our refcount from lo_do_lookup() */
 
     return TRUE;
 }
@@ -1291,6 +1385,7 @@ static void lo_forget_one(fuse_req_t req, fuse_ino_t ino, uint64_t nlookup)
              (unsigned long long)nlookup);
 
     unref_inode_lolocked(lo, inode, nlookup);
+    lo_inode_put(lo, &inode);
 }
 
 static void lo_forget(fuse_req_t req, fuse_ino_t ino, uint64_t nlookup)
@@ -1522,6 +1617,7 @@ static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
     err = 0;
 error:
     lo_dirp_put(&d);
+    lo_inode_put(lo, &dinode);
 
     /*
      * If there's an error, we can only signal it if we haven't stored
@@ -1580,6 +1676,7 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
 {
     int fd;
     struct lo_data *lo = lo_data(req);
+    struct lo_inode *parent_inode;
     struct fuse_entry_param e;
     int err;
     struct lo_cred old = {};
@@ -1592,12 +1689,18 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
         return;
     }
 
+    parent_inode = lo_inode(req, parent);
+    if (!parent_inode) {
+        fuse_reply_err(req, EBADF);
+        return;
+    }
+
     err = lo_change_cred(req, &old);
     if (err) {
         goto out;
     }
 
-    fd = openat(lo_fd(req, parent), name, (fi->flags | O_CREAT) & ~O_NOFOLLOW,
+    fd = openat(parent_inode->fd, name, (fi->flags | O_CREAT) & ~O_NOFOLLOW,
                 mode);
     err = fd == -1 ? errno : 0;
     lo_restore_cred(&old);
@@ -1610,8 +1713,8 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
         pthread_mutex_unlock(&lo->mutex);
         if (fh == -1) {
             close(fd);
-            fuse_reply_err(req, ENOMEM);
-            return;
+            err = ENOMEM;
+            goto out;
         }
 
         fi->fh = fh;
@@ -1624,6 +1727,8 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
     }
 
 out:
+    lo_inode_put(lo, &parent_inode);
+
     if (err) {
         fuse_reply_err(req, err);
     } else {
@@ -1697,16 +1802,18 @@ static void lo_getlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
     plock =
         lookup_create_plock_ctx(lo, inode, fi->lock_owner, lock->l_pid, &ret);
     if (!plock) {
-        pthread_mutex_unlock(&inode->plock_mutex);
-        fuse_reply_err(req, ret);
-        return;
+        saverr = ret;
+        goto out;
     }
 
     ret = fcntl(plock->fd, F_OFD_GETLK, lock);
     if (ret == -1) {
         saverr = errno;
     }
+
+out:
     pthread_mutex_unlock(&inode->plock_mutex);
+    lo_inode_put(lo, &inode);
 
     if (saverr) {
         fuse_reply_err(req, saverr);
@@ -1746,9 +1853,8 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
         lookup_create_plock_ctx(lo, inode, fi->lock_owner, lock->l_pid, &ret);
 
     if (!plock) {
-        pthread_mutex_unlock(&inode->plock_mutex);
-        fuse_reply_err(req, ret);
-        return;
+        saverr = ret;
+        goto out;
     }
 
     /* TODO: Is it alright to modify flock? */
@@ -1757,7 +1863,11 @@ static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
     if (ret == -1) {
         saverr = errno;
     }
+
+out:
     pthread_mutex_unlock(&inode->plock_mutex);
+    lo_inode_put(lo, &inode);
+
     fuse_reply_err(req, saverr);
 }
 
@@ -1883,6 +1993,7 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
     pthread_mutex_unlock(&inode->plock_mutex);
 
     res = close(dup(lo_fi_fd(req, fi)));
+    lo_inode_put(lo_data(req), &inode);
     fuse_reply_err(req, res == -1 ? errno : 0);
 }
 
@@ -2099,11 +2210,14 @@ out_free:
     if (fd >= 0) {
         close(fd);
     }
+
+    lo_inode_put(lo, &inode);
     return;
 
 out_err:
     saverr = errno;
 out:
+    lo_inode_put(lo, &inode);
     fuse_reply_err(req, saverr);
     goto out_free;
 }
@@ -2174,11 +2288,14 @@ out_free:
     if (fd >= 0) {
         close(fd);
     }
+
+    lo_inode_put(lo, &inode);
     return;
 
 out_err:
     saverr = errno;
 out:
+    lo_inode_put(lo, &inode);
     fuse_reply_err(req, saverr);
     goto out_free;
 }
@@ -2227,6 +2344,8 @@ out:
     if (fd >= 0) {
         close(fd);
     }
+
+    lo_inode_put(lo, &inode);
     fuse_reply_err(req, saverr);
 }
 
@@ -2273,6 +2392,8 @@ out:
     if (fd >= 0) {
         close(fd);
     }
+
+    lo_inode_put(lo, &inode);
     fuse_reply_err(req, saverr);
 }
 
@@ -2656,6 +2777,7 @@ static void setup_root(struct lo_data *lo, struct lo_inode *root)
     root->key.ino = stat.st_ino;
     root->key.dev = stat.st_dev;
     root->nlookup = 2;
+    g_atomic_int_set(&root->refcount, 2);
 }
 
 static guint lo_key_hash(gconstpointer key)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 307+ messages in thread

* [PATCH 094/104] virtiofsd: do not always set FUSE_FLOCK_LOCKS
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (92 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 093/104] virtiofsd: introduce inode refcount to prevent use-after-free Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-17  8:50   ` Misono Tomohiro
  2020-01-20 10:31   ` Sergio Lopez
  2019-12-12 16:38 ` [PATCH 095/104] virtiofsd: convert more fprintf and perror to use fuse log infra Dr. David Alan Gilbert (git)
                   ` (12 subsequent siblings)
  106 siblings, 2 replies; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Peng Tao <tao.peng@linux.alibaba.com>

Right now FUSE_CAP_FLOCK_LOCKS is always enabled, regardless of the
options given on the command line. Fix this by setting or clearing the
flag according to the lo->flock bit.

Signed-off-by: Peng Tao <tao.peng@linux.alibaba.com>
---
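Note: a minimal standalone sketch of the negotiation described above,
assuming the library may already have pre-set the bit in 'want' as a
default. DEMO_CAP_FLOCK_LOCKS and demo_negotiate_flock() are hypothetical
names used only for illustration; they are not part of virtiofsd or libfuse.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for FUSE_CAP_FLOCK_LOCKS; the name and value are assumptions. */
#define DEMO_CAP_FLOCK_LOCKS (1u << 10)

/*
 * Return the updated 'want' mask. The bit is only touched when the peer
 * advertises the capability; it is then set or cleared to match the
 * command-line option, so a pre-set default cannot leak through.
 */
static uint32_t demo_negotiate_flock(uint32_t capable, uint32_t want,
                                     int flock_enabled)
{
    if (capable & DEMO_CAP_FLOCK_LOCKS) {
        if (flock_enabled) {
            want |= DEMO_CAP_FLOCK_LOCKS;
        } else {
            want &= ~DEMO_CAP_FLOCK_LOCKS;
        }
    }
    return want;
}
```

The explicit clear in the else branch is the point: testing the option
only before setting the bit leaves a default-enabled bit untouched.
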
 tools/virtiofsd/passthrough_ll.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 8f4ab8351c..cf6b548eee 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -548,9 +548,14 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
         fuse_log(FUSE_LOG_DEBUG, "lo_init: activating writeback\n");
         conn->want |= FUSE_CAP_WRITEBACK_CACHE;
     }
-    if (lo->flock && conn->capable & FUSE_CAP_FLOCK_LOCKS) {
-        fuse_log(FUSE_LOG_DEBUG, "lo_init: activating flock locks\n");
-        conn->want |= FUSE_CAP_FLOCK_LOCKS;
+    if (conn->capable & FUSE_CAP_FLOCK_LOCKS) {
+        if (lo->flock) {
+            fuse_log(FUSE_LOG_DEBUG, "lo_init: activating flock locks\n");
+            conn->want |= FUSE_CAP_FLOCK_LOCKS;
+        } else {
+            fuse_log(FUSE_LOG_DEBUG, "lo_init: disabling flock locks\n");
+            conn->want &= ~FUSE_CAP_FLOCK_LOCKS;
+        }
     }
 
     if (conn->capable & FUSE_CAP_POSIX_LOCKS) {
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 307+ messages in thread

* [PATCH 095/104] virtiofsd: convert more fprintf and perror to use fuse log infra
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (93 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 094/104] virtiofsd: do not always set FUSE_FLOCK_LOCKS Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 12:16   ` Daniel P. Berrangé
  2020-01-16 12:29   ` Misono Tomohiro
  2019-12-12 16:38 ` [PATCH 096/104] virtiofsd: Reset O_DIRECT flag during file open Dr. David Alan Gilbert (git)
                   ` (11 subsequent siblings)
  106 siblings, 2 replies; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Eryu Guan <eguan@linux.alibaba.com>

Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
---
 tools/virtiofsd/fuse_signals.c | 6 +++++-
 tools/virtiofsd/helper.c       | 9 ++++++---
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/tools/virtiofsd/fuse_signals.c b/tools/virtiofsd/fuse_signals.c
index 10a6f88088..edabf24e0d 100644
--- a/tools/virtiofsd/fuse_signals.c
+++ b/tools/virtiofsd/fuse_signals.c
@@ -11,6 +11,7 @@
 #include "fuse_i.h"
 #include "fuse_lowlevel.h"
 
+#include <errno.h>
 #include <signal.h>
 #include <stdio.h>
 #include <stdlib.h>
@@ -46,12 +47,15 @@ static int set_one_signal_handler(int sig, void (*handler)(int), int remove)
     sa.sa_flags = 0;
 
     if (sigaction(sig, NULL, &old_sa) == -1) {
-        perror("fuse: cannot get old signal handler");
+        fuse_log(FUSE_LOG_ERR, "fuse: cannot get old signal handler: %s\n",
+                 strerror(errno));
         return -1;
     }
 
     if (old_sa.sa_handler == (remove ? handler : SIG_DFL) &&
         sigaction(sig, &sa, NULL) == -1) {
+        fuse_log(FUSE_LOG_ERR, "fuse: cannot set signal handler: %s\n",
+                 strerror(errno));
         perror("fuse: cannot set signal handler");
         return -1;
     }
diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
index 7b28507a38..bcb8c05063 100644
--- a/tools/virtiofsd/helper.c
+++ b/tools/virtiofsd/helper.c
@@ -200,7 +200,8 @@ int fuse_daemonize(int foreground)
         char completed;
 
         if (pipe(waiter)) {
-            perror("fuse_daemonize: pipe");
+            fuse_log(FUSE_LOG_ERR, "fuse_daemonize: pipe: %s\n",
+                     strerror(errno));
             return -1;
         }
 
@@ -210,7 +211,8 @@ int fuse_daemonize(int foreground)
          */
         switch (fork()) {
         case -1:
-            perror("fuse_daemonize: fork");
+            fuse_log(FUSE_LOG_ERR, "fuse_daemonize: fork: %s\n",
+                     strerror(errno));
             return -1;
         case 0:
             break;
@@ -220,7 +222,8 @@ int fuse_daemonize(int foreground)
         }
 
         if (setsid() == -1) {
-            perror("fuse_daemonize: setsid");
+            fuse_log(FUSE_LOG_ERR, "fuse_daemonize: setsid: %s\n",
+                     strerror(errno));
             return -1;
         }
 
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 307+ messages in thread

* [PATCH 096/104] virtiofsd: Reset O_DIRECT flag during file open
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (94 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 095/104] virtiofsd: convert more fprintf and perror to use fuse log infra Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 12:17   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 097/104] virtiofsd: Fix data corruption with O_APPEND write in writeback mode Dr. David Alan Gilbert (git)
                   ` (10 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Vivek Goyal <vgoyal@redhat.com>

If an application wants to do direct IO and opens a file with O_DIRECT
in guest, that does not necessarily mean that we need to bypass page
cache on host as well. So reset this flag on host.

If somebody needs to bypass page cache on host as well (and it is safe to
do so), we can add a knob in daemon later to control this behavior.

I checked virtio-9p and it does reset the O_DIRECT flag.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
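Note: a rough sketch of how the "knob" mentioned above might look,
assuming a boolean daemon option. demo_host_open_flags() and
allow_direct_io are hypothetical names; this patch itself clears the
flag unconditionally.

```c
#define _GNU_SOURCE /* O_DIRECT is a Linux extension exposed by <fcntl.h> */
#include <assert.h>
#include <fcntl.h>

/*
 * Translate guest open flags into the flags used for the host open.
 * O_DIRECT is stripped unless the (hypothetical) knob asks for the
 * host page cache to be bypassed as well.
 */
static int demo_host_open_flags(int guest_flags, int allow_direct_io)
{
    if (!allow_direct_io) {
        guest_flags &= ~O_DIRECT;
    }
    return guest_flags;
}
```
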
 tools/virtiofsd/passthrough_ll.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index cf6b548eee..6b3d396b6f 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1705,6 +1705,13 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
         goto out;
     }
 
+    /*
+     * O_DIRECT in guest should not necessarily mean bypassing page
+     * cache on host as well. If somebody needs that behavior, it
+     * probably should be a configuration knob in daemon.
+     */
+    fi->flags &= ~O_DIRECT;
+
     fd = openat(parent_inode->fd, name, (fi->flags | O_CREAT) & ~O_NOFOLLOW,
                 mode);
     err = fd == -1 ? errno : 0;
@@ -1934,6 +1941,13 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
         fi->flags &= ~O_APPEND;
     }
 
+    /*
+     * O_DIRECT in guest should not necessarily mean bypassing page
+     * cache on host as well. If somebody needs that behavior, it
+     * probably should be a configuration knob in daemon.
+     */
+    fi->flags &= ~O_DIRECT;
+
     sprintf(buf, "%i", lo_fd(req, ino));
     fd = openat(lo->proc_self_fd, buf, fi->flags & ~O_NOFOLLOW);
     if (fd == -1) {
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 307+ messages in thread

* [PATCH 097/104] virtiofsd: Fix data corruption with O_APPEND write in writeback mode
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (95 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 096/104] virtiofsd: Reset O_DIRECT flag during file open Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 12:20   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 098/104] virtiofsd: add definition of fuse_buf_writev() Dr. David Alan Gilbert (git)
                   ` (9 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>

When writeback mode is enabled (-o writeback), O_APPEND handling is
done in the kernel. Therefore virtiofsd clears the O_APPEND flag when
opening a file. Otherwise the O_APPEND flag takes precedence over the
offset passed to pwrite() and the written data may be corrupted.

Currently clearing O_APPEND flag is done in lo_open(), but we also
need the same operation in lo_create(). So, factor out the flag
update operation in lo_open() to update_open_flags() and call it
in both lo_open() and lo_create().

This fixes the failure of xfstest generic/069 in writeback mode
(which tests O_APPEND write data integrity).

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
---
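Note: the precedence of O_APPEND over the pwrite() offset can be seen
outside virtiofsd with a small Linux-only sketch. It assumes /tmp is
writable; demo_size_after_pwrite_at_zero() is a hypothetical helper, not
code from this series.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write 4 bytes, then pwrite() 2 bytes at offset 0, and return the
 * resulting file size (or -1 on error). With 'append' set, the fd has
 * O_APPEND and Linux appends the pwrite() data regardless of the offset.
 */
static off_t demo_size_after_pwrite_at_zero(int append)
{
    char path[] = "/tmp/append-demo-XXXXXX";
    struct stat st;
    off_t size = -1;
    int fd = mkstemp(path);

    if (fd == -1) {
        return -1;
    }
    unlink(path); /* the file disappears once the fd is closed */

    if (append && fcntl(fd, F_SETFL, O_APPEND) == -1) {
        close(fd);
        return -1;
    }

    if (write(fd, "AAAA", 4) == 4 &&
        pwrite(fd, "BB", 2, 0) == 2 &&
        fstat(fd, &st) == 0) {
        size = st.st_size;
    }

    close(fd);
    return size;
}
```

Without O_APPEND the pwrite() overwrites in place and the file stays at
4 bytes; with O_APPEND the data lands at the end instead, which is the
misplacement the patch avoids by clearing the flag in lo_create() too.
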
 tools/virtiofsd/passthrough_ll.c | 66 ++++++++++++++++----------------
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 6b3d396b6f..1bf251a91d 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1676,6 +1676,37 @@ static void lo_releasedir(fuse_req_t req, fuse_ino_t ino,
     fuse_reply_err(req, 0);
 }
 
+static void update_open_flags(int writeback, struct fuse_file_info *fi)
+{
+    /*
+     * With writeback cache, kernel may send read requests even
+     * when userspace opened write-only
+     */
+    if (writeback && (fi->flags & O_ACCMODE) == O_WRONLY) {
+        fi->flags &= ~O_ACCMODE;
+        fi->flags |= O_RDWR;
+    }
+
+    /*
+     * With writeback cache, O_APPEND is handled by the kernel.
+     * This breaks atomicity (since the file may change in the
+     * underlying filesystem, so that the kernel's idea of the
+     * end of the file isn't accurate anymore). In this example,
+     * we just accept that. A more rigorous filesystem may want
+     * to return an error here
+     */
+    if (writeback && (fi->flags & O_APPEND)) {
+        fi->flags &= ~O_APPEND;
+    }
+
+    /*
+     * O_DIRECT in guest should not necessarily mean bypassing page
+     * cache on host as well. If somebody needs that behavior, it
+     * probably should be a configuration knob in daemon.
+     */
+    fi->flags &= ~O_DIRECT;
+}
+
 static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
                       mode_t mode, struct fuse_file_info *fi)
 {
@@ -1705,12 +1736,7 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
         goto out;
     }
 
-    /*
-     * O_DIRECT in guest should not necessarily mean bypassing page
-     * cache on host as well. If somebody needs that behavior, it
-     * probably should be a configuration knob in daemon.
-     */
-    fi->flags &= ~O_DIRECT;
+    update_open_flags(lo->writeback, fi);
 
     fd = openat(parent_inode->fd, name, (fi->flags | O_CREAT) & ~O_NOFOLLOW,
                 mode);
@@ -1920,33 +1946,7 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
     fuse_log(FUSE_LOG_DEBUG, "lo_open(ino=%" PRIu64 ", flags=%d)\n", ino,
              fi->flags);
 
-    /*
-     * With writeback cache, kernel may send read requests even
-     * when userspace opened write-only
-     */
-    if (lo->writeback && (fi->flags & O_ACCMODE) == O_WRONLY) {
-        fi->flags &= ~O_ACCMODE;
-        fi->flags |= O_RDWR;
-    }
-
-    /*
-     * With writeback cache, O_APPEND is handled by the kernel.
-     * This breaks atomicity (since the file may change in the
-     * underlying filesystem, so that the kernel's idea of the
-     * end of the file isn't accurate anymore). In this example,
-     * we just accept that. A more rigorous filesystem may want
-     * to return an error here
-     */
-    if (lo->writeback && (fi->flags & O_APPEND)) {
-        fi->flags &= ~O_APPEND;
-    }
-
-    /*
-     * O_DIRECT in guest should not necessarily mean bypassing page
-     * cache on host as well. If somebody needs that behavior, it
-     * probably should be a configuration knob in daemon.
-     */
-    fi->flags &= ~O_DIRECT;
+    update_open_flags(lo->writeback, fi);
 
     sprintf(buf, "%i", lo_fd(req, ino));
     fd = openat(lo->proc_self_fd, buf, fi->flags & ~O_NOFOLLOW);
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 307+ messages in thread

* [PATCH 098/104] virtiofsd: add definition of fuse_buf_writev()
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (96 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 097/104] virtiofsd: Fix data corruption with O_APPEND write in writeback mode Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 12:21   ` Daniel P. Berrangé
  2019-12-12 16:38 ` [PATCH 099/104] virtiofsd: use fuse_buf_writev to replace fuse_buf_write for better performance Dr. David Alan Gilbert (git)
                   ` (8 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: piaojun <piaojun@huawei.com>

Define fuse_buf_writev(), which uses pwritev() and writev() to improve
I/O bandwidth. In particular, src bufs of size 0 must be skipped, since
their mems are not *block_size* aligned, which would cause writev() to
fail in direct I/O mode.

Signed-off-by: Jun Piao <piaojun@huawei.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
---
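Note: the skip-empty-buffers idea can be sketched on its own as below.
struct demo_buf is a hypothetical, memory-only simplification of struct
fuse_buf, and demo_writev_nonempty() is not part of this patch. Unlike
the patch, the sketch passes only the number of filled iovec entries to
writev(), which is one possible way to keep the unused tail out of the
syscall.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical, simplified stand-in for a memory-backed struct fuse_buf. */
struct demo_buf {
    void *mem;
    size_t size;
};

/* Gather the non-empty buffers into one writev(); returns bytes or -errno. */
static ssize_t demo_writev_nonempty(int fd, const struct demo_buf *bufs,
                                    size_t count)
{
    struct iovec *iov;
    size_t i, used = 0;
    ssize_t res;

    iov = calloc(count ? count : 1, sizeof(*iov));
    if (!iov) {
        return -ENOMEM;
    }

    for (i = 0; i < count; i++) {
        /* Skip zero-sized buffers so they never reach the kernel */
        if (bufs[i].size) {
            iov[used].iov_base = bufs[i].mem;
            iov[used].iov_len = bufs[i].size;
            used++;
        }
    }

    res = writev(fd, iov, (int)used);
    if (res == -1) {
        res = -errno;
    }

    free(iov);
    return res;
}
```
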
 tools/virtiofsd/buffer.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/tools/virtiofsd/buffer.c b/tools/virtiofsd/buffer.c
index f59d8d72eb..ae420c70c4 100644
--- a/tools/virtiofsd/buffer.c
+++ b/tools/virtiofsd/buffer.c
@@ -13,6 +13,7 @@
 #include "fuse_lowlevel.h"
 #include <assert.h>
 #include <errno.h>
+#include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
 
@@ -32,6 +33,44 @@ size_t fuse_buf_size(const struct fuse_bufvec *bufv)
     return size;
 }
 
+__attribute__((unused))
+static ssize_t fuse_buf_writev(fuse_req_t req,
+                               struct fuse_buf *out_buf,
+                               struct fuse_bufvec *in_buf)
+{
+    ssize_t res, i, j;
+    size_t iovcnt = in_buf->count;
+    struct iovec *iov;
+    int fd = out_buf->fd;
+
+    iov = calloc(iovcnt, sizeof(struct iovec));
+    if (!iov) {
+        return -ENOMEM;
+    }
+
+    for (i = 0, j = 0; i < iovcnt; i++) {
+        /* Skip the buf with 0 size */
+        if (in_buf->buf[i].size) {
+            iov[j].iov_base = in_buf->buf[i].mem;
+            iov[j].iov_len = in_buf->buf[i].size;
+            j++;
+        }
+    }
+
+    if (out_buf->flags & FUSE_BUF_FD_SEEK) {
+        res = pwritev(fd, iov, iovcnt, out_buf->pos);
+    } else {
+        res = writev(fd, iov, iovcnt);
+    }
+
+    if (res == -1) {
+        res = -errno;
+    }
+
+    free(iov);
+    return res;
+}
+
 static size_t min_size(size_t s1, size_t s2)
 {
     return s1 < s2 ? s1 : s2;
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 307+ messages in thread

* [PATCH 099/104] virtiofsd: use fuse_buf_writev to replace fuse_buf_write for better performance
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (97 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 098/104] virtiofsd: add definition of fuse_buf_writev() Dr. David Alan Gilbert (git)
@ 2019-12-12 16:38 ` Dr. David Alan Gilbert (git)
  2020-01-07 12:23   ` Daniel P. Berrangé
  2019-12-12 16:39 ` [PATCH 100/104] virtiofsd: process requests in a thread pool Dr. David Alan Gilbert (git)
                   ` (7 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:38 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: piaojun <piaojun@huawei.com>

fuse_buf_writev() only handles the normal write case, in which src is a
buffer and dest is an fd. In particular, if a src buffer represents a
guest physical address that can't be mapped by the daemon process, the
IO must be bounced back to the VMM, which is done via the regular
fuse_buf_copy() path.

Signed-off-by: Jun Piao <piaojun@huawei.com>
Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
---
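Note: the fast-path condition can be read as a single predicate, roughly
as sketched below. The demo_* types and DEMO_BUF_IS_FD are hypothetical
simplifications of struct fuse_buf, struct fuse_bufvec and
FUSE_BUF_IS_FD, kept only as detailed as the check needs.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BUF_IS_FD (1 << 1) /* stand-in for FUSE_BUF_IS_FD */

struct demo_buf {
    int flags;
};

struct demo_bufvec {
    size_t count;               /* number of buffers */
    size_t idx;                 /* index of the current buffer */
    const struct demo_buf *buf;
};

/*
 * writev() is only usable when every source buffer is plain memory and
 * the destination is exactly one fd-backed buffer that has not been
 * advanced past yet. Anything else takes the generic copy loop.
 */
static int demo_can_use_writev(const struct demo_bufvec *dst,
                               const struct demo_bufvec *src)
{
    size_t i;

    for (i = 0; i < src->count; i++) {
        if (src->buf[i].flags & DEMO_BUF_IS_FD) {
            return 0;
        }
    }

    return dst->count == 1 && dst->idx == 0 &&
           (dst->buf[0].flags & DEMO_BUF_IS_FD) != 0;
}
```
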
 tools/virtiofsd/buffer.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/tools/virtiofsd/buffer.c b/tools/virtiofsd/buffer.c
index ae420c70c4..4875473785 100644
--- a/tools/virtiofsd/buffer.c
+++ b/tools/virtiofsd/buffer.c
@@ -33,9 +33,7 @@ size_t fuse_buf_size(const struct fuse_bufvec *bufv)
     return size;
 }
 
-__attribute__((unused))
-static ssize_t fuse_buf_writev(fuse_req_t req,
-                               struct fuse_buf *out_buf,
+static ssize_t fuse_buf_writev(struct fuse_buf *out_buf,
                                struct fuse_bufvec *in_buf)
 {
     ssize_t res, i, j;
@@ -334,12 +332,29 @@ static int fuse_bufvec_advance(struct fuse_bufvec *bufv, size_t len)
 ssize_t fuse_buf_copy(struct fuse_bufvec *dstv, struct fuse_bufvec *srcv,
                       enum fuse_buf_copy_flags flags)
 {
-    size_t copied = 0;
+    size_t copied = 0, i;
 
     if (dstv == srcv) {
         return fuse_buf_size(dstv);
     }
 
+    /*
+     * use writev to improve bandwidth when all the
+     * src buffers already mapped by the daemon
+     * process
+     */
+    for (i = 0; i < srcv->count; i++) {
+        if (srcv->buf[i].flags & FUSE_BUF_IS_FD) {
+            break;
+        }
+    }
+    if ((i == srcv->count) && (dstv->count == 1) &&
+        (dstv->idx == 0) &&
+        (dstv->buf[0].flags & FUSE_BUF_IS_FD)) {
+        dstv->buf[0].pos += dstv->off;
+        return fuse_buf_writev(&dstv->buf[0], srcv);
+    }
+
     for (;;) {
         const struct fuse_buf *src = fuse_bufvec_current(srcv);
         const struct fuse_buf *dst = fuse_bufvec_current(dstv);
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 307+ messages in thread

* [PATCH 100/104] virtiofsd: process requests in a thread pool
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (98 preceding siblings ...)
  2019-12-12 16:38 ` [PATCH 099/104] virtiofsd: use fuse_buf_writev to replace fuse_buf_write for better performance Dr. David Alan Gilbert (git)
@ 2019-12-12 16:39 ` Dr. David Alan Gilbert (git)
  2020-01-20 12:54   ` Misono Tomohiro
  2019-12-12 16:39 ` [PATCH 101/104] virtiofsd: prevent FUSE_INIT/FUSE_DESTROY races Dr. David Alan Gilbert (git)
                   ` (6 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:39 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

Introduce a thread pool so that fv_queue_thread() just pops
VuVirtqElements and hands them to the thread pool.  For the time being
only one worker thread is allowed since passthrough_ll.c is not
thread-safe yet.  Future patches will lift this restriction so that
multiple FUSE requests can be processed in parallel.

The main new concept is struct FVRequest, which contains both
VuVirtqElement and struct fuse_chan.  We now have fv_VuDev for a device,
fv_QueueInfo for a virtqueue, and FVRequest for a request.  Some of
fv_QueueInfo's fields are moved into FVRequest because they are
per-request.  The name FVRequest conforms to QEMU coding style and I
expect the struct fv_* types will be renamed in a future refactoring.

This patch series is not optimal.  fbuf reuse is dropped so each request
does malloc(se->bufsize), but there is no clean and cheap way to keep
this with a thread pool.  The vq_lock mutex is held for longer than
necessary, especially during the eventfd_write() syscall.  Performance
can be improved in the future.

prctl(2) had to be added to the seccomp whitelist because glib invokes
it.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
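Note: since the request now embeds its channel, the send paths can get
back from a struct fuse_chan pointer to the owning request with
container_of(). A minimal sketch of that pattern follows; DemoRequest,
struct demo_chan and demo_container_of() are hypothetical stand-ins for
FVRequest, struct fuse_chan and the real macro.

```c
#include <assert.h>
#include <stddef.h>

/* Recover a pointer to the enclosing struct from a pointer to a member. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_chan {
    int fd;
};

typedef struct {
    int elem_index;      /* stands in for the embedded VuVirtqElement */
    struct demo_chan ch; /* embedded, so its address identifies the request */
    int reply_sent;
} DemoRequest;

static DemoRequest *demo_request_from_chan(struct demo_chan *ch)
{
    return demo_container_of(ch, DemoRequest, ch);
}

/* Per-request state lives in the request, not in the per-queue struct. */
static void demo_mark_replied(struct demo_chan *ch)
{
    DemoRequest *req = demo_request_from_chan(ch);

    assert(!req->reply_sent);
    req->reply_sent = 1;
}
```

Keeping reply_sent in the request rather than in the queue info is what
allows more than one request per queue to be in flight later on.
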
 tools/virtiofsd/fuse_virtio.c | 361 +++++++++++++++++++---------------
 1 file changed, 202 insertions(+), 159 deletions(-)

diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 2c1e524852..b696ac3135 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -22,6 +22,7 @@
 
 #include <assert.h>
 #include <errno.h>
+#include <glib.h>
 #include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>
@@ -37,17 +38,28 @@
 struct fv_VuDev;
 struct fv_QueueInfo {
     pthread_t thread;
+    /*
+     * This lock protects the VuVirtq preventing races between
+     * fv_queue_thread() and fv_queue_worker().
+     */
+    pthread_mutex_t vq_lock;
+
     struct fv_VuDev *virtio_dev;
 
     /* Our queue index, corresponds to array position */
     int qidx;
     int kick_fd;
     int kill_fd; /* For killing the thread */
+};
 
-    /* The element for the command currently being processed */
-    VuVirtqElement *qe;
+/* A FUSE request */
+typedef struct {
+    VuVirtqElement elem;
+    struct fuse_chan ch;
+
+    /* Used to complete requests that involve no reply */
     bool reply_sent;
-};
+} FVRequest;
 
 /*
  * We pass the dev element into libvhost-user
@@ -191,8 +203,11 @@ static void copy_iov(struct iovec *src_iov, int src_count,
 int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
                     struct iovec *iov, int count)
 {
-    VuVirtqElement *elem;
-    VuVirtq *q;
+    FVRequest *req = container_of(ch, FVRequest, ch);
+    struct fv_QueueInfo *qi = ch->qi;
+    VuDev *dev = &se->virtio_dev->dev;
+    VuVirtq *q = vu_get_queue(dev, qi->qidx);
+    VuVirtqElement *elem = &req->elem;
     int ret = 0;
 
     assert(count >= 1);
@@ -205,11 +220,7 @@ int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
 
     /* unique == 0 is notification, which we don't support */
     assert(out->unique);
-    /* For virtio we always have ch */
-    assert(ch);
-    assert(!ch->qi->reply_sent);
-    elem = ch->qi->qe;
-    q = &ch->qi->virtio_dev->dev.vq[ch->qi->qidx];
+    assert(!req->reply_sent);
 
     /* The 'in' part of the elem is to qemu */
     unsigned int in_num = elem->in_num;
@@ -236,9 +247,15 @@ int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
     }
 
     copy_iov(iov, count, in_sg, in_num, tosend_len);
-    vu_queue_push(&se->virtio_dev->dev, q, elem, tosend_len);
-    vu_queue_notify(&se->virtio_dev->dev, q);
-    ch->qi->reply_sent = true;
+
+    pthread_rwlock_rdlock(&qi->virtio_dev->vu_dispatch_rwlock);
+    pthread_mutex_lock(&qi->vq_lock);
+    vu_queue_push(dev, q, elem, tosend_len);
+    vu_queue_notify(dev, q);
+    pthread_mutex_unlock(&qi->vq_lock);
+    pthread_rwlock_unlock(&qi->virtio_dev->vu_dispatch_rwlock);
+
+    req->reply_sent = true;
 
 err:
     return ret;
@@ -254,9 +271,12 @@ int virtio_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
                          struct iovec *iov, int count, struct fuse_bufvec *buf,
                          size_t len)
 {
+    FVRequest *req = container_of(ch, FVRequest, ch);
+    struct fv_QueueInfo *qi = ch->qi;
+    VuDev *dev = &se->virtio_dev->dev;
+    VuVirtq *q = vu_get_queue(dev, qi->qidx);
+    VuVirtqElement *elem = &req->elem;
     int ret = 0;
-    VuVirtqElement *elem;
-    VuVirtq *q;
 
     assert(count >= 1);
     assert(iov[0].iov_len >= sizeof(struct fuse_out_header));
@@ -275,11 +295,7 @@ int virtio_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
     /* unique == 0 is notification which we don't support */
     assert(out->unique);
 
-    /* For virtio we always have ch */
-    assert(ch);
-    assert(!ch->qi->reply_sent);
-    elem = ch->qi->qe;
-    q = &ch->qi->virtio_dev->dev.vq[ch->qi->qidx];
+    assert(!req->reply_sent);
 
     /* The 'in' part of the elem is to qemu */
     unsigned int in_num = elem->in_num;
@@ -392,34 +408,176 @@ int virtio_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
 
     ret = 0;
 
-    vu_queue_push(&se->virtio_dev->dev, q, elem, tosend_len);
-    vu_queue_notify(&se->virtio_dev->dev, q);
+    pthread_rwlock_rdlock(&qi->virtio_dev->vu_dispatch_rwlock);
+    pthread_mutex_lock(&qi->vq_lock);
+    vu_queue_push(dev, q, elem, tosend_len);
+    vu_queue_notify(dev, q);
+    pthread_mutex_unlock(&qi->vq_lock);
+    pthread_rwlock_unlock(&qi->virtio_dev->vu_dispatch_rwlock);
 
 err:
     if (ret == 0) {
-        ch->qi->reply_sent = true;
+        req->reply_sent = true;
     }
 
     return ret;
 }
 
+/* Process one FVRequest in a thread pool */
+static void fv_queue_worker(gpointer data, gpointer user_data)
+{
+    struct fv_QueueInfo *qi = user_data;
+    struct fuse_session *se = qi->virtio_dev->se;
+    struct VuDev *dev = &qi->virtio_dev->dev;
+    FVRequest *req = data;
+    VuVirtqElement *elem = &req->elem;
+    struct fuse_buf fbuf = {};
+    bool allocated_bufv = false;
+    struct fuse_bufvec bufv;
+    struct fuse_bufvec *pbufv;
+
+    assert(se->bufsize > sizeof(struct fuse_in_header));
+
+    /*
+     * An element contains one request and the space to send our response
+     * They're spread over multiple descriptors in a scatter/gather set
+     * and we can't trust the guest to keep them still; so copy in/out.
+     */
+    fbuf.mem = malloc(se->bufsize);
+    assert(fbuf.mem);
+
+    fuse_mutex_init(&req->ch.lock);
+    req->ch.fd = (int)0xdaff0d111;
+    req->ch.ctr = 1;
+    req->ch.qi = qi;
+
+    /* The 'out' part of the elem is from qemu */
+    unsigned int out_num = elem->out_num;
+    struct iovec *out_sg = elem->out_sg;
+    size_t out_len = iov_size(out_sg, out_num);
+    fuse_log(FUSE_LOG_DEBUG,
+             "%s: elem %d: with %d out desc of length %zd\n",
+             __func__, elem->index, out_num, out_len);
+
+    /*
+     * The elem should contain a 'fuse_in_header' (in to fuse)
+     * plus the data based on the len in the header.
+     */
+    if (out_len < sizeof(struct fuse_in_header)) {
+        fuse_log(FUSE_LOG_ERR, "%s: elem %d too short for in_header\n",
+                 __func__, elem->index);
+        assert(0); /* TODO */
+    }
+    if (out_len > se->bufsize) {
+        fuse_log(FUSE_LOG_ERR, "%s: elem %d too large for buffer\n", __func__,
+                 elem->index);
+        assert(0); /* TODO */
+    }
+    /* Copy just the first element and look at it */
+    copy_from_iov(&fbuf, 1, out_sg);
+
+    pbufv = NULL; /* Compiler thinks an uninitialised path */
+    if (out_num > 2 &&
+        out_sg[0].iov_len == sizeof(struct fuse_in_header) &&
+        ((struct fuse_in_header *)fbuf.mem)->opcode == FUSE_WRITE &&
+        out_sg[1].iov_len == sizeof(struct fuse_write_in)) {
+        /*
+         * For a write we don't actually need to copy the
+         * data, we can just do it straight out of guest memory
+         * but we must still copy the headers in case the guest
+         * was nasty and changed them while we were using them.
+         */
+        fuse_log(FUSE_LOG_DEBUG, "%s: Write special case\n", __func__);
+
+        /* copy the fuse_write_in header after the fuse_in_header */
+        fbuf.mem += out_sg->iov_len;
+        copy_from_iov(&fbuf, 1, out_sg + 1);
+        fbuf.mem -= out_sg->iov_len;
+        fbuf.size = out_sg[0].iov_len + out_sg[1].iov_len;
+
+        /* Allocate the bufv, with space for the rest of the iov */
+        pbufv = malloc(sizeof(struct fuse_bufvec) +
+                       sizeof(struct fuse_buf) * (out_num - 2));
+        if (!pbufv) {
+            fuse_log(FUSE_LOG_ERR, "%s: pbufv malloc failed\n",
+                    __func__);
+            goto out;
+        }
+
+        allocated_bufv = true;
+        pbufv->count = 1;
+        pbufv->buf[0] = fbuf;
+
+        size_t iovindex, pbufvindex;
+        iovindex = 2; /* 2 headers, separate iovs */
+        pbufvindex = 1; /* 2 headers, 1 fusebuf */
+
+        for (; iovindex < out_num; iovindex++, pbufvindex++) {
+            pbufv->count++;
+            pbufv->buf[pbufvindex].pos = ~0; /* Dummy */
+            pbufv->buf[pbufvindex].flags = 0;
+            pbufv->buf[pbufvindex].mem = out_sg[iovindex].iov_base;
+            pbufv->buf[pbufvindex].size = out_sg[iovindex].iov_len;
+        }
+    } else {
+        /* Normal (non fast write) path */
+
+        /* Copy the rest of the buffer */
+        fbuf.mem += out_sg->iov_len;
+        copy_from_iov(&fbuf, out_num - 1, out_sg + 1);
+        fbuf.mem -= out_sg->iov_len;
+        fbuf.size = out_len;
+
+        /* TODO! Endianness of header */
+
+        /* TODO: Add checks for fuse_session_exited */
+        bufv.buf[0] = fbuf;
+        bufv.count = 1;
+        pbufv = &bufv;
+    }
+    pbufv->idx = 0;
+    pbufv->off = 0;
+    fuse_session_process_buf_int(se, pbufv, &req->ch);
+
+out:
+    if (allocated_bufv) {
+        free(pbufv);
+    }
+
+    /* If the request has no reply, still recycle the virtqueue element */
+    if (!req->reply_sent) {
+        struct VuVirtq *q = vu_get_queue(dev, qi->qidx);
+
+        fuse_log(FUSE_LOG_DEBUG, "%s: elem %d no reply sent\n", __func__,
+                 elem->index);
+
+        pthread_rwlock_rdlock(&qi->virtio_dev->vu_dispatch_rwlock);
+        pthread_mutex_lock(&qi->vq_lock);
+        vu_queue_push(dev, q, elem, 0);
+        vu_queue_notify(dev, q);
+        pthread_mutex_unlock(&qi->vq_lock);
+        pthread_rwlock_unlock(&qi->virtio_dev->vu_dispatch_rwlock);
+    }
+
+    pthread_mutex_destroy(&req->ch.lock);
+    free(fbuf.mem);
+    free(req);
+}
+
 /* Thread function for individual queues, created when a queue is 'started' */
 static void *fv_queue_thread(void *opaque)
 {
     struct fv_QueueInfo *qi = opaque;
     struct VuDev *dev = &qi->virtio_dev->dev;
     struct VuVirtq *q = vu_get_queue(dev, qi->qidx);
-    struct fuse_session *se = qi->virtio_dev->se;
-    struct fuse_chan ch;
-    struct fuse_buf fbuf;
-
-    fbuf.mem = NULL;
-    fbuf.flags = 0;
+    GThreadPool *pool;
 
-    fuse_mutex_init(&ch.lock);
-    ch.fd = (int)0xdaff0d111;
-    ch.ctr = 1;
-    ch.qi = qi;
+    pool = g_thread_pool_new(fv_queue_worker, qi, 1 /* TODO max_threads */,
+                             TRUE, NULL);
+    if (!pool) {
+        fuse_log(FUSE_LOG_ERR, "%s: g_thread_pool_new failed\n", __func__);
+        return NULL;
+    }
 
     fuse_log(FUSE_LOG_INFO, "%s: Start for queue %d kick_fd %d\n", __func__,
              qi->qidx, qi->kick_fd);
@@ -476,6 +634,7 @@ static void *fv_queue_thread(void *opaque)
         /* Mutual exclusion with virtio_loop() */
         ret = pthread_rwlock_rdlock(&qi->virtio_dev->vu_dispatch_rwlock);
         assert(ret == 0); /* there is no possible error case */
+        pthread_mutex_lock(&qi->vq_lock);
         /* out is from guest, in is too guest */
         unsigned int in_bytes, out_bytes;
         vu_queue_get_avail_bytes(dev, q, &in_bytes, &out_bytes, ~0, ~0);
@@ -484,141 +643,22 @@ static void *fv_queue_thread(void *opaque)
                  "%s: Queue %d gave evalue: %zx available: in: %u out: %u\n",
                  __func__, qi->qidx, (size_t)evalue, in_bytes, out_bytes);
 
-
         while (1) {
-            bool allocated_bufv = false;
-            struct fuse_bufvec bufv;
-            struct fuse_bufvec *pbufv;
-
-            /*
-             * An element contains one request and the space to send our
-             * response They're spread over multiple descriptors in a
-             * scatter/gather set and we can't trust the guest to keep them
-             * still; so copy in/out.
-             */
-            VuVirtqElement *elem = vu_queue_pop(dev, q, sizeof(VuVirtqElement));
-            if (!elem) {
+            FVRequest *req = vu_queue_pop(dev, q, sizeof(FVRequest));
+            if (!req) {
                 break;
             }
 
-            qi->qe = elem;
-            qi->reply_sent = false;
+            req->reply_sent = false;
 
-            if (!fbuf.mem) {
-                fbuf.mem = malloc(se->bufsize);
-                assert(fbuf.mem);
-                assert(se->bufsize > sizeof(struct fuse_in_header));
-            }
-            /* The 'out' part of the elem is from qemu */
-            unsigned int out_num = elem->out_num;
-            struct iovec *out_sg = elem->out_sg;
-            size_t out_len = iov_size(out_sg, out_num);
-            fuse_log(FUSE_LOG_DEBUG,
-                     "%s: elem %d: with %d out desc of length %zd\n", __func__,
-                     elem->index, out_num, out_len);
-
-            /*
-             * The elem should contain a 'fuse_in_header' (in to fuse)
-             * plus the data based on the len in the header.
-             */
-            if (out_len < sizeof(struct fuse_in_header)) {
-                fuse_log(FUSE_LOG_ERR, "%s: elem %d too short for in_header\n",
-                         __func__, elem->index);
-                assert(0); /* TODO */
-            }
-            if (out_len > se->bufsize) {
-                fuse_log(FUSE_LOG_ERR, "%s: elem %d too large for buffer\n",
-                         __func__, elem->index);
-                assert(0); /* TODO */
-            }
-            /* Copy just the first element and look at it */
-            copy_from_iov(&fbuf, 1, out_sg);
-
-            if (out_num > 2 &&
-                out_sg[0].iov_len == sizeof(struct fuse_in_header) &&
-                ((struct fuse_in_header *)fbuf.mem)->opcode == FUSE_WRITE &&
-                out_sg[1].iov_len == sizeof(struct fuse_write_in)) {
-                /*
-                 * For a write we don't actually need to copy the
-                 * data, we can just do it straight out of guest memory
-                 * but we must still copy the headers in case the guest
-                 * was nasty and changed them while we were using them.
-                 */
-                fuse_log(FUSE_LOG_DEBUG, "%s: Write special case\n", __func__);
-
-                /* copy the fuse_write_in header after the fuse_in_header */
-                fbuf.mem += out_sg->iov_len;
-                copy_from_iov(&fbuf, 1, out_sg + 1);
-                fbuf.mem -= out_sg->iov_len;
-                fbuf.size = out_sg[0].iov_len + out_sg[1].iov_len;
-
-                /* Allocate the bufv, with space for the rest of the iov */
-                allocated_bufv = true;
-                pbufv = malloc(sizeof(struct fuse_bufvec) +
-                               sizeof(struct fuse_buf) * (out_num - 2));
-                if (!pbufv) {
-                    vu_queue_unpop(dev, q, elem, 0);
-                    free(elem);
-                    fuse_log(FUSE_LOG_ERR, "%s: pbufv malloc failed\n",
-                             __func__);
-                    goto out;
-                }
-
-                pbufv->count = 1;
-                pbufv->buf[0] = fbuf;
-
-                size_t iovindex, pbufvindex;
-                iovindex = 2; /* 2 headers, separate iovs */
-                pbufvindex = 1; /* 2 headers, 1 fusebuf */
-
-                for (; iovindex < out_num; iovindex++, pbufvindex++) {
-                    pbufv->count++;
-                    pbufv->buf[pbufvindex].pos = ~0; /* Dummy */
-                    pbufv->buf[pbufvindex].flags = 0;
-                    pbufv->buf[pbufvindex].mem = out_sg[iovindex].iov_base;
-                    pbufv->buf[pbufvindex].size = out_sg[iovindex].iov_len;
-                }
-            } else {
-                /* Normal (non fast write) path */
-
-                /* Copy the rest of the buffer */
-                fbuf.mem += out_sg->iov_len;
-                copy_from_iov(&fbuf, out_num - 1, out_sg + 1);
-                fbuf.mem -= out_sg->iov_len;
-                fbuf.size = out_len;
-
-                /* TODO! Endianness of header */
-
-                /* TODO: Add checks for fuse_session_exited */
-                bufv.buf[0] = fbuf;
-                bufv.count = 1;
-                pbufv = &bufv;
-            }
-            pbufv->idx = 0;
-            pbufv->off = 0;
-            fuse_session_process_buf_int(se, pbufv, &ch);
-
-            if (allocated_bufv) {
-                free(pbufv);
-            }
-
-            if (!qi->reply_sent) {
-                fuse_log(FUSE_LOG_DEBUG, "%s: elem %d no reply sent\n",
-                         __func__, elem->index);
-                /* I think we've still got to recycle the element */
-                vu_queue_push(dev, q, elem, 0);
-                vu_queue_notify(dev, q);
-            }
-            qi->qe = NULL;
-            free(elem);
-            elem = NULL;
+            g_thread_pool_push(pool, req, NULL);
         }
 
+        pthread_mutex_unlock(&qi->vq_lock);
         pthread_rwlock_unlock(&qi->virtio_dev->vu_dispatch_rwlock);
     }
-out:
-    pthread_mutex_destroy(&ch.lock);
-    free(fbuf.mem);
+
+    g_thread_pool_free(pool, FALSE, TRUE);
 
     return NULL;
 }
@@ -670,6 +710,8 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
 
         ourqi->kill_fd = eventfd(0, EFD_CLOEXEC | EFD_SEMAPHORE);
         assert(ourqi->kill_fd != -1);
+        pthread_mutex_init(&ourqi->vq_lock, NULL);
+
         if (pthread_create(&ourqi->thread, NULL, fv_queue_thread, ourqi)) {
             fuse_log(FUSE_LOG_ERR, "%s: Failed to create thread for queue %d\n",
                      __func__, qidx);
@@ -689,6 +731,7 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
             fuse_log(FUSE_LOG_ERR, "%s: Failed to join thread idx %d err %d\n",
                      __func__, qidx, ret);
         }
+        pthread_mutex_destroy(&ourqi->vq_lock);
         close(ourqi->kill_fd);
         ourqi->kick_fd = -1;
         free(vud->qi[qidx]);
-- 
2.23.0
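
The locking pattern in the patch above (workers process a request unlocked,
then take the dispatch rwlock for reading plus a per-queue mutex around the
push) can be sketched with plain pthreads. This is only an illustration under
stated assumptions: toy_queue, toy_req, toy_worker and toy_run are
hypothetical names, the "used" array stands in for the virtqueue, and plain
threads stand in for the GThreadPool that the patch uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TOY_MAX_ELEMS 64

/* Hypothetical stand-in for a virtqueue's used ring (not a libvhost-user type) */
struct toy_queue {
    pthread_rwlock_t dispatch_rwlock; /* read side: workers; write side: control */
    pthread_mutex_t vq_lock;          /* serializes workers pushing to the queue */
    int used[TOY_MAX_ELEMS];
    size_t used_count;
};

struct toy_req {
    struct toy_queue *q;
    int index;
};

/* Worker: handle the request unlocked, then publish the reply under both locks */
static void *toy_worker(void *opaque)
{
    struct toy_req *req = opaque;
    struct toy_queue *q = req->q;

    /* ... request processing would run here without any queue lock ... */

    pthread_rwlock_rdlock(&q->dispatch_rwlock);
    pthread_mutex_lock(&q->vq_lock);
    q->used[q->used_count++] = req->index;
    pthread_mutex_unlock(&q->vq_lock);
    pthread_rwlock_unlock(&q->dispatch_rwlock);

    return NULL;
}

/* Start n workers, wait for them, and return the number of replies pushed */
static size_t toy_run(size_t n)
{
    struct toy_queue q = { .used_count = 0 };
    struct toy_req reqs[TOY_MAX_ELEMS];
    pthread_t threads[TOY_MAX_ELEMS];
    size_t i, started = 0;

    assert(n <= TOY_MAX_ELEMS);
    pthread_rwlock_init(&q.dispatch_rwlock, NULL);
    pthread_mutex_init(&q.vq_lock, NULL);

    for (i = 0; i < n; i++) {
        reqs[i].q = &q;
        reqs[i].index = (int)i;
        if (pthread_create(&threads[started], NULL, toy_worker,
                           &reqs[i]) == 0) {
            started++;
        }
    }
    for (i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
    }

    pthread_mutex_destroy(&q.vq_lock);
    pthread_rwlock_destroy(&q.dispatch_rwlock);
    return q.used_count;
}
```

Calling toy_run(8) starts eight workers that each push one reply. Without
vq_lock the increments of used_count could race, which is the reason the
patch adds a mutex per queue once more than one thread can push.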




* [PATCH 101/104] virtiofsd: prevent FUSE_INIT/FUSE_DESTROY races
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (99 preceding siblings ...)
  2019-12-12 16:39 ` [PATCH 100/104] virtiofsd: process requests in a thread pool Dr. David Alan Gilbert (git)
@ 2019-12-12 16:39 ` Dr. David Alan Gilbert (git)
  2020-01-15 23:05   ` Masayoshi Mizuma
  2020-01-17 13:40   ` Philippe Mathieu-Daudé
  2019-12-12 16:39 ` [PATCH 102/104] virtiofsd: fix lo_destroy() resource leaks Dr. David Alan Gilbert (git)
                   ` (5 subsequent siblings)
  106 siblings, 2 replies; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:39 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

When running with multiple threads it can be tricky to handle
FUSE_INIT/FUSE_DESTROY in parallel with other request types or in
parallel with themselves.  Serialize FUSE_INIT and FUSE_DESTROY so that
malicious clients cannot trigger race conditions.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/fuse_i.h        |  1 +
 tools/virtiofsd/fuse_lowlevel.c | 18 ++++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
index d0679508cd..8a4a05b319 100644
--- a/tools/virtiofsd/fuse_i.h
+++ b/tools/virtiofsd/fuse_i.h
@@ -61,6 +61,7 @@ struct fuse_session {
     struct fuse_req list;
     struct fuse_req interrupts;
     pthread_mutex_t lock;
+    pthread_rwlock_t init_rwlock;
     int got_destroy;
     int broken_splice_nonblock;
     uint64_t notify_ctr;
diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 10f478b00c..9f01c05e3e 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -2431,6 +2431,19 @@ void fuse_session_process_buf_int(struct fuse_session *se,
     req->ctx.pid = in->pid;
     req->ch = ch ? fuse_chan_get(ch) : NULL;
 
+    /*
+     * INIT and DESTROY requests are serialized, all other request types
+     * run in parallel.  This prevents races between FUSE_INIT and ordinary
+     * requests, FUSE_INIT and FUSE_INIT, FUSE_INIT and FUSE_DESTROY, and
+     * FUSE_DESTROY and FUSE_DESTROY.
+     */
+    if (in->opcode == FUSE_INIT || in->opcode == CUSE_INIT ||
+        in->opcode == FUSE_DESTROY) {
+        pthread_rwlock_wrlock(&se->init_rwlock);
+    } else {
+        pthread_rwlock_rdlock(&se->init_rwlock);
+    }
+
     err = EIO;
     if (!se->got_init) {
         enum fuse_opcode expected;
@@ -2488,10 +2501,13 @@ void fuse_session_process_buf_int(struct fuse_session *se,
     } else {
         fuse_ll_ops[in->opcode].func(req, in->nodeid, &iter);
     }
+
+    pthread_rwlock_unlock(&se->init_rwlock);
     return;
 
 reply_err:
     fuse_reply_err(req, err);
+    pthread_rwlock_unlock(&se->init_rwlock);
 }
 
 #define LL_OPTION(n, o, v)                     \
@@ -2538,6 +2554,7 @@ void fuse_session_destroy(struct fuse_session *se)
             se->op.destroy(se->userdata);
         }
     }
+    pthread_rwlock_destroy(&se->init_rwlock);
     pthread_mutex_destroy(&se->lock);
     free(se->cuse_data);
     if (se->fd != -1) {
@@ -2631,6 +2648,7 @@ struct fuse_session *fuse_session_new(struct fuse_args *args,
     list_init_req(&se->list);
     list_init_req(&se->interrupts);
     fuse_mutex_init(&se->lock);
+    pthread_rwlock_init(&se->init_rwlock, NULL);
 
     memcpy(&se->op, op, op_size);
     se->owner = getuid();
-- 
2.23.0
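
The serialization described above can be sketched in isolation: INIT and
DESTROY take the rwlock for writing, everything else takes it for reading, so
ordinary requests still run in parallel with each other. This is a minimal
sketch, not the virtiofsd code; toy_session, toy_opcode and toy_process are
hypothetical names, and clearing got_init on destroy is an assumption made
for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Hypothetical opcodes; the real values come from the FUSE protocol headers */
enum toy_opcode { TOY_GETATTR, TOY_INIT, TOY_DESTROY };

struct toy_session {
    pthread_rwlock_t init_rwlock;
    int got_init;
};

static void toy_session_init(struct toy_session *se)
{
    pthread_rwlock_init(&se->init_rwlock, NULL);
    se->got_init = 0;
}

/* Returns 0 if the request was accepted, EIO if it was rejected */
static int toy_process(struct toy_session *se, enum toy_opcode op)
{
    int err = 0;

    /* Writers (INIT/DESTROY) exclude everyone; readers only exclude writers */
    if (op == TOY_INIT || op == TOY_DESTROY) {
        pthread_rwlock_wrlock(&se->init_rwlock);
    } else {
        pthread_rwlock_rdlock(&se->init_rwlock);
    }

    if (!se->got_init) {
        if (op == TOY_INIT) {
            se->got_init = 1;   /* modified only under the write lock */
        } else {
            err = EIO;          /* nothing but INIT is valid before INIT */
        }
    } else if (op == TOY_INIT) {
        err = EIO;              /* a second INIT is rejected */
    } else if (op == TOY_DESTROY) {
        se->got_init = 0;       /* assumption: destroy resets the session */
    }

    /* Single unlock on every path, mirroring the two unlocks in the patch */
    pthread_rwlock_unlock(&se->init_rwlock);
    return err;
}
```

Because got_init is only written while the lock is held for writing, a reader
that holds the lock for reading always sees a stable value for the duration
of its request.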




* [PATCH 102/104] virtiofsd: fix lo_destroy() resource leaks
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (100 preceding siblings ...)
  2019-12-12 16:39 ` [PATCH 101/104] virtiofsd: prevent FUSE_INIT/FUSE_DESTROY races Dr. David Alan Gilbert (git)
@ 2019-12-12 16:39 ` Dr. David Alan Gilbert (git)
  2020-01-17 13:43   ` Philippe Mathieu-Daudé
  2019-12-12 16:39 ` [PATCH 103/104] virtiofsd: add --thread-pool-size=NUM option Dr. David Alan Gilbert (git)
                   ` (4 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:39 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

Now that lo_destroy() is serialized we can call unref_inode() so that
all inode resources are freed.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 41 ++++++++++++++++----------------
 1 file changed, 20 insertions(+), 21 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 1bf251a91d..38f4948e61 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1355,26 +1355,6 @@ static void unref_inode_lolocked(struct lo_data *lo, struct lo_inode *inode,
     }
 }
 
-static int unref_all_inodes_cb(gpointer key, gpointer value, gpointer user_data)
-{
-    struct lo_inode *inode = value;
-    struct lo_data *lo = user_data;
-
-    inode->nlookup = 0;
-    lo_map_remove(&lo->ino_map, inode->fuse_ino);
-    close(inode->fd);
-    lo_inode_put(lo, &inode); /* Drop our refcount from lo_do_lookup() */
-
-    return TRUE;
-}
-
-static void unref_all_inodes(struct lo_data *lo)
-{
-    pthread_mutex_lock(&lo->mutex);
-    g_hash_table_foreach_remove(lo->inodes, unref_all_inodes_cb, lo);
-    pthread_mutex_unlock(&lo->mutex);
-}
-
 static void lo_forget_one(fuse_req_t req, fuse_ino_t ino, uint64_t nlookup)
 {
     struct lo_data *lo = lo_data(req);
@@ -2460,7 +2440,26 @@ static void lo_lseek(fuse_req_t req, fuse_ino_t ino, off_t off, int whence,
 static void lo_destroy(void *userdata)
 {
     struct lo_data *lo = (struct lo_data *)userdata;
-    unref_all_inodes(lo);
+
+    /*
+     * Normally lo->mutex must be taken when traversing lo->inodes but
+     * lo_destroy() is a serialized request so no races are possible here.
+     *
+     * In addition, we cannot acquire lo->mutex since unref_inode() takes it
+     * too and this would result in a recursive lock.
+     */
+    while (true) {
+        GHashTableIter iter;
+        gpointer key, value;
+
+        g_hash_table_iter_init(&iter, lo->inodes);
+        if (!g_hash_table_iter_next(&iter, &key, &value)) {
+            break;
+        }
+
+        struct lo_inode *inode = value;
+        unref_inode_lolocked(lo, inode, inode->nlookup);
+    }
 }
 
 static struct fuse_lowlevel_ops lo_oper = {
-- 
2.23.0
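
The loop in lo_destroy() above restarts its iterator on every pass because
dropping the last reference removes the entry from the container being
walked. The same "always take the first remaining entry" idea can be sketched
with a singly linked list instead of a GHashTable. All names here (toy_inode,
toy_table, toy_add, toy_unref, toy_destroy) are hypothetical and exist only
for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_inode {
    struct toy_inode *next;
    uint64_t nlookup;
};

struct toy_table {
    struct toy_inode *head;
    size_t freed; /* counts freed inodes so the effect can be checked */
};

/* Add an inode with the given lookup count; returns 0 on success */
static int toy_add(struct toy_table *t, uint64_t nlookup)
{
    struct toy_inode *inode = calloc(1, sizeof(*inode));

    if (!inode) {
        return -1;
    }
    inode->nlookup = nlookup;
    inode->next = t->head;
    t->head = inode;
    return 0;
}

/* Drop n lookups; unlink and free the inode once none remain */
static void toy_unref(struct toy_table *t, struct toy_inode *inode, uint64_t n)
{
    struct toy_inode **pp;

    assert(inode->nlookup >= n);
    inode->nlookup -= n;
    if (inode->nlookup) {
        return;
    }
    for (pp = &t->head; *pp; pp = &(*pp)->next) {
        if (*pp == inode) {
            *pp = inode->next;
            break;
        }
    }
    free(inode);
    t->freed++;
}

/*
 * Never keep a cursor across toy_unref(): it frees the node, so re-read the
 * head on each pass, just as the patch re-initializes its iterator.
 */
static void toy_destroy(struct toy_table *t)
{
    while (t->head) {
        struct toy_inode *inode = t->head;

        toy_unref(t, inode, inode->nlookup);
    }
}
```

Passing the full remaining nlookup guarantees that each pass removes exactly
one entry, so the loop terminates after as many passes as there are entries.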




* [PATCH 103/104] virtiofsd: add --thread-pool-size=NUM option
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (101 preceding siblings ...)
  2019-12-12 16:39 ` [PATCH 102/104] virtiofsd: fix lo_destroy() resource leaks Dr. David Alan Gilbert (git)
@ 2019-12-12 16:39 ` Dr. David Alan Gilbert (git)
  2020-01-07 12:25   ` Daniel P. Berrangé
  2020-01-17 13:35   ` Philippe Mathieu-Daudé
  2019-12-12 16:39 ` [PATCH 104/104] virtiofsd: Convert lo_destroy to take the lo->mutex lock itself Dr. David Alan Gilbert (git)
                   ` (3 subsequent siblings)
  106 siblings, 2 replies; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:39 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: Stefan Hajnoczi <stefanha@redhat.com>

Add an option to control the size of the thread pool.  Requests are now
processed in parallel by default.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/fuse_i.h        | 1 +
 tools/virtiofsd/fuse_lowlevel.c | 7 ++++++-
 tools/virtiofsd/fuse_virtio.c   | 5 +++--
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
index 8a4a05b319..4da6a242ba 100644
--- a/tools/virtiofsd/fuse_i.h
+++ b/tools/virtiofsd/fuse_i.h
@@ -72,6 +72,7 @@ struct fuse_session {
     int   vu_listen_fd;
     int   vu_socketfd;
     struct fv_VuDev *virtio_dev;
+    int thread_pool_size;
 };
 
 struct fuse_chan {
diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 9f01c05e3e..09a7b23726 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -27,6 +27,7 @@
 #include <sys/file.h>
 #include <unistd.h>
 
+#define THREAD_POOL_SIZE 64
 
 #define OFFSET_MAX 0x7fffffffffffffffLL
 
@@ -2523,6 +2524,7 @@ static const struct fuse_opt fuse_ll_opts[] = {
     LL_OPTION("--socket-path=%s", vu_socket_path, 0),
     LL_OPTION("vhost_user_socket=%s", vu_socket_path, 0),
     LL_OPTION("--fd=%d", vu_listen_fd, 0),
+    LL_OPTION("--thread-pool-size=%d", thread_pool_size, 0),
     FUSE_OPT_END
 };
 
@@ -2544,7 +2546,9 @@ void fuse_lowlevel_help(void)
         "    --socket-path=PATH         path for the vhost-user socket\n"
         "    -o vhost_user_socket=PATH  path for the vhost-user socket\n"
         "    --fd=FDNUM                 fd number of vhost-user socket\n"
-        "    -o auto_unmount            auto unmount on process termination\n");
+        "    -o auto_unmount            auto unmount on process termination\n"
+        "    --thread-pool-size=NUM     thread pool size limit (default %d)\n",
+        THREAD_POOL_SIZE);
 }
 
 void fuse_session_destroy(struct fuse_session *se)
@@ -2598,6 +2602,7 @@ struct fuse_session *fuse_session_new(struct fuse_args *args,
     }
     se->fd = -1;
     se->vu_listen_fd = -1;
+    se->thread_pool_size = THREAD_POOL_SIZE;
     se->conn.max_write = UINT_MAX;
     se->conn.max_readahead = UINT_MAX;
 
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index b696ac3135..7bc6ff2f19 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -570,10 +570,11 @@ static void *fv_queue_thread(void *opaque)
     struct fv_QueueInfo *qi = opaque;
     struct VuDev *dev = &qi->virtio_dev->dev;
     struct VuVirtq *q = vu_get_queue(dev, qi->qidx);
+    struct fuse_session *se = qi->virtio_dev->se;
     GThreadPool *pool;
 
-    pool = g_thread_pool_new(fv_queue_worker, qi, 1 /* TODO max_threads */,
-                             TRUE, NULL);
+    pool = g_thread_pool_new(fv_queue_worker, qi, se->thread_pool_size, TRUE,
+                             NULL);
     if (!pool) {
         fuse_log(FUSE_LOG_ERR, "%s: g_thread_pool_new failed\n", __func__);
         return NULL;
-- 
2.23.0
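
For illustration, the effect of the new option (a default of 64 that
--thread-pool-size=NUM overrides) can be sketched as a standalone parser.
This does not use the fuse_opt machinery from the patch; toy_parse_pool_size
is a hypothetical helper, and ignoring values below 1 is an assumption of
this sketch, since the patch itself does not validate the number.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define TOY_THREAD_POOL_SIZE 64 /* same default as THREAD_POOL_SIZE above */

/* Return the requested pool size, or the default if absent or malformed */
static int toy_parse_pool_size(int argc, char **argv)
{
    static const char prefix[] = "--thread-pool-size=";
    const size_t prefix_len = sizeof(prefix) - 1;
    int size = TOY_THREAD_POOL_SIZE;
    int i;

    for (i = 1; i < argc; i++) {
        const char *num;
        char *end;
        long val;

        if (strncmp(argv[i], prefix, prefix_len) != 0) {
            continue;
        }
        num = argv[i] + prefix_len;
        errno = 0;
        val = strtol(num, &end, 10);
        /* Accept only a complete, in-range, positive decimal number */
        if (errno == 0 && end != num && *end == '\0' &&
            val >= 1 && val <= INT_MAX) {
            size = (int)val;
        }
    }
    return size;
}
```

With an argument vector of { "virtiofsd", "--thread-pool-size=8" } the helper
returns 8; without the option, or with a value such as 0, it falls back to
the default of 64.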




* [PATCH 104/104] virtiofsd: Convert lo_destroy to take the lo->mutex lock itself
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (102 preceding siblings ...)
  2019-12-12 16:39 ` [PATCH 103/104] virtiofsd: add --thread-pool-size=NUM option Dr. David Alan Gilbert (git)
@ 2019-12-12 16:39 ` Dr. David Alan Gilbert (git)
  2020-01-17 13:33   ` Philippe Mathieu-Daudé
  2019-12-12 18:21 ` [PATCH 000/104] virtiofs daemon [all] no-reply
                   ` (2 subsequent siblings)
  106 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-12 16:39 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

lo_destroy was relying on some implicit knowledge of the locking;
we can avoid this if we create an unref_inode that doesn't take
the lock and then grab it for the whole of the lo_destroy.

Suggested-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 38f4948e61..c37f57157e 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1328,14 +1328,13 @@ static void lo_unlink(fuse_req_t req, fuse_ino_t parent, const char *name)
     lo_inode_put(lo, &inode);
 }
 
-static void unref_inode_lolocked(struct lo_data *lo, struct lo_inode *inode,
-                                 uint64_t n)
+/* To be called with lo->mutex held */
+static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
 {
     if (!inode) {
         return;
     }
 
-    pthread_mutex_lock(&lo->mutex);
     assert(inode->nlookup >= n);
     inode->nlookup -= n;
     if (!inode->nlookup) {
@@ -1346,15 +1345,24 @@ static void unref_inode_lolocked(struct lo_data *lo, struct lo_inode *inode,
         }
         g_hash_table_destroy(inode->posix_locks);
         pthread_mutex_destroy(&inode->plock_mutex);
-        pthread_mutex_unlock(&lo->mutex);
 
         /* Drop our refcount from lo_do_lookup() */
         lo_inode_put(lo, &inode);
-    } else {
-        pthread_mutex_unlock(&lo->mutex);
     }
 }
 
+static void unref_inode_lolocked(struct lo_data *lo, struct lo_inode *inode,
+                                 uint64_t n)
+{
+    if (!inode) {
+        return;
+    }
+
+    pthread_mutex_lock(&lo->mutex);
+    unref_inode(lo, inode, n);
+    pthread_mutex_unlock(&lo->mutex);
+}
+
 static void lo_forget_one(fuse_req_t req, fuse_ino_t ino, uint64_t nlookup)
 {
     struct lo_data *lo = lo_data(req);
@@ -2441,13 +2449,7 @@ static void lo_destroy(void *userdata)
 {
     struct lo_data *lo = (struct lo_data *)userdata;
 
-    /*
-     * Normally lo->mutex must be taken when traversing lo->inodes but
-     * lo_destroy() is a serialized request so no races are possible here.
-     *
-     * In addition, we cannot acquire lo->mutex since unref_inode() takes it
-     * too and this would result in a recursive lock.
-     */
+    pthread_mutex_lock(&lo->mutex);
     while (true) {
         GHashTableIter iter;
         gpointer key, value;
@@ -2458,8 +2460,9 @@ static void lo_destroy(void *userdata)
         }
 
         struct lo_inode *inode = value;
-        unref_inode_lolocked(lo, inode, inode->nlookup);
+        unref_inode(lo, inode, inode->nlookup);
     }
+    pthread_mutex_unlock(&lo->mutex);
 }
 
 static struct fuse_lowlevel_ops lo_oper = {
-- 
2.23.0
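
The split introduced above (a helper that expects the lock to be held, plus a
thin wrapper that takes it) can be sketched on its own. The sketch uses an
error-checking mutex so that the nested-lock problem mentioned in the earlier
lo_destroy() comment becomes visible as EDEADLK rather than a hang. The names
toy_lo, toy_unref, toy_unref_lolocked, toy_destroy and toy_nested_attempt are
hypothetical; the real unref_inode_lolocked() returns void, and the return
code here exists only to make the behaviour observable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

#define TOY_NINODES 3

struct toy_lo {
    pthread_mutex_t mutex;
    uint64_t nlookup[TOY_NINODES];
};

static void toy_lo_init(struct toy_lo *lo)
{
    pthread_mutexattr_t attr;
    int i;

    /* Error-checking mutex: relocking from the same thread yields EDEADLK */
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&lo->mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    for (i = 0; i < TOY_NINODES; i++) {
        lo->nlookup[i] = (uint64_t)i + 1;
    }
}

/* To be called with lo->mutex held */
static void toy_unref(struct toy_lo *lo, int idx, uint64_t n)
{
    assert(lo->nlookup[idx] >= n);
    lo->nlookup[idx] -= n;
}

/* Takes lo->mutex itself; returns 0 or the pthread_mutex_lock() error */
static int toy_unref_lolocked(struct toy_lo *lo, int idx, uint64_t n)
{
    int ret = pthread_mutex_lock(&lo->mutex);

    if (ret != 0) {
        return ret;
    }
    toy_unref(lo, idx, n);
    pthread_mutex_unlock(&lo->mutex);
    return 0;
}

/* Teardown takes the lock once and uses the lock-free helper throughout */
static void toy_destroy(struct toy_lo *lo)
{
    int i;

    pthread_mutex_lock(&lo->mutex);
    for (i = 0; i < TOY_NINODES; i++) {
        toy_unref(lo, i, lo->nlookup[i]);
    }
    pthread_mutex_unlock(&lo->mutex);
}

/* Shows what happens if the locking wrapper is called with the lock held */
static int toy_nested_attempt(struct toy_lo *lo)
{
    int ret;

    pthread_mutex_lock(&lo->mutex);
    ret = toy_unref_lolocked(lo, 0, 0);
    pthread_mutex_unlock(&lo->mutex);
    return ret;
}
```

With a default (non error-checking) mutex the nested call would typically
deadlock instead of returning an error, which is why the teardown path has to
call the variant that does not lock.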




* Re: [PATCH 000/104] virtiofs daemon [all]
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (103 preceding siblings ...)
  2019-12-12 16:39 ` [PATCH 104/104] virtiofsd: Convert lo_destroy to take the lo->mutex lock itself Dr. David Alan Gilbert (git)
@ 2019-12-12 18:21 ` no-reply
  2020-01-17 11:32 ` Dr. David Alan Gilbert
  2020-01-17 13:32 ` [PATCH 105/104] virtiofsd: Unref old/new inodes with the same mutex lock in lo_rename() Philippe Mathieu-Daudé
  106 siblings, 0 replies; 307+ messages in thread
From: no-reply @ 2019-12-12 18:21 UTC (permalink / raw)
  To: dgilbert; +Cc: qemu-devel, stefanha, vgoyal

Patchew URL: https://patchew.org/QEMU/20191212163904.159893-1-dgilbert@redhat.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===




The full log is available at
http://patchew.org/logs/20191212163904.159893-1-dgilbert@redhat.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [PATCH 092/104] virtiofsd: add man page
  2019-12-12 16:38 ` [PATCH 092/104] virtiofsd: add man page Dr. David Alan Gilbert (git)
@ 2019-12-13 14:33   ` Liam Merwick
  2019-12-13 15:26     ` Dr. David Alan Gilbert
  2020-01-07 12:13   ` Daniel P. Berrangé
  1 sibling, 1 reply; 307+ messages in thread
From: Liam Merwick @ 2019-12-13 14:33 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel, stefanha, vgoyal

On 12/12/2019 16:38, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   Makefile                       |  7 +++
>   tools/virtiofsd/virtiofsd.texi | 85 ++++++++++++++++++++++++++++++++++
>   2 files changed, 92 insertions(+)
>   create mode 100644 tools/virtiofsd/virtiofsd.texi
> 

... deleted ...

> +@c man begin EXAMPLES
> +Export @code{/var/lib/fs/vm001/} on vhost-user UNIX domain socket @code{/var/run/vm001-vhost-fs.sock}:
> +
> +@example
> +host# virtiofsd --socket-path=/var/run/vm001-vhost-fs.sock -o source=/var/lib/fs/vm001
> +host# qemu-system-x86_64 \
> +    -chardev socket,id=char0,path=/var/run/vm001-vhost-fs.sock \
> +    -device vhost-user-fs-pci,chardev=char0,tag=myfs \
> +    -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on \
> +    -numa node,memdev=mem \
> +    ...
> +guest# mount -t virtio_fs \
> +    -o default_permissions,allow_other,user_id=0,group_id=0,rootmode=040000,dax \
> +    myfs /mnt



Should this be 'mount -t virtiofs myfs /mnt' like on 
https://virtio-fs.gitlab.io/howto-qemu.html ?

otherwise

Reviewed-by: Liam Merwick <liam.merwick@oracle.com>




* Re: [PATCH 092/104] virtiofsd: add man page
  2019-12-13 14:33   ` Liam Merwick
@ 2019-12-13 15:26     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2019-12-13 15:26 UTC (permalink / raw)
  To: Liam Merwick; +Cc: qemu-devel, stefanha, vgoyal

* Liam Merwick (liam.merwick@oracle.com) wrote:
> On 12/12/2019 16:38, Dr. David Alan Gilbert (git) wrote:
> > From: Stefan Hajnoczi <stefanha@redhat.com>
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >   Makefile                       |  7 +++
> >   tools/virtiofsd/virtiofsd.texi | 85 ++++++++++++++++++++++++++++++++++
> >   2 files changed, 92 insertions(+)
> >   create mode 100644 tools/virtiofsd/virtiofsd.texi
> > 
> 
> ... deleted ...
> 
> > +@c man begin EXAMPLES
> > +Export @code{/var/lib/fs/vm001/} on vhost-user UNIX domain socket @code{/var/run/vm001-vhost-fs.sock}:
> > +
> > +@example
> > +host# virtiofsd --socket-path=/var/run/vm001-vhost-fs.sock -o source=/var/lib/fs/vm001
> > +host# qemu-system-x86_64 \
> > +    -chardev socket,id=char0,path=/var/run/vm001-vhost-fs.sock \
> > +    -device vhost-user-fs-pci,chardev=char0,tag=myfs \
> > +    -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on \
> > +    -numa node,memdev=mem \
> > +    ...
> > +guest# mount -t virtio_fs \
> > +    -o default_permissions,allow_other,user_id=0,group_id=0,rootmode=040000,dax \
> > +    myfs /mnt
> 
> 
> 
> Should this be 'mount -t virtiofs myfs /mnt' like on
> https://virtio-fs.gitlab.io/howto-qemu.html ?

It should! The man page still had the old format; thanks for spotting
it.

> otherwise
> 
> Reviewed-by: Liam Merwick <liam.merwick@oracle.com>

Thank you!

Dave

> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 025/104] virtiofsd: Add Makefile wiring for virtiofsd contrib
  2019-12-12 16:37 ` [PATCH 025/104] virtiofsd: Add Makefile wiring for virtiofsd contrib Dr. David Alan Gilbert (git)
@ 2019-12-13 16:02   ` Liam Merwick
  2019-12-13 16:56     ` Dr. David Alan Gilbert
  2020-01-03 15:41   ` Daniel P. Berrangé
  1 sibling, 1 reply; 307+ messages in thread
From: Liam Merwick @ 2019-12-13 16:02 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel, stefanha, vgoyal

On 12/12/2019 16:37, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Wire up the building of the virtiofsd in contrib.

s/contrib/tools/

otherwise

Reviewed-by: Liam Merwick <liam.merwick@oracle.com>

> 
> virtiofsd relies on Linux-specific system calls and seccomp.  Anyone
> wishing to port it to other host operating systems should do so
> carefully and without reducing security.
> 
> Only allow building on Linux hosts.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   Makefile                      |  8 ++++++++
>   Makefile.objs                 |  1 +
>   tools/virtiofsd/Makefile.objs | 10 ++++++++++
>   3 files changed, 19 insertions(+)
>   create mode 100644 tools/virtiofsd/Makefile.objs
> 
> diff --git a/Makefile b/Makefile
> index b437a346d7..b7f7019a50 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -322,6 +322,8 @@ HELPERS-y =
>   HELPERS-$(call land,$(CONFIG_SOFTMMU),$(CONFIG_LINUX)) = qemu-bridge-helper$(EXESUF)
>   
>   ifdef CONFIG_LINUX
> +HELPERS-y += virtiofsd$(EXESUF)
> +
>   ifdef CONFIG_VIRGL
>   ifdef CONFIG_GBM
>   HELPERS-y += vhost-user-gpu$(EXESUF)
> @@ -430,6 +432,7 @@ dummy := $(call unnest-vars,, \
>                   elf2dmp-obj-y \
>                   ivshmem-client-obj-y \
>                   ivshmem-server-obj-y \
> +                virtiofsd-obj-y \
>                   rdmacm-mux-obj-y \
>                   libvhost-user-obj-y \
>                   vhost-user-scsi-obj-y \
> @@ -674,6 +677,11 @@ rdmacm-mux$(EXESUF): LIBS += "-libumad"
>   rdmacm-mux$(EXESUF): $(rdmacm-mux-obj-y) $(COMMON_LDADDS)
>   	$(call LINK, $^)
>   
> +ifdef CONFIG_LINUX # relies on Linux-specific syscalls
> +virtiofsd$(EXESUF): $(virtiofsd-obj-y) libvhost-user.a $(COMMON_LDADDS)
> +	$(call LINK, $^)
> +endif
> +
>   vhost-user-gpu$(EXESUF): $(vhost-user-gpu-obj-y) $(libvhost-user-obj-y) libqemuutil.a libqemustub.a
>   	$(call LINK, $^)
>   
> diff --git a/Makefile.objs b/Makefile.objs
> index 11ba1a36bd..b5f667a4ba 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -125,6 +125,7 @@ vhost-user-blk-obj-y = contrib/vhost-user-blk/
>   rdmacm-mux-obj-y = contrib/rdmacm-mux/
>   vhost-user-input-obj-y = contrib/vhost-user-input/
>   vhost-user-gpu-obj-y = contrib/vhost-user-gpu/
> +virtiofsd-obj-y = tools/virtiofsd/
>   
>   ######################################################################
>   trace-events-subdirs =
> diff --git a/tools/virtiofsd/Makefile.objs b/tools/virtiofsd/Makefile.objs
> new file mode 100644
> index 0000000000..67be16332c
> --- /dev/null
> +++ b/tools/virtiofsd/Makefile.objs
> @@ -0,0 +1,10 @@
> +virtiofsd-obj-y = buffer.o \
> +                  fuse_opt.o \
> +                  fuse_log.o \
> +                  fuse_loop_mt.o \
> +                  fuse_lowlevel.o \
> +                  fuse_signals.o \
> +                  fuse_virtio.o \
> +                  helper.o \
> +                  passthrough_ll.o
> +
> 




* Re: [PATCH 025/104] virtiofsd: Add Makefile wiring for virtiofsd contrib
  2019-12-13 16:02   ` Liam Merwick
@ 2019-12-13 16:56     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2019-12-13 16:56 UTC (permalink / raw)
  To: Liam Merwick; +Cc: qemu-devel, stefanha, vgoyal

* Liam Merwick (liam.merwick@oracle.com) wrote:
> On 12/12/2019 16:37, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Wire up the building of the virtiofsd in contrib.
> 
> s/contrib/tools/

Ah! My sed of s-contrib/virtiofsd-tools/virtiofsd-
wasn't smart enough for that!

> otherwise
> 
> Reviewed-by: Liam Merwick <liam.merwick@oracle.com>

Thanks.

Dave

> 
> > 
> > virtiofsd relies on Linux-specific system calls and seccomp.  Anyone
> > wishing to port it to other host operating systems should do so
> > carefully and without reducing security.
> > 
> > Only allow building on Linux hosts.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >   Makefile                      |  8 ++++++++
> >   Makefile.objs                 |  1 +
> >   tools/virtiofsd/Makefile.objs | 10 ++++++++++
> >   3 files changed, 19 insertions(+)
> >   create mode 100644 tools/virtiofsd/Makefile.objs
> > 
> > diff --git a/Makefile b/Makefile
> > index b437a346d7..b7f7019a50 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -322,6 +322,8 @@ HELPERS-y =
> >   HELPERS-$(call land,$(CONFIG_SOFTMMU),$(CONFIG_LINUX)) = qemu-bridge-helper$(EXESUF)
> >   ifdef CONFIG_LINUX
> > +HELPERS-y += virtiofsd$(EXESUF)
> > +
> >   ifdef CONFIG_VIRGL
> >   ifdef CONFIG_GBM
> >   HELPERS-y += vhost-user-gpu$(EXESUF)
> > @@ -430,6 +432,7 @@ dummy := $(call unnest-vars,, \
> >                   elf2dmp-obj-y \
> >                   ivshmem-client-obj-y \
> >                   ivshmem-server-obj-y \
> > +                virtiofsd-obj-y \
> >                   rdmacm-mux-obj-y \
> >                   libvhost-user-obj-y \
> >                   vhost-user-scsi-obj-y \
> > @@ -674,6 +677,11 @@ rdmacm-mux$(EXESUF): LIBS += "-libumad"
> >   rdmacm-mux$(EXESUF): $(rdmacm-mux-obj-y) $(COMMON_LDADDS)
> >   	$(call LINK, $^)
> > +ifdef CONFIG_LINUX # relies on Linux-specific syscalls
> > +virtiofsd$(EXESUF): $(virtiofsd-obj-y) libvhost-user.a $(COMMON_LDADDS)
> > +	$(call LINK, $^)
> > +endif
> > +
> >   vhost-user-gpu$(EXESUF): $(vhost-user-gpu-obj-y) $(libvhost-user-obj-y) libqemuutil.a libqemustub.a
> >   	$(call LINK, $^)
> > diff --git a/Makefile.objs b/Makefile.objs
> > index 11ba1a36bd..b5f667a4ba 100644
> > --- a/Makefile.objs
> > +++ b/Makefile.objs
> > @@ -125,6 +125,7 @@ vhost-user-blk-obj-y = contrib/vhost-user-blk/
> >   rdmacm-mux-obj-y = contrib/rdmacm-mux/
> >   vhost-user-input-obj-y = contrib/vhost-user-input/
> >   vhost-user-gpu-obj-y = contrib/vhost-user-gpu/
> > +virtiofsd-obj-y = tools/virtiofsd/
> >   ######################################################################
> >   trace-events-subdirs =
> > diff --git a/tools/virtiofsd/Makefile.objs b/tools/virtiofsd/Makefile.objs
> > new file mode 100644
> > index 0000000000..67be16332c
> > --- /dev/null
> > +++ b/tools/virtiofsd/Makefile.objs
> > @@ -0,0 +1,10 @@
> > +virtiofsd-obj-y = buffer.o \
> > +                  fuse_opt.o \
> > +                  fuse_log.o \
> > +                  fuse_loop_mt.o \
> > +                  fuse_lowlevel.o \
> > +                  fuse_signals.o \
> > +                  fuse_virtio.o \
> > +                  helper.o \
> > +                  passthrough_ll.o
> > +
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 001/104] virtiofsd: Pull in upstream headers
  2019-12-12 16:37 ` [PATCH 001/104] virtiofsd: Pull in upstream headers Dr. David Alan Gilbert (git)
@ 2020-01-03 11:54   ` Daniel P. Berrangé
  2020-01-15 17:38   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-03 11:54 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:21PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Pull in headers from libfuse's upstream fuse-3.8.0
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/fuse.h                | 1275 +++++++++++++++
>  tools/virtiofsd/fuse_common.h         |  823 ++++++++++
>  tools/virtiofsd/fuse_i.h              |  139 ++
>  tools/virtiofsd/fuse_log.h            |   82 +
>  tools/virtiofsd/fuse_lowlevel.h       | 2089 +++++++++++++++++++++++++
>  tools/virtiofsd/fuse_misc.h           |   59 +
>  tools/virtiofsd/fuse_opt.h            |  271 ++++
>  tools/virtiofsd/passthrough_helpers.h |   76 +
>  8 files changed, 4814 insertions(+)
>  create mode 100644 tools/virtiofsd/fuse.h
>  create mode 100644 tools/virtiofsd/fuse_common.h
>  create mode 100644 tools/virtiofsd/fuse_i.h
>  create mode 100644 tools/virtiofsd/fuse_log.h
>  create mode 100644 tools/virtiofsd/fuse_lowlevel.h
>  create mode 100644 tools/virtiofsd/fuse_misc.h
>  create mode 100644 tools/virtiofsd/fuse_opt.h
>  create mode 100644 tools/virtiofsd/passthrough_helpers.h

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 002/104] virtiofsd: Pull in kernel's fuse.h
  2019-12-12 16:37 ` [PATCH 002/104] virtiofsd: Pull in kernel's fuse.h Dr. David Alan Gilbert (git)
@ 2020-01-03 11:56   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-03 11:56 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:22PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Update scripts/update-linux-headers.sh to add fuse.h and
> use it to pull in fuse.h from the kernel; from v5.5-rc1
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  include/standard-headers/linux/fuse.h | 891 ++++++++++++++++++++++++++
>  scripts/update-linux-headers.sh       |   1 +
>  2 files changed, 892 insertions(+)
>  create mode 100644 include/standard-headers/linux/fuse.h

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 003/104] virtiofsd: Add auxiliary .c's
  2019-12-12 16:37 ` [PATCH 003/104] virtiofsd: Add auxiliary .c's Dr. David Alan Gilbert (git)
@ 2020-01-03 11:57   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-03 11:57 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:23PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Add most of the non-main .c files we need from upstream fuse-3.8.0
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/buffer.c       | 321 ++++++++++++++++++++++++
>  tools/virtiofsd/fuse_log.c     |  40 +++
>  tools/virtiofsd/fuse_loop_mt.c | 362 +++++++++++++++++++++++++++
>  tools/virtiofsd/fuse_opt.c     | 423 +++++++++++++++++++++++++++++++
>  tools/virtiofsd/fuse_signals.c |  91 +++++++
>  tools/virtiofsd/helper.c       | 440 +++++++++++++++++++++++++++++++++
>  6 files changed, 1677 insertions(+)
>  create mode 100644 tools/virtiofsd/buffer.c
>  create mode 100644 tools/virtiofsd/fuse_log.c
>  create mode 100644 tools/virtiofsd/fuse_loop_mt.c
>  create mode 100644 tools/virtiofsd/fuse_opt.c
>  create mode 100644 tools/virtiofsd/fuse_signals.c
>  create mode 100644 tools/virtiofsd/helper.c

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 004/104] virtiofsd: Add fuse_lowlevel.c
  2019-12-12 16:37 ` [PATCH 004/104] virtiofsd: Add fuse_lowlevel.c Dr. David Alan Gilbert (git)
@ 2020-01-03 11:58   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-03 11:58 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:24PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> fuse_lowlevel is one of the largest files from the library
> and does most of the work.  Add it separately to keep the diff
> sizes small.
> Again this is from upstream fuse-3.8.0
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.c | 3129 +++++++++++++++++++++++++++++++
>  1 file changed, 3129 insertions(+)
>  create mode 100644 tools/virtiofsd/fuse_lowlevel.c

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 005/104] virtiofsd: Add passthrough_ll
  2019-12-12 16:37 ` [PATCH 005/104] virtiofsd: Add passthrough_ll Dr. David Alan Gilbert (git)
@ 2020-01-03 12:01   ` Daniel P. Berrangé
  2020-01-03 12:15     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-03 12:01 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:25PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> passthrough_ll is one of the examples in the upstream fuse project
> and is the main part of our daemon here.  It passes through requests
> from fuse to the underlying filesystem, using syscalls as directly
> as possible.
> 
> From libfuse fuse-3.8.0
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 1338 ++++++++++++++++++++++++++++++
>  1 file changed, 1338 insertions(+)
>  create mode 100644 tools/virtiofsd/passthrough_ll.c
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> new file mode 100644
> index 0000000000..5372d02934
> --- /dev/null
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -0,0 +1,1338 @@
> +/*
> +  FUSE: Filesystem in Userspace
> +  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
> +
> +  This program can be distributed under the terms of the GNU GPL.

I presume this mistake exists in upstream fuse GIT - missing GPL version
number info here. This is important to correct since we're moving code
from another repo and thus the COPYING file it is referring to on the
next line is ambiguous to the casual reader.

> +  See the file COPYING.
> +*/

With the GPL version info added:

  Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 006/104] virtiofsd: Trim down imported files
  2019-12-12 16:37 ` [PATCH 006/104] virtiofsd: Trim down imported files Dr. David Alan Gilbert (git)
@ 2020-01-03 12:02   ` Daniel P. Berrangé
  2020-01-21  9:58   ` Xiao Yang
  1 sibling, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-03 12:02 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:26PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> There's a lot of the original fuse code we don't need; trim it down.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/fuse.h                |   8 -
>  tools/virtiofsd/fuse_common.h         |   8 -
>  tools/virtiofsd/fuse_i.h              |  22 -
>  tools/virtiofsd/fuse_log.h            |   8 -
>  tools/virtiofsd/fuse_loop_mt.c        | 308 ------------
>  tools/virtiofsd/fuse_lowlevel.c       | 653 +-------------------------
>  tools/virtiofsd/fuse_lowlevel.h       |   8 -
>  tools/virtiofsd/fuse_opt.h            |   8 -
>  tools/virtiofsd/helper.c              | 138 ------
>  tools/virtiofsd/passthrough_helpers.h |  26 -
>  10 files changed, 5 insertions(+), 1182 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 007/104] virtiofsd: Format imported files to qemu style
  2019-12-12 16:37 ` [PATCH 007/104] virtiofsd: Format imported files to qemu style Dr. David Alan Gilbert (git)
@ 2020-01-03 12:04   ` Daniel P. Berrangé
  2020-01-09 12:21   ` Aleksandar Markovic
  1 sibling, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-03 12:04 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:27PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Mostly using a set like:
> 
> indent -nut -i 4 -nlp -br -cs -ce --no-space-after-function-call-names file
> clang-format -style=file -i -- file
> clang-tidy -fix-errors -checks=readability-braces-around-statements file
> clang-format -style=file -i -- file
> 
> With manual cleanups.
> 
> The .clang-format used is below.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Language:        Cpp
> AlignAfterOpenBracket: Align
> AlignConsecutiveAssignments: false # although we like it, it creates churn
> AlignConsecutiveDeclarations: false
> AlignEscapedNewlinesLeft: true
> AlignOperands:   true
> AlignTrailingComments: false # churn
> AllowAllParametersOfDeclarationOnNextLine: true
> AllowShortBlocksOnASingleLine: false
> AllowShortCaseLabelsOnASingleLine: false
> AllowShortFunctionsOnASingleLine: None
> AllowShortIfStatementsOnASingleLine: false
> AllowShortLoopsOnASingleLine: false
> AlwaysBreakAfterReturnType: None # AlwaysBreakAfterDefinitionReturnType is taken into account
> AlwaysBreakBeforeMultilineStrings: false
> BinPackArguments: true
> BinPackParameters: true
> BraceWrapping:
>   AfterControlStatement: false
>   AfterEnum:       false
>   AfterFunction:   true
>   AfterStruct:     false
>   AfterUnion:      false
>   BeforeElse:      false
>   IndentBraces:    false
> BreakBeforeBinaryOperators: None
> BreakBeforeBraces: Custom
> BreakBeforeTernaryOperators: false
> BreakStringLiterals: true
> ColumnLimit:     80
> ContinuationIndentWidth: 4
> Cpp11BracedListStyle: false
> DerivePointerAlignment: false
> DisableFormat:   false
> ForEachMacros:   [
>   'CPU_FOREACH',
>   'CPU_FOREACH_REVERSE',
>   'CPU_FOREACH_SAFE',
>   'IOMMU_NOTIFIER_FOREACH',
>   'QLIST_FOREACH',
>   'QLIST_FOREACH_ENTRY',
>   'QLIST_FOREACH_RCU',
>   'QLIST_FOREACH_SAFE',
>   'QLIST_FOREACH_SAFE_RCU',
>   'QSIMPLEQ_FOREACH',
>   'QSIMPLEQ_FOREACH_SAFE',
>   'QSLIST_FOREACH',
>   'QSLIST_FOREACH_SAFE',
>   'QTAILQ_FOREACH',
>   'QTAILQ_FOREACH_REVERSE',
>   'QTAILQ_FOREACH_SAFE',
>   'QTAILQ_RAW_FOREACH',
>   'RAMBLOCK_FOREACH'
> ]
> IncludeCategories:
>   - Regex:           '^"qemu/osdep.h'
>     Priority:        -3
>   - Regex:           '^"(block|chardev|crypto|disas|exec|fpu|hw|io|libdecnumber|migration|monitor|net|qapi|qemu|qom|standard-headers|sysemu|ui)/'
>     Priority:        -2
>   - Regex:           '^"(elf.h|qemu-common.h|glib-compat.h|qemu-io.h|trace-tcg.h)'
>     Priority:        -1
>   - Regex:           '.*'
>     Priority:        1
> IncludeIsMainRegex: '$'
> IndentCaseLabels: false
> IndentWidth:     4
> IndentWrappedFunctionNames: false
> KeepEmptyLinesAtTheStartOfBlocks: false
> MacroBlockBegin: '.*_BEGIN$' # only PREC_BEGIN ?
> MacroBlockEnd:   '.*_END$'
> MaxEmptyLinesToKeep: 2
> PointerAlignment: Right
> ReflowComments:  true
> SortIncludes:    true
> SpaceAfterCStyleCast: false
> SpaceBeforeAssignmentOperators: true
> SpaceBeforeParens: ControlStatements
> SpaceInEmptyParentheses: false
> SpacesBeforeTrailingComments: 1
> SpacesInContainerLiterals: true
> SpacesInParentheses: false
> SpacesInSquareBrackets: false
> Standard:        Auto
> UseTab:          Never
> ...
> ---
>  tools/virtiofsd/buffer.c              |  550 ++--
>  tools/virtiofsd/fuse.h                | 1572 +++++------
>  tools/virtiofsd/fuse_common.h         |  764 ++---
>  tools/virtiofsd/fuse_i.h              |  127 +-
>  tools/virtiofsd/fuse_log.c            |   38 +-
>  tools/virtiofsd/fuse_log.h            |   32 +-
>  tools/virtiofsd/fuse_loop_mt.c        |   66 +-
>  tools/virtiofsd/fuse_lowlevel.c       | 3678 +++++++++++++------------
>  tools/virtiofsd/fuse_lowlevel.h       | 2401 ++++++++--------
>  tools/virtiofsd/fuse_misc.h           |   30 +-
>  tools/virtiofsd/fuse_opt.c            |  659 ++---
>  tools/virtiofsd/fuse_opt.h            |   79 +-
>  tools/virtiofsd/fuse_signals.c        |  118 +-
>  tools/virtiofsd/helper.c              |  517 ++--
>  tools/virtiofsd/passthrough_helpers.h |   33 +-
>  tools/virtiofsd/passthrough_ll.c      | 2063 +++++++-------
>  16 files changed, 6530 insertions(+), 6197 deletions(-)


Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 008/104] virtiofsd: remove mountpoint dummy argument
  2019-12-12 16:37 ` [PATCH 008/104] virtiofsd: remove mountpoint dummy argument Dr. David Alan Gilbert (git)
@ 2020-01-03 12:12   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-03 12:12 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:28PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Classic FUSE file system daemons take a mountpoint argument but
> virtiofsd exposes a vhost-user UNIX domain socket instead.  The
> mountpoint argument is not used by virtiofsd but the user is still
> required to pass a dummy argument on the command-line.
> 
> Remove the mountpoint argument to clean up the command-line.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.c  |  2 +-
>  tools/virtiofsd/fuse_lowlevel.h  |  4 +---
>  tools/virtiofsd/helper.c         | 20 +++-----------------
>  tools/virtiofsd/passthrough_ll.c | 12 ++----------
>  4 files changed, 7 insertions(+), 31 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 009/104] virtiofsd: remove unused notify reply support
  2019-12-12 16:37 ` [PATCH 009/104] virtiofsd: remove unused notify reply support Dr. David Alan Gilbert (git)
@ 2020-01-03 12:14   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-03 12:14 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:29PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Notify reply support is unused by virtiofsd.  The code would need to be
> updated to validate input buffer sizes.  Remove this unused code since
> changes to it are untestable.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.c | 147 +-------------------------------
>  tools/virtiofsd/fuse_lowlevel.h |  47 ----------
>  2 files changed, 1 insertion(+), 193 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 005/104] virtiofsd: Add passthrough_ll
  2020-01-03 12:01   ` Daniel P. Berrangé
@ 2020-01-03 12:15     ` Dr. David Alan Gilbert
  2020-01-03 12:33       ` Daniel P. Berrangé
  0 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-03 12:15 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, stefanha, vgoyal

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Dec 12, 2019 at 04:37:25PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > passthrough_ll is one of the examples in the upstream fuse project
> > and is the main part of our daemon here.  It passes through requests
> > from fuse to the underlying filesystem, using syscalls as directly
> > as possible.
> > 
> > From libfuse fuse-3.8.0
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  tools/virtiofsd/passthrough_ll.c | 1338 ++++++++++++++++++++++++++++++
> >  1 file changed, 1338 insertions(+)
> >  create mode 100644 tools/virtiofsd/passthrough_ll.c
> > 
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > new file mode 100644
> > index 0000000000..5372d02934
> > --- /dev/null
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -0,0 +1,1338 @@
> > +/*
> > +  FUSE: Filesystem in Userspace
> > +  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
> > +
> > +  This program can be distributed under the terms of the GNU GPL.
> 
> I presume this mistake exists in upstream fuse GIT - missing GPL version
> number info here.

Yes it is, see:
https://github.com/libfuse/libfuse/blob/d735af94fa54a5555ce725f1d4e6b97b812b6603/example/passthrough_ll.c

And at the time that was added their COPYING file was GPL v2.

They've since renamed COPYING to GPL2.txt without updating the
comments, but they did add a LICENSE file:
https://github.com/libfuse/libfuse/commit/e8bcd8461ce7dfdc7366f44bad8d855696e73c3b

> This is important to correct since we're moving code
> from another repo and thus the COPYING file it is referring to on the
> next line is ambiguous to the casual reader.
> 
> > +  See the file COPYING.
> > +*/
> 
> With the GPL version info added:

So just change 'GNU GPL' to 'GNU GPL version 2' during the import?

>   Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

Thanks,

Dave

> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 010/104] virtiofsd: Fix fuse_daemonize ignored return values
  2019-12-12 16:37 ` [PATCH 010/104] virtiofsd: Fix fuse_daemonize ignored return values Dr. David Alan Gilbert (git)
@ 2020-01-03 12:18   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-03 12:18 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:30PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> QEMU's compiler enables warnings/errors for ignored values
> and the (void) trick used in the fuse code isn't enough.
> Turn all the return values into a return value on the function.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/helper.c | 33 ++++++++++++++++++++++-----------
>  1 file changed, 22 insertions(+), 11 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

> diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
> index d8c42401a7..8afccfc15e 100644
> --- a/tools/virtiofsd/helper.c
> +++ b/tools/virtiofsd/helper.c
> @@ -10,12 +10,10 @@
>   * See the file COPYING.LIB.
>   */
>  
> -#include "config.h"
>  #include "fuse_i.h"
>  #include "fuse_lowlevel.h"
>  #include "fuse_misc.h"
>  #include "fuse_opt.h"
> -#include "mount_util.h"
>  
>  #include <errno.h>
>  #include <limits.h>
> @@ -177,6 +175,7 @@ int fuse_parse_cmdline(struct fuse_args *args, struct fuse_cmdline_opts *opts)
>  
>  int fuse_daemonize(int foreground)

Yay, 4th implementation of "daemonize" logic in QEMU codebase :-)

One day in the future it would be a nice idea to have a helper for
this that we can share across the system emulator, qemu guest agent,
qemu-nbd and virtiofsd. Not a requirement for this initial merge.
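
The commit message quoted above says a (void) cast is not enough to
silence the ignored-result warning. As far as I know that matches GCC's
behaviour for functions marked warn_unused_result, which glibc applies
to calls such as write() and read() when fortification is enabled. A
minimal sketch of the alternative, checking each result and handing
failure back to the caller, might look like the following. The helper
names and the pipe-based completion byte are assumptions made for
illustration; they are not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): send a one-byte "setup
 * complete" message over a pipe.  The result of write() is checked and
 * passed back to the caller rather than dropped with a (void) cast.
 * Returns 0 on success, -1 on failure.
 */
int notify_completion(int fd)
{
    char completed = 1;
    ssize_t n;

    /* Retry if a signal interrupted the call before any data moved. */
    do {
        n = write(fd, &completed, sizeof(completed));
    } while (n == -1 && errno == EINTR);

    return n == (ssize_t)sizeof(completed) ? 0 : -1;
}

/*
 * Hypothetical counterpart: block until the completion byte arrives.
 * Returns 0 if the expected byte was read, -1 on error or early EOF.
 */
int wait_for_completion(int fd)
{
    char completed = 0;
    ssize_t n;

    do {
        n = read(fd, &completed, sizeof(completed));
    } while (n == -1 && errno == EINTR);

    return (n == (ssize_t)sizeof(completed) && completed == 1) ? 0 : -1;
}

/* Usage sketch: round-trip the completion byte through a pipe. */
int completion_roundtrip(void)
{
    int waiter[2];
    int rc;

    if (pipe(waiter) == -1) {
        return -1;
    }

    rc = notify_completion(waiter[1]);
    if (rc == 0) {
        rc = wait_for_completion(waiter[0]);
    }

    close(waiter[0]);
    close(waiter[1]);
    return rc;
}
```

Every helper returns an int, so a daemonize-style function built on
them can report failure through its own return value instead of
discarding it.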

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 011/104] virtiofsd: Fix common header and define for QEMU builds
  2019-12-12 16:37 ` [PATCH 011/104] virtiofsd: Fix common header and define for QEMU builds Dr. David Alan Gilbert (git)
@ 2020-01-03 12:22   ` Daniel P. Berrangé
  2020-01-06 16:29     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-03 12:22 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:31PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> All of the fuse files include config.h and define _GNU_SOURCE
> where we don't have either under our build - remove them.

There's a bunch of other random changes in this patch - were these
all fallout from removing the config.h header, or were they meant
for a separate commit ?

> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/buffer.c         |  3 ---
>  tools/virtiofsd/fuse_i.h         |  3 +++
>  tools/virtiofsd/fuse_loop_mt.c   |  3 +--
>  tools/virtiofsd/fuse_lowlevel.c  | 12 +-----------
>  tools/virtiofsd/fuse_opt.c       |  1 -
>  tools/virtiofsd/fuse_signals.c   |  1 -
>  tools/virtiofsd/passthrough_ll.c |  9 ++-------
>  7 files changed, 7 insertions(+), 25 deletions(-)
> 
> diff --git a/tools/virtiofsd/buffer.c b/tools/virtiofsd/buffer.c
> index 38521f5889..1d7e6d2439 100644
> --- a/tools/virtiofsd/buffer.c
> +++ b/tools/virtiofsd/buffer.c
> @@ -9,9 +9,6 @@
>   * See the file COPYING.LIB
>   */
>  
> -#define _GNU_SOURCE
> -
> -#include "config.h"
>  #include "fuse_i.h"
>  #include "fuse_lowlevel.h"
>  #include <assert.h>
> diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
> index 1119e85e57..0b5acc8765 100644
> --- a/tools/virtiofsd/fuse_i.h
> +++ b/tools/virtiofsd/fuse_i.h
> @@ -6,6 +6,9 @@
>   * See the file COPYING.LIB
>   */
>  
> +#define FUSE_USE_VERSION 31
> +
> +
>  #include "fuse.h"
>  #include "fuse_lowlevel.h"
>  
> diff --git a/tools/virtiofsd/fuse_loop_mt.c b/tools/virtiofsd/fuse_loop_mt.c
> index 39e080d9ff..00138b2ab3 100644
> --- a/tools/virtiofsd/fuse_loop_mt.c
> +++ b/tools/virtiofsd/fuse_loop_mt.c
> @@ -8,11 +8,10 @@
>   * See the file COPYING.LIB.
>   */
>  
> -#include "config.h"
>  #include "fuse_i.h"
> -#include "fuse_kernel.h"
>  #include "fuse_lowlevel.h"
>  #include "fuse_misc.h"
> +#include "standard-headers/linux/fuse.h"
>  
>  #include <assert.h>
>  #include <errno.h>
> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> index 0d7b2c3dc9..497eb25487 100644
> --- a/tools/virtiofsd/fuse_lowlevel.c
> +++ b/tools/virtiofsd/fuse_lowlevel.c
> @@ -9,14 +9,10 @@
>   * See the file COPYING.LIB
>   */
>  
> -#define _GNU_SOURCE
> -
> -#include "config.h"
>  #include "fuse_i.h"
> -#include "fuse_kernel.h"
> +#include "standard-headers/linux/fuse.h"
>  #include "fuse_misc.h"
>  #include "fuse_opt.h"
> -#include "mount_util.h"
>  
>  #include <assert.h>
>  #include <errno.h>
> @@ -2093,7 +2089,6 @@ static struct {
>      [FUSE_RENAME2] = { do_rename2, "RENAME2" },
>      [FUSE_COPY_FILE_RANGE] = { do_copy_file_range, "COPY_FILE_RANGE" },
>      [FUSE_LSEEK] = { do_lseek, "LSEEK" },
> -    [CUSE_INIT] = { cuse_lowlevel_init, "CUSE_INIT" },
>  };
>  
>  #define FUSE_MAXOP (sizeof(fuse_ll_ops) / sizeof(fuse_ll_ops[0]))
> @@ -2220,7 +2215,6 @@ void fuse_lowlevel_version(void)
>  {
>      printf("using FUSE kernel interface version %i.%i\n", FUSE_KERNEL_VERSION,
>             FUSE_KERNEL_MINOR_VERSION);
> -    fuse_mount_version();
>  }
>  
>  void fuse_lowlevel_help(void)
> @@ -2310,10 +2304,6 @@ struct fuse_session *fuse_session_new(struct fuse_args *args,
>          goto out4;
>      }
>  
> -    if (se->debug) {
> -        fuse_log(FUSE_LOG_DEBUG, "FUSE library version: %s\n", PACKAGE_VERSION);
> -    }
> -
>      se->bufsize = FUSE_MAX_MAX_PAGES * getpagesize() + FUSE_BUFFER_HEADER_SIZE;
>  
>      list_init_req(&se->list);
> diff --git a/tools/virtiofsd/fuse_opt.c b/tools/virtiofsd/fuse_opt.c
> index edd36f4a3b..1fee55e266 100644
> --- a/tools/virtiofsd/fuse_opt.c
> +++ b/tools/virtiofsd/fuse_opt.c
> @@ -10,7 +10,6 @@
>   */
>  
>  #include "fuse_opt.h"
> -#include "config.h"
>  #include "fuse_i.h"
>  #include "fuse_misc.h"
>  
> diff --git a/tools/virtiofsd/fuse_signals.c b/tools/virtiofsd/fuse_signals.c
> index 19d6791cb9..10a6f88088 100644
> --- a/tools/virtiofsd/fuse_signals.c
> +++ b/tools/virtiofsd/fuse_signals.c
> @@ -8,7 +8,6 @@
>   * See the file COPYING.LIB
>   */
>  
> -#include "config.h"
>  #include "fuse_i.h"
>  #include "fuse_lowlevel.h"
>  
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index a79ec2c70d..0e543353a4 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -35,15 +35,10 @@
>   * \include passthrough_ll.c
>   */
>  
> -#define _GNU_SOURCE
> -#define FUSE_USE_VERSION 31
> -
> -#include "config.h"
> -
> +#include "fuse_lowlevel.h"
>  #include <assert.h>
>  #include <dirent.h>
>  #include <errno.h>
> -#include <fuse_lowlevel.h>
>  #include <inttypes.h>
>  #include <limits.h>
>  #include <pthread.h>
> @@ -58,6 +53,7 @@
>  
>  #include "passthrough_helpers.h"
>  
> +#define HAVE_POSIX_FALLOCATE 1
>  /*
>   * We are re-using pointers to our `struct lo_inode` and `struct
>   * lo_dirp` elements as inodes. This means that we must be able to
> @@ -1303,7 +1299,6 @@ int main(int argc, char *argv[])
>          ret = 0;
>          goto err_out1;
>      } else if (opts.show_version) {
> -        printf("FUSE library version %s\n", fuse_pkgversion());
>          fuse_lowlevel_version();
>          ret = 0;
>          goto err_out1;
> -- 
> 2.23.0
> 
> 

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 012/104] virtiofsd: Trim out compatibility code
  2019-12-12 16:37 ` [PATCH 012/104] virtiofsd: Trim out compatibility code Dr. David Alan Gilbert (git)
@ 2020-01-03 12:26   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-03 12:26 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:32PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> virtiofsd only supports major=7, minor>=31; trim out a lot of
> old compatibility code.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.c | 330 ++++++++++++--------------------
>  1 file changed, 119 insertions(+), 211 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel

* Re: [PATCH 005/104] virtiofsd: Add passthrough_ll
  2020-01-03 12:15     ` Dr. David Alan Gilbert
@ 2020-01-03 12:33       ` Daniel P. Berrangé
  2020-01-03 14:37         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-03 12:33 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: qemu-devel, stefanha, vgoyal

On Fri, Jan 03, 2020 at 12:15:35PM +0000, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > On Thu, Dec 12, 2019 at 04:37:25PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > passthrough_ll is one of the examples in the upstream fuse project
> > > and is the main part of our daemon here.  It passes through requests
> > > from fuse to the underlying filesystem, using syscalls as directly
> > > as possible.
> > > 
> > > From libfuse fuse-3.8.0
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > ---
> > >  tools/virtiofsd/passthrough_ll.c | 1338 ++++++++++++++++++++++++++++++
> > >  1 file changed, 1338 insertions(+)
> > >  create mode 100644 tools/virtiofsd/passthrough_ll.c
> > > 
> > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > new file mode 100644
> > > index 0000000000..5372d02934
> > > --- /dev/null
> > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > @@ -0,0 +1,1338 @@
> > > +/*
> > > +  FUSE: Filesystem in Userspace
> > > +  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
> > > +
> > > +  This program can be distributed under the terms of the GNU GPL.
> > 
> > I presume this mistake exists in upstream fuse GIT - missing GPL version
> > number info here.
> 
> Yes it is, see:
> https://github.com/libfuse/libfuse/blob/d735af94fa54a5555ce725f1d4e6b97b812b6603/example/passthrough_ll.c
> 
> And at the time that was added their COPYING file was GPL v2.
> 
> although they've since renamed COPYING to GPL2.txt but not updated
> the comments; but they added a LICENSE file.
> https://github.com/libfuse/libfuse/commit/e8bcd8461ce7dfdc7366f44bad8d855696e73c3b
> 
> > This is important to correct since we're moving code
> > from another repo and thus the COPYING file it is referring to on the
> > next line is ambiguous to the casual reader.
> > 
> > > +  See the file COPYING.
> > > +*/
> > 
> > With the GPL version info added:
> 
> So just change 'GNU GPL' to 'GNU GPL version 2' during the import?

Yes, or just  "GPLv2" is fine too. Ideally submit a patch to libfuse
upstream & then just copy the result.

> 
> >   Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
> 

Regards,
Daniel

* Re: [PATCH 005/104] virtiofsd: Add passthrough_ll
  2020-01-03 12:33       ` Daniel P. Berrangé
@ 2020-01-03 14:37         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-03 14:37 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, stefanha, vgoyal

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Fri, Jan 03, 2020 at 12:15:35PM +0000, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > On Thu, Dec 12, 2019 at 04:37:25PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > 
> > > > passthrough_ll is one of the examples in the upstream fuse project
> > > > and is the main part of our daemon here.  It passes through requests
> > > > from fuse to the underlying filesystem, using syscalls as directly
> > > > as possible.
> > > > 
> > > > From libfuse fuse-3.8.0
> > > > 
> > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > ---
> > > >  tools/virtiofsd/passthrough_ll.c | 1338 ++++++++++++++++++++++++++++++
> > > >  1 file changed, 1338 insertions(+)
> > > >  create mode 100644 tools/virtiofsd/passthrough_ll.c
> > > > 
> > > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > > new file mode 100644
> > > > index 0000000000..5372d02934
> > > > --- /dev/null
> > > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > > @@ -0,0 +1,1338 @@
> > > > +/*
> > > > +  FUSE: Filesystem in Userspace
> > > > +  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
> > > > +
> > > > +  This program can be distributed under the terms of the GNU GPL.
> > > 
> > > I presume this mistake exists in upstream fuse GIT - missing GPL version
> > > number info here.
> > 
> > Yes it is, see:
> > https://github.com/libfuse/libfuse/blob/d735af94fa54a5555ce725f1d4e6b97b812b6603/example/passthrough_ll.c
> > 
> > And at the time that was added their COPYING file was GPL v2.
> > 
> > although they've since renamed COPYING to GPL2.txt but not updated
> > the comments; but they added a LICENSE file.
> > https://github.com/libfuse/libfuse/commit/e8bcd8461ce7dfdc7366f44bad8d855696e73c3b
> > 
> > > This is important to correct since we're moving code
> > > from another repo and thus the COPYING file it is referring to on the
> > > next line is ambiguous to the casual reader.
> > > 
> > > > +  See the file COPYING.
> > > > +*/
> > > 
> > > With the GPL version info added:
> > 
> > So just change 'GNU GPL' to 'GNU GPL version 2' during the import?
> 
> Yes, or just  "GPLv2" is fine too. Ideally submit a patch to libfuse
> upstream & then just copy the result.

libfuse pull request created:
  https://github.com/libfuse/libfuse/pull/485

> > 
> > >   Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
> > 
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 013/104] virtiofsd: Make fsync work even if only inode is passed in
  2019-12-12 16:37 ` [PATCH 013/104] virtiofsd: Make fsync work even if only inode is passed in Dr. David Alan Gilbert (git)
@ 2020-01-03 15:13   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-03 15:13 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:33PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Vivek Goyal <vgoyal@redhat.com>
> 
> If the caller has not sent a file handle in the request, use the inode to
> retrieve the fd that was opened with O_PATH, reopen the file through it, and
> issue fsync on the result. This is needed when dax_flush() calls fsync, since
> at that point we only have the inode information (and not the file).
> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.c  |  6 +++++-
>  tools/virtiofsd/passthrough_ll.c | 28 ++++++++++++++++++++++++++--
>  2 files changed, 31 insertions(+), 3 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
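
For anyone reading along, the reopen-then-fsync idea in the commit message
can be sketched roughly as below. This is a minimal illustration, assuming
the O_PATH fd is reopened through /proc/self/fd; the helper name
fsync_via_path_fd is made up here and this is not the code from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): sync a file for which we
 * only hold a descriptor that cannot be handed to fsync() directly,
 * e.g. one opened with O_PATH.  Returns 0 or a negative errno value.
 */
static int fsync_via_path_fd(int path_fd, int datasync)
{
    char procname[64];
    int n, fd, ret, saved;

    /* Build the magic-link path that refers back to the same file. */
    n = snprintf(procname, sizeof(procname), "/proc/self/fd/%d", path_fd);
    assert(n > 0 && (size_t)n < sizeof(procname));

    /* Reopen to obtain a descriptor that fsync() will accept. */
    fd = open(procname, O_RDWR);
    if (fd == -1) {
        return -errno;
    }

    ret = datasync ? fdatasync(fd) : fsync(fd);
    saved = (ret == -1) ? errno : 0;

    /* The temporary descriptor is only needed for the sync itself. */
    close(fd);
    return -saved;
}
```

The reopen costs an extra open()/close() pair per call, so a caller that
already has a real file handle would presumably keep using that directly.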


Regards,
Daniel

* Re: [PATCH 014/104] virtiofsd: Add options for virtio
  2019-12-12 16:37 ` [PATCH 014/104] virtiofsd: Add options for virtio Dr. David Alan Gilbert (git)
@ 2020-01-03 15:18   ` Daniel P. Berrangé
  2020-01-10 16:01     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-03 15:18 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:34PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Add options to specify parameters for virtio-fs paths, i.e.
> 
>    ./virtiofsd -o vhost_user_socket=/tmp/vhostqemu
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/fuse_i.h        |  1 +
>  tools/virtiofsd/fuse_lowlevel.c | 17 ++++++++++++-----
>  tools/virtiofsd/helper.c        | 22 +++++++++++-----------
>  3 files changed, 24 insertions(+), 16 deletions(-)
> 
> diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
> index 0b5acc8765..f58be71e4b 100644
> --- a/tools/virtiofsd/fuse_i.h
> +++ b/tools/virtiofsd/fuse_i.h
> @@ -63,6 +63,7 @@ struct fuse_session {
>      struct fuse_notify_req notify_list;
>      size_t bufsize;
>      int error;
> +    char *vu_socket_path;
>  };
>  
>  struct fuse_chan {
> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> index 167701b453..da708161e1 100644
> --- a/tools/virtiofsd/fuse_lowlevel.c
> +++ b/tools/virtiofsd/fuse_lowlevel.c
> @@ -2118,8 +2118,12 @@ reply_err:
>      }
>  
>  static const struct fuse_opt fuse_ll_opts[] = {
> -    LL_OPTION("debug", debug, 1), LL_OPTION("-d", debug, 1),
> -    LL_OPTION("--debug", debug, 1), LL_OPTION("allow_root", deny_others, 1),
> +    LL_OPTION("debug", debug, 1),
> +    LL_OPTION("-d", debug, 1),
> +    LL_OPTION("--debug", debug, 1),

Pre-existing, but I'm not convinced we really need 3 different
ways to enable debugging - I would think -d / --debug is sufficient,
without needing "-o debug".


> +    LL_OPTION("allow_root", deny_others, 1),
> +    LL_OPTION("--socket-path=%s", vu_socket_path, 0),
> +    LL_OPTION("vhost_user_socket=%s", vu_socket_path, 0),

Similarly here I'm not convinced we need to add both
"--socket-path PATH" and "-o vhost_user_socket=PATH"


IIRC, we need --socket-path for compliance with QEMU's
standard execution model for vhost helpers.

>      FUSE_OPT_END
>  };
>  
> @@ -2135,9 +2139,12 @@ void fuse_lowlevel_help(void)
>       * These are not all options, but the ones that are
>       * potentially of interest to an end-user
>       */
> -    printf("    -o allow_other         allow access by all users\n"
> -           "    -o allow_root          allow access by root\n"
> -           "    -o auto_unmount        auto unmount on process termination\n");
> +    printf(
> +        "    -o allow_other             allow access by all users\n"
> +        "    -o allow_root              allow access by root\n"
> +        "    --socket-path=PATH         path for the vhost-user socket\n"
> +        "    -o vhost_user_socket=PATH  path for the vhost-user socket\n"
> +        "    -o auto_unmount            auto unmount on process termination\n");
>  }
>  
>  void fuse_session_destroy(struct fuse_session *se)
> diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
> index 8afccfc15e..48e38a7963 100644
> --- a/tools/virtiofsd/helper.c
> +++ b/tools/virtiofsd/helper.c
> @@ -128,17 +128,17 @@ static const struct fuse_opt conn_info_opt_spec[] = {
>  
>  void fuse_cmdline_help(void)
>  {
> -    printf(
> -        "    -h   --help            print help\n"
> -        "    -V   --version         print version\n"
> -        "    -d   -o debug          enable debug output (implies -f)\n"
> -        "    -f                     foreground operation\n"
> -        "    -s                     disable multi-threaded operation\n"
> -        "    -o clone_fd            use separate fuse device fd for each "
> -        "thread\n"
> -        "                           (may improve performance)\n"
> -        "    -o max_idle_threads    the maximum number of idle worker threads\n"
> -        "                           allowed (default: 10)\n");
> +    printf("    -h   --help                print help\n"
> +           "    -V   --version             print version\n"
> +           "    -d   -o debug              enable debug output (implies -f)\n"
> +           "    -f                         foreground operation\n"
> +           "    -s                         disable multi-threaded operation\n"
> +           "    -o clone_fd                use separate fuse device fd for "
> +           "each thread\n"
> +           "                               (may improve performance)\n"
> +           "    -o max_idle_threads        the maximum number of idle worker "
> +           "threads\n"
> +           "                               allowed (default: 10)\n");
>  }
>  
>  static int fuse_helper_opt_proc(void *data, const char *arg, int key,
> -- 
> 2.23.0
> 
> 
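
To illustrate the point about the two spellings: both forms end up writing
the same field, which is why one of them looks redundant. A rough standalone
sketch follows. It uses plain argv scanning rather than libfuse's fuse_opt
machinery, and struct sketch_opts and parse_socket_path are made-up names,
not anything in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the one field both options fill in. */
struct sketch_opts {
    const char *vu_socket_path;
};

/*
 * Scan argv for "--socket-path=PATH" and "-o vhost_user_socket=PATH".
 * Both store into the same field; the last one seen wins.
 * Returns how many arguments set the socket path.
 */
static int parse_socket_path(int argc, char **argv, struct sketch_opts *o)
{
    static const char long_opt[] = "--socket-path=";
    static const char o_opt[] = "vhost_user_socket=";
    int hits = 0;

    assert(o != NULL);
    for (int i = 1; i < argc; i++) {
        const char *val = NULL;

        if (strncmp(argv[i], long_opt, sizeof(long_opt) - 1) == 0) {
            val = argv[i] + sizeof(long_opt) - 1;
        } else if (strcmp(argv[i], "-o") == 0 && i + 1 < argc &&
                   strncmp(argv[i + 1], o_opt, sizeof(o_opt) - 1) == 0) {
            val = argv[i + 1] + sizeof(o_opt) - 1;
            i++; /* consume the "-o" argument as well */
        }
        if (val) {
            o->vu_socket_path = val;
            hits++;
        }
    }
    return hits;
}
```

With this shape, dropping one spelling later is a one-branch change, since
nothing downstream can tell which form was used.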

Regards,
Daniel

* Re: [PATCH 015/104] virtiofsd: add -o source=PATH to help output
  2019-12-12 16:37 ` [PATCH 015/104] virtiofsd: add -o source=PATH to help output Dr. David Alan Gilbert (git)
@ 2020-01-03 15:18   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-03 15:18 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:35PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> The -o source=PATH option will be used by most command-line invocations.
> Let's document it!
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel

* Re: [PATCH 016/104] virtiofsd: Open vhost connection instead of mounting
  2019-12-12 16:37 ` [PATCH 016/104] virtiofsd: Open vhost connection instead of mounting Dr. David Alan Gilbert (git)
@ 2020-01-03 15:21   ` Daniel P. Berrangé
  2020-01-21  6:57   ` Misono Tomohiro
  1 sibling, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-03 15:21 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:36PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> When run with vhost-user options we connect to QEMU via a socket instead
> of mounting.  Start this off by creating the socket.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/fuse_i.h        |  7 ++-
>  tools/virtiofsd/fuse_lowlevel.c | 55 +++-------------------
>  tools/virtiofsd/fuse_virtio.c   | 83 +++++++++++++++++++++++++++++++++
>  tools/virtiofsd/fuse_virtio.h   | 23 +++++++++
>  4 files changed, 118 insertions(+), 50 deletions(-)
>  create mode 100644 tools/virtiofsd/fuse_virtio.c
>  create mode 100644 tools/virtiofsd/fuse_virtio.h

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel

* Re: [PATCH 017/104] virtiofsd: Start wiring up vhost-user
  2019-12-12 16:37 ` [PATCH 017/104] virtiofsd: Start wiring up vhost-user Dr. David Alan Gilbert (git)
@ 2020-01-03 15:25   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-03 15:25 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:37PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Listen on our unix socket for the connection from QEMU; when we get it,
> initialise vhost-user and dive into our own loop variant (currently
> a dummy).
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/fuse_i.h         |  4 ++
>  tools/virtiofsd/fuse_loop_mt.c   |  1 +
>  tools/virtiofsd/fuse_lowlevel.c  |  5 ++
>  tools/virtiofsd/fuse_lowlevel.h  |  7 +++
>  tools/virtiofsd/fuse_virtio.c    | 87 +++++++++++++++++++++++++++++++-
>  tools/virtiofsd/fuse_virtio.h    |  2 +
>  tools/virtiofsd/passthrough_ll.c |  7 +--
>  7 files changed, 107 insertions(+), 6 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
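
As a rough picture of the "create the socket, then listen for QEMU" step
described in this patch and the previous one, here is a minimal sketch using
only the standard sockets API. The function name listen_unix and the backlog
of 1 are assumptions for illustration; the actual patch may differ.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a UNIX stream socket bound to 'path' and
 * put it into the listening state.  Returns the listening fd, or a
 * negative errno value on failure.
 */
static int listen_unix(const char *path)
{
    struct sockaddr_un un;
    int fd, saved;

    assert(path != NULL);
    if (strlen(path) >= sizeof(un.sun_path)) {
        return -ENAMETOOLONG;
    }

    memset(&un, 0, sizeof(un));
    un.sun_family = AF_UNIX;
    strcpy(un.sun_path, path); /* length checked above */

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1) {
        return -errno;
    }

    /* Remove a stale socket file left over from a previous run. */
    unlink(path);

    if (bind(fd, (struct sockaddr *)&un, sizeof(un)) == -1 ||
        listen(fd, 1) == -1) {
        saved = errno;
        close(fd);
        return -saved;
    }
    return fd;
}
```

A daemon built this way would then accept() a single connection on the
returned fd and pass the connected descriptor on to its vhost-user setup.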


Regards,
Daniel

* Re: [PATCH 018/104] virtiofsd: Add main virtio loop
  2019-12-12 16:37 ` [PATCH 018/104] virtiofsd: Add main virtio loop Dr. David Alan Gilbert (git)
@ 2020-01-03 15:26   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-03 15:26 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:38PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Processes incoming requests on the vhost-user fd.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/fuse_virtio.c | 42 ++++++++++++++++++++++++++++++++---
>  1 file changed, 39 insertions(+), 3 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel

* Re: [PATCH 019/104] virtiofsd: get/set features callbacks
  2019-12-12 16:37 ` [PATCH 019/104] virtiofsd: get/set features callbacks Dr. David Alan Gilbert (git)
@ 2020-01-03 15:26   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-03 15:26 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:39PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Add the get/set features callbacks.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/fuse_virtio.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel

* Re: [PATCH 020/104] virtiofsd: Start queue threads
  2019-12-12 16:37 ` [PATCH 020/104] virtiofsd: Start queue threads Dr. David Alan Gilbert (git)
@ 2020-01-03 15:27   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-03 15:27 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:40PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Start a thread for each queue when we get notified it's been started.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> fix by:
> Signed-off-by: Jun Piao <piaojun@huawei.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/fuse_virtio.c | 89 +++++++++++++++++++++++++++++++++++
>  1 file changed, 89 insertions(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel

* Re: [PATCH 021/104] virtiofsd: Poll kick_fd for queue
  2019-12-12 16:37 ` [PATCH 021/104] virtiofsd: Poll kick_fd for queue Dr. David Alan Gilbert (git)
@ 2020-01-03 15:33   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-03 15:33 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:41PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> In the queue thread poll the kick_fd we're passed.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/fuse_virtio.c | 40 ++++++++++++++++++++++++++++++++++-
>  1 file changed, 39 insertions(+), 1 deletion(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel

* Re: [PATCH 022/104] virtiofsd: Start reading commands from queue
  2019-12-12 16:37 ` [PATCH 022/104] virtiofsd: Start reading commands from queue Dr. David Alan Gilbert (git)
@ 2020-01-03 15:34   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-03 15:34 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:42PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Pop queue elements off queues, copy the data from them and
> pass that to fuse.
> 
>   Note: 'out' in a VuVirtqElement is from QEMU
>         'in' in libfuse is into the daemon
> 
>   So we read from the out iov's to get a fuse_in_header
> 
> When we get a kick we've got to read all the elements until the queue
> is empty.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/fuse_i.h      |   2 +
>  tools/virtiofsd/fuse_virtio.c | 100 +++++++++++++++++++++++++++++++++-
>  2 files changed, 99 insertions(+), 3 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
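
The "copy the data from them" step amounts to flattening a scatter list
into one buffer before handing it to fuse. A minimal sketch of that, with
copy_from_iov as a made-up name and no claim that the patch does it exactly
this way:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: copy at most dst_len bytes from the iovec array
 * into one contiguous buffer.  Returns the number of bytes copied,
 * which is less than dst_len if the iovecs hold less data.
 */
static size_t copy_from_iov(const struct iovec *iov, unsigned int iovcnt,
                            void *dst, size_t dst_len)
{
    size_t copied = 0;

    assert(iov != NULL || iovcnt == 0);
    for (unsigned int i = 0; i < iovcnt && copied < dst_len; i++) {
        size_t n = iov[i].iov_len;

        /* Never write past the end of the destination buffer. */
        if (n > dst_len - copied) {
            n = dst_len - copied;
        }
        memcpy((char *)dst + copied, iov[i].iov_base, n);
        copied += n;
    }
    return copied;
}
```

In the naming from the commit message, the source here would be the 'out'
iovecs of the queue element and the destination would start with the
fuse_in_header, so a caller should check the returned length against the
header size before trusting any field.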


Regards,
Daniel

* Re: [PATCH 023/104] virtiofsd: Send replies to messages
  2019-12-12 16:37 ` [PATCH 023/104] virtiofsd: Send replies to messages Dr. David Alan Gilbert (git)
@ 2020-01-03 15:36   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-03 15:36 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:43PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Route fuse out messages back through the same queue elements
> that had the command that triggered the request.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.c |   4 ++
>  tools/virtiofsd/fuse_virtio.c   | 107 ++++++++++++++++++++++++++++++--
>  tools/virtiofsd/fuse_virtio.h   |   4 ++
>  3 files changed, 111 insertions(+), 4 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel

* Re: [PATCH 024/104] virtiofsd: Keep track of replies
  2019-12-12 16:37 ` [PATCH 024/104] virtiofsd: Keep track of replies Dr. David Alan Gilbert (git)
@ 2020-01-03 15:41   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-03 15:41 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:44PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Keep track of whether we sent a reply to a request; this is a bit
> paranoid but it means:
>   a) We should always recycle an element even if there was an error
>      in the request
>   b) Never try and send two replies on one queue element
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/fuse_virtio.c | 23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
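
The two guarantees listed in the commit message can be pictured with a
single flag per request. The sketch below is illustrative only: the struct,
the function names and the choice of -EALREADY are all assumptions, and the
queue push is replaced by a counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-request state; 'pushes' stands in for the queue. */
struct sketch_req {
    bool reply_sent;
    int pushes;
};

/* Stand-in for returning the element to the virtqueue. */
static void sketch_push(struct sketch_req *r)
{
    r->pushes++;
}

/* (b) Refuse a second reply on the same element. */
static int sketch_send_reply(struct sketch_req *r)
{
    assert(r != NULL);
    if (r->reply_sent) {
        return -EALREADY;
    }
    r->reply_sent = true;
    sketch_push(r);
    return 0;
}

/* (a) Recycle the element even if the handler never replied. */
static void sketch_finish(struct sketch_req *r)
{
    assert(r != NULL);
    if (!r->reply_sent) {
        r->reply_sent = true;
        sketch_push(r);
    }
}
```

Calling sketch_finish() unconditionally at the end of request handling
means every element goes back to the queue exactly once, whichever path the
handler took.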


Regards,
Daniel

* Re: [PATCH 025/104] virtiofsd: Add Makefile wiring for virtiofsd contrib
  2019-12-12 16:37 ` [PATCH 025/104] virtiofsd: Add Makefile wiring for virtiofsd contrib Dr. David Alan Gilbert (git)
  2019-12-13 16:02   ` Liam Merwick
@ 2020-01-03 15:41   ` Daniel P. Berrangé
  1 sibling, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-03 15:41 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:45PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Wire up the building of the virtiofsd in contrib.
> 
> virtiofsd relies on Linux-specific system calls and seccomp.  Anyone
> wishing to port it to other host operating systems should do so
> carefully and without reducing security.
> 
> Only allow building on Linux hosts.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  Makefile                      |  8 ++++++++
>  Makefile.objs                 |  1 +
>  tools/virtiofsd/Makefile.objs | 10 ++++++++++
>  3 files changed, 19 insertions(+)
>  create mode 100644 tools/virtiofsd/Makefile.objs

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

Regards,
Daniel

* Re: [PATCH 027/104] virtiofsd: add --fd=FDNUM fd passing option
  2019-12-12 16:37 ` [PATCH 027/104] virtiofsd: add --fd=FDNUM fd passing option Dr. David Alan Gilbert (git)
@ 2020-01-06 14:12   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-06 14:12 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:47PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Although --socket-path=PATH is useful for manual invocations, management
> tools typically create the UNIX domain socket themselves and pass it to
> the vhost-user device backend.  This way QEMU can be launched
> immediately with a valid socket.  No waiting for the vhost-user device
> backend is required when fd passing is used.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/fuse_i.h        |  1 +
>  tools/virtiofsd/fuse_lowlevel.c | 14 +++++++++---
>  tools/virtiofsd/fuse_virtio.c   | 39 ++++++++++++++++++++++++---------
>  3 files changed, 41 insertions(+), 13 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel

* Re: [PATCH 028/104] virtiofsd: make -f (foreground) the default
  2019-12-12 16:37 ` [PATCH 028/104] virtiofsd: make -f (foreground) the default Dr. David Alan Gilbert (git)
@ 2020-01-06 14:19   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-06 14:19 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:48PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> According to vhost-user.rst "Backend program conventions", backend
> programs should run in the foreground by default.  Follow the
> conventions so libvirt and other management tools can control virtiofsd
> in a standard way.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/helper.c | 8 ++++++++
>  1 file changed, 8 insertions(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


> 
> diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
> index 48e38a7963..d4fff4fa53 100644
> --- a/tools/virtiofsd/helper.c
> +++ b/tools/virtiofsd/helper.c
> @@ -28,6 +28,11 @@
>      {                                               \
>          t, offsetof(struct fuse_cmdline_opts, p), 1 \
>      }
> +#define FUSE_HELPER_OPT_VALUE(t, p, v)              \
> +    {                                               \
> +        t, offsetof(struct fuse_cmdline_opts, p), v \
> +    }
> +
>  
>  static const struct fuse_opt fuse_helper_opts[] = {
>      FUSE_HELPER_OPT("-h", show_help),
> @@ -41,6 +46,7 @@ static const struct fuse_opt fuse_helper_opts[] = {
>      FUSE_OPT_KEY("-d", FUSE_OPT_KEY_KEEP),
>      FUSE_OPT_KEY("debug", FUSE_OPT_KEY_KEEP),
>      FUSE_HELPER_OPT("-f", foreground),
> +    FUSE_HELPER_OPT_VALUE("--daemonize", foreground, 0),
>      FUSE_HELPER_OPT("-s", singlethread),
>      FUSE_HELPER_OPT("fsname=", nodefault_subtype),
>      FUSE_OPT_KEY("fsname=", FUSE_OPT_KEY_KEEP),
> @@ -132,6 +138,7 @@ void fuse_cmdline_help(void)
>             "    -V   --version             print version\n"
>             "    -d   -o debug              enable debug output (implies -f)\n"
>             "    -f                         foreground operation\n"
> +           "    --daemonize                run in background\n"

I wonder if it makes sense to document '--daemonize' as a standard
option for backend programs, even if QEMU/libvirt don't need it.


Regards,
Daniel

* Re: [PATCH 029/104] virtiofsd: add vhost-user.json file
  2019-12-12 16:37 ` [PATCH 029/104] virtiofsd: add vhost-user.json file Dr. David Alan Gilbert (git)
@ 2020-01-06 14:19   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-06 14:19 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:49PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Install a vhost-user.json file describing virtiofsd.  This allows
> libvirt and other management tools to enumerate vhost-user backend
> programs.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  .gitignore                                | 1 +
>  Makefile                                  | 1 +
>  tools/virtiofsd/50-qemu-virtiofsd.json.in | 5 +++++
>  3 files changed, 7 insertions(+)
>  create mode 100644 tools/virtiofsd/50-qemu-virtiofsd.json.in

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
 

Regards,
Daniel

* Re: [PATCH 030/104] virtiofsd: add --print-capabilities option
  2019-12-12 16:37 ` [PATCH 030/104] virtiofsd: add --print-capabilities option Dr. David Alan Gilbert (git)
@ 2020-01-06 14:20   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-06 14:20 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:50PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Add the --print-capabilities option as per vhost-user.rst "Backend
> programs conventions".  Currently there are no advertised features.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  docs/interop/vhost-user.json     |  4 +++-
>  tools/virtiofsd/fuse_lowlevel.h  |  1 +
>  tools/virtiofsd/helper.c         |  2 ++
>  tools/virtiofsd/passthrough_ll.c | 12 ++++++++++++
>  4 files changed, 18 insertions(+), 1 deletion(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel

* Re: [PATCH 031/104] virtiofs: Add maintainers entry
  2019-12-12 16:37 ` [PATCH 031/104] virtiofs: Add maintainers entry Dr. David Alan Gilbert (git)
@ 2020-01-06 14:21   ` Daniel P. Berrangé
  2020-01-15 17:19   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-06 14:21 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:51PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  MAINTAINERS | 8 ++++++++
>  1 file changed, 8 insertions(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel

* Re: [PATCH 032/104] virtiofsd: passthrough_ll: create new files in caller's context
  2019-12-12 16:37 ` [PATCH 032/104] virtiofsd: passthrough_ll: create new files in caller's context Dr. David Alan Gilbert (git)
@ 2020-01-06 14:30   ` Daniel P. Berrangé
  2020-01-06 19:00     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-06 14:30 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:52PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Vivek Goyal <vgoyal@redhat.com>
> 
> We need to create files in the caller's context. Otherwise after
> creating a file, the caller might not be able to do file operations on
> that file.
> 
> Changed effective uid/gid to caller's uid/gid, create file and then
> switch back to uid/gid 0.
> 
> Use syscall(setresuid, ...) otherwise glibc does some magic to change EUID
> in all threads, which is not what we want.
> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 79 ++++++++++++++++++++++++++++++--
>  1 file changed, 74 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 68bacb6fc5..0188cd9ad6 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c


> +static int lo_change_cred(fuse_req_t req, struct lo_cred *old)
> +{
> +    int res;
> +
> +    old->euid = geteuid();
> +    old->egid = getegid();
> +
> +    res = syscall(SYS_setresgid, -1, fuse_req_ctx(req)->gid, -1);

Do we need to be using SYS_setres[u,g]id32 instead?

[quote setresgid(2)]
       The original Linux setresuid() and setresgid() system calls
       supported only 16-bit user and group IDs.  Subsequently,
       Linux 2.4 added setresuid32() and setresgid32(), supporting
       32-bit IDs.  The glibc setresuid() and setresgid() wrapper
       functions transparently deal with the variations across
       kernel versions.
[/quote]

> +    if (res == -1) {
> +        return errno;
> +    }
> +
> +    res = syscall(SYS_setresuid, -1, fuse_req_ctx(req)->uid, -1);
> +    if (res == -1) {
> +        int errno_save = errno;
> +
> +        syscall(SYS_setresgid, -1, old->egid, -1);
> +        return errno_save;
> +    }
> +
> +    return 0;
> +}
> +
> +/* Regain Privileges */
> +static void lo_restore_cred(struct lo_cred *old)
> +{
> +    int res;
> +
> +    res = syscall(SYS_setresuid, -1, old->euid, -1);
> +    if (res == -1) {
> +        fuse_log(FUSE_LOG_ERR, "seteuid(%u): %m\n", old->euid);
> +        exit(1);
> +    }
> +
> +    res = syscall(SYS_setresgid, -1, old->egid, -1);
> +    if (res == -1) {
> +        fuse_log(FUSE_LOG_ERR, "setegid(%u): %m\n", old->egid);
> +        exit(1);
> +    }
> +}
> +
>  static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
>                               const char *name, mode_t mode, dev_t rdev,
>                               const char *link)
> @@ -391,12 +443,21 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
>      int saverr;
>      struct lo_inode *dir = lo_inode(req, parent);
>      struct fuse_entry_param e;
> +    struct lo_cred old = {};
>  
>      saverr = ENOMEM;
>  
> +    saverr = lo_change_cred(req, &old);
> +    if (saverr) {
> +        goto out;
> +    }
> +
>      res = mknod_wrapper(dir->fd, name, link, mode, rdev);
>  
>      saverr = errno;
> +
> +    lo_restore_cred(&old);
> +
>      if (res == -1) {
>          goto out;
>      }
> @@ -794,26 +855,34 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
>      struct lo_data *lo = lo_data(req);
>      struct fuse_entry_param e;
>      int err;
> +    struct lo_cred old = {};
>  
>      if (lo_debug(req)) {
>          fuse_log(FUSE_LOG_DEBUG, "lo_create(parent=%" PRIu64 ", name=%s)\n",
>                   parent, name);
>      }
>  
> +    err = lo_change_cred(req, &old);
> +    if (err) {
> +        goto out;
> +    }
> +
>      fd = openat(lo_fd(req, parent), name, (fi->flags | O_CREAT) & ~O_NOFOLLOW,
>                  mode);
> -    if (fd == -1) {
> -        return (void)fuse_reply_err(req, errno);
> -    }
> +    err = fd == -1 ? errno : 0;
> +    lo_restore_cred(&old);
>  
> -    fi->fh = fd;
> +    if (!err) {
> +        fi->fh = fd;
> +        err = lo_do_lookup(req, parent, name, &e);
> +    }
>      if (lo->cache == CACHE_NEVER) {
>          fi->direct_io = 1;
>      } else if (lo->cache == CACHE_ALWAYS) {
>          fi->keep_cache = 1;
>      }
>  
> -    err = lo_do_lookup(req, parent, name, &e);
> +out:
>      if (err) {
>          fuse_reply_err(req, err);
>      } else {
> -- 
> 2.23.0
> 
> 
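
To illustrate the point about the 32-bit variants, here is a minimal
sketch (not the patch's code) of a per-thread credential switch that uses
SYS_setresuid32/SYS_setresgid32 where the architecture headers define
them and falls back to the plain syscall numbers elsewhere.  On x86_64,
for example, only the plain names exist and they already take 32-bit IDs.
The names demo_cred, demo_change_cred and the DEMO_SYS_* macros are made
up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Prefer the 32-bit ID syscalls on architectures that still have both. */
#ifdef SYS_setresuid32
#define DEMO_SYS_setresuid SYS_setresuid32
#else
#define DEMO_SYS_setresuid SYS_setresuid
#endif

#ifdef SYS_setresgid32
#define DEMO_SYS_setresgid SYS_setresgid32
#else
#define DEMO_SYS_setresgid SYS_setresgid
#endif

/* Hypothetical stand-in for the patch's struct lo_cred. */
struct demo_cred {
    uid_t euid;
    gid_t egid;
};

/*
 * Switch the calling thread's effective uid/gid, saving the old values.
 * The raw syscall is used so that only this thread is affected; the
 * glibc wrappers would apply the change to every thread.
 * Returns 0 on success or an errno value on failure.
 */
int demo_change_cred(uid_t uid, gid_t gid, struct demo_cred *old)
{
    assert(old != NULL);

    old->euid = geteuid();
    old->egid = getegid();

    if (syscall(DEMO_SYS_setresgid, -1, gid, -1) == -1) {
        return errno;
    }

    if (syscall(DEMO_SYS_setresuid, -1, uid, -1) == -1) {
        int errno_save = errno;

        /* Undo the gid change before reporting the failure. */
        syscall(DEMO_SYS_setresgid, -1, old->egid, -1);
        return errno_save;
    }

    return 0;
}
```

The same pair of macros would also be what goes into the seccomp
whitelist, so that the filter and the code agree on the syscall number.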

Regards,
Daniel

* Re: [PATCH 038/104] virtiofsd: validate path components
  2019-12-12 16:37 ` [PATCH 038/104] virtiofsd: validate path components Dr. David Alan Gilbert (git)
@ 2020-01-06 14:32   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-06 14:32 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:58PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Several FUSE requests contain single path components.  A correct FUSE
> client sends well-formed path components but there is currently no input
> validation in case something went wrong or the client is malicious.
> 
> Refuse ".", "..", and paths containing '/' when we expect a path
> component.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 59 ++++++++++++++++++++++++++++----
>  1 file changed, 53 insertions(+), 6 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
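
For readers following along, the three rules in the commit message can
be sketched as below.  This is an illustration only, not the code from
the patch; the function name demo_is_safe_path_component is invented
here, and the sketch checks nothing beyond what the commit message lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if 'name' is acceptable as a single path component:
 * it must not contain '/', and must not be "." or "..".
 */
bool demo_is_safe_path_component(const char *name)
{
    assert(name != NULL);

    if (strchr(name, '/') != NULL) {
        return false;
    }

    return strcmp(name, ".") != 0 && strcmp(name, "..") != 0;
}
```

Names that merely start with dots, such as "..hidden", still pass,
because only the exact strings "." and ".." are special to path lookup.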


Regards,
Daniel

* Re: [PATCH 047/104] virtiofsd: sandbox mount namespace
  2019-12-12 16:38 ` [PATCH 047/104] virtiofsd: sandbox mount namespace Dr. David Alan Gilbert (git)
@ 2020-01-06 14:36   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-06 14:36 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:07PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Use a mount namespace with the shared directory tree mounted at "/" and
> no other mounts.
> 
> This prevents symlink escape attacks because symlink targets are
> resolved only against the shared directory and cannot go outside it.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Peng Tao <tao.peng@linux.alibaba.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 89 ++++++++++++++++++++++++++++++++
>  1 file changed, 89 insertions(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

Regards,
Daniel

* Re: [PATCH 048/104] virtiofsd: move to an empty network namespace
  2019-12-12 16:38 ` [PATCH 048/104] virtiofsd: move to an empty network namespace Dr. David Alan Gilbert (git)
@ 2020-01-06 14:37   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-06 14:37 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:08PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> If the process is compromised there should be no network access.  Use an
> empty network namespace to sandbox networking.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

Regards,
Daniel

* Re: [PATCH 049/104] virtiofsd: move to a new pid namespace
  2019-12-12 16:38 ` [PATCH 049/104] virtiofsd: move to a new pid namespace Dr. David Alan Gilbert (git)
@ 2020-01-06 14:40   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-06 14:40 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:09PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> virtiofsd needs access to /proc/self/fd.  Let's move to a new pid
> namespace so that a compromised process cannot see other
> processes running on the system.
> 
> One wrinkle in this approach: unshare(CLONE_NEWPID) affects *child*
> processes and not the current process.  Therefore we need to fork the
> pid 1 process that will actually run virtiofsd and leave a parent in
> waitpid(2).  This is not the same thing as daemonization and parent
> processes should not notice a difference.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 134 ++++++++++++++++++++-----------
>  1 file changed, 86 insertions(+), 48 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
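
The unshare-then-fork wrinkle described in the commit message can be
sketched roughly as follows.  This is a simplified illustration, not the
patch's code: demo_run_in_new_pid_namespace and demo_body are
hypothetical names, and unshare(CLONE_NEWPID) needs CAP_SYS_ADMIN, so
the helper reports a negative errno when it is not permitted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Example body: exits 0 if it really is pid 1 in the new namespace. */
int demo_body(void)
{
    return getpid() == 1 ? 0 : 1;
}

/*
 * Run 'body' as pid 1 of a new pid namespace and wait for it.
 * Returns the child's exit status, or a negative errno on failure.
 */
int demo_run_in_new_pid_namespace(int (*body)(void))
{
    pid_t child, waited;
    int status;

    assert(body != NULL);

    /* Affects children created from now on, not the caller itself. */
    if (unshare(CLONE_NEWPID) == -1) {
        return -errno;
    }

    child = fork();
    if (child == -1) {
        return -errno;
    }
    if (child == 0) {
        /* First child after unshare(): pid 1 in the new namespace. */
        _exit(body());
    }

    /* The parent stays behind in waitpid(2), as the commit describes. */
    do {
        waited = waitpid(child, &status, 0);
    } while (waited == -1 && errno == EINTR);
    if (waited == -1) {
        return -errno;
    }

    return WIFEXITED(status) ? WEXITSTATUS(status) : -ECHILD;
}
```

Only the first child forked after the unshare call becomes pid 1; once
it exits, the namespace is torn down and further forks from the parent
into it fail.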


Regards,
Daniel

* Re: [PATCH 050/104] virtiofsd: add seccomp whitelist
  2019-12-12 16:38 ` [PATCH 050/104] virtiofsd: add seccomp whitelist Dr. David Alan Gilbert (git)
@ 2020-01-06 14:56   ` Daniel P. Berrangé
  2020-01-06 18:54     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-06 14:56 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:10PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Only allow system calls that are needed by virtiofsd.  All other system
> calls cause SIGSYS to be directed at the thread and the process will
> coredump.
> 
> Restricting system calls reduces the kernel attack surface and limits
> what the process can do when compromised.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> with additional entries by:
> Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
> Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
> Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
> Signed-off-by: piaojun <piaojun@huawei.com>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
> ---
>  Makefile                         |   2 +
>  tools/virtiofsd/Makefile.objs    |   5 +-
>  tools/virtiofsd/passthrough_ll.c |   2 +
>  tools/virtiofsd/seccomp.c        | 141 +++++++++++++++++++++++++++++++
>  tools/virtiofsd/seccomp.h        |  14 +++
>  5 files changed, 163 insertions(+), 1 deletion(-)
>  create mode 100644 tools/virtiofsd/seccomp.c
>  create mode 100644 tools/virtiofsd/seccomp.h
> 
> diff --git a/Makefile b/Makefile
> index 8a5746d8a0..3f5d04e1f7 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -322,8 +322,10 @@ HELPERS-y =
>  HELPERS-$(call land,$(CONFIG_SOFTMMU),$(CONFIG_LINUX)) = qemu-bridge-helper$(EXESUF)
>  
>  ifdef CONFIG_LINUX
> +ifdef CONFIG_SECCOMP
>  HELPERS-y += virtiofsd$(EXESUF)
>  vhost-user-json-y += tools/virtiofsd/50-qemu-virtiofsd.json
> +endif
>  
>  ifdef CONFIG_VIRGL
>  ifdef CONFIG_GBM
> diff --git a/tools/virtiofsd/Makefile.objs b/tools/virtiofsd/Makefile.objs
> index 67be16332c..941b19f18e 100644
> --- a/tools/virtiofsd/Makefile.objs
> +++ b/tools/virtiofsd/Makefile.objs
> @@ -6,5 +6,8 @@ virtiofsd-obj-y = buffer.o \
>                    fuse_signals.o \
>                    fuse_virtio.o \
>                    helper.o \
> -                  passthrough_ll.o
> +                  passthrough_ll.o \
> +                  seccomp.o
>  
> +seccomp.o-cflags := $(SECCOMP_CFLAGS)
> +seccomp.o-libs := $(SECCOMP_LIBS)
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 754ef2618b..701608c6df 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -58,6 +58,7 @@
>  #include <unistd.h>
>  
>  #include "passthrough_helpers.h"
> +#include "seccomp.h"
>  
>  #define HAVE_POSIX_FALLOCATE 1
>  struct lo_map_elem {
> @@ -2073,6 +2074,7 @@ static void setup_sandbox(struct lo_data *lo, struct fuse_session *se)
>  {
>      setup_namespaces(lo, se);
>      setup_mounts(lo->source);
> +    setup_seccomp();
>  }
>  
>  int main(int argc, char *argv[])
> diff --git a/tools/virtiofsd/seccomp.c b/tools/virtiofsd/seccomp.c
> new file mode 100644
> index 0000000000..6359bb55bb
> --- /dev/null
> +++ b/tools/virtiofsd/seccomp.c
> @@ -0,0 +1,141 @@
> +/*
> + * Seccomp sandboxing for virtiofsd
> + *
> + * Copyright (C) 2019 Red Hat, Inc.
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include "seccomp.h"
> +#include "fuse_i.h"
> +#include "fuse_log.h"
> +#include <errno.h>
> +#include <glib.h>
> +#include <seccomp.h>
> +#include <stdlib.h>
> +
> +/* Bodge for libseccomp 2.4.2 which broke ppoll */
> +#if !defined(__SNR_ppoll) && defined(__SNR_brk)
> +#ifdef __NR_ppoll
> +#define __SNR_ppoll __NR_ppoll
> +#else
> +#define __SNR_ppoll __PNR_ppoll
> +#endif
> +#endif
> +
> +static const int syscall_whitelist[] = {
> +    /* TODO ireg sem*() syscalls */
> +    SCMP_SYS(brk),
> +    SCMP_SYS(capget), /* For CAP_FSETID */
> +    SCMP_SYS(capset),
> +    SCMP_SYS(clock_gettime),
> +    SCMP_SYS(clone),

clone2? clone3? I believe some architectures in Linux
will require the newer variants.

> +    SCMP_SYS(close),
> +    SCMP_SYS(copy_file_range),
> +    SCMP_SYS(dup),
> +    SCMP_SYS(eventfd2),
> +    SCMP_SYS(exit),
> +    SCMP_SYS(exit_group),
> +    SCMP_SYS(fallocate),
> +    SCMP_SYS(fchmodat),
> +    SCMP_SYS(fchownat),
> +    SCMP_SYS(fcntl),
> +    SCMP_SYS(fdatasync),
> +    SCMP_SYS(fgetxattr),
> +    SCMP_SYS(flistxattr),
> +    SCMP_SYS(flock),
> +    SCMP_SYS(fremovexattr),
> +    SCMP_SYS(fsetxattr),
> +    SCMP_SYS(fstat),
> +    SCMP_SYS(fstatfs),
> +    SCMP_SYS(fsync),
> +    SCMP_SYS(ftruncate),
> +    SCMP_SYS(futex),
> +    SCMP_SYS(getdents),
> +    SCMP_SYS(getdents64),
> +    SCMP_SYS(getegid),
> +    SCMP_SYS(geteuid),
> +    SCMP_SYS(getpid),
> +    SCMP_SYS(gettid),
> +    SCMP_SYS(gettimeofday),
> +    SCMP_SYS(linkat),
> +    SCMP_SYS(lseek),
> +    SCMP_SYS(madvise),
> +    SCMP_SYS(mkdirat),
> +    SCMP_SYS(mknodat),
> +    SCMP_SYS(mmap),
> +    SCMP_SYS(mprotect),
> +    SCMP_SYS(mremap),
> +    SCMP_SYS(munmap),
> +    SCMP_SYS(newfstatat),
> +    SCMP_SYS(open),
> +    SCMP_SYS(openat),
> +    SCMP_SYS(ppoll),
> +    SCMP_SYS(prctl), /* TODO restrict to just PR_SET_NAME? */
> +    SCMP_SYS(preadv),
> +    SCMP_SYS(pread64),
> +    SCMP_SYS(pwritev),
> +    SCMP_SYS(pwrite64),
> +    SCMP_SYS(read),
> +    SCMP_SYS(readlinkat),
> +    SCMP_SYS(recvmsg),
> +    SCMP_SYS(renameat),
> +    SCMP_SYS(renameat2),
> +    SCMP_SYS(rt_sigaction),
> +    SCMP_SYS(rt_sigprocmask),
> +    SCMP_SYS(rt_sigreturn),
> +    SCMP_SYS(sendmsg),
> +    SCMP_SYS(setresgid),

I think this should be setresgid32 instead. We don't
want the legacy syscall that's limited to 16-bit GIDs.

This needs the code fix I mentioned in an earlier patch too.

> +    SCMP_SYS(setresuid),

Same as above

> +    SCMP_SYS(set_robust_list),
> +    SCMP_SYS(symlinkat),
> +    SCMP_SYS(time), /* Rarely needed, except on static builds */
> +    SCMP_SYS(tgkill),
> +    SCMP_SYS(unlinkat),
> +    SCMP_SYS(utimensat),
> +    SCMP_SYS(write),
> +    SCMP_SYS(writev),
> +};


Regards,
Daniel

* Re: [PATCH 052/104] virtiofsd: cap-ng helpers
  2019-12-12 16:38 ` [PATCH 052/104] virtiofsd: cap-ng helpers Dr. David Alan Gilbert (git)
@ 2020-01-06 14:58   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-06 14:58 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:12PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> libcap-ng reads /proc during capng_get_caps_process, and virtiofsd's
> sandboxing doesn't have /proc mounted; thus we have to do the
> caps read before we sandbox it and save/restore the state.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  Makefile                         |  2 +
>  tools/virtiofsd/passthrough_ll.c | 72 ++++++++++++++++++++++++++++++++
>  2 files changed, 74 insertions(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel

* Re: [PATCH 054/104] virtiofsd: set maximum RLIMIT_NOFILE limit
  2019-12-12 16:38 ` [PATCH 054/104] virtiofsd: set maximum RLIMIT_NOFILE limit Dr. David Alan Gilbert (git)
@ 2020-01-06 15:00   ` Daniel P. Berrangé
  2020-01-15 17:09   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-06 15:00 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:14PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> virtiofsd can exceed the default open file descriptor limit easily on
> most systems.  Take advantage of the fact that it runs as root to raise
> the limit.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
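
As a rough illustration of the mechanism (not the patch's code), the
sketch below raises the soft RLIMIT_NOFILE limit to the current hard
limit, which any process may do.  Raising the hard limit itself is the
part that needs privilege (CAP_SYS_RESOURCE), and that is where running
as root comes in; this sketch deliberately leaves that out.  The name
demo_raise_nofile_limit is made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Raise the soft open-file limit to the hard limit.
 * On success returns 0 and stores the resulting limits in '*out';
 * on failure returns an errno value.
 */
int demo_raise_nofile_limit(struct rlimit *out)
{
    assert(out != NULL);

    if (getrlimit(RLIMIT_NOFILE, out) == -1) {
        return errno;
    }

    /* The soft limit may be raised up to the hard limit unprivileged. */
    out->rlim_cur = out->rlim_max;

    if (setrlimit(RLIMIT_NOFILE, out) == -1) {
        return errno;
    }

    return 0;
}
```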


Regards,
Daniel

* Re: [PATCH 055/104] virtiofsd: fix libfuse information leaks
  2019-12-12 16:38 ` [PATCH 055/104] virtiofsd: fix libfuse information leaks Dr. David Alan Gilbert (git)
@ 2020-01-06 15:01   ` Daniel P. Berrangé
  2020-01-15 17:07   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-06 15:01 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:15PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Some FUSE message replies contain padding fields that are not
> initialized by libfuse.  This is fine in traditional FUSE applications
> because the kernel is trusted.  virtiofsd does not trust the guest and
> must not expose uninitialized memory.
> 
> Use C struct initializers to automatically zero out memory.  Not all of
> these code changes are strictly necessary but they will prevent future
> information leaks if the structs are extended.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.c | 150 ++++++++++++++++----------------
>  1 file changed, 76 insertions(+), 74 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
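
A small sketch of the initializer technique the commit message refers
to, assuming a made-up reply struct (demo_reply is not a real FUSE
type).  With a designated initializer, every named member that is not
mentioned, including an explicit padding field, is set to zero, so
nothing left over from the stack reaches the guest through those fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire struct with an explicit padding member. */
struct demo_reply {
    uint64_t fh;
    uint32_t flags;
    uint32_t padding;
};

/*
 * Fill in a reply.  Only 'fh' is named in the initializer; 'flags'
 * and 'padding' are zero-initialized by the language rules.
 */
void demo_fill_reply(struct demo_reply *out, uint64_t fh)
{
    assert(out != NULL);

    *out = (struct demo_reply){ .fh = fh };
}
```

If a field is later added to the struct, it is zeroed as well without
any change at the call sites, which is the "future information leaks"
benefit mentioned above.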


Regards,
Daniel

* Re: [PATCH 056/104] virtiofsd: add security guide document
  2019-12-12 16:38 ` [PATCH 056/104] virtiofsd: add security guide document Dr. David Alan Gilbert (git)
@ 2020-01-06 15:03   ` Daniel P. Berrangé
  2020-01-06 17:53     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-06 15:03 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:16PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Many people want to know: what's up with virtiofsd and security?  This
> document provides the answers!
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/security.rst | 118 +++++++++++++++++++++++++++++++++++

Do we need to link this from some index page in the rest of
QEMU's docs?

>  1 file changed, 118 insertions(+)
>  create mode 100644 tools/virtiofsd/security.rst

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel

* Re: [PATCH 057/104] virtiofsd: add --syslog command-line option
  2019-12-12 16:38 ` [PATCH 057/104] virtiofsd: add --syslog command-line option Dr. David Alan Gilbert (git)
@ 2020-01-06 15:05   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-06 15:05 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:17PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Sometimes collecting output from stderr is inconvenient or does not fit
> within the overall logging architecture.  Add syslog(3) support for
> cases where stderr cannot be used.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> dgilbert: Reworked as a logging function
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.h  |  1 +
>  tools/virtiofsd/helper.c         |  2 ++
>  tools/virtiofsd/passthrough_ll.c | 50 ++++++++++++++++++++++++++++++--
>  tools/virtiofsd/seccomp.c        | 32 ++++++++++++++------
>  tools/virtiofsd/seccomp.h        |  4 ++-
>  5 files changed, 76 insertions(+), 13 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 058/104] virtiofsd: print log only when priority is high enough
  2019-12-12 16:38 ` [PATCH 058/104] virtiofsd: print log only when priority is high enough Dr. David Alan Gilbert (git)
@ 2020-01-06 15:10   ` Daniel P. Berrangé
  2020-01-06 17:05     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-06 15:10 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:18PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Eryu Guan <eguan@linux.alibaba.com>
> 
> Introduce "-o log_level=" command line option to specify current log
> level (priority), valid values are "debug info warn err", e.g.
> 
>     ./virtiofsd -o log_level=debug ...
> 
> So only log priority higher than "debug" will be printed to
> stderr/syslog. And the default level is info.
> 
> The "-o debug"/"-d" options are kept, and imply debug log level.
> 
> Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
> dgilbert: Reworked for libfuse's log_func
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/fuse_log.c       |   4 ++
>  tools/virtiofsd/fuse_lowlevel.c  |  75 ++++++++------------
>  tools/virtiofsd/fuse_lowlevel.h  |   1 +
>  tools/virtiofsd/helper.c         |  10 ++-
>  tools/virtiofsd/passthrough_ll.c | 118 +++++++++++++------------------
>  5 files changed, 92 insertions(+), 116 deletions(-)

> diff --git a/tools/virtiofsd/fuse_log.c b/tools/virtiofsd/fuse_log.c
> index 11345f9ec8..79a18a7aaa 100644
> --- a/tools/virtiofsd/fuse_log.c
> +++ b/tools/virtiofsd/fuse_log.c
> @@ -8,6 +8,10 @@
>   * See the file COPYING.LIB
>   */
>  
> +#include <stdbool.h>
> +#include <stdio.h>
> +#include <stdarg.h>
> +#include <syslog.h>
>  #include "fuse_log.h"
>  
>  #include <stdarg.h>

Why do we need to add these headers if there are no code changes in this
file ?

> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> index f3c8bdf7cb..0abb369b3d 100644
> --- a/tools/virtiofsd/fuse_lowlevel.c
> +++ b/tools/virtiofsd/fuse_lowlevel.c
> @@ -158,19 +158,17 @@ static int fuse_send_msg(struct fuse_session *se, struct fuse_chan *ch,
>      struct fuse_out_header *out = iov[0].iov_base;
>  
>      out->len = iov_length(iov, count);
> -    if (se->debug) {
> -        if (out->unique == 0) {
> -            fuse_log(FUSE_LOG_DEBUG, "NOTIFY: code=%d length=%u\n", out->error,
> -                     out->len);
> -        } else if (out->error) {
> -            fuse_log(FUSE_LOG_DEBUG,
> -                     "   unique: %llu, error: %i (%s), outsize: %i\n",
> -                     (unsigned long long)out->unique, out->error,
> -                     strerror(-out->error), out->len);
> -        } else {
> -            fuse_log(FUSE_LOG_DEBUG, "   unique: %llu, success, outsize: %i\n",
> -                     (unsigned long long)out->unique, out->len);
> -        }
> +    if (out->unique == 0) {
> +        fuse_log(FUSE_LOG_DEBUG, "NOTIFY: code=%d length=%u\n", out->error,
> +                 out->len);
> +    } else if (out->error) {
> +        fuse_log(FUSE_LOG_DEBUG,
> +                 "   unique: %llu, error: %i (%s), outsize: %i\n",
> +                 (unsigned long long)out->unique, out->error,
> +                 strerror(-out->error), out->len);
> +    } else {
> +        fuse_log(FUSE_LOG_DEBUG, "   unique: %llu, success, outsize: %i\n",
> +                 (unsigned long long)out->unique, out->len);
>      }

Removing all the 'if (se->debug)' checks means that we take the
performance hit of calling many logging functions in the common
case where debug is disabled. Hopefully 'fuse_log' is smart
enough to avoid printf formatting of the msg + args unless
it is actually going to output the message.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 059/104] virtiofsd: Add ID to the log with FUSE_LOG_DEBUG level
  2019-12-12 16:38 ` [PATCH 059/104] virtiofsd: Add ID to the log with FUSE_LOG_DEBUG level Dr. David Alan Gilbert (git)
@ 2020-01-06 15:18   ` Daniel P. Berrangé
  2020-01-06 17:47     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-06 15:18 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:19PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
> 
> virtiofsd has some threads, so we see a lot of logs with debug option.
> It would be useful for debugging if we can identify the specific thread
> from the log.
> 
> Add ID, which is got by gettid(), to the log with FUSE_LOG_DEBUG level
> so that we can grep the specific thread.
> 
> The log is like as:
> 
>   ]# ./virtiofsd -d -o vhost_user_socket=/tmp/vhostqemu0 -o source=/tmp/share0 -o cache=auto
>   ...
>   [ID: 00000097]    unique: 12696, success, outsize: 120
>   [ID: 00000097] virtio_send_msg: elem 18: with 2 in desc of length 120
>   [ID: 00000003] fv_queue_thread: Got queue event on Queue 1
>   [ID: 00000003] fv_queue_thread: Queue 1 gave evalue: 1 available: in: 65552 out: 80
>   [ID: 00000003] fv_queue_thread: Waiting for Queue 1 event
>   [ID: 00000071] fv_queue_worker: elem 33: with 2 out desc of length 80 bad_in_num=0 bad_out_num=0
>   [ID: 00000071] unique: 12694, opcode: READ (15), nodeid: 2, insize: 80, pid: 2014
>   [ID: 00000071] lo_read(ino=2, size=65536, off=131072)
> 
> Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)

> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 6f398a7ff2..8e00a90e6f 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -42,6 +42,7 @@
>  #include <cap-ng.h>
>  #include <dirent.h>
>  #include <errno.h>
> +#include <glib.h>
>  #include <inttypes.h>
>  #include <limits.h>
>  #include <pthread.h>
> @@ -2248,12 +2249,18 @@ static void setup_nofile_rlimit(void)
>      }
>  }
>  
> -static void log_func(enum fuse_log_level level, const char *fmt, va_list ap)
> +static void log_func(enum fuse_log_level level, const char *_fmt, va_list ap)
>  {
> +    char *fmt = (char *)_fmt;

Reusing one variable for data that may be either const (owned by the
caller) or non-const (heap-allocated here) is really gross & asking
for trouble.

If instead it does:

    g_autofree *localfmt = NULL;

> +
>      if (current_log_level < level) {
>          return;
>      }
>  
> +    if (current_log_level == FUSE_LOG_DEBUG) {
> +        fmt = g_strdup_printf("[ID: %08ld] %s", syscall(__NR_gettid), _fmt);

Then:

           localfmt = g_strdup_printf(....)
           fmt = localfmt;

> +    }
> +
>      if (use_syslog) {
>          int priority = LOG_ERR;
>          switch (level) {
> @@ -2286,6 +2293,10 @@ static void log_func(enum fuse_log_level level, const char *fmt, va_list ap)
>      } else {
>          vfprintf(stderr, fmt, ap);
>      }
> +
> +    if (current_log_level == FUSE_LOG_DEBUG) {
> +        g_free(fmt);
> +    }

This can then be deleted.

>  }
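
For illustration, a rough sketch of the shape suggested above. It does
not use glib: SKETCH_AUTOFREE is a hypothetical stand-in built on the
GCC/clang cleanup attribute (the same mechanism g_autofree relies on),
and sketch_prefix_fmt stands in for g_strdup_printf. The 'debug' and
'id' parameters are assumptions that replace current_log_level and
gettid so the example is self-contained, and it formats into a buffer
rather than stderr/syslog.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for g_autofree: free the pointer at scope exit. */
static void sketch_free_charp(char **p)
{
    free(*p);
}
#define SKETCH_AUTOFREE __attribute__((cleanup(sketch_free_charp)))

/* Stand-in for g_strdup_printf: build "[ID: <id>] <fmt>" on the heap. */
static char *sketch_prefix_fmt(long id, const char *fmt)
{
    int len = snprintf(NULL, 0, "[ID: %08ld] %s", id, fmt);
    char *out;

    if (len < 0) {
        return NULL;
    }
    out = malloc((size_t)len + 1);
    if (out) {
        snprintf(out, (size_t)len + 1, "[ID: %08ld] %s", id, fmt);
    }
    return out;
}

static int sketch_vformat(char *buf, size_t size, int debug, long id,
                          const char *fmt, va_list ap)
{
    /* Owned pointer: either NULL or heap memory, never the caller's fmt. */
    SKETCH_AUTOFREE char *localfmt = NULL;

    if (debug) {
        localfmt = sketch_prefix_fmt(id, fmt);
        if (localfmt) {
            fmt = localfmt; /* borrowed view, stays const */
        }
    }

    /* localfmt is freed automatically after the result is computed. */
    return vsnprintf(buf, size, fmt, ap);
}

static int sketch_format(char *buf, size_t size, int debug, long id,
                         const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = sketch_vformat(buf, size, debug, id, fmt, ap);
    va_end(ap);
    return n;
}
```

The point of the split is that 'fmt' never loses its const qualifier
and there is no conditional free at the end to keep in sync with the
conditional allocation.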

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 011/104] virtiofsd: Fix common header and define for QEMU builds
  2020-01-03 12:22   ` Daniel P. Berrangé
@ 2020-01-06 16:29     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-06 16:29 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, stefanha, vgoyal

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Dec 12, 2019 at 04:37:31PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > All of the fuse files include config.h and define GNU_SOURCE
> > where we don't have either under our build - remove them.
> 
> There's a bunch of other random changes in this patch - were these
> all fallout from removing the config.h header, or were they meant
> for a separate commit ?

I've moved the removals of cuse_lowlevel_init call and
version printing into the earlier 'Trim down imported files'.
So we're left with header and #define tweaks.

I've added:
   'Fixup path to the kernel's fuse.h in the QEMUs world.'

to the commit message.

Dave


> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  tools/virtiofsd/buffer.c         |  3 ---
> >  tools/virtiofsd/fuse_i.h         |  3 +++
> >  tools/virtiofsd/fuse_loop_mt.c   |  3 +--
> >  tools/virtiofsd/fuse_lowlevel.c  | 12 +-----------
> >  tools/virtiofsd/fuse_opt.c       |  1 -
> >  tools/virtiofsd/fuse_signals.c   |  1 -
> >  tools/virtiofsd/passthrough_ll.c |  9 ++-------
> >  7 files changed, 7 insertions(+), 25 deletions(-)
> > 
> > diff --git a/tools/virtiofsd/buffer.c b/tools/virtiofsd/buffer.c
> > index 38521f5889..1d7e6d2439 100644
> > --- a/tools/virtiofsd/buffer.c
> > +++ b/tools/virtiofsd/buffer.c
> > @@ -9,9 +9,6 @@
> >   * See the file COPYING.LIB
> >   */
> >  
> > -#define _GNU_SOURCE
> > -
> > -#include "config.h"
> >  #include "fuse_i.h"
> >  #include "fuse_lowlevel.h"
> >  #include <assert.h>
> > diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
> > index 1119e85e57..0b5acc8765 100644
> > --- a/tools/virtiofsd/fuse_i.h
> > +++ b/tools/virtiofsd/fuse_i.h
> > @@ -6,6 +6,9 @@
> >   * See the file COPYING.LIB
> >   */
> >  
> > +#define FUSE_USE_VERSION 31
> > +
> > +
> >  #include "fuse.h"
> >  #include "fuse_lowlevel.h"
> >  
> > diff --git a/tools/virtiofsd/fuse_loop_mt.c b/tools/virtiofsd/fuse_loop_mt.c
> > index 39e080d9ff..00138b2ab3 100644
> > --- a/tools/virtiofsd/fuse_loop_mt.c
> > +++ b/tools/virtiofsd/fuse_loop_mt.c
> > @@ -8,11 +8,10 @@
> >   * See the file COPYING.LIB.
> >   */
> >  
> > -#include "config.h"
> >  #include "fuse_i.h"
> > -#include "fuse_kernel.h"
> >  #include "fuse_lowlevel.h"
> >  #include "fuse_misc.h"
> > +#include "standard-headers/linux/fuse.h"
> >  
> >  #include <assert.h>
> >  #include <errno.h>
> > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > index 0d7b2c3dc9..497eb25487 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.c
> > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > @@ -9,14 +9,10 @@
> >   * See the file COPYING.LIB
> >   */
> >  
> > -#define _GNU_SOURCE
> > -
> > -#include "config.h"
> >  #include "fuse_i.h"
> > -#include "fuse_kernel.h"
> > +#include "standard-headers/linux/fuse.h"
> >  #include "fuse_misc.h"
> >  #include "fuse_opt.h"
> > -#include "mount_util.h"
> >  
> >  #include <assert.h>
> >  #include <errno.h>
> > @@ -2093,7 +2089,6 @@ static struct {
> >      [FUSE_RENAME2] = { do_rename2, "RENAME2" },
> >      [FUSE_COPY_FILE_RANGE] = { do_copy_file_range, "COPY_FILE_RANGE" },
> >      [FUSE_LSEEK] = { do_lseek, "LSEEK" },
> > -    [CUSE_INIT] = { cuse_lowlevel_init, "CUSE_INIT" },
> >  };
> >  
> >  #define FUSE_MAXOP (sizeof(fuse_ll_ops) / sizeof(fuse_ll_ops[0]))
> > @@ -2220,7 +2215,6 @@ void fuse_lowlevel_version(void)
> >  {
> >      printf("using FUSE kernel interface version %i.%i\n", FUSE_KERNEL_VERSION,
> >             FUSE_KERNEL_MINOR_VERSION);
> > -    fuse_mount_version();
> >  }
> >  
> >  void fuse_lowlevel_help(void)
> > @@ -2310,10 +2304,6 @@ struct fuse_session *fuse_session_new(struct fuse_args *args,
> >          goto out4;
> >      }
> >  
> > -    if (se->debug) {
> > -        fuse_log(FUSE_LOG_DEBUG, "FUSE library version: %s\n", PACKAGE_VERSION);
> > -    }
> > -
> >      se->bufsize = FUSE_MAX_MAX_PAGES * getpagesize() + FUSE_BUFFER_HEADER_SIZE;
> >  
> >      list_init_req(&se->list);
> > diff --git a/tools/virtiofsd/fuse_opt.c b/tools/virtiofsd/fuse_opt.c
> > index edd36f4a3b..1fee55e266 100644
> > --- a/tools/virtiofsd/fuse_opt.c
> > +++ b/tools/virtiofsd/fuse_opt.c
> > @@ -10,7 +10,6 @@
> >   */
> >  
> >  #include "fuse_opt.h"
> > -#include "config.h"
> >  #include "fuse_i.h"
> >  #include "fuse_misc.h"
> >  
> > diff --git a/tools/virtiofsd/fuse_signals.c b/tools/virtiofsd/fuse_signals.c
> > index 19d6791cb9..10a6f88088 100644
> > --- a/tools/virtiofsd/fuse_signals.c
> > +++ b/tools/virtiofsd/fuse_signals.c
> > @@ -8,7 +8,6 @@
> >   * See the file COPYING.LIB
> >   */
> >  
> > -#include "config.h"
> >  #include "fuse_i.h"
> >  #include "fuse_lowlevel.h"
> >  
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index a79ec2c70d..0e543353a4 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -35,15 +35,10 @@
> >   * \include passthrough_ll.c
> >   */
> >  
> > -#define _GNU_SOURCE
> > -#define FUSE_USE_VERSION 31
> > -
> > -#include "config.h"
> > -
> > +#include "fuse_lowlevel.h"
> >  #include <assert.h>
> >  #include <dirent.h>
> >  #include <errno.h>
> > -#include <fuse_lowlevel.h>
> >  #include <inttypes.h>
> >  #include <limits.h>
> >  #include <pthread.h>
> > @@ -58,6 +53,7 @@
> >  
> >  #include "passthrough_helpers.h"
> >  
> > +#define HAVE_POSIX_FALLOCATE 1
> >  /*
> >   * We are re-using pointers to our `struct lo_inode` and `struct
> >   * lo_dirp` elements as inodes. This means that we must be able to
> > @@ -1303,7 +1299,6 @@ int main(int argc, char *argv[])
> >          ret = 0;
> >          goto err_out1;
> >      } else if (opts.show_version) {
> > -        printf("FUSE library version %s\n", fuse_pkgversion());
> >          fuse_lowlevel_version();
> >          ret = 0;
> >          goto err_out1;
> > -- 
> > 2.23.0
> > 
> > 
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 058/104] virtiofsd: print log only when priority is high enough
  2020-01-06 15:10   ` Daniel P. Berrangé
@ 2020-01-06 17:05     ` Dr. David Alan Gilbert
  2020-01-06 17:20       ` Daniel P. Berrangé
  0 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-06 17:05 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, stefanha, vgoyal

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Dec 12, 2019 at 04:38:18PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: Eryu Guan <eguan@linux.alibaba.com>
> > 
> > Introduce "-o log_level=" command line option to specify current log
> > level (priority), valid values are "debug info warn err", e.g.
> > 
> >     ./virtiofsd -o log_level=debug ...
> > 
> > So only log priority higher than "debug" will be printed to
> > stderr/syslog. And the default level is info.
> > 
> > The "-o debug"/"-d" options are kept, and imply debug log level.
> > 
> > Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
> > dgilbert: Reworked for libfuse's log_func
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  tools/virtiofsd/fuse_log.c       |   4 ++
> >  tools/virtiofsd/fuse_lowlevel.c  |  75 ++++++++------------
> >  tools/virtiofsd/fuse_lowlevel.h  |   1 +
> >  tools/virtiofsd/helper.c         |  10 ++-
> >  tools/virtiofsd/passthrough_ll.c | 118 +++++++++++++------------------
> >  5 files changed, 92 insertions(+), 116 deletions(-)
> 
> > diff --git a/tools/virtiofsd/fuse_log.c b/tools/virtiofsd/fuse_log.c
> > index 11345f9ec8..79a18a7aaa 100644
> > --- a/tools/virtiofsd/fuse_log.c
> > +++ b/tools/virtiofsd/fuse_log.c
> > @@ -8,6 +8,10 @@
> >   * See the file COPYING.LIB
> >   */
> >  
> > +#include <stdbool.h>
> > +#include <stdio.h>
> > +#include <stdarg.h>
> > +#include <syslog.h>
> >  #include "fuse_log.h"
> >  
> >  #include <stdarg.h>
> 
> Why do we need to add these headers if there are no code changes in this
> file ?

Thanks, those are leftovers from an earlier version; I've deleted them now.

> > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > index f3c8bdf7cb..0abb369b3d 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.c
> > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > @@ -158,19 +158,17 @@ static int fuse_send_msg(struct fuse_session *se, struct fuse_chan *ch,
> >      struct fuse_out_header *out = iov[0].iov_base;
> >  
> >      out->len = iov_length(iov, count);
> > -    if (se->debug) {
> > -        if (out->unique == 0) {
> > -            fuse_log(FUSE_LOG_DEBUG, "NOTIFY: code=%d length=%u\n", out->error,
> > -                     out->len);
> > -        } else if (out->error) {
> > -            fuse_log(FUSE_LOG_DEBUG,
> > -                     "   unique: %llu, error: %i (%s), outsize: %i\n",
> > -                     (unsigned long long)out->unique, out->error,
> > -                     strerror(-out->error), out->len);
> > -        } else {
> > -            fuse_log(FUSE_LOG_DEBUG, "   unique: %llu, success, outsize: %i\n",
> > -                     (unsigned long long)out->unique, out->len);
> > -        }
> > +    if (out->unique == 0) {
> > +        fuse_log(FUSE_LOG_DEBUG, "NOTIFY: code=%d length=%u\n", out->error,
> > +                 out->len);
> > +    } else if (out->error) {
> > +        fuse_log(FUSE_LOG_DEBUG,
> > +                 "   unique: %llu, error: %i (%s), outsize: %i\n",
> > +                 (unsigned long long)out->unique, out->error,
> > +                 strerror(-out->error), out->len);
> > +    } else {
> > +        fuse_log(FUSE_LOG_DEBUG, "   unique: %llu, success, outsize: %i\n",
> > +                 (unsigned long long)out->unique, out->len);
> >      }
> 
> Removing all the 'if (se->debug)' checks means that we take the
> performance hit of calling many logging functions in the common
> case where debug is disabled. Hopefully 'fuse_log' is smart
> enough to avoid printf formatting of the msg + args unless
> it is actually going to output the message.

It is; we go through fuse_log (in fuse_log.c, an imported file), which
just does the va_start and then calls the log_func that is set later in
this patch. The first thing log_func does is check the level and return
early if the message would not be output.
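
For illustration only, here is a minimal sketch of that call path. The
names (sketch_log, sketch_log_func, the enum, the counter and the
buffer) are hypothetical stand-ins, not the actual virtiofsd or libfuse
code; the sketch only assumes what is described above: the front end
does the va_start and hands off to a callback whose first step is the
level check.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical levels, ordered like syslog: lower value = more severe. */
enum sketch_log_level {
    SKETCH_LOG_ERR,
    SKETCH_LOG_WARN,
    SKETCH_LOG_INFO,
    SKETCH_LOG_DEBUG,
};

static enum sketch_log_level sketch_current_level = SKETCH_LOG_INFO;
static int sketch_formatted_count;  /* messages that were really formatted */
static char sketch_last_msg[256];   /* stands in for stderr/syslog output */

/* Callback: the level check comes first, before any printf-style work. */
static void sketch_log_func(enum sketch_log_level level, const char *fmt,
                            va_list ap)
{
    if (sketch_current_level < level) {
        return; /* filtered: no formatting happens */
    }
    vsnprintf(sketch_last_msg, sizeof(sketch_last_msg), fmt, ap);
    sketch_formatted_count++;
}

/* Front end: only packages the varargs and forwards them. */
static void sketch_log(enum sketch_log_level level, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    sketch_log_func(level, fmt, ap);
    va_end(ap);
}
```

With the level left at INFO, a sketch_log(SKETCH_LOG_DEBUG, ...) call
returns before vsnprintf runs. Note that the arguments themselves are
still evaluated at the call site, so something like the strerror() call
in fuse_send_msg would still run; only the formatting is skipped.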

Dave

> 
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 058/104] virtiofsd: print log only when priority is high enough
  2020-01-06 17:05     ` Dr. David Alan Gilbert
@ 2020-01-06 17:20       ` Daniel P. Berrangé
  2020-01-06 17:27         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-06 17:20 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: qemu-devel, stefanha, vgoyal

On Mon, Jan 06, 2020 at 05:05:11PM +0000, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > On Thu, Dec 12, 2019 at 04:38:18PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > From: Eryu Guan <eguan@linux.alibaba.com>
> > > 
> > > Introduce "-o log_level=" command line option to specify current log
> > > level (priority), valid values are "debug info warn err", e.g.
> > > 
> > >     ./virtiofsd -o log_level=debug ...
> > > 
> > > So only log priority higher than "debug" will be printed to
> > > stderr/syslog. And the default level is info.
> > > 
> > > The "-o debug"/"-d" options are kept, and imply debug log level.
> > > 
> > > Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
> > > dgilbert: Reworked for libfuse's log_func
> > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > ---
> > >  tools/virtiofsd/fuse_log.c       |   4 ++
> > >  tools/virtiofsd/fuse_lowlevel.c  |  75 ++++++++------------
> > >  tools/virtiofsd/fuse_lowlevel.h  |   1 +
> > >  tools/virtiofsd/helper.c         |  10 ++-
> > >  tools/virtiofsd/passthrough_ll.c | 118 +++++++++++++------------------
> > >  5 files changed, 92 insertions(+), 116 deletions(-)
> > 
> > > diff --git a/tools/virtiofsd/fuse_log.c b/tools/virtiofsd/fuse_log.c
> > > index 11345f9ec8..79a18a7aaa 100644
> > > --- a/tools/virtiofsd/fuse_log.c
> > > +++ b/tools/virtiofsd/fuse_log.c
> > > @@ -8,6 +8,10 @@
> > >   * See the file COPYING.LIB
> > >   */
> > >  
> > > +#include <stdbool.h>
> > > +#include <stdio.h>
> > > +#include <stdarg.h>
> > > +#include <syslog.h>
> > >  #include "fuse_log.h"
> > >  
> > >  #include <stdarg.h>
> > 
> > Why do we need to add these headers if there are no code changes in this
> > file ?
> 
> Thanks, those are leftovers from an earlier version; I've deleted them now.
> 
> > > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > > index f3c8bdf7cb..0abb369b3d 100644
> > > --- a/tools/virtiofsd/fuse_lowlevel.c
> > > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > > @@ -158,19 +158,17 @@ static int fuse_send_msg(struct fuse_session *se, struct fuse_chan *ch,
> > >      struct fuse_out_header *out = iov[0].iov_base;
> > >  
> > >      out->len = iov_length(iov, count);
> > > -    if (se->debug) {
> > > -        if (out->unique == 0) {
> > > -            fuse_log(FUSE_LOG_DEBUG, "NOTIFY: code=%d length=%u\n", out->error,
> > > -                     out->len);
> > > -        } else if (out->error) {
> > > -            fuse_log(FUSE_LOG_DEBUG,
> > > -                     "   unique: %llu, error: %i (%s), outsize: %i\n",
> > > -                     (unsigned long long)out->unique, out->error,
> > > -                     strerror(-out->error), out->len);
> > > -        } else {
> > > -            fuse_log(FUSE_LOG_DEBUG, "   unique: %llu, success, outsize: %i\n",
> > > -                     (unsigned long long)out->unique, out->len);
> > > -        }
> > > +    if (out->unique == 0) {
> > > +        fuse_log(FUSE_LOG_DEBUG, "NOTIFY: code=%d length=%u\n", out->error,
> > > +                 out->len);
> > > +    } else if (out->error) {
> > > +        fuse_log(FUSE_LOG_DEBUG,
> > > +                 "   unique: %llu, error: %i (%s), outsize: %i\n",
> > > +                 (unsigned long long)out->unique, out->error,
> > > +                 strerror(-out->error), out->len);
> > > +    } else {
> > > +        fuse_log(FUSE_LOG_DEBUG, "   unique: %llu, success, outsize: %i\n",
> > > +                 (unsigned long long)out->unique, out->len);
> > >      }
> > 
> > Removing all the 'if (se->debug)' checks means that we take the
> > performance hit of calling many logging functions in the common
> > case where debug is disabled. Hopefully 'fuse_log' is smart
> > enough to avoid printf formatting of the msg + args unless
> > it is actually going to output the message.
> 
> It is; we go through fuse_log (in fuse_log.c, an imported file), which
> just does the va_start and then calls the log_func that is set later in
> this patch. The first thing log_func does is check the level and return
> early if the message would not be output.

ok then

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 058/104] virtiofsd: print log only when priority is high enough
  2020-01-06 17:20       ` Daniel P. Berrangé
@ 2020-01-06 17:27         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-06 17:27 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, stefanha, vgoyal

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Mon, Jan 06, 2020 at 05:05:11PM +0000, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > On Thu, Dec 12, 2019 at 04:38:18PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > From: Eryu Guan <eguan@linux.alibaba.com>
> > > > 
> > > > Introduce "-o log_level=" command line option to specify current log
> > > > level (priority), valid values are "debug info warn err", e.g.
> > > > 
> > > >     ./virtiofsd -o log_level=debug ...
> > > > 
> > > > So only log priority higher than "debug" will be printed to
> > > > stderr/syslog. And the default level is info.
> > > > 
> > > > The "-o debug"/"-d" options are kept, and imply debug log level.
> > > > 
> > > > Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
> > > > dgilbert: Reworked for libfuse's log_func
> > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > ---
> > > >  tools/virtiofsd/fuse_log.c       |   4 ++
> > > >  tools/virtiofsd/fuse_lowlevel.c  |  75 ++++++++------------
> > > >  tools/virtiofsd/fuse_lowlevel.h  |   1 +
> > > >  tools/virtiofsd/helper.c         |  10 ++-
> > > >  tools/virtiofsd/passthrough_ll.c | 118 +++++++++++++------------------
> > > >  5 files changed, 92 insertions(+), 116 deletions(-)
> > > 
> > > > diff --git a/tools/virtiofsd/fuse_log.c b/tools/virtiofsd/fuse_log.c
> > > > index 11345f9ec8..79a18a7aaa 100644
> > > > --- a/tools/virtiofsd/fuse_log.c
> > > > +++ b/tools/virtiofsd/fuse_log.c
> > > > @@ -8,6 +8,10 @@
> > > >   * See the file COPYING.LIB
> > > >   */
> > > >  
> > > > +#include <stdbool.h>
> > > > +#include <stdio.h>
> > > > +#include <stdarg.h>
> > > > +#include <syslog.h>
> > > >  #include "fuse_log.h"
> > > >  
> > > >  #include <stdarg.h>
> > > 
> > > Why do we need to add these headers if there are no code changes in this
> > > file ?
> > 
> > Thanks, those are leftovers from an earlier version; I've deleted them now.
> > 
> > > > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > > > index f3c8bdf7cb..0abb369b3d 100644
> > > > --- a/tools/virtiofsd/fuse_lowlevel.c
> > > > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > > > @@ -158,19 +158,17 @@ static int fuse_send_msg(struct fuse_session *se, struct fuse_chan *ch,
> > > >      struct fuse_out_header *out = iov[0].iov_base;
> > > >  
> > > >      out->len = iov_length(iov, count);
> > > > -    if (se->debug) {
> > > > -        if (out->unique == 0) {
> > > > -            fuse_log(FUSE_LOG_DEBUG, "NOTIFY: code=%d length=%u\n", out->error,
> > > > -                     out->len);
> > > > -        } else if (out->error) {
> > > > -            fuse_log(FUSE_LOG_DEBUG,
> > > > -                     "   unique: %llu, error: %i (%s), outsize: %i\n",
> > > > -                     (unsigned long long)out->unique, out->error,
> > > > -                     strerror(-out->error), out->len);
> > > > -        } else {
> > > > -            fuse_log(FUSE_LOG_DEBUG, "   unique: %llu, success, outsize: %i\n",
> > > > -                     (unsigned long long)out->unique, out->len);
> > > > -        }
> > > > +    if (out->unique == 0) {
> > > > +        fuse_log(FUSE_LOG_DEBUG, "NOTIFY: code=%d length=%u\n", out->error,
> > > > +                 out->len);
> > > > +    } else if (out->error) {
> > > > +        fuse_log(FUSE_LOG_DEBUG,
> > > > +                 "   unique: %llu, error: %i (%s), outsize: %i\n",
> > > > +                 (unsigned long long)out->unique, out->error,
> > > > +                 strerror(-out->error), out->len);
> > > > +    } else {
> > > > +        fuse_log(FUSE_LOG_DEBUG, "   unique: %llu, success, outsize: %i\n",
> > > > +                 (unsigned long long)out->unique, out->len);
> > > >      }
> > > 
> > > Removing all the 'if (se->debug)' checks means that we take the
> > > performance hit of calling many logging functions in the common
> > > case where debug is disabled. Hopefully 'fuse_log' is smart
> > > enough to avoid printf formatting of the msg + args unless
> > > it is actually going to output the message.
> > 
> > It is; we go through fuse_log (in fuse_log.c, an imported file), which
> > just does the va_start and then calls the log_func that is set later in
> > this patch. The first thing log_func does is check the level and return
> > early if the message would not be output.
> 
> ok then
> 
> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

Thanks!

Dave

> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 059/104] virtiofsd: Add ID to the log with FUSE_LOG_DEBUG level
  2020-01-06 15:18   ` Daniel P. Berrangé
@ 2020-01-06 17:47     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-06 17:47 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, stefanha, vgoyal

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Dec 12, 2019 at 04:38:19PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
> > 
> > virtiofsd has some threads, so we see a lot of logs with debug option.
> > It would be useful for debugging if we can identify the specific thread
> > from the log.
> > 
> > Add ID, which is got by gettid(), to the log with FUSE_LOG_DEBUG level
> > so that we can grep the specific thread.
> > 
> > The log is like as:
> > 
> >   ]# ./virtiofsd -d -o vhost_user_socket=/tmp/vhostqemu0 -o source=/tmp/share0 -o cache=auto
> >   ...
> >   [ID: 00000097]    unique: 12696, success, outsize: 120
> >   [ID: 00000097] virtio_send_msg: elem 18: with 2 in desc of length 120
> >   [ID: 00000003] fv_queue_thread: Got queue event on Queue 1
> >   [ID: 00000003] fv_queue_thread: Queue 1 gave evalue: 1 available: in: 65552 out: 80
> >   [ID: 00000003] fv_queue_thread: Waiting for Queue 1 event
> >   [ID: 00000071] fv_queue_worker: elem 33: with 2 out desc of length 80 bad_in_num=0 bad_out_num=0
> >   [ID: 00000071] unique: 12694, opcode: READ (15), nodeid: 2, insize: 80, pid: 2014
> >   [ID: 00000071] lo_read(ino=2, size=65536, off=131072)
> > 
> > Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
> > ---
> >  tools/virtiofsd/passthrough_ll.c | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> > 
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index 6f398a7ff2..8e00a90e6f 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -42,6 +42,7 @@
> >  #include <cap-ng.h>
> >  #include <dirent.h>
> >  #include <errno.h>
> > +#include <glib.h>
> >  #include <inttypes.h>
> >  #include <limits.h>
> >  #include <pthread.h>
> > @@ -2248,12 +2249,18 @@ static void setup_nofile_rlimit(void)
> >      }
> >  }
> >  
> > -static void log_func(enum fuse_log_level level, const char *fmt, va_list ap)
> > +static void log_func(enum fuse_log_level level, const char *_fmt, va_list ap)
> >  {
> > +    char *fmt = (char *)_fmt;
> 
> Reusing a variable for data that may be const from stack or
> non-const from heap is really gross & asking for trouble.

Yeh, that is a bit of a hack.

> If instead it does:
> 
>     g_autofree *localfmt = NULL;

                 ^ char

> > +
> >      if (current_log_level < level) {
> >          return;
> >      }
> >  
> > +    if (current_log_level == FUSE_LOG_DEBUG) {
> > +        fmt = g_strdup_printf("[ID: %08ld] %s", syscall(__NR_gettid), _fmt);
> 
> Then:
> 
>            localfmt = g_strdup_printf(....)
>            fmt = localfmt;
> 
> > +    }
> > +
> >      if (use_syslog) {
> >          int priority = LOG_ERR;
> >          switch (level) {
> > @@ -2286,6 +2293,10 @@ static void log_func(enum fuse_log_level level, const char *fmt, va_list ap)
> >      } else {
> >          vfprintf(stderr, fmt, ap);
> >      }
> > +
> > +    if (current_log_level == FUSE_LOG_DEBUG) {
> > +        g_free(fmt);
> > +    }
> 
> This can then be deleted.

Yes, that works nicely.

Dave

> >  }
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
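
For reference, a minimal sketch of the ownership split suggested in the
review above: the const "fmt" pointer only ever borrows, and a separate
"localfmt" pointer is the single owner of any heap copy. It uses plain
malloc()/free() rather than GLib so that it builds on its own; in the
GLib version "localfmt" would be declared g_autofree and the explicit
free() would go away. The function names and the buffer-based output are
assumptions made for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return a heap copy of fmt prefixed with
 * "[ID: <tid>] ", or NULL on failure.  The caller owns the result.
 */
char *example_prefix_fmt(const char *fmt, long tid)
{
    int len = snprintf(NULL, 0, "[ID: %08ld] %s", tid, fmt);
    char *out;

    if (len < 0) {
        return NULL;
    }
    out = malloc((size_t)len + 1);
    if (out) {
        snprintf(out, (size_t)len + 1, "[ID: %08ld] %s", tid, fmt);
    }
    return out;
}

/*
 * Writes the (possibly prefixed) format string into dst so the result
 * can be inspected; the real log_func() would pass fmt to vsyslog() or
 * vfprintf() instead.
 */
int example_log_line(char *dst, size_t dstlen, int debug, long tid,
                     const char *fmt)
{
    char *localfmt = NULL; /* sole owner of the heap copy, if any */
    int n;

    assert(dst != NULL && fmt != NULL);

    if (debug) {
        localfmt = example_prefix_fmt(fmt, tid);
        if (localfmt) {
            fmt = localfmt; /* borrow only; never freed through fmt */
        }
    }

    n = snprintf(dst, dstlen, "%s", fmt);
    free(localfmt); /* free(NULL) is a no-op */
    return n;
}
```

Because "fmt" is never freed, it does not matter whether it points at the
caller's string or at the prefixed copy, which is the problem the review
comment raised with the original cast.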




* Re: [PATCH 056/104] virtiofsd: add security guide document
  2020-01-06 15:03   ` Daniel P. Berrangé
@ 2020-01-06 17:53     ` Dr. David Alan Gilbert
  2020-01-07 10:05       ` Daniel P. Berrangé
  0 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-06 17:53 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, stefanha, vgoyal

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Dec 12, 2019 at 04:38:16PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: Stefan Hajnoczi <stefanha@redhat.com>
> > 
> > Many people want to know: what's up with virtiofsd and security?  This
> > document provides the answers!
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  tools/virtiofsd/security.rst | 118 +++++++++++++++++++++++++++++++++++
> 
> Do we need to link this into the rest of QEMU's docs in some
> index page ?

I wonder how; there's an autogenerated thing in
  docs/index.rst

that includes the index of some of the docs subdirectories.
Does that mean we should have this in docs/tools/virtiofsd/security.rst,
plus a docs/tools/index and a docs/tools/virtiofsd/index?

> >  1 file changed, 118 insertions(+)
> >  create mode 100644 tools/virtiofsd/security.rst
> 
> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

Thanks!

Dave

> 
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 050/104] virtiofsd: add seccomp whitelist
  2020-01-06 14:56   ` Daniel P. Berrangé
@ 2020-01-06 18:54     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-06 18:54 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, stefanha, vgoyal

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Dec 12, 2019 at 04:38:10PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: Stefan Hajnoczi <stefanha@redhat.com>
> > 
> > Only allow system calls that are needed by virtiofsd.  All other system
> > calls cause SIGSYS to be directed at the thread and the process will
> > coredump.
> > 
> > Restricting system calls reduces the kernel attack surface and limits
> > what the process can do when compromised.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > with additional entries by:
> > Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
> > Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
> > Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
> > Signed-off-by: piaojun <piaojun@huawei.com>
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
> > ---
> >  Makefile                         |   2 +
> >  tools/virtiofsd/Makefile.objs    |   5 +-
> >  tools/virtiofsd/passthrough_ll.c |   2 +
> >  tools/virtiofsd/seccomp.c        | 141 +++++++++++++++++++++++++++++++
> >  tools/virtiofsd/seccomp.h        |  14 +++
> >  5 files changed, 163 insertions(+), 1 deletion(-)
> >  create mode 100644 tools/virtiofsd/seccomp.c
> >  create mode 100644 tools/virtiofsd/seccomp.h
> > 
> > diff --git a/Makefile b/Makefile
> > index 8a5746d8a0..3f5d04e1f7 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -322,8 +322,10 @@ HELPERS-y =
> >  HELPERS-$(call land,$(CONFIG_SOFTMMU),$(CONFIG_LINUX)) = qemu-bridge-helper$(EXESUF)
> >  
> >  ifdef CONFIG_LINUX
> > +ifdef CONFIG_SECCOMP
> >  HELPERS-y += virtiofsd$(EXESUF)
> >  vhost-user-json-y += tools/virtiofsd/50-qemu-virtiofsd.json
> > +endif
> >  
> >  ifdef CONFIG_VIRGL
> >  ifdef CONFIG_GBM
> > diff --git a/tools/virtiofsd/Makefile.objs b/tools/virtiofsd/Makefile.objs
> > index 67be16332c..941b19f18e 100644
> > --- a/tools/virtiofsd/Makefile.objs
> > +++ b/tools/virtiofsd/Makefile.objs
> > @@ -6,5 +6,8 @@ virtiofsd-obj-y = buffer.o \
> >                    fuse_signals.o \
> >                    fuse_virtio.o \
> >                    helper.o \
> > -                  passthrough_ll.o
> > +                  passthrough_ll.o \
> > +                  seccomp.o
> >  
> > +seccomp.o-cflags := $(SECCOMP_CFLAGS)
> > +seccomp.o-libs := $(SECCOMP_LIBS)
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index 754ef2618b..701608c6df 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -58,6 +58,7 @@
> >  #include <unistd.h>
> >  
> >  #include "passthrough_helpers.h"
> > +#include "seccomp.h"
> >  
> >  #define HAVE_POSIX_FALLOCATE 1
> >  struct lo_map_elem {
> > @@ -2073,6 +2074,7 @@ static void setup_sandbox(struct lo_data *lo, struct fuse_session *se)
> >  {
> >      setup_namespaces(lo, se);
> >      setup_mounts(lo->source);
> > +    setup_seccomp();
> >  }
> >  
> >  int main(int argc, char *argv[])
> > diff --git a/tools/virtiofsd/seccomp.c b/tools/virtiofsd/seccomp.c
> > new file mode 100644
> > index 0000000000..6359bb55bb
> > --- /dev/null
> > +++ b/tools/virtiofsd/seccomp.c
> > @@ -0,0 +1,141 @@
> > +/*
> > + * Seccomp sandboxing for virtiofsd
> > + *
> > + * Copyright (C) 2019 Red Hat, Inc.
> > + *
> > + * SPDX-License-Identifier: GPL-2.0-or-later
> > + */
> > +
> > +#include "seccomp.h"
> > +#include "fuse_i.h"
> > +#include "fuse_log.h"
> > +#include <errno.h>
> > +#include <glib.h>
> > +#include <seccomp.h>
> > +#include <stdlib.h>
> > +
> > +/* Bodge for libseccomp 2.4.2 which broke ppoll */
> > +#if !defined(__SNR_ppoll) && defined(__SNR_brk)
> > +#ifdef __NR_ppoll
> > +#define __SNR_ppoll __NR_ppoll
> > +#else
> > +#define __SNR_ppoll __PNR_ppoll
> > +#endif
> > +#endif
> > +
> > +static const int syscall_whitelist[] = {
> > +    /* TODO ireg sem*() syscalls */
> > +    SCMP_SYS(brk),
> > +    SCMP_SYS(capget), /* For CAP_FSETID */
> > +    SCMP_SYS(capset),
> > +    SCMP_SYS(clock_gettime),
> > +    SCMP_SYS(clone),
> 
> clone2? clone3? IIUC some archs in Linux
> will require the newer variants.

It looks like clone2 was an Itanium-only thing; let's ignore that.
clone3 is very new, so we're going to have to do:

  #ifdef __NR_clone3
  SCMP_SYS(clone3),
  #endif

> > +    SCMP_SYS(close),
> > +    SCMP_SYS(copy_file_range),
> > +    SCMP_SYS(dup),
> > +    SCMP_SYS(eventfd2),
> > +    SCMP_SYS(exit),
> > +    SCMP_SYS(exit_group),
> > +    SCMP_SYS(fallocate),
> > +    SCMP_SYS(fchmodat),
> > +    SCMP_SYS(fchownat),
> > +    SCMP_SYS(fcntl),
> > +    SCMP_SYS(fdatasync),
> > +    SCMP_SYS(fgetxattr),
> > +    SCMP_SYS(flistxattr),
> > +    SCMP_SYS(flock),
> > +    SCMP_SYS(fremovexattr),
> > +    SCMP_SYS(fsetxattr),
> > +    SCMP_SYS(fstat),
> > +    SCMP_SYS(fstatfs),
> > +    SCMP_SYS(fsync),
> > +    SCMP_SYS(ftruncate),
> > +    SCMP_SYS(futex),
> > +    SCMP_SYS(getdents),
> > +    SCMP_SYS(getdents64),
> > +    SCMP_SYS(getegid),
> > +    SCMP_SYS(geteuid),
> > +    SCMP_SYS(getpid),
> > +    SCMP_SYS(gettid),
> > +    SCMP_SYS(gettimeofday),
> > +    SCMP_SYS(linkat),
> > +    SCMP_SYS(lseek),
> > +    SCMP_SYS(madvise),
> > +    SCMP_SYS(mkdirat),
> > +    SCMP_SYS(mknodat),
> > +    SCMP_SYS(mmap),
> > +    SCMP_SYS(mprotect),
> > +    SCMP_SYS(mremap),
> > +    SCMP_SYS(munmap),
> > +    SCMP_SYS(newfstatat),
> > +    SCMP_SYS(open),
> > +    SCMP_SYS(openat),
> > +    SCMP_SYS(ppoll),
> > +    SCMP_SYS(prctl), /* TODO restrict to just PR_SET_NAME? */
> > +    SCMP_SYS(preadv),
> > +    SCMP_SYS(pread64),
> > +    SCMP_SYS(pwritev),
> > +    SCMP_SYS(pwrite64),
> > +    SCMP_SYS(read),
> > +    SCMP_SYS(readlinkat),
> > +    SCMP_SYS(recvmsg),
> > +    SCMP_SYS(renameat),
> > +    SCMP_SYS(renameat2),
> > +    SCMP_SYS(rt_sigaction),
> > +    SCMP_SYS(rt_sigprocmask),
> > +    SCMP_SYS(rt_sigreturn),
> > +    SCMP_SYS(sendmsg),
> > +    SCMP_SYS(setresgid),
> 
> Should be setresgid32 instead I think. We don't
> want the legacy syscall that's limited to 16-bit GIDs.
> 
> Needs the code fix I mention in an earlier patch too.
> 
> > +    SCMP_SYS(setresuid),
> 
> Same as above

OK.
Interestingly, I see setresuid/setresgid blacklisted as SET_PRIVILEGED
in qemu's qemu-seccomp.c but not the 32 versions; perhaps those should
be added - but then I don't understand why qemu would ever allow them to
be enabled.

Dave

> > +    SCMP_SYS(set_robust_list),
> > +    SCMP_SYS(symlinkat),
> > +    SCMP_SYS(time), /* Rarely needed, except on static builds */
> > +    SCMP_SYS(tgkill),
> > +    SCMP_SYS(unlinkat),
> > +    SCMP_SYS(utimensat),
> > +    SCMP_SYS(write),
> > +    SCMP_SYS(writev),
> > +};
> 
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
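
A minimal sketch of the conditional-entry idea discussed above, for
readers who want to see it in isolation. It deliberately does not use
libseccomp: a plain array of raw syscall numbers stands in for the
SCMP_SYS() table, and the array and function names are hypothetical. The
point is only that an entry guarded by "#ifdef __NR_clone3" drops out
cleanly when the kernel headers are too old to define it.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>

/* Illustrative subset only; the real list is the one in the patch. */
const long example_allowlist[] = {
    __NR_brk,
    __NR_clone,
#ifdef __NR_clone3
    __NR_clone3, /* only present with new enough kernel headers */
#endif
    __NR_close,
};

/* Return 1 if nr is in the list above, 0 otherwise. */
int example_syscall_allowed(long nr)
{
    size_t n = sizeof(example_allowlist) / sizeof(example_allowlist[0]);
    size_t i;

    assert(n > 0);

    for (i = 0; i < n; i++) {
        if (example_allowlist[i] == nr) {
            return 1;
        }
    }
    return 0;
}
```

Note that the guarded line ends in a comma, not a semicolon, because it
is an element of an array initialiser.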




* Re: [PATCH 032/104] virtiofsd: passthrough_ll: create new files in caller's context
  2020-01-06 14:30   ` Daniel P. Berrangé
@ 2020-01-06 19:00     ` Dr. David Alan Gilbert
  2020-01-06 19:08       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-06 19:00 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, stefanha, vgoyal

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Dec 12, 2019 at 04:37:52PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: Vivek Goyal <vgoyal@redhat.com>
> > 
> > We need to create files in the caller's context. Otherwise after
> > creating a file, the caller might not be able to do file operations on
> > that file.
> > 
> > Changed effective uid/gid to caller's uid/gid, create file and then
> > switch back to uid/gid 0.
> > 
> > Use syscall(setresuid, ...) otherwise glibc does some magic to change EUID
> > in all threads, which is not what we want.
> > 
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> > ---
> >  tools/virtiofsd/passthrough_ll.c | 79 ++++++++++++++++++++++++++++++--
> >  1 file changed, 74 insertions(+), 5 deletions(-)
> > 
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index 68bacb6fc5..0188cd9ad6 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> 
> 
> > +static int lo_change_cred(fuse_req_t req, struct lo_cred *old)
> > +{
> > +    int res;
> > +
> > +    old->euid = geteuid();
> > +    old->egid = getegid();
> > +
> > +    res = syscall(SYS_setresgid, -1, fuse_req_ctx(req)->gid, -1);
> 
> Do we need to be using  SYS_setres[u,g]id32 instead...
> 
> [quote setresgid(2)]
>        The original Linux setresuid() and setresgid() system  calls
>        supported  only  16-bit  user  and group IDs.  Subsequently,
>        Linux 2.4 added setresuid32() and setresgid32(),  supporting
>        32-bit  IDs.   The glibc setresuid() and setresgid() wrapper
>        functions transparently deal with the variations across
>        kernel versions.
> [/quote]

OK, updated.

Dave

> > +    if (res == -1) {
> > +        return errno;
> > +    }
> > +
> > +    res = syscall(SYS_setresuid, -1, fuse_req_ctx(req)->uid, -1);
> > +    if (res == -1) {
> > +        int errno_save = errno;
> > +
> > +        syscall(SYS_setresgid, -1, old->egid, -1);
> > +        return errno_save;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +/* Regain Privileges */
> > +static void lo_restore_cred(struct lo_cred *old)
> > +{
> > +    int res;
> > +
> > +    res = syscall(SYS_setresuid, -1, old->euid, -1);
> > +    if (res == -1) {
> > +        fuse_log(FUSE_LOG_ERR, "seteuid(%u): %m\n", old->euid);
> > +        exit(1);
> > +    }
> > +
> > +    res = syscall(SYS_setresgid, -1, old->egid, -1);
> > +    if (res == -1) {
> > +        fuse_log(FUSE_LOG_ERR, "setegid(%u): %m\n", old->egid);
> > +        exit(1);
> > +    }
> > +}
> > +
> >  static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
> >                               const char *name, mode_t mode, dev_t rdev,
> >                               const char *link)
> > @@ -391,12 +443,21 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
> >      int saverr;
> >      struct lo_inode *dir = lo_inode(req, parent);
> >      struct fuse_entry_param e;
> > +    struct lo_cred old = {};
> >  
> >      saverr = ENOMEM;
> >  
> > +    saverr = lo_change_cred(req, &old);
> > +    if (saverr) {
> > +        goto out;
> > +    }
> > +
> >      res = mknod_wrapper(dir->fd, name, link, mode, rdev);
> >  
> >      saverr = errno;
> > +
> > +    lo_restore_cred(&old);
> > +
> >      if (res == -1) {
> >          goto out;
> >      }
> > @@ -794,26 +855,34 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
> >      struct lo_data *lo = lo_data(req);
> >      struct fuse_entry_param e;
> >      int err;
> > +    struct lo_cred old = {};
> >  
> >      if (lo_debug(req)) {
> >          fuse_log(FUSE_LOG_DEBUG, "lo_create(parent=%" PRIu64 ", name=%s)\n",
> >                   parent, name);
> >      }
> >  
> > +    err = lo_change_cred(req, &old);
> > +    if (err) {
> > +        goto out;
> > +    }
> > +
> >      fd = openat(lo_fd(req, parent), name, (fi->flags | O_CREAT) & ~O_NOFOLLOW,
> >                  mode);
> > -    if (fd == -1) {
> > -        return (void)fuse_reply_err(req, errno);
> > -    }
> > +    err = fd == -1 ? errno : 0;
> > +    lo_restore_cred(&old);
> >  
> > -    fi->fh = fd;
> > +    if (!err) {
> > +        fi->fh = fd;
> > +        err = lo_do_lookup(req, parent, name, &e);
> > +    }
> >      if (lo->cache == CACHE_NEVER) {
> >          fi->direct_io = 1;
> >      } else if (lo->cache == CACHE_ALWAYS) {
> >          fi->keep_cache = 1;
> >      }
> >  
> > -    err = lo_do_lookup(req, parent, name, &e);
> > +out:
> >      if (err) {
> >          fuse_reply_err(req, err);
> >      } else {
> > -- 
> > 2.23.0
> > 
> > 
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
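
For readers following the credential-switching logic, here is a
self-contained sketch of the save/switch/restore pattern the quoted patch
describes. It calls the raw syscalls so that only the calling thread is
affected, as the commit message explains. The struct and function names
are hypothetical, the restore path returns an error instead of calling
exit() as the patch does, and SYS_setresuid/SYS_setresgid are used as-is,
so the 16-bit versus 32-bit question raised in the review is left open
here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

struct example_cred {
    uid_t euid;
    gid_t egid;
};

/* Switch this thread's effective IDs; returns 0 or an errno value. */
int example_change_cred(uid_t uid, gid_t gid, struct example_cred *old)
{
    assert(old != NULL);

    old->euid = geteuid();
    old->egid = getegid();

    /* gid first: after dropping the uid we may not be allowed to. */
    if (syscall(SYS_setresgid, (gid_t)-1, gid, (gid_t)-1) == -1) {
        return errno;
    }
    if (syscall(SYS_setresuid, (uid_t)-1, uid, (uid_t)-1) == -1) {
        int errno_save = errno;

        /* Undo the gid change before reporting the failure. */
        syscall(SYS_setresgid, (gid_t)-1, old->egid, (gid_t)-1);
        return errno_save;
    }
    return 0;
}

/* Restore in the opposite order: uid first to regain privilege. */
int example_restore_cred(const struct example_cred *old)
{
    assert(old != NULL);

    if (syscall(SYS_setresuid, (uid_t)-1, old->euid, (uid_t)-1) == -1) {
        return errno;
    }
    if (syscall(SYS_setresgid, (gid_t)-1, old->egid, (gid_t)-1) == -1) {
        return errno;
    }
    return 0;
}
```

Switching to the IDs the thread already has is permitted without
privilege, which makes the pair easy to exercise as an ordinary user.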




* Re: [PATCH 032/104] virtiofsd: passthrough_ll: create new files in caller's context
  2020-01-06 19:00     ` Dr. David Alan Gilbert
@ 2020-01-06 19:08       ` Dr. David Alan Gilbert
  2020-01-07  9:22         ` Daniel P. Berrangé
  0 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-06 19:08 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, stefanha, vgoyal

* Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > On Thu, Dec 12, 2019 at 04:37:52PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > From: Vivek Goyal <vgoyal@redhat.com>
> > > 
> > > We need to create files in the caller's context. Otherwise after
> > > creating a file, the caller might not be able to do file operations on
> > > that file.
> > > 
> > > Changed effective uid/gid to caller's uid/gid, create file and then
> > > switch back to uid/gid 0.
> > > 
> > > Use syscall(setresuid, ...) otherwise glibc does some magic to change EUID
> > > in all threads, which is not what we want.
> > > 
> > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > > Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> > > ---
> > >  tools/virtiofsd/passthrough_ll.c | 79 ++++++++++++++++++++++++++++++--
> > >  1 file changed, 74 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > index 68bacb6fc5..0188cd9ad6 100644
> > > --- a/tools/virtiofsd/passthrough_ll.c
> > > +++ b/tools/virtiofsd/passthrough_ll.c
> > 
> > 
> > > +static int lo_change_cred(fuse_req_t req, struct lo_cred *old)
> > > +{
> > > +    int res;
> > > +
> > > +    old->euid = geteuid();
> > > +    old->egid = getegid();
> > > +
> > > +    res = syscall(SYS_setresgid, -1, fuse_req_ctx(req)->gid, -1);
> > 
> > Do we need to be using  SYS_setres[u,g]id32 instead...
> > 
> > [quote setresgid(2)]
> >        The original Linux setresuid() and setresgid() system  calls
> >        supported  only  16-bit  user  and group IDs.  Subsequently,
> >        Linux 2.4 added setresuid32() and setresgid32(),  supporting
> >        32-bit  IDs.   The glibc setresuid() and setresgid() wrapper
> >        functions transparently deal with the variations across
> >        kernel versions.
> > [/quote]
> 
> OK, updated.

Hmm, hang on; this is messy.  x86-64 only seems to have setresuid,
whereas some architectures have both.  If I'm reading this right, all
64-bit machines have setresuid/gid calling the code that takes the
32-bit ID; some have compat entries for 32-bit syscalls.

I think it's probably more correct to call setresuid here, except
on 32-bit platforms - but how do we tell?

Dave

> Dave
> 
> > > +    if (res == -1) {
> > > +        return errno;
> > > +    }
> > > +
> > > +    res = syscall(SYS_setresuid, -1, fuse_req_ctx(req)->uid, -1);
> > > +    if (res == -1) {
> > > +        int errno_save = errno;
> > > +
> > > +        syscall(SYS_setresgid, -1, old->egid, -1);
> > > +        return errno_save;
> > > +    }
> > > +
> > > +    return 0;
> > > +}
> > > +
> > > +/* Regain Privileges */
> > > +static void lo_restore_cred(struct lo_cred *old)
> > > +{
> > > +    int res;
> > > +
> > > +    res = syscall(SYS_setresuid, -1, old->euid, -1);
> > > +    if (res == -1) {
> > > +        fuse_log(FUSE_LOG_ERR, "seteuid(%u): %m\n", old->euid);
> > > +        exit(1);
> > > +    }
> > > +
> > > +    res = syscall(SYS_setresgid, -1, old->egid, -1);
> > > +    if (res == -1) {
> > > +        fuse_log(FUSE_LOG_ERR, "setegid(%u): %m\n", old->egid);
> > > +        exit(1);
> > > +    }
> > > +}
> > > +
> > >  static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
> > >                               const char *name, mode_t mode, dev_t rdev,
> > >                               const char *link)
> > > @@ -391,12 +443,21 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
> > >      int saverr;
> > >      struct lo_inode *dir = lo_inode(req, parent);
> > >      struct fuse_entry_param e;
> > > +    struct lo_cred old = {};
> > >  
> > >      saverr = ENOMEM;
> > >  
> > > +    saverr = lo_change_cred(req, &old);
> > > +    if (saverr) {
> > > +        goto out;
> > > +    }
> > > +
> > >      res = mknod_wrapper(dir->fd, name, link, mode, rdev);
> > >  
> > >      saverr = errno;
> > > +
> > > +    lo_restore_cred(&old);
> > > +
> > >      if (res == -1) {
> > >          goto out;
> > >      }
> > > @@ -794,26 +855,34 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
> > >      struct lo_data *lo = lo_data(req);
> > >      struct fuse_entry_param e;
> > >      int err;
> > > +    struct lo_cred old = {};
> > >  
> > >      if (lo_debug(req)) {
> > >          fuse_log(FUSE_LOG_DEBUG, "lo_create(parent=%" PRIu64 ", name=%s)\n",
> > >                   parent, name);
> > >      }
> > >  
> > > +    err = lo_change_cred(req, &old);
> > > +    if (err) {
> > > +        goto out;
> > > +    }
> > > +
> > >      fd = openat(lo_fd(req, parent), name, (fi->flags | O_CREAT) & ~O_NOFOLLOW,
> > >                  mode);
> > > -    if (fd == -1) {
> > > -        return (void)fuse_reply_err(req, errno);
> > > -    }
> > > +    err = fd == -1 ? errno : 0;
> > > +    lo_restore_cred(&old);
> > >  
> > > -    fi->fh = fd;
> > > +    if (!err) {
> > > +        fi->fh = fd;
> > > +        err = lo_do_lookup(req, parent, name, &e);
> > > +    }
> > >      if (lo->cache == CACHE_NEVER) {
> > >          fi->direct_io = 1;
> > >      } else if (lo->cache == CACHE_ALWAYS) {
> > >          fi->keep_cache = 1;
> > >      }
> > >  
> > > -    err = lo_do_lookup(req, parent, name, &e);
> > > +out:
> > >      if (err) {
> > >          fuse_reply_err(req, err);
> > >      } else {
> > > -- 
> > > 2.23.0
> > > 
> > > 
> > 
> > Regards,
> > Daniel
> > -- 
> > |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> > |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> > |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 032/104] virtiofsd: passthrough_ll: create new files in caller's context
  2020-01-06 19:08       ` Dr. David Alan Gilbert
@ 2020-01-07  9:22         ` Daniel P. Berrangé
  2020-01-10 13:05           ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07  9:22 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: qemu-devel, stefanha, vgoyal

On Mon, Jan 06, 2020 at 07:08:43PM +0000, Dr. David Alan Gilbert wrote:
> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > On Thu, Dec 12, 2019 at 04:37:52PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > From: Vivek Goyal <vgoyal@redhat.com>
> > > > 
> > > > We need to create files in the caller's context. Otherwise after
> > > > creating a file, the caller might not be able to do file operations on
> > > > that file.
> > > > 
> > > > Changed effective uid/gid to caller's uid/gid, create file and then
> > > > switch back to uid/gid 0.
> > > > 
> > > > Use syscall(setresuid, ...) otherwise glibc does some magic to change EUID
> > > > in all threads, which is not what we want.
> > > > 
> > > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > > > Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> > > > ---
> > > >  tools/virtiofsd/passthrough_ll.c | 79 ++++++++++++++++++++++++++++++--
> > > >  1 file changed, 74 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > > index 68bacb6fc5..0188cd9ad6 100644
> > > > --- a/tools/virtiofsd/passthrough_ll.c
> > > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > 
> > > 
> > > > +static int lo_change_cred(fuse_req_t req, struct lo_cred *old)
> > > > +{
> > > > +    int res;
> > > > +
> > > > +    old->euid = geteuid();
> > > > +    old->egid = getegid();
> > > > +
> > > > +    res = syscall(SYS_setresgid, -1, fuse_req_ctx(req)->gid, -1);
> > > 
> > > Do we need to be using  SYS_setres[u,g]id32 instead...
> > > 
> > > [quote setresgid(2)]
> > >        The original Linux setresuid() and setresgid() system  calls
> > >        supported  only  16-bit  user  and group IDs.  Subsequently,
> > >        Linux 2.4 added setresuid32() and setresgid32(),  supporting
> > >        32-bit  IDs.   The glibc setresuid() and setresgid() wrapper
> > >        functions transparently deal with the variations across
> > >        kernel versions.
> > > [/quote]
> > 
> > OK, updated.
> 
> Hmm, hang on; this is messy.  x86-64 only seems to have setresuid,
> whereas some architectures have both.  If I'm reading this right, all
> 64-bit machines have setresuid/gid calling the code that takes the
> 32-bit ID; some have compat entries for 32-bit syscalls.

Oh yuk.

> I think it's probably more correct to call setresuid here, except
> on 32-bit platforms - but how do we tell?

Is it possible to just do an #ifdef SYS_setresgid32 check to see
if the wider variant exists ?


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
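
The #ifdef check suggested above could look roughly like the sketch
below. The EXAMPLE_* macro names and the helper function are
hypothetical; only SYS_setresgid32, SYS_setresuid32 and their unsuffixed
counterparts come from the system headers. On architectures whose headers
define the 32 variants, those are picked; elsewhere (x86-64, for
instance) the plain names are used.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Prefer the wider variant wherever the headers provide one. */
#ifdef SYS_setresgid32
#define EXAMPLE_SYS_setresgid SYS_setresgid32
#else
#define EXAMPLE_SYS_setresgid SYS_setresgid
#endif

#ifdef SYS_setresuid32
#define EXAMPLE_SYS_setresuid SYS_setresuid32
#else
#define EXAMPLE_SYS_setresuid SYS_setresuid
#endif

/* Set only this thread's effective gid; returns 0 or an errno value. */
int example_set_thread_egid(gid_t gid)
{
    /* -1 means "leave unchanged" to the kernel, so reject it here. */
    assert(gid != (gid_t)-1);

    if (syscall(EXAMPLE_SYS_setresgid, (gid_t)-1, gid, (gid_t)-1) == -1) {
        return errno;
    }
    return 0;
}
```

The same selection would have to be mirrored in the seccomp list, since
whichever syscall number is chosen here is the one that must be allowed.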




* Re: [PATCH 056/104] virtiofsd: add security guide document
  2020-01-06 17:53     ` Dr. David Alan Gilbert
@ 2020-01-07 10:05       ` Daniel P. Berrangé
  2020-01-09 17:02         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 10:05 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: qemu-devel, stefanha, vgoyal

On Mon, Jan 06, 2020 at 05:53:55PM +0000, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > On Thu, Dec 12, 2019 at 04:38:16PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > From: Stefan Hajnoczi <stefanha@redhat.com>
> > > 
> > > Many people want to know: what's up with virtiofsd and security?  This
> > > document provides the answers!
> > > 
> > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > ---
> > >  tools/virtiofsd/security.rst | 118 +++++++++++++++++++++++++++++++++++
> > 
> > Do we need to link this into the rest of QEMU's docs in some
> > index page ?
> 
> I wonder how; there's an autogenerated thing in
>   docs/index.rst
> 
> that includes the index of some of the docs subdirectories.
> Does that mean we should have this in docs/tools/virtiofsd/security.rst,
> plus a docs/tools/index and a docs/tools/virtiofsd/index?

I was wondering if this fits into any of the current three sections
"devel" or "interop" or "specs", but it doesn't feel quite right in
any of them to me. So having a new docs/tools subtree looks like an
ok idea in absence of better suggestions.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 060/104] virtiofsd: Add timestamp to the log with FUSE_LOG_DEBUG level
  2019-12-12 16:38 ` [PATCH 060/104] virtiofsd: Add timestamp " Dr. David Alan Gilbert (git)
@ 2020-01-07 11:11   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 11:11 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:20PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
> 
> virtiofsd has some threads, so we see a lot of logs with debug option.
> It would be useful for debugging if we can see the timestamp.
> 
> Add a nanosecond timestamp, obtained from get_clock(), to the log at the
> FUSE_LOG_DEBUG level if the syslog option isn't set.
> 
> The log looks like this:
> 
>   ]# ./virtiofsd -d -o vhost_user_socket=/tmp/vhostqemu0 -o source=/tmp/share0 -o cache=auto
>   ...
>   [5365943125463727] [ID: 00000002] fv_queue_thread: Start for queue 0 kick_fd 9
>   [5365943125568644] [ID: 00000002] fv_queue_thread: Waiting for Queue 0 event
>   [5365943125573561] [ID: 00000002] fv_queue_thread: Got queue event on Queue 0
> 
> Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
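
A rough sketch of how a prefix like the one in the sample log could be
produced. QEMU's get_clock() is not reproduced here; the sketch assumes a
monotonic clock read through clock_gettime() as a stand-in, and both
function names are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <time.h>

/* Stand-in for get_clock(): monotonic time in nanoseconds, or -1. */
int64_t example_clock_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
        return -1;
    }
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Write "[<ns>] [ID: <tid>] " into dst; returns snprintf()'s result. */
int example_format_prefix(char *dst, size_t len, int64_t ns, long tid)
{
    assert(dst != NULL && len > 0);

    return snprintf(dst, len, "[%" PRId64 "] [ID: %08ld] ", ns, tid);
}
```

A monotonic source keeps the ordering of log lines stable even if the
wall clock is adjusted while the daemon is running.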




* Re: [PATCH 061/104] virtiofsd: Handle reinit
  2019-12-12 16:38 ` [PATCH 061/104] virtiofsd: Handle reinit Dr. David Alan Gilbert (git)
@ 2020-01-07 11:12   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 11:12 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:21PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Allow init->destroy->init  for mount->umount->mount
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.c | 2 ++
>  1 file changed, 2 insertions(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 062/104] virtiofsd: Handle hard reboot
  2019-12-12 16:38 ` [PATCH 062/104] virtiofsd: Handle hard reboot Dr. David Alan Gilbert (git)
@ 2020-01-07 11:14   ` Daniel P. Berrangé
  2020-01-10 15:43     ` Dr. David Alan Gilbert
  2020-01-20  6:46   ` Misono Tomohiro
  1 sibling, 1 reply; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 11:14 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:22PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Handle a
>   mount
>   hard reboot (without unmount)
>   mount
> 
> we get another 'init' which FUSE doesn't normally expect.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> index 2d1d1a2e59..45125ef66a 100644
> --- a/tools/virtiofsd/fuse_lowlevel.c
> +++ b/tools/virtiofsd/fuse_lowlevel.c
> @@ -2436,7 +2436,21 @@ void fuse_session_process_buf_int(struct fuse_session *se,
>              goto reply_err;
>          }
>      } else if (in->opcode == FUSE_INIT || in->opcode == CUSE_INIT) {
> -        goto reply_err;
> +        if (fuse_lowlevel_is_virtio(se)) {
> +            /*
> +             * TODO: This is after a hard reboot typically, we need to do
> +             * a destroy, but we can't reply to this request yet so
> +             * we can't use do_destroy
> +             */
> +            fuse_log(FUSE_LOG_DEBUG, "%s: reinit\n", __func__);
> +            se->got_destroy = 1;
> +            se->got_init = 0;
> +            if (se->op.destroy) {
> +                se->op.destroy(se->userdata);
> +            }
> +        } else {
> +            goto reply_err;
> +        }

In doing this, is there any danger we're exposed to from a malicious
guest which does

   mount
   mount

without a reboot in between?

I don't think so, so if it's OK, then

 Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
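
As a rough illustration of the question above (not code from the patch), the
reinit path can be modelled as a tiny state machine. The struct and function
names below are hypothetical stand-ins for struct fuse_session and the
FUSE_INIT branch in fuse_session_process_buf_int(). The point of the sketch
is that the daemon cannot tell a hard reboot from a guest that simply sends
INIT twice, so both take the same teardown path:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct fuse_session. */
struct mini_session {
    bool is_virtio;                  /* stands in for fuse_lowlevel_is_virtio() */
    int got_init;
    int got_destroy;
    void (*destroy)(void *userdata); /* stands in for se->op.destroy */
    void *userdata;
};

/*
 * Called when an INIT arrives on a session that is already initialised.
 * Returns 0 if the old state was torn down and INIT may proceed,
 * or -1 if the request must be rejected (the 'goto reply_err' path).
 */
static int mini_handle_reinit(struct mini_session *se)
{
    assert(se != NULL && se->got_init);

    if (!se->is_virtio) {
        return -1;
    }

    /* Same steps for "hard reboot" and "mount; mount": no way to tell apart. */
    se->got_destroy = 1;
    se->got_init = 0;
    if (se->destroy) {
        se->destroy(se->userdata);
    }
    return 0;
}

/* Example destroy callback that just counts how often it ran. */
static void mini_count_destroy(void *userdata)
{
    (*(int *)userdata)++;
}
```

Under this model a second INIT costs the guest its own cached state and
nothing else, provided the destroy callback is safe to run at any time.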
 

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 063/104] virtiofsd: Kill threads when queues are stopped
  2019-12-12 16:38 ` [PATCH 063/104] virtiofsd: Kill threads when queues are stopped Dr. David Alan Gilbert (git)
@ 2020-01-07 11:16   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 11:16 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:23PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Kill the threads we've started when the queues get stopped.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/fuse_virtio.c | 37 +++++++++++++++++++++++++++++++----
>  1 file changed, 33 insertions(+), 4 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 064/104] vhost-user: Print unexpected slave message types
  2019-12-12 16:38 ` [PATCH 064/104] vhost-user: Print unexpected slave message types Dr. David Alan Gilbert (git)
@ 2020-01-07 11:18   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 11:18 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:24PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> When we receive an unexpected message type on the slave fd, print
> the type.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  hw/virtio/vhost-user.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 065/104] contrib/libvhost-user: Protect slave fd with mutex
  2019-12-12 16:38 ` [PATCH 065/104] contrib/libvhost-user: Protect slave fd with mutex Dr. David Alan Gilbert (git)
@ 2020-01-07 11:19   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 11:19 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:25PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> In future patches we'll be performing commands on the slave-fd driven
> by commands on queues. Since those queues will be driven by individual
> threads, we need to make sure they don't attempt to use the slave-fd
> for multiple commands in parallel.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  contrib/libvhost-user/libvhost-user.c | 24 ++++++++++++++++++++----
>  contrib/libvhost-user/libvhost-user.h |  3 +++
>  2 files changed, 23 insertions(+), 4 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 066/104] virtiofsd: passthrough_ll: add renameat2 support
  2019-12-12 16:38 ` [PATCH 066/104] virtiofsd: passthrough_ll: add renameat2 support Dr. David Alan Gilbert (git)
@ 2020-01-07 11:21   ` Daniel P. Berrangé
  2020-01-10  9:52     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 11:21 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:26PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Miklos Szeredi <mszeredi@redhat.com>
> 
> No glibc support yet, so use syscall().

renameat2() exists in glibc on my Fedora 31 install.

Presumably this comment dates from an older glibc version.

> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 91d3120033..bed2270141 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -1083,7 +1083,17 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
>      }
>  
>      if (flags) {
> +#ifndef SYS_renameat2
>          fuse_reply_err(req, EINVAL);
> +#else
> +        res = syscall(SYS_renameat2, lo_fd(req, parent), name,
> +                      lo_fd(req, newparent), newname, flags);
> +        if (res == -1 && errno == ENOSYS) {
> +            fuse_reply_err(req, EINVAL);
> +        } else {
> +            fuse_reply_err(req, res == -1 ? errno : 0);
> +        }
> +#endif

We should use the formal API as first choice, if it is available.
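
A minimal sketch of what "formal API first" could look like, assuming the
wrapper is detected through the glibc version (glibc has shipped renameat2()
since 2.28). The names compat_renameat2 and COMPAT_HAVE_RENAMEAT2 are made up
for this illustration; in QEMU a configure-time check would be the more usual
way to detect the function:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE           /* needed for the renameat2() declaration */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>            /* AT_FDCWD */
#include <stdio.h>            /* renameat2() on glibc >= 2.28 */
#include <sys/syscall.h>      /* SYS_renameat2, if the headers provide it */
#include <unistd.h>           /* syscall() */

/* Assumption: version check instead of a real configure probe. */
#if defined(__GLIBC__)
# if __GLIBC_PREREQ(2, 28)
#  define COMPAT_HAVE_RENAMEAT2 1
# endif
#endif

/* Hypothetical helper: libc wrapper first, raw syscall second, else ENOSYS. */
static int compat_renameat2(int olddirfd, const char *oldpath,
                            int newdirfd, const char *newpath,
                            unsigned int flags)
{
    assert(oldpath != NULL && newpath != NULL);
#if defined(COMPAT_HAVE_RENAMEAT2)
    return renameat2(olddirfd, oldpath, newdirfd, newpath, flags);
#elif defined(SYS_renameat2)
    return (int)syscall(SYS_renameat2, olddirfd, oldpath,
                        newdirfd, newpath, flags);
#else
    (void)olddirfd;
    (void)newdirfd;
    (void)flags;
    errno = ENOSYS;
    return -1;
#endif
}
```

A caller such as lo_rename() could then keep its existing ENOSYS-to-EINVAL
mapping unchanged, whichever branch was compiled in.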


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 067/104] virtiofsd: passthrough_ll: disable readdirplus on cache=never
  2019-12-12 16:38 ` [PATCH 067/104] virtiofsd: passthrough_ll: disable readdirplus on cache=never Dr. David Alan Gilbert (git)
@ 2020-01-07 11:22   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 11:22 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:27PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Miklos Szeredi <mszeredi@redhat.com>
> 
> ...because the attributes sent in the READDIRPLUS reply would be discarded
> anyway.
> 
> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 4 ++++
>  1 file changed, 4 insertions(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 068/104] virtiofsd: passthrough_ll: control readdirplus
  2019-12-12 16:38 ` [PATCH 068/104] virtiofsd: passthrough_ll: control readdirplus Dr. David Alan Gilbert (git)
@ 2020-01-07 11:23   ` Daniel P. Berrangé
  2020-01-10 15:04     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 11:23 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:28PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Miklos Szeredi <mszeredi@redhat.com>
>

What is readdirplus, and why do we need a command line option to
control it? What's the user benefit of changing the setting?

> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 0d70a367bd..c3e8bde5cf 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -118,6 +118,8 @@ struct lo_data {
>      double timeout;
>      int cache;
>      int timeout_set;
> +    int readdirplus_set;
> +    int readdirplus_clear;
>      struct lo_inode root; /* protected by lo->mutex */
>      struct lo_map ino_map; /* protected by lo->mutex */
>      struct lo_map dirp_map; /* protected by lo->mutex */
> @@ -141,6 +143,8 @@ static const struct fuse_opt lo_opts[] = {
>      { "cache=auto", offsetof(struct lo_data, cache), CACHE_NORMAL },
>      { "cache=always", offsetof(struct lo_data, cache), CACHE_ALWAYS },
>      { "norace", offsetof(struct lo_data, norace), 1 },
> +    { "readdirplus", offsetof(struct lo_data, readdirplus_set), 1 },
> +    { "no_readdirplus", offsetof(struct lo_data, readdirplus_clear), 1 },
>      FUSE_OPT_END
>  };
>  static bool use_syslog = false;
> @@ -479,7 +483,8 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
>          fuse_log(FUSE_LOG_DEBUG, "lo_init: activating flock locks\n");
>          conn->want |= FUSE_CAP_FLOCK_LOCKS;
>      }
> -    if (lo->cache == CACHE_NEVER) {
> +    if ((lo->cache == CACHE_NEVER && !lo->readdirplus_set) ||
> +        lo->readdirplus_clear) {
>          fuse_log(FUSE_LOG_DEBUG, "lo_init: disabling readdirplus\n");
>          conn->want &= ~FUSE_CAP_READDIRPLUS;
>      }
> -- 
> 2.23.0
> 
> 
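
For context, FUSE's READDIRPLUS returns directory entries together with
their attributes, which lets the guest avoid a separate LOOKUP per entry.
The condition in the lo_init() hunk above reduces to a small predicate. The
sketch below restates it in isolation; the struct and function names are
invented for illustration and only the three fields come from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum cache_mode { CACHE_NEVER, CACHE_NORMAL, CACHE_ALWAYS };

/* Hypothetical subset of struct lo_data holding only the relevant fields. */
struct rdp_opts {
    enum cache_mode cache;
    int readdirplus_set;    /* "-o readdirplus" was given */
    int readdirplus_clear;  /* "-o no_readdirplus" was given */
};

/* True when FUSE_CAP_READDIRPLUS should be removed from conn->want. */
static bool readdirplus_disabled(const struct rdp_opts *o)
{
    assert(o != NULL);
    return (o->cache == CACHE_NEVER && !o->readdirplus_set) ||
           o->readdirplus_clear;
}
```

Read this way, "readdirplus" only overrides the cache=never default, while
"no_readdirplus" wins whenever it is present, including when both options
are given.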

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 069/104] virtiofsd: rename unref_inode() to unref_inode_lolocked()
  2019-12-12 16:38 ` [PATCH 069/104] virtiofsd: rename unref_inode() to unref_inode_lolocked() Dr. David Alan Gilbert (git)
@ 2020-01-07 11:23   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 11:23 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:29PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Miklos Szeredi <mszeredi@redhat.com>
> 
> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 073/104] virtiofsd: passthrough_ll: clean up cache related options
  2019-12-12 16:38 ` [PATCH 073/104] virtiofsd: passthrough_ll: clean up cache related options Dr. David Alan Gilbert (git)
@ 2020-01-07 11:24   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 11:24 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:33PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Miklos Szeredi <mszeredi@redhat.com>
> 
>  - Rename "cache=never" to "cache=none" to match 9p's similar option.
> 
>  - Rename CACHE_NORMAL constant to CACHE_AUTO to match the "cache=auto"
>    option.
> 
> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 074/104] virtiofsd: passthrough_ll: use hashtable
  2019-12-12 16:38 ` [PATCH 074/104] virtiofsd: passthrough_ll: use hashtable Dr. David Alan Gilbert (git)
@ 2020-01-07 11:28   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 11:28 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:34PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Miklos Szeredi <mszeredi@redhat.com>
> 
> Improve performance of inode lookup by using a hash table.
> 
> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 81 ++++++++++++++++++--------------
>  1 file changed, 45 insertions(+), 36 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 075/104] virtiofsd: Clean up inodes on destroy
  2019-12-12 16:38 ` [PATCH 075/104] virtiofsd: Clean up inodes on destroy Dr. David Alan Gilbert (git)
@ 2020-01-07 11:29   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 11:29 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:35PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Clear out our inodes and fd's on a 'destroy' - so we get rid
> of them if we reboot the guest.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 076/104] virtiofsd: support nanosecond resolution for file timestamp
  2019-12-12 16:38 ` [PATCH 076/104] virtiofsd: support nanosecond resolution for file timestamp Dr. David Alan Gilbert (git)
@ 2020-01-07 11:30   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 11:30 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:36PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Jiufei Xue <jiufei.xue@linux.alibaba.com>
> 
> Define HAVE_STRUCT_STAT_ST_ATIM to 1 if `st_atim' is a member of `struct
> stat', which means nanosecond resolution is supported for the file
> timestamp fields.
> 
> Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
> ---
>  configure                   | 16 ++++++++++++++++
>  tools/virtiofsd/fuse_misc.h |  1 +
>  2 files changed, 17 insertions(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 077/104] virtiofsd: fix error handling in main()
  2019-12-12 16:38 ` [PATCH 077/104] virtiofsd: fix error handling in main() Dr. David Alan Gilbert (git)
@ 2020-01-07 11:30   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 11:30 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:37PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Liu Bo <bo.liu@linux.alibaba.com>
> 
> Neither fuse_parse_cmdline() nor fuse_opt_parse() goes to the right place
> to do cleanup.
> 
> Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 078/104] virtiofsd: cleanup allocated resource in se
  2019-12-12 16:38 ` [PATCH 078/104] virtiofsd: cleanup allocated resource in se Dr. David Alan Gilbert (git)
@ 2020-01-07 11:34   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 11:34 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:38PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Liu Bo <bo.liu@linux.alibaba.com>
> 
> This cleans up unfreed resources in se on quitting, including
> se->virtio_dev, se->vu_socket_path, se->vu_socketfd.
> 
> Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.c | 7 +++++++
>  tools/virtiofsd/fuse_virtio.c   | 7 +++++++
>  tools/virtiofsd/fuse_virtio.h   | 2 +-
>  3 files changed, 15 insertions(+), 1 deletion(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 079/104] virtiofsd: fix memory leak on lo.source
  2019-12-12 16:38 ` [PATCH 079/104] virtiofsd: fix memory leak on lo.source Dr. David Alan Gilbert (git)
@ 2020-01-07 11:37   ` Daniel P. Berrangé
  2020-01-09 17:38     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 11:37 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:39PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Liu Bo <bo.liu@linux.alibaba.com>
> 
> valgrind reported that lo.source is leaked on quitting, but it was defined
> as (const char*) as it may point to a const string "/".
> 
> Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 33092de65a..45cf466178 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -2529,9 +2529,8 @@ int main(int argc, char *argv[])
>              fuse_log(FUSE_LOG_ERR, "source is not a directory\n");
>              exit(1);
>          }
> -
>      } else {
> -        lo.source = "/";
> +        lo.source = strdup("/");
>      }
>      lo.root.is_symlink = false;
>      if (!lo.timeout_set) {
> @@ -2610,5 +2609,7 @@ err_out1:
>          close(lo.root.fd);
>      }
>  
> +    free((char *)lo.source);

Can we not change the 'lo_data' struct so that source is not const,
and thus avoid freeing a const field?


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 080/104] virtiofsd: add helper for lo_data cleanup
  2019-12-12 16:38 ` [PATCH 080/104] virtiofsd: add helper for lo_data cleanup Dr. David Alan Gilbert (git)
@ 2020-01-07 11:40   ` Daniel P. Berrangé
  2020-01-09 17:41     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 11:40 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:40PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Liu Bo <bo.liu@linux.alibaba.com>
> 
> This offers a helper function for lo_data's cleanup.
> 
> Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 37 ++++++++++++++++++--------------
>  1 file changed, 21 insertions(+), 16 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 45cf466178..097033aa00 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -2439,6 +2439,26 @@ static gboolean lo_key_equal(gconstpointer a, gconstpointer b)
>      return la->ino == lb->ino && la->dev == lb->dev;
>  }
>  
> +static void fuse_lo_data_cleanup(struct lo_data *lo)
> +{
> +    if (lo->inodes) {
> +        g_hash_table_destroy(lo->inodes);
> +    }
> +    lo_map_destroy(&lo->fd_map);
> +    lo_map_destroy(&lo->dirp_map);
> +    lo_map_destroy(&lo->ino_map);
> +
> +    if (lo->proc_self_fd >= 0) {
> +        close(lo->proc_self_fd);
> +    }
> +
> +    if (lo->root.fd >= 0) {
> +        close(lo->root.fd);
> +    }
> +
> +    free((char *)lo->source);

This will need changing if you follow my comment on the previous patch
about removing the const and the cast.

> +}
> +
>  int main(int argc, char *argv[])
>  {
>      struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
> @@ -2594,22 +2614,7 @@ err_out2:
>  err_out1:
>      fuse_opt_free_args(&args);
>  
> -    if (lo.inodes) {
> -        g_hash_table_destroy(lo.inodes);
> -    }
> -    lo_map_destroy(&lo.fd_map);
> -    lo_map_destroy(&lo.dirp_map);
> -    lo_map_destroy(&lo.ino_map);
> -
> -    if (lo.proc_self_fd >= 0) {
> -        close(lo.proc_self_fd);
> -    }
> -
> -    if (lo.root.fd >= 0) {
> -        close(lo.root.fd);
> -    }
> -
> -    free((char *)lo.source);
> +    fuse_lo_data_cleanup(&lo);
>  
>      return ret ? 1 : 0;
>  }
> -- 
> 2.23.0
> 
> 

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 081/104] virtiofsd: Prevent multiply running with same vhost_user_socket
  2019-12-12 16:38 ` [PATCH 081/104] virtiofsd: Prevent multiply running with same vhost_user_socket Dr. David Alan Gilbert (git)
@ 2020-01-07 11:43   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 11:43 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:41PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
> 
> Multiple virtiofsd instances can run even if they use the same
> vhost_user_socket path.
> 
>   ]# ./virtiofsd -o vhost_user_socket=/tmp/vhostqemu -o source=/tmp/share &
>   [1] 244965
>   virtio_session_mount: Waiting for vhost-user socket connection...
>   ]# ./virtiofsd -o vhost_user_socket=/tmp/vhostqemu -o source=/tmp/share &
>   [2] 244966
>   virtio_session_mount: Waiting for vhost-user socket connection...
>   ]#
> 
> This can confuse the user and may cause unexpected problems, so it's
> better to prevent multiple instances from running.
> 
> Create a regular file under the localstatedir directory to guard the
> vhost_user_socket. To create and lock the file, use qemu_write_pidfile()
> because the API has some sanity checks and file locking.
> 
> Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>   Applied fixes from Stefan's review and moved osdep include
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.c |  1 +
>  tools/virtiofsd/fuse_virtio.c   | 49 ++++++++++++++++++++++++++++++++-
>  2 files changed, 49 insertions(+), 1 deletion(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 082/104] virtiofsd: enable PARALLEL_DIROPS during INIT
  2019-12-12 16:38 ` [PATCH 082/104] virtiofsd: enable PARALLEL_DIROPS during INIT Dr. David Alan Gilbert (git)
@ 2020-01-07 11:44   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 11:44 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:42PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Liu Bo <bo.liu@linux.alibaba.com>
> 
> lookup is a read-only operation, so PARALLEL_DIROPS can be enabled.
> 
> Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.c | 3 +++
>  1 file changed, 3 insertions(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 083/104] virtiofsd: fix incorrect error handling in lo_do_lookup
  2019-12-12 16:38 ` [PATCH 083/104] virtiofsd: fix incorrect error handling in lo_do_lookup Dr. David Alan Gilbert (git)
@ 2020-01-07 11:45   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 11:45 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:43PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Eric Ren <renzhen@linux.alibaba.com>
> 
> Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 1 -
>  1 file changed, 1 deletion(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 086/104] virtiofsd: use fuse_lowlevel_is_virtio() in fuse_session_destroy()
  2019-12-12 16:38 ` [PATCH 086/104] virtiofsd: use fuse_lowlevel_is_virtio() in fuse_session_destroy() Dr. David Alan Gilbert (git)
@ 2020-01-07 12:01   ` Daniel P. Berrangé
  2020-01-07 13:24     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 12:01 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:46PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> vu_socket_path is NULL when --fd=FDNUM was used.  Use
> fuse_lowlevel_is_virtio() instead.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>   pull request 10

Extraneous line

> ---
>  tools/virtiofsd/fuse_lowlevel.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 087/104] virtiofsd: prevent fv_queue_thread() vs virtio_loop() races
  2019-12-12 16:38 ` [PATCH 087/104] virtiofsd: prevent fv_queue_thread() vs virtio_loop() races Dr. David Alan Gilbert (git)
@ 2020-01-07 12:02   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 12:02 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:47PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> We call into libvhost-user from the virtqueue handler thread and the
> vhost-user message processing thread without a lock.  There is nothing
> protecting the virtqueue handler thread if the vhost-user message
> processing thread changes the virtqueue or memory table while it is
> running.
> 
> This patch introduces a read-write lock.  Virtqueue handler threads are
> readers.  The vhost-user message processing thread is a writer.  This
> will allow concurrency for multiqueue in the future while protecting
> against fv_queue_thread() vs virtio_loop() races.
> 
> Note that the critical sections could be made smaller but it would be
> more invasive and require libvhost-user changes.  Let's start simple and
> improve performance later, if necessary.  Another option would be an
> RCU-style approach with lighter-weight primitives.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>   Merged with log changes
>   pull request 12

Two extraneous lines

> ---
>  tools/virtiofsd/fuse_virtio.c | 34 +++++++++++++++++++++++++++++++++-
>  1 file changed, 33 insertions(+), 1 deletion(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 088/104] virtiofsd: make lo_release() atomic
  2019-12-12 16:38 ` [PATCH 088/104] virtiofsd: make lo_release() atomic Dr. David Alan Gilbert (git)
@ 2020-01-07 12:03   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 12:03 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:48PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Hold the lock across both lo_map_get() and lo_map_remove() to prevent
> races between two FUSE_RELEASE requests.  In this case I don't see a
> serious bug but it's safer to do things atomically.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
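
The idea of holding one lock across both the lookup and the removal can be sketched as below. This is an illustration only: `struct fd_map`, `fd_map_put()` and `fd_map_take()` are hypothetical and much simpler than the real lo_map helpers in passthrough_ll.c.

```c
#include <pthread.h>
#include <stddef.h>

#define MAP_SIZE 16 /* hypothetical fixed-size table */

struct fd_map {
    pthread_mutex_t mutex;
    int fds[MAP_SIZE];
    int in_use[MAP_SIZE];
};

static void fd_map_init(struct fd_map *m)
{
    pthread_mutex_init(&m->mutex, NULL);
    for (size_t i = 0; i < MAP_SIZE; i++) {
        m->fds[i] = -1;
        m->in_use[i] = 0;
    }
}

/* Store fd under key; returns 0 on success, -1 if key is out of range. */
static int fd_map_put(struct fd_map *m, size_t key, int fd)
{
    if (key >= MAP_SIZE) {
        return -1;
    }
    pthread_mutex_lock(&m->mutex);
    m->fds[key] = fd;
    m->in_use[key] = 1;
    pthread_mutex_unlock(&m->mutex);
    return 0;
}

/*
 * Look up and remove in a single critical section, so two concurrent
 * release requests for the same key cannot both obtain the fd.
 * Returns the fd, or -1 if the key is not present.
 */
static int fd_map_take(struct fd_map *m, size_t key)
{
    int fd = -1;

    pthread_mutex_lock(&m->mutex);
    if (key < MAP_SIZE && m->in_use[key]) {
        fd = m->fds[key];
        m->in_use[key] = 0;
    }
    pthread_mutex_unlock(&m->mutex);
    return fd;
}
```

With separate locked get and remove steps, a second caller could see the entry between the two; here the second `fd_map_take()` simply gets -1.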


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 092/104] virtiofsd: add man page
  2019-12-12 16:38 ` [PATCH 092/104] virtiofsd: add man page Dr. David Alan Gilbert (git)
  2019-12-13 14:33   ` Liam Merwick
@ 2020-01-07 12:13   ` Daniel P. Berrangé
  2020-01-09 20:02     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 12:13 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:52PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  Makefile                       |  7 +++
>  tools/virtiofsd/virtiofsd.texi | 85 ++++++++++++++++++++++++++++++++++
>  2 files changed, 92 insertions(+)
>  create mode 100644 tools/virtiofsd/virtiofsd.texi

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

with some notes at the very end


> diff --git a/Makefile b/Makefile
> index fa15174ba0..53d175d12f 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -357,6 +357,9 @@ DOCS+=docs/qemu-cpu-models.7
>  ifdef CONFIG_VIRTFS
>  DOCS+=fsdev/virtfs-proxy-helper.1
>  endif
> +ifdef CONFIG_LINUX
> +DOCS+=tools/virtiofsd/virtiofsd.1
> +endif
>  ifdef CONFIG_TRACE_SYSTEMTAP
>  DOCS+=scripts/qemu-trace-stap.1
>  endif
> @@ -863,6 +866,9 @@ ifdef CONFIG_VIRTFS
>  	$(INSTALL_DIR) "$(DESTDIR)$(mandir)/man1"
>  	$(INSTALL_DATA) fsdev/virtfs-proxy-helper.1 "$(DESTDIR)$(mandir)/man1"
>  endif
> +ifdef CONFIG_LINUX
> +	$(INSTALL_DATA) tools/virtiofsd/virtiofsd.1 "$(DESTDIR)$(mandir)/man1"
> +endif
>  
>  install-datadir:
>  	$(INSTALL_DIR) "$(DESTDIR)$(qemu_datadir)"
> @@ -1061,6 +1067,7 @@ qemu.1: qemu-doc.texi qemu-options.texi qemu-monitor.texi qemu-monitor-info.texi
>  qemu.1: qemu-option-trace.texi
>  qemu-img.1: qemu-img.texi qemu-option-trace.texi qemu-img-cmds.texi
>  fsdev/virtfs-proxy-helper.1: fsdev/virtfs-proxy-helper.texi
> +tools/virtiofsd/virtiofsd.1: tools/virtiofsd/virtiofsd.texi
>  qemu-nbd.8: qemu-nbd.texi qemu-option-trace.texi
>  docs/qemu-block-drivers.7: docs/qemu-block-drivers.texi
>  docs/qemu-cpu-models.7: docs/qemu-cpu-models.texi
> diff --git a/tools/virtiofsd/virtiofsd.texi b/tools/virtiofsd/virtiofsd.texi
> new file mode 100644
> index 0000000000..eec7fbf4e6
> --- /dev/null
> +++ b/tools/virtiofsd/virtiofsd.texi
> @@ -0,0 +1,85 @@
> +@example
> +@c man begin SYNOPSIS
> +@command{virtiofsd} [OPTION] @option{--socket-path=}@var{path}|@option{--fd=}@var{fdnum} @option{-o source=}@var{path}
> +@c man end
> +@end example
> +
> +@c man begin DESCRIPTION
> +
> +Share a host directory tree with a guest through a virtio-fs device.  This
> +program is a vhost-user backend that implements the virtio-fs device.  Each
> +virtio-fs device instance requires its own virtiofsd process.
> +
> +This program is designed to work with QEMU's @code{--device vhost-user-fs-pci}
> +but should work with any virtual machine monitor (VMM) that supports
> +vhost-user.  See the EXAMPLES section below.
> +
> +This program must be run as the root user.

So there's no way for an unprivileged user to do file sharing like they
can with 9p right now?

>                                              Upon startup the program will
> +switch into a new file system namespace with the shared directory tree as its
> +root.  This prevents "file system escapes" due to symlinks and other file
> +system objects that might lead to files outside the shared directory.  The
> +program also sandboxes itself using seccomp(2) to prevent ptrace(2) and other
> +vectors that could allow an attacker to compromise the system after gaining
> +control of the virtiofsd process.
> +
> +@c man end
> +
> +@c man begin OPTIONS
> +@table @option
> +@item -h, --help
> +Print help.
> +@item -V, --version
> +Print version.
> +@item -d, -o debug
> +Enable debug output.
> +@item --syslog
> +Print log messages to syslog instead of stderr.
> +@item -o log_level=@var{level}
> +Print only log messages matching @var{level} or more severe.  @var{level} is
> +one of @code{err}, @code{warn}, @code{info}, or @code{debug}.  The default is
> +@var{info}.
> +@item -o source=@var{path}
> +Share host directory tree located at @var{path}.  This option is required.
> +@item --socket-path=@var{path}, -o vhost_user_socket=@var{path}
> +Listen on vhost-user UNIX domain socket at @var{path}.
> +@item --fd=@var{fdnum}
> +Accept connections from vhost-user UNIX domain socket file descriptor @var{fdnum}.  The file descriptor must already be listening for connections.
> +@item --thread-pool-size=@var{num}
> +Restrict the number of worker threads per request queue to @var{num}.  The default is 64.
> +@item --cache=@code{none}|@code{auto}|@code{always}
> +Select the desired trade-off between coherency and performance.  @code{none}
> +forbids the FUSE client from caching to achieve best coherency at the cost of
> +performance.  @code{auto} acts similar to NFS with a 1 second metadata cache
> +timeout.  @code{always} sets a long cache lifetime at the expense of coherency.
> +@item --writeback
> +Enable writeback cache, allowing the FUSE client to buffer and merge write requests.
> +@end table
> +@c man end
> +
> +@c man begin EXAMPLES
> +Export @code{/var/lib/fs/vm001/} on vhost-user UNIX domain socket @code{/var/run/vm001-vhost-fs.sock}:
> +
> +@example
> +host# virtiofsd --socket-path=/var/run/vm001-vhost-fs.sock -o source=/var/lib/fs/vm001
> +host# qemu-system-x86_64 \
> +    -chardev socket,id=char0,path=/var/run/vm001-vhost-fs.sock \
> +    -device vhost-user-fs-pci,chardev=char0,tag=myfs \
> +    -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on \
> +    -numa node,memdev=mem \
> +    ...
> +guest# mount -t virtio_fs \
> +    -o default_permissions,allow_other,user_id=0,group_id=0,rootmode=040000,dax \
> +    myfs /mnt
> +@end example
> +@c man end
> +
> +@ignore
> +@setfilename virtiofsd
> +@settitle QEMU virtio-fs shared file system daemon
> +
> +@c man begin AUTHOR

s/AUTHOR/COPYRIGHT/

since this isn't providing any author information.

> +Copyright (C) 2019 Red Hat, Inc.

2019-2020 !

And now insert

 @c man end
 @c man begin LICENSE

> +This is free software; see the source for copying conditions.  There is NO
> +warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> +@c man end
> +@end ignore



Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 095/104] virtiofsd: convert more fprintf and perror to use fuse log infra
  2019-12-12 16:38 ` [PATCH 095/104] virtiofsd: convert more fprintf and perror to use fuse log infra Dr. David Alan Gilbert (git)
@ 2020-01-07 12:16   ` Daniel P. Berrangé
  2020-01-16 12:29   ` Misono Tomohiro
  1 sibling, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 12:16 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:55PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Eryu Guan <eguan@linux.alibaba.com>
> 
> Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
> ---
>  tools/virtiofsd/fuse_signals.c | 6 +++++-
>  tools/virtiofsd/helper.c       | 9 ++++++---
>  2 files changed, 11 insertions(+), 4 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 096/104] virtiofsd: Reset O_DIRECT flag during file open
  2019-12-12 16:38 ` [PATCH 096/104] virtiofsd: Reset O_DIRECT flag during file open Dr. David Alan Gilbert (git)
@ 2020-01-07 12:17   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 12:17 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:56PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Vivek Goyal <vgoyal@redhat.com>
> 
> If an application wants to do direct IO and opens a file with O_DIRECT
> in guest, that does not necessarily mean that we need to bypass page
> cache on host as well. So reset this flag on host.
> 
> If somebody needs to bypass page cache on host as well (and it is safe to
> do so), we can add a knob in daemon later to control this behavior.
> 
> I check virtio-9p and they do reset O_DIRECT flag.
> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
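
A minimal sketch of the flag reset described in the commit message, assuming the guest's open flags arrive as a plain `int`. `host_open_flags()` is a hypothetical name, not the function added to passthrough_ll.c. `O_DIRECT` is a Linux extension, so the sketch guards its use.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc exposes O_DIRECT only with this defined */
#endif
#include <fcntl.h>

/*
 * Return the flags to use for open() on the host: the guest's flags
 * with O_DIRECT cleared, so the host page cache is still used.
 */
static int host_open_flags(int guest_flags)
{
#ifdef O_DIRECT
    guest_flags &= ~O_DIRECT;
#endif
    return guest_flags;
}
```

All other flags pass through unchanged, so the guest still gets direct-IO semantics on its side while the host is free to cache.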


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 097/104] virtiofsd: Fix data corruption with O_APPEND wirte in writeback mode
  2019-12-12 16:38 ` [PATCH 097/104] virtiofsd: Fix data corruption with O_APPEND wirte in writeback mode Dr. David Alan Gilbert (git)
@ 2020-01-07 12:20   ` Daniel P. Berrangé
  2020-01-07 13:27     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 12:20 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:57PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
> 
> When writeback mode is enabled (-o writeback), O_APPEND handling is
> done in kernel. Therefore virtiofsd clears O_APPEND flag when open.
> Otherwise O_APPEND flag takes precedence over pwrite() and write
> data may corrupt.
> 
> Currently clearing O_APPEND flag is done in lo_open(), but we also
> need the same operation in lo_create(). So, factor out the flag
> update operation in lo_open() to update_open_flags() and call it
> in both lo_open() and lo_create().
> 
> This fixes the failure of xfstest generic/069 in writeback mode
> (which tests O_APPEND write data integrity).

Seeing this mention of xfstest reminds me that there are no tests
added anywhere in this patch series.  Is there another followup
pending with tests, or is it a todo item?

Can we usefully wire up this xfstest program in the functional
tests of QEMU in some way?

> 
> Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 66 ++++++++++++++++----------------
>  1 file changed, 33 insertions(+), 33 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
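
The factoring described in the commit message can be sketched as one helper that both the open and create paths call. This is a simplified illustration: `sketch_update_open_flags()` and its `(int writeback, int flags)` signature are assumptions, and only the O_APPEND handling mentioned in the commit message is shown.

```c
#include <fcntl.h>

/*
 * With writeback caching the guest kernel computes the append offset
 * itself and the daemon writes with pwrite(). If O_APPEND stayed set
 * on the host fd, the host would ignore that offset and append
 * instead, so clear it in writeback mode.
 */
static int sketch_update_open_flags(int writeback, int flags)
{
    if (writeback) {
        flags &= ~O_APPEND;
    }
    return flags;
}
```

Calling the same helper from both paths is what closes the gap the patch describes, where files created with O_APPEND kept the flag.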


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 098/104] virtiofsd: add definition of fuse_buf_writev()
  2019-12-12 16:38 ` [PATCH 098/104] virtiofsd: add definition of fuse_buf_writev() Dr. David Alan Gilbert (git)
@ 2020-01-07 12:21   ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 12:21 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:58PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: piaojun <piaojun@huawei.com>
> 
> Define fuse_buf_writev() which use pwritev and writev to improve io
> bandwidth. Especially, the src bufs with 0 size should be skipped as
> their mems are not *block_size* aligned which will cause writev failed
> in direct io mode.
> 
> Signed-off-by: Jun Piao <piaojun@huawei.com>
> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/buffer.c | 39 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 39 insertions(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
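
The zero-size skip from the commit message can be sketched with plain `writev(2)`. `struct src_buf`, `gather_nonempty()` and `write_bufs()` are hypothetical simplifications, not the fuse_buf_writev() from buffer.c, and the positional `pwritev()` variant is left out.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical simplified source buffer (memory-backed only). */
struct src_buf {
    void *mem;
    size_t size;
};

/*
 * Copy the non-empty buffers into iov, which must have room for
 * count entries. Returns the number of iovec entries filled in.
 */
static int gather_nonempty(const struct src_buf *bufs, size_t count,
                           struct iovec *iov)
{
    int n = 0;

    for (size_t i = 0; i < count; i++) {
        if (bufs[i].size == 0) {
            continue; /* skip empty buffers, as the patch describes */
        }
        iov[n].iov_base = bufs[i].mem;
        iov[n].iov_len = bufs[i].size;
        n++;
    }
    return n;
}

/* Issue one writev() for all non-empty buffers. */
static ssize_t write_bufs(int fd, const struct src_buf *bufs, size_t count,
                          struct iovec *iov)
{
    int n = gather_nonempty(bufs, count, iov);

    if (n == 0) {
        return 0;
    }
    return writev(fd, iov, n);
}
```

Batching the buffers into one system call is where the bandwidth gain comes from, compared with one write() per buffer.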


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 099/104] virtiofsd: use fuse_buf_writev to replace fuse_buf_write for better performance
  2019-12-12 16:38 ` [PATCH 099/104] virtiofsd: use fuse_buf_writev to replace fuse_buf_write for better performance Dr. David Alan Gilbert (git)
@ 2020-01-07 12:23   ` Daniel P. Berrangé
  2020-01-10 13:15     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 12:23 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:59PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: piaojun <piaojun@huawei.com>
> 
> fuse_buf_writev() only handles the normal write in which src is buffer
> and dest is fd. Specially if src buffer represents guest physical
> address that can't be mapped by the daemon process, IO must be bounced
> back to the VMM to do it by fuse_buf_copy().
> 
> Signed-off-by: Jun Piao <piaojun@huawei.com>
> Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/buffer.c | 23 +++++++++++++++++++----
>  1 file changed, 19 insertions(+), 4 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

> 
> diff --git a/tools/virtiofsd/buffer.c b/tools/virtiofsd/buffer.c
> index ae420c70c4..4875473785 100644
> --- a/tools/virtiofsd/buffer.c
> +++ b/tools/virtiofsd/buffer.c
> @@ -33,9 +33,7 @@ size_t fuse_buf_size(const struct fuse_bufvec *bufv)
>      return size;
>  }
>  
> -__attribute__((unused))
> -static ssize_t fuse_buf_writev(fuse_req_t req,

Let's cull the fuse_req_t param in the previous patch

> -                               struct fuse_buf *out_buf,
> +static ssize_t fuse_buf_writev(struct fuse_buf *out_buf,
>                                 struct fuse_bufvec *in_buf)
>  {
>      ssize_t res, i, j;

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 103/104] virtiofsd: add --thread-pool-size=NUM option
  2019-12-12 16:39 ` [PATCH 103/104] virtiofsd: add --thread-pool-size=NUM option Dr. David Alan Gilbert (git)
@ 2020-01-07 12:25   ` Daniel P. Berrangé
  2020-01-17 13:35   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-07 12:25 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:39:03PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Add an option to control the size of the thread pool.  Requests are now
> processed in parallel by default.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/fuse_i.h        | 1 +
>  tools/virtiofsd/fuse_lowlevel.c | 7 ++++++-
>  tools/virtiofsd/fuse_virtio.c   | 5 +++--
>  3 files changed, 10 insertions(+), 3 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 086/104] virtiofsd: use fuse_lowlevel_is_virtio() in fuse_session_destroy()
  2020-01-07 12:01   ` Daniel P. Berrangé
@ 2020-01-07 13:24     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-07 13:24 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, stefanha, vgoyal

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Dec 12, 2019 at 04:38:46PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: Stefan Hajnoczi <stefanha@redhat.com>
> > 
> > vu_socket_path is NULL when --fd=FDNUM was used.  Use
> > fuse_lowlevel_is_virtio() instead.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> >   pull request 10
> 
> Extraneous line

Hmm, not sure where that came from; thanks.

> > ---
> >  tools/virtiofsd/fuse_lowlevel.c | 7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

Thanks!

> 
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 097/104] virtiofsd: Fix data corruption with O_APPEND wirte in writeback mode
  2020-01-07 12:20   ` Daniel P. Berrangé
@ 2020-01-07 13:27     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-07 13:27 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, stefanha, vgoyal

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Dec 12, 2019 at 04:38:57PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
> > 
> > When writeback mode is enabled (-o writeback), O_APPEND handling is
> > done in kernel. Therefore virtiofsd clears O_APPEND flag when open.
> > Otherwise O_APPEND flag takes precedence over pwrite() and write
> > data may corrupt.
> > 
> > Currently clearing O_APPEND flag is done in lo_open(), but we also
> > need the same operation in lo_create(). So, factor out the flag
> > update operation in lo_open() to update_open_flags() and call it
> > in both lo_open() and lo_create().
> > 
> > This fixes the failure of xfstest generic/069 in writeback mode
> > (which tests O_APPEND write data integrity).
> 
> Seeing this mention of xfstest reminds me that there are no tests
> added anywhere in this patch series.  Is there another followup
> pending with tests or is it a todo item ?

We've got some GitHub CI set up, but not much of it is automated as yet.
As you spotted in another patch, we need root to run this at the moment,
which makes life harder; we also need a full guest to drive fuse
requests.

> We can usefully wire up this xfstest program in the functional
> tests of QEMU in some way ?

> 
> > 
> > Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
> > ---
> >  tools/virtiofsd/passthrough_ll.c | 66 ++++++++++++++++----------------
> >  1 file changed, 33 insertions(+), 33 deletions(-)
> 
> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

Thanks.

> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 007/104] virtiofsd: Format imported files to qemu style
  2019-12-12 16:37 ` [PATCH 007/104] virtiofsd: Format imported files to qemu style Dr. David Alan Gilbert (git)
  2020-01-03 12:04   ` Daniel P. Berrangé
@ 2020-01-09 12:21   ` Aleksandar Markovic
  1 sibling, 0 replies; 307+ messages in thread
From: Aleksandar Markovic @ 2020-01-09 12:21 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal


On Thursday, December 12, 2019, Dr. David Alan Gilbert (git) <dgilbert@redhat.com> wrote:

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Mostly using a set like:
>
> indent -nut -i 4 -nlp -br -cs -ce --no-space-after-function-call-names file
> clang-format -style=file -i -- file
> clang-tidy -fix-errors -checks=readability-braces-around-statements file
> clang-format -style=file -i -- file
>
> With manual cleanups.
>
> The .clang-format used is below.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
> Language:        Cpp
> AlignAfterOpenBracket: Align
> AlignConsecutiveAssignments: false # although we like it, it creates churn
> AlignConsecutiveDeclarations: false
> AlignEscapedNewlinesLeft: true
> AlignOperands:   true
> AlignTrailingComments: false # churn
> AllowAllParametersOfDeclarationOnNextLine: true
> AllowShortBlocksOnASingleLine: false
> AllowShortCaseLabelsOnASingleLine: false
> AllowShortFunctionsOnASingleLine: None
> AllowShortIfStatementsOnASingleLine: false
> AllowShortLoopsOnASingleLine: false
> AlwaysBreakAfterReturnType: None # AlwaysBreakAfterDefinitionReturnType is taken into account
> AlwaysBreakBeforeMultilineStrings: false
> BinPackArguments: true
> BinPackParameters: true
> BraceWrapping:
>   AfterControlStatement: false
>   AfterEnum:       false
>   AfterFunction:   true
>   AfterStruct:     false
>   AfterUnion:      false
>   BeforeElse:      false
>   IndentBraces:    false
> BreakBeforeBinaryOperators: None
> BreakBeforeBraces: Custom
> BreakBeforeTernaryOperators: false
> BreakStringLiterals: true
> ColumnLimit:     80
> ContinuationIndentWidth: 4
> Cpp11BracedListStyle: false
> DerivePointerAlignment: false
> DisableFormat:   false
> ForEachMacros:   [
>   'CPU_FOREACH',
>   'CPU_FOREACH_REVERSE',
>   'CPU_FOREACH_SAFE',
>   'IOMMU_NOTIFIER_FOREACH',
>   'QLIST_FOREACH',
>   'QLIST_FOREACH_ENTRY',
>   'QLIST_FOREACH_RCU',
>   'QLIST_FOREACH_SAFE',
>   'QLIST_FOREACH_SAFE_RCU',
>   'QSIMPLEQ_FOREACH',
>   'QSIMPLEQ_FOREACH_SAFE',
>   'QSLIST_FOREACH',
>   'QSLIST_FOREACH_SAFE',
>   'QTAILQ_FOREACH',
>   'QTAILQ_FOREACH_REVERSE',
>   'QTAILQ_FOREACH_SAFE',
>   'QTAILQ_RAW_FOREACH',
>   'RAMBLOCK_FOREACH'
> ]
> IncludeCategories:
>   - Regex:           '^"qemu/osdep.h'
>     Priority:        -3
>   - Regex:           '^"(block|chardev|crypto|disas|exec|fpu|hw|io|libdecnumber|migration|monitor|net|qapi|qemu|qom|standard-headers|sysemu|ui)/'
>     Priority:        -2
>   - Regex:           '^"(elf.h|qemu-common.h|glib-compat.h|qemu-io.h|trace-tcg.h)'
>     Priority:        -1
>   - Regex:           '.*'
>     Priority:        1
> IncludeIsMainRegex: '$'
> IndentCaseLabels: false
> IndentWidth:     4
> IndentWrappedFunctionNames: false
> KeepEmptyLinesAtTheStartOfBlocks: false
> MacroBlockBegin: '.*_BEGIN$' # only PREC_BEGIN ?
> MacroBlockEnd:   '.*_END$'
> MaxEmptyLinesToKeep: 2
> PointerAlignment: Right
> ReflowComments:  true
> SortIncludes:    true
> SpaceAfterCStyleCast: false
> SpaceBeforeAssignmentOperators: true
> SpaceBeforeParens: ControlStatements
> SpaceInEmptyParentheses: false
> SpacesBeforeTrailingComments: 1
> SpacesInContainerLiterals: true
> SpacesInParentheses: false
> SpacesInSquareBrackets: false
> Standard:        Auto
> UseTab:          Never
> ...
> ---
>  tools/virtiofsd/buffer.c              |  550 ++--
>  tools/virtiofsd/fuse.h                | 1572 +++++------
>  tools/virtiofsd/fuse_common.h         |  764 ++---
>  tools/virtiofsd/fuse_i.h              |  127 +-
>  tools/virtiofsd/fuse_log.c            |   38 +-
>  tools/virtiofsd/fuse_log.h            |   32 +-
>  tools/virtiofsd/fuse_loop_mt.c        |   66 +-
>  tools/virtiofsd/fuse_lowlevel.c       | 3678 +++++++++++++------------
>  tools/virtiofsd/fuse_lowlevel.h       | 2401 ++++++++--------
>  tools/virtiofsd/fuse_misc.h           |   30 +-
>  tools/virtiofsd/fuse_opt.c            |  659 ++---
>  tools/virtiofsd/fuse_opt.h            |   79 +-
>  tools/virtiofsd/fuse_signals.c        |  118 +-
>  tools/virtiofsd/helper.c              |  517 ++--
>  tools/virtiofsd/passthrough_helpers.h |   33 +-
>  tools/virtiofsd/passthrough_ll.c      | 2063 +++++++-------
>  16 files changed, 6530 insertions(+), 6197 deletions(-)
>
>
Reviewed-by: Aleksandar Markovic <amarkovic@wavecomp.com>


> diff --git a/tools/virtiofsd/buffer.c b/tools/virtiofsd/buffer.c
> index 5ab9b87455..38521f5889 100644
> --- a/tools/virtiofsd/buffer.c
> +++ b/tools/virtiofsd/buffer.c
> @@ -1,321 +1,343 @@
>  /*
> -  FUSE: Filesystem in Userspace
> -  Copyright (C) 2010  Miklos Szeredi <miklos@szeredi.hu>
> -
> -  Functions for dealing with `struct fuse_buf` and `struct
> -  fuse_bufvec`.
> -
> -  This program can be distributed under the terms of the GNU LGPLv2.
> -  See the file COPYING.LIB
> -*/
> + * FUSE: Filesystem in Userspace
> + * Copyright (C) 2010  Miklos Szeredi <miklos@szeredi.hu>
> + *
> + * Functions for dealing with `struct fuse_buf` and `struct
> + * fuse_bufvec`.
> + *
> + * This program can be distributed under the terms of the GNU LGPLv2.
> + * See the file COPYING.LIB
> + */
>
>  #define _GNU_SOURCE
>
>  #include "config.h"
>  #include "fuse_i.h"
>  #include "fuse_lowlevel.h"
> +#include <assert.h>
> +#include <errno.h>
>  #include <string.h>
>  #include <unistd.h>
> -#include <errno.h>
> -#include <assert.h>
>
>  size_t fuse_buf_size(const struct fuse_bufvec *bufv)
>  {
> -       size_t i;
> -       size_t size = 0;
> -
> -       for (i = 0; i < bufv->count; i++) {
> -               if (bufv->buf[i].size == SIZE_MAX)
> -                       size = SIZE_MAX;
> -               else
> -                       size += bufv->buf[i].size;
> -       }
> -
> -       return size;
> +    size_t i;
> +    size_t size = 0;
> +
> +    for (i = 0; i < bufv->count; i++) {
> +        if (bufv->buf[i].size == SIZE_MAX) {
> +            size = SIZE_MAX;
> +        } else {
> +            size += bufv->buf[i].size;
> +        }
> +    }
> +
> +    return size;
>  }
>
>  static size_t min_size(size_t s1, size_t s2)
>  {
> -       return s1 < s2 ? s1 : s2;
> +    return s1 < s2 ? s1 : s2;
>  }
>
>  static ssize_t fuse_buf_write(const struct fuse_buf *dst, size_t dst_off,
> -                             const struct fuse_buf *src, size_t src_off,
> -                             size_t len)
> +                              const struct fuse_buf *src, size_t src_off,
> +                              size_t len)
>  {
> -       ssize_t res = 0;
> -       size_t copied = 0;
> -
> -       while (len) {
> -               if (dst->flags & FUSE_BUF_FD_SEEK) {
> -                       res = pwrite(dst->fd, (char *)src->mem + src_off, len,
> -                                    dst->pos + dst_off);
> -               } else {
> -                       res = write(dst->fd, (char *)src->mem + src_off, len);
> -               }
> -               if (res == -1) {
> -                       if (!copied)
> -                               return -errno;
> -                       break;
> -               }
> -               if (res == 0)
> -                       break;
> -
> -               copied += res;
> -               if (!(dst->flags & FUSE_BUF_FD_RETRY))
> -                       break;
> -
> -               src_off += res;
> -               dst_off += res;
> -               len -= res;
> -       }
> -
> -       return copied;
> +    ssize_t res = 0;
> +    size_t copied = 0;
> +
> +    while (len) {
> +        if (dst->flags & FUSE_BUF_FD_SEEK) {
> +            res = pwrite(dst->fd, (char *)src->mem + src_off, len,
> +                         dst->pos + dst_off);
> +        } else {
> +            res = write(dst->fd, (char *)src->mem + src_off, len);
> +        }
> +        if (res == -1) {
> +            if (!copied) {
> +                return -errno;
> +            }
> +            break;
> +        }
> +        if (res == 0) {
> +            break;
> +        }
> +
> +        copied += res;
> +        if (!(dst->flags & FUSE_BUF_FD_RETRY)) {
> +            break;
> +        }
> +
> +        src_off += res;
> +        dst_off += res;
> +        len -= res;
> +    }
> +
> +    return copied;
>  }
>
>  static ssize_t fuse_buf_read(const struct fuse_buf *dst, size_t dst_off,
> -                            const struct fuse_buf *src, size_t src_off,
> -                            size_t len)
> +                             const struct fuse_buf *src, size_t src_off,
> +                             size_t len)
>  {
> -       ssize_t res = 0;
> -       size_t copied = 0;
> -
> -       while (len) {
> -               if (src->flags & FUSE_BUF_FD_SEEK) {
> -                       res = pread(src->fd, (char *)dst->mem + dst_off, len,
> -                                    src->pos + src_off);
> -               } else {
> -                       res = read(src->fd, (char *)dst->mem + dst_off, len);
> -               }
> -               if (res == -1) {
> -                       if (!copied)
> -                               return -errno;
> -                       break;
> -               }
> -               if (res == 0)
> -                       break;
> -
> -               copied += res;
> -               if (!(src->flags & FUSE_BUF_FD_RETRY))
> -                       break;
> -
> -               dst_off += res;
> -               src_off += res;
> -               len -= res;
> -       }
> -
> -       return copied;
> +    ssize_t res = 0;
> +    size_t copied = 0;
> +
> +    while (len) {
> +        if (src->flags & FUSE_BUF_FD_SEEK) {
> +            res = pread(src->fd, (char *)dst->mem + dst_off, len,
> +                        src->pos + src_off);
> +        } else {
> +            res = read(src->fd, (char *)dst->mem + dst_off, len);
> +        }
> +        if (res == -1) {
> +            if (!copied) {
> +                return -errno;
> +            }
> +            break;
> +        }
> +        if (res == 0) {
> +            break;
> +        }
> +
> +        copied += res;
> +        if (!(src->flags & FUSE_BUF_FD_RETRY)) {
> +            break;
> +        }
> +
> +        dst_off += res;
> +        src_off += res;
> +        len -= res;
> +    }
> +
> +    return copied;
>  }
>
>  static ssize_t fuse_buf_fd_to_fd(const struct fuse_buf *dst, size_t dst_off,
> -                                const struct fuse_buf *src, size_t src_off,
> -                                size_t len)
> +                                 const struct fuse_buf *src, size_t src_off,
> +                                 size_t len)
>  {
> -       char buf[4096];
> -       struct fuse_buf tmp = {
> -               .size = sizeof(buf),
> -               .flags = 0,
> -       };
> -       ssize_t res;
> -       size_t copied = 0;
> -
> -       tmp.mem = buf;
> -
> -       while (len) {
> -               size_t this_len = min_size(tmp.size, len);
> -               size_t read_len;
> -
> -               res = fuse_buf_read(&tmp, 0, src, src_off, this_len);
> -               if (res < 0) {
> -                       if (!copied)
> -                               return res;
> -                       break;
> -               }
> -               if (res == 0)
> -                       break;
> -
> -               read_len = res;
> -               res = fuse_buf_write(dst, dst_off, &tmp, 0, read_len);
> -               if (res < 0) {
> -                       if (!copied)
> -                               return res;
> -                       break;
> -               }
> -               if (res == 0)
> -                       break;
> -
> -               copied += res;
> -
> -               if (res < this_len)
> -                       break;
> -
> -               dst_off += res;
> -               src_off += res;
> -               len -= res;
> -       }
> -
> -       return copied;
> +    char buf[4096];
> +    struct fuse_buf tmp = {
> +        .size = sizeof(buf),
> +        .flags = 0,
> +    };
> +    ssize_t res;
> +    size_t copied = 0;
> +
> +    tmp.mem = buf;
> +
> +    while (len) {
> +        size_t this_len = min_size(tmp.size, len);
> +        size_t read_len;
> +
> +        res = fuse_buf_read(&tmp, 0, src, src_off, this_len);
> +        if (res < 0) {
> +            if (!copied) {
> +                return res;
> +            }
> +            break;
> +        }
> +        if (res == 0) {
> +            break;
> +        }
> +
> +        read_len = res;
> +        res = fuse_buf_write(dst, dst_off, &tmp, 0, read_len);
> +        if (res < 0) {
> +            if (!copied) {
> +                return res;
> +            }
> +            break;
> +        }
> +        if (res == 0) {
> +            break;
> +        }
> +
> +        copied += res;
> +
> +        if (res < this_len) {
> +            break;
> +        }
> +
> +        dst_off += res;
> +        src_off += res;
> +        len -= res;
> +    }
> +
> +    return copied;
>  }
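
For anyone reading along, the bounce-buffer pattern in fuse_buf_fd_to_fd()
above can be tried in isolation. The following is only a minimal sketch
under my own assumptions: copy_fd_to_fd() is a hypothetical name, it works
on plain file descriptors rather than struct fuse_buf, and it ignores the
FUSE_BUF_FD_SEEK/FUSE_BUF_FD_RETRY handling that the real helpers do.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch: copy up to len bytes from
 * src_fd to dst_fd through a 4096-byte stack buffer.  Returns the number
 * of bytes copied, or -errno if the first read or write fails.
 */
static ssize_t copy_fd_to_fd(int dst_fd, int src_fd, size_t len)
{
    char buf[4096];
    size_t copied = 0;

    while (len) {
        size_t this_len = len < sizeof(buf) ? len : sizeof(buf);
        ssize_t nread = read(src_fd, buf, this_len);
        ssize_t nwritten;

        if (nread == -1) {
            /* Report the error only if nothing was copied so far */
            return copied ? (ssize_t)copied : -errno;
        }
        if (nread == 0) {
            break; /* end of input */
        }

        nwritten = write(dst_fd, buf, (size_t)nread);
        if (nwritten == -1) {
            return copied ? (ssize_t)copied : -errno;
        }
        if (nwritten == 0) {
            break;
        }

        copied += (size_t)nwritten;
        if ((size_t)nwritten < this_len) {
            break; /* short transfer: stop, as the quoted loop does */
        }
        len -= (size_t)nwritten;
    }

    return (ssize_t)copied;
}
```

As in the quoted function, a short read or write ends the loop instead of
being retried, so the caller has to check the returned count.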
>
>  #ifdef HAVE_SPLICE
>  static ssize_t fuse_buf_splice(const struct fuse_buf *dst, size_t dst_off,
> -                              const struct fuse_buf *src, size_t src_off,
> -                              size_t len, enum fuse_buf_copy_flags flags)
> +                               const struct fuse_buf *src, size_t src_off,
> +                               size_t len, enum fuse_buf_copy_flags flags)
>  {
> -       int splice_flags = 0;
> -       off_t *srcpos = NULL;
> -       off_t *dstpos = NULL;
> -       off_t srcpos_val;
> -       off_t dstpos_val;
> -       ssize_t res;
> -       size_t copied = 0;
> -
> -       if (flags & FUSE_BUF_SPLICE_MOVE)
> -               splice_flags |= SPLICE_F_MOVE;
> -       if (flags & FUSE_BUF_SPLICE_NONBLOCK)
> -               splice_flags |= SPLICE_F_NONBLOCK;
> -
> -       if (src->flags & FUSE_BUF_FD_SEEK) {
> -               srcpos_val = src->pos + src_off;
> -               srcpos = &srcpos_val;
> -       }
> -       if (dst->flags & FUSE_BUF_FD_SEEK) {
> -               dstpos_val = dst->pos + dst_off;
> -               dstpos = &dstpos_val;
> -       }
> -
> -       while (len) {
> -               res = splice(src->fd, srcpos, dst->fd, dstpos, len,
> -                            splice_flags);
> -               if (res == -1) {
> -                       if (copied)
> -                               break;
> -
> -                       if (errno != EINVAL || (flags & FUSE_BUF_FORCE_SPLICE))
> -                               return -errno;
> -
> -                       /* Maybe splice is not supported for this combination */
> -                       return fuse_buf_fd_to_fd(dst, dst_off, src, src_off,
> -                                                len);
> -               }
> -               if (res == 0)
> -                       break;
> -
> -               copied += res;
> -               if (!(src->flags & FUSE_BUF_FD_RETRY) &&
> -                   !(dst->flags & FUSE_BUF_FD_RETRY)) {
> -                       break;
> -               }
> -
> -               len -= res;
> -       }
> -
> -       return copied;
> +    int splice_flags = 0;
> +    off_t *srcpos = NULL;
> +    off_t *dstpos = NULL;
> +    off_t srcpos_val;
> +    off_t dstpos_val;
> +    ssize_t res;
> +    size_t copied = 0;
> +
> +    if (flags & FUSE_BUF_SPLICE_MOVE) {
> +        splice_flags |= SPLICE_F_MOVE;
> +    }
> +    if (flags & FUSE_BUF_SPLICE_NONBLOCK) {
> +        splice_flags |= SPLICE_F_NONBLOCK;
> +    }
> +    if (src->flags & FUSE_BUF_FD_SEEK) {
> +        srcpos_val = src->pos + src_off;
> +        srcpos = &srcpos_val;
> +    }
> +    if (dst->flags & FUSE_BUF_FD_SEEK) {
> +        dstpos_val = dst->pos + dst_off;
> +        dstpos = &dstpos_val;
> +    }
> +
> +    while (len) {
> +        res = splice(src->fd, srcpos, dst->fd, dstpos, len, splice_flags);
> +        if (res == -1) {
> +            if (copied) {
> +                break;
> +            }
> +
> +            if (errno != EINVAL || (flags & FUSE_BUF_FORCE_SPLICE)) {
> +                return -errno;
> +            }
> +
> +            /* Maybe splice is not supported for this combination */
> +            return fuse_buf_fd_to_fd(dst, dst_off, src, src_off, len);
> +        }
> +        if (res == 0) {
> +            break;
> +        }
> +
> +        copied += res;
> +        if (!(src->flags & FUSE_BUF_FD_RETRY) &&
> +            !(dst->flags & FUSE_BUF_FD_RETRY)) {
> +            break;
> +        }
> +
> +        len -= res;
> +    }
> +
> +    return copied;
>  }
>  #else
>  static ssize_t fuse_buf_splice(const struct fuse_buf *dst, size_t dst_off,
> -                              const struct fuse_buf *src, size_t src_off,
> -                              size_t len, enum fuse_buf_copy_flags flags)
> +                               const struct fuse_buf *src, size_t src_off,
> +                               size_t len, enum fuse_buf_copy_flags flags)
>  {
> -       (void) flags;
> +    (void)flags;
>
> -       return fuse_buf_fd_to_fd(dst, dst_off, src, src_off, len);
> +    return fuse_buf_fd_to_fd(dst, dst_off, src, src_off, len);
>  }
>  #endif
>
>
>  static ssize_t fuse_buf_copy_one(const struct fuse_buf *dst, size_t dst_off,
> -                                const struct fuse_buf *src, size_t src_off,
> -                                size_t len, enum fuse_buf_copy_flags flags)
> +                                 const struct fuse_buf *src, size_t src_off,
> +                                 size_t len, enum fuse_buf_copy_flags flags)
>  {
> -       int src_is_fd = src->flags & FUSE_BUF_IS_FD;
> -       int dst_is_fd = dst->flags & FUSE_BUF_IS_FD;
> -
> -       if (!src_is_fd && !dst_is_fd) {
> -               char *dstmem = (char *)dst->mem + dst_off;
> -               char *srcmem = (char *)src->mem + src_off;
> -
> -               if (dstmem != srcmem) {
> -                       if (dstmem + len <= srcmem || srcmem + len <= dstmem)
> -                               memcpy(dstmem, srcmem, len);
> -                       else
> -                               memmove(dstmem, srcmem, len);
> -               }
> -
> -               return len;
> -       } else if (!src_is_fd) {
> -               return fuse_buf_write(dst, dst_off, src, src_off, len);
> -       } else if (!dst_is_fd) {
> -               return fuse_buf_read(dst, dst_off, src, src_off, len);
> -       } else if (flags & FUSE_BUF_NO_SPLICE) {
> -               return fuse_buf_fd_to_fd(dst, dst_off, src, src_off, len);
> -       } else {
> -               return fuse_buf_splice(dst, dst_off, src, src_off, len, flags);
> -       }
> +    int src_is_fd = src->flags & FUSE_BUF_IS_FD;
> +    int dst_is_fd = dst->flags & FUSE_BUF_IS_FD;
> +
> +    if (!src_is_fd && !dst_is_fd) {
> +        char *dstmem = (char *)dst->mem + dst_off;
> +        char *srcmem = (char *)src->mem + src_off;
> +
> +        if (dstmem != srcmem) {
> +            if (dstmem + len <= srcmem || srcmem + len <= dstmem) {
> +                memcpy(dstmem, srcmem, len);
> +            } else {
> +                memmove(dstmem, srcmem, len);
> +            }
> +        }
> +
> +        return len;
> +    } else if (!src_is_fd) {
> +        return fuse_buf_write(dst, dst_off, src, src_off, len);
> +    } else if (!dst_is_fd) {
> +        return fuse_buf_read(dst, dst_off, src, src_off, len);
> +    } else if (flags & FUSE_BUF_NO_SPLICE) {
> +        return fuse_buf_fd_to_fd(dst, dst_off, src, src_off, len);
> +    } else {
> +        return fuse_buf_splice(dst, dst_off, src, src_off, len, flags);
> +    }
>  }
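
The memory-to-memory branch of fuse_buf_copy_one() picks memcpy() when the
two ranges are disjoint and memmove() when they overlap. A minimal
standalone sketch of that check follows; copy_maybe_overlapping() is a
hypothetical name of mine, and unlike the quoted code it compares the
addresses as uintptr_t, which is my own choice to avoid relational
comparison of pointers into unrelated objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: copy len bytes from src to
 * dst, falling back to memmove() only when the two ranges overlap.
 */
static void copy_maybe_overlapping(char *dst, const char *src, size_t len)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;

    if (d == s) {
        return; /* same location, nothing to do */
    }

    if (d + len <= s || s + len <= d) {
        memcpy(dst, src, len);  /* ranges are disjoint */
    } else {
        memmove(dst, src, len); /* ranges overlap */
    }
}
```

Shifting "abcd" two bytes to the right inside the buffer "abcdefgh" takes
the memmove() path and leaves "ababcdgh".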
>
>  static const struct fuse_buf *fuse_bufvec_current(struct fuse_bufvec *bufv)
>  {
> -       if (bufv->idx < bufv->count)
> -               return &bufv->buf[bufv->idx];
> -       else
> -               return NULL;
> +    if (bufv->idx < bufv->count) {
> +        return &bufv->buf[bufv->idx];
> +    } else {
> +        return NULL;
> +    }
>  }
>
>  static int fuse_bufvec_advance(struct fuse_bufvec *bufv, size_t len)
>  {
> -       const struct fuse_buf *buf = fuse_bufvec_current(bufv);
> -
> -       bufv->off += len;
> -       assert(bufv->off <= buf->size);
> -       if (bufv->off == buf->size) {
> -               assert(bufv->idx < bufv->count);
> -               bufv->idx++;
> -               if (bufv->idx == bufv->count)
> -                       return 0;
> -               bufv->off = 0;
> -       }
> -       return 1;
> +    const struct fuse_buf *buf = fuse_bufvec_current(bufv);
> +
> +    bufv->off += len;
> +    assert(bufv->off <= buf->size);
> +    if (bufv->off == buf->size) {
> +        assert(bufv->idx < bufv->count);
> +        bufv->idx++;
> +        if (bufv->idx == bufv->count) {
> +            return 0;
> +        }
> +        bufv->off = 0;
> +    }
> +    return 1;
>  }
>
>  ssize_t fuse_buf_copy(struct fuse_bufvec *dstv, struct fuse_bufvec *srcv,
> -                     enum fuse_buf_copy_flags flags)
> +                      enum fuse_buf_copy_flags flags)
>  {
> -       size_t copied = 0;
> -
> -       if (dstv == srcv)
> -               return fuse_buf_size(dstv);
> -
> -       for (;;) {
> -               const struct fuse_buf *src = fuse_bufvec_current(srcv);
> -               const struct fuse_buf *dst = fuse_bufvec_current(dstv);
> -               size_t src_len;
> -               size_t dst_len;
> -               size_t len;
> -               ssize_t res;
> -
> -               if (src == NULL || dst == NULL)
> -                       break;
> -
> -               src_len = src->size - srcv->off;
> -               dst_len = dst->size - dstv->off;
> -               len = min_size(src_len, dst_len);
> -
> -               res = fuse_buf_copy_one(dst, dstv->off, src, srcv->off, len, flags);
> -               if (res < 0) {
> -                       if (!copied)
> -                               return res;
> -                       break;
> -               }
> -               copied += res;
> -
> -               if (!fuse_bufvec_advance(srcv, res) ||
> -                   !fuse_bufvec_advance(dstv, res))
> -                       break;
> -
> -               if (res < len)
> -                       break;
> -       }
> -
> -       return copied;
> +    size_t copied = 0;
> +
> +    if (dstv == srcv) {
> +        return fuse_buf_size(dstv);
> +    }
> +
> +    for (;;) {
> +        const struct fuse_buf *src = fuse_bufvec_current(srcv);
> +        const struct fuse_buf *dst = fuse_bufvec_current(dstv);
> +        size_t src_len;
> +        size_t dst_len;
> +        size_t len;
> +        ssize_t res;
> +
> +        if (src == NULL || dst == NULL) {
> +            break;
> +        }
> +
> +        src_len = src->size - srcv->off;
> +        dst_len = dst->size - dstv->off;
> +        len = min_size(src_len, dst_len);
> +
> +        res = fuse_buf_copy_one(dst, dstv->off, src, srcv->off, len, flags);
> +        if (res < 0) {
> +            if (!copied) {
> +                return res;
> +            }
> +            break;
> +        }
> +        copied += res;
> +
> +        if (!fuse_bufvec_advance(srcv, res) ||
> +            !fuse_bufvec_advance(dstv, res)) {
> +            break;
> +        }
> +
> +        if (res < len) {
> +            break;
> +        }
> +    }
> +
> +    return copied;
>  }
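
The idx/off walk that fuse_buf_copy() does over two buffer vectors is
easier to follow with memory-only buffers. The sketch below is an
illustration under my own assumptions, not the patch's code: struct
mem_buf, struct mem_bufvec and the vec_*() functions are hypothetical
stand-ins for struct fuse_buf, struct fuse_bufvec and the quoted helpers,
and there is no fd handling, no flags and no error path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, memory-only stand-ins for fuse_buf and fuse_bufvec */
struct mem_buf {
    char *mem;
    size_t size;
};

struct mem_bufvec {
    size_t count;        /* number of buffers in buf[] */
    size_t idx;          /* index of the current buffer */
    size_t off;          /* offset within the current buffer */
    struct mem_buf *buf;
};

static struct mem_buf *vec_current(struct mem_bufvec *v)
{
    return v->idx < v->count ? &v->buf[v->idx] : NULL;
}

/* Advance by len bytes; returns 0 once the vector is exhausted */
static int vec_advance(struct mem_bufvec *v, size_t len)
{
    v->off += len;
    if (v->off == v->buf[v->idx].size) {
        v->idx++;
        if (v->idx == v->count) {
            return 0;
        }
        v->off = 0;
    }
    return 1;
}

/* Copy until either vector runs out; returns the bytes copied */
static size_t vec_copy(struct mem_bufvec *dstv, struct mem_bufvec *srcv)
{
    size_t copied = 0;

    for (;;) {
        struct mem_buf *src = vec_current(srcv);
        struct mem_buf *dst = vec_current(dstv);
        size_t src_len, dst_len, len;

        if (src == NULL || dst == NULL) {
            break;
        }

        /* Copy what fits in both current buffers */
        src_len = src->size - srcv->off;
        dst_len = dst->size - dstv->off;
        len = src_len < dst_len ? src_len : dst_len;

        memcpy(dst->mem + dstv->off, src->mem + srcv->off, len);
        copied += len;

        /* Same short-circuit order as the quoted loop */
        if (!vec_advance(srcv, len) || !vec_advance(dstv, len)) {
            break;
        }
    }

    return copied;
}
```

With source buffers of 3 and 4 bytes and destination buffers of 2, 2 and
5 bytes, the loop runs four times and copies 2, 1, 1 and 3 bytes.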
> diff --git a/tools/virtiofsd/fuse.h b/tools/virtiofsd/fuse.h
> index 6c16a0041d..945ebc7a0d 100644
> --- a/tools/virtiofsd/fuse.h
> +++ b/tools/virtiofsd/fuse.h
> @@ -1,15 +1,15 @@
>  /*
> -  FUSE: Filesystem in Userspace
> -  Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
> -
> -  This program can be distributed under the terms of the GNU LGPLv2.
> -  See the file COPYING.LIB.
> -*/
> + * FUSE: Filesystem in Userspace
> + * Copyright (C) 2001-2007  Miklos Szeredi <miklos@szeredi.hu>
> + *
> + * This program can be distributed under the terms of the GNU LGPLv2.
> + * See the file COPYING.LIB.
> + */
>
>  #ifndef FUSE_H_
>  #define FUSE_H_
>
> -/** @file
> +/*
>   *
>   * This file defines the library interface of FUSE
>   *
> @@ -19,15 +19,15 @@
>  #include "fuse_common.h"
>
>  #include <fcntl.h>
> -#include <time.h>
> -#include <sys/types.h>
>  #include <sys/stat.h>
>  #include <sys/statvfs.h>
> +#include <sys/types.h>
>  #include <sys/uio.h>
> +#include <time.h>
>
> -/* ----------------------------------------------------------- *
> - * Basic FUSE API                                             *
> - * ----------------------------------------------------------- */
> +/*
> + * Basic FUSE API
> + */
>
>  /** Handle for a FUSE filesystem */
>  struct fuse;
> @@ -36,38 +36,39 @@ struct fuse;
>   * Readdir flags, passed to ->readdir()
>   */
>  enum fuse_readdir_flags {
> -       /**
> -        * "Plus" mode.
> -        *
> -        * The kernel wants to prefill the inode cache during readdir.  The
> -        * filesystem may honour this by filling in the attributes and setting
> -        * FUSE_FILL_DIR_FLAGS for the filler function.  The filesystem may also
> -        * just ignore this flag completely.
> -        */
> -       FUSE_READDIR_PLUS = (1 << 0),
> +    /**
> +     * "Plus" mode.
> +     *
> +     * The kernel wants to prefill the inode cache during readdir.  The
> +     * filesystem may honour this by filling in the attributes and setting
> +     * FUSE_FILL_DIR_FLAGS for the filler function.  The filesystem may also
> +     * just ignore this flag completely.
> +     */
> +    FUSE_READDIR_PLUS = (1 << 0),
>  };
>
>  enum fuse_fill_dir_flags {
> -       /**
> -        * "Plus" mode: all file attributes are valid
> -        *
> -        * The attributes are used by the kernel to prefill the inode cache
> -        * during a readdir.
> -        *
> -        * It is okay to set FUSE_FILL_DIR_PLUS if FUSE_READDIR_PLUS is not set
> -        * and vice versa.
> -        */
> -       FUSE_FILL_DIR_PLUS = (1 << 1),
> +    /**
> +     * "Plus" mode: all file attributes are valid
> +     *
> +     * The attributes are used by the kernel to prefill the inode cache
> +     * during a readdir.
> +     *
> +     * It is okay to set FUSE_FILL_DIR_PLUS if FUSE_READDIR_PLUS is not set
> +     * and vice versa.
> +     */
> +    FUSE_FILL_DIR_PLUS = (1 << 1),
>  };
>
> -/** Function to add an entry in a readdir() operation
> +/**
> + * Function to add an entry in a readdir() operation
>   *
>   * The *off* parameter can be any non-zero value that enables the
>   * filesystem to identify the current point in the directory
>   * stream. It does not need to be the actual physical position. A
>   * value of zero is reserved to indicate that seeking in directories
>   * is not supported.
> - *
> + *
>   * @param buf the buffer passed to the readdir() operation
>   * @param name the file name of the directory entry
>   * @param stat file attributes, can be NULL
> @@ -75,9 +76,9 @@ enum fuse_fill_dir_flags {
>   * @param flags fill flags
>   * @return 1 if buffer is full, zero otherwise
>   */
> -typedef int (*fuse_fill_dir_t) (void *buf, const char *name,
> -                               const struct stat *stbuf, off_t off,
> -                               enum fuse_fill_dir_flags flags);
> +typedef int (*fuse_fill_dir_t)(void *buf, const char *name,
> +                               const struct stat *stbuf, off_t off,
> +                               enum fuse_fill_dir_flags flags);
>  /**
>   * Configuration of the high-level API
>   *
> @@ -87,186 +88,186 @@ typedef int (*fuse_fill_dir_t) (void *buf, const char *name,
>   * file system implementation.
>   */
>  struct fuse_config {
> -       /**
> -        * If `set_gid` is non-zero, the st_gid attribute of each file
> -        * is overwritten with the value of `gid`.
> -        */
> -       int set_gid;
> -       unsigned int gid;
> -
> -       /**
> -        * If `set_uid` is non-zero, the st_uid attribute of each file
> -        * is overwritten with the value of `uid`.
> -        */
> -       int set_uid;
> -       unsigned int uid;
> -
> -       /**
> -        * If `set_mode` is non-zero, the any permissions bits set in
> -        * `umask` are unset in the st_mode attribute of each file.
> -        */
> -       int set_mode;
> -       unsigned int umask;
> -
> -       /**
> -        * The timeout in seconds for which name lookups will be
> -        * cached.
> -        */
> -       double entry_timeout;
> -
> -       /**
> -        * The timeout in seconds for which a negative lookup will be
> -        * cached. This means, that if file did not exist (lookup
> -        * retuned ENOENT), the lookup will only be redone after the
> -        * timeout, and the file/directory will be assumed to not
> -        * exist until then. A value of zero means that negative
> -        * lookups are not cached.
> -        */
> -       double negative_timeout;
> -
> -       /**
> -        * The timeout in seconds for which file/directory attributes
> -        * (as returned by e.g. the `getattr` handler) are cached.
> -        */
> -       double attr_timeout;
> -
> -       /**
> -        * Allow requests to be interrupted
> -        */
> -       int intr;
> -
> -       /**
> -        * Specify which signal number to send to the filesystem when
> -        * a request is interrupted.  The default is hardcoded to
> -        * USR1.
> -        */
> -       int intr_signal;
> -
> -       /**
> -        * Normally, FUSE assigns inodes to paths only for as long as
> -        * the kernel is aware of them. With this option inodes are
> -        * instead remembered for at least this many seconds.  This
> -        * will require more memory, but may be necessary when using
> -        * applications that make use of inode numbers.
> -        *
> -        * A number of -1 means that inodes will be remembered for the
> -        * entire life-time of the file-system process.
> -        */
> -       int remember;
> -
> -       /**
> -        * The default behavior is that if an open file is deleted,
> -        * the file is renamed to a hidden file (.fuse_hiddenXXX), and
> -        * only removed when the file is finally released.  This
> -        * relieves the filesystem implementation of having to deal
> -        * with this problem. This option disables the hiding
> -        * behavior, and files are removed immediately in an unlink
> -        * operation (or in a rename operation which overwrites an
> -        * existing file).
> -        *
> -        * It is recommended that you not use the hard_remove
> -        * option. When hard_remove is set, the following libc
> -        * functions fail on unlinked files (returning errno of
> -        * ENOENT): read(2), write(2), fsync(2), close(2), f*xattr(2),
> -        * ftruncate(2), fstat(2), fchmod(2), fchown(2)
> -        */
> -       int hard_remove;
> -
> -       /**
> -        * Honor the st_ino field in the functions getattr() and
> -        * fill_dir(). This value is used to fill in the st_ino field
> -        * in the stat(2), lstat(2), fstat(2) functions and the d_ino
> -        * field in the readdir(2) function. The filesystem does not
> -        * have to guarantee uniqueness, however some applications
> -        * rely on this value being unique for the whole filesystem.
> -        *
> -        * Note that this does *not* affect the inode that libfuse
> -        * and the kernel use internally (also called the "nodeid").
> -        */
> -       int use_ino;
> -
> -       /**
> -        * If use_ino option is not given, still try to fill in the
> -        * d_ino field in readdir(2). If the name was previously
> -        * looked up, and is still in the cache, the inode number
> -        * found there will be used.  Otherwise it will be set to -1.
> -        * If use_ino option is given, this option is ignored.
> -        */
> -       int readdir_ino;
> -
> -       /**
> -        * This option disables the use of page cache (file content cache)
> -        * in the kernel for this filesystem. This has several affects:
> -        *
> -        * 1. Each read(2) or write(2) system call will initiate one
> -        *    or more read or write operations, data will not be
> -        *    cached in the kernel.
> -        *
> -        * 2. The return value of the read() and write() system calls
> -        *    will correspond to the return values of the read and
> -        *    write operations. This is useful for example if the
> -        *    file size is not known in advance (before reading it).
> -        *
> -        * Internally, enabling this option causes fuse to set the
> -        * `direct_io` field of `struct fuse_file_info` - overwriting
> -        * any value that was put there by the file system.
> -        */
> -       int direct_io;
> -
> -       /**
> -        * This option disables flushing the cache of the file
> -        * contents on every open(2).  This should only be enabled on
> -        * filesystems where the file data is never changed
> -        * externally (not through the mounted FUSE filesystem).  Thus
> -        * it is not suitable for network filesystems and other
> -        * intermediate filesystems.
> -        *
> -        * NOTE: if this option is not specified (and neither
> -        * direct_io) data is still cached after the open(2), so a
> -        * read(2) system call will not always initiate a read
> -        * operation.
> -        *
> -        * Internally, enabling this option causes fuse to set the
> -        * `keep_cache` field of `struct fuse_file_info` - overwriting
> -        * any value that was put there by the file system.
> -        */
> -       int kernel_cache;
> -
> -       /**
> -        * This option is an alternative to `kernel_cache`. Instead of
> -        * unconditionally keeping cached data, the cached data is
> -        * invalidated on open(2) if if the modification time or the
> -        * size of the file has changed since it was last opened.
> -        */
> -       int auto_cache;
> -
> -       /**
> -        * The timeout in seconds for which file attributes are cached
> -        * for the purpose of checking if auto_cache should flush the
> -        * file data on open.
> -        */
> -       int ac_attr_timeout_set;
> -       double ac_attr_timeout;
> -
> -       /**
> -        * If this option is given the file-system handlers for the
> -        * following operations will not receive path information:
> -        * read, write, flush, release, fsync, readdir, releasedir,
> -        * fsyncdir, lock, ioctl and poll.
> -        *
> -        * For the truncate, getattr, chmod, chown and utimens
> -        * operations the path will be provided only if the struct
> -        * fuse_file_info argument is NULL.
> -        */
> -       int nullpath_ok;
> -
> -       /**
> -        * The remaining options are used by libfuse internally and
> -        * should not be touched.
> -        */
> -       int show_help;
> -       char *modules;
> -       int debug;
> +    /**
> +     * If `set_gid` is non-zero, the st_gid attribute of each file
> +     * is overwritten with the value of `gid`.
> +     */
> +    int set_gid;
> +    unsigned int gid;
> +
> +    /**
> +     * If `set_uid` is non-zero, the st_uid attribute of each file
> +     * is overwritten with the value of `uid`.
> +     */
> +    int set_uid;
> +    unsigned int uid;
> +
> +    /**
> +     * If `set_mode` is non-zero, the any permissions bits set in
> +     * `umask` are unset in the st_mode attribute of each file.
> +     */
> +    int set_mode;
> +    unsigned int umask;
> +
> +    /**
> +     * The timeout in seconds for which name lookups will be
> +     * cached.
> +     */
> +    double entry_timeout;
> +
> +    /**
> +     * The timeout in seconds for which a negative lookup will be
> +     * cached. This means, that if file did not exist (lookup
> +     * retuned ENOENT), the lookup will only be redone after the
> +     * timeout, and the file/directory will be assumed to not
> +     * exist until then. A value of zero means that negative
> +     * lookups are not cached.
> +     */
> +    double negative_timeout;
> +
> +    /**
> +     * The timeout in seconds for which file/directory attributes
> +     * (as returned by e.g. the `getattr` handler) are cached.
> +     */
> +    double attr_timeout;
> +
> +    /**
> +     * Allow requests to be interrupted
> +     */
> +    int intr;
> +
> +    /**
> +     * Specify which signal number to send to the filesystem when
> +     * a request is interrupted.  The default is hardcoded to
> +     * USR1.
> +     */
> +    int intr_signal;
> +
> +    /**
> +     * Normally, FUSE assigns inodes to paths only for as long as
> +     * the kernel is aware of them. With this option inodes are
> +     * instead remembered for at least this many seconds.  This
> +     * will require more memory, but may be necessary when using
> +     * applications that make use of inode numbers.
> +     *
> +     * A number of -1 means that inodes will be remembered for the
> +     * entire life-time of the file-system process.
> +     */
> +    int remember;
> +
> +    /**
> +     * The default behavior is that if an open file is deleted,
> +     * the file is renamed to a hidden file (.fuse_hiddenXXX), and
> +     * only removed when the file is finally released.  This
> +     * relieves the filesystem implementation of having to deal
> +     * with this problem. This option disables the hiding
> +     * behavior, and files are removed immediately in an unlink
> +     * operation (or in a rename operation which overwrites an
> +     * existing file).
> +     *
> +     * It is recommended that you not use the hard_remove
> +     * option. When hard_remove is set, the following libc
> +     * functions fail on unlinked files (returning errno of
> +     * ENOENT): read(2), write(2), fsync(2), close(2), f*xattr(2),
> +     * ftruncate(2), fstat(2), fchmod(2), fchown(2)
> +     */
> +    int hard_remove;
> +
> +    /**
> +     * Honor the st_ino field in the functions getattr() and
> +     * fill_dir(). This value is used to fill in the st_ino field
> +     * in the stat(2), lstat(2), fstat(2) functions and the d_ino
> +     * field in the readdir(2) function. The filesystem does not
> +     * have to guarantee uniqueness, however some applications
> +     * rely on this value being unique for the whole filesystem.
> +     *
> +     * Note that this does *not* affect the inode that libfuse
> +     * and the kernel use internally (also called the "nodeid").
> +     */
> +    int use_ino;
> +
> +    /**
> +     * If use_ino option is not given, still try to fill in the
> +     * d_ino field in readdir(2). If the name was previously
> +     * looked up, and is still in the cache, the inode number
> +     * found there will be used.  Otherwise it will be set to -1.
> +     * If use_ino option is given, this option is ignored.
> +     */
> +    int readdir_ino;
> +
> +    /**
> +     * This option disables the use of page cache (file content cache)
> +     * in the kernel for this filesystem. This has several affects:
> +     *
> +     * 1. Each read(2) or write(2) system call will initiate one
> +     *    or more read or write operations, data will not be
> +     *    cached in the kernel.
> +     *
> +     * 2. The return value of the read() and write() system calls
> +     *    will correspond to the return values of the read and
> +     *    write operations. This is useful for example if the
> +     *    file size is not known in advance (before reading it).
> +     *
> +     * Internally, enabling this option causes fuse to set the
> +     * `direct_io` field of `struct fuse_file_info` - overwriting
> +     * any value that was put there by the file system.
> +     */
> +    int direct_io;
> +
> +    /**
> +     * This option disables flushing the cache of the file
> +     * contents on every open(2).  This should only be enabled on
> +     * filesystems where the file data is never changed
> +     * externally (not through the mounted FUSE filesystem).  Thus
> +     * it is not suitable for network filesystems and other
> +     * intermediate filesystems.
> +     *
> +     * NOTE: if this option is not specified (and neither
> +     * direct_io) data is still cached after the open(2), so a
> +     * read(2) system call will not always initiate a read
> +     * operation.
> +     *
> +     * Internally, enabling this option causes fuse to set the
> +     * `keep_cache` field of `struct fuse_file_info` - overwriting
> +     * any value that was put there by the file system.
> +     */
> +    int kernel_cache;
> +
> +    /**
> +     * This option is an alternative to `kernel_cache`. Instead of
> +     * unconditionally keeping cached data, the cached data is
> +     * invalidated on open(2) if if the modification time or the
> +     * size of the file has changed since it was last opened.
> +     */
> +    int auto_cache;
> +
> +    /**
> +     * The timeout in seconds for which file attributes are cached
> +     * for the purpose of checking if auto_cache should flush the
> +     * file data on open.
> +     */
> +    int ac_attr_timeout_set;
> +    double ac_attr_timeout;
> +
> +    /**
> +     * If this option is given the file-system handlers for the
> +     * following operations will not receive path information:
> +     * read, write, flush, release, fsync, readdir, releasedir,
> +     * fsyncdir, lock, ioctl and poll.
> +     *
> +     * For the truncate, getattr, chmod, chown and utimens
> +     * operations the path will be provided only if the struct
> +     * fuse_file_info argument is NULL.
> +     */
> +    int nullpath_ok;
> +
> +    /**
> +     * The remaining options are used by libfuse internally and
> +     * should not be touched.
> +     */
> +    int show_help;
> +    char *modules;
> +    int debug;
>  };
>
>
> @@ -293,515 +294,535 @@ struct fuse_config {
>   * Almost all operations take a path which can be of any length.
>   */
>  struct fuse_operations {
> -       /** Get file attributes.
> -        *
> -        * Similar to stat().  The 'st_dev' and 'st_blksize' fields are
> -        * ignored. The 'st_ino' field is ignored except if the 'use_ino'
> -        * mount option is given. In that case it is passed to userspace,
> -        * but libfuse and the kernel will still assign a different
> -        * inode for internal use (called the "nodeid").
> -        *
> -        * `fi` will always be NULL if the file is not currently open, but
> -        * may also be NULL if the file is open.
> -        */
> -       int (*getattr) (const char *, struct stat *, struct fuse_file_info *fi);
> -
> -       /** Read the target of a symbolic link
> -        *
> -        * The buffer should be filled with a null terminated string.  The
> -        * buffer size argument includes the space for the terminating
> -        * null character.      If the linkname is too long to fit in the
> -        * buffer, it should be truncated.      The return value should be 0
> -        * for success.
> -        */
> -       int (*readlink) (const char *, char *, size_t);
> -
> -       /** Create a file node
> -        *
> -        * This is called for creation of all non-directory, non-symlink
> -        * nodes.  If the filesystem defines a create() method, then for
> -        * regular files that will be called instead.
> -        */
> -       int (*mknod) (const char *, mode_t, dev_t);
> -
> -       /** Create a directory
> -        *
> -        * Note that the mode argument may not have the type specification
> -        * bits set, i.e. S_ISDIR(mode) can be false.  To obtain the
> -        * correct directory type bits use  mode|S_IFDIR
> -        * */
> -       int (*mkdir) (const char *, mode_t);
> -
> -       /** Remove a file */
> -       int (*unlink) (const char *);
> -
> -       /** Remove a directory */
> -       int (*rmdir) (const char *);
> -
> -       /** Create a symbolic link */
> -       int (*symlink) (const char *, const char *);
> -
> -       /** Rename a file
> -        *
> -        * *flags* may be `RENAME_EXCHANGE` or `RENAME_NOREPLACE`. If
> -        * RENAME_NOREPLACE is specified, the filesystem must not
> -        * overwrite *newname* if it exists and return an error
> -        * instead. If `RENAME_EXCHANGE` is specified, the filesystem
> -        * must atomically exchange the two files, i.e. both must
> -        * exist and neither may be deleted.
> -        */
> -       int (*rename) (const char *, const char *, unsigned int flags);
> -
> -       /** Create a hard link to a file */
> -       int (*link) (const char *, const char *);
> -
> -       /** Change the permission bits of a file
> -        *
> -        * `fi` will always be NULL if the file is not currenlty open, but
> -        * may also be NULL if the file is open.
> -        */
> -       int (*chmod) (const char *, mode_t, struct fuse_file_info *fi);
> -
> -       /** Change the owner and group of a file
> -        *
> -        * `fi` will always be NULL if the file is not currenlty open, but
> -        * may also be NULL if the file is open.
> -        *
> -        * Unless FUSE_CAP_HANDLE_KILLPRIV is disabled, this method is
> -        * expected to reset the setuid and setgid bits.
> -        */
> -       int (*chown) (const char *, uid_t, gid_t, struct fuse_file_info *fi);
> -
> -       /** Change the size of a file
> -        *
> -        * `fi` will always be NULL if the file is not currenlty open, but
> -        * may also be NULL if the file is open.
> -        *
> -        * Unless FUSE_CAP_HANDLE_KILLPRIV is disabled, this method is
> -        * expected to reset the setuid and setgid bits.
> -        */
> -       int (*truncate) (const char *, off_t, struct fuse_file_info *fi);
> -
> -       /** Open a file
> -        *
> -        * Open flags are available in fi->flags. The following rules
> -        * apply.
> -        *
> -        *  - Creation (O_CREAT, O_EXCL, O_NOCTTY) flags will be
> -        *    filtered out / handled by the kernel.
> -        *
> -        *  - Access modes (O_RDONLY, O_WRONLY, O_RDWR, O_EXEC, O_SEARCH)
> -        *    should be used by the filesystem to check if the operation is
> -        *    permitted.  If the ``-o default_permissions`` mount option is
> -        *    given, this check is already done by the kernel before calling
> -        *    open() and may thus be omitted by the filesystem.
> -        *
> -        *  - When writeback caching is enabled, the kernel may send
> -        *    read requests even for files opened with O_WRONLY. The
> -        *    filesystem should be prepared to handle this.
> -        *
> -        *  - When writeback caching is disabled, the filesystem is
> -        *    expected to properly handle the O_APPEND flag and ensure
> -        *    that each write is appending to the end of the file.
> -        *
> -         *  - When writeback caching is enabled, the kernel will
> -        *    handle O_APPEND. However, unless all changes to the file
> -        *    come through the kernel this will not work reliably. The
> -        *    filesystem should thus either ignore the O_APPEND flag
> -        *    (and let the kernel handle it), or return an error
> -        *    (indicating that reliably O_APPEND is not available).
> -        *
> -        * Filesystem may store an arbitrary file handle (pointer,
> -        * index, etc) in fi->fh, and use this in other all other file
> -        * operations (read, write, flush, release, fsync).
> -        *
> -        * Filesystem may also implement stateless file I/O and not store
> -        * anything in fi->fh.
> -        *
> -        * There are also some flags (direct_io, keep_cache) which the
> -        * filesystem may set in fi, to change the way the file is opened.
> -        * See fuse_file_info structure in <fuse_common.h> for more details.
> -        *
> -        * If this request is answered with an error code of ENOSYS
> -        * and FUSE_CAP_NO_OPEN_SUPPORT is set in
> -        * `fuse_conn_info.capable`, this is treated as success and
> -        * future calls to open will also succeed without being send
> -        * to the filesystem process.
> -        *
> -        */
> -       int (*open) (const char *, struct fuse_file_info *);
> -
> -       /** Read data from an open file
> -        *
> -        * Read should return exactly the number of bytes requested except
> -        * on EOF or error, otherwise the rest of the data will be
> -        * substituted with zeroes.      An exception to this is when the
> -        * 'direct_io' mount option is specified, in which case the return
> -        * value of the read system call will reflect the return value of
> -        * this operation.
> -        */
> -       int (*read) (const char *, char *, size_t, off_t,
> -                    struct fuse_file_info *);
> -
> -       /** Write data to an open file
> -        *
> -        * Write should return exactly the number of bytes requested
> -        * except on error.      An exception to this is when the 'direct_io'
> -        * mount option is specified (see read operation).
> -        *
> -        * Unless FUSE_CAP_HANDLE_KILLPRIV is disabled, this meth


* Re: [PATCH 056/104] virtiofsd: add security guide document
  2020-01-07 10:05       ` Daniel P. Berrangé
@ 2020-01-09 17:02         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-09 17:02 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, stefanha, vgoyal

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Mon, Jan 06, 2020 at 05:53:55PM +0000, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > On Thu, Dec 12, 2019 at 04:38:16PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > From: Stefan Hajnoczi <stefanha@redhat.com>
> > > > 
> > > > Many people want to know: what's up with virtiofsd and security?  This
> > > > document provides the answers!
> > > > 
> > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > ---
> > > >  tools/virtiofsd/security.rst | 118 +++++++++++++++++++++++++++++++++++
> > > 
> > > Do we need to link this into the rest of QEMU's docs in some
> > > index page ?
> > 
> > I wonder how;  there's an autogenerated thing in
> >   docs/index.rst
> > 
> > that includes some of the docs directory's subdirectories/index.
> > Does that mean we should have this in docs/tools/virtiofsd/security.rst
> > and a docs/tools/index  and a docs/tools/virtiofsd/index  ?
> 
> I was wondering if this fits into any of the current three sections
> "devel" or "interop" or "specs", but it doesn't feel quite right in
> any of them to me. So having a new docs/tools subtree looks like an
> ok idea in the absence of better suggestions.

OK, so what I've done is I've added a preceding patch that creates:

   docs/tools
             /conf.py
             /index.rst

  and adds it to the Makefile and docs/index.rst 

and then this patch now adds itself as docs/tools/virtiofsd-security.rst
and just adds the entry in docs/tools/index.rst

Dave

> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [PATCH 079/104] virtiofsd: fix memory leak on lo.source
  2020-01-07 11:37   ` Daniel P. Berrangé
@ 2020-01-09 17:38     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-09 17:38 UTC (permalink / raw)
  To: Daniel P. Berrangé, bo.liu; +Cc: qemu-devel, stefanha, vgoyal

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Dec 12, 2019 at 04:38:39PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: Liu Bo <bo.liu@linux.alibaba.com>
> > 
> > valgrind reported that lo.source is leaked on quitting, but it was defined
> > as (const char*) as it may point to a const string "/".
> > 
> > Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
> > ---
> >  tools/virtiofsd/passthrough_ll.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index 33092de65a..45cf466178 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -2529,9 +2529,8 @@ int main(int argc, char *argv[])
> >              fuse_log(FUSE_LOG_ERR, "source is not a directory\n");
> >              exit(1);
> >          }
> > -
> >      } else {
> > -        lo.source = "/";
> > +        lo.source = strdup("/");
> >      }
> >      lo.root.is_symlink = false;
> >      if (!lo.timeout_set) {
> > @@ -2610,5 +2609,7 @@ err_out1:
> >          close(lo.root.fd);
> >      }
> >  
> > +    free((char *)lo.source);
> 
> Can we not change the 'lo_data' struct so that source is not const
> and thus avoid free'ing a const field ?

Done.  Made that free(lo.source) and dropped the const.
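
For reference, a minimal sketch of the shape this takes. It is not the
actual virtiofsd code: struct lo_data_sketch and the lo_sketch_* helpers
are made-up names for illustration, and the real struct lo_data has many
more fields. The point is only that once 'source' is a plain char * that
always owns heap memory, the default "/" has to be strdup'd too and the
free() needs no cast.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for virtiofsd's struct lo_data. */
struct lo_data_sketch {
    char *source; /* non-const: always heap-allocated, or NULL */
};

/*
 * Set the shared directory, defaulting to "/" when none was given.
 * Returns 0 on success, -1 if the allocation failed.
 */
static int lo_sketch_set_source(struct lo_data_sketch *lo, const char *path)
{
    assert(lo != NULL);
    free(lo->source); /* free(NULL) is a no-op */
    lo->source = strdup(path ? path : "/");
    return lo->source ? 0 : -1;
}

/* Release the string; no (char *) cast is needed any more. */
static void lo_sketch_cleanup(struct lo_data_sketch *lo)
{
    assert(lo != NULL);
    free(lo->source);
    lo->source = NULL;
}
```

Because every assignment goes through strdup(), the cleanup path never
has to know whether the string came from the command line or from the
default.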

Dave


> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [PATCH 080/104] virtiofsd: add helper for lo_data cleanup
  2020-01-07 11:40   ` Daniel P. Berrangé
@ 2020-01-09 17:41     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-09 17:41 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, stefanha, vgoyal

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Dec 12, 2019 at 04:38:40PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: Liu Bo <bo.liu@linux.alibaba.com>
> > 
> > This offers a helper function for lo_data's cleanup.
> > 
> > Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
> > ---
> >  tools/virtiofsd/passthrough_ll.c | 37 ++++++++++++++++++--------------
> >  1 file changed, 21 insertions(+), 16 deletions(-)
> 
> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

Thanks.

> 
> > 
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index 45cf466178..097033aa00 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -2439,6 +2439,26 @@ static gboolean lo_key_equal(gconstpointer a, gconstpointer b)
> >      return la->ino == lb->ino && la->dev == lb->dev;
> >  }
> >  
> > +static void fuse_lo_data_cleanup(struct lo_data *lo)
> > +{
> > +    if (lo->inodes) {
> > +        g_hash_table_destroy(lo->inodes);
> > +    }
> > +    lo_map_destroy(&lo->fd_map);
> > +    lo_map_destroy(&lo->dirp_map);
> > +    lo_map_destroy(&lo->ino_map);
> > +
> > +    if (lo->proc_self_fd >= 0) {
> > +        close(lo->proc_self_fd);
> > +    }
> > +
> > +    if (lo->root.fd >= 0) {
> > +        close(lo->root.fd);
> > +    }
> > +
> > +    free((char *)lo->source);
> 
> This will need changing if you follow my comment on prev patch about
> removing the const & cast

Done.

> 
> > +}
> > +
> >  int main(int argc, char *argv[])
> >  {
> >      struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
> > @@ -2594,22 +2614,7 @@ err_out2:
> >  err_out1:
> >      fuse_opt_free_args(&args);
> >  
> > -    if (lo.inodes) {
> > -        g_hash_table_destroy(lo.inodes);
> > -    }
> > -    lo_map_destroy(&lo.fd_map);
> > -    lo_map_destroy(&lo.dirp_map);
> > -    lo_map_destroy(&lo.ino_map);
> > -
> > -    if (lo.proc_self_fd >= 0) {
> > -        close(lo.proc_self_fd);
> > -    }
> > -
> > -    if (lo.root.fd >= 0) {
> > -        close(lo.root.fd);
> > -    }
> > -
> > -    free((char *)lo.source);
> > +    fuse_lo_data_cleanup(&lo);
> >  
> >      return ret ? 1 : 0;
> >  }
> > -- 
> > 2.23.0
> > 
> > 
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [PATCH 092/104] virtiofsd: add man page
  2020-01-07 12:13   ` Daniel P. Berrangé
@ 2020-01-09 20:02     ` Dr. David Alan Gilbert
  2020-01-10  9:30       ` Daniel P. Berrangé
  0 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-09 20:02 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, stefanha, vgoyal

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Dec 12, 2019 at 04:38:52PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: Stefan Hajnoczi <stefanha@redhat.com>
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  Makefile                       |  7 +++
> >  tools/virtiofsd/virtiofsd.texi | 85 ++++++++++++++++++++++++++++++++++
> >  2 files changed, 92 insertions(+)
> >  create mode 100644 tools/virtiofsd/virtiofsd.texi
> 
> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

Thanks.

> with some notes at the very end

<snip>

> > +@c man begin DESCRIPTION
> > +
> > +Share a host directory tree with a guest through a virtio-fs device.  This
> > +program is a vhost-user backend that implements the virtio-fs device.  Each
> > +virtio-fs device instance requires its own virtiofsd process.
> > +
> > +This program is designed to work with QEMU's @code{--device vhost-user-fs-pci}
> > +but should work with any virtual machine monitor (VMM) that supports
> > +vhost-user.  See the EXAMPLES section below.
> > +
> > +This program must be run as the root user.
> 
> So there's no way for an unprivileged user to do file sharing like they
> can with 9p right now ?

Correct.

(Which also makes it a pain to use in a make check)

> >                                              Upon startup the program will
> > +switch into a new file system namespace with the shared directory tree as its
> > +root.  This prevents "file system escapes" due to symlinks and other file
> > +system objects that might lead to files outside the shared directory.  The
> > +program also sandboxes itself using seccomp(2) to prevent ptrace(2) and other
> > +vectors that could allow an attacker to compromise the system after gaining
> > +control of the virtiofsd process.
> > +
> > +@c man end
> > +
> > +@c man begin OPTIONS
> > +@table @option
> > +@item -h, --help
> > +Print help.
> > +@item -V, --version
> > +Print version.
> > +@item -d, -o debug
> > +Enable debug output.
> > +@item --syslog
> > +Print log messages to syslog instead of stderr.
> > +@item -o log_level=@var{level}
> > +Print only log messages matching @var{level} or more severe.  @var{level} is
> > +one of @code{err}, @code{warn}, @code{info}, or @code{debug}.  The default is
> > +@var{info}.
> > +@item -o source=@var{path}
> > +Share host directory tree located at @var{path}.  This option is required.
> > +@item --socket-path=@var{path}, -o vhost_user_socket=@var{path}
> > +Listen on vhost-user UNIX domain socket at @var{path}.
> > +@item --fd=@var{fdnum}
> > +Accept connections from vhost-user UNIX domain socket file descriptor @var{fdnum}.  The file descriptor must already be listening for connections.
> > +@item --thread-pool-size=@var{num}
> > +Restrict the number of worker threads per request queue to @var{num}.  The default is 64.
> > +@item --cache=@code{none}|@code{auto}|@code{always}
> > +Select the desired trade-off between coherency and performance.  @code{none}
> > +forbids the FUSE client from caching to achieve best coherency at the cost of
> > +performance.  @code{auto} acts similar to NFS with a 1 second metadata cache
> > +timeout.  @code{always} sets a long cache lifetime at the expense of coherency.
> > +@item --writeback
> > +Enable writeback cache, allowing the FUSE client to buffer and merge write requests.
> > +@end table
> > +@c man end
> > +
> > +@c man begin EXAMPLES
> > +Export @code{/var/lib/fs/vm001/} on vhost-user UNIX domain socket @code{/var/run/vm001-vhost-fs.sock}:
> > +
> > +@example
> > +host# virtiofsd --socket-path=/var/run/vm001-vhost-fs.sock -o source=/var/lib/fs/vm001
> > +host# qemu-system-x86_64 \
> > +    -chardev socket,id=char0,path=/var/run/vm001-vhost-fs.sock \
> > +    -device vhost-user-fs-pci,chardev=char0,tag=myfs \
> > +    -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on \
> > +    -numa node,memdev=mem \
> > +    ...
> > +guest# mount -t virtio_fs \
> > +    -o default_permissions,allow_other,user_id=0,group_id=0,rootmode=040000,dax \
> > +    myfs /mnt
> > +@end example
> > +@c man end
> > +
> > +@ignore
> > +@setfilename virtiofsd
> > +@settitle QEMU virtio-fs shared file system daemon
> > +
> > +@c man begin AUTHOR
> 
> s/AUTHOR/COPYRIGHT/

OK

> since this isn't providing any author information.
> 
> > +Copyright (C) 2019 Red Hat, Inc.
> 
> 2019-2020 !

Time flies...

> And now insert
> 
>  @c man end
>  @c man begin LICENSE
> 
> > +This is free software; see the source for copying conditions.  There is NO
> > +warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> > +@c man end
> > +@end ignore

Hmm, so it ends up like:


@c man end

@ignore
@setfilename virtiofsd
@settitle QEMU virtio-fs shared file system daemon

@c man begin COPYRIGHT
Copyright (C) 2019-2020 Red Hat, Inc.
@c man end
@c man begin LICENSE
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
@c man end
@end ignore


That results in:

COPYRIGHT
       Copyright (C) 2019-2020 Red Hat, Inc.

but with no license printed.
That's from running   nroff -man ./tools/virtiofsd/virtiofsd.1 | more   after a make.

Is that what's expected?  I'd expected to see the license somewhere.

Dave
> 
> 
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [PATCH 092/104] virtiofsd: add man page
  2020-01-09 20:02     ` Dr. David Alan Gilbert
@ 2020-01-10  9:30       ` Daniel P. Berrangé
  2020-01-10 11:06         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-10  9:30 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: qemu-devel, stefanha, vgoyal

On Thu, Jan 09, 2020 at 08:02:13PM +0000, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > On Thu, Dec 12, 2019 at 04:38:52PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > From: Stefan Hajnoczi <stefanha@redhat.com>
> > > 
> > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > ---
> > >  Makefile                       |  7 +++
> > >  tools/virtiofsd/virtiofsd.texi | 85 ++++++++++++++++++++++++++++++++++
> > >  2 files changed, 92 insertions(+)
> > >  create mode 100644 tools/virtiofsd/virtiofsd.texi
> > 
> > Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
> 
> Thanks.
> 
> > with some notes at the very end
> 
> <snip>
> 
> > > +@c man begin DESCRIPTION
> > > +
> > > +Share a host directory tree with a guest through a virtio-fs device.  This
> > > +program is a vhost-user backend that implements the virtio-fs device.  Each
> > > +virtio-fs device instance requires its own virtiofsd process.
> > > +
> > > +This program is designed to work with QEMU's @code{--device vhost-user-fs-pci}
> > > +but should work with any virtual machine monitor (VMM) that supports
> > > +vhost-user.  See the EXAMPLES section below.
> > > +
> > > +This program must be run as the root user.
> > 
> > So there's no way for an unprivileged user to do file sharing like they
> > can with 9p right now ?
> 
> Correct.
> 
> (Which also makes it a pain to use in a make check)
> 
> > >                                              Upon startup the program will
> > > +switch into a new file system namespace with the shared directory tree as its
> > > +root.  This prevents "file system escapes" due to symlinks and other file
> > > +system objects that might lead to files outside the shared directory.  The
> > > +program also sandboxes itself using seccomp(2) to prevent ptrace(2) and other
> > > +vectors that could allow an attacker to compromise the system after gaining
> > > +control of the virtiofsd process.
> > > +
> > > +@c man end
> > > +
> > > +@c man begin OPTIONS
> > > +@table @option
> > > +@item -h, --help
> > > +Print help.
> > > +@item -V, --version
> > > +Print version.
> > > +@item -d, -o debug
> > > +Enable debug output.
> > > +@item --syslog
> > > +Print log messages to syslog instead of stderr.
> > > +@item -o log_level=@var{level}
> > > +Print only log messages matching @var{level} or more severe.  @var{level} is
> > > +one of @code{err}, @code{warn}, @code{info}, or @code{debug}.  The default is
> > > +@var{info}.
> > > +@item -o source=@var{path}
> > > +Share host directory tree located at @var{path}.  This option is required.
> > > +@item --socket-path=@var{path}, -o vhost_user_socket=@var{path}
> > > +Listen on vhost-user UNIX domain socket at @var{path}.
> > > +@item --fd=@var{fdnum}
> > > +Accept connections from vhost-user UNIX domain socket file descriptor @var{fdnum}.  The file descriptor must already be listening for connections.
> > > +@item --thread-pool-size=@var{num}
> > > +Restrict the number of worker threads per request queue to @var{num}.  The default is 64.
> > > +@item --cache=@code{none}|@code{auto}|@code{always}
> > > +Select the desired trade-off between coherency and performance.  @code{none}
> > > +forbids the FUSE client from caching to achieve best coherency at the cost of
> > > +performance.  @code{auto} acts similar to NFS with a 1 second metadata cache
> > > +timeout.  @code{always} sets a long cache lifetime at the expense of coherency.
> > > +@item --writeback
> > > +Enable writeback cache, allowing the FUSE client to buffer and merge write requests.
> > > +@end table
> > > +@c man end
> > > +
> > > +@c man begin EXAMPLES
> > > +Export @code{/var/lib/fs/vm001/} on vhost-user UNIX domain socket @code{/var/run/vm001-vhost-fs.sock}:
> > > +
> > > +@example
> > > +host# virtiofsd --socket-path=/var/run/vm001-vhost-fs.sock -o source=/var/lib/fs/vm001
> > > +host# qemu-system-x86_64 \
> > > +    -chardev socket,id=char0,path=/var/run/vm001-vhost-fs.sock \
> > > +    -device vhost-user-fs-pci,chardev=char0,tag=myfs \
> > > +    -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on \
> > > +    -numa node,memdev=mem \
> > > +    ...
> > > +guest# mount -t virtio_fs \
> > > +    -o default_permissions,allow_other,user_id=0,group_id=0,rootmode=040000,dax \
> > > +    myfs /mnt
> > > +@end example
> > > +@c man end
> > > +
> > > +@ignore
> > > +@setfilename virtiofsd
> > > +@settitle QEMU virtio-fs shared file system daemon
> > > +
> > > +@c man begin AUTHOR
> > 
> > s/AUTHOR/COPYRIGHT/
> 
> OK
> 
> > since this isn't providing any author information.
> > 
> > > +Copyright (C) 2019 Red Hat, Inc.
> > 
> > 2019-2020 !
> 
> Time flies...
> 
> > And now insert
> > 
> >  @c man end
> >  @c man begin LICENSE
> > 
> > > +This is free software; see the source for copying conditions.  There is NO
> > > +warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> > > +@c man end
> > > +@end ignore
> 
> Hmm, so it ends up like:
> 
> 
> @c man end
> 
> @ignore
> @setfilename virtiofsd
> @settitle QEMU virtio-fs shared file system daemon
> 
> @c man begin COPYRIGHT
> Copyright (C) 2019-2020 Red Hat, Inc.
> @c man end
> @c man begin LICENSE
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> @c man end
> @end ignore
> 
> 
> That results in:
> 
> COPYRIGHT
>        Copyright (C) 2019-2020 Red Hat, Inc.
> 
> but with no license printed.
> That's from running   nroff -man ./tools/virtiofsd/virtiofsd.1 | more   after a make.
> 
> Is that what's expected?  I'd expected to see the license somewhere.

No, that's not expected :-(  It seems my good intentions were killed by
texi2pod.pl which whitelists the permitted section names, and does not
allow "LICENSE" as one of them. Either ignore my suggestion, given that
this bug is pre-existing in all QEMU man pages, or fix texi2pod.pl to
allow LICENSE.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



* Re: [PATCH 066/104] virtiofsd: passthrough_ll: add renameat2 support
  2020-01-07 11:21   ` Daniel P. Berrangé
@ 2020-01-10  9:52     ` Dr. David Alan Gilbert
  2020-01-13 20:06       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-10  9:52 UTC (permalink / raw)
  To: Daniel P. Berrangé, mszeredi; +Cc: qemu-devel, stefanha, vgoyal

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Dec 12, 2019 at 04:38:26PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: Miklos Szeredi <mszeredi@redhat.com>
> > 
> > No glibc support yet, so use syscall().
> 
> It exists in glibc in my Fedora 31 install.
> 
> Presumably this is related to an older version
> 
> > Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> > ---
> >  tools/virtiofsd/passthrough_ll.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index 91d3120033..bed2270141 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -1083,7 +1083,17 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
> >      }
> >  
> >      if (flags) {
> > +#ifndef SYS_renameat2
> >          fuse_reply_err(req, EINVAL);
> > +#else
> > +        res = syscall(SYS_renameat2, lo_fd(req, parent), name,
> > +                      lo_fd(req, newparent), newname, flags);
> > +        if (res == -1 && errno == ENOSYS) {
> > +            fuse_reply_err(req, EINVAL);
> > +        } else {
> > +            fuse_reply_err(req, res == -1 ? errno : 0);
> > +        }
> > +#endif
> 
> We should use the formal API if available as first choice

OK, done - I've kept the 'ifndef SYS_renameat2' that drops back to an
error for truly ancient cases; although I doubt everything else will
build on something that old.
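
Roughly, the ordering ends up like the sketch below. It is only an
illustration, not the code in the patch: sketch_rename_flags and
SKETCH_HAVE_RENAMEAT2 are made-up names, and detecting the wrapper with
a glibc >= 2.28 version test is an assumption here (a configure check
would be the more usual way to do it in the tree). It returns a positive
errno value so the result could be handed straight to fuse_reply_err().

```c
#define _GNU_SOURCE /* for renameat2() in <stdio.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: glibc provides the renameat2() wrapper from 2.28 onwards. */
#if defined(__GLIBC__) && defined(__GLIBC_PREREQ)
#if __GLIBC_PREREQ(2, 28)
#define SKETCH_HAVE_RENAMEAT2 1
#endif
#endif

/* Returns 0 on success, otherwise a positive errno value. */
static int sketch_rename_flags(int olddirfd, const char *oldname,
                               int newdirfd, const char *newname,
                               unsigned int flags)
{
    int res;

    assert(oldname != NULL && newname != NULL);
#if defined(SKETCH_HAVE_RENAMEAT2)
    /* First choice: the formal libc API. */
    res = renameat2(olddirfd, oldname, newdirfd, newname, flags);
#elif defined(SYS_renameat2)
    /* Older libc, but the kernel headers know the syscall number. */
    res = syscall(SYS_renameat2, olddirfd, oldname, newdirfd, newname, flags);
#else
    /* Truly ancient build environment: behave as if unsupported. */
    (void)olddirfd;
    (void)newdirfd;
    (void)flags;
    errno = ENOSYS;
    res = -1;
#endif
    if (res == -1) {
        /* As in the quoted patch, report a missing syscall as EINVAL. */
        return errno == ENOSYS ? EINVAL : errno;
    }
    return 0;
}
```

A caller in the style of lo_rename() would then only need something
like fuse_reply_err(req, sketch_rename_flags(...)) in its
'if (flags)' branch.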

Dave

> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [PATCH 092/104] virtiofsd: add man page
  2020-01-10  9:30       ` Daniel P. Berrangé
@ 2020-01-10 11:06         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-10 11:06 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, stefanha, vgoyal

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Jan 09, 2020 at 08:02:13PM +0000, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > On Thu, Dec 12, 2019 at 04:38:52PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > From: Stefan Hajnoczi <stefanha@redhat.com>
> > > > 
> > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > ---
> > > >  Makefile                       |  7 +++
> > > >  tools/virtiofsd/virtiofsd.texi | 85 ++++++++++++++++++++++++++++++++++
> > > >  2 files changed, 92 insertions(+)
> > > >  create mode 100644 tools/virtiofsd/virtiofsd.texi
> > > 
> > > Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
> > 
> > Thanks.
> > 
> > > with some notes at the very end
> > 
> > <snip>
> > 
> > > > +@c man begin DESCRIPTION
> > > > +
> > > > +Share a host directory tree with a guest through a virtio-fs device.  This
> > > > +program is a vhost-user backend that implements the virtio-fs device.  Each
> > > > +virtio-fs device instance requires its own virtiofsd process.
> > > > +
> > > > +This program is designed to work with QEMU's @code{--device vhost-user-fs-pci}
> > > > +but should work with any virtual machine monitor (VMM) that supports
> > > > +vhost-user.  See the EXAMPLES section below.
> > > > +
> > > > +This program must be run as the root user.
> > > 
> > > So there's no way for an unprivileged user to do file sharing like they
> > > can with 9p right now ?
> > 
> > Correct.
> > 
> > (Which also makes it a pain to use in a make check)
> > 
> > > >                                              Upon startup the program will
> > > > +switch into a new file system namespace with the shared directory tree as its
> > > > +root.  This prevents "file system escapes" due to symlinks and other file
> > > > +system objects that might lead to files outside the shared directory.  The
> > > > +program also sandboxes itself using seccomp(2) to prevent ptrace(2) and other
> > > > +vectors that could allow an attacker to compromise the system after gaining
> > > > +control of the virtiofsd process.
> > > > +
> > > > +@c man end
> > > > +
> > > > +@c man begin OPTIONS
> > > > +@table @option
> > > > +@item -h, --help
> > > > +Print help.
> > > > +@item -V, --version
> > > > +Print version.
> > > > +@item -d, -o debug
> > > > +Enable debug output.
> > > > +@item --syslog
> > > > +Print log messages to syslog instead of stderr.
> > > > +@item -o log_level=@var{level}
> > > > +Print only log messages matching @var{level} or more severe.  @var{level} is
> > > > +one of @code{err}, @code{warn}, @code{info}, or @code{debug}.  The default is
> > > > +@var{info}.
> > > > +@item -o source=@var{path}
> > > > +Share host directory tree located at @var{path}.  This option is required.
> > > > +@item --socket-path=@var{path}, -o vhost_user_socket=@var{path}
> > > > +Listen on vhost-user UNIX domain socket at @var{path}.
> > > > +@item --fd=@var{fdnum}
> > > > +Accept connections from vhost-user UNIX domain socket file descriptor @var{fdnum}.  The file descriptor must already be listening for connections.
> > > > +@item --thread-pool-size=@var{num}
> > > > +Restrict the number of worker threads per request queue to @var{num}.  The default is 64.
> > > > +@item --cache=@code{none}|@code{auto}|@code{always}
> > > > +Select the desired trade-off between coherency and performance.  @code{none}
> > > > +forbids the FUSE client from caching to achieve best coherency at the cost of
> > > > +performance.  @code{auto} acts similar to NFS with a 1 second metadata cache
> > > > +timeout.  @code{always} sets a long cache lifetime at the expense of coherency.
> > > > +@item --writeback
> > > > +Enable writeback cache, allowing the FUSE client to buffer and merge write requests.
> > > > +@end table
> > > > +@c man end
> > > > +
> > > > +@c man begin EXAMPLES
> > > > +Export @code{/var/lib/fs/vm001/} on vhost-user UNIX domain socket @code{/var/run/vm001-vhost-fs.sock}:
> > > > +
> > > > +@example
> > > > +host# virtiofsd --socket-path=/var/run/vm001-vhost-fs.sock -o source=/var/lib/fs/vm001
> > > > +host# qemu-system-x86_64 \
> > > > +    -chardev socket,id=char0,path=/var/run/vm001-vhost-fs.sock \
> > > > +    -device vhost-user-fs-pci,chardev=char0,tag=myfs \
> > > > +    -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on \
> > > > +    -numa node,memdev=mem \
> > > > +    ...
> > > > +guest# mount -t virtio_fs \
> > > > +    -o default_permissions,allow_other,user_id=0,group_id=0,rootmode=040000,dax \
> > > > +    myfs /mnt
> > > > +@end example
> > > > +@c man end
> > > > +
> > > > +@ignore
> > > > +@setfilename virtiofsd
> > > > +@settitle QEMU virtio-fs shared file system daemon
> > > > +
> > > > +@c man begin AUTHOR
> > > 
> > > s/AUTHOR/COPYRIGHT/
> > 
> > OK
> > 
> > > since this isn't providing any author information.
> > > 
> > > > +Copyright (C) 2019 Red Hat, Inc.
> > > 
> > > 2019-2020 !
> > 
> > Time flies...
> > 
> > > And now insert
> > > 
> > >  @c man end
> > >  @c man begin LICENSE
> > > 
> > > > +This is free software; see the source for copying conditions.  There is NO
> > > > +warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> > > > +@c man end
> > > > +@end ignore
> > 
> > Hmm, so it ends up like:
> > 
> > 
> > @c man end
> > 
> > @ignore
> > @setfilename virtiofsd
> > @settitle QEMU virtio-fs shared file system daemon
> > 
> > @c man begin COPYRIGHT
> > Copyright (C) 2019-2020 Red Hat, Inc.
> > @c man end
> > @c man begin LICENSE
> > This is free software; see the source for copying conditions.  There is NO
> > warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> > @c man end
> > @end ignore
> > 
> > 
> > That results in:
> > 
> > COPYRIGHT
> >        Copyright (C) 2019-2020 Red Hat, Inc.
> > 
> > but with no license printed.
> > That's from after a make doing a   nroff -man ./tools/virtiofsd/virtiofsd.1 |more
> > 
> > is that what's expected?  I'd expected to see the license somewhere.
> 
> No, that's not expected :-(  It seems my good intentions were killed by
> texi2pod.pl which whitelists the permitted section names, and does not
> allow "LICENSE" as one of them. Either ignore my suggestion, given that
> this bug is pre-existing in all QEMU man pages, or fix texi2pod.pl to
> allow LICENSE.

OK, I've removed the 'man end/man begin LICENSE' and it now comes out
as:

COPYRIGHT
       Copyright (C) 2019-2020 Red Hat, Inc.

       This is free software; see the source for copying conditions.  There is
       NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
       PURPOSE.

which I think is OK.

Dave

> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 032/104] virtiofsd: passthrough_ll: create new files in caller's context
  2020-01-07  9:22         ` Daniel P. Berrangé
@ 2020-01-10 13:05           ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-10 13:05 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, stefanha, vgoyal

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Mon, Jan 06, 2020 at 07:08:43PM +0000, Dr. David Alan Gilbert wrote:
> > * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > On Thu, Dec 12, 2019 at 04:37:52PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > > From: Vivek Goyal <vgoyal@redhat.com>
> > > > > 
> > > > > We need to create files in the caller's context. Otherwise after
> > > > > creating a file, the caller might not be able to do file operations on
> > > > > that file.
> > > > > 
> > > > > Changed effective uid/gid to caller's uid/gid, create file and then
> > > > > switch back to uid/gid 0.
> > > > > 
> > > > > Use syscall(setresuid, ...) otherwise glibc does some magic to change EUID
> > > > > in all threads, which is not what we want.
> > > > > 
> > > > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > > > > Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> > > > > ---
> > > > >  tools/virtiofsd/passthrough_ll.c | 79 ++++++++++++++++++++++++++++++--
> > > > >  1 file changed, 74 insertions(+), 5 deletions(-)
> > > > > 
> > > > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > > > index 68bacb6fc5..0188cd9ad6 100644
> > > > > --- a/tools/virtiofsd/passthrough_ll.c
> > > > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > > 
> > > > 
> > > > > +static int lo_change_cred(fuse_req_t req, struct lo_cred *old)
> > > > > +{
> > > > > +    int res;
> > > > > +
> > > > > +    old->euid = geteuid();
> > > > > +    old->egid = getegid();
> > > > > +
> > > > > +    res = syscall(SYS_setresgid, -1, fuse_req_ctx(req)->gid, -1);
> > > > 
> > > > Do we need to be using  SYS_setres[u,g]id32 instead...
> > > > 
> > > > [quote setresgid(2)]
> > > >        The original Linux setresuid() and setresgid() system  calls
> > > >        supported  only  16-bit  user  and group IDs.  Subsequently,
> > > >        Linux 2.4 added setresuid32() and setresgid32(),  supporting
> > > >        32-bit  IDs.   The glibc setresuid() and setresgid() wrapper
> > > >        functions transparently deal with the variations across ker‐
> > > >        nel versions.
> > > > [/quote]
> > > 
> > > OK, updated.
> > 
> > Hmm hang on; this is messy.  x86-64 only seems to have setresuid
> > where as some architectures have both;  If I'm reading this right, all
> > 64 bit machines have setresuid/gid calling the code that takes the
> > 32bit ID; some have compat entries for 32bit syscalls.
> 
> Oh yuk.
> 
> > I think it's probably more correct to call setresuid here; except
> > for 32 bit platforms - but how do we tell?
> 
> Is it possible to just do an #ifdef SYS_setresgid32 check to see
> if the wider variant exists ?

I've ended up with:

+/*
+ * On some archs, setres*id is limited to 2^16 but they
+ * provide setres*id32 variants that allow 2^32.
+ * Others just let setres*id do 2^32 anyway.
+ */
+#ifdef SYS_setresgid32
+#define OURSYS_setresgid SYS_setresgid32
+#else
+#define OURSYS_setresgid SYS_setresgid
+#endif
+
+#ifdef SYS_setresuid32
+#define OURSYS_setresuid SYS_setresuid32
+#else
+#define OURSYS_setresuid SYS_setresuid
+#endif
+
+/*
+ * Change to uid/gid of caller so that file is created with
+ * ownership of caller.
+ * TODO: What about selinux context?
+ */
+static int lo_change_cred(fuse_req_t req, struct lo_cred *old)
+{
+    int res;
+
+    old->euid = geteuid();
+    old->egid = getegid();
+
+    res = syscall(OURSYS_setresgid, -1, fuse_req_ctx(req)->gid, -1);
+    if (res == -1) {
+        return errno;
+    }
+
+    res = syscall(OURSYS_setresuid, -1, fuse_req_ctx(req)->uid, -1);
+    if (res == -1) {
+        int errno_save = errno;
+
+        syscall(OURSYS_setresgid, -1, old->egid, -1);
+        return errno_save;
+    }

and in the seccomp:

#ifdef __NR_setresgid32
    SCMP_SYS(setresgid32),
#endif
#ifdef __NR_setresuid32
    SCMP_SYS(setresuid32),
#endif
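
For anyone who wants to try the pattern outside the daemon, here is a
minimal standalone sketch, assuming Linux and glibc's syscall().  The
names cred_pair, change_cred and restore_cred are made up for
illustration; the function in the patch takes a fuse_req_t instead of a
plain uid/gid pair.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Prefer the 32-bit-ID syscalls where the architecture defines them. */
#ifdef SYS_setresgid32
#define OURSYS_setresgid SYS_setresgid32
#else
#define OURSYS_setresgid SYS_setresgid
#endif

#ifdef SYS_setresuid32
#define OURSYS_setresuid SYS_setresuid32
#else
#define OURSYS_setresuid SYS_setresuid
#endif

/* Hypothetical helper type: the effective IDs to switch back to. */
struct cred_pair {
    uid_t euid;
    gid_t egid;
};

/*
 * Switch only the calling thread's effective uid/gid.  The raw syscall
 * avoids the glibc wrapper, which applies the change to all threads.
 * Returns 0 on success or an errno value on failure.
 */
static int change_cred(uid_t uid, gid_t gid, struct cred_pair *old)
{
    assert(old != NULL);
    old->euid = geteuid();
    old->egid = getegid();

    /* gid first: once the euid is dropped we may not be allowed to */
    if (syscall(OURSYS_setresgid, -1, gid, -1) == -1) {
        return errno;
    }
    if (syscall(OURSYS_setresuid, -1, uid, -1) == -1) {
        int errno_save = errno;

        syscall(OURSYS_setresgid, -1, old->egid, -1);
        return errno_save;
    }
    return 0;
}

/* Undo change_cred(): uid first, so we regain the right to set the gid. */
static int restore_cred(const struct cred_pair *old)
{
    assert(old != NULL);
    if (syscall(OURSYS_setresuid, -1, old->euid, -1) == -1) {
        return errno;
    }
    if (syscall(OURSYS_setresgid, -1, old->egid, -1) == -1) {
        return errno;
    }
    return 0;
}
```

The intended use is change_cred(), create the file, restore_cred(), all
on the same thread.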

Dave

> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 099/104] virtiofsd: use fuse_buf_writev to replace fuse_buf_write for better performance
  2020-01-07 12:23   ` Daniel P. Berrangé
@ 2020-01-10 13:15     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-10 13:15 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, stefanha, vgoyal

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Dec 12, 2019 at 04:38:59PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: piaojun <piaojun@huawei.com>
> > 
> > fuse_buf_writev() only handles the normal write in which src is buffer
> > and dest is fd. Specially if src buffer represents guest physical
> > address that can't be mapped by the daemon process, IO must be bounced
> > back to the VMM to do it by fuse_buf_copy().
> > 
> > Signed-off-by: Jun Piao <piaojun@huawei.com>
> > Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  tools/virtiofsd/buffer.c | 23 +++++++++++++++++++----
> >  1 file changed, 19 insertions(+), 4 deletions(-)
> 
> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
> 
> > 
> > diff --git a/tools/virtiofsd/buffer.c b/tools/virtiofsd/buffer.c
> > index ae420c70c4..4875473785 100644
> > --- a/tools/virtiofsd/buffer.c
> > +++ b/tools/virtiofsd/buffer.c
> > @@ -33,9 +33,7 @@ size_t fuse_buf_size(const struct fuse_bufvec *bufv)
> >      return size;
> >  }
> >  
> > -__attribute__((unused))
> > -static ssize_t fuse_buf_writev(fuse_req_t req,
> 
> Lets cull the fuse_req_t param in the previous patch

Done.

> > -                               struct fuse_buf *out_buf,
> > +static ssize_t fuse_buf_writev(struct fuse_buf *out_buf,
> >                                 struct fuse_bufvec *in_buf)
> >  {
> >      ssize_t res, i, j;
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 068/104] virtiofsd: passthrough_ll: control readdirplus
  2020-01-07 11:23   ` Daniel P. Berrangé
@ 2020-01-10 15:04     ` Dr. David Alan Gilbert
  2020-01-10 15:13       ` Miklos Szeredi
  0 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-10 15:04 UTC (permalink / raw)
  To: Daniel P. Berrangé, mszeredi; +Cc: qemu-devel, stefanha, vgoyal

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Dec 12, 2019 at 04:38:28PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: Miklos Szeredi <mszeredi@redhat.com>
> >
> 
> What is readdirplus and what do we need a command line option to
> control it ? What's the user benefit of changing the setting ?

cc'ing Miklos who understands this better than me.

My understanding is that readdirplus is a heuristic inherited from NFS
where when you iterate over the directory you also pick up stat() data
for each entry in the directory.  You then cache that stat data
somewhere.
The Plus-ness is that a lot of directory operations involve you stating
each entry (e.g. to figure out if you can access it etc) so rolling it
into one op avoids the separate stat.  The unplus-ness is that it's an
overhead and I think changes some of the caching behaviour.

Dave


> > Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> > ---
> >  tools/virtiofsd/passthrough_ll.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index 0d70a367bd..c3e8bde5cf 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -118,6 +118,8 @@ struct lo_data {
> >      double timeout;
> >      int cache;
> >      int timeout_set;
> > +    int readdirplus_set;
> > +    int readdirplus_clear;
> >      struct lo_inode root; /* protected by lo->mutex */
> >      struct lo_map ino_map; /* protected by lo->mutex */
> >      struct lo_map dirp_map; /* protected by lo->mutex */
> > @@ -141,6 +143,8 @@ static const struct fuse_opt lo_opts[] = {
> >      { "cache=auto", offsetof(struct lo_data, cache), CACHE_NORMAL },
> >      { "cache=always", offsetof(struct lo_data, cache), CACHE_ALWAYS },
> >      { "norace", offsetof(struct lo_data, norace), 1 },
> > +    { "readdirplus", offsetof(struct lo_data, readdirplus_set), 1 },
> > +    { "no_readdirplus", offsetof(struct lo_data, readdirplus_clear), 1 },
> >      FUSE_OPT_END
> >  };
> >  static bool use_syslog = false;
> > @@ -479,7 +483,8 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
> >          fuse_log(FUSE_LOG_DEBUG, "lo_init: activating flock locks\n");
> >          conn->want |= FUSE_CAP_FLOCK_LOCKS;
> >      }
> > -    if (lo->cache == CACHE_NEVER) {
> > +    if ((lo->cache == CACHE_NEVER && !lo->readdirplus_set) ||
> > +        lo->readdirplus_clear) {
> >          fuse_log(FUSE_LOG_DEBUG, "lo_init: disabling readdirplus\n");
> >          conn->want &= ~FUSE_CAP_READDIRPLUS;
> >      }
> > -- 
> > 2.23.0
> > 
> > 
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 068/104] virtiofsd: passthrough_ll: control readdirplus
  2020-01-10 15:04     ` Dr. David Alan Gilbert
@ 2020-01-10 15:13       ` Miklos Szeredi
  2020-01-10 15:18         ` Daniel P. Berrangé
  0 siblings, 1 reply; 307+ messages in thread
From: Miklos Szeredi @ 2020-01-10 15:13 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Daniel P. Berrangé, qemu-devel, Stefan Hajnoczi, Vivek Goyal

On Fri, Jan 10, 2020 at 4:04 PM Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
>
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > On Thu, Dec 12, 2019 at 04:38:28PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > From: Miklos Szeredi <mszeredi@redhat.com>
> > >
> >
> > What is readdirplus and what do we need a command line option to
> > control it ? What's the user benefit of changing the setting ?
>
> cc'ing Miklos who understands this better than me.
>
> My understanding is that readdirplus is a heuristic inherited from NFS
> where when you iterate over the directory you also pick up stat() data
> for each entry in the directory.  You then cache that stat data
> somewhere.
> The Plus-ness is that a lot of directory operations involve you stating
> each entry (e.g. to figure out if you can access it etc) so rolling it
> into one op avoids the separate stat.  The unplus-ness is that it's an
> overhead and I think changes some of the caching behaviour.

Yeah, so either may give better performance and it's hard to pick a
clear winner.  NFS also has an option to control this.

Thanks,
Miklos




* Re: [PATCH 068/104] virtiofsd: passthrough_ll: control readdirplus
  2020-01-10 15:13       ` Miklos Szeredi
@ 2020-01-10 15:18         ` Daniel P. Berrangé
  2020-01-10 15:30           ` Miklos Szeredi
  0 siblings, 1 reply; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-10 15:18 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Vivek Goyal, Dr. David Alan Gilbert, Stefan Hajnoczi, qemu-devel

On Fri, Jan 10, 2020 at 04:13:08PM +0100, Miklos Szeredi wrote:
> On Fri, Jan 10, 2020 at 4:04 PM Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
> >
> > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > On Thu, Dec 12, 2019 at 04:38:28PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > From: Miklos Szeredi <mszeredi@redhat.com>
> > > >
> > >
> > > What is readdirplus and what do we need a command line option to
> > > control it ? What's the user benefit of changing the setting ?
> >
> > cc'ing Miklos who understands this better than me.
> >
> > My understanding is that readdirplus is a heuristic inherited from NFS
> > where when you iterate over the directory you also pick up stat() data
> > for each entry in the directory.  You then cache that stat data
> > somewhere.
> > The Plus-ness is that a lot of directory operations involve you stating
> > each entry (e.g. to figure out if you can access it etc) so rolling it
> > into one op avoids the separate stat.  The unplus-ness is that it's an
> > overhead and I think changes some of the caching behaviour.
> 
> Yeah, so either may give better performance and it's hard to pick a
> clear winner.  NFS also has an option to control this.

IIUC from the man page, the NFS option for controlling this is a client
side mount option. This makes sense as only the client really has knowledge
of whether its workload will benefit.

With this in mind, should the readdirplus control for virtio-fs also be a
guest mount option instead of a host virtiofsd CLI option ? The guest admin
seems best placed to know whether their workload will benefit or not.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 068/104] virtiofsd: passthrough_ll: control readdirplus
  2020-01-10 15:18         ` Daniel P. Berrangé
@ 2020-01-10 15:30           ` Miklos Szeredi
  2020-01-10 15:40             ` Vivek Goyal
  0 siblings, 1 reply; 307+ messages in thread
From: Miklos Szeredi @ 2020-01-10 15:30 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Vivek Goyal, Dr. David Alan Gilbert, Stefan Hajnoczi, qemu-devel

On Fri, Jan 10, 2020 at 4:18 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Fri, Jan 10, 2020 at 04:13:08PM +0100, Miklos Szeredi wrote:
> > On Fri, Jan 10, 2020 at 4:04 PM Dr. David Alan Gilbert
> > <dgilbert@redhat.com> wrote:
> > >
> > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > On Thu, Dec 12, 2019 at 04:38:28PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > > From: Miklos Szeredi <mszeredi@redhat.com>
> > > > >
> > > >
> > > > What is readdirplus and what do we need a command line option to
> > > > control it ? What's the user benefit of changing the setting ?
> > >
> > > cc'ing Miklos who understands this better than me.
> > >
> > > My understanding is that readdirplus is a heuristic inherited from NFS
> > > where when you iterate over the directory you also pick up stat() data
> > > for each entry in the directory.  You then cache that stat data
> > > somewhere.
> > > The Plus-ness is that a lot of directory operations involve you stating
> > > each entry (e.g. to figure out if you can access it etc) so rolling it
> > > into one op avoids the separate stat.  The unplus-ness is that it's an
> > > overhead and I think changes some of the caching behaviour.
> >
> > Yeah, so either may give better performance and it's hard to pick a
> > clear winner.  NFS also has an option to control this.
>
> IIUC from the man page, the NFS option for controlling this is a client
> side mount option. This makes sense as only the client really has knowledge
> of whether its workload will benefit.
>
> With this in mind, should the readdirplus control for virtio-fs also be a
> guest mount option instead of a host virtiofsd CLI option ? The guest admin
> seems best placed to know whether their workload will benefit or not.

Definitely.   In fact other options, e.g. ones that control caching,
should probably also be client side (cache=XXX, writeback,
timeout=XXX, etc).

This needs an extension of the INIT message, so options can be passed
to the server.   Added this to our TODO list.

Thanks,
Miklos




* Re: [PATCH 068/104] virtiofsd: passthrough_ll: control readdirplus
  2020-01-10 15:30           ` Miklos Szeredi
@ 2020-01-10 15:40             ` Vivek Goyal
  2020-01-10 16:00               ` Miklos Szeredi
  0 siblings, 1 reply; 307+ messages in thread
From: Vivek Goyal @ 2020-01-10 15:40 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel P. Berrangé,
	Dr. David Alan Gilbert, Stefan Hajnoczi, qemu-devel

On Fri, Jan 10, 2020 at 04:30:01PM +0100, Miklos Szeredi wrote:
> On Fri, Jan 10, 2020 at 4:18 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > On Fri, Jan 10, 2020 at 04:13:08PM +0100, Miklos Szeredi wrote:
> > > On Fri, Jan 10, 2020 at 4:04 PM Dr. David Alan Gilbert
> > > <dgilbert@redhat.com> wrote:
> > > >
> > > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > > On Thu, Dec 12, 2019 at 04:38:28PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > > > From: Miklos Szeredi <mszeredi@redhat.com>
> > > > > >
> > > > >
> > > > > What is readdirplus and what do we need a command line option to
> > > > > control it ? What's the user benefit of changing the setting ?
> > > >
> > > > cc'ing Miklos who understands this better than me.
> > > >
> > > > My understanding is that readdirplus is a heuristic inherited from NFS
> > > > where when you iterate over the directory you also pick up stat() data
> > > > for each entry in the directory.  You then cache that stat data
> > > > somewhere.
> > > > The Plus-ness is that a lot of directory operations involve you stating
> > > > each entry (e.g. to figure out if you can access it etc) so rolling it
> > > > into one op avoids the separate stat.  The unplus-ness is that it's an
> > > > overhead and I think changes some of the caching behaviour.
> > >
> > > Yeah, so either may give better performance and it's hard to pick a
> > > clear winner.  NFS also has an option to control this.
> >
> > IIUC from the man page, the NFS option for controlling this is a client
> > side mount option. This makes sense as only the client really has knowledge
> > of whether its workload will benefit.
> >
> > With this in mind, should the readdirplus control for virtio-fs also be a
> > guest mount option instead of a host virtiofsd CLI option ? The guest admin
> > seems best placed to know whether their workload will benefit or not.
> 
> Definitely.   In fact other options, e.g. ones that control caching,
> should probably also be client side (cache=XXX, writeback,
> timeout=XXX, etc).

I am not sure about cache options. So if we want to share a directory
between multiple guests with stronger coherency (cache=none), then admin
should decide that cache=always/auto is not supported on this export.

Also, how will one client know whether there are other clients sharing the
same directory with strong coherency requirements, so that it should use
cache=none instead of cache=always/auto?

Having said that, it also makes sense that the client knows its workload
and can decide if cache=auto works best for it and use that instead.

Maybe we need both client and server side options. The client will request
certain cache=xxx options and the server can deny these if the admin decides
not to enable that option for that particular mount.

For example, if admin decides that we can only support cache=none on
this particular dir due to other guest sharing it, then daemon should
be able to deny cache=auto/always requests from client.

Thanks
Vivek




* Re: [PATCH 062/104] virtiofsd: Handle hard reboot
  2020-01-07 11:14   ` Daniel P. Berrangé
@ 2020-01-10 15:43     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-10 15:43 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, stefanha, vgoyal

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Dec 12, 2019 at 04:38:22PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Handle a
> >   mount
> >   hard reboot (without unmount)
> >   mount
> > 
> > we get another 'init' which FUSE doesn't normally expect.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  tools/virtiofsd/fuse_lowlevel.c | 16 +++++++++++++++-
> >  1 file changed, 15 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > index 2d1d1a2e59..45125ef66a 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.c
> > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > @@ -2436,7 +2436,21 @@ void fuse_session_process_buf_int(struct fuse_session *se,
> >              goto reply_err;
> >          }
> >      } else if (in->opcode == FUSE_INIT || in->opcode == CUSE_INIT) {
> > -        goto reply_err;
> > +        if (fuse_lowlevel_is_virtio(se)) {
> > +            /*
> > +             * TODO: This is after a hard reboot typically, we need to do
> > +             * a destroy, but we can't reply to this request yet so
> > +             * we can't use do_destroy
> > +             */
> > +            fuse_log(FUSE_LOG_DEBUG, "%s: reinit\n", __func__);
> > +            se->got_destroy = 1;
> > +            se->got_init = 0;
> > +            if (se->op.destroy) {
> > +                se->op.destroy(se->userdata);
> > +            }
> > +        } else {
> > +            goto reply_err;
> > +        }
> 
> In doing this, is there any danger we're exposed to from a malicious
> guest which does
> 
>    mount
>    mount
> 
> without a reboot in between ?

I don't think so - or at least not from the daemon side of things; if it
were to do that (and get two FUSE_INITs) then the state of its first
mount would be rather messed up; but the only thing to suffer would be
the guest kernel doing that odd re-init, so I don't think the
maliciousness should break anyone else.
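
A toy model of the state handling, to show why: the repeated INIT only
ever touches that one session's flags.  struct toy_session and both
functions are stand-ins written for this mail, not the real
fuse_session code, and the final "got_init = 1" is my assumption about
what the normal INIT path does afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the handful of struct fuse_session fields involved. */
struct toy_session {
    int got_init;
    int got_destroy;
    int is_virtio;
    void (*destroy)(void *userdata); /* filesystem's destroy callback */
    void *userdata;
};

/* Example destroy callback: counts how often it was invoked. */
static void toy_count_destroy(void *userdata)
{
    ++*(int *)userdata;
}

/*
 * Model of receiving FUSE_INIT.  Returns 0 if the INIT is accepted and
 * -1 if it would be answered with an error.
 */
static int toy_handle_init(struct toy_session *se)
{
    assert(se != NULL);
    if (se->got_init) {
        if (!se->is_virtio) {
            return -1; /* classic FUSE: a second INIT is an error */
        }
        /* virtio: treat it as a hard reboot and tear down the old state */
        se->got_destroy = 1;
        se->got_init = 0;
        if (se->destroy) {
            se->destroy(se->userdata);
        }
    }
    /* Assumed: the normal INIT path then marks the session initialised. */
    se->got_init = 1;
    return 0;
}
```

So a guest that sends INIT twice without rebooting just gets its own
session destroyed and set up again.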


> I'm thinking not so if its ok, then
> 
>  Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

Thanks.

> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 068/104] virtiofsd: passthrough_ll: control readdirplus
  2020-01-10 15:40             ` Vivek Goyal
@ 2020-01-10 16:00               ` Miklos Szeredi
  0 siblings, 0 replies; 307+ messages in thread
From: Miklos Szeredi @ 2020-01-10 16:00 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Daniel P. Berrangé,
	Dr. David Alan Gilbert, Stefan Hajnoczi, qemu-devel

On Fri, Jan 10, 2020 at 4:40 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Fri, Jan 10, 2020 at 04:30:01PM +0100, Miklos Szeredi wrote:
> > On Fri, Jan 10, 2020 at 4:18 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > >
> > > On Fri, Jan 10, 2020 at 04:13:08PM +0100, Miklos Szeredi wrote:
> > > > On Fri, Jan 10, 2020 at 4:04 PM Dr. David Alan Gilbert
> > > > <dgilbert@redhat.com> wrote:
> > > > >
> > > > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > > > On Thu, Dec 12, 2019 at 04:38:28PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > > > > From: Miklos Szeredi <mszeredi@redhat.com>
> > > > > > >
> > > > > >
> > > > > > What is readdirplus and what do we need a command line option to
> > > > > > control it ? What's the user benefit of changing the setting ?
> > > > >
> > > > > cc'ing Miklos who understands this better than me.
> > > > >
> > > > > My understanding is that readdirplus is a heuristic inherited from NFS
> > > > > where when you iterate over the directory you also pick up stat() data
> > > > > for each entry in the directory.  You then cache that stat data
> > > > > somewhere.
> > > > > The Plus-ness is that a lot of directory operations involve you stating
> > > > > each entry (e.g. to figure out if you can access it etc) so rolling it
> > > > > into one op avoids the separate stat.  The unplus-ness is that it's an
> > > > > overhead and I think changes some of the caching behaviour.
> > > >
> > > > Yeah, so either may give better performance and it's hard to pick a
> > > > clear winner.  NFS also has an option to control this.
> > >
> > > IIUC from the man page, the NFS option for controlling this is a client
> > > side mount option. This makes sense as only the client really has knowledge
> > > of whether its workload will benefit.
> > >
> > > With this in mind, should the readdirplus control for virtio-fs also be a
> > > guest mount option instead of a host virtiofsd CLI option ? The guest admin
> > > seems best placed to know whether their workload will benefit or not.
> >
> > Definitely.   In fact other options, e.g. ones that control caching,
> > should probably also be client side (cache=XXX, writeback,
> > timeout=XXX, etc).
>
> I am not sure about cache options. So if we want to share a directory
> between multiple guests with stronger coherency (cache=none), then admin
> should decide that cache=always/auto is not supported on this export.
>
> Also, how will one client know whether there are other clients sharing the
> same directory with strong coherency requirements, so that it should use
> cache=none instead of cache=always/auto?
>
> Having said that, it also makes sense that the client knows its workload
> and can decide if cache=auto works best for it and use that instead.
>
> Maybe we need both client and server side options. The client will request
> certain cache=xxx options and the server can deny these if the admin decides
> not to enable that option for that particular mount.
>
> For example, if admin decides that we can only support cache=none on
> this particular dir due to other guest sharing it, then daemon should
> be able to deny cache=auto/always requests from client.

Makes sense.  The server dictates policy, the client just passes the
options onto the server.

Thanks,
Miklos




* Re: [PATCH 014/104] virtiofsd: Add options for virtio
  2020-01-03 15:18   ` Daniel P. Berrangé
@ 2020-01-10 16:01     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-10 16:01 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, stefanha, vgoyal

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Dec 12, 2019 at 04:37:34PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Add options to specify parameters for virtio-fs paths, i.e.
> > 
> >    ./virtiofsd -o vhost_user_socket=/tmp/vhostqemu
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  tools/virtiofsd/fuse_i.h        |  1 +
> >  tools/virtiofsd/fuse_lowlevel.c | 17 ++++++++++++-----
> >  tools/virtiofsd/helper.c        | 22 +++++++++++-----------
> >  3 files changed, 24 insertions(+), 16 deletions(-)
> > 
> > diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
> > index 0b5acc8765..f58be71e4b 100644
> > --- a/tools/virtiofsd/fuse_i.h
> > +++ b/tools/virtiofsd/fuse_i.h
> > @@ -63,6 +63,7 @@ struct fuse_session {
> >      struct fuse_notify_req notify_list;
> >      size_t bufsize;
> >      int error;
> > +    char *vu_socket_path;
> >  };
> >  
> >  struct fuse_chan {
> > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > index 167701b453..da708161e1 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.c
> > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > @@ -2118,8 +2118,12 @@ reply_err:
> >      }
> >  
> >  static const struct fuse_opt fuse_ll_opts[] = {
> > -    LL_OPTION("debug", debug, 1), LL_OPTION("-d", debug, 1),
> > -    LL_OPTION("--debug", debug, 1), LL_OPTION("allow_root", deny_others, 1),
> > +    LL_OPTION("debug", debug, 1),
> > +    LL_OPTION("-d", debug, 1),
> > +    LL_OPTION("--debug", debug, 1),
> 
> Pre-existing, but I'm not convinced we really need 3 different
> ways to enable debugging - I would think -d / --debug is sufficient,
> without needing "-o debug".

Given it's existing, I'll leave that.

> > +    LL_OPTION("allow_root", deny_others, 1),
> > +    LL_OPTION("--socket-path=%s", vu_socket_path, 0),
> > +    LL_OPTION("vhost_user_socket=%s", vu_socket_path, 0),
> 
> Similarly here I'm not convinced we need to add both
> "--socket-path PATH" and "-o vhost_user_socket=PATH"
> 
> 
> IIRC, we need --socket-path for compliance with QEMU's
> standard execution model for vhost helpers.

OK, deleted -o vhost_user_socket   - it was there because
our existing kata glue and test setups were using it.


Dave

> >      FUSE_OPT_END
> >  };
> >  
> > @@ -2135,9 +2139,12 @@ void fuse_lowlevel_help(void)
> >       * These are not all options, but the ones that are
> >       * potentially of interest to an end-user
> >       */
> > -    printf("    -o allow_other         allow access by all users\n"
> > -           "    -o allow_root          allow access by root\n"
> > -           "    -o auto_unmount        auto unmount on process termination\n");
> > +    printf(
> > +        "    -o allow_other             allow access by all users\n"
> > +        "    -o allow_root              allow access by root\n"
> > +        "    --socket-path=PATH         path for the vhost-user socket\n"
> > +        "    -o vhost_user_socket=PATH  path for the vhost-user socket\n"
> > +        "    -o auto_unmount            auto unmount on process termination\n");
> >  }
> >  
> >  void fuse_session_destroy(struct fuse_session *se)
> > diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
> > index 8afccfc15e..48e38a7963 100644
> > --- a/tools/virtiofsd/helper.c
> > +++ b/tools/virtiofsd/helper.c
> > @@ -128,17 +128,17 @@ static const struct fuse_opt conn_info_opt_spec[] = {
> >  
> >  void fuse_cmdline_help(void)
> >  {
> > -    printf(
> > -        "    -h   --help            print help\n"
> > -        "    -V   --version         print version\n"
> > -        "    -d   -o debug          enable debug output (implies -f)\n"
> > -        "    -f                     foreground operation\n"
> > -        "    -s                     disable multi-threaded operation\n"
> > -        "    -o clone_fd            use separate fuse device fd for each "
> > -        "thread\n"
> > -        "                           (may improve performance)\n"
> > -        "    -o max_idle_threads    the maximum number of idle worker threads\n"
> > -        "                           allowed (default: 10)\n");
> > +    printf("    -h   --help                print help\n"
> > +           "    -V   --version             print version\n"
> > +           "    -d   -o debug              enable debug output (implies -f)\n"
> > +           "    -f                         foreground operation\n"
> > +           "    -s                         disable multi-threaded operation\n"
> > +           "    -o clone_fd                use separate fuse device fd for "
> > +           "each thread\n"
> > +           "                               (may improve performance)\n"
> > +           "    -o max_idle_threads        the maximum number of idle worker "
> > +           "threads\n"
> > +           "                               allowed (default: 10)\n");
> >  }
> >  
> >  static int fuse_helper_opt_proc(void *data, const char *arg, int key,
> > -- 
> > 2.23.0
> > 
> > 
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 066/104] virtiofsd: passthrough_ll: add renameat2 support
  2020-01-10  9:52     ` Dr. David Alan Gilbert
@ 2020-01-13 20:06       ` Dr. David Alan Gilbert
  2020-01-14  8:29         ` Daniel P. Berrangé
  0 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-13 20:06 UTC (permalink / raw)
  To: Daniel P. Berrangé, mszeredi; +Cc: qemu-devel, stefanha, vgoyal

* Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > On Thu, Dec 12, 2019 at 04:38:26PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > From: Miklos Szeredi <mszeredi@redhat.com>
> > > 
> > > No glibc support yet, so use syscall().
> > 
> > It exists in glibc in my Fedora 31 install.
> > 
> > Presumably this is related to an older version
> > 
> > > Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> > > ---
> > >  tools/virtiofsd/passthrough_ll.c | 10 ++++++++++
> > >  1 file changed, 10 insertions(+)
> > > 
> > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > index 91d3120033..bed2270141 100644
> > > --- a/tools/virtiofsd/passthrough_ll.c
> > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > @@ -1083,7 +1083,17 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
> > >      }
> > >  
> > >      if (flags) {
> > > +#ifndef SYS_renameat2
> > >          fuse_reply_err(req, EINVAL);
> > > +#else
> > > +        res = syscall(SYS_renameat2, lo_fd(req, parent), name,
> > > +                      lo_fd(req, newparent), newname, flags);
> > > +        if (res == -1 && errno == ENOSYS) {
> > > +            fuse_reply_err(req, EINVAL);
> > > +        } else {
> > > +            fuse_reply_err(req, res == -1 ? errno : 0);
> > > +        }
> > > +#endif
> > 
> > We should use the formal API if available as first choice
> 
> OK, done - I've kept the 'ifndef SYS_renameat2' that drops back to an
> error for truly ancient cases; although I doubt everything else will
> build on something that old.

Hmm, and this breaks on middle age distros;  older distros don't have it
at all, new ones have both the syscall and the wrapper; but for the
middle age ones they have the syscall but not the wrapper.

Dan: What's your preference here? Should I add a config fragment to
detect the wrapper?  That seems like overkill compared with just reverting
the change until the wrapper becomes common.

Dave

> Dave
> 
> > 
> > Regards,
> > Daniel
> > -- 
> > |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> > |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> > |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 066/104] virtiofsd: passthrough_ll: add renameat2 support
  2020-01-13 20:06       ` Dr. David Alan Gilbert
@ 2020-01-14  8:29         ` Daniel P. Berrangé
  2020-01-14 10:07           ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-14  8:29 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: mszeredi, qemu-devel, stefanha, vgoyal

On Mon, Jan 13, 2020 at 08:06:24PM +0000, Dr. David Alan Gilbert wrote:
> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > On Thu, Dec 12, 2019 at 04:38:26PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > From: Miklos Szeredi <mszeredi@redhat.com>
> > > > 
> > > > No glibc support yet, so use syscall().
> > > 
> > > It exists in glibc in my Fedora 31 install.
> > > 
> > > Presumably this is related to an older version
> > > 
> > > > Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> > > > ---
> > > >  tools/virtiofsd/passthrough_ll.c | 10 ++++++++++
> > > >  1 file changed, 10 insertions(+)
> > > > 
> > > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > > index 91d3120033..bed2270141 100644
> > > > --- a/tools/virtiofsd/passthrough_ll.c
> > > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > > @@ -1083,7 +1083,17 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
> > > >      }
> > > >  
> > > >      if (flags) {
> > > > +#ifndef SYS_renameat2
> > > >          fuse_reply_err(req, EINVAL);
> > > > +#else
> > > > +        res = syscall(SYS_renameat2, lo_fd(req, parent), name,
> > > > +                      lo_fd(req, newparent), newname, flags);
> > > > +        if (res == -1 && errno == ENOSYS) {
> > > > +            fuse_reply_err(req, EINVAL);
> > > > +        } else {
> > > > +            fuse_reply_err(req, res == -1 ? errno : 0);
> > > > +        }
> > > > +#endif
> > > 
> > > We should use the formal API if available as first choice
> > 
> > OK, done - I've kept the 'ifndef SYS_renameat2' that drops back to an
> > error for truly ancient cases; although I doubt everything else will
> > build on something that old.
> 
> Hmm, and this breaks on middle age distros;  older distros don't have it
> at all, new ones have both the syscall and the wrapper; but for the
> middle age ones they have the syscall but not the wrapper.
>
> Dan: What's your preference here; should I add a config fragment to
> detect the wrapper - it seems overkill rather than just reverting it
> until it becomes common.

What specific middle age distro in particular is affected ? My general
thought would be to /not/ support such distros. Focus on modern distros
since this is a brand new feature in QEMU, where we should try to
minimize support for legacy stuff at the start. But depending on the
distro impacted, there might be a reason to stay with SYS_..

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 066/104] virtiofsd: passthrough_ll: add renameat2 support
  2020-01-14  8:29         ` Daniel P. Berrangé
@ 2020-01-14 10:07           ` Dr. David Alan Gilbert
  2020-01-14 10:12             ` Daniel P. Berrangé
  0 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-14 10:07 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: mszeredi, qemu-devel, stefanha, vgoyal

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Mon, Jan 13, 2020 at 08:06:24PM +0000, Dr. David Alan Gilbert wrote:
> > * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > On Thu, Dec 12, 2019 at 04:38:26PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > > From: Miklos Szeredi <mszeredi@redhat.com>
> > > > > 
> > > > > No glibc support yet, so use syscall().
> > > > 
> > > > It exists in glibc in my Fedora 31 install.
> > > > 
> > > > Presumably this is related to an older version
> > > > 
> > > > > Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> > > > > ---
> > > > >  tools/virtiofsd/passthrough_ll.c | 10 ++++++++++
> > > > >  1 file changed, 10 insertions(+)
> > > > > 
> > > > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > > > index 91d3120033..bed2270141 100644
> > > > > --- a/tools/virtiofsd/passthrough_ll.c
> > > > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > > > @@ -1083,7 +1083,17 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
> > > > >      }
> > > > >  
> > > > >      if (flags) {
> > > > > +#ifndef SYS_renameat2
> > > > >          fuse_reply_err(req, EINVAL);
> > > > > +#else
> > > > > +        res = syscall(SYS_renameat2, lo_fd(req, parent), name,
> > > > > +                      lo_fd(req, newparent), newname, flags);
> > > > > +        if (res == -1 && errno == ENOSYS) {
> > > > > +            fuse_reply_err(req, EINVAL);
> > > > > +        } else {
> > > > > +            fuse_reply_err(req, res == -1 ? errno : 0);
> > > > > +        }
> > > > > +#endif
> > > > 
> > > > We should use the formal API if available as first choice
> > > 
> > > OK, done - I've kept the 'ifndef SYS_renameat2' that drops back to an
> > > error for truly ancient cases; although I doubt everything else will
> > > build on something that old.
> > 
> > Hmm, and this breaks on middle age distros;  older distros don't have it
> > at all, new ones have both the syscall and the wrapper; but for the
> > middle age ones they have the syscall but not the wrapper.
> >
> > Dan: What's your preference here; should I add a config fragment to
> > detect the wrapper - it seems overkill rather than just reverting it
> > until it becomes common.
> 
> What specific middle age distro in particular is affected ? My general
> thought would be to /not/ support such distros. Focus on modern distros
> since this is a brand new feature in QEMU, where we should try to
> minimize support for legacy stuff at the start. But depending on the
> distro impacted, there might be a reason to stay with SYS_..

The report came from Ubuntu 18.04 (which Intel uses on CI); that's not
that old, so I think it should be supported.   I don't really see the
justification for insisting on using the wrapper.
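
For reference, a minimal sketch of the syscall()-only approach, which
builds whether or not libc provides the wrapper.  The helper name
compat_renameat2 is hypothetical and not part of the patch; the
SYS_renameat2 guard mirrors the one in the quoted hunk.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: invoke renameat2 through syscall() so the build
 * does not depend on the libc wrapper being present.
 * Returns 0 on success, or -1 with errno set.
 */
static int compat_renameat2(int olddirfd, const char *oldpath,
                            int newdirfd, const char *newpath,
                            unsigned int flags)
{
#ifdef SYS_renameat2
    return (int)syscall(SYS_renameat2, olddirfd, oldpath,
                        newdirfd, newpath, flags);
#else
    /* Headers are too old to know the syscall number at all. */
    (void)olddirfd;
    (void)oldpath;
    (void)newdirfd;
    (void)newpath;
    (void)flags;
    errno = ENOSYS;
    return -1;
#endif
}
```

A caller would still map ENOSYS to EINVAL before replying, as the
original hunk does, since a running kernel can lack the syscall even
when the headers define the number.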

Dave

> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 066/104] virtiofsd: passthrough_ll: add renameat2 support
  2020-01-14 10:07           ` Dr. David Alan Gilbert
@ 2020-01-14 10:12             ` Daniel P. Berrangé
  0 siblings, 0 replies; 307+ messages in thread
From: Daniel P. Berrangé @ 2020-01-14 10:12 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: mszeredi, qemu-devel, stefanha, vgoyal

On Tue, Jan 14, 2020 at 10:07:03AM +0000, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > On Mon, Jan 13, 2020 at 08:06:24PM +0000, Dr. David Alan Gilbert wrote:
> > > * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> > > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > > On Thu, Dec 12, 2019 at 04:38:26PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > > > From: Miklos Szeredi <mszeredi@redhat.com>
> > > > > > 
> > > > > > No glibc support yet, so use syscall().
> > > > > 
> > > > > It exists in glibc in my Fedora 31 install.
> > > > > 
> > > > > Presumably this is related to an older version
> > > > > 
> > > > > > Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> > > > > > ---
> > > > > >  tools/virtiofsd/passthrough_ll.c | 10 ++++++++++
> > > > > >  1 file changed, 10 insertions(+)
> > > > > > 
> > > > > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > > > > index 91d3120033..bed2270141 100644
> > > > > > --- a/tools/virtiofsd/passthrough_ll.c
> > > > > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > > > > @@ -1083,7 +1083,17 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
> > > > > >      }
> > > > > >  
> > > > > >      if (flags) {
> > > > > > +#ifndef SYS_renameat2
> > > > > >          fuse_reply_err(req, EINVAL);
> > > > > > +#else
> > > > > > +        res = syscall(SYS_renameat2, lo_fd(req, parent), name,
> > > > > > +                      lo_fd(req, newparent), newname, flags);
> > > > > > +        if (res == -1 && errno == ENOSYS) {
> > > > > > +            fuse_reply_err(req, EINVAL);
> > > > > > +        } else {
> > > > > > +            fuse_reply_err(req, res == -1 ? errno : 0);
> > > > > > +        }
> > > > > > +#endif
> > > > > 
> > > > > We should use the formal API if available as first choice
> > > > 
> > > > OK, done - I've kept the 'ifndef SYS_renameat2' that drops back to an
> > > > error for truly ancient cases; although I doubt everything else will
> > > > build on something that old.
> > > 
> > > Hmm, and this breaks on middle age distros;  older distros don't have it
> > > at all, new ones have both the syscall and the wrapper; but for the
> > > middle age ones they have the syscall but not the wrapper.
> > >
> > > Dan: What's your preference here; should I add a config fragment to
> > > detect the wrapper - it seems overkill rather than just reverting it
> > > until it becomes common.
> > 
> > What specific middle age distro in particular is affected ? My general
> > thought would be to /not/ support such distros. Focus on modern distros
> > since this is a brand new feature in QEMU, where we should try to
> > minimize support for legacy stuff at the start. But depending on the
> > distro impacted, there might be a reason to stay with SYS_..
> 
> The report came from Ubuntu 18.04 (which Intel uses on CI); that's not
> that old, so I think it should be supported.   I don't really see the
> justification for insisting on using the wrapper.

Yeah, ok that's pretty new & will be around for a while yet.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 084/104] Virtiofsd: fix memory leak on fuse queueinfo
  2019-12-12 16:38 ` [PATCH 084/104] Virtiofsd: fix memory leak on fuse queueinfo Dr. David Alan Gilbert (git)
@ 2020-01-15 11:20   ` Misono Tomohiro
  2020-01-15 16:57     ` Dr. David Alan Gilbert
  2020-01-20 10:24   ` Sergio Lopez
  1 sibling, 1 reply; 307+ messages in thread
From: Misono Tomohiro @ 2020-01-15 11:20 UTC (permalink / raw)
  To: dgilbert; +Cc: misono.tomohiro, qemu-devel, stefanha, vgoyal

> From: Liu Bo <bo.liu@linux.alibaba.com>
> 
> For fuse's queueinfo, both queueinfo array and queueinfos are allocated in
> fv_queue_set_started() but not cleaned up when the daemon process quits.
> 
> This fixes the leak in proper places.
> 
> Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
> Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
> ---
>  tools/virtiofsd/fuse_virtio.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> index 7b22ae8d4f..a364f23d5d 100644
> --- a/tools/virtiofsd/fuse_virtio.c
> +++ b/tools/virtiofsd/fuse_virtio.c
> @@ -671,6 +671,8 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
>          }
>          close(ourqi->kill_fd);
>          ourqi->kick_fd = -1;
> +        free(vud->qi[qidx]);
> +        vud->qi[qidx] = NULL;
>      }
>  }
>  
> @@ -878,6 +880,13 @@ int virtio_session_mount(struct fuse_session *se)
>  void virtio_session_close(struct fuse_session *se)
>  {
>      close(se->vu_socketfd);

I believe the close() above should be removed, as it is called again 6 lines below.

> +
> +    if (!se->virtio_dev) {
> +        return;
> +    }
> +
> +    close(se->vu_socketfd);
> +    free(se->virtio_dev->qi);
>      free(se->virtio_dev);
>      se->virtio_dev = NULL;
>  }
> -- 
> 2.23.0


^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 051/104] virtiofsd: Parse flag FUSE_WRITE_KILL_PRIV
  2019-12-12 16:38 ` [PATCH 051/104] virtiofsd: Parse flag FUSE_WRITE_KILL_PRIV Dr. David Alan Gilbert (git)
@ 2020-01-15 12:06   ` Misono Tomohiro
  2020-01-15 14:34     ` Dr. David Alan Gilbert
  2020-01-16 14:37   ` Sergio Lopez
  1 sibling, 1 reply; 307+ messages in thread
From: Misono Tomohiro @ 2020-01-15 12:06 UTC (permalink / raw)
  To: dgilbert; +Cc: misono.tomohiro, qemu-devel, stefanha, vgoyal

> From: Vivek Goyal <vgoyal@redhat.com>
> 
> Caller can set FUSE_WRITE_KILL_PRIV in write_flags. Parse it and pass it
> to the filesystem.
> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  tools/virtiofsd/fuse_common.h   | 6 +++++-
>  tools/virtiofsd/fuse_lowlevel.c | 4 +++-
>  2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
> index 147c043bd9..1e8191b7a6 100644
> --- a/tools/virtiofsd/fuse_common.h
> +++ b/tools/virtiofsd/fuse_common.h
> @@ -93,8 +93,12 @@ struct fuse_file_info {
>       */
>      unsigned int cache_readdir:1;
>  
> +    /* Indicates that suid/sgid bits should be removed upon write */
> +    unsigned int kill_priv:1;
> +
> +
>      /** Padding.  Reserved for future use*/
> -    unsigned int padding:25;
> +    unsigned int padding:24;
>      unsigned int padding2:32;
>  
>      /*
> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> index bd5ca2f157..c8a3b1597a 100644
> --- a/tools/virtiofsd/fuse_lowlevel.c
> +++ b/tools/virtiofsd/fuse_lowlevel.c
> @@ -1144,6 +1144,7 @@ static void do_write(fuse_req_t req, fuse_ino_t nodeid,
>      memset(&fi, 0, sizeof(fi));
>      fi.fh = arg->fh;
>      fi.writepage = (arg->write_flags & FUSE_WRITE_CACHE) != 0;
> +    fi.kill_priv = !!(arg->write_flags & FUSE_WRITE_KILL_PRIV);
>  
>      fi.lock_owner = arg->lock_owner;
>      fi.flags = arg->flags;
> @@ -1179,7 +1180,8 @@ static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid,
>      fi.lock_owner = arg->lock_owner;
>      fi.flags = arg->flags;
>      fi.fh = arg->fh;
> -    fi.writepage = arg->write_flags & FUSE_WRITE_CACHE;
> +    fi.writepage = !!(arg->write_flags & FUSE_WRITE_CACHE);
> +    fi.kill_priv = !!(arg->write_flags & FUSE_WRITE_KILL_PRIV);
>  
>      if (ibufv->count == 1) {
>          assert(!(tmpbufv.buf[0].flags & FUSE_BUF_IS_FD));
> -- 
> 2.23.0

Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>

side-note: virtiofs uses write_buf() and therefore do_write() is never called.
How about cleaning up the function?


^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 051/104] virtiofsd: Parse flag FUSE_WRITE_KILL_PRIV
  2020-01-15 12:06   ` Misono Tomohiro
@ 2020-01-15 14:34     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-15 14:34 UTC (permalink / raw)
  To: Misono Tomohiro; +Cc: qemu-devel, stefanha, vgoyal

* Misono Tomohiro (misono.tomohiro@jp.fujitsu.com) wrote:
> > From: Vivek Goyal <vgoyal@redhat.com>
> > 
> > Caller can set FUSE_WRITE_KILL_PRIV in write_flags. Parse it and pass it
> > to the filesystem.
> > 
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  tools/virtiofsd/fuse_common.h   | 6 +++++-
> >  tools/virtiofsd/fuse_lowlevel.c | 4 +++-
> >  2 files changed, 8 insertions(+), 2 deletions(-)
> > 
> > diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
> > index 147c043bd9..1e8191b7a6 100644
> > --- a/tools/virtiofsd/fuse_common.h
> > +++ b/tools/virtiofsd/fuse_common.h
> > @@ -93,8 +93,12 @@ struct fuse_file_info {
> >       */
> >      unsigned int cache_readdir:1;
> >  
> > +    /* Indicates that suid/sgid bits should be removed upon write */
> > +    unsigned int kill_priv:1;
> > +
> > +
> >      /** Padding.  Reserved for future use*/
> > -    unsigned int padding:25;
> > +    unsigned int padding:24;
> >      unsigned int padding2:32;
> >  
> >      /*
> > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > index bd5ca2f157..c8a3b1597a 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.c
> > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > @@ -1144,6 +1144,7 @@ static void do_write(fuse_req_t req, fuse_ino_t nodeid,
> >      memset(&fi, 0, sizeof(fi));
> >      fi.fh = arg->fh;
> >      fi.writepage = (arg->write_flags & FUSE_WRITE_CACHE) != 0;
> > +    fi.kill_priv = !!(arg->write_flags & FUSE_WRITE_KILL_PRIV);
> >  
> >      fi.lock_owner = arg->lock_owner;
> >      fi.flags = arg->flags;
> > @@ -1179,7 +1180,8 @@ static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid,
> >      fi.lock_owner = arg->lock_owner;
> >      fi.flags = arg->flags;
> >      fi.fh = arg->fh;
> > -    fi.writepage = arg->write_flags & FUSE_WRITE_CACHE;
> > +    fi.writepage = !!(arg->write_flags & FUSE_WRITE_CACHE);
> > +    fi.kill_priv = !!(arg->write_flags & FUSE_WRITE_KILL_PRIV);
> >  
> >      if (ibufv->count == 1) {
> >          assert(!(tmpbufv.buf[0].flags & FUSE_BUF_IS_FD));
> > -- 
> > 2.23.0
> 
> Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>

Thank you.

> side-note: virtiofs uses write_buf() and therefore do_write() is never called.
> How about cleaning up the function?

Yes I think you're right; I need to go through and check there's no
corner case which can get into the plain do_write.
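
As an aside on the '!!' in the hunk above: it matters because the
target is a 1-bit bitfield.  A minimal sketch follows, using
illustrative flag values (bit 0 and bit 2) and hypothetical names
(EX_WRITE_CACHE, EX_WRITE_KILL_PRIV, ex_file_info) that are assumed for
the example rather than taken from the headers.

```c
#include <stdint.h>

/* Illustrative flag values, assumed for this sketch. */
#define EX_WRITE_CACHE     (1u << 0)
#define EX_WRITE_KILL_PRIV (1u << 2)

/* Hypothetical cut-down version of a file-info struct with 1-bit fields. */
struct ex_file_info {
    unsigned int writepage:1;
    unsigned int kill_priv:1;
};

/*
 * Convert the wire flags into 1-bit fields.  '!!' collapses any
 * non-zero value to 1; without it, storing (flags & (1u << 2)) into a
 * 1-bit unsigned field would keep only the lowest bit and yield 0.
 */
static struct ex_file_info ex_parse_write_flags(uint32_t write_flags)
{
    struct ex_file_info fi = { 0 };

    fi.writepage = !!(write_flags & EX_WRITE_CACHE);
    fi.kill_priv = !!(write_flags & EX_WRITE_KILL_PRIV);
    return fi;
}
```

The same normalisation is harmless for a flag in bit 0, which is
presumably why the patch also converts the existing writepage
assignment.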

Dave

> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 084/104] Virtiofsd: fix memory leak on fuse queueinfo
  2020-01-15 11:20   ` Misono Tomohiro
@ 2020-01-15 16:57     ` Dr. David Alan Gilbert
  2020-01-16  0:54       ` misono.tomohiro
  0 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-15 16:57 UTC (permalink / raw)
  To: Misono Tomohiro; +Cc: qemu-devel, stefanha, vgoyal

* Misono Tomohiro (misono.tomohiro@jp.fujitsu.com) wrote:
> > From: Liu Bo <bo.liu@linux.alibaba.com>
> > 
> > For fuse's queueinfo, both queueinfo array and queueinfos are allocated in
> > fv_queue_set_started() but not cleaned up when the daemon process quits.
> > 
> > This fixes the leak in proper places.
> > 
> > Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
> > Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
> > ---
> >  tools/virtiofsd/fuse_virtio.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> > index 7b22ae8d4f..a364f23d5d 100644
> > --- a/tools/virtiofsd/fuse_virtio.c
> > +++ b/tools/virtiofsd/fuse_virtio.c
> > @@ -671,6 +671,8 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
> >          }
> >          close(ourqi->kill_fd);
> >          ourqi->kick_fd = -1;
> > +        free(vud->qi[qidx]);
> > +        vud->qi[qidx] = NULL;
> >      }
> >  }
> >  
> > @@ -878,6 +880,13 @@ int virtio_session_mount(struct fuse_session *se)
> >  void virtio_session_close(struct fuse_session *se)
> >  {
> >      close(se->vu_socketfd);
> 
> I believe the close() above should be removed, as it is called again 6 lines below.

You're right, I think that's my fault from when I merged this patch
with 'Virtiofsd: fix segfault when quit before dev init'.

Fixed.
Thanks.
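
One way to make this class of merge mistake harmless is a close-once
idiom, sketched below.  The helper name close_once is hypothetical and
not part of the series; the point is that a second close() on the same
number can fail with EBADF or, worse, close a descriptor that another
thread has since been handed.

```c
#include <unistd.h>

/*
 * Hypothetical helper: close *fd if it is valid and mark it as closed,
 * so that a repeated call is a no-op rather than a second close().
 */
static void close_once(int *fd)
{
    if (*fd >= 0) {
        close(*fd);
        *fd = -1;
    }
}
```

This assumes the descriptor field is initialised to -1 before it is
first opened.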

Dave

> > +
> > +    if (!se->virtio_dev) {
> > +        return;
> > +    }
> > +
> > +    close(se->vu_socketfd);
> > +    free(se->virtio_dev->qi);
> >      free(se->virtio_dev);
> >      se->virtio_dev = NULL;
> >  }
> > -- 
> > 2.23.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 055/104] virtiofsd: fix libfuse information leaks
  2019-12-12 16:38 ` [PATCH 055/104] virtiofsd: fix libfuse information leaks Dr. David Alan Gilbert (git)
  2020-01-06 15:01   ` Daniel P. Berrangé
@ 2020-01-15 17:07   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 307+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-01-15 17:07 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel, stefanha, vgoyal

On 12/12/19 5:38 PM, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Some FUSE message replies contain padding fields that are not
> initialized by libfuse.  This is fine in traditional FUSE applications
> because the kernel is trusted.  virtiofsd does not trust the guest and
> must not expose uninitialized memory.
> 
> Use C struct initializers to automatically zero out memory.  Not all of
> these code changes are strictly necessary but they will prevent future
> information leaks if the structs are extended.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   tools/virtiofsd/fuse_lowlevel.c | 150 ++++++++++++++++----------------
>   1 file changed, 76 insertions(+), 74 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
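
For readers skimming the thread, a minimal sketch of the technique the
commit message describes.  The struct ex_reply and the function
ex_make_reply are hypothetical, not taken from fuse_lowlevel.c; the
sketch only covers explicit padding members, which is what the commit
message refers to.

```c
#include <stdint.h>

/* Hypothetical reply layout with an explicit padding member. */
struct ex_reply {
    uint64_t fh;
    uint32_t open_flags;
    uint32_t padding;
};

/*
 * Build a reply with a designated initializer.  Members that are not
 * named in the initializer, here 'padding', are set to zero, so no
 * stale stack contents reach the guest through them.
 */
static struct ex_reply ex_make_reply(uint64_t fh, uint32_t open_flags)
{
    struct ex_reply r = {
        .fh = fh,
        .open_flags = open_flags,
    };

    return r;
}
```

If a member is added to the struct later, it is zeroed as well without
any change to the callers, which matches the "prevent future
information leaks" point in the description.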



^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 054/104] virtiofsd: set maximum RLIMIT_NOFILE limit
  2019-12-12 16:38 ` [PATCH 054/104] virtiofsd: set maximum RLIMIT_NOFILE limit Dr. David Alan Gilbert (git)
  2020-01-06 15:00   ` Daniel P. Berrangé
@ 2020-01-15 17:09   ` Philippe Mathieu-Daudé
  2020-01-15 17:38     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 307+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-01-15 17:09 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel, stefanha, vgoyal

On 12/12/19 5:38 PM, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> virtiofsd can exceed the default open file descriptor limit easily on
> most systems.  Take advantage of the fact that it runs as root to raise
> the limit.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   tools/virtiofsd/passthrough_ll.c | 32 ++++++++++++++++++++++++++++++++
>   1 file changed, 32 insertions(+)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index ab318a6f36..139bf08f4c 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -52,6 +52,7 @@
>   #include <sys/file.h>
>   #include <sys/mount.h>
>   #include <sys/prctl.h>
> +#include <sys/resource.h>
>   #include <sys/syscall.h>
>   #include <sys/types.h>
>   #include <sys/wait.h>
> @@ -2250,6 +2251,35 @@ static void setup_sandbox(struct lo_data *lo, struct fuse_session *se)
>       setup_seccomp();
>   }
>   
> +/* Raise the maximum number of open file descriptors */
> +static void setup_nofile_rlimit(void)
> +{
> +    const rlim_t max_fds = 1000000;

'static const'?

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

> +    struct rlimit rlim;
> +
> +    if (getrlimit(RLIMIT_NOFILE, &rlim) < 0) {
> +        fuse_log(FUSE_LOG_ERR, "getrlimit(RLIMIT_NOFILE): %m\n");
> +        exit(1);
> +    }
> +
> +    if (rlim.rlim_cur >= max_fds) {
> +        return; /* nothing to do */
> +    }
> +
> +    rlim.rlim_cur = max_fds;
> +    rlim.rlim_max = max_fds;
> +
> +    if (setrlimit(RLIMIT_NOFILE, &rlim) < 0) {
> +        /* Ignore SELinux denials */
> +        if (errno == EPERM) {
> +            return;
> +        }
> +
> +        fuse_log(FUSE_LOG_ERR, "setrlimit(RLIMIT_NOFILE): %m\n");
> +        exit(1);
> +    }
> +}
> +
>   int main(int argc, char *argv[])
>   {
>       struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
> @@ -2371,6 +2401,8 @@ int main(int argc, char *argv[])
>   
>       fuse_daemonize(opts.foreground);
>   
> +    setup_nofile_rlimit();
> +
>       /* Must be before sandbox since it wants /proc */
>       setup_capng();
>   
> 
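For anyone who wants to try the rlimit handling outside virtiofsd, here is a minimal standalone sketch. It is not the patch's code: the helper name raise_nofile_soft_limit() is hypothetical, and unlike the patch it only raises the soft limit and clamps to the existing hard limit, so it should also work without root.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/*
 * Hypothetical helper (not from the patch): raise the soft RLIMIT_NOFILE
 * towards `wanted` without touching the hard limit.  The resulting soft
 * limit is stored in *result.  Returns 0 on success, -1 on error.
 */
static int raise_nofile_soft_limit(rlim_t wanted, rlim_t *result)
{
    struct rlimit rlim;

    assert(result != NULL);

    if (getrlimit(RLIMIT_NOFILE, &rlim) < 0) {
        perror("getrlimit(RLIMIT_NOFILE)");
        return -1;
    }

    /* Never ask for more than the hard limit allows. */
    if (rlim.rlim_max != RLIM_INFINITY && wanted > rlim.rlim_max) {
        wanted = rlim.rlim_max;
    }

    /* Only call setrlimit() if the soft limit actually needs to grow. */
    if (rlim.rlim_cur < wanted) {
        rlim.rlim_cur = wanted;
        if (setrlimit(RLIMIT_NOFILE, &rlim) < 0) {
            perror("setrlimit(RLIMIT_NOFILE)");
            return -1;
        }
    }

    *result = rlim.rlim_cur;
    return 0;
}
```

The patch itself takes the other route: it sets both rlim_cur and rlim_max to 1000000, which relies on the daemon running as root, and it treats EPERM as non-fatal.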



^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 031/104] virtiofs: Add maintainers entry
  2019-12-12 16:37 ` [PATCH 031/104] virtiofs: Add maintainers entry Dr. David Alan Gilbert (git)
  2020-01-06 14:21   ` Daniel P. Berrangé
@ 2020-01-15 17:19   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 307+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-01-15 17:19 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel, stefanha, vgoyal

On 12/12/19 5:37 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   MAINTAINERS | 8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5e5e3e52d6..d1b3e262d2 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1575,6 +1575,14 @@ T: git https://github.com/cohuck/qemu.git s390-next
>   T: git https://github.com/borntraeger/qemu.git s390-next
>   L: qemu-s390x@nongnu.org
>   
> +virtiofs
> +M: Dr. David Alan Gilbert <dgilbert@redhat.com>
> +M: Stefan Hajnoczi <stefanha@redhat.com>
> +S: Supported
> +F: tools/virtiofsd/*

^ The files added by this series

> +F: hw/virtio/vhost-user-fs*
> +F: include/hw/virtio/vhost-user-fs.h

^ Files already present in the repository:

$ ./scripts/get_maintainer.pl -f hw/virtio/vhost-user-fs.c
"Michael S. Tsirkin" <mst@redhat.com> (supporter:vhost)

$ ./scripts/get_maintainer.pl -f hw/virtio/vhost-user-fs-pci.c
"Michael S. Tsirkin" <mst@redhat.com> (supporter:vhost)

$ ./scripts/get_maintainer.pl -f include/hw/virtio/vhost-user-fs.h
"Michael S. Tsirkin" <mst@redhat.com> (supporter:virtio)

Now these get more maintainers, good.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>

> +
>   virtio-input
>   M: Gerd Hoffmann <kraxel@redhat.com>
>   S: Maintained
> 



^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 054/104] virtiofsd: set maximum RLIMIT_NOFILE limit
  2020-01-15 17:09   ` Philippe Mathieu-Daudé
@ 2020-01-15 17:38     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-15 17:38 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé; +Cc: qemu-devel, stefanha, vgoyal

* Philippe Mathieu-Daudé (philmd@redhat.com) wrote:
> On 12/12/19 5:38 PM, Dr. David Alan Gilbert (git) wrote:
> > From: Stefan Hajnoczi <stefanha@redhat.com>
> > 
> > virtiofsd can exceed the default open file descriptor limit easily on
> > most systems.  Take advantage of the fact that it runs as root to raise
> > the limit.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >   tools/virtiofsd/passthrough_ll.c | 32 ++++++++++++++++++++++++++++++++
> >   1 file changed, 32 insertions(+)
> > 
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index ab318a6f36..139bf08f4c 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -52,6 +52,7 @@
> >   #include <sys/file.h>
> >   #include <sys/mount.h>
> >   #include <sys/prctl.h>
> > +#include <sys/resource.h>
> >   #include <sys/syscall.h>
> >   #include <sys/types.h>
> >   #include <sys/wait.h>
> > @@ -2250,6 +2251,35 @@ static void setup_sandbox(struct lo_data *lo, struct fuse_session *se)
> >       setup_seccomp();
> >   }
> > +/* Raise the maximum number of open file descriptors */
> > +static void setup_nofile_rlimit(void)
> > +{
> > +    const rlim_t max_fds = 1000000;
> 
> 'static const'?

Why?

> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> 
Thanks!

> > +    struct rlimit rlim;
> > +
> > +    if (getrlimit(RLIMIT_NOFILE, &rlim) < 0) {
> > +        fuse_log(FUSE_LOG_ERR, "getrlimit(RLIMIT_NOFILE): %m\n");
> > +        exit(1);
> > +    }
> > +
> > +    if (rlim.rlim_cur >= max_fds) {
> > +        return; /* nothing to do */
> > +    }
> > +
> > +    rlim.rlim_cur = max_fds;
> > +    rlim.rlim_max = max_fds;
> > +
> > +    if (setrlimit(RLIMIT_NOFILE, &rlim) < 0) {
> > +        /* Ignore SELinux denials */
> > +        if (errno == EPERM) {
> > +            return;
> > +        }
> > +
> > +        fuse_log(FUSE_LOG_ERR, "setrlimit(RLIMIT_NOFILE): %m\n");
> > +        exit(1);
> > +    }
> > +}
> > +
> >   int main(int argc, char *argv[])
> >   {
> >       struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
> > @@ -2371,6 +2401,8 @@ int main(int argc, char *argv[])
> >       fuse_daemonize(opts.foreground);
> > +    setup_nofile_rlimit();
> > +
> >       /* Must be before sandbox since it wants /proc */
> >       setup_capng();
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 001/104] virtiofsd: Pull in upstream headers
  2019-12-12 16:37 ` [PATCH 001/104] virtiofsd: Pull in upstream headers Dr. David Alan Gilbert (git)
  2020-01-03 11:54   ` Daniel P. Berrangé
@ 2020-01-15 17:38   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 307+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-01-15 17:38 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel, stefanha, vgoyal

On 12/12/19 5:37 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Pull in headers from libfuse's upstream fuse-3.8.0

So diffing vs https://github.com/libfuse/libfuse/tree/fuse-3.8.0.

> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   tools/virtiofsd/fuse.h                | 1275 +++++++++++++++

include/fuse.h, OK

>   tools/virtiofsd/fuse_common.h         |  823 ++++++++++

include/fuse_common.h, OK

>   tools/virtiofsd/fuse_i.h              |  139 ++

lib/fuse_i.h, OK

>   tools/virtiofsd/fuse_log.h            |   82 +

include/fuse_log.h, OK

>   tools/virtiofsd/fuse_lowlevel.h       | 2089 +++++++++++++++++++++++++

include/fuse_lowlevel.h, OK

>   tools/virtiofsd/fuse_misc.h           |   59 +

lib/fuse_misc.h, OK

>   tools/virtiofsd/fuse_opt.h            |  271 ++++

include/fuse_opt.h, OK

>   tools/virtiofsd/passthrough_helpers.h |   76 +

example/passthrough_helpers.h, OK

>   8 files changed, 4814 insertions(+)
>   create mode 100644 tools/virtiofsd/fuse.h
>   create mode 100644 tools/virtiofsd/fuse_common.h
>   create mode 100644 tools/virtiofsd/fuse_i.h
>   create mode 100644 tools/virtiofsd/fuse_log.h
>   create mode 100644 tools/virtiofsd/fuse_lowlevel.h
>   create mode 100644 tools/virtiofsd/fuse_misc.h
>   create mode 100644 tools/virtiofsd/fuse_opt.h
>   create mode 100644 tools/virtiofsd/passthrough_helpers.h

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>



^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 046/104] virtiofsd: use /proc/self/fd/ O_PATH file descriptor
  2019-12-12 16:38 ` [PATCH 046/104] virtiofsd: use /proc/self/fd/ O_PATH file descriptor Dr. David Alan Gilbert (git)
@ 2020-01-15 18:09   ` Philippe Mathieu-Daudé
  2020-01-17  9:42     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-01-15 18:09 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel, stefanha, vgoyal

On 12/12/19 5:38 PM, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Sandboxing will remove /proc from the mount namespace so we can no
> longer build string paths into "/proc/self/fd/...".
> 
> Keep an O_PATH file descriptor so we can still re-open fds via
> /proc/self/fd.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   tools/virtiofsd/passthrough_ll.c | 129 ++++++++++++++++++++++++-------
>   1 file changed, 102 insertions(+), 27 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 123f095990..006908f25a 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -110,6 +110,9 @@ struct lo_data {
>       struct lo_map ino_map; /* protected by lo->mutex */
>       struct lo_map dirp_map; /* protected by lo->mutex */
>       struct lo_map fd_map; /* protected by lo->mutex */
> +
> +    /* An O_PATH file descriptor to /proc/self/fd/ */
> +    int proc_self_fd;
>   };
>   
>   static const struct fuse_opt lo_opts[] = {
> @@ -379,9 +382,9 @@ static int lo_parent_and_name(struct lo_data *lo, struct lo_inode *inode,
>       int res;
>   
>   retry:
> -    sprintf(procname, "/proc/self/fd/%i", inode->fd);
> +    sprintf(procname, "%i", inode->fd);
>   
> -    res = readlink(procname, path, PATH_MAX);
> +    res = readlinkat(lo->proc_self_fd, procname, path, PATH_MAX);
>       if (res < 0) {
>           fuse_log(FUSE_LOG_WARNING, "lo_parent_and_name: readlink failed: %m\n");
>           goto fail_noretry;
> @@ -477,9 +480,9 @@ static int utimensat_empty(struct lo_data *lo, struct lo_inode *inode,
>           }
>           return res;
>       }
> -    sprintf(path, "/proc/self/fd/%i", inode->fd);
> +    sprintf(path, "%i", inode->fd);
>   
> -    return utimensat(AT_FDCWD, path, tv, 0);
> +    return utimensat(lo->proc_self_fd, path, tv, 0);
>   
>   fallback:
>       res = lo_parent_and_name(lo, inode, path, &parent);
> @@ -535,8 +538,8 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>           if (fi) {
>               res = fchmod(fd, attr->st_mode);
>           } else {
> -            sprintf(procname, "/proc/self/fd/%i", ifd);
> -            res = chmod(procname, attr->st_mode);
> +            sprintf(procname, "%i", ifd);
> +            res = fchmodat(lo->proc_self_fd, procname, attr->st_mode, 0);
>           }
>           if (res == -1) {
>               goto out_err;
> @@ -552,11 +555,23 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>           }
>       }
>       if (valid & FUSE_SET_ATTR_SIZE) {
> +        int truncfd;
> +
>           if (fi) {
> -            res = ftruncate(fd, attr->st_size);
> +            truncfd = fd;
>           } else {
> -            sprintf(procname, "/proc/self/fd/%i", ifd);
> -            res = truncate(procname, attr->st_size);
> +            sprintf(procname, "%i", ifd);
> +            truncfd = openat(lo->proc_self_fd, procname, O_RDWR);
> +            if (truncfd < 0) {
> +                goto out_err;
> +            }
> +        }
> +
> +        res = ftruncate(truncfd, attr->st_size);
> +        if (!fi) {
> +            saverr = errno;
> +            close(truncfd);
> +            errno = saverr;
>           }
>           if (res == -1) {
>               goto out_err;
> @@ -857,9 +872,9 @@ static int linkat_empty_nofollow(struct lo_data *lo, struct lo_inode *inode,
>           return res;
>       }
>   
> -    sprintf(path, "/proc/self/fd/%i", inode->fd);
> +    sprintf(path, "%i", inode->fd);
>   
> -    return linkat(AT_FDCWD, path, dfd, name, AT_SYMLINK_FOLLOW);
> +    return linkat(lo->proc_self_fd, path, dfd, name, AT_SYMLINK_FOLLOW);
>   
>   fallback:
>       res = lo_parent_and_name(lo, inode, path, &parent);
> @@ -1387,8 +1402,8 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
>           fi->flags &= ~O_APPEND;
>       }
>   
> -    sprintf(buf, "/proc/self/fd/%i", lo_fd(req, ino));
> -    fd = open(buf, fi->flags & ~O_NOFOLLOW);
> +    sprintf(buf, "%i", lo_fd(req, ino));
> +    fd = openat(lo->proc_self_fd, buf, fi->flags & ~O_NOFOLLOW);
>       if (fd == -1) {
>           return (void)fuse_reply_err(req, errno);
>       }
> @@ -1440,8 +1455,8 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
>   static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
>                        struct fuse_file_info *fi)
>   {
> +    struct lo_data *lo = lo_data(req);

We can initialize this one ...

>       int res;
> -    (void)ino;
>       int fd;
>       char *buf;
>   
> @@ -1449,12 +1464,12 @@ static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
>                (void *)fi);
>   
>       if (!fi) {

... here:

            lo = lo_data(req);

Similarly in other functions, but I see this is the style used by this file.

> -        res = asprintf(&buf, "/proc/self/fd/%i", lo_fd(req, ino));
> +        res = asprintf(&buf, "%i", lo_fd(req, ino));
>           if (res == -1) {
>               return (void)fuse_reply_err(req, errno);
>           }
>   
> -        fd = open(buf, O_RDWR);
> +        fd = openat(lo->proc_self_fd, buf, O_RDWR);
>           free(buf);
>           if (fd == -1) {
>               return (void)fuse_reply_err(req, errno);
> @@ -1570,11 +1585,13 @@ static void lo_flock(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
>   static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
>                           size_t size)
>   {
> +    struct lo_data *lo = lo_data(req);
>       char *value = NULL;
>       char procname[64];
>       struct lo_inode *inode;
>       ssize_t ret;
>       int saverr;
> +    int fd = -1;
>   
>       inode = lo_inode(req, ino);
>       if (!inode) {
> @@ -1599,7 +1616,11 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
>           goto out;
>       }
>   
> -    sprintf(procname, "/proc/self/fd/%i", inode->fd);
> +    sprintf(procname, "%i", inode->fd);
> +    fd = openat(lo->proc_self_fd, procname, O_RDONLY);
> +    if (fd < 0) {
> +        goto out_err;
> +    }
>   
>       if (size) {
>           value = malloc(size);
> @@ -1607,7 +1628,7 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
>               goto out_err;
>           }
>   
> -        ret = getxattr(procname, name, value, size);
> +        ret = fgetxattr(fd, name, value, size);
>           if (ret == -1) {
>               goto out_err;
>           }
> @@ -1618,7 +1639,7 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
>   
>           fuse_reply_buf(req, value, ret);
>       } else {
> -        ret = getxattr(procname, name, NULL, 0);
> +        ret = fgetxattr(fd, name, NULL, 0);
>           if (ret == -1) {
>               goto out_err;
>           }
> @@ -1627,6 +1648,10 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
>       }
>   out_free:
>       free(value);
> +
> +    if (fd >= 0) {
> +        close(fd);
> +    }
>       return;
>   
>   out_err:
> @@ -1638,11 +1663,13 @@ out:
>   
>   static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
>   {
> +    struct lo_data *lo = lo_data(req);
>       char *value = NULL;
>       char procname[64];
>       struct lo_inode *inode;
>       ssize_t ret;
>       int saverr;
> +    int fd = -1;
>   
>       inode = lo_inode(req, ino);
>       if (!inode) {
> @@ -1666,7 +1693,11 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
>           goto out;
>       }
>   
> -    sprintf(procname, "/proc/self/fd/%i", inode->fd);
> +    sprintf(procname, "%i", inode->fd);
> +    fd = openat(lo->proc_self_fd, procname, O_RDONLY);
> +    if (fd < 0) {
> +        goto out_err;
> +    }
>   
>       if (size) {
>           value = malloc(size);
> @@ -1674,7 +1705,7 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
>               goto out_err;
>           }
>   
> -        ret = listxattr(procname, value, size);
> +        ret = flistxattr(fd, value, size);
>           if (ret == -1) {
>               goto out_err;
>           }
> @@ -1685,7 +1716,7 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
>   
>           fuse_reply_buf(req, value, ret);
>       } else {
> -        ret = listxattr(procname, NULL, 0);
> +        ret = flistxattr(fd, NULL, 0);
>           if (ret == -1) {
>               goto out_err;
>           }
> @@ -1694,6 +1725,10 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
>       }
>   out_free:
>       free(value);
> +
> +    if (fd >= 0) {
> +        close(fd);
> +    }
>       return;
>   
>   out_err:
> @@ -1707,9 +1742,11 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
>                           const char *value, size_t size, int flags)
>   {
>       char procname[64];
> +    struct lo_data *lo = lo_data(req);
>       struct lo_inode *inode;
>       ssize_t ret;
>       int saverr;
> +    int fd = -1;
>   
>       inode = lo_inode(req, ino);
>       if (!inode) {
> @@ -1734,21 +1771,31 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
>           goto out;
>       }
>   
> -    sprintf(procname, "/proc/self/fd/%i", inode->fd);
> +    sprintf(procname, "%i", inode->fd);
> +    fd = openat(lo->proc_self_fd, procname, O_RDWR);
> +    if (fd < 0) {
> +        saverr = errno;
> +        goto out;
> +    }
>   
> -    ret = setxattr(procname, name, value, size, flags);
> +    ret = fsetxattr(fd, name, value, size, flags);
>       saverr = ret == -1 ? errno : 0;
>   
>   out:
> +    if (fd >= 0) {
> +        close(fd);
> +    }
>       fuse_reply_err(req, saverr);
>   }
>   
>   static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *name)
>   {
>       char procname[64];
> +    struct lo_data *lo = lo_data(req);
>       struct lo_inode *inode;
>       ssize_t ret;
>       int saverr;
> +    int fd = -1;
>   
>       inode = lo_inode(req, ino);
>       if (!inode) {
> @@ -1772,12 +1819,20 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *name)
>           goto out;
>       }
>   
> -    sprintf(procname, "/proc/self/fd/%i", inode->fd);
> +    sprintf(procname, "%i", inode->fd);
> +    fd = openat(lo->proc_self_fd, procname, O_RDWR);
> +    if (fd < 0) {
> +        saverr = errno;
> +        goto out;
> +    }
>   
> -    ret = removexattr(procname, name);
> +    ret = fremovexattr(fd, name);
>       saverr = ret == -1 ? errno : 0;
>   
>   out:
> +    if (fd >= 0) {
> +        close(fd);
> +    }
>       fuse_reply_err(req, saverr);
>   }
>   
> @@ -1870,12 +1925,25 @@ static void print_capabilities(void)
>       printf("}\n");
>   }
>   
> +static void setup_proc_self_fd(struct lo_data *lo)
> +{
> +    lo->proc_self_fd = open("/proc/self/fd", O_PATH);
> +    if (lo->proc_self_fd == -1) {
> +        fuse_log(FUSE_LOG_ERR, "open(/proc/self/fd, O_PATH): %m\n");
> +        exit(1);
> +    }
> +}
> +
>   int main(int argc, char *argv[])
>   {
>       struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
>       struct fuse_session *se;
>       struct fuse_cmdline_opts opts;
> -    struct lo_data lo = { .debug = 0, .writeback = 0 };
> +    struct lo_data lo = {
> +        .debug = 0,
> +        .writeback = 0,
> +        .proc_self_fd = -1,
> +    };
>       struct lo_map_elem *root_elem;
>       int ret = -1;
>   
> @@ -1986,6 +2054,9 @@ int main(int argc, char *argv[])
>   
>       fuse_daemonize(opts.foreground);
>   
> +    /* Must be after daemonize to get the right /proc/self/fd */
> +    setup_proc_self_fd(&lo);
> +
>       /* Block until ctrl+c or fusermount -u */
>       ret = virtio_loop(se);
>   
> @@ -2001,6 +2072,10 @@ err_out1:
>       lo_map_destroy(&lo.dirp_map);
>       lo_map_destroy(&lo.ino_map);
>   
> +    if (lo.proc_self_fd >= 0) {
> +        close(lo.proc_self_fd);
> +    }
> +
>       if (lo.root.fd >= 0) {
>           close(lo.root.fd);
>       }
> 

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
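The core trick in this patch can be tried in isolation. The sketch below is an illustration under assumptions, not the patch's code: open_proc_self_fd() and reopen_fd() are hypothetical names, and O_DIRECTORY is added here only as an extra sanity check. It keeps one O_PATH descriptor for /proc/self/fd and then re-opens another descriptor by passing its number as a relative name to openat().

```c
#define _GNU_SOURCE /* for O_PATH */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Open /proc/self/fd once, while /proc is still reachable. */
static int open_proc_self_fd(void)
{
    return open("/proc/self/fd", O_PATH | O_DIRECTORY);
}

/*
 * Re-open `fd` (for example an O_PATH descriptor) with new `flags` by
 * resolving its decimal number relative to the saved /proc/self/fd
 * descriptor.  Returns a new descriptor, or -1 with errno set.
 */
static int reopen_fd(int proc_self_fd, int fd, int flags)
{
    char name[32];

    assert(proc_self_fd >= 0);

    snprintf(name, sizeof(name), "%d", fd);
    return openat(proc_self_fd, name, flags);
}
```

Because the lookup is relative to an already-open descriptor, it does not need a /proc mount to be visible in the current mount namespace at the time of the call, which is the property the commit message relies on.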



^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 101/104] virtiofsd: prevent FUSE_INIT/FUSE_DESTROY races
  2019-12-12 16:39 ` [PATCH 101/104] virtiofsd: prevent FUSE_INIT/FUSE_DESTROY races Dr. David Alan Gilbert (git)
@ 2020-01-15 23:05   ` Masayoshi Mizuma
  2020-01-16 12:24     ` Dr. David Alan Gilbert
  2020-01-17 13:40   ` Philippe Mathieu-Daudé
  1 sibling, 1 reply; 307+ messages in thread
From: Masayoshi Mizuma @ 2020-01-15 23:05 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:39:01PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> When running with multiple threads it can be tricky to handle
> FUSE_INIT/FUSE_DESTROY in parallel with other request types or in
> parallel with themselves.  Serialize FUSE_INIT and FUSE_DESTROY so that
> malicious clients cannot trigger race conditions.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/fuse_i.h        |  1 +
>  tools/virtiofsd/fuse_lowlevel.c | 18 ++++++++++++++++++
>  2 files changed, 19 insertions(+)
> 
> diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
> index d0679508cd..8a4a05b319 100644
> --- a/tools/virtiofsd/fuse_i.h
> +++ b/tools/virtiofsd/fuse_i.h
> @@ -61,6 +61,7 @@ struct fuse_session {
>      struct fuse_req list;
>      struct fuse_req interrupts;
>      pthread_mutex_t lock;
> +    pthread_rwlock_t init_rwlock;
>      int got_destroy;
>      int broken_splice_nonblock;
>      uint64_t notify_ctr;
> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> index 10f478b00c..9f01c05e3e 100644
> --- a/tools/virtiofsd/fuse_lowlevel.c
> +++ b/tools/virtiofsd/fuse_lowlevel.c
> @@ -2431,6 +2431,19 @@ void fuse_session_process_buf_int(struct fuse_session *se,
>      req->ctx.pid = in->pid;
>      req->ch = ch ? fuse_chan_get(ch) : NULL;
>  
> +    /*
> +     * INIT and DESTROY requests are serialized, all other request types
> +     * run in parallel.  This prevents races between FUSE_INIT and ordinary
> +     * requests, FUSE_INIT and FUSE_INIT, FUSE_INIT and FUSE_DESTROY, and
> +     * FUSE_DESTROY and FUSE_DESTROY.
> +     */
> +    if (in->opcode == FUSE_INIT || in->opcode == CUSE_INIT ||
> +        in->opcode == FUSE_DESTROY) {
> +        pthread_rwlock_wrlock(&se->init_rwlock);
> +    } else {
> +        pthread_rwlock_rdlock(&se->init_rwlock);
> +    }
> +
>      err = EIO;
>      if (!se->got_init) {
>          enum fuse_opcode expected;
> @@ -2488,10 +2501,13 @@ void fuse_session_process_buf_int(struct fuse_session *se,
>      } else {
>          fuse_ll_ops[in->opcode].func(req, in->nodeid, &iter);
>      }
> +
> +    pthread_rwlock_unlock(&se->init_rwlock);
>      return;
>  
>  reply_err:
>      fuse_reply_err(req, err);
> +    pthread_rwlock_unlock(&se->init_rwlock);
>  }
>  
>  #define LL_OPTION(n, o, v)                     \
> @@ -2538,6 +2554,7 @@ void fuse_session_destroy(struct fuse_session *se)
>              se->op.destroy(se->userdata);
>          }
>      }
> +    pthread_rwlock_destroy(&se->init_rwlock);
>      pthread_mutex_destroy(&se->lock);
>      free(se->cuse_data);
>      if (se->fd != -1) {
> @@ -2631,6 +2648,7 @@ struct fuse_session *fuse_session_new(struct fuse_args *args,
>      list_init_req(&se->list);
>      list_init_req(&se->interrupts);
>      fuse_mutex_init(&se->lock);
> +    pthread_rwlock_init(&se->init_rwlock, NULL);
>  
>      memcpy(&se->op, op, op_size);
>      se->owner = getuid();

Looks good to me.

Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>

> -- 
> 2.23.0
> 
> 


^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 085/104] virtiofsd: Support remote posix locks
  2019-12-12 16:38 ` [PATCH 085/104] virtiofsd: Support remote posix locks Dr. David Alan Gilbert (git)
@ 2020-01-15 23:38   ` Masayoshi Mizuma
  2020-01-16 13:26     ` Vivek Goyal
  0 siblings, 1 reply; 307+ messages in thread
From: Masayoshi Mizuma @ 2020-01-15 23:38 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:38:45PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Vivek Goyal <vgoyal@redhat.com>
> 
> Doing posix locks within the guest kernel is not sufficient if a file/dir
> is being shared by multiple guests. So we need the notion of the daemon
> doing the locks, which are then visible to the rest of the guests.
> 
> Given that posix locks are per process, one cannot call the posix lock API
> on the host, otherwise a bunch of basic posix lock properties are broken.
> For example, if two processes (A and B) in the guest open the file and take
> locks on different sections of the file, and one of the processes closes
> its fd, that closes the fd on virtiofsd and all posix locks on the file go
> away. This means that if process A closes the fd, the locks of process B
> go away too.
> 
> Similar other problems exist too.
> 
> This patch set tries to emulate posix locks while using open file
> description locks provided on Linux.
> 
> Daemon provides two options (-o posix_lock, -o no_posix_lock) to enable
> or disable posix locking in daemon. By default it is enabled.
> 
> There are a few issues though.
> 
> - GETLK() returns the pid of the process holding the lock. As we are
>   emulating locks using OFD locks, which are not per process and don't
>   return a pid, GETLK() in the guest does not return the process pid.
> 
> - As of now only F_SETLK is supported and not F_SETLKW. We can't block
>   the thread in virtiofsd for an arbitrarily long duration as there is only
>   one thread serving the queue. That means the unlock request would not make
>   it to the daemon, so F_SETLKW would block indefinitely and bring virtio-fs
>   to a halt. This is a solvable problem, but it will require significant
>   changes in virtiofsd and the kernel. Left as a TODO item for now.
> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 190 +++++++++++++++++++++++++++++++
>  1 file changed, 190 insertions(+)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index fbcc222860..fc79d5ac43 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -68,6 +68,13 @@
>  #include "seccomp.h"
>  
>  #define HAVE_POSIX_FALLOCATE 1
> +
> +/* Keep track of inode posix locks for each owner. */
> +struct lo_inode_plock {
> +    uint64_t lock_owner;
> +    int fd; /* fd for OFD locks */
> +};
> +
>  struct lo_map_elem {
>      union {
>          struct lo_inode *inode;
> @@ -96,6 +103,8 @@ struct lo_inode {
>      struct lo_key key;
>      uint64_t refcount; /* protected by lo->mutex */
>      fuse_ino_t fuse_ino;
> +    pthread_mutex_t plock_mutex;
> +    GHashTable *posix_locks; /* protected by lo_inode->plock_mutex */
>  };
>  
>  struct lo_cred {
> @@ -115,6 +124,7 @@ struct lo_data {
>      int norace;
>      int writeback;
>      int flock;
> +    int posix_lock;
>      int xattr;
>      const char *source;
>      double timeout;
> @@ -138,6 +148,8 @@ static const struct fuse_opt lo_opts[] = {
>      { "source=%s", offsetof(struct lo_data, source), 0 },
>      { "flock", offsetof(struct lo_data, flock), 1 },
>      { "no_flock", offsetof(struct lo_data, flock), 0 },
> +    { "posix_lock", offsetof(struct lo_data, posix_lock), 1 },
> +    { "no_posix_lock", offsetof(struct lo_data, posix_lock), 0 },
>      { "xattr", offsetof(struct lo_data, xattr), 1 },
>      { "no_xattr", offsetof(struct lo_data, xattr), 0 },
>      { "timeout=%lf", offsetof(struct lo_data, timeout), 0 },
> @@ -486,6 +498,17 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
>          fuse_log(FUSE_LOG_DEBUG, "lo_init: activating flock locks\n");
>          conn->want |= FUSE_CAP_FLOCK_LOCKS;
>      }
> +
> +    if (conn->capable & FUSE_CAP_POSIX_LOCKS) {
> +        if (lo->posix_lock) {
> +            fuse_log(FUSE_LOG_DEBUG, "lo_init: activating posix locks\n");
> +            conn->want |= FUSE_CAP_POSIX_LOCKS;
> +        } else {
> +            fuse_log(FUSE_LOG_DEBUG, "lo_init: disabling posix locks\n");
> +            conn->want &= ~FUSE_CAP_POSIX_LOCKS;
> +        }
> +    }
> +
>      if ((lo->cache == CACHE_NONE && !lo->readdirplus_set) ||
>          lo->readdirplus_clear) {
>          fuse_log(FUSE_LOG_DEBUG, "lo_init: disabling readdirplus\n");
> @@ -773,6 +796,19 @@ static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st)
>      return p;
>  }
>  
> +/* value_destroy_func for posix_locks GHashTable */
> +static void posix_locks_value_destroy(gpointer data)
> +{
> +    struct lo_inode_plock *plock = data;
> +
> +    /*
> +     * We had used open() for locks and had only one fd. So
> +     * closing this fd should release all OFD locks.
> +     */
> +    close(plock->fd);
> +    free(plock);
> +}
> +
>  static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>                          struct fuse_entry_param *e)
>  {
> @@ -826,6 +862,9 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>          newfd = -1;
>          inode->key.ino = e->attr.st_ino;
>          inode->key.dev = e->attr.st_dev;
> +        pthread_mutex_init(&inode->plock_mutex, NULL);
> +        inode->posix_locks = g_hash_table_new_full(
> +            g_direct_hash, g_direct_equal, NULL, posix_locks_value_destroy);
>  
>          pthread_mutex_lock(&lo->mutex);
>          inode->fuse_ino = lo_add_inode_mapping(req, inode);
> @@ -1192,6 +1231,11 @@ static void unref_inode_lolocked(struct lo_data *lo, struct lo_inode *inode,
>      if (!inode->refcount) {
>          lo_map_remove(&lo->ino_map, inode->fuse_ino);
>          g_hash_table_remove(lo->inodes, &inode->key);
> +        if (g_hash_table_size(inode->posix_locks)) {
> +            fuse_log(FUSE_LOG_WARNING, "Hash table is not empty\n");
> +        }
> +        g_hash_table_destroy(inode->posix_locks);
> +        pthread_mutex_destroy(&inode->plock_mutex);
>          pthread_mutex_unlock(&lo->mutex);
>          close(inode->fd);
>          free(inode);
> @@ -1548,6 +1592,136 @@ out:
>      }
>  }
>  
> +/* Should be called with inode->plock_mutex held */
> +static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
> +                                                      struct lo_inode *inode,
> +                                                      uint64_t lock_owner,
> +                                                      pid_t pid, int *err)
> +{
> +    struct lo_inode_plock *plock;
> +    char procname[64];
> +    int fd;
> +
> +    plock =
> +        g_hash_table_lookup(inode->posix_locks, GUINT_TO_POINTER(lock_owner));
> +
> +    if (plock) {
> +        return plock;
> +    }
> +
> +    plock = malloc(sizeof(struct lo_inode_plock));
> +    if (!plock) {
> +        *err = ENOMEM;
> +        return NULL;
> +    }
> +
> +    /* Open another instance of file which can be used for ofd locks. */
> +    sprintf(procname, "%i", inode->fd);
> +
> +    /* TODO: What if file is not writable? */
> +    fd = openat(lo->proc_self_fd, procname, O_RDWR);
> +    if (fd == -1) {

> +        *err = -errno;

I think errno is a positive value, so the minus isn't needed?

           *err = errno;

Otherwise looks good to me.

Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>

Thanks,
Masa
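To make the sign convention concrete, here is a small standalone sketch. The helper open_report_err() is hypothetical and only mirrors the shape of lookup_create_plock_ctx(): on failure it stores errno unchanged in *err, because the callers in this patch hand that value to fuse_reply_err(), which takes a positive errno.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path` and report failure as a positive
 * errno value in *err.  Returns the new fd, or -1 on failure.
 */
static int open_report_err(const char *path, int flags, int *err)
{
    int fd;

    assert(err != NULL);

    fd = open(path, flags);
    if (fd == -1) {
        *err = errno; /* already positive (e.g. ENOENT); no negation */
        return -1;
    }

    *err = 0;
    return fd;
}
```

With the negated value, a caller that forwards *err directly would pass a negative number where a positive error code is expected.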

> +        free(plock);
> +        return NULL;
> +    }
> +
> +    plock->lock_owner = lock_owner;
> +    plock->fd = fd;
> +    g_hash_table_insert(inode->posix_locks, GUINT_TO_POINTER(plock->lock_owner),
> +                        plock);
> +    return plock;
> +}
> +
> +static void lo_getlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
> +                     struct flock *lock)
> +{
> +    struct lo_data *lo = lo_data(req);
> +    struct lo_inode *inode;
> +    struct lo_inode_plock *plock;
> +    int ret, saverr = 0;
> +
> +    fuse_log(FUSE_LOG_DEBUG,
> +             "lo_getlk(ino=%" PRIu64 ", flags=%d)"
> +             " owner=0x%lx, l_type=%d l_start=0x%lx"
> +             " l_len=0x%lx\n",
> +             ino, fi->flags, fi->lock_owner, lock->l_type, lock->l_start,
> +             lock->l_len);
> +
> +    inode = lo_inode(req, ino);
> +    if (!inode) {
> +        fuse_reply_err(req, EBADF);
> +        return;
> +    }
> +
> +    pthread_mutex_lock(&inode->plock_mutex);
> +    plock =
> +        lookup_create_plock_ctx(lo, inode, fi->lock_owner, lock->l_pid, &ret);
> +    if (!plock) {
> +        pthread_mutex_unlock(&inode->plock_mutex);
> +        fuse_reply_err(req, ret);
> +        return;
> +    }
> +
> +    ret = fcntl(plock->fd, F_OFD_GETLK, lock);
> +    if (ret == -1) {
> +        saverr = errno;
> +    }
> +    pthread_mutex_unlock(&inode->plock_mutex);
> +
> +    if (saverr) {
> +        fuse_reply_err(req, saverr);
> +    } else {
> +        fuse_reply_lock(req, lock);
> +    }
> +}
> +
> +static void lo_setlk(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
> +                     struct flock *lock, int sleep)
> +{
> +    struct lo_data *lo = lo_data(req);
> +    struct lo_inode *inode;
> +    struct lo_inode_plock *plock;
> +    int ret, saverr = 0;
> +
> +    fuse_log(FUSE_LOG_DEBUG,
> +             "lo_setlk(ino=%" PRIu64 ", flags=%d)"
> +             " cmd=%d pid=%d owner=0x%lx sleep=%d l_whence=%d"
> +             " l_start=0x%lx l_len=0x%lx\n",
> +             ino, fi->flags, lock->l_type, lock->l_pid, fi->lock_owner, sleep,
> +             lock->l_whence, lock->l_start, lock->l_len);
> +
> +    if (sleep) {
> +        fuse_reply_err(req, EOPNOTSUPP);
> +        return;
> +    }
> +
> +    inode = lo_inode(req, ino);
> +    if (!inode) {
> +        fuse_reply_err(req, EBADF);
> +        return;
> +    }
> +
> +    pthread_mutex_lock(&inode->plock_mutex);
> +    plock =
> +        lookup_create_plock_ctx(lo, inode, fi->lock_owner, lock->l_pid, &ret);
> +
> +    if (!plock) {
> +        pthread_mutex_unlock(&inode->plock_mutex);
> +        fuse_reply_err(req, ret);
> +        return;
> +    }
> +
> +    /* TODO: Is it alright to modify flock? */
> +    lock->l_pid = 0;
> +    ret = fcntl(plock->fd, F_OFD_SETLK, lock);
> +    if (ret == -1) {
> +        saverr = errno;
> +    }
> +    pthread_mutex_unlock(&inode->plock_mutex);
> +    fuse_reply_err(req, saverr);
> +}
> +
>  static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,
>                          struct fuse_file_info *fi)
>  {
> @@ -1649,6 +1823,19 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
>  {
>      int res;
>      (void)ino;
> +    struct lo_inode *inode;
> +
> +    inode = lo_inode(req, ino);
> +    if (!inode) {
> +        fuse_reply_err(req, EBADF);
> +        return;
> +    }
> +
> +    /* An fd is going away. Cleanup associated posix locks */
> +    pthread_mutex_lock(&inode->plock_mutex);
> +    g_hash_table_remove(inode->posix_locks, GUINT_TO_POINTER(fi->lock_owner));
> +    pthread_mutex_unlock(&inode->plock_mutex);
> +
>      res = close(dup(lo_fi_fd(req, fi)));
>      fuse_reply_err(req, res == -1 ? errno : 0);
>  }
> @@ -2111,6 +2298,8 @@ static struct fuse_lowlevel_ops lo_oper = {
>      .releasedir = lo_releasedir,
>      .fsyncdir = lo_fsyncdir,
>      .create = lo_create,
> +    .getlk = lo_getlk,
> +    .setlk = lo_setlk,
>      .open = lo_open,
>      .release = lo_release,
>      .flush = lo_flush,
> @@ -2466,6 +2655,7 @@ int main(int argc, char *argv[])
>      struct lo_data lo = {
>          .debug = 0,
>          .writeback = 0,
> +        .posix_lock = 1,
>          .proc_self_fd = -1,
>      };
>      struct lo_map_elem *root_elem;
> -- 
> 2.23.0
> 
> 



* RE: [PATCH 084/104] Virtiofsd: fix memory leak on fuse queueinfo
  2020-01-15 16:57     ` Dr. David Alan Gilbert
@ 2020-01-16  0:54       ` misono.tomohiro
  2020-01-16 12:19         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: misono.tomohiro @ 2020-01-16  0:54 UTC (permalink / raw)
  To: 'Dr. David Alan Gilbert'; +Cc: qemu-devel, stefanha, vgoyal

> * Misono Tomohiro (misono.tomohiro@jp.fujitsu.com) wrote:
> > > From: Liu Bo <bo.liu@linux.alibaba.com>
> > >
> > > For fuse's queueinfo, both queueinfo array and queueinfos are
> > > allocated in
> > > fv_queue_set_started() but not cleaned up when the daemon process quits.
> > >
> > > This fixes the leak in proper places.
> > >
> > > Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
> > > Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
> > > ---
> > >  tools/virtiofsd/fuse_virtio.c | 9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > >
> > > diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> > > index 7b22ae8d4f..a364f23d5d 100644
> > > --- a/tools/virtiofsd/fuse_virtio.c
> > > +++ b/tools/virtiofsd/fuse_virtio.c
> > > @@ -671,6 +671,8 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
> > >          }
> > >          close(ourqi->kill_fd);
> > >          ourqi->kick_fd = -1;
> > > +        free(vud->qi[qidx]);
> > > +        vud->qi[qidx] = NULL;
> > >      }
> > >  }
> > >
> > > @@ -878,6 +880,13 @@ int virtio_session_mount(struct fuse_session *se)
> > >  void virtio_session_close(struct fuse_session *se)
> > >  {
> > >      close(se->vu_socketfd);
> >
> > I believe the above close() should be removed, as it is called again 6 lines below.
> 
> You're right, I think that's my fault from when I merged this patch with 'Virtiofsd: fix segfault when quit before dev init'.
> 
> Fixed.

Given that:
 Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>

Thanks.
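
For context on why the duplicated close() is worth removing: a second
close() on the same descriptor fails with EBADF, and in a multithreaded
daemon the descriptor number could have been reused by another thread in
between. One way to guard against this is sketched below, assuming a
hypothetical close_once() helper that is not part of virtiofsd.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: close *fdp at most once and mark it as closed by
 * storing -1. Returns 0 on success or if it was already closed, otherwise
 * the positive errno value from close().
 */
static int close_once(int *fdp)
{
    int ret = 0;

    assert(fdp != NULL);
    if (*fdp == -1) {
        return 0;
    }
    if (close(*fdp) == -1) {
        ret = errno;
    }
    *fdp = -1;
    return ret;
}
```

Calling close_once() twice on the same variable is harmless, because the
second call sees -1 and returns immediately.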

> Thanks.
> 
> Dave
> 
> > > +
> > > +    if (!se->virtio_dev) {
> > > +        return;
> > > +    }
> > > +
> > > +    close(se->vu_socketfd);
> > > +    free(se->virtio_dev->qi);
> > >      free(se->virtio_dev);
> > >      se->virtio_dev = NULL;
> > >  }
> > > --
> > > 2.23.0
> >
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 053/104] virtiofsd: Drop CAP_FSETID if client asked for it
  2019-12-12 16:38 ` [PATCH 053/104] virtiofsd: Drop CAP_FSETID if client asked for it Dr. David Alan Gilbert (git)
@ 2020-01-16  4:41   ` Misono Tomohiro
  2020-01-16 15:21   ` Sergio Lopez
  1 sibling, 0 replies; 307+ messages in thread
From: Misono Tomohiro @ 2020-01-16  4:41 UTC (permalink / raw)
  To: dgilbert; +Cc: misono.tomohiro, qemu-devel, stefanha, vgoyal

> From: Vivek Goyal <vgoyal@redhat.com>
> 
> If client requested killing setuid/setgid bits on file being written, drop
> CAP_FSETID capability so that setuid/setgid bits are cleared upon write
> automatically.
> 
> pjdfstest chown/12.t needs this.
> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
>   dgilbert: reworked for libcap-ng

Looks good to me.
 Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>

> ---
>  tools/virtiofsd/passthrough_ll.c | 105 +++++++++++++++++++++++++++++++
>  1 file changed, 105 insertions(+)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 6a09b28608..ab318a6f36 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -201,6 +201,91 @@ static int load_capng(void)
>      return 0;
>  }
>  
> +/*
> + * Helpers for dropping and regaining effective capabilities. Returns 0
> + * on success, error otherwise
> + */
> +static int drop_effective_cap(const char *cap_name, bool *cap_dropped)
> +{
> +    int cap, ret;
> +
> +    cap = capng_name_to_capability(cap_name);
> +    if (cap < 0) {
> +        ret = errno;
> +        fuse_log(FUSE_LOG_ERR, "capng_name_to_capability(%s) failed:%s\n",
> +                 cap_name, strerror(errno));
> +        goto out;
> +    }
> +
> +    if (load_capng()) {
> +        ret = errno;
> +        fuse_log(FUSE_LOG_ERR, "load_capng() failed\n");
> +        goto out;
> +    }
> +
> +    /* We dont have this capability in effective set already. */
> +    if (!capng_have_capability(CAPNG_EFFECTIVE, cap)) {
> +        ret = 0;
> +        goto out;
> +    }
> +
> +    if (capng_update(CAPNG_DROP, CAPNG_EFFECTIVE, cap)) {
> +        ret = errno;
> +        fuse_log(FUSE_LOG_ERR, "capng_update(DROP,) failed\n");
> +        goto out;
> +    }
> +
> +    if (capng_apply(CAPNG_SELECT_CAPS)) {
> +        ret = errno;
> +        fuse_log(FUSE_LOG_ERR, "drop:capng_apply() failed\n");
> +        goto out;
> +    }
> +
> +    ret = 0;
> +    if (cap_dropped) {
> +        *cap_dropped = true;
> +    }
> +
> +out:
> +    return ret;
> +}
> +
> +static int gain_effective_cap(const char *cap_name)
> +{
> +    int cap;
> +    int ret = 0;
> +
> +    cap = capng_name_to_capability(cap_name);
> +    if (cap < 0) {
> +        ret = errno;
> +        fuse_log(FUSE_LOG_ERR, "capng_name_to_capability(%s) failed:%s\n",
> +                 cap_name, strerror(errno));
> +        goto out;
> +    }
> +
> +    if (load_capng()) {
> +        ret = errno;
> +        fuse_log(FUSE_LOG_ERR, "load_capng() failed\n");
> +        goto out;
> +    }
> +
> +    if (capng_update(CAPNG_ADD, CAPNG_EFFECTIVE, cap)) {
> +        ret = errno;
> +        fuse_log(FUSE_LOG_ERR, "capng_update(ADD,) failed\n");
> +        goto out;
> +    }
> +
> +    if (capng_apply(CAPNG_SELECT_CAPS)) {
> +        ret = errno;
> +        fuse_log(FUSE_LOG_ERR, "gain:capng_apply() failed\n");
> +        goto out;
> +    }
> +    ret = 0;
> +
> +out:
> +    return ret;
> +}
> +
>  static void lo_map_init(struct lo_map *map)
>  {
>      map->elems = NULL;
> @@ -1559,6 +1644,7 @@ static void lo_write_buf(fuse_req_t req, fuse_ino_t ino,
>      (void)ino;
>      ssize_t res;
>      struct fuse_bufvec out_buf = FUSE_BUFVEC_INIT(fuse_buf_size(in_buf));
> +    bool cap_fsetid_dropped = false;
>  
>      out_buf.buf[0].flags = FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK;
>      out_buf.buf[0].fd = lo_fi_fd(req, fi);
> @@ -1570,12 +1656,31 @@ static void lo_write_buf(fuse_req_t req, fuse_ino_t ino,
>                   out_buf.buf[0].size, (unsigned long)off);
>      }
>  
> +    /*
> +     * If kill_priv is set, drop CAP_FSETID which should lead to kernel
> +     * clearing setuid/setgid on file.
> +     */
> +    if (fi->kill_priv) {
> +        res = drop_effective_cap("FSETID", &cap_fsetid_dropped);
> +        if (res != 0) {
> +            fuse_reply_err(req, res);
> +            return;
> +        }
> +    }
> +
>      res = fuse_buf_copy(&out_buf, in_buf, 0);
>      if (res < 0) {
>          fuse_reply_err(req, -res);
>      } else {
>          fuse_reply_write(req, (size_t)res);
>      }
> +
> +    if (cap_fsetid_dropped) {
> +        res = gain_effective_cap("FSETID");
> +        if (res) {
> +            fuse_log(FUSE_LOG_ERR, "Failed to gain CAP_FSETID\n");
> +        }
> +    }
>  }
>  
>  static void lo_statfs(fuse_req_t req, fuse_ino_t ino)
> -- 
> 2.23.0



* Re: [PATCH 070/104] virtiofsd: fail when parent inode isn't known in lo_do_lookup()
  2019-12-12 16:38 ` [PATCH 070/104] virtiofsd: fail when parent inode isn't known in lo_do_lookup() Dr. David Alan Gilbert (git)
@ 2020-01-16  7:17   ` Misono Tomohiro
  2020-01-20 10:08   ` Sergio Lopez
  1 sibling, 0 replies; 307+ messages in thread
From: Misono Tomohiro @ 2020-01-16  7:17 UTC (permalink / raw)
  To: dgilbert; +Cc: misono.tomohiro, qemu-devel, stefanha, vgoyal

Looks good to me.
 Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
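
The check this patch adds follows a common pattern: resolve the parent
first and fail with ENOENT before dereferencing it. A minimal sketch of
that pattern, using toy types (struct toy_inode, toy_find() and
toy_parent_fd() are hypothetical stand-ins, not the virtiofsd structures):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct lo_inode; only the fields the sketch needs */
struct toy_inode {
    uint64_t ino;
    int fd;
};

/* Hypothetical inode table lookup: returns NULL for an unknown number */
static struct toy_inode *toy_find(struct toy_inode *tab, size_t n,
                                  uint64_t ino)
{
    for (size_t i = 0; i < n; i++) {
        if (tab[i].ino == ino) {
            return &tab[i];
        }
    }
    return NULL;
}

/*
 * Resolve the parent and fail with ENOENT before touching it.
 * Returns 0 and stores the parent's fd in *fd_out, or a positive errno
 * value (leaving *fd_out untouched).
 */
static int toy_parent_fd(struct toy_inode *tab, size_t n, uint64_t parent,
                         int *fd_out)
{
    struct toy_inode *dir = toy_find(tab, n, parent);

    assert(fd_out != NULL);
    if (!dir) {
        return ENOENT;
    }
    *fd_out = dir->fd;
    return 0;
}
```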

> ---
>  tools/virtiofsd/passthrough_ll.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 1618db5a92..ef8b88e3d1 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -778,6 +778,15 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>      struct lo_data *lo = lo_data(req);
>      struct lo_inode *inode, *dir = lo_inode(req, parent);
>  
> +    /*
> +     * name_to_handle_at() and open_by_handle_at() can reach here with fuse
> +     * mount point in guest, but we don't have its inode info in the
> +     * ino_map.
> +     */
> +    if (!dir) {
> +        return ENOENT;
> +    }
> +
>      memset(e, 0, sizeof(*e));
>      e->attr_timeout = lo->timeout;
>      e->entry_timeout = lo->timeout;
> @@ -787,7 +796,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>          name = ".";
>      }
>  
> -    newfd = openat(lo_fd(req, parent), name, O_PATH | O_NOFOLLOW);
> +    newfd = openat(dir->fd, name, O_PATH | O_NOFOLLOW);
>      if (newfd == -1) {
>          goto out_err;
>      }
> @@ -797,7 +806,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>          goto out_err;
>      }
>  
> -    inode = lo_find(lo_data(req), &e->attr);
> +    inode = lo_find(lo, &e->attr);
>      if (inode) {
>          close(newfd);
>          newfd = -1;
> @@ -813,6 +822,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>          inode->is_symlink = S_ISLNK(e->attr.st_mode);
>          inode->refcount = 1;
>          inode->fd = newfd;
> +        newfd = -1;
>          inode->ino = e->attr.st_ino;
>          inode->dev = e->attr.st_dev;



* Re: [PATCH 071/104] virtiofsd: extract root inode init into setup_root()
  2019-12-12 16:38 ` [PATCH 071/104] virtiofsd: extract root inode init into setup_root() Dr. David Alan Gilbert (git)
@ 2020-01-16  7:20   ` Misono Tomohiro
  2020-01-16 15:51     ` Dr. David Alan Gilbert
  2020-01-20 10:09   ` Sergio Lopez
  1 sibling, 1 reply; 307+ messages in thread
From: Misono Tomohiro @ 2020-01-16  7:20 UTC (permalink / raw)
  To: dgilbert; +Cc: misono.tomohiro, qemu-devel, stefanha, vgoyal

> From: Miklos Szeredi <mszeredi@redhat.com>
> 
> Initialize the root inode in a single place.
> 
> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index ef8b88e3d1..0f33c3c5e9 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -2336,6 +2336,29 @@ static void log_func(enum fuse_log_level level, const char *_fmt, va_list ap)
>      }
>  }
>  
> +static void setup_root(struct lo_data *lo, struct lo_inode *root)
> +{
> +    int fd, res;
> +    struct stat stat;
> +
> +    fd = open("/", O_PATH);
> +    if (fd == -1) {
> +        fuse_log(FUSE_LOG_ERR, "open(%s, O_PATH): %m\n", lo->source);
> +        exit(1);
> +    }
> +
> +    res = fstatat(fd, "", &stat, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
> +    if (res == -1) {
> +        fuse_log(FUSE_LOG_ERR, "fstatat(%s): %m\n", lo->source);
> +        exit(1);
> +    }
> +
> +    root->fd = fd;
> +    root->ino = stat.st_ino;
> +    root->dev = stat.st_dev;
> +    root->refcount = 2;
> +}
> +
>  int main(int argc, char *argv[])
>  {
>      struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
> @@ -2411,8 +2434,6 @@ int main(int argc, char *argv[])
>      if (lo.debug) {
>          current_log_level = FUSE_LOG_DEBUG;
>      }
> -    lo.root.refcount = 2;
> -
>      if (lo.source) {
>          struct stat stat;
>          int res;
> @@ -2480,6 +2501,7 @@ int main(int argc, char *argv[])
>  
>      setup_sandbox(&lo, se, opts.syslog);
>  
> +    setup_root(&lo, &lo.root);
>      /* Block until ctrl+c or fusermount -u */
>      ret = virtio_loop(se);

The following block still remains in main():
2933    lo.root.is_symlink = false;
...
2952    lo.root.fd = open(lo.source, O_PATH);
2953
2954    if (lo.root.fd == -1) {
2955        fuse_log(FUSE_LOG_ERR, "open(\"%s\", O_PATH): %m\n", lo.source);
2956        exit(1);
2957    }

L.2933 should be moved into setup_root(), and can we then just remove L.2952-2957?

Thanks,
Misono



* Re: [PATCH 072/104] virtiofsd: passthrough_ll: fix refcounting on remove/rename
  2019-12-12 16:38 ` [PATCH 072/104] virtiofsd: passthrough_ll: fix refcounting on remove/rename Dr. David Alan Gilbert (git)
@ 2020-01-16 11:56   ` Misono Tomohiro
  2020-01-16 16:45     ` Dr. David Alan Gilbert
  2020-01-20 10:17   ` Sergio Lopez
  1 sibling, 1 reply; 307+ messages in thread
From: Misono Tomohiro @ 2020-01-16 11:56 UTC (permalink / raw)
  To: dgilbert; +Cc: misono.tomohiro, qemu-devel, stefanha, vgoyal

> From: Miklos Szeredi <mszeredi@redhat.com>
> 
> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

I'm not familiar with the QEMU convention, but shouldn't the commit message
have at least one line of description, as in the Linux kernel?

For code itself:
 Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>

> ---
>  tools/virtiofsd/passthrough_ll.c | 50 +++++++++++++++++++++++++++++++-
>  1 file changed, 49 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 0f33c3c5e9..1b84d4f313 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -1077,17 +1077,42 @@ out_err:
>      fuse_reply_err(req, saverr);
>  }
>  
> +static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
> +                                    const char *name)
> +{
> +    int res;
> +    struct stat attr;
> +
> +    res = fstatat(lo_fd(req, parent), name, &attr,
> +                  AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
> +    if (res == -1) {
> +        return NULL;
> +    }
> +
> +    return lo_find(lo_data(req), &attr);
> +}
> +
>  static void lo_rmdir(fuse_req_t req, fuse_ino_t parent, const char *name)
>  {
>      int res;
> +    struct lo_inode *inode;
> +    struct lo_data *lo = lo_data(req);
> +
>      if (!is_safe_path_component(name)) {
>          fuse_reply_err(req, EINVAL);
>          return;
>      }
>  
> +    inode = lookup_name(req, parent, name);
> +    if (!inode) {
> +        fuse_reply_err(req, EIO);
> +        return;
> +    }
> +
>      res = unlinkat(lo_fd(req, parent), name, AT_REMOVEDIR);
>  
>      fuse_reply_err(req, res == -1 ? errno : 0);
> +    unref_inode_lolocked(lo, inode, 1);
>  }
>  
>  static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
> @@ -1095,12 +1120,23 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
>                        unsigned int flags)
>  {
>      int res;
> +    struct lo_inode *oldinode;
> +    struct lo_inode *newinode;
> +    struct lo_data *lo = lo_data(req);
>  
>      if (!is_safe_path_component(name) || !is_safe_path_component(newname)) {
>          fuse_reply_err(req, EINVAL);
>          return;
>      }
>  
> +    oldinode = lookup_name(req, parent, name);
> +    newinode = lookup_name(req, newparent, newname);
> +
> +    if (!oldinode) {
> +        fuse_reply_err(req, EIO);
> +        goto out;
> +    }
> +
>      if (flags) {
>  #ifndef SYS_renameat2
>          fuse_reply_err(req, EINVAL);
> @@ -1113,26 +1149,38 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
>              fuse_reply_err(req, res == -1 ? errno : 0);
>          }
>  #endif
> -        return;
> +        goto out;
>      }
>  
>      res = renameat(lo_fd(req, parent), name, lo_fd(req, newparent), newname);
>  
>      fuse_reply_err(req, res == -1 ? errno : 0);
> +out:
> +    unref_inode_lolocked(lo, oldinode, 1);
> +    unref_inode_lolocked(lo, newinode, 1);
>  }
>  
>  static void lo_unlink(fuse_req_t req, fuse_ino_t parent, const char *name)
>  {
>      int res;
> +    struct lo_inode *inode;
> +    struct lo_data *lo = lo_data(req);
>  
>      if (!is_safe_path_component(name)) {
>          fuse_reply_err(req, EINVAL);
>          return;
>      }
>  
> +    inode = lookup_name(req, parent, name);
> +    if (!inode) {
> +        fuse_reply_err(req, EIO);
> +        return;
> +    }
> +
>      res = unlinkat(lo_fd(req, parent), name, 0);
>  
>      fuse_reply_err(req, res == -1 ? errno : 0);
> +    unref_inode_lolocked(lo, inode, 1);
>  }
>  
>  static void unref_inode_lolocked(struct lo_data *lo, struct lo_inode *inode,
> -- 
> 2.23.0



* Re: [PATCH 084/104] Virtiofsd: fix memory leak on fuse queueinfo
  2020-01-16  0:54       ` misono.tomohiro
@ 2020-01-16 12:19         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-16 12:19 UTC (permalink / raw)
  To: misono.tomohiro; +Cc: qemu-devel, stefanha, vgoyal

* misono.tomohiro@fujitsu.com (misono.tomohiro@fujitsu.com) wrote:
> > * Misono Tomohiro (misono.tomohiro@jp.fujitsu.com) wrote:
> > > > From: Liu Bo <bo.liu@linux.alibaba.com>
> > > >
> > > > For fuse's queueinfo, both queueinfo array and queueinfos are
> > > > allocated in
> > > > fv_queue_set_started() but not cleaned up when the daemon process quits.
> > > >
> > > > This fixes the leak in proper places.
> > > >
> > > > Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
> > > > Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
> > > > ---
> > > >  tools/virtiofsd/fuse_virtio.c | 9 +++++++++
> > > >  1 file changed, 9 insertions(+)
> > > >
> > > > diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> > > > index 7b22ae8d4f..a364f23d5d 100644
> > > > --- a/tools/virtiofsd/fuse_virtio.c
> > > > +++ b/tools/virtiofsd/fuse_virtio.c
> > > > @@ -671,6 +671,8 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
> > > >          }
> > > >          close(ourqi->kill_fd);
> > > >          ourqi->kick_fd = -1;
> > > > +        free(vud->qi[qidx]);
> > > > +        vud->qi[qidx] = NULL;
> > > >      }
> > > >  }
> > > >
> > > > @@ -878,6 +880,13 @@ int virtio_session_mount(struct fuse_session *se)
> > > >  void virtio_session_close(struct fuse_session *se)
> > > >  {
> > > >      close(se->vu_socketfd);
> > >
> > > I believe the above close() should be removed, as it is called again 6 lines below.
> > 
> > You're right, I think that's my fault from when I merged this patch with 'Virtiofsd: fix segfault when quit before dev init'.
> > 
> > Fixed.
> 
> Given that:
>  Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>

Thank you!

Dave

> Thanks.
> 
> > Thanks.
> > 
> > Dave
> > 
> > > > +
> > > > +    if (!se->virtio_dev) {
> > > > +        return;
> > > > +    }
> > > > +
> > > > +    close(se->vu_socketfd);
> > > > +    free(se->virtio_dev->qi);
> > > >      free(se->virtio_dev);
> > > >      se->virtio_dev = NULL;
> > > >  }
> > > > --
> > > > 2.23.0
> > >
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 101/104] virtiofsd: prevent FUSE_INIT/FUSE_DESTROY races
  2020-01-15 23:05   ` Masayoshi Mizuma
@ 2020-01-16 12:24     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-16 12:24 UTC (permalink / raw)
  To: Masayoshi Mizuma; +Cc: qemu-devel, stefanha, vgoyal

* Masayoshi Mizuma (msys.mizuma@gmail.com) wrote:
> On Thu, Dec 12, 2019 at 04:39:01PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: Stefan Hajnoczi <stefanha@redhat.com>
> > 
> > When running with multiple threads it can be tricky to handle
> > FUSE_INIT/FUSE_DESTROY in parallel with other request types or in
> > parallel with themselves.  Serialize FUSE_INIT and FUSE_DESTROY so that
> > malicious clients cannot trigger race conditions.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  tools/virtiofsd/fuse_i.h        |  1 +
> >  tools/virtiofsd/fuse_lowlevel.c | 18 ++++++++++++++++++
> >  2 files changed, 19 insertions(+)
> > 
> > diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
> > index d0679508cd..8a4a05b319 100644
> > --- a/tools/virtiofsd/fuse_i.h
> > +++ b/tools/virtiofsd/fuse_i.h
> > @@ -61,6 +61,7 @@ struct fuse_session {
> >      struct fuse_req list;
> >      struct fuse_req interrupts;
> >      pthread_mutex_t lock;
> > +    pthread_rwlock_t init_rwlock;
> >      int got_destroy;
> >      int broken_splice_nonblock;
> >      uint64_t notify_ctr;
> > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > index 10f478b00c..9f01c05e3e 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.c
> > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > @@ -2431,6 +2431,19 @@ void fuse_session_process_buf_int(struct fuse_session *se,
> >      req->ctx.pid = in->pid;
> >      req->ch = ch ? fuse_chan_get(ch) : NULL;
> >  
> > +    /*
> > +     * INIT and DESTROY requests are serialized, all other request types
> > +     * run in parallel.  This prevents races between FUSE_INIT and ordinary
> > +     * requests, FUSE_INIT and FUSE_INIT, FUSE_INIT and FUSE_DESTROY, and
> > +     * FUSE_DESTROY and FUSE_DESTROY.
> > +     */
> > +    if (in->opcode == FUSE_INIT || in->opcode == CUSE_INIT ||
> > +        in->opcode == FUSE_DESTROY) {
> > +        pthread_rwlock_wrlock(&se->init_rwlock);
> > +    } else {
> > +        pthread_rwlock_rdlock(&se->init_rwlock);
> > +    }
> > +
> >      err = EIO;
> >      if (!se->got_init) {
> >          enum fuse_opcode expected;
> > @@ -2488,10 +2501,13 @@ void fuse_session_process_buf_int(struct fuse_session *se,
> >      } else {
> >          fuse_ll_ops[in->opcode].func(req, in->nodeid, &iter);
> >      }
> > +
> > +    pthread_rwlock_unlock(&se->init_rwlock);
> >      return;
> >  
> >  reply_err:
> >      fuse_reply_err(req, err);
> > +    pthread_rwlock_unlock(&se->init_rwlock);
> >  }
> >  
> >  #define LL_OPTION(n, o, v)                     \
> > @@ -2538,6 +2554,7 @@ void fuse_session_destroy(struct fuse_session *se)
> >              se->op.destroy(se->userdata);
> >          }
> >      }
> > +    pthread_rwlock_destroy(&se->init_rwlock);
> >      pthread_mutex_destroy(&se->lock);
> >      free(se->cuse_data);
> >      if (se->fd != -1) {
> > @@ -2631,6 +2648,7 @@ struct fuse_session *fuse_session_new(struct fuse_args *args,
> >      list_init_req(&se->list);
> >      list_init_req(&se->interrupts);
> >      fuse_mutex_init(&se->lock);
> > +    pthread_rwlock_init(&se->init_rwlock, NULL);
> >  
> >      memcpy(&se->op, op, op_size);
> >      se->owner = getuid();
> 
> Looks good to me.
> 
> Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>

Thanks

> > -- 
> > 2.23.0
> > 
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 093/104] virtiofsd: introduce inode refcount to prevent use-after-free
  2019-12-12 16:38 ` [PATCH 093/104] virtiofsd: introduce inode refcount to prevent use-after-free Dr. David Alan Gilbert (git)
@ 2020-01-16 12:25   ` Misono Tomohiro
  2020-01-16 17:21     ` Stefan Hajnoczi
  2020-01-20 10:28   ` Sergio Lopez
  1 sibling, 1 reply; 307+ messages in thread
From: Misono Tomohiro @ 2020-01-16 12:25 UTC (permalink / raw)
  To: dgilbert; +Cc: misono.tomohiro, qemu-devel, stefanha, vgoyal

> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> If thread A is using an inode it must not be deleted by thread B when
> processing a FUSE_FORGET request.
> 
> The FUSE protocol itself already has a counter called nlookup that is
> used in FUSE_FORGET messages.  We cannot trust this counter since the
> untrusted client can manipulate it via FUSE_FORGET messages.
> 
> Introduce a new refcount to keep inodes alive for the required lifespan.
> lo_inode_put() must be called to release a reference.  FUSE's nlookup
> counter holds exactly one reference so that the inode stays alive as
> long as the client still wants to remember it.
> 
> Note that the lo_inode->is_symlink field is moved to avoid creating a
> hole in the struct due to struct field alignment.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 168 ++++++++++++++++++++++++++-----
>  1 file changed, 145 insertions(+), 23 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index b19c9ee328..8f4ab8351c 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -99,7 +99,13 @@ struct lo_key {
>  
>  struct lo_inode {
>      int fd;
> -    bool is_symlink;
> +
> +    /*
> +     * Atomic reference count for this object.  The nlookup field holds a
> +     * reference and release it when nlookup reaches 0.
> +     */
> +    gint refcount;
> +
>      struct lo_key key;
>  
>      /*
> @@ -118,6 +124,8 @@ struct lo_inode {
>      fuse_ino_t fuse_ino;
>      pthread_mutex_t plock_mutex;
>      GHashTable *posix_locks; /* protected by lo_inode->plock_mutex */
> +
> +    bool is_symlink;
>  };
>  
>  struct lo_cred {
> @@ -473,6 +481,23 @@ static ssize_t lo_add_inode_mapping(fuse_req_t req, struct lo_inode *inode)
>      return elem - lo_data(req)->ino_map.elems;
>  }
>  
> +static void lo_inode_put(struct lo_data *lo, struct lo_inode **inodep)
> +{
> +    struct lo_inode *inode = *inodep;
> +
> +    if (!inode) {
> +        return;
> +    }
> +
> +    *inodep = NULL;
> +
> +    if (g_atomic_int_dec_and_test(&inode->refcount)) {
> +        close(inode->fd);
> +        free(inode);
> +    }
> +}
> +
> +/* Caller must release refcount using lo_inode_put() */
>  static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
>  {
>      struct lo_data *lo = lo_data(req);
> @@ -480,6 +505,9 @@ static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
>  
>      pthread_mutex_lock(&lo->mutex);
>      elem = lo_map_get(&lo->ino_map, ino);
> +    if (elem) {
> +        g_atomic_int_inc(&elem->inode->refcount);
> +    }
>      pthread_mutex_unlock(&lo->mutex);
>  
>      if (!elem) {
> @@ -489,10 +517,23 @@ static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
>      return elem->inode;
>  }
>  
> +/*
> + * TODO Remove this helper and force callers to hold an inode refcount until
> + * they are done with the fd.  This will be done in a later patch to make
> + * review easier.
> + */
>  static int lo_fd(fuse_req_t req, fuse_ino_t ino)
>  {
>      struct lo_inode *inode = lo_inode(req, ino);
> -    return inode ? inode->fd : -1;
> +    int fd;
> +
> +    if (!inode) {
> +        return -1;
> +    }
> +
> +    fd = inode->fd;
> +    lo_inode_put(lo_data(req), &inode);
> +    return fd;
>  }
>  
>  static void lo_init(void *userdata, struct fuse_conn_info *conn)
> @@ -547,6 +588,10 @@ static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
>      fuse_reply_attr(req, &buf, lo->timeout);
>  }
>  
> +/*
> + * Increments parent->nlookup and caller must release refcount using
> + * lo_inode_put(&parent).
> + */
>  static int lo_parent_and_name(struct lo_data *lo, struct lo_inode *inode,
>                                char path[PATH_MAX], struct lo_inode **parent)
>  {
> @@ -584,6 +629,7 @@ retry:
>          p = &lo->root;
>          pthread_mutex_lock(&lo->mutex);
>          p->nlookup++;
> +        g_atomic_int_inc(&p->refcount);
>          pthread_mutex_unlock(&lo->mutex);
>      } else {
>          *last = '\0';

We need lo_inode_put() in the error path, right?
https://gitlab.com/virtio-fs/qemu/blob/virtio-fs-as-posted-2019-12-12/tools/virtiofsd/passthrough_ll.c#L680

nit: if yes, unref_inode_lolocked() is always paired with lo_inode_put().
So how about combining them into one function? As p->nlookup and p->refcount
are both incremented in one place (lo_find/lo_parent_and_name) in these cases,
it seems natural to me to decrement them in one function as well.

Thanks,
Misono



* Re: [PATCH 095/104] virtiofsd: convert more fprintf and perror to use fuse log infra
  2019-12-12 16:38 ` [PATCH 095/104] virtiofsd: convert more fprintf and perror to use fuse log infra Dr. David Alan Gilbert (git)
  2020-01-07 12:16   ` Daniel P. Berrangé
@ 2020-01-16 12:29   ` Misono Tomohiro
  2020-01-16 16:32     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 307+ messages in thread
From: Misono Tomohiro @ 2020-01-16 12:29 UTC (permalink / raw)
  To: dgilbert; +Cc: misono.tomohiro, qemu-devel, stefanha, vgoyal

> From: Eryu Guan <eguan@linux.alibaba.com>
> 
> Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
> ---
>  tools/virtiofsd/fuse_signals.c | 6 +++++-
>  tools/virtiofsd/helper.c       | 9 ++++++---
>  2 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/virtiofsd/fuse_signals.c b/tools/virtiofsd/fuse_signals.c
> index 10a6f88088..edabf24e0d 100644
> --- a/tools/virtiofsd/fuse_signals.c
> +++ b/tools/virtiofsd/fuse_signals.c
> @@ -11,6 +11,7 @@
>  #include "fuse_i.h"
>  #include "fuse_lowlevel.h"
>  
> +#include <errno.h>
>  #include <signal.h>
>  #include <stdio.h>
>  #include <stdlib.h>
> @@ -46,12 +47,15 @@ static int set_one_signal_handler(int sig, void (*handler)(int), int remove)
>      sa.sa_flags = 0;
>  
>      if (sigaction(sig, NULL, &old_sa) == -1) {
> -        perror("fuse: cannot get old signal handler");
> +        fuse_log(FUSE_LOG_ERR, "fuse: cannot get old signal handler: %s\n",
> +                 strerror(errno));
>          return -1;
>      }
>  
>      if (old_sa.sa_handler == (remove ? handler : SIG_DFL) &&
>          sigaction(sig, &sa, NULL) == -1) {
> +        fuse_log(FUSE_LOG_ERR, "fuse: cannot set signal handler: %s\n",
> +                 strerror(errno));

I notice one perror() call still remains:
>          perror("fuse: cannot set signal handler");

other than that,
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>

>          return -1;
>      }
> diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
> index 7b28507a38..bcb8c05063 100644
> --- a/tools/virtiofsd/helper.c
> +++ b/tools/virtiofsd/helper.c
> @@ -200,7 +200,8 @@ int fuse_daemonize(int foreground)
>          char completed;
>  
>          if (pipe(waiter)) {
> -            perror("fuse_daemonize: pipe");
> +            fuse_log(FUSE_LOG_ERR, "fuse_daemonize: pipe: %s\n",
> +                     strerror(errno));
>              return -1;
>          }
>  
> @@ -210,7 +211,8 @@ int fuse_daemonize(int foreground)
>           */
>          switch (fork()) {
>          case -1:
> -            perror("fuse_daemonize: fork");
> +            fuse_log(FUSE_LOG_ERR, "fuse_daemonize: fork: %s\n",
> +                     strerror(errno));
>              return -1;
>          case 0:
>              break;
> @@ -220,7 +222,8 @@ int fuse_daemonize(int foreground)
>          }
>  
>          if (setsid() == -1) {
> -            perror("fuse_daemonize: setsid");
> +            fuse_log(FUSE_LOG_ERR, "fuse_daemonize: setsid: %s\n",
> +                     strerror(errno));
>              return -1;
>          }
>  
> -- 
> 2.23.0



* Re: [PATCH 085/104] virtiofsd: Support remote posix locks
  2020-01-15 23:38   ` Masayoshi Mizuma
@ 2020-01-16 13:26     ` Vivek Goyal
  2020-01-17  9:27       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Vivek Goyal @ 2020-01-16 13:26 UTC (permalink / raw)
  To: Masayoshi Mizuma; +Cc: Dr. David Alan Gilbert (git), stefanha, qemu-devel

On Wed, Jan 15, 2020 at 06:38:31PM -0500, Masayoshi Mizuma wrote:

[..]
> > +/* Should be called with inode->plock_mutex held */
> > +static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
> > +                                                      struct lo_inode *inode,
> > +                                                      uint64_t lock_owner,
> > +                                                      pid_t pid, int *err)
> > +{
> > +    struct lo_inode_plock *plock;
> > +    char procname[64];
> > +    int fd;
> > +
> > +    plock =
> > +        g_hash_table_lookup(inode->posix_locks, GUINT_TO_POINTER(lock_owner));
> > +
> > +    if (plock) {
> > +        return plock;
> > +    }
> > +
> > +    plock = malloc(sizeof(struct lo_inode_plock));
> > +    if (!plock) {
> > +        *err = ENOMEM;
> > +        return NULL;
> > +    }
> > +
> > +    /* Open another instance of file which can be used for ofd locks. */
> > +    sprintf(procname, "%i", inode->fd);
> > +
> > +    /* TODO: What if file is not writable? */
> > +    fd = openat(lo->proc_self_fd, procname, O_RDWR);
> > +    if (fd == -1) {
> 
> > +        *err = -errno;
> 
> I think the errno is positive value, so the minus isn't needed?
> 
>            *err = errno;

That sounds right. Thanks.

David, will you be able to make this tweak in your tree, or do you want me
to send a separate fix patch?

Thanks
Vivek

> 
> Otherwise looks good to me.
> 
> Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
> 
> Thanks,
> Masa
> 
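
For illustration, a minimal sketch of the sign convention discussed above.
errno is always a positive value, so a helper that reports failure through
an out-parameter can store it unmodified, and the caller can hand it to a
reply function that expects a positive error code. The helper name
demo_open_report() is hypothetical and is not part of virtiofsd.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>

/*
 * Hypothetical helper (not the virtiofsd function): open a path and
 * report failure through *err as a positive errno value.
 */
static int demo_open_report(const char *path, int flags, int *err)
{
    int fd;

    assert(err != NULL);

    fd = open(path, flags);
    if (fd == -1) {
        *err = errno; /* errno is already positive, so no negation */
        return -1;
    }

    *err = 0;
    return fd;
}
```

With this convention the caller never has to guess the sign: a non-zero
*err can be passed on unchanged.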




* Re: [PATCH 041/104] virtiofsd: add fuse_mbuf_iter API
  2019-12-12 16:38 ` [PATCH 041/104] virtiofsd: add fuse_mbuf_iter API Dr. David Alan Gilbert (git)
@ 2020-01-16 14:17   ` Sergio Lopez
  0 siblings, 0 replies; 307+ messages in thread
From: Sergio Lopez @ 2020-01-16 14:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: vgoyal, stefanha

Dr. David Alan Gilbert (git) <dgilbert@redhat.com> writes:

> From: Stefan Hajnoczi <stefanha@redhat.com>
>
> Introduce an API for consuming bytes from a buffer with size checks.
> All FUSE operations will be converted to use this safe API instead of
> void *inarg.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/buffer.c      | 28 ++++++++++++++++++++
>  tools/virtiofsd/fuse_common.h | 49 ++++++++++++++++++++++++++++++++++-
>  2 files changed, 76 insertions(+), 1 deletion(-)

Reviewed-by: Sergio Lopez <slp@redhat.com>
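
As a rough illustration of what "consuming bytes from a buffer with size
checks" can look like, here is a minimal sketch. The struct and function
names (demo_iter, demo_iter_advance) are made up for this example and are
not claimed to match the API added by the patch; the point is only that
every read is checked against the remaining length before the cursor moves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounds-checked cursor over a read-only buffer. */
struct demo_iter {
    const unsigned char *mem; /* start of the buffer */
    size_t size;              /* total bytes available */
    size_t pos;               /* bytes consumed so far */
};

/*
 * Return a pointer to the next len bytes and advance the cursor, or
 * return NULL (leaving the cursor untouched) if fewer than len bytes
 * remain.
 */
static const void *demo_iter_advance(struct demo_iter *it, size_t len)
{
    const void *p;

    assert(it->pos <= it->size);

    /* Compare against the remainder to avoid overflow in pos + len. */
    if (len > it->size - it->pos) {
        return NULL;
    }

    p = it->mem + it->pos;
    it->pos += len;
    return p;
}
```

A request parser built on such a cursor can fail the request as soon as
an advance returns NULL, instead of trusting a raw void * argument.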


* Re: [PATCH 042/104] virtiofsd: validate input buffer sizes in do_write_buf()
  2019-12-12 16:38 ` [PATCH 042/104] virtiofsd: validate input buffer sizes in do_write_buf() Dr. David Alan Gilbert (git)
@ 2020-01-16 14:19   ` Sergio Lopez
  0 siblings, 0 replies; 307+ messages in thread
From: Sergio Lopez @ 2020-01-16 14:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr. David Alan Gilbert (git), stefanha, vgoyal

Dr. David Alan Gilbert (git) <dgilbert@redhat.com> writes:

> From: Stefan Hajnoczi <stefanha@redhat.com>
>
> There is a small change in behavior: if fuse_write_in->size doesn't
> match the input buffer size then the request is failed.  Previously
> write requests with 1 fuse_buf element would truncate to
> fuse_write_in->size.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.c | 49 ++++++++++++++++++++-------------
>  1 file changed, 30 insertions(+), 19 deletions(-)

Reviewed-by: Sergio Lopez <slp@redhat.com>


* Re: [PATCH 043/104] virtiofsd: check input buffer size in fuse_lowlevel.c ops
  2019-12-12 16:38 ` [PATCH 043/104] virtiofsd: check input buffer size in fuse_lowlevel.c ops Dr. David Alan Gilbert (git)
@ 2020-01-16 14:25   ` Sergio Lopez
  0 siblings, 0 replies; 307+ messages in thread
From: Sergio Lopez @ 2020-01-16 14:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr. David Alan Gilbert (git), stefanha, vgoyal

Dr. David Alan Gilbert (git) <dgilbert@redhat.com> writes:

> From: Stefan Hajnoczi <stefanha@redhat.com>
>
> Each FUSE operation involves parsing the input buffer.  Currently the
> code assumes the input buffer is large enough for the expected
> arguments.  This patch uses fuse_mbuf_iter to check the size.
>
> Most operations are simple to convert.  Some are more complicated due to
> variable-length inputs or different sizes depending on the protocol
> version.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.c | 581 +++++++++++++++++++++++++-------
>  1 file changed, 456 insertions(+), 125 deletions(-)

Reviewed-by: Sergio Lopez <slp@redhat.com>


* Re: [PATCH 044/104] virtiofsd: prevent ".." escape in lo_do_lookup()
  2019-12-12 16:38 ` [PATCH 044/104] virtiofsd: prevent ".." escape in lo_do_lookup() Dr. David Alan Gilbert (git)
@ 2020-01-16 14:33   ` Sergio Lopez
  0 siblings, 0 replies; 307+ messages in thread
From: Sergio Lopez @ 2020-01-16 14:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr. David Alan Gilbert (git), stefanha, vgoyal

Dr. David Alan Gilbert (git) <dgilbert@redhat.com> writes:

> From: Stefan Hajnoczi <stefanha@redhat.com>
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)

Reviewed-by: Sergio Lopez <slp@redhat.com>


* Re: [PATCH 045/104] virtiofsd: prevent ".." escape in lo_do_readdir()
  2019-12-12 16:38 ` [PATCH 045/104] virtiofsd: prevent ".." escape in lo_do_readdir() Dr. David Alan Gilbert (git)
@ 2020-01-16 14:35   ` Sergio Lopez
  0 siblings, 0 replies; 307+ messages in thread
From: Sergio Lopez @ 2020-01-16 14:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr. David Alan Gilbert (git), stefanha, vgoyal

Dr. David Alan Gilbert (git) <dgilbert@redhat.com> writes:

> From: Stefan Hajnoczi <stefanha@redhat.com>
>
> Construct a fake dirent for the root directory's ".." entry.  This hides
> the parent directory from the FUSE client.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 36 +++++++++++++++++++-------------
>  1 file changed, 22 insertions(+), 14 deletions(-)

Reviewed-by: Sergio Lopez <slp@redhat.com>
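
The idea in the commit message can be sketched as follows, under the
assumption that hiding the parent amounts to making the root's ".." entry
report the root's own inode number. The struct demo_dirent and the function
demo_hide_parent() are hypothetical and much simpler than the real readdir
path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified directory entry. */
struct demo_dirent {
    uint64_t ino;
    const char *name;
};

/*
 * When listing the shared root, rewrite the ".." entry so that it points
 * back at the root itself rather than at the real parent directory.
 */
static void demo_hide_parent(struct demo_dirent *de, bool dir_is_root,
                             uint64_t root_ino)
{
    assert(de != NULL && de->name != NULL);

    if (dir_is_root && strcmp(de->name, "..") == 0) {
        de->ino = root_ino;
    }
}
```

Entries in any other directory, and entries other than ".." in the root,
are left untouched.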


* Re: [PATCH 051/104] virtiofsd: Parse flag FUSE_WRITE_KILL_PRIV
  2019-12-12 16:38 ` [PATCH 051/104] virtiofsd: Parse flag FUSE_WRITE_KILL_PRIV Dr. David Alan Gilbert (git)
  2020-01-15 12:06   ` Misono Tomohiro
@ 2020-01-16 14:37   ` Sergio Lopez
  1 sibling, 0 replies; 307+ messages in thread
From: Sergio Lopez @ 2020-01-16 14:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr. David Alan Gilbert (git), stefanha, vgoyal

Dr. David Alan Gilbert (git) <dgilbert@redhat.com> writes:

> From: Vivek Goyal <vgoyal@redhat.com>
>
> Caller can set FUSE_WRITE_KILL_PRIV in write_flags. Parse it and pass it
> to the filesystem.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  tools/virtiofsd/fuse_common.h   | 6 +++++-
>  tools/virtiofsd/fuse_lowlevel.c | 4 +++-
>  2 files changed, 8 insertions(+), 2 deletions(-)

Reviewed-by: Sergio Lopez <slp@redhat.com>


* Re: [PATCH 053/104] virtiofsd: Drop CAP_FSETID if client asked for it
  2019-12-12 16:38 ` [PATCH 053/104] virtiofsd: Drop CAP_FSETID if client asked for it Dr. David Alan Gilbert (git)
  2020-01-16  4:41   ` Misono Tomohiro
@ 2020-01-16 15:21   ` Sergio Lopez
  1 sibling, 0 replies; 307+ messages in thread
From: Sergio Lopez @ 2020-01-16 15:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr. David Alan Gilbert (git), stefanha, vgoyal

Dr. David Alan Gilbert (git) <dgilbert@redhat.com> writes:

> From: Vivek Goyal <vgoyal@redhat.com>
>
> If client requested killing setuid/setgid bits on file being written, drop
> CAP_FSETID capability so that setuid/setgid bits are cleared upon write
> automatically.
>
> pjdfstest chown/12.t needs this.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
>   dgilbert: reworked for libcap-ng
> ---
>  tools/virtiofsd/passthrough_ll.c | 105 +++++++++++++++++++++++++++++++
>  1 file changed, 105 insertions(+)
>
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 6a09b28608..ab318a6f36 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -201,6 +201,91 @@ static int load_capng(void)
>      return 0;
>  }
>
> +/*
> + * Helpers for dropping and regaining effective capabilities. Returns 0
> + * on success, error otherwise
> + */
> +static int drop_effective_cap(const char *cap_name, bool *cap_dropped)
> +{
> +    int cap, ret;
> +
> +    cap = capng_name_to_capability(cap_name);
> +    if (cap < 0) {
> +        ret = errno;
> +        fuse_log(FUSE_LOG_ERR, "capng_name_to_capability(%s) failed:%s\n",
> +                 cap_name, strerror(errno));
> +        goto out;
> +    }
> +
> +    if (load_capng()) {
> +        ret = errno;
> +        fuse_log(FUSE_LOG_ERR, "load_capng() failed\n");
> +        goto out;
> +    }
> +
> +    /* We dont have this capability in effective set already. */
> +    if (!capng_have_capability(CAPNG_EFFECTIVE, cap)) {
> +        ret = 0;
> +        goto out;
> +    }
> +
> +    if (capng_update(CAPNG_DROP, CAPNG_EFFECTIVE, cap)) {
> +        ret = errno;
> +        fuse_log(FUSE_LOG_ERR, "capng_update(DROP,) failed\n");
> +        goto out;
> +    }
> +
> +    if (capng_apply(CAPNG_SELECT_CAPS)) {
> +        ret = errno;
> +        fuse_log(FUSE_LOG_ERR, "drop:capng_apply() failed\n");
> +        goto out;
> +    }
> +
> +    ret = 0;
> +    if (cap_dropped) {
> +        *cap_dropped = true;
> +    }
> +
> +out:
> +    return ret;
> +}
> +
> +static int gain_effective_cap(const char *cap_name)
> +{
> +    int cap;
> +    int ret = 0;
> +
> +    cap = capng_name_to_capability(cap_name);
> +    if (cap < 0) {
> +        ret = errno;
> +        fuse_log(FUSE_LOG_ERR, "capng_name_to_capability(%s) failed:%s\n",
> +                 cap_name, strerror(errno));
> +        goto out;
> +    }
> +
> +    if (load_capng()) {
> +        ret = errno;
> +        fuse_log(FUSE_LOG_ERR, "load_capng() failed\n");
> +        goto out;
> +    }
> +
> +    if (capng_update(CAPNG_ADD, CAPNG_EFFECTIVE, cap)) {
> +        ret = errno;
> +        fuse_log(FUSE_LOG_ERR, "capng_update(ADD,) failed\n");
> +        goto out;
> +    }
> +
> +    if (capng_apply(CAPNG_SELECT_CAPS)) {
> +        ret = errno;
> +        fuse_log(FUSE_LOG_ERR, "gain:capng_apply() failed\n");
> +        goto out;
> +    }
> +    ret = 0;
> +
> +out:
> +    return ret;
> +}
> +
>  static void lo_map_init(struct lo_map *map)
>  {
>      map->elems = NULL;
> @@ -1559,6 +1644,7 @@ static void lo_write_buf(fuse_req_t req, fuse_ino_t ino,
>      (void)ino;
>      ssize_t res;
>      struct fuse_bufvec out_buf = FUSE_BUFVEC_INIT(fuse_buf_size(in_buf));
> +    bool cap_fsetid_dropped = false;
>
>      out_buf.buf[0].flags = FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK;
>      out_buf.buf[0].fd = lo_fi_fd(req, fi);
> @@ -1570,12 +1656,31 @@ static void lo_write_buf(fuse_req_t req, fuse_ino_t ino,
>                   out_buf.buf[0].size, (unsigned long)off);
>      }
>
> +    /*
> +     * If kill_priv is set, drop CAP_FSETID which should lead to kernel
> +     * clearing setuid/setgid on file.
> +     */
> +    if (fi->kill_priv) {
> +        res = drop_effective_cap("FSETID", &cap_fsetid_dropped);
> +        if (res != 0) {
> +            fuse_reply_err(req, res);
> +            return;
> +        }
> +    }
> +
>      res = fuse_buf_copy(&out_buf, in_buf, 0);
>      if (res < 0) {
>          fuse_reply_err(req, -res);
>      } else {
>          fuse_reply_write(req, (size_t)res);
>      }
> +
> +    if (cap_fsetid_dropped) {
> +        res = gain_effective_cap("FSETID");
> +        if (res) {
> +            fuse_log(FUSE_LOG_ERR, "Failed to gain CAP_FSETID\n");
> +        }
> +    }
>  }
>
>  static void lo_statfs(fuse_req_t req, fuse_ino_t ino)

Fiddling with the capabilities for clearing setuid/setgid on a file
seems a bit weird to me, but I see this was already discussed in
https://lkml.org/lkml/2019/5/20/1229, so...

Reviewed-by: Sergio Lopez <slp@redhat.com>


* Re: [PATCH 071/104] virtiofsd: extract root inode init into setup_root()
  2020-01-16  7:20   ` Misono Tomohiro
@ 2020-01-16 15:51     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-16 15:51 UTC (permalink / raw)
  To: Misono Tomohiro; +Cc: qemu-devel, stefanha, vgoyal

* Misono Tomohiro (misono.tomohiro@jp.fujitsu.com) wrote:
> > From: Miklos Szeredi <mszeredi@redhat.com>
> > 
> > Initialize the root inode in a single place.
> > 
> > Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  tools/virtiofsd/passthrough_ll.c | 26 ++++++++++++++++++++++++--
> >  1 file changed, 24 insertions(+), 2 deletions(-)
> > 
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index ef8b88e3d1..0f33c3c5e9 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -2336,6 +2336,29 @@ static void log_func(enum fuse_log_level level, const char *_fmt, va_list ap)
> >      }
> >  }
> >  
> > +static void setup_root(struct lo_data *lo, struct lo_inode *root)
> > +{
> > +    int fd, res;
> > +    struct stat stat;
> > +
> > +    fd = open("/", O_PATH);
> > +    if (fd == -1) {
> > +        fuse_log(FUSE_LOG_ERR, "open(%s, O_PATH): %m\n", lo->source);
> > +        exit(1);
> > +    }
> > +
> > +    res = fstatat(fd, "", &stat, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
> > +    if (res == -1) {
> > +        fuse_log(FUSE_LOG_ERR, "fstatat(%s): %m\n", lo->source);
> > +        exit(1);
> > +    }
> > +
> > +    root->fd = fd;
> > +    root->ino = stat.st_ino;
> > +    root->dev = stat.st_dev;
> > +    root->refcount = 2;
> > +}
> > +
> >  int main(int argc, char *argv[])
> >  {
> >      struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
> > @@ -2411,8 +2434,6 @@ int main(int argc, char *argv[])
> >      if (lo.debug) {
> >          current_log_level = FUSE_LOG_DEBUG;
> >      }
> > -    lo.root.refcount = 2;
> > -
> >      if (lo.source) {
> >          struct stat stat;
> >          int res;
> > @@ -2480,6 +2501,7 @@ int main(int argc, char *argv[])
> >  
> >      setup_sandbox(&lo, se, opts.syslog);
> >  
> > +    setup_root(&lo, &lo.root);
> >      /* Block until ctrl+c or fusermount -u */
> >      ret = virtio_loop(se);
> 
> The following block still remains in main():
> 2933    lo.root.is_symlink = false;
> ...
> 2952    lo.root.fd = open(lo.source, O_PATH);
> 2953
> 2954    if (lo.root.fd == -1) {
> 2955        fuse_log(FUSE_LOG_ERR, "open(\"%s\", O_PATH): %m\n", lo.source);
> 2956        exit(1);
> 2957    }
> 
> L.2933 should be moved into setup_root(), and can we just remove L.2952-2957?

Yes, agreed; thanks, I've fixed that up.

Dave

> Thanks,
> Misono
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 095/104] virtiofsd: convert more fprintf and perror to use fuse log infra
  2020-01-16 12:29   ` Misono Tomohiro
@ 2020-01-16 16:32     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-16 16:32 UTC (permalink / raw)
  To: Misono Tomohiro; +Cc: qemu-devel, stefanha, vgoyal

* Misono Tomohiro (misono.tomohiro@jp.fujitsu.com) wrote:
> > From: Eryu Guan <eguan@linux.alibaba.com>
> > 
> > Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
> > ---
> >  tools/virtiofsd/fuse_signals.c | 6 +++++-
> >  tools/virtiofsd/helper.c       | 9 ++++++---
> >  2 files changed, 11 insertions(+), 4 deletions(-)
> > 
> > diff --git a/tools/virtiofsd/fuse_signals.c b/tools/virtiofsd/fuse_signals.c
> > index 10a6f88088..edabf24e0d 100644
> > --- a/tools/virtiofsd/fuse_signals.c
> > +++ b/tools/virtiofsd/fuse_signals.c
> > @@ -11,6 +11,7 @@
> >  #include "fuse_i.h"
> >  #include "fuse_lowlevel.h"
> >  
> > +#include <errno.h>
> >  #include <signal.h>
> >  #include <stdio.h>
> >  #include <stdlib.h>
> > @@ -46,12 +47,15 @@ static int set_one_signal_handler(int sig, void (*handler)(int), int remove)
> >      sa.sa_flags = 0;
> >  
> >      if (sigaction(sig, NULL, &old_sa) == -1) {
> > -        perror("fuse: cannot get old signal handler");
> > +        fuse_log(FUSE_LOG_ERR, "fuse: cannot get old signal handler: %s\n",
> > +                 strerror(errno));
> >          return -1;
> >      }
> >  
> >      if (old_sa.sa_handler == (remove ? handler : SIG_DFL) &&
> >          sigaction(sig, &sa, NULL) == -1) {
> > +        fuse_log(FUSE_LOG_ERR, "fuse: cannot set signal handler: %s\n",
> > +                 strerror(errno));
> 
> I notice one perror() call still remains:
> >          perror("fuse: cannot set signal handler");

Oops, removed.

> other than that,
> Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>

Thanks.

> 
> >          return -1;
> >      }
> > diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
> > index 7b28507a38..bcb8c05063 100644
> > --- a/tools/virtiofsd/helper.c
> > +++ b/tools/virtiofsd/helper.c
> > @@ -200,7 +200,8 @@ int fuse_daemonize(int foreground)
> >          char completed;
> >  
> >          if (pipe(waiter)) {
> > -            perror("fuse_daemonize: pipe");
> > +            fuse_log(FUSE_LOG_ERR, "fuse_daemonize: pipe: %s\n",
> > +                     strerror(errno));
> >              return -1;
> >          }
> >  
> > @@ -210,7 +211,8 @@ int fuse_daemonize(int foreground)
> >           */
> >          switch (fork()) {
> >          case -1:
> > -            perror("fuse_daemonize: fork");
> > +            fuse_log(FUSE_LOG_ERR, "fuse_daemonize: fork: %s\n",
> > +                     strerror(errno));
> >              return -1;
> >          case 0:
> >              break;
> > @@ -220,7 +222,8 @@ int fuse_daemonize(int foreground)
> >          }
> >  
> >          if (setsid() == -1) {
> > -            perror("fuse_daemonize: setsid");
> > +            fuse_log(FUSE_LOG_ERR, "fuse_daemonize: setsid: %s\n",
> > +                     strerror(errno));
> >              return -1;
> >          }
> >  
> > -- 
> > 2.23.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
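
As background for the conversion in this patch: perror() appends ": " and
the text for the current errno by itself, whereas a printf-style logger
needs that text passed in explicitly. A minimal sketch of the replacement
pattern follows; demo_format_errno() is a hypothetical stand-in that
formats into a caller-supplied buffer instead of calling fuse_log(), so
that it stays self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the message perror(what) would have printed, but into buf so it
 * can be handed to any logger. saved_errno should be captured right after
 * the failing call, before anything else can overwrite errno.
 */
static int demo_format_errno(char *buf, size_t len, const char *what,
                             int saved_errno)
{
    assert(buf != NULL && len > 0);

    return snprintf(buf, len, "%s: %s\n", what, strerror(saved_errno));
}
```

A converted call site should end up with exactly one message, which is
why a leftover perror() next to the new log call is worth removing.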




* Re: [PATCH 072/104] virtiofsd: passthrough_ll: fix refcounting on remove/rename
  2020-01-16 11:56   ` Misono Tomohiro
@ 2020-01-16 16:45     ` Dr. David Alan Gilbert
  2020-01-17 10:19       ` Miklos Szeredi
  0 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-16 16:45 UTC (permalink / raw)
  To: Misono Tomohiro, mszeredi; +Cc: qemu-devel, stefanha, vgoyal

* Misono Tomohiro (misono.tomohiro@jp.fujitsu.com) wrote:
> > From: Miklos Szeredi <mszeredi@redhat.com>
> > 
> > Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> 
> I'm not familiar with QEMU conventions, but shouldn't we include
> at least a one-line description, as the Linux kernel does?

Miklos: would you like to suggest a better commit message?

> For code itself:
>  Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>

Thanks!

> 
> > ---
> >  tools/virtiofsd/passthrough_ll.c | 50 +++++++++++++++++++++++++++++++-
> >  1 file changed, 49 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index 0f33c3c5e9..1b84d4f313 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -1077,17 +1077,42 @@ out_err:
> >      fuse_reply_err(req, saverr);
> >  }
> >  
> > +static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
> > +                                    const char *name)
> > +{
> > +    int res;
> > +    struct stat attr;
> > +
> > +    res = fstatat(lo_fd(req, parent), name, &attr,
> > +                  AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
> > +    if (res == -1) {
> > +        return NULL;
> > +    }
> > +
> > +    return lo_find(lo_data(req), &attr);
> > +}
> > +
> >  static void lo_rmdir(fuse_req_t req, fuse_ino_t parent, const char *name)
> >  {
> >      int res;
> > +    struct lo_inode *inode;
> > +    struct lo_data *lo = lo_data(req);
> > +
> >      if (!is_safe_path_component(name)) {
> >          fuse_reply_err(req, EINVAL);
> >          return;
> >      }
> >  
> > +    inode = lookup_name(req, parent, name);
> > +    if (!inode) {
> > +        fuse_reply_err(req, EIO);
> > +        return;
> > +    }
> > +
> >      res = unlinkat(lo_fd(req, parent), name, AT_REMOVEDIR);
> >  
> >      fuse_reply_err(req, res == -1 ? errno : 0);
> > +    unref_inode_lolocked(lo, inode, 1);
> >  }
> >  
> >  static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
> > @@ -1095,12 +1120,23 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
> >                        unsigned int flags)
> >  {
> >      int res;
> > +    struct lo_inode *oldinode;
> > +    struct lo_inode *newinode;
> > +    struct lo_data *lo = lo_data(req);
> >  
> >      if (!is_safe_path_component(name) || !is_safe_path_component(newname)) {
> >          fuse_reply_err(req, EINVAL);
> >          return;
> >      }
> >  
> > +    oldinode = lookup_name(req, parent, name);
> > +    newinode = lookup_name(req, newparent, newname);
> > +
> > +    if (!oldinode) {
> > +        fuse_reply_err(req, EIO);
> > +        goto out;
> > +    }
> > +
> >      if (flags) {
> >  #ifndef SYS_renameat2
> >          fuse_reply_err(req, EINVAL);
> > @@ -1113,26 +1149,38 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
> >              fuse_reply_err(req, res == -1 ? errno : 0);
> >          }
> >  #endif
> > -        return;
> > +        goto out;
> >      }
> >  
> >      res = renameat(lo_fd(req, parent), name, lo_fd(req, newparent), newname);
> >  
> >      fuse_reply_err(req, res == -1 ? errno : 0);
> > +out:
> > +    unref_inode_lolocked(lo, oldinode, 1);
> > +    unref_inode_lolocked(lo, newinode, 1);
> >  }
> >  
> >  static void lo_unlink(fuse_req_t req, fuse_ino_t parent, const char *name)
> >  {
> >      int res;
> > +    struct lo_inode *inode;
> > +    struct lo_data *lo = lo_data(req);
> >  
> >      if (!is_safe_path_component(name)) {
> >          fuse_reply_err(req, EINVAL);
> >          return;
> >      }
> >  
> > +    inode = lookup_name(req, parent, name);
> > +    if (!inode) {
> > +        fuse_reply_err(req, EIO);
> > +        return;
> > +    }
> > +
> >      res = unlinkat(lo_fd(req, parent), name, 0);
> >  
> >      fuse_reply_err(req, res == -1 ? errno : 0);
> > +    unref_inode_lolocked(lo, inode, 1);
> >  }
> >  
> >  static void unref_inode_lolocked(struct lo_data *lo, struct lo_inode *inode,
> > -- 
> > 2.23.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 093/104] virtiofsd: introduce inode refcount to prevent use-after-free
  2020-01-16 12:25   ` Misono Tomohiro
@ 2020-01-16 17:21     ` Stefan Hajnoczi
  2020-01-16 17:42       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Stefan Hajnoczi @ 2020-01-16 17:21 UTC (permalink / raw)
  To: Misono Tomohiro; +Cc: dgilbert, vgoyal, qemu-devel

On Thu, Jan 16, 2020 at 09:25:42PM +0900, Misono Tomohiro wrote:
> > From: Stefan Hajnoczi <stefanha@redhat.com>
> > 
> > If thread A is using an inode it must not be deleted by thread B when
> > processing a FUSE_FORGET request.
> > 
> > The FUSE protocol itself already has a counter called nlookup that is
> > used in FUSE_FORGET messages.  We cannot trust this counter since the
> > untrusted client can manipulate it via FUSE_FORGET messages.
> > 
> > Introduce a new refcount to keep inodes alive for the required lifespan.
> > lo_inode_put() must be called to release a reference.  FUSE's nlookup
> > counter holds exactly one reference so that the inode stays alive as
> > long as the client still wants to remember it.
> > 
> > Note that the lo_inode->is_symlink field is moved to avoid creating a
> > hole in the struct due to struct field alignment.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  tools/virtiofsd/passthrough_ll.c | 168 ++++++++++++++++++++++++++-----
> >  1 file changed, 145 insertions(+), 23 deletions(-)
> > 
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index b19c9ee328..8f4ab8351c 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -99,7 +99,13 @@ struct lo_key {
> >  
> >  struct lo_inode {
> >      int fd;
> > -    bool is_symlink;
> > +
> > +    /*
> > +     * Atomic reference count for this object.  The nlookup field holds a
> > +     * reference and release it when nlookup reaches 0.
> > +     */
> > +    gint refcount;
> > +
> >      struct lo_key key;
> >  
> >      /*
> > @@ -118,6 +124,8 @@ struct lo_inode {
> >      fuse_ino_t fuse_ino;
> >      pthread_mutex_t plock_mutex;
> >      GHashTable *posix_locks; /* protected by lo_inode->plock_mutex */
> > +
> > +    bool is_symlink;
> >  };
> >  
> >  struct lo_cred {
> > @@ -473,6 +481,23 @@ static ssize_t lo_add_inode_mapping(fuse_req_t req, struct lo_inode *inode)
> >      return elem - lo_data(req)->ino_map.elems;
> >  }
> >  
> > +static void lo_inode_put(struct lo_data *lo, struct lo_inode **inodep)
> > +{
> > +    struct lo_inode *inode = *inodep;
> > +
> > +    if (!inode) {
> > +        return;
> > +    }
> > +
> > +    *inodep = NULL;
> > +
> > +    if (g_atomic_int_dec_and_test(&inode->refcount)) {
> > +        close(inode->fd);
> > +        free(inode);
> > +    }
> > +}
> > +
> > +/* Caller must release refcount using lo_inode_put() */
> >  static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
> >  {
> >      struct lo_data *lo = lo_data(req);
> > @@ -480,6 +505,9 @@ static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
> >  
> >      pthread_mutex_lock(&lo->mutex);
> >      elem = lo_map_get(&lo->ino_map, ino);
> > +    if (elem) {
> > +        g_atomic_int_inc(&elem->inode->refcount);
> > +    }
> >      pthread_mutex_unlock(&lo->mutex);
> >  
> >      if (!elem) {
> > @@ -489,10 +517,23 @@ static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
> >      return elem->inode;
> >  }
> >  
> > +/*
> > + * TODO Remove this helper and force callers to hold an inode refcount until
> > + * they are done with the fd.  This will be done in a later patch to make
> > + * review easier.
> > + */
> >  static int lo_fd(fuse_req_t req, fuse_ino_t ino)
> >  {
> >      struct lo_inode *inode = lo_inode(req, ino);
> > -    return inode ? inode->fd : -1;
> > +    int fd;
> > +
> > +    if (!inode) {
> > +        return -1;
> > +    }
> > +
> > +    fd = inode->fd;
> > +    lo_inode_put(lo_data(req), &inode);
> > +    return fd;
> >  }
> >  
> >  static void lo_init(void *userdata, struct fuse_conn_info *conn)
> > @@ -547,6 +588,10 @@ static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
> >      fuse_reply_attr(req, &buf, lo->timeout);
> >  }
> >  
> > +/*
> > + * Increments parent->nlookup and caller must release refcount using
> > + * lo_inode_put(&parent).
> > + */
> >  static int lo_parent_and_name(struct lo_data *lo, struct lo_inode *inode,
> >                                char path[PATH_MAX], struct lo_inode **parent)
> >  {
> > @@ -584,6 +629,7 @@ retry:
> >          p = &lo->root;
> >          pthread_mutex_lock(&lo->mutex);
> >          p->nlookup++;
> > +        g_atomic_int_inc(&p->refcount);
> >          pthread_mutex_unlock(&lo->mutex);
> >      } else {
> >          *last = '\0';
> 
> We need lo_inode_put() in the error path, right?
> https://gitlab.com/virtio-fs/qemu/blob/virtio-fs-as-posted-2019-12-12/tools/virtiofsd/passthrough_ll.c#L680

Yes, thanks for spotting this bug!  The lo_parent_and_name() code should
look like this:

  fail_unref:
      unref_inode_lolocked(lo, p, 1);
      lo_inode_put(lo, &p);
  ...

> nit: if yes, unref_inode_lolocked() is always paired with lo_inode_put().
> So how about combining them into one function? As p->nlookup and p->refcount
> are both incremented in one place (lo_find/lo_parent_and_name) in these cases,
> it seems natural to me to decrement them in one function as well.

Nice idea.  I would also drop the nlookup argument - this function will
only be used with nlookup=1.
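
A rough sketch of what such a combined helper might look like, for
illustration only.  This is not the code from the tree: the sketch_* names
are hypothetical, the structs are stripped-down stand-ins for lo_data and
lo_inode, C11 atomics stand in for GLib's g_atomic_int_*(), and the extra
bookkeeping the real unref path does when nlookup reaches 0 is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for lo_data / lo_inode. */
struct sketch_data {
    pthread_mutex_t mutex;
    atomic_int freed;        /* counts frees, for illustration only */
};

struct sketch_inode {
    atomic_int refcount;     /* object lifetime, controlled by the daemon */
    uint64_t nlookup;        /* client-visible count, protected by mutex */
};

/* Drop one reference; free the inode when the last one goes away. */
static void sketch_inode_put(struct sketch_data *d,
                             struct sketch_inode **inodep)
{
    struct sketch_inode *inode = *inodep;

    if (!inode) {
        return;
    }
    *inodep = NULL;

    /* atomic_fetch_sub() returns the previous value. */
    if (atomic_fetch_sub(&inode->refcount, 1) == 1) {
        atomic_fetch_add(&d->freed, 1);
        free(inode);
    }
}

/*
 * Combined helper: undo one nlookup increment and release the caller's
 * reference.  The nlookup argument is dropped since it would always be 1.
 */
static void sketch_unref_and_put(struct sketch_data *d,
                                 struct sketch_inode **inodep)
{
    struct sketch_inode *inode = *inodep;
    int drop_lookup_ref;

    if (!inode) {
        return;
    }

    pthread_mutex_lock(&d->mutex);
    assert(inode->nlookup >= 1);
    inode->nlookup--;
    drop_lookup_ref = (inode->nlookup == 0);
    pthread_mutex_unlock(&d->mutex);

    if (drop_lookup_ref) {
        /* nlookup as a whole holds exactly one reference. */
        struct sketch_inode *tmp = inode;
        sketch_inode_put(d, &tmp);
    }

    /* Safe: the caller's own reference kept the inode alive until here. */
    sketch_inode_put(d, inodep);
}
```

With a helper of this shape the fail_unref path would shrink to a single
call, and callers could not forget one half of the pair.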

Stefan



* Re: [PATCH 093/104] virtiofsd: introduce inode refcount to prevent use-after-free
  2020-01-16 17:21     ` Stefan Hajnoczi
@ 2020-01-16 17:42       ` Dr. David Alan Gilbert
  2020-01-17  0:47         ` misono.tomohiro
  0 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-16 17:42 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Misono Tomohiro, vgoyal, qemu-devel

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Thu, Jan 16, 2020 at 09:25:42PM +0900, Misono Tomohiro wrote:
> > > From: Stefan Hajnoczi <stefanha@redhat.com>
> > > 
> > > If thread A is using an inode it must not be deleted by thread B when
> > > processing a FUSE_FORGET request.
> > > 
> > > The FUSE protocol itself already has a counter called nlookup that is
> > > used in FUSE_FORGET messages.  We cannot trust this counter since the
> > > untrusted client can manipulate it via FUSE_FORGET messages.
> > > 
> > > Introduce a new refcount to keep inodes alive for the required lifespan.
> > > lo_inode_put() must be called to release a reference.  FUSE's nlookup
> > > counter holds exactly one reference so that the inode stays alive as
> > > long as the client still wants to remember it.
> > > 
> > > Note that the lo_inode->is_symlink field is moved to avoid creating a
> > > hole in the struct due to struct field alignment.
> > > 
> > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > ---
> > >  tools/virtiofsd/passthrough_ll.c | 168 ++++++++++++++++++++++++++-----
> > >  1 file changed, 145 insertions(+), 23 deletions(-)
> > > 
> > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > index b19c9ee328..8f4ab8351c 100644
> > > --- a/tools/virtiofsd/passthrough_ll.c
> > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > @@ -99,7 +99,13 @@ struct lo_key {
> > >  
> > >  struct lo_inode {
> > >      int fd;
> > > -    bool is_symlink;
> > > +
> > > +    /*
> > > +     * Atomic reference count for this object.  The nlookup field holds a
> > > > +     * reference and releases it when nlookup reaches 0.
> > > +     */
> > > +    gint refcount;
> > > +
> > >      struct lo_key key;
> > >  
> > >      /*
> > > @@ -118,6 +124,8 @@ struct lo_inode {
> > >      fuse_ino_t fuse_ino;
> > >      pthread_mutex_t plock_mutex;
> > >      GHashTable *posix_locks; /* protected by lo_inode->plock_mutex */
> > > +
> > > +    bool is_symlink;
> > >  };
> > >  
> > >  struct lo_cred {
> > > @@ -473,6 +481,23 @@ static ssize_t lo_add_inode_mapping(fuse_req_t req, struct lo_inode *inode)
> > >      return elem - lo_data(req)->ino_map.elems;
> > >  }
> > >  
> > > +static void lo_inode_put(struct lo_data *lo, struct lo_inode **inodep)
> > > +{
> > > +    struct lo_inode *inode = *inodep;
> > > +
> > > +    if (!inode) {
> > > +        return;
> > > +    }
> > > +
> > > +    *inodep = NULL;
> > > +
> > > +    if (g_atomic_int_dec_and_test(&inode->refcount)) {
> > > +        close(inode->fd);
> > > +        free(inode);
> > > +    }
> > > +}
> > > +
> > > +/* Caller must release refcount using lo_inode_put() */
> > >  static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
> > >  {
> > >      struct lo_data *lo = lo_data(req);
> > > @@ -480,6 +505,9 @@ static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
> > >  
> > >      pthread_mutex_lock(&lo->mutex);
> > >      elem = lo_map_get(&lo->ino_map, ino);
> > > +    if (elem) {
> > > +        g_atomic_int_inc(&elem->inode->refcount);
> > > +    }
> > >      pthread_mutex_unlock(&lo->mutex);
> > >  
> > >      if (!elem) {
> > > @@ -489,10 +517,23 @@ static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
> > >      return elem->inode;
> > >  }
> > >  
> > > +/*
> > > + * TODO Remove this helper and force callers to hold an inode refcount until
> > > + * they are done with the fd.  This will be done in a later patch to make
> > > + * review easier.
> > > + */
> > >  static int lo_fd(fuse_req_t req, fuse_ino_t ino)
> > >  {
> > >      struct lo_inode *inode = lo_inode(req, ino);
> > > -    return inode ? inode->fd : -1;
> > > +    int fd;
> > > +
> > > +    if (!inode) {
> > > +        return -1;
> > > +    }
> > > +
> > > +    fd = inode->fd;
> > > +    lo_inode_put(lo_data(req), &inode);
> > > +    return fd;
> > >  }
> > >  
> > >  static void lo_init(void *userdata, struct fuse_conn_info *conn)
> > > @@ -547,6 +588,10 @@ static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
> > >      fuse_reply_attr(req, &buf, lo->timeout);
> > >  }
> > >  
> > > +/*
> > > + * Increments parent->nlookup and caller must release refcount using
> > > + * lo_inode_put(&parent).
> > > + */
> > >  static int lo_parent_and_name(struct lo_data *lo, struct lo_inode *inode,
> > >                                char path[PATH_MAX], struct lo_inode **parent)
> > >  {
> > > @@ -584,6 +629,7 @@ retry:
> > >          p = &lo->root;
> > >          pthread_mutex_lock(&lo->mutex);
> > >          p->nlookup++;
> > > +        g_atomic_int_inc(&p->refcount);
> > >          pthread_mutex_unlock(&lo->mutex);
> > >      } else {
> > >          *last = '\0';
> > 
> > > We need lo_inode_put() in the error path, right?
> > https://gitlab.com/virtio-fs/qemu/blob/virtio-fs-as-posted-2019-12-12/tools/virtiofsd/passthrough_ll.c#L680
> 
> Yes, thanks for spotting this bug!  The lo_parent_and_name() code should
> look like this:
> 
>   fail_unref:
>       unref_inode_lolocked(lo, p, 1);
>       lo_inode_put(lo, &p);
>   ...

I've merged that one in.

> > nit: if yes, unref_inode_lolocked() is always paired with lo_inode_put().
> > So how about combining them into one function? As p->nlookup and p->refcount
> > are both incremented in one place (lo_find/lo_parent_and_name) in these cases,
> > it seems natural to me to decrement them in one function as well.
> 
> Nice idea.  I would also drop the nlookup argument - this function will
> only be used with nlookup=1.

I'll leave that to you if you want to send a patch on top.

Dave

> Stefan


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* RE: [PATCH 093/104] virtiofsd: introduce inode refcount to prevent use-after-free
  2020-01-16 17:42       ` Dr. David Alan Gilbert
@ 2020-01-17  0:47         ` misono.tomohiro
  0 siblings, 0 replies; 307+ messages in thread
From: misono.tomohiro @ 2020-01-17  0:47 UTC (permalink / raw)
  To: 'Dr. David Alan Gilbert', Stefan Hajnoczi; +Cc: qemu-devel, vgoyal

> > On Thu, Jan 16, 2020 at 09:25:42PM +0900, Misono Tomohiro wrote:
> > > > From: Stefan Hajnoczi <stefanha@redhat.com>
> > > >
> > > > If thread A is using an inode it must not be deleted by thread B
> > > > when processing a FUSE_FORGET request.
> > > >
> > > > The FUSE protocol itself already has a counter called nlookup that
> > > > is used in FUSE_FORGET messages.  We cannot trust this counter
> > > > since the untrusted client can manipulate it via FUSE_FORGET messages.
> > > >
> > > > Introduce a new refcount to keep inodes alive for the required lifespan.
> > > > lo_inode_put() must be called to release a reference.  FUSE's
> > > > nlookup counter holds exactly one reference so that the inode
> > > > stays alive as long as the client still wants to remember it.
> > > >
> > > > Note that the lo_inode->is_symlink field is moved to avoid
> > > > creating a hole in the struct due to struct field alignment.
> > > >
> > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > ---
> > > > >  tools/virtiofsd/passthrough_ll.c | 168 ++++++++++++++++++++++++++-----
> > > > >  1 file changed, 145 insertions(+), 23 deletions(-)
> > > > >
> > > > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > > index b19c9ee328..8f4ab8351c 100644
> > > > --- a/tools/virtiofsd/passthrough_ll.c
> > > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > > @@ -99,7 +99,13 @@ struct lo_key {
> > > >
> > > >  struct lo_inode {
> > > >      int fd;
> > > > -    bool is_symlink;
> > > > +
> > > > +    /*
> > > > +     * Atomic reference count for this object.  The nlookup field holds a
> > > > > +     * reference and releases it when nlookup reaches 0.
> > > > +     */
> > > > +    gint refcount;
> > > > +
> > > >      struct lo_key key;
> > > >
> > > >      /*
> > > > @@ -118,6 +124,8 @@ struct lo_inode {
> > > >      fuse_ino_t fuse_ino;
> > > >      pthread_mutex_t plock_mutex;
> > > > >      GHashTable *posix_locks; /* protected by lo_inode->plock_mutex */
> > > > +
> > > > +    bool is_symlink;
> > > >  };
> > > >
> > > >  struct lo_cred {
> > > > @@ -473,6 +481,23 @@ static ssize_t lo_add_inode_mapping(fuse_req_t req, struct lo_inode *inode)
> > > > >      return elem - lo_data(req)->ino_map.elems;
> > > > >  }
> > > > >
> > > > > +static void lo_inode_put(struct lo_data *lo, struct lo_inode **inodep)
> > > > > +{
> > > > +    struct lo_inode *inode = *inodep;
> > > > +
> > > > +    if (!inode) {
> > > > +        return;
> > > > +    }
> > > > +
> > > > +    *inodep = NULL;
> > > > +
> > > > +    if (g_atomic_int_dec_and_test(&inode->refcount)) {
> > > > +        close(inode->fd);
> > > > +        free(inode);
> > > > +    }
> > > > +}
> > > > +
> > > > +/* Caller must release refcount using lo_inode_put() */
> > > > >  static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
> > > > >  {
> > > > >      struct lo_data *lo = lo_data(req);
> > > > > @@ -480,6 +505,9 @@ static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
> > > >
> > > >      pthread_mutex_lock(&lo->mutex);
> > > >      elem = lo_map_get(&lo->ino_map, ino);
> > > > +    if (elem) {
> > > > +        g_atomic_int_inc(&elem->inode->refcount);
> > > > +    }
> > > >      pthread_mutex_unlock(&lo->mutex);
> > > >
> > > >      if (!elem) {
> > > > @@ -489,10 +517,23 @@ static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
> > > >      return elem->inode;
> > > >  }
> > > >
> > > > +/*
> > > > > + * TODO Remove this helper and force callers to hold an inode refcount until
> > > > > + * they are done with the fd.  This will be done in a later patch to make
> > > > > + * review easier.
> > > > > + */
> > > > >  static int lo_fd(fuse_req_t req, fuse_ino_t ino)
> > > > >  {
> > > >      struct lo_inode *inode = lo_inode(req, ino);
> > > > -    return inode ? inode->fd : -1;
> > > > +    int fd;
> > > > +
> > > > +    if (!inode) {
> > > > +        return -1;
> > > > +    }
> > > > +
> > > > +    fd = inode->fd;
> > > > +    lo_inode_put(lo_data(req), &inode);
> > > > +    return fd;
> > > >  }
> > > >
> > > >  static void lo_init(void *userdata, struct fuse_conn_info *conn)
> > > > @@ -547,6 +588,10 @@ static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
> > > > >      fuse_reply_attr(req, &buf, lo->timeout);
> > > > >  }
> > > > >
> > > > > +/*
> > > > > + * Increments parent->nlookup and caller must release refcount using
> > > > > + * lo_inode_put(&parent).
> > > > > + */
> > > > >  static int lo_parent_and_name(struct lo_data *lo, struct lo_inode *inode,
> > > > >                                char path[PATH_MAX], struct lo_inode **parent)
> > > > >  {
> > > > > @@ -584,6 +629,7 @@ retry:
> > > >          p = &lo->root;
> > > >          pthread_mutex_lock(&lo->mutex);
> > > >          p->nlookup++;
> > > > +        g_atomic_int_inc(&p->refcount);
> > > >          pthread_mutex_unlock(&lo->mutex);
> > > >      } else {
> > > >          *last = '\0';
> > >
> > > > We need lo_inode_put() in the error path, right?
> > > > https://gitlab.com/virtio-fs/qemu/blob/virtio-fs-as-posted-2019-12-12/tools/virtiofsd/passthrough_ll.c#L680
> >
> > Yes, thanks for spotting this bug!  The lo_parent_and_name() code
> > should look like this:
> >
> >   fail_unref:
> >       unref_inode_lolocked(lo, p, 1);
> >       lo_inode_put(lo, &p);
> >   ...
> 
> I've merged that one in.

Thanks, so with that:
 Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>

> 
> > > > nit: if yes, unref_inode_lolocked() is always paired with lo_inode_put().
> > > > So how about combining them into one function? As p->nlookup and
> > > > p->refcount are both incremented in one place
> > > > (lo_find/lo_parent_and_name) in these cases, it seems natural to me to
> > > > decrement them in one function as well.
> >
> > Nice idea.  I would also drop the nlookup argument - this function
> > will only be used with nlookup=1.
> 
> I'll leave that to you if you want to send a patch on top.
> 
> Dave
> 
> > Stefan
> 
> 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 094/104] virtiofsd: do not always set FUSE_FLOCK_LOCKS
  2019-12-12 16:38 ` [PATCH 094/104] virtiofsd: do not always set FUSE_FLOCK_LOCKS Dr. David Alan Gilbert (git)
@ 2020-01-17  8:50   ` Misono Tomohiro
  2020-01-20 10:31   ` Sergio Lopez
  1 sibling, 0 replies; 307+ messages in thread
From: Misono Tomohiro @ 2020-01-17  8:50 UTC (permalink / raw)
  To: dgilbert; +Cc: misono.tomohiro, qemu-devel, stefanha, vgoyal

> From: Peng Tao <tao.peng@linux.alibaba.com>
> 
> Right now we always enable it regardless of the given command line.
> Fix it by setting the flag based on the lo->flock bit.
> 
> Signed-off-by: Peng Tao <tao.peng@linux.alibaba.com>

I think we should remove LL_SET_DEFAULT for flock/posix_lock in do_init()
but that can be done in another patch.

Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
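
For readers following along, here is a small sketch of the negotiation
logic the patch below introduces.  It assumes the bit can already be set
in conn->want before lo_init() runs (which is what the LL_SET_DEFAULT
remark above is about), so only ever setting the bit is not enough to
honour a disabled option.  The sketch_* names and the bit value are
placeholders, not libfuse definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder capability bit; the real flag is FUSE_CAP_FLOCK_LOCKS. */
#define SKETCH_CAP_FLOCK_LOCKS (1u << 10)

/*
 * Return the updated "want" mask.  If the peer is capable of flock locks,
 * follow the user's option in both directions: set the bit when enabled,
 * clear it when disabled.  If the peer is not capable, leave "want" alone.
 */
static uint32_t sketch_negotiate_flock(uint32_t capable, uint32_t want,
                                       bool flock_enabled)
{
    if (capable & SKETCH_CAP_FLOCK_LOCKS) {
        if (flock_enabled) {
            want |= SKETCH_CAP_FLOCK_LOCKS;
        } else {
            want &= ~SKETCH_CAP_FLOCK_LOCKS;
        }
    }
    return want;
}
```

The else branch is the whole point of the change: a bit pre-set by a
default survives unless it is explicitly cleared.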

> ---
>  tools/virtiofsd/passthrough_ll.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 8f4ab8351c..cf6b548eee 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -548,9 +548,14 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
>          fuse_log(FUSE_LOG_DEBUG, "lo_init: activating writeback\n");
>          conn->want |= FUSE_CAP_WRITEBACK_CACHE;
>      }
> -    if (lo->flock && conn->capable & FUSE_CAP_FLOCK_LOCKS) {
> -        fuse_log(FUSE_LOG_DEBUG, "lo_init: activating flock locks\n");
> -        conn->want |= FUSE_CAP_FLOCK_LOCKS;
> +    if (conn->capable & FUSE_CAP_FLOCK_LOCKS) {
> +        if (lo->flock) {
> +            fuse_log(FUSE_LOG_DEBUG, "lo_init: activating flock locks\n");
> +            conn->want |= FUSE_CAP_FLOCK_LOCKS;
> +        } else {
> +            fuse_log(FUSE_LOG_DEBUG, "lo_init: disabling flock locks\n");
> +            conn->want &= ~FUSE_CAP_FLOCK_LOCKS;
> +        }
>      }
>  
>      if (conn->capable & FUSE_CAP_POSIX_LOCKS) {
> -- 
> 2.23.0



* Re: [PATCH 085/104] virtiofsd: Support remote posix locks
  2020-01-16 13:26     ` Vivek Goyal
@ 2020-01-17  9:27       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-17  9:27 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Masayoshi Mizuma, qemu-devel, stefanha

* Vivek Goyal (vgoyal@redhat.com) wrote:
> On Wed, Jan 15, 2020 at 06:38:31PM -0500, Masayoshi Mizuma wrote:
> 
> [..]
> > > +/* Should be called with inode->plock_mutex held */
> > > +static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
> > > +                                                      struct lo_inode *inode,
> > > +                                                      uint64_t lock_owner,
> > > +                                                      pid_t pid, int *err)
> > > +{
> > > +    struct lo_inode_plock *plock;
> > > +    char procname[64];
> > > +    int fd;
> > > +
> > > +    plock =
> > > +        g_hash_table_lookup(inode->posix_locks, GUINT_TO_POINTER(lock_owner));
> > > +
> > > +    if (plock) {
> > > +        return plock;
> > > +    }
> > > +
> > > +    plock = malloc(sizeof(struct lo_inode_plock));
> > > +    if (!plock) {
> > > +        *err = ENOMEM;
> > > +        return NULL;
> > > +    }
> > > +
> > > +    /* Open another instance of file which can be used for ofd locks. */
> > > +    sprintf(procname, "%i", inode->fd);
> > > +
> > > +    /* TODO: What if file is not writable? */
> > > +    fd = openat(lo->proc_self_fd, procname, O_RDWR);
> > > +    if (fd == -1) {
> > 
> > > +        *err = -errno;
> > 
> > > I think errno is a positive value, so the minus isn't needed?
> > 
> >            *err = errno;
> 
> That sounds right. Thanks.
> 
> David, will you be able to make this tweak in your tree, or do you want me to
> send a separate fix patch?

Fixed in my tree.

> Thanks
> Vivek
> 
> > 
> > Otherwise looks good to me.
> > 
> > Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>

Thanks!

Dave

> > Thanks,
> > Masa
> > 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 046/104] virtiofsd: use /proc/self/fd/ O_PATH file descriptor
  2020-01-15 18:09   ` Philippe Mathieu-Daudé
@ 2020-01-17  9:42     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-17  9:42 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé; +Cc: qemu-devel, stefanha, vgoyal

* Philippe Mathieu-Daudé (philmd@redhat.com) wrote:
> On 12/12/19 5:38 PM, Dr. David Alan Gilbert (git) wrote:
> > From: Stefan Hajnoczi <stefanha@redhat.com>
> > 
> > Sandboxing will remove /proc from the mount namespace so we can no
> > longer build string paths into "/proc/self/fd/...".
> > 
> > Keep an O_PATH file descriptor so we can still re-open fds via
> > /proc/self/fd.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >   tools/virtiofsd/passthrough_ll.c | 129 ++++++++++++++++++++++++-------
> >   1 file changed, 102 insertions(+), 27 deletions(-)
> > 
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index 123f095990..006908f25a 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -110,6 +110,9 @@ struct lo_data {
> >       struct lo_map ino_map; /* protected by lo->mutex */
> >       struct lo_map dirp_map; /* protected by lo->mutex */
> >       struct lo_map fd_map; /* protected by lo->mutex */
> > +
> > +    /* An O_PATH file descriptor to /proc/self/fd/ */
> > +    int proc_self_fd;
> >   };
> >   static const struct fuse_opt lo_opts[] = {
> > @@ -379,9 +382,9 @@ static int lo_parent_and_name(struct lo_data *lo, struct lo_inode *inode,
> >       int res;
> >   retry:
> > -    sprintf(procname, "/proc/self/fd/%i", inode->fd);
> > +    sprintf(procname, "%i", inode->fd);
> > -    res = readlink(procname, path, PATH_MAX);
> > +    res = readlinkat(lo->proc_self_fd, procname, path, PATH_MAX);
> >       if (res < 0) {
> >           fuse_log(FUSE_LOG_WARNING, "lo_parent_and_name: readlink failed: %m\n");
> >           goto fail_noretry;
> > @@ -477,9 +480,9 @@ static int utimensat_empty(struct lo_data *lo, struct lo_inode *inode,
> >           }
> >           return res;
> >       }
> > -    sprintf(path, "/proc/self/fd/%i", inode->fd);
> > +    sprintf(path, "%i", inode->fd);
> > -    return utimensat(AT_FDCWD, path, tv, 0);
> > +    return utimensat(lo->proc_self_fd, path, tv, 0);
> >   fallback:
> >       res = lo_parent_and_name(lo, inode, path, &parent);
> > @@ -535,8 +538,8 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
> >           if (fi) {
> >               res = fchmod(fd, attr->st_mode);
> >           } else {
> > -            sprintf(procname, "/proc/self/fd/%i", ifd);
> > -            res = chmod(procname, attr->st_mode);
> > +            sprintf(procname, "%i", ifd);
> > +            res = fchmodat(lo->proc_self_fd, procname, attr->st_mode, 0);
> >           }
> >           if (res == -1) {
> >               goto out_err;
> > @@ -552,11 +555,23 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
> >           }
> >       }
> >       if (valid & FUSE_SET_ATTR_SIZE) {
> > +        int truncfd;
> > +
> >           if (fi) {
> > -            res = ftruncate(fd, attr->st_size);
> > +            truncfd = fd;
> >           } else {
> > -            sprintf(procname, "/proc/self/fd/%i", ifd);
> > -            res = truncate(procname, attr->st_size);
> > +            sprintf(procname, "%i", ifd);
> > +            truncfd = openat(lo->proc_self_fd, procname, O_RDWR);
> > +            if (truncfd < 0) {
> > +                goto out_err;
> > +            }
> > +        }
> > +
> > +        res = ftruncate(truncfd, attr->st_size);
> > +        if (!fi) {
> > +            saverr = errno;
> > +            close(truncfd);
> > +            errno = saverr;
> >           }
> >           if (res == -1) {
> >               goto out_err;
> > @@ -857,9 +872,9 @@ static int linkat_empty_nofollow(struct lo_data *lo, struct lo_inode *inode,
> >           return res;
> >       }
> > -    sprintf(path, "/proc/self/fd/%i", inode->fd);
> > +    sprintf(path, "%i", inode->fd);
> > -    return linkat(AT_FDCWD, path, dfd, name, AT_SYMLINK_FOLLOW);
> > +    return linkat(lo->proc_self_fd, path, dfd, name, AT_SYMLINK_FOLLOW);
> >   fallback:
> >       res = lo_parent_and_name(lo, inode, path, &parent);
> > @@ -1387,8 +1402,8 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
> >           fi->flags &= ~O_APPEND;
> >       }
> > -    sprintf(buf, "/proc/self/fd/%i", lo_fd(req, ino));
> > -    fd = open(buf, fi->flags & ~O_NOFOLLOW);
> > +    sprintf(buf, "%i", lo_fd(req, ino));
> > +    fd = openat(lo->proc_self_fd, buf, fi->flags & ~O_NOFOLLOW);
> >       if (fd == -1) {
> >           return (void)fuse_reply_err(req, errno);
> >       }
> > @@ -1440,8 +1455,8 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
> >   static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
> >                        struct fuse_file_info *fi)
> >   {
> > +    struct lo_data *lo = lo_data(req);
> 
> We can initialize this one ...
> 
> >       int res;
> > -    (void)ino;
> >       int fd;
> >       char *buf;
> > @@ -1449,12 +1464,12 @@ static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
> >                (void *)fi);
> >       if (!fi) {
> 
> ... here:
> 
>            lo = lo_data(req);
> 
> Similarly in other functions, but I see this is the style used by this file.

I moved that one, and I'll keep an eye out for others; it's nice to keep the
scope small.

> > -        res = asprintf(&buf, "/proc/self/fd/%i", lo_fd(req, ino));
> > +        res = asprintf(&buf, "%i", lo_fd(req, ino));
> >           if (res == -1) {
> >               return (void)fuse_reply_err(req, errno);
> >           }
> > -        fd = open(buf, O_RDWR);
> > +        fd = openat(lo->proc_self_fd, buf, O_RDWR);
> >           free(buf);
> >           if (fd == -1) {
> >               return (void)fuse_reply_err(req, errno);
> > @@ -1570,11 +1585,13 @@ static void lo_flock(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
> >   static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
> >                           size_t size)
> >   {
> > +    struct lo_data *lo = lo_data(req);
> >       char *value = NULL;
> >       char procname[64];
> >       struct lo_inode *inode;
> >       ssize_t ret;
> >       int saverr;
> > +    int fd = -1;
> >       inode = lo_inode(req, ino);
> >       if (!inode) {
> > @@ -1599,7 +1616,11 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
> >           goto out;
> >       }
> > -    sprintf(procname, "/proc/self/fd/%i", inode->fd);
> > +    sprintf(procname, "%i", inode->fd);
> > +    fd = openat(lo->proc_self_fd, procname, O_RDONLY);
> > +    if (fd < 0) {
> > +        goto out_err;
> > +    }
> >       if (size) {
> >           value = malloc(size);
> > @@ -1607,7 +1628,7 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
> >               goto out_err;
> >           }
> > -        ret = getxattr(procname, name, value, size);
> > +        ret = fgetxattr(fd, name, value, size);
> >           if (ret == -1) {
> >               goto out_err;
> >           }
> > @@ -1618,7 +1639,7 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
> >           fuse_reply_buf(req, value, ret);
> >       } else {
> > -        ret = getxattr(procname, name, NULL, 0);
> > +        ret = fgetxattr(fd, name, NULL, 0);
> >           if (ret == -1) {
> >               goto out_err;
> >           }
> > @@ -1627,6 +1648,10 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
> >       }
> >   out_free:
> >       free(value);
> > +
> > +    if (fd >= 0) {
> > +        close(fd);
> > +    }
> >       return;
> >   out_err:
> > @@ -1638,11 +1663,13 @@ out:
> >   static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
> >   {
> > +    struct lo_data *lo = lo_data(req);
> >       char *value = NULL;
> >       char procname[64];
> >       struct lo_inode *inode;
> >       ssize_t ret;
> >       int saverr;
> > +    int fd = -1;
> >       inode = lo_inode(req, ino);
> >       if (!inode) {
> > @@ -1666,7 +1693,11 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
> >           goto out;
> >       }
> > -    sprintf(procname, "/proc/self/fd/%i", inode->fd);
> > +    sprintf(procname, "%i", inode->fd);
> > +    fd = openat(lo->proc_self_fd, procname, O_RDONLY);
> > +    if (fd < 0) {
> > +        goto out_err;
> > +    }
> >       if (size) {
> >           value = malloc(size);
> > @@ -1674,7 +1705,7 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
> >               goto out_err;
> >           }
> > -        ret = listxattr(procname, value, size);
> > +        ret = flistxattr(fd, value, size);
> >           if (ret == -1) {
> >               goto out_err;
> >           }
> > @@ -1685,7 +1716,7 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
> >           fuse_reply_buf(req, value, ret);
> >       } else {
> > -        ret = listxattr(procname, NULL, 0);
> > +        ret = flistxattr(fd, NULL, 0);
> >           if (ret == -1) {
> >               goto out_err;
> >           }
> > @@ -1694,6 +1725,10 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
> >       }
> >   out_free:
> >       free(value);
> > +
> > +    if (fd >= 0) {
> > +        close(fd);
> > +    }
> >       return;
> >   out_err:
> > @@ -1707,9 +1742,11 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
> >                           const char *value, size_t size, int flags)
> >   {
> >       char procname[64];
> > +    struct lo_data *lo = lo_data(req);
> >       struct lo_inode *inode;
> >       ssize_t ret;
> >       int saverr;
> > +    int fd = -1;
> >       inode = lo_inode(req, ino);
> >       if (!inode) {
> > @@ -1734,21 +1771,31 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
> >           goto out;
> >       }
> > -    sprintf(procname, "/proc/self/fd/%i", inode->fd);
> > +    sprintf(procname, "%i", inode->fd);
> > +    fd = openat(lo->proc_self_fd, procname, O_RDWR);
> > +    if (fd < 0) {
> > +        saverr = errno;
> > +        goto out;
> > +    }
> > -    ret = setxattr(procname, name, value, size, flags);
> > +    ret = fsetxattr(fd, name, value, size, flags);
> >       saverr = ret == -1 ? errno : 0;
> >   out:
> > +    if (fd >= 0) {
> > +        close(fd);
> > +    }
> >       fuse_reply_err(req, saverr);
> >   }
> >   static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *name)
> >   {
> >       char procname[64];
> > +    struct lo_data *lo = lo_data(req);
> >       struct lo_inode *inode;
> >       ssize_t ret;
> >       int saverr;
> > +    int fd = -1;
> >       inode = lo_inode(req, ino);
> >       if (!inode) {
> > @@ -1772,12 +1819,20 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *name)
> >           goto out;
> >       }
> > -    sprintf(procname, "/proc/self/fd/%i", inode->fd);
> > +    sprintf(procname, "%i", inode->fd);
> > +    fd = openat(lo->proc_self_fd, procname, O_RDWR);
> > +    if (fd < 0) {
> > +        saverr = errno;
> > +        goto out;
> > +    }
> > -    ret = removexattr(procname, name);
> > +    ret = fremovexattr(fd, name);
> >       saverr = ret == -1 ? errno : 0;
> >   out:
> > +    if (fd >= 0) {
> > +        close(fd);
> > +    }
> >       fuse_reply_err(req, saverr);
> >   }
> > @@ -1870,12 +1925,25 @@ static void print_capabilities(void)
> >       printf("}\n");
> >   }
> > +static void setup_proc_self_fd(struct lo_data *lo)
> > +{
> > +    lo->proc_self_fd = open("/proc/self/fd", O_PATH);
> > +    if (lo->proc_self_fd == -1) {
> > +        fuse_log(FUSE_LOG_ERR, "open(/proc/self/fd, O_PATH): %m\n");
> > +        exit(1);
> > +    }
> > +}
> > +
> >   int main(int argc, char *argv[])
> >   {
> >       struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
> >       struct fuse_session *se;
> >       struct fuse_cmdline_opts opts;
> > -    struct lo_data lo = { .debug = 0, .writeback = 0 };
> > +    struct lo_data lo = {
> > +        .debug = 0,
> > +        .writeback = 0,
> > +        .proc_self_fd = -1,
> > +    };
> >       struct lo_map_elem *root_elem;
> >       int ret = -1;
> > @@ -1986,6 +2054,9 @@ int main(int argc, char *argv[])
> >       fuse_daemonize(opts.foreground);
> > +    /* Must be after daemonize to get the right /proc/self/fd */
> > +    setup_proc_self_fd(&lo);
> > +
> >       /* Block until ctrl+c or fusermount -u */
> >       ret = virtio_loop(se);
> > @@ -2001,6 +2072,10 @@ err_out1:
> >       lo_map_destroy(&lo.dirp_map);
> >       lo_map_destroy(&lo.ino_map);
> > +    if (lo.proc_self_fd >= 0) {
> > +        close(lo.proc_self_fd);
> > +    }
> > +
> >       if (lo.root.fd >= 0) {
> >           close(lo.root.fd);
> >       }
> > 
> 
> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

Thanks!
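
As a summary of the technique in the quoted patch, here is a minimal
Linux-only sketch: keep one O_PATH descriptor for /proc/self/fd, then
re-open other descriptors relative to it by number instead of building
"/proc/self/fd/N" strings.  The sketch_* helpers are hypothetical names,
not functions from passthrough_ll.c.

```c
#define _GNU_SOURCE     /* for O_PATH */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Open /proc/self/fd once, while /proc is still visible.  The descriptor
 * keeps working as a directory handle for openat() afterwards.
 */
static int sketch_open_proc_self_fd(void)
{
    return open("/proc/self/fd", O_PATH);
}

/*
 * Re-open an existing (typically O_PATH) descriptor with real access
 * flags.  O_NOFOLLOW is masked out because the entry under /proc/self/fd
 * is itself a symlink that has to be followed.
 * Returns the new fd, or -1 with errno set.
 */
static int sketch_reopen(int proc_self_fd, int path_fd, int flags)
{
    char name[32];

    snprintf(name, sizeof(name), "%i", path_fd);
    return openat(proc_self_fd, name, flags & ~O_NOFOLLOW);
}
```

A caller would typically close the re-opened descriptor as soon as the
one operation that needed it (ftruncate, fgetxattr, and so on) is done,
as the patch does.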

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 072/104] virtiofsd: passthrough_ll: fix refcounting on remove/rename
  2020-01-16 16:45     ` Dr. David Alan Gilbert
@ 2020-01-17 10:19       ` Miklos Szeredi
  2020-01-17 11:37         ` Dr. David Alan Gilbert
  2020-01-17 18:43         ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 307+ messages in thread
From: Miklos Szeredi @ 2020-01-17 10:19 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Vivek Goyal, Misono Tomohiro, Stefan Hajnoczi, qemu-devel

On Thu, Jan 16, 2020 at 5:45 PM Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
>
> * Misono Tomohiro (misono.tomohiro@jp.fujitsu.com) wrote:
> > > From: Miklos Szeredi <mszeredi@redhat.com>
> > >
> > > Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> >
> > I'm not familiar with qemu convention but shouldn't we put
> > at least one line of description like linux kernel?
>
> Miklos: would you like to suggest a better commit message?

Hmm, the patch doesn't really make sense, since the looked up inode is not used.

Not sure what happened here, this seems to be for supporting shared
versions, and these changes are part of commit 06f78a397f00
("virtiofsd: add initial support for shared versions") in our gitlab
qemu tree.  Was this intentionally split out?

Thanks,
Miklos




* Re: [PATCH 000/104] virtiofs daemon [all]
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (104 preceding siblings ...)
  2019-12-12 18:21 ` [PATCH 000/104] virtiofs daemon [all] no-reply
@ 2020-01-17 11:32 ` Dr. David Alan Gilbert
  2020-01-17 13:32 ` [PATCH 105/104] virtiofsd: Unref old/new inodes with the same mutex lock in lo_rename() Philippe Mathieu-Daudé
  106 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-17 11:32 UTC (permalink / raw)
  To: qemu-devel, stefanha, vgoyal, berrange, m.mizuma,
	misono.tomohiro, philmd, slp


Hi,
  Here's a summary of the review status of this set.

Total: 109 Reviewed: 81 New: 5  changed/rr: 8

The first column is:
  'blank' - no change
  'D' - some diff from the original patch
        (as reported by a simple diff script)
  'N' - a new patch

The second column is:
  'R' - has a Reviewed-by
  'c' - Someone reviewed it and asked for changes
  ' ' - no one reviewed it

That leaves ~15 patches from this post that haven't
had reviews (and aren't new).

As soon as most of those are reviewed I'll rebase and repost.

Dave

  R : virtiofsd-Pull-in-upstream-headers
  R : virtiofsd-Pull-in-kernel-s-fuse.h
  R : virtiofsd-Add-auxiliary-.c-s
  R : virtiofsd-Add-fuse_lowlevel.c
D R : virtiofsd-Add-passthrough_ll
D R : virtiofsd-Trim-down-imported-files
D R : virtiofsd-Format-imported-files-to-qemu-style
  R : virtiofsd-remove-mountpoint-dummy-argument
  R : virtiofsd-remove-unused-notify-reply-support
  R : virtiofsd-Fix-fuse_daemonize-ignored-return-value
D c : virtiofsd-Fix-common-header-and-define-for-QEMU-b
  R : virtiofsd-Trim-out-compatibility-code
N   : vitriofsd-passthrough_ll-fix-fallocate-ifdefs
  R : virtiofsd-Make-fsync-work-even-if-only-inode-is-p
D c : virtiofsd-Add-options-for-virtio
  R : virtiofsd-add-o-source-PATH-to-help-output
  R : virtiofsd-Open-vhost-connection-instead-of-mounti
  R : virtiofsd-Start-wiring-up-vhost-user
  R : virtiofsd-Add-main-virtio-loop
  R : virtiofsd-get-set-features-callbacks
  R : virtiofsd-Start-queue-threads
  R : virtiofsd-Poll-kick_fd-for-queue
  R : virtiofsd-Start-reading-commands-from-queue
  R : virtiofsd-Send-replies-to-messages
  R : virtiofsd-Keep-track-of-replies
  R : virtiofsd-Add-Makefile-wiring-for-virtiofsd-contr
    : virtiofsd-Fast-path-for-virtio-read
  R : virtiofsd-add-fd-FDNUM-fd-passing-option
  R : virtiofsd-make-f-foreground-the-default
  R : virtiofsd-add-vhost-user.json-file
  R : virtiofsd-add-print-capabilities-option
  R : virtiofs-Add-maintainers-entry
D c : virtiofsd-passthrough_ll-create-new-files-in-call
    : virtiofsd-passthrough_ll-add-lo_map-for-ino-fh-in
    : virtiofsd-passthrough_ll-add-ino_map-to-hide-lo_i
    : virtiofsd-passthrough_ll-add-dirp_map-to-hide-lo_
D   : virtiofsd-passthrough_ll-add-fd_map-to-hide-file-
D   : virtiofsd-passthrough_ll-add-fallback-for-racy-op
  R : virtiofsd-validate-path-components
    : virtiofsd-Plumb-fuse_bufvec-through-to-do_write_b
    : virtiofsd-Pass-write-iov-s-all-the-way-through
    : virtiofsd-add-fuse_mbuf_iter-API
  R : virtiofsd-validate-input-buffer-sizes-in-do_write
  R : virtiofsd-check-input-buffer-size-in-fuse_lowleve
  R : virtiofsd-prevent-.-escape-in-lo_do_lookup
  R : virtiofsd-prevent-.-escape-in-lo_do_readdir
D R : virtiofsd-use-proc-self-fd-O_PATH-file-descriptor
  R : virtiofsd-sandbox-mount-namespace
  R : virtiofsd-move-to-an-empty-network-namespace
  R : virtiofsd-move-to-a-new-pid-namespace
D c : virtiofsd-add-seccomp-whitelist
  R : virtiofsd-Parse-flag-FUSE_WRITE_KILL_PRIV
  R : virtiofsd-cap-ng-helpers
  R : virtiofsd-Drop-CAP_FSETID-if-client-asked-for-it
  R : virtiofsd-set-maximum-RLIMIT_NOFILE-limit
  R : virtiofsd-fix-libfuse-information-leaks
N   : docs-Add-docs-tools
D R : virtiofsd-add-security-guide-document
  R : virtiofsd-add-syslog-command-line-option
D R : virtiofsd-print-log-only-when-priority-is-high-en
D c : virtiofsd-Add-ID-to-the-log-with-FUSE_LOG_DEBUG-l
D   : virtiofsd-Add-timestamp-to-the-log-with-FUSE_LOG_
  R : virtiofsd-Handle-reinit
  R : virtiofsd-Handle-hard-reboot
D R : virtiofsd-Kill-threads-when-queues-are-stopped
  R : vhost-user-Print-unexpected-slave-message-types
  R : contrib-libvhost-user-Protect-slave-fd-with-mutex
D c : virtiofsd-passthrough_ll-add-renameat2-support
  R : virtiofsd-passthrough_ll-disable-readdirplus-on-c
D   : virtiofsd-passthrough_ll-control-readdirplus
  R : virtiofsd-rename-unref_inode-to-unref_inode_loloc
  R : virtiofsd-fail-when-parent-inode-isn-t-known-in-l
D c : virtiofsd-extract-root-inode-init-into-setup_root
  R : virtiofsd-passthrough_ll-fix-refcounting-on-remov
D R : virtiofsd-passthrough_ll-clean-up-cache-related-o
  R : virtiofsd-passthrough_ll-use-hashtable
  R : virtiofsd-Clean-up-inodes-on-destroy
  R : virtiofsd-support-nanosecond-resolution-for-file-
  R : virtiofsd-fix-error-handling-in-main
  R : virtiofsd-cleanup-allocated-resource-in-se
D c : virtiofsd-fix-memory-leak-on-lo.source
D R : virtiofsd-add-helper-for-lo_data-cleanup
  R : virtiofsd-Prevent-multiply-running-with-same-vhos
  R : virtiofsd-enable-PARALLEL_DIROPS-during-INIT
  R : virtiofsd-fix-incorrect-error-handling-in-lo_do_l
D R : Virtiofsd-fix-memory-leak-on-fuse-queueinfo
D R : virtiofsd-Support-remote-posix-locks
  R : virtiofsd-use-fuse_lowlevel_is_virtio-in-fuse_ses
  R : virtiofsd-prevent-fv_queue_thread-vs-virtio_loop-
  R : virtiofsd-make-lo_release-atomic
    : virtiofsd-prevent-races-with-lo_dirp_put
    : virtiofsd-rename-inode-refcount-to-inode-nlookup
    : libvhost-user-Fix-some-memtable-remap-cases
D R : virtiofsd-add-man-page
D R : virtiofsd-introduce-inode-refcount-to-prevent-use
  R : virtiofsd-do-not-always-set-FUSE_FLOCK_LOCKS
D R : virtiofsd-convert-more-fprintf-and-perror-to-use-
  R : virtiofsd-Reset-O_DIRECT-flag-during-file-open
  R : virtiofsd-Fix-data-corruption-with-O_APPEND-wirte
D R : virtiofsd-add-definition-of-fuse_buf_writev
D R : virtiofsd-use-fuse_buf_writev-to-replace-fuse_buf
D   : virtiofsd-process-requests-in-a-thread-pool
  R : virtiofsd-prevent-FUSE_INIT-FUSE_DESTROY-races
    : virtiofsd-fix-lo_destroy-resource-leaks
  R : virtiofsd-add-thread-pool-size-NUM-option
    : virtiofsd-Convert-lo_destroy-to-take-the-lo-mutex
N R : virtiofsd-passthrough_ll-Pass-errno-to-fuse_reply
N R : virtiofsd-stop-all-queue-threads-on-exit-in-virti
N   : virtiofsd-add-some-options-to-the-help-message
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 072/104] virtiofsd: passthrough_ll: fix refcounting on remove/rename
  2020-01-17 10:19       ` Miklos Szeredi
@ 2020-01-17 11:37         ` Dr. David Alan Gilbert
  2020-01-17 18:43         ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-17 11:37 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Vivek Goyal, Misono Tomohiro, Stefan Hajnoczi, qemu-devel

* Miklos Szeredi (mszeredi@redhat.com) wrote:
> On Thu, Jan 16, 2020 at 5:45 PM Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
> >
> > * Misono Tomohiro (misono.tomohiro@jp.fujitsu.com) wrote:
> > > > From: Miklos Szeredi <mszeredi@redhat.com>
> > > >
> > > > Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> > >
> > > I'm not familiar with qemu convention but shouldn't we put
> > > at least one line of description like linux kernel?
> >
> > Miklos: would you like to suggest a better commit message?
> 
> Hmm, the patch doesn't really make sense, since the looked up inode is not used.
> 
> Not sure what happened here, this seems to be for supporting shared
> versions, and these changes are part of commit 06f78a397f00
> ("virtiofsd: add initial support for shared versions") in our gitlab
> qemu tree.  Was this intentionally split out?

I remember splitting the shared version support out into a separate
patch when I was removing it; let me see if I can drop this change
without breaking the merge around it.

Dave

> Thanks,
> Miklos
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* [PATCH 105/104] virtiofsd: Unref old/new inodes with the same mutex lock in lo_rename()
  2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
                   ` (105 preceding siblings ...)
  2020-01-17 11:32 ` Dr. David Alan Gilbert
@ 2020-01-17 13:32 ` Philippe Mathieu-Daudé
  2020-01-19  8:35   ` Xiao Yang
  2020-01-20 18:52   ` Dr. David Alan Gilbert
  106 siblings, 2 replies; 307+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-01-17 13:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Philippe Mathieu-Daudé,
	Dr . David Alan Gilbert, stefanha, Vivek Goyal

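Take lo->mutex once in lo_rename() and unref both the old and the new
inode under it, instead of locking and unlocking separately for each
inode via unref_inode_lolocked().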

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
Based-on: <20191212163904.159893-1-dgilbert@redhat.com>
"virtiofs daemon"
https://www.mail-archive.com/qemu-devel@nongnu.org/msg664652.html

 tools/virtiofsd/passthrough_ll.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 57f58aef26..5c717cb5a1 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1461,8 +1461,10 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
     }
 
 out:
-    unref_inode_lolocked(lo, oldinode, 1);
-    unref_inode_lolocked(lo, newinode, 1);
+    pthread_mutex_lock(&lo->mutex);
+    unref_inode(lo, oldinode, 1);
+    unref_inode(lo, newinode, 1);
+    pthread_mutex_unlock(&lo->mutex);
     lo_inode_put(lo, &oldinode);
     lo_inode_put(lo, &newinode);
     lo_inode_put(lo, &parent_inode);
-- 
2.21.1




* Re: [PATCH 104/104] virtiofsd: Convert lo_destroy to take the lo->mutex lock itself
  2019-12-12 16:39 ` [PATCH 104/104] virtiofsd: Convert lo_destroy to take the lo->mutex lock itself Dr. David Alan Gilbert (git)
@ 2020-01-17 13:33   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 307+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-01-17 13:33 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel, stefanha, vgoyal

On 12/12/19 5:39 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> lo_destroy was relying on some implicit knowledge of the locking;
> we can avoid this if we create an unref_inode that doesn't take
> the lock and then grab it for the whole of the lo_destroy.
> 
> Suggested-by: Vivek Goyal <vgoyal@redhat.com>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   tools/virtiofsd/passthrough_ll.c | 31 +++++++++++++++++--------------
>   1 file changed, 17 insertions(+), 14 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 38f4948e61..c37f57157e 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -1328,14 +1328,13 @@ static void lo_unlink(fuse_req_t req, fuse_ino_t parent, const char *name)
>       lo_inode_put(lo, &inode);
>   }
>   
> -static void unref_inode_lolocked(struct lo_data *lo, struct lo_inode *inode,
> -                                 uint64_t n)
> +/* To be called with lo->mutex held */
> +static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
>   {
>       if (!inode) {
>           return;
>       }
>   
> -    pthread_mutex_lock(&lo->mutex);
>       assert(inode->nlookup >= n);
>       inode->nlookup -= n;
>       if (!inode->nlookup) {
> @@ -1346,15 +1345,24 @@ static void unref_inode_lolocked(struct lo_data *lo, struct lo_inode *inode,
>           }
>           g_hash_table_destroy(inode->posix_locks);
>           pthread_mutex_destroy(&inode->plock_mutex);
> -        pthread_mutex_unlock(&lo->mutex);
>   
>           /* Drop our refcount from lo_do_lookup() */
>           lo_inode_put(lo, &inode);
> -    } else {
> -        pthread_mutex_unlock(&lo->mutex);
>       }
>   }
>   
> +static void unref_inode_lolocked(struct lo_data *lo, struct lo_inode *inode,
> +                                 uint64_t n)
> +{
> +    if (!inode) {
> +        return;
> +    }
> +
> +    pthread_mutex_lock(&lo->mutex);
> +    unref_inode(lo, inode, n);
> +    pthread_mutex_unlock(&lo->mutex);
> +}
> +
>   static void lo_forget_one(fuse_req_t req, fuse_ino_t ino, uint64_t nlookup)
>   {
>       struct lo_data *lo = lo_data(req);
> @@ -2441,13 +2449,7 @@ static void lo_destroy(void *userdata)
>   {
>       struct lo_data *lo = (struct lo_data *)userdata;
>   
> -    /*
> -     * Normally lo->mutex must be taken when traversing lo->inodes but
> -     * lo_destroy() is a serialized request so no races are possible here.
> -     *
> -     * In addition, we cannot acquire lo->mutex since unref_inode() takes it
> -     * too and this would result in a recursive lock.
> -     */
> +    pthread_mutex_lock(&lo->mutex);
>       while (true) {
>           GHashTableIter iter;
>           gpointer key, value;
> @@ -2458,8 +2460,9 @@ static void lo_destroy(void *userdata)
>           }
>   
>           struct lo_inode *inode = value;
> -        unref_inode_lolocked(lo, inode, inode->nlookup);
> +        unref_inode(lo, inode, inode->nlookup);
>       }
> +    pthread_mutex_unlock(&lo->mutex);
>   }
>   
>   static struct fuse_lowlevel_ops lo_oper = {
> 

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
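
For readers following the locking change, the split into a helper that
expects the mutex to be held and a thin wrapper that takes it can be
sketched as below. This is only an illustration: demo_table, demo_entry
and the demo_* functions are hypothetical stand-ins, not the virtiofsd
types, and the "live" counter replaces the real map and hash table
bookkeeping.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct lo_data. */
struct demo_table {
    pthread_mutex_t mutex;
    unsigned live;              /* entries that still have lookups */
};

/* Hypothetical stand-in for struct lo_inode. */
struct demo_entry {
    uint64_t nlookup;           /* protected by demo_table.mutex */
};

/* To be called with t->mutex held. */
static void demo_unref(struct demo_table *t, struct demo_entry *e, uint64_t n)
{
    if (!e) {
        return;
    }
    assert(e->nlookup >= n);
    e->nlookup -= n;
    if (!e->nlookup) {
        t->live--;              /* stands in for removing the entry */
    }
}

/* Takes t->mutex itself, so it must not be called with it held. */
static void demo_unref_lolocked(struct demo_table *t, struct demo_entry *e,
                                uint64_t n)
{
    if (!e) {
        return;
    }
    pthread_mutex_lock(&t->mutex);
    demo_unref(t, e, n);
    pthread_mutex_unlock(&t->mutex);
}

/* Bulk teardown: one lock acquisition covers the whole traversal. */
static void demo_destroy(struct demo_table *t, struct demo_entry *entries,
                         size_t count)
{
    pthread_mutex_lock(&t->mutex);
    for (size_t i = 0; i < count; i++) {
        if (entries[i].nlookup) {
            demo_unref(t, &entries[i], entries[i].nlookup);
        }
    }
    pthread_mutex_unlock(&t->mutex);
}
```

With a non-recursive mutex, calling demo_unref_lolocked() from inside
demo_destroy() would deadlock, which is why the traversal uses the
helper that does not lock.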




* Re: [PATCH 103/104] virtiofsd: add --thread-pool-size=NUM option
  2019-12-12 16:39 ` [PATCH 103/104] virtiofsd: add --thread-pool-size=NUM option Dr. David Alan Gilbert (git)
  2020-01-07 12:25   ` Daniel P. Berrangé
@ 2020-01-17 13:35   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 307+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-01-17 13:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel, stefanha, vgoyal

On 12/12/19 5:39 PM, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Add an option to control the size of the thread pool.  Requests are now
> processed in parallel by default.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   tools/virtiofsd/fuse_i.h        | 1 +
>   tools/virtiofsd/fuse_lowlevel.c | 7 ++++++-
>   tools/virtiofsd/fuse_virtio.c   | 5 +++--
>   3 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
> index 8a4a05b319..4da6a242ba 100644
> --- a/tools/virtiofsd/fuse_i.h
> +++ b/tools/virtiofsd/fuse_i.h
> @@ -72,6 +72,7 @@ struct fuse_session {
>       int   vu_listen_fd;
>       int   vu_socketfd;
>       struct fv_VuDev *virtio_dev;
> +    int thread_pool_size;
>   };
>   
>   struct fuse_chan {
> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> index 9f01c05e3e..09a7b23726 100644
> --- a/tools/virtiofsd/fuse_lowlevel.c
> +++ b/tools/virtiofsd/fuse_lowlevel.c
> @@ -27,6 +27,7 @@
>   #include <sys/file.h>
>   #include <unistd.h>
>   
> +#define THREAD_POOL_SIZE 64
>   
>   #define OFFSET_MAX 0x7fffffffffffffffLL
>   
> @@ -2523,6 +2524,7 @@ static const struct fuse_opt fuse_ll_opts[] = {
>       LL_OPTION("--socket-path=%s", vu_socket_path, 0),
>       LL_OPTION("vhost_user_socket=%s", vu_socket_path, 0),
>       LL_OPTION("--fd=%d", vu_listen_fd, 0),
> +    LL_OPTION("--thread-pool-size=%d", thread_pool_size, 0),
>       FUSE_OPT_END
>   };
>   
> @@ -2544,7 +2546,9 @@ void fuse_lowlevel_help(void)
>           "    --socket-path=PATH         path for the vhost-user socket\n"
>           "    -o vhost_user_socket=PATH  path for the vhost-user socket\n"
>           "    --fd=FDNUM                 fd number of vhost-user socket\n"
> -        "    -o auto_unmount            auto unmount on process termination\n");
> +        "    -o auto_unmount            auto unmount on process termination\n"
> +        "    --thread-pool-size=NUM     thread pool size limit (default %d)\n",
> +        THREAD_POOL_SIZE);
>   }
>   
>   void fuse_session_destroy(struct fuse_session *se)
> @@ -2598,6 +2602,7 @@ struct fuse_session *fuse_session_new(struct fuse_args *args,
>       }
>       se->fd = -1;
>       se->vu_listen_fd = -1;
> +    se->thread_pool_size = THREAD_POOL_SIZE;
>       se->conn.max_write = UINT_MAX;
>       se->conn.max_readahead = UINT_MAX;
>   
> diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> index b696ac3135..7bc6ff2f19 100644
> --- a/tools/virtiofsd/fuse_virtio.c
> +++ b/tools/virtiofsd/fuse_virtio.c
> @@ -570,10 +570,11 @@ static void *fv_queue_thread(void *opaque)
>       struct fv_QueueInfo *qi = opaque;
>       struct VuDev *dev = &qi->virtio_dev->dev;
>       struct VuVirtq *q = vu_get_queue(dev, qi->qidx);
> +    struct fuse_session *se = qi->virtio_dev->se;
>       GThreadPool *pool;
>   
> -    pool = g_thread_pool_new(fv_queue_worker, qi, 1 /* TODO max_threads */,
> -                             TRUE, NULL);
> +    pool = g_thread_pool_new(fv_queue_worker, qi, se->thread_pool_size, TRUE,
> +                             NULL);
>       if (!pool) {
>           fuse_log(FUSE_LOG_ERR, "%s: g_thread_pool_new failed\n", __func__);
>           return NULL;
> 

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
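
As a rough illustration of the option handling, a default that is set
at session creation and then overridden by --thread-pool-size=NUM could
look like the sketch below. It does not use the fuse_opt machinery from
the patch; demo_session and demo_parse_arg are hypothetical names, and
rejecting values below 1 is an assumption of this sketch, not something
the patch does.

```c
#include <stdio.h>
#include <string.h>

/* Default mirrors THREAD_POOL_SIZE in the patch. */
#define DEMO_THREAD_POOL_SIZE 64

/* Hypothetical stand-in for the relevant part of struct fuse_session. */
struct demo_session {
    int thread_pool_size;
};

static void demo_session_init(struct demo_session *se)
{
    se->thread_pool_size = DEMO_THREAD_POOL_SIZE;
}

/*
 * Returns 1 if arg was --thread-pool-size=NUM and was applied,
 * 0 if arg is some other option, -1 if the value is not usable.
 */
static int demo_parse_arg(struct demo_session *se, const char *arg)
{
    static const char prefix[] = "--thread-pool-size=";
    const size_t len = sizeof(prefix) - 1;
    int n;
    char trailing;

    if (strncmp(arg, prefix, len) != 0) {
        return 0;
    }
    /* Exactly one conversion means a number with no trailing junk. */
    if (sscanf(arg + len, "%d%c", &n, &trailing) != 1 || n < 1) {
        return -1;
    }
    se->thread_pool_size = n;
    return 1;
}
```

The parsed value would then be passed as the max_threads argument when
the per-queue pool is created, as the fuse_virtio.c hunk does.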




* Re: [PATCH 101/104] virtiofsd: prevent FUSE_INIT/FUSE_DESTROY races
  2019-12-12 16:39 ` [PATCH 101/104] virtiofsd: prevent FUSE_INIT/FUSE_DESTROY races Dr. David Alan Gilbert (git)
  2020-01-15 23:05   ` Masayoshi Mizuma
@ 2020-01-17 13:40   ` Philippe Mathieu-Daudé
  2020-01-17 15:28     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 307+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-01-17 13:40 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel, stefanha, vgoyal

On 12/12/19 5:39 PM, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> When running with multiple threads it can be tricky to handle
> FUSE_INIT/FUSE_DESTROY in parallel with other request types or in
> parallel with themselves.  Serialize FUSE_INIT and FUSE_DESTROY so that
> malicious clients cannot trigger race conditions.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   tools/virtiofsd/fuse_i.h        |  1 +
>   tools/virtiofsd/fuse_lowlevel.c | 18 ++++++++++++++++++
>   2 files changed, 19 insertions(+)
> 
> diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
> index d0679508cd..8a4a05b319 100644
> --- a/tools/virtiofsd/fuse_i.h
> +++ b/tools/virtiofsd/fuse_i.h
> @@ -61,6 +61,7 @@ struct fuse_session {
>       struct fuse_req list;
>       struct fuse_req interrupts;
>       pthread_mutex_t lock;
> +    pthread_rwlock_t init_rwlock;
>       int got_destroy;
>       int broken_splice_nonblock;
>       uint64_t notify_ctr;
> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> index 10f478b00c..9f01c05e3e 100644
> --- a/tools/virtiofsd/fuse_lowlevel.c
> +++ b/tools/virtiofsd/fuse_lowlevel.c
> @@ -2431,6 +2431,19 @@ void fuse_session_process_buf_int(struct fuse_session *se,
>       req->ctx.pid = in->pid;
>       req->ch = ch ? fuse_chan_get(ch) : NULL;
>   
> +    /*
> +     * INIT and DESTROY requests are serialized, all other request types
> +     * run in parallel.  This prevents races between FUSE_INIT and ordinary
> +     * requests, FUSE_INIT and FUSE_INIT, FUSE_INIT and FUSE_DESTROY, and

typo "FUSE_INIT and FUSE_INIT" -> "FUSE_INIT and CUSE_INIT"?

> +     * FUSE_DESTROY and FUSE_DESTROY.
> +     */
> +    if (in->opcode == FUSE_INIT || in->opcode == CUSE_INIT ||
> +        in->opcode == FUSE_DESTROY) {
> +        pthread_rwlock_wrlock(&se->init_rwlock);
> +    } else {
> +        pthread_rwlock_rdlock(&se->init_rwlock);
> +    }
> +
>       err = EIO;
>       if (!se->got_init) {
>           enum fuse_opcode expected;
> @@ -2488,10 +2501,13 @@ void fuse_session_process_buf_int(struct fuse_session *se,
>       } else {
>           fuse_ll_ops[in->opcode].func(req, in->nodeid, &iter);
>       }
> +
> +    pthread_rwlock_unlock(&se->init_rwlock);
>       return;
>   
>   reply_err:
>       fuse_reply_err(req, err);
> +    pthread_rwlock_unlock(&se->init_rwlock);
>   }
>   
>   #define LL_OPTION(n, o, v)                     \
> @@ -2538,6 +2554,7 @@ void fuse_session_destroy(struct fuse_session *se)
>               se->op.destroy(se->userdata);
>           }
>       }
> +    pthread_rwlock_destroy(&se->init_rwlock);
>       pthread_mutex_destroy(&se->lock);
>       free(se->cuse_data);
>       if (se->fd != -1) {
> @@ -2631,6 +2648,7 @@ struct fuse_session *fuse_session_new(struct fuse_args *args,
>       list_init_req(&se->list);
>       list_init_req(&se->interrupts);
>       fuse_mutex_init(&se->lock);
> +    pthread_rwlock_init(&se->init_rwlock, NULL);
>   
>       memcpy(&se->op, op, op_size);
>       se->owner = getuid();
> 




* Re: [PATCH 102/104] virtiofsd: fix lo_destroy() resource leaks
  2019-12-12 16:39 ` [PATCH 102/104] virtiofsd: fix lo_destroy() resource leaks Dr. David Alan Gilbert (git)
@ 2020-01-17 13:43   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 307+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-01-17 13:43 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel, stefanha, vgoyal

On 12/12/19 5:39 PM, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Now that lo_destroy() is serialized we can call unref_inode() so that
> all inode resources are freed.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   tools/virtiofsd/passthrough_ll.c | 41 ++++++++++++++++----------------
>   1 file changed, 20 insertions(+), 21 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 1bf251a91d..38f4948e61 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -1355,26 +1355,6 @@ static void unref_inode_lolocked(struct lo_data *lo, struct lo_inode *inode,
>       }
>   }
>   
> -static int unref_all_inodes_cb(gpointer key, gpointer value, gpointer user_data)
> -{
> -    struct lo_inode *inode = value;
> -    struct lo_data *lo = user_data;
> -
> -    inode->nlookup = 0;
> -    lo_map_remove(&lo->ino_map, inode->fuse_ino);
> -    close(inode->fd);
> -    lo_inode_put(lo, &inode); /* Drop our refcount from lo_do_lookup() */
> -
> -    return TRUE;
> -}
> -
> -static void unref_all_inodes(struct lo_data *lo)
> -{
> -    pthread_mutex_lock(&lo->mutex);
> -    g_hash_table_foreach_remove(lo->inodes, unref_all_inodes_cb, lo);
> -    pthread_mutex_unlock(&lo->mutex);
> -}
> -
>   static void lo_forget_one(fuse_req_t req, fuse_ino_t ino, uint64_t nlookup)
>   {
>       struct lo_data *lo = lo_data(req);
> @@ -2460,7 +2440,26 @@ static void lo_lseek(fuse_req_t req, fuse_ino_t ino, off_t off, int whence,
>   static void lo_destroy(void *userdata)
>   {
>       struct lo_data *lo = (struct lo_data *)userdata;
> -    unref_all_inodes(lo);
> +
> +    /*
> +     * Normally lo->mutex must be taken when traversing lo->inodes but
> +     * lo_destroy() is a serialized request so no races are possible here.
> +     *
> +     * In addition, we cannot acquire lo->mutex since unref_inode() takes it
> +     * too and this would result in a recursive lock.
> +     */
> +    while (true) {
> +        GHashTableIter iter;
> +        gpointer key, value;
> +
> +        g_hash_table_iter_init(&iter, lo->inodes);
> +        if (!g_hash_table_iter_next(&iter, &key, &value)) {
> +            break;
> +        }
> +
> +        struct lo_inode *inode = value;
> +        unref_inode_lolocked(lo, inode, inode->nlookup);
> +    }
>   }
>   
>   static struct fuse_lowlevel_ops lo_oper = {
> 

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
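
The loop shape in lo_destroy() (restart the iteration from the
beginning on every pass, because releasing an entry removes it from the
container) can be sketched without GLib as follows. The linked list and
the demo_* names are hypothetical; the real code iterates a GHashTable
and calls the unref helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container entry with a lookup count. */
struct demo_node {
    struct demo_node *next;
    unsigned nlookup;
};

struct demo_list {
    struct demo_node *head;
    unsigned released;          /* how many nodes have been unlinked */
};

/* Drop count lookups; unlink the node once none are left. */
static void demo_release(struct demo_list *l, struct demo_node *n,
                         unsigned count)
{
    assert(n->nlookup >= count);
    n->nlookup -= count;
    if (n->nlookup) {
        return;
    }
    for (struct demo_node **pp = &l->head; *pp; pp = &(*pp)->next) {
        if (*pp == n) {
            *pp = n->next;
            n->next = NULL;
            l->released++;
            return;
        }
    }
}

/*
 * Drain the container.  Each pass re-reads the head instead of keeping
 * a cursor, since demo_release() modifies the list under us.
 */
static void demo_drain(struct demo_list *l)
{
    while (l->head) {
        struct demo_node *n = l->head;
        demo_release(l, n, n->nlookup);
    }
}
```

Dropping the full remaining count in one call guarantees that each pass
removes one entry, so the loop terminates.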




* Re: [PATCH 089/104] virtiofsd: prevent races with lo_dirp_put()
  2019-12-12 16:38 ` [PATCH 089/104] virtiofsd: prevent races with lo_dirp_put() Dr. David Alan Gilbert (git)
@ 2020-01-17 13:52   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 307+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-01-17 13:52 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel, stefanha, vgoyal

On 12/12/19 5:38 PM, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Introduce lo_dirp_put() so that FUSE_RELEASEDIR does not cause
> use-after-free races with other threads that are accessing lo_dirp.
> 
> Also make lo_releasedir() atomic to prevent FUSE_RELEASEDIR racing with
> itself.  This prevents double-frees.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   tools/virtiofsd/passthrough_ll.c | 41 +++++++++++++++++++++++++++-----
>   1 file changed, 35 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index eadd568435..7663e574d8 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -1317,11 +1317,28 @@ static void lo_readlink(fuse_req_t req, fuse_ino_t ino)
>   }
>   
>   struct lo_dirp {
> +    gint refcount;
>       DIR *dp;
>       struct dirent *entry;
>       off_t offset;
>   };
>   
> +static void lo_dirp_put(struct lo_dirp **dp)
> +{
> +    struct lo_dirp *d = *dp;
> +
> +    if (!d) {
> +        return;
> +    }
> +    *dp = NULL;
> +
> +    if (g_atomic_int_dec_and_test(&d->refcount)) {
> +        closedir(d->dp);
> +        free(d);
> +    }
> +}
> +
> +/* Call lo_dirp_put() on the return value when no longer needed */
>   static struct lo_dirp *lo_dirp(fuse_req_t req, struct fuse_file_info *fi)
>   {
>       struct lo_data *lo = lo_data(req);
> @@ -1329,6 +1346,9 @@ static struct lo_dirp *lo_dirp(fuse_req_t req, struct fuse_file_info *fi)
>   
>       pthread_mutex_lock(&lo->mutex);
>       elem = lo_map_get(&lo->dirp_map, fi->fh);
> +    if (elem) {
> +        g_atomic_int_inc(&elem->dirp->refcount);
> +    }
>       pthread_mutex_unlock(&lo->mutex);
>       if (!elem) {
>           return NULL;
> @@ -1364,6 +1384,7 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
>       d->offset = 0;
>       d->entry = NULL;
>   
> +    g_atomic_int_set(&d->refcount, 1); /* paired with lo_releasedir() */
>       pthread_mutex_lock(&lo->mutex);
>       fh = lo_add_dirp_mapping(req, d);
>       pthread_mutex_unlock(&lo->mutex);
> @@ -1397,7 +1418,7 @@ static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
>                             off_t offset, struct fuse_file_info *fi, int plus)
>   {
>       struct lo_data *lo = lo_data(req);
> -    struct lo_dirp *d;
> +    struct lo_dirp *d = NULL;
>       struct lo_inode *dinode;
>       char *buf = NULL;
>       char *p;
> @@ -1487,6 +1508,8 @@ static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
>   
>       err = 0;
>   error:
> +    lo_dirp_put(&d);
> +
>       /*
>        * If there's an error, we can only signal it if we haven't stored
>        * any entries yet - otherwise we'd end up with wrong lookup
> @@ -1517,22 +1540,25 @@ static void lo_releasedir(fuse_req_t req, fuse_ino_t ino,
>                             struct fuse_file_info *fi)
>   {
>       struct lo_data *lo = lo_data(req);
> +    struct lo_map_elem *elem;
>       struct lo_dirp *d;
>   
>       (void)ino;
>   
> -    d = lo_dirp(req, fi);
> -    if (!d) {
> +    pthread_mutex_lock(&lo->mutex);
> +    elem = lo_map_get(&lo->dirp_map, fi->fh);
> +    if (!elem) {
> +        pthread_mutex_unlock(&lo->mutex);
>           fuse_reply_err(req, EBADF);
>           return;
>       }
>   
> -    pthread_mutex_lock(&lo->mutex);
> +    d = elem->dirp;
>       lo_map_remove(&lo->dirp_map, fi->fh);
>       pthread_mutex_unlock(&lo->mutex);
>   
> -    closedir(d->dp);
> -    free(d);
> +    lo_dirp_put(&d); /* paired with lo_opendir() */
> +
>       fuse_reply_err(req, 0);
>   }
>   
> @@ -1743,6 +1769,9 @@ static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,
>       } else {
>           res = fsync(fd);
>       }
> +
> +    lo_dirp_put(&d);
> +
>       fuse_reply_err(req, res == -1 ? errno : 0);
>   }
>   
> 

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
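
The get/put discipline introduced here can be sketched with C11 atomics
instead of the GLib g_atomic_int_* calls used in the patch. All names
below are hypothetical, and closed_flag merely stands in for the
closedir() call so the release can be observed.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct lo_dirp. */
struct demo_handle {
    atomic_int refcount;
    int *closed_flag;           /* set to 1 when the resource is released */
};

/* The creator holds the initial reference (the "opendir" side). */
static struct demo_handle *demo_handle_new(int *closed_flag)
{
    struct demo_handle *h = malloc(sizeof(*h));

    if (!h) {
        return NULL;
    }
    atomic_init(&h->refcount, 1);
    h->closed_flag = closed_flag;
    return h;
}

/* Take an extra reference for the duration of one request. */
static struct demo_handle *demo_handle_get(struct demo_handle *h)
{
    atomic_fetch_add(&h->refcount, 1);
    return h;
}

/* Drop a reference and clear the caller's pointer; the last put frees. */
static void demo_handle_put(struct demo_handle **hp)
{
    struct demo_handle *h = *hp;

    if (!h) {
        return;
    }
    *hp = NULL;

    /* fetch_sub returns the old value, so 1 means we held the last ref. */
    if (atomic_fetch_sub(&h->refcount, 1) == 1) {
        *h->closed_flag = 1;
        free(h);
    }
}
```

In the patch the extra reference is taken while lo->mutex is held, so
the map lookup and the increment cannot interleave with the removal
done by lo_releasedir(); the sketch leaves that lock out.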




* Re: [PATCH 090/104] virtiofsd: rename inode->refcount to inode->nlookup
  2019-12-12 16:38 ` [PATCH 090/104] virtiofsd: rename inode->refcount to inode->nlookup Dr. David Alan Gilbert (git)
@ 2020-01-17 13:54   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 307+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-01-17 13:54 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel, stefanha, vgoyal

On 12/12/19 5:38 PM, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> This reference counter plays a specific role in the FUSE protocol.  It's
> not a generic object reference counter and the FUSE kernel code calls it
> "nlookup".
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   tools/virtiofsd/passthrough_ll.c | 37 +++++++++++++++++++++-----------
>   1 file changed, 25 insertions(+), 12 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 7663e574d8..b19c9ee328 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -101,7 +101,20 @@ struct lo_inode {
>       int fd;
>       bool is_symlink;
>       struct lo_key key;
> -    uint64_t refcount; /* protected by lo->mutex */
> +
> +    /*
> +     * This counter keeps the inode alive during the FUSE session.
> +     * Incremented when the FUSE inode number is sent in a reply
> +     * (FUSE_LOOKUP, FUSE_READDIRPLUS, etc).  Decremented when an inode is
> +     * released by requests like FUSE_FORGET, FUSE_RMDIR, FUSE_RENAME, etc.
> +     *
> +     * Note that this value is untrusted because the client can manipulate
> +     * it arbitrarily using FUSE_FORGET requests.
> +     *
> +     * Protected by lo->mutex.
> +     */
> +    uint64_t nlookup;
> +
>       fuse_ino_t fuse_ino;
>       pthread_mutex_t plock_mutex;
>       GHashTable *posix_locks; /* protected by lo_inode->plock_mutex */
> @@ -570,7 +583,7 @@ retry:
>       if (last == path) {
>           p = &lo->root;
>           pthread_mutex_lock(&lo->mutex);
> -        p->refcount++;
> +        p->nlookup++;
>           pthread_mutex_unlock(&lo->mutex);
>       } else {
>           *last = '\0';
> @@ -788,8 +801,8 @@ static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st)
>       pthread_mutex_lock(&lo->mutex);
>       p = g_hash_table_lookup(lo->inodes, &key);
>       if (p) {
> -        assert(p->refcount > 0);
> -        p->refcount++;
> +        assert(p->nlookup > 0);
> +        p->nlookup++;
>       }
>       pthread_mutex_unlock(&lo->mutex);
>   
> @@ -857,7 +870,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>           }
>   
>           inode->is_symlink = S_ISLNK(e->attr.st_mode);
> -        inode->refcount = 1;
> +        inode->nlookup = 1;
>           inode->fd = newfd;
>           newfd = -1;
>           inode->key.ino = e->attr.st_ino;
> @@ -1097,7 +1110,7 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
>       }
>   
>       pthread_mutex_lock(&lo->mutex);
> -    inode->refcount++;
> +    inode->nlookup++;
>       pthread_mutex_unlock(&lo->mutex);
>       e.ino = inode->fuse_ino;
>   
> @@ -1226,9 +1239,9 @@ static void unref_inode_lolocked(struct lo_data *lo, struct lo_inode *inode,
>       }
>   
>       pthread_mutex_lock(&lo->mutex);
> -    assert(inode->refcount >= n);
> -    inode->refcount -= n;
> -    if (!inode->refcount) {
> +    assert(inode->nlookup >= n);
> +    inode->nlookup -= n;
> +    if (!inode->nlookup) {
>           lo_map_remove(&lo->ino_map, inode->fuse_ino);
>           g_hash_table_remove(lo->inodes, &inode->key);
>           if (g_hash_table_size(inode->posix_locks)) {
> @@ -1249,7 +1262,7 @@ static int unref_all_inodes_cb(gpointer key, gpointer value, gpointer user_data)
>       struct lo_inode *inode = value;
>       struct lo_data *lo = user_data;
>   
> -    inode->refcount = 0;
> +    inode->nlookup = 0;
>       lo_map_remove(&lo->ino_map, inode->fuse_ino);
>       close(inode->fd);
>   
> @@ -1274,7 +1287,7 @@ static void lo_forget_one(fuse_req_t req, fuse_ino_t ino, uint64_t nlookup)
>       }
>   
>       fuse_log(FUSE_LOG_DEBUG, "  forget %lli %lli -%lli\n",
> -             (unsigned long long)ino, (unsigned long long)inode->refcount,
> +             (unsigned long long)ino, (unsigned long long)inode->nlookup,
>                (unsigned long long)nlookup);
>   
>       unref_inode_lolocked(lo, inode, nlookup);
> @@ -2642,7 +2655,7 @@ static void setup_root(struct lo_data *lo, struct lo_inode *root)
>       root->fd = fd;
>       root->key.ino = stat.st_ino;
>       root->key.dev = stat.st_dev;
> -    root->refcount = 2;
> +    root->nlookup = 2;
>   }
>   
>   static guint lo_key_hash(gconstpointer key)
> 

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
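
For reference, the counting rule described in the new comment can be sketched as below. This is a minimal sketch only: the demo_* names are hypothetical, the locking (lo->mutex in the patch) is left out, and clamping an oversized forget is an assumption of this sketch, whereas the quoted patch asserts instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cut-down inode holding only the lookup counter. */
struct demo_inode {
    uint64_t nlookup;
};

/* Called whenever the inode number is sent to the client in a reply. */
static void demo_lookup_reply(struct demo_inode *inode)
{
    assert(inode);
    inode->nlookup++;
}

/*
 * Apply a forget of n lookups.  n comes from the client, so it is
 * clamped here rather than trusted (an assumption of this sketch).
 * Returns true when the count reaches zero and the inode could be
 * released.
 */
static bool demo_forget(struct demo_inode *inode, uint64_t n)
{
    assert(inode);
    if (n > inode->nlookup) {
        n = inode->nlookup;
    }
    inode->nlookup -= n;
    return inode->nlookup == 0;
}
```

Two lookup replies followed by a forget of 1 leave the inode alive; a later forget with a larger count than remains simply drops it to zero.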




* Re: [PATCH 091/104] libvhost-user: Fix some memtable remap cases
  2019-12-12 16:38 ` [PATCH 091/104] libvhost-user: Fix some memtable remap cases Dr. David Alan Gilbert (git)
@ 2020-01-17 13:58   ` Marc-André Lureau
  2020-01-20 15:50     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Marc-André Lureau @ 2020-01-17 13:58 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: QEMU, Stefan Hajnoczi, vgoyal

Hi

On Thu, Dec 12, 2019 at 10:05 PM Dr. David Alan Gilbert (git)
<dgilbert@redhat.com> wrote:
>
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> If a new setmemtable command comes in once the vhost threads are
> running, it will remap the guests address space and the threads
> will now be looking in the wrong place.
>
> Fortunately we're running this command under lock, so we can
> update the queue mappings so that threads will look in the new-right
> place.
>
> Note: This doesn't fix things that the threads might be doing
> without a lock (e.g. a readv/writev!)  That's for another time.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  contrib/libvhost-user/libvhost-user.c | 33 ++++++++++++++++++++-------
>  contrib/libvhost-user/libvhost-user.h |  3 +++
>  2 files changed, 28 insertions(+), 8 deletions(-)
>
> diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
> index 63e41062a4..b89bf18501 100644
> --- a/contrib/libvhost-user/libvhost-user.c
> +++ b/contrib/libvhost-user/libvhost-user.c
> @@ -564,6 +564,21 @@ vu_reset_device_exec(VuDev *dev, VhostUserMsg *vmsg)
>      return false;
>  }
>
> +static bool
> +map_ring(VuDev *dev, VuVirtq *vq)
> +{
> +    vq->vring.desc = qva_to_va(dev, vq->vra.desc_user_addr);
> +    vq->vring.used = qva_to_va(dev, vq->vra.used_user_addr);
> +    vq->vring.avail = qva_to_va(dev, vq->vra.avail_user_addr);
> +
> +    DPRINT("Setting virtq addresses:\n");
> +    DPRINT("    vring_desc  at %p\n", vq->vring.desc);
> +    DPRINT("    vring_used  at %p\n", vq->vring.used);
> +    DPRINT("    vring_avail at %p\n", vq->vring.avail);
> +
> +    return !(vq->vring.desc && vq->vring.used && vq->vring.avail);
> +}
> +
>  static bool
>  vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
>  {
> @@ -767,6 +782,14 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
>          close(vmsg->fds[i]);
>      }
>
> +    for (i = 0; i < dev->max_queues; i++) {
> +        if (dev->vq[i].vring.desc) {

The code usually checks for all (vq->vring.desc && vq->vring.used &&
vq->vring.avail).

Perhaps we should introduce a VQ_RING_IS_SET() macro?
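
A minimal sketch of what such a helper might look like. Only the macro name comes from the suggestion above; the demo_* structs are cut-down stand-ins for illustration, not the real libvhost-user types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real ring/virtq structs. */
struct demo_ring {
    void *desc;
    void *used;
    void *avail;
};

struct demo_virtq {
    struct demo_ring vring;
};

/* True only when all three ring pointers have been mapped. */
#define VQ_RING_IS_SET(vq) \
    ((vq)->vring.desc && (vq)->vring.used && (vq)->vring.avail)

/* Count the queues whose rings are fully mapped. */
static size_t count_mapped_queues(const struct demo_virtq *vqs, size_t n)
{
    size_t mapped = 0;

    assert(vqs || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (VQ_RING_IS_SET(&vqs[i])) {
            mapped++;
        }
    }
    return mapped;
}
```

Note that using this in the remap loop would skip a queue with only some of its pointers set, which is a slightly different condition from the desc-only check in the hunk below.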

> +            if (map_ring(dev, &dev->vq[i])) {
> +                vu_panic(dev, "remaping queue %d during setmemtable", i);
> +            }
> +        }
> +    }
> +
>      return false;
>  }
>
> @@ -853,18 +876,12 @@ vu_set_vring_addr_exec(VuDev *dev, VhostUserMsg *vmsg)
>      DPRINT("    avail_user_addr:  0x%016" PRIx64 "\n", vra->avail_user_addr);
>      DPRINT("    log_guest_addr:   0x%016" PRIx64 "\n", vra->log_guest_addr);
>
> +    vq->vra = *vra;
>      vq->vring.flags = vra->flags;
> -    vq->vring.desc = qva_to_va(dev, vra->desc_user_addr);
> -    vq->vring.used = qva_to_va(dev, vra->used_user_addr);
> -    vq->vring.avail = qva_to_va(dev, vra->avail_user_addr);
>      vq->vring.log_guest_addr = vra->log_guest_addr;
>
> -    DPRINT("Setting virtq addresses:\n");
> -    DPRINT("    vring_desc  at %p\n", vq->vring.desc);
> -    DPRINT("    vring_used  at %p\n", vq->vring.used);
> -    DPRINT("    vring_avail at %p\n", vq->vring.avail);
>
> -    if (!(vq->vring.desc && vq->vring.used && vq->vring.avail)) {
> +    if (map_ring(dev, vq)) {
>          vu_panic(dev, "Invalid vring_addr message");
>          return false;
>      }
> diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
> index 1844b6f8d4..5cb7708559 100644
> --- a/contrib/libvhost-user/libvhost-user.h
> +++ b/contrib/libvhost-user/libvhost-user.h
> @@ -327,6 +327,9 @@ typedef struct VuVirtq {
>      int err_fd;
>      unsigned int enable;
>      bool started;
> +
> +    /* Guest addresses of our ring */
> +    struct vhost_vring_addr vra;
>  } VuVirtq;
>
>  enum VuWatchCondtion {
> --
> 2.23.0
>
>

Looks reasonable otherwise (assuming all running threads were flushed
somehow by other means)

-- 
Marc-André Lureau



* Re: [PATCH 035/104] virtiofsd: passthrough_ll: add dirp_map to hide lo_dirp pointers
  2019-12-12 16:37 ` [PATCH 035/104] virtiofsd: passthrough_ll: add dirp_map to hide lo_dirp pointers Dr. David Alan Gilbert (git)
@ 2020-01-17 13:58   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 307+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-01-17 13:58 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel, stefanha, vgoyal

On 12/12/19 5:37 PM, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Do not expose lo_dirp pointers to clients.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   tools/virtiofsd/passthrough_ll.c | 103 +++++++++++++++++++++++--------
>   1 file changed, 76 insertions(+), 27 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index fd1d88bddf..face8910b0 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -56,27 +56,10 @@
>   #include "passthrough_helpers.h"
>   
>   #define HAVE_POSIX_FALLOCATE 1
> -/*
> - * We are re-using pointers to our `struct lo_inode`
> - * elements as inodes. This means that we must be able to
> - * store uintptr_t values in a fuse_ino_t variable. The following
> - * incantation checks this condition at compile time.
> - */
> -#if defined(__GNUC__) &&                                      \
> -    (__GNUC__ > 4 || __GNUC__ == 4 && __GNUC_MINOR__ >= 6) && \
> -    !defined __cplusplus
> -_Static_assert(sizeof(fuse_ino_t) >= sizeof(uintptr_t),
> -               "fuse_ino_t too small to hold uintptr_t values!");
> -#else
> -struct _uintptr_to_must_hold_fuse_ino_t_dummy_struct {
> -    unsigned _uintptr_to_must_hold_fuse_ino_t
> -        : ((sizeof(fuse_ino_t) >= sizeof(uintptr_t)) ? 1 : -1);
> -};
> -#endif
> -
>   struct lo_map_elem {
>       union {
>           struct lo_inode *inode;
> +        struct lo_dirp *dirp;
>           ssize_t freelist;
>       };
>       bool in_use;
> @@ -123,6 +106,7 @@ struct lo_data {
>       int timeout_set;
>       struct lo_inode root; /* protected by lo->mutex */
>       struct lo_map ino_map; /* protected by lo->mutex */
> +    struct lo_map dirp_map; /* protected by lo->mutex */
>   };
>   
>   static const struct fuse_opt lo_opts[] = {
> @@ -252,6 +236,20 @@ static void lo_map_remove(struct lo_map *map, size_t key)
>       map->freelist = key;
>   }
>   
> +/* Assumes lo->mutex is held */
> +static ssize_t lo_add_dirp_mapping(fuse_req_t req, struct lo_dirp *dirp)
> +{
> +    struct lo_map_elem *elem;
> +
> +    elem = lo_map_alloc_elem(&lo_data(req)->dirp_map);
> +    if (!elem) {
> +        return -1;
> +    }
> +
> +    elem->dirp = dirp;
> +    return elem - lo_data(req)->dirp_map.elems;
> +}
> +
>   /* Assumes lo->mutex is held */
>   static ssize_t lo_add_inode_mapping(fuse_req_t req, struct lo_inode *inode)
>   {
> @@ -844,9 +842,19 @@ struct lo_dirp {
>       off_t offset;
>   };
>   
> -static struct lo_dirp *lo_dirp(struct fuse_file_info *fi)
> +static struct lo_dirp *lo_dirp(fuse_req_t req, struct fuse_file_info *fi)
>   {
> -    return (struct lo_dirp *)(uintptr_t)fi->fh;
> +    struct lo_data *lo = lo_data(req);
> +    struct lo_map_elem *elem;
> +
> +    pthread_mutex_lock(&lo->mutex);
> +    elem = lo_map_get(&lo->dirp_map, fi->fh);
> +    pthread_mutex_unlock(&lo->mutex);
> +    if (!elem) {
> +        return NULL;
> +    }
> +
> +    return elem->dirp;
>   }
>   
>   static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
> @@ -856,6 +864,7 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
>       struct lo_data *lo = lo_data(req);
>       struct lo_dirp *d;
>       int fd;
> +    ssize_t fh;
>   
>       d = calloc(1, sizeof(struct lo_dirp));
>       if (d == NULL) {
> @@ -875,7 +884,14 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
>       d->offset = 0;
>       d->entry = NULL;
>   
> -    fi->fh = (uintptr_t)d;
> +    pthread_mutex_lock(&lo->mutex);
> +    fh = lo_add_dirp_mapping(req, d);
> +    pthread_mutex_unlock(&lo->mutex);
> +    if (fh == -1) {
> +        goto out_err;
> +    }
> +
> +    fi->fh = fh;
>       if (lo->cache == CACHE_ALWAYS) {
>           fi->keep_cache = 1;
>       }
> @@ -886,6 +902,9 @@ out_errno:
>       error = errno;
>   out_err:
>       if (d) {
> +        if (d->dp) {
> +            closedir(d->dp);
> +        }
>           if (fd != -1) {
>               close(fd);
>           }
> @@ -903,17 +922,21 @@ static int is_dot_or_dotdot(const char *name)
>   static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
>                             off_t offset, struct fuse_file_info *fi, int plus)
>   {
> -    struct lo_dirp *d = lo_dirp(fi);
> -    char *buf;
> +    struct lo_dirp *d;
> +    char *buf = NULL;
>       char *p;
>       size_t rem = size;
> -    int err;
> +    int err = ENOMEM;
>   
>       (void)ino;
>   
> +    d = lo_dirp(req, fi);
> +    if (!d) {
> +        goto error;
> +    }
> +
>       buf = calloc(1, size);
>       if (!buf) {
> -        err = ENOMEM;
>           goto error;
>       }
>       p = buf;
> @@ -1011,8 +1034,21 @@ static void lo_readdirplus(fuse_req_t req, fuse_ino_t ino, size_t size,
>   static void lo_releasedir(fuse_req_t req, fuse_ino_t ino,
>                             struct fuse_file_info *fi)
>   {
> -    struct lo_dirp *d = lo_dirp(fi);
> +    struct lo_data *lo = lo_data(req);
> +    struct lo_dirp *d;
> +
>       (void)ino;
> +
> +    d = lo_dirp(req, fi);
> +    if (!d) {
> +        fuse_reply_err(req, EBADF);
> +        return;
> +    }
> +
> +    pthread_mutex_lock(&lo->mutex);
> +    lo_map_remove(&lo->dirp_map, fi->fh);
> +    pthread_mutex_unlock(&lo->mutex);
> +
>       closedir(d->dp);
>       free(d);
>       fuse_reply_err(req, 0);
> @@ -1064,8 +1100,18 @@ static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,
>                           struct fuse_file_info *fi)
>   {
>       int res;
> -    int fd = dirfd(lo_dirp(fi)->dp);
> +    struct lo_dirp *d;
> +    int fd;
> +
>       (void)ino;
> +
> +    d = lo_dirp(req, fi);
> +    if (!d) {
> +        fuse_reply_err(req, EBADF);
> +        return;
> +    }
> +
> +    fd = dirfd(d->dp);
>       if (datasync) {
>           res = fdatasync(fd);
>       } else {
> @@ -1597,6 +1643,8 @@ int main(int argc, char *argv[])
>       root_elem = lo_map_reserve(&lo.ino_map, lo.root.fuse_ino);
>       root_elem->inode = &lo.root;
>   
> +    lo_map_init(&lo.dirp_map);
> +
>       if (fuse_parse_cmdline(&args, &opts) != 0) {
>           return 1;
>       }
> @@ -1693,6 +1741,7 @@ err_out2:
>   err_out1:
>       fuse_opt_free_args(&args);
>   
> +    lo_map_destroy(&lo.dirp_map);
>       lo_map_destroy(&lo.ino_map);
>   
>       if (lo.root.fd >= 0) {
> 

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
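
For reference, the idea of handing the client a table index instead of a raw pointer can be sketched as below. This is a simplified illustration with hypothetical demo_* names and a fixed-size table; the patch itself uses the growable lo_map with a freelist, and takes lo->mutex around each access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

#define DEMO_MAP_SIZE 8

struct demo_map {
    void *slots[DEMO_MAP_SIZE];   /* NULL means the slot is free */
};

/* Store ptr and return its handle, or -1 if the table is full. */
static ssize_t demo_map_add(struct demo_map *map, void *ptr)
{
    assert(map && ptr);
    for (size_t i = 0; i < DEMO_MAP_SIZE; i++) {
        if (!map->slots[i]) {
            map->slots[i] = ptr;
            return (ssize_t)i;
        }
    }
    return -1;
}

/* Resolve a client-supplied handle; anything invalid yields NULL. */
static void *demo_map_get(const struct demo_map *map, uint64_t handle)
{
    assert(map);
    if (handle >= DEMO_MAP_SIZE) {
        return NULL;
    }
    return map->slots[handle];
}

/* Free a slot; out-of-range handles are ignored. */
static void demo_map_remove(struct demo_map *map, uint64_t handle)
{
    assert(map);
    if (handle < DEMO_MAP_SIZE) {
        map->slots[handle] = NULL;
    }
}
```

The point of the indirection is that a bogus handle from the client resolves to NULL and can be answered with an error, rather than being dereferenced as a pointer.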




* Re: [PATCH 101/104] virtiofsd: prevent FUSE_INIT/FUSE_DESTROY races
  2020-01-17 13:40   ` Philippe Mathieu-Daudé
@ 2020-01-17 15:28     ` Dr. David Alan Gilbert
  2020-01-17 15:30       ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-17 15:28 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé; +Cc: qemu-devel, stefanha, vgoyal

* Philippe Mathieu-Daudé (philmd@redhat.com) wrote:
> On 12/12/19 5:39 PM, Dr. David Alan Gilbert (git) wrote:
> > From: Stefan Hajnoczi <stefanha@redhat.com>
> > 
> > When running with multiple threads it can be tricky to handle
> > FUSE_INIT/FUSE_DESTROY in parallel with other request types or in
> > parallel with themselves.  Serialize FUSE_INIT and FUSE_DESTROY so that
> > malicious clients cannot trigger race conditions.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >   tools/virtiofsd/fuse_i.h        |  1 +
> >   tools/virtiofsd/fuse_lowlevel.c | 18 ++++++++++++++++++
> >   2 files changed, 19 insertions(+)
> > 
> > diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
> > index d0679508cd..8a4a05b319 100644
> > --- a/tools/virtiofsd/fuse_i.h
> > +++ b/tools/virtiofsd/fuse_i.h
> > @@ -61,6 +61,7 @@ struct fuse_session {
> >       struct fuse_req list;
> >       struct fuse_req interrupts;
> >       pthread_mutex_t lock;
> > +    pthread_rwlock_t init_rwlock;
> >       int got_destroy;
> >       int broken_splice_nonblock;
> >       uint64_t notify_ctr;
> > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > index 10f478b00c..9f01c05e3e 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.c
> > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > @@ -2431,6 +2431,19 @@ void fuse_session_process_buf_int(struct fuse_session *se,
> >       req->ctx.pid = in->pid;
> >       req->ch = ch ? fuse_chan_get(ch) : NULL;
> > +    /*
> > +     * INIT and DESTROY requests are serialized, all other request types
> > +     * run in parallel.  This prevents races between FUSE_INIT and ordinary
> > +     * requests, FUSE_INIT and FUSE_INIT, FUSE_INIT and FUSE_DESTROY, and
> 
> typo "FUSE_INIT and FUSE_INIT" -> "FUSE_INIT and CUSE_INIT"?

No, I don't think so; I think it's suggesting a race between two
FUSE_INITs.

Dave

> > +     * FUSE_DESTROY and FUSE_DESTROY.
> > +     */
> > +    if (in->opcode == FUSE_INIT || in->opcode == CUSE_INIT ||
> > +        in->opcode == FUSE_DESTROY) {
> > +        pthread_rwlock_wrlock(&se->init_rwlock);
> > +    } else {
> > +        pthread_rwlock_rdlock(&se->init_rwlock);
> > +    }
> > +
> >       err = EIO;
> >       if (!se->got_init) {
> >           enum fuse_opcode expected;
> > @@ -2488,10 +2501,13 @@ void fuse_session_process_buf_int(struct fuse_session *se,
> >       } else {
> >           fuse_ll_ops[in->opcode].func(req, in->nodeid, &iter);
> >       }
> > +
> > +    pthread_rwlock_unlock(&se->init_rwlock);
> >       return;
> >   reply_err:
> >       fuse_reply_err(req, err);
> > +    pthread_rwlock_unlock(&se->init_rwlock);
> >   }
> >   #define LL_OPTION(n, o, v)                     \
> > @@ -2538,6 +2554,7 @@ void fuse_session_destroy(struct fuse_session *se)
> >               se->op.destroy(se->userdata);
> >           }
> >       }
> > +    pthread_rwlock_destroy(&se->init_rwlock);
> >       pthread_mutex_destroy(&se->lock);
> >       free(se->cuse_data);
> >       if (se->fd != -1) {
> > @@ -2631,6 +2648,7 @@ struct fuse_session *fuse_session_new(struct fuse_args *args,
> >       list_init_req(&se->list);
> >       list_init_req(&se->interrupts);
> >       fuse_mutex_init(&se->lock);
> > +    pthread_rwlock_init(&se->init_rwlock, NULL);
> >       memcpy(&se->op, op, op_size);
> >       se->owner = getuid();
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 101/104] virtiofsd: prevent FUSE_INIT/FUSE_DESTROY races
  2020-01-17 15:28     ` Dr. David Alan Gilbert
@ 2020-01-17 15:30       ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 307+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-01-17 15:30 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: qemu-devel, stefanha, vgoyal

On 1/17/20 4:28 PM, Dr. David Alan Gilbert wrote:
> * Philippe Mathieu-Daudé (philmd@redhat.com) wrote:
>> On 12/12/19 5:39 PM, Dr. David Alan Gilbert (git) wrote:
>>> From: Stefan Hajnoczi <stefanha@redhat.com>
>>>
>>> When running with multiple threads it can be tricky to handle
>>> FUSE_INIT/FUSE_DESTROY in parallel with other request types or in
>>> parallel with themselves.  Serialize FUSE_INIT and FUSE_DESTROY so that
>>> malicious clients cannot trigger race conditions.
>>>
>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>> ---
>>>    tools/virtiofsd/fuse_i.h        |  1 +
>>>    tools/virtiofsd/fuse_lowlevel.c | 18 ++++++++++++++++++
>>>    2 files changed, 19 insertions(+)
>>>
>>> diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
>>> index d0679508cd..8a4a05b319 100644
>>> --- a/tools/virtiofsd/fuse_i.h
>>> +++ b/tools/virtiofsd/fuse_i.h
>>> @@ -61,6 +61,7 @@ struct fuse_session {
>>>        struct fuse_req list;
>>>        struct fuse_req interrupts;
>>>        pthread_mutex_t lock;
>>> +    pthread_rwlock_t init_rwlock;
>>>        int got_destroy;
>>>        int broken_splice_nonblock;
>>>        uint64_t notify_ctr;
>>> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
>>> index 10f478b00c..9f01c05e3e 100644
>>> --- a/tools/virtiofsd/fuse_lowlevel.c
>>> +++ b/tools/virtiofsd/fuse_lowlevel.c
>>> @@ -2431,6 +2431,19 @@ void fuse_session_process_buf_int(struct fuse_session *se,
>>>        req->ctx.pid = in->pid;
>>>        req->ch = ch ? fuse_chan_get(ch) : NULL;
>>> +    /*
>>> +     * INIT and DESTROY requests are serialized, all other request types
>>> +     * run in parallel.  This prevents races between FUSE_INIT and ordinary
>>> +     * requests, FUSE_INIT and FUSE_INIT, FUSE_INIT and FUSE_DESTROY, and
>>
>> typo "FUSE_INIT and FUSE_INIT" -> "FUSE_INIT and CUSE_INIT"?
> 
> No, I don't think so; I think it's suggesting a race between two
> FUSE_INITs.

And CUSE_INIT is a subtype of FUSE_INIT, OK.

> 
> Dave
> 
>>> +     * FUSE_DESTROY and FUSE_DESTROY.
>>> +     */
>>> +    if (in->opcode == FUSE_INIT || in->opcode == CUSE_INIT ||
>>> +        in->opcode == FUSE_DESTROY) {
>>> +        pthread_rwlock_wrlock(&se->init_rwlock);
>>> +    } else {
>>> +        pthread_rwlock_rdlock(&se->init_rwlock);
>>> +    }
>>> +
>>>        err = EIO;
>>>        if (!se->got_init) {
>>>            enum fuse_opcode expected;
>>> @@ -2488,10 +2501,13 @@ void fuse_session_process_buf_int(struct fuse_session *se,
>>>        } else {
>>>            fuse_ll_ops[in->opcode].func(req, in->nodeid, &iter);
>>>        }
>>> +
>>> +    pthread_rwlock_unlock(&se->init_rwlock);
>>>        return;
>>>    reply_err:
>>>        fuse_reply_err(req, err);
>>> +    pthread_rwlock_unlock(&se->init_rwlock);
>>>    }
>>>    #define LL_OPTION(n, o, v)                     \
>>> @@ -2538,6 +2554,7 @@ void fuse_session_destroy(struct fuse_session *se)
>>>                se->op.destroy(se->userdata);
>>>            }
>>>        }
>>> +    pthread_rwlock_destroy(&se->init_rwlock);
>>>        pthread_mutex_destroy(&se->lock);
>>>        free(se->cuse_data);
>>>        if (se->fd != -1) {
>>> @@ -2631,6 +2648,7 @@ struct fuse_session *fuse_session_new(struct fuse_args *args,
>>>        list_init_req(&se->list);
>>>        list_init_req(&se->interrupts);
>>>        fuse_mutex_init(&se->lock);
>>> +    pthread_rwlock_init(&se->init_rwlock, NULL);
>>>        memcpy(&se->op, op, op_size);
>>>        se->owner = getuid();
>>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 




* Re: [PATCH 072/104] virtiofsd: passthrough_ll: fix refcounting on remove/rename
  2020-01-17 10:19       ` Miklos Szeredi
  2020-01-17 11:37         ` Dr. David Alan Gilbert
@ 2020-01-17 18:43         ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-17 18:43 UTC (permalink / raw)
  To: Miklos Szeredi, Stefan Hajnoczi; +Cc: Misono Tomohiro, Vivek Goyal, qemu-devel

* Miklos Szeredi (mszeredi@redhat.com) wrote:
> On Thu, Jan 16, 2020 at 5:45 PM Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
> >
> > * Misono Tomohiro (misono.tomohiro@jp.fujitsu.com) wrote:
> > > > From: Miklos Szeredi <mszeredi@redhat.com>
> > > >
> > > > Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> > >
> > > I'm not familiar with qemu convention but shouldn't we put
> > > at least one line of description like linux kernel?
> >
> > Miklos: would you like to suggest a better commit message?
> 
> Hmm, the patch doesn't really make sense, since the looked up inode is not used.
> 
> Not sure what happened here, this seems to be for supporting shared
> versions, and these changes are part of commit 06f78a397f00
> ("virtiofsd: add initial support for shared versions") in our gitlab
> qemu tree.  Was this intentionally split out?

I think the reason I kept this here is because Stefan's
  'introduce inode refcount to prevent use-after-free'

does use the inodes in lo_rename; is it also dependent on having
done the same in lo_rmdir and lo_unlink?

Dave


> Thanks,
> Miklos
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 026/104] virtiofsd: Fast path for virtio read
  2019-12-12 16:37 ` [PATCH 026/104] virtiofsd: Fast path for virtio read Dr. David Alan Gilbert (git)
@ 2020-01-17 18:54   ` Masayoshi Mizuma
  2020-01-20 12:32     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Masayoshi Mizuma @ 2020-01-17 18:54 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:46PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Readv the data straight into the guests buffer.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> With fix by:
> Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.c |   5 +
>  tools/virtiofsd/fuse_virtio.c   | 159 ++++++++++++++++++++++++++++++++
>  tools/virtiofsd/fuse_virtio.h   |   4 +
>  3 files changed, 168 insertions(+)
> 
> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> index c2b114cf5b..5f80625652 100644
> --- a/tools/virtiofsd/fuse_lowlevel.c
> +++ b/tools/virtiofsd/fuse_lowlevel.c
> @@ -475,6 +475,11 @@ static int fuse_send_data_iov_fallback(struct fuse_session *se,
>          return fuse_send_msg(se, ch, iov, iov_count);
>      }
>  
> +    if (fuse_lowlevel_is_virtio(se) && buf->count == 1 &&
> +        buf->buf[0].flags == (FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK)) {
> +        return virtio_send_data_iov(se, ch, iov, iov_count, buf, len);
> +    }
> +
>      abort(); /* Will have taken vhost path */
>      return 0;
>  }
> diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> index c33e0f7e8c..146cd3f702 100644
> --- a/tools/virtiofsd/fuse_virtio.c
> +++ b/tools/virtiofsd/fuse_virtio.c
> @@ -230,6 +230,165 @@ err:
>      return ret;
>  }
>  
> +/*
> + * Callback from fuse_send_data_iov_* when it's virtio and the buffer
> + * is a single FD with FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK
> + * We need send the iov and then the buffer.
> + * Return 0 on success
> + */
> +int virtio_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
> +                         struct iovec *iov, int count, struct fuse_bufvec *buf,
> +                         size_t len)
> +{
> +    int ret = 0;
> +    VuVirtqElement *elem;
> +    VuVirtq *q;
> +
> +    assert(count >= 1);
> +    assert(iov[0].iov_len >= sizeof(struct fuse_out_header));
> +
> +    struct fuse_out_header *out = iov[0].iov_base;
> +    /* TODO: Endianness! */
> +
> +    size_t iov_len = iov_size(iov, count);
> +    size_t tosend_len = iov_len + len;
> +
> +    out->len = tosend_len;
> +
> +    fuse_log(FUSE_LOG_DEBUG, "%s: count=%d len=%zd iov_len=%zd\n", __func__,
> +             count, len, iov_len);
> +
> +    /* unique == 0 is notification which we don't support */
> +    assert(out->unique);
> +
> +    /* For virtio we always have ch */
> +    assert(ch);
> +    assert(!ch->qi->reply_sent);
> +    elem = ch->qi->qe;
> +    q = &ch->qi->virtio_dev->dev.vq[ch->qi->qidx];
> +
> +    /* The 'in' part of the elem is to qemu */
> +    unsigned int in_num = elem->in_num;
> +    struct iovec *in_sg = elem->in_sg;
> +    size_t in_len = iov_size(in_sg, in_num);
> +    fuse_log(FUSE_LOG_DEBUG, "%s: elem %d: with %d in desc of length %zd\n",
> +             __func__, elem->index, in_num, in_len);
> +
> +    /*
> +     * The elem should have room for a 'fuse_out_header' (out from fuse)
> +     * plus the data based on the len in the header.
> +     */
> +    if (in_len < sizeof(struct fuse_out_header)) {
> +        fuse_log(FUSE_LOG_ERR, "%s: elem %d too short for out_header\n",
> +                 __func__, elem->index);

> +        ret = -E2BIG;

ret should be a positive value, right?

           ret = E2BIG;

> +        goto err;
> +    }
> +    if (in_len < tosend_len) {
> +        fuse_log(FUSE_LOG_ERR, "%s: elem %d too small for data len %zd\n",
> +                 __func__, elem->index, tosend_len);

> +        ret = -E2BIG;

           ret = E2BIG;

> +        goto err;
> +    }
> +
> +    /* TODO: Limit to 'len' */
> +
> +    /* First copy the header data from iov->in_sg */
> +    copy_iov(iov, count, in_sg, in_num, iov_len);
> +
> +    /*
> +     * Build a copy of the the in_sg iov so we can skip bits in it,
> +     * including changing the offsets
> +     */

> +    struct iovec *in_sg_cpy = calloc(sizeof(struct iovec), in_num);

Should there be an assert(in_sg_cpy) here, in case calloc() fails?
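
An alternative to asserting would be to fail the request with ENOMEM. A minimal sketch, assuming a hypothetical helper called dup_iov (not something in the patch) that wraps the calloc() and memcpy():

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: duplicate an iovec array so the copy can be
 * adjusted while skipping data.  Returns 0 on success or a positive
 * errno value on failure; *out is only written on success.
 * Callers are expected to pass num >= 1.
 */
static int dup_iov(const struct iovec *src, unsigned int num,
                   struct iovec **out)
{
    struct iovec *cpy;

    assert(src && out && num >= 1);
    cpy = calloc(num, sizeof(struct iovec));
    if (!cpy) {
        return ENOMEM;
    }
    memcpy(cpy, src, sizeof(struct iovec) * num);
    *out = cpy;
    return 0;
}
```

The caller would then set ret from the return value and take the existing err path, with the copy freed by the caller as it is today.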

> +    memcpy(in_sg_cpy, in_sg, sizeof(struct iovec) * in_num);
> +    /* These get updated as we skip */
> +    struct iovec *in_sg_ptr = in_sg_cpy;
> +    int in_sg_cpy_count = in_num;
> +
> +    /* skip over parts of in_sg that contained the header iov */
> +    size_t skip_size = iov_len;
> +
> +    size_t in_sg_left = 0;
> +    do {
> +        while (skip_size != 0 && in_sg_cpy_count) {
> +            if (skip_size >= in_sg_ptr[0].iov_len) {
> +                skip_size -= in_sg_ptr[0].iov_len;
> +                in_sg_ptr++;
> +                in_sg_cpy_count--;
> +            } else {
> +                in_sg_ptr[0].iov_len -= skip_size;
> +                in_sg_ptr[0].iov_base += skip_size;
> +                break;
> +            }
> +        }
> +
> +        int i;
> +        for (i = 0, in_sg_left = 0; i < in_sg_cpy_count; i++) {
> +            in_sg_left += in_sg_ptr[i].iov_len;
> +        }
> +        fuse_log(FUSE_LOG_DEBUG,
> +                 "%s: after skip skip_size=%zd in_sg_cpy_count=%d "
> +                 "in_sg_left=%zd\n",
> +                 __func__, skip_size, in_sg_cpy_count, in_sg_left);
> +        ret = preadv(buf->buf[0].fd, in_sg_ptr, in_sg_cpy_count,
> +                     buf->buf[0].pos);
> +

> +        fuse_log(FUSE_LOG_DEBUG, "%s: preadv_res=%d(%m) len=%zd\n",
> +                 __func__, ret, len);

"%m" should be removed? because it may show the previous errno even if preadv()
succeeds. For example:

[ID: 00000079] virtio_send_data_iov: after skip skip_size=0 in_sg_cpy_count=1 in_sg_left=65536
[ID: 00000079] virtio_send_data_iov: preadv_res=16000(No such file or directory) len=65536
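
One way to avoid the misleading message is to format the error text only when the call actually failed. A sketch, assuming a hypothetical helper called format_read_result in place of the patch's fuse_log() call:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: describe the result of a read-style call.
 * saved_errno is only consulted when res == -1, so a stale errno from
 * an earlier failure never appears in the message for a success.
 * Returns the snprintf() result.
 */
static int format_read_result(char *out, size_t out_len, ssize_t res,
                              int saved_errno)
{
    assert(out && out_len > 0);
    if (res == -1) {
        return snprintf(out, out_len, "preadv_res=-1 (%s)",
                        strerror(saved_errno));
    }
    return snprintf(out, out_len, "preadv_res=%zd", res);
}
```

With this shape, the successful 16000-byte read from the log above would be reported without any error string attached.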

Otherwise, looks good to me:

Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>

Thanks,
Masa

> +        if (ret == -1) {
> +            ret = errno;
> +            free(in_sg_cpy);
> +            goto err;
> +        }
> +        if (ret < len && ret) {
> +            fuse_log(FUSE_LOG_DEBUG, "%s: ret < len\n", __func__);
> +            /* Skip over this much next time around */
> +            skip_size = ret;
> +            buf->buf[0].pos += ret;
> +            len -= ret;
> +
> +            /* Lets do another read */
> +            continue;
> +        }
> +        if (!ret) {
> +            /* EOF case? */
> +            fuse_log(FUSE_LOG_DEBUG, "%s: !ret in_sg_left=%zd\n", __func__,
> +                     in_sg_left);
> +            break;
> +        }
> +        if (ret != len) {
> +            fuse_log(FUSE_LOG_DEBUG, "%s: ret!=len\n", __func__);
> +            ret = EIO;
> +            free(in_sg_cpy);
> +            goto err;
> +        }
> +        in_sg_left -= ret;
> +        len -= ret;
> +    } while (in_sg_left);
> +    free(in_sg_cpy);
> +
> +    /* Need to fix out->len on EOF */
> +    if (len) {
> +        struct fuse_out_header *out_sg = in_sg[0].iov_base;
> +
> +        tosend_len -= len;
> +        out_sg->len = tosend_len;
> +    }
> +
> +    ret = 0;
> +
> +    vu_queue_push(&se->virtio_dev->dev, q, elem, tosend_len);
> +    vu_queue_notify(&se->virtio_dev->dev, q);
> +
> +err:
> +    if (ret == 0) {
> +        ch->qi->reply_sent = true;
> +    }
> +
> +    return ret;
> +}
> +
>  /* Thread function for individual queues, created when a queue is 'started' */
>  static void *fv_queue_thread(void *opaque)
>  {
> diff --git a/tools/virtiofsd/fuse_virtio.h b/tools/virtiofsd/fuse_virtio.h
> index 135a14875a..cc676b9193 100644
> --- a/tools/virtiofsd/fuse_virtio.h
> +++ b/tools/virtiofsd/fuse_virtio.h
> @@ -26,4 +26,8 @@ int virtio_loop(struct fuse_session *se);
>  int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
>                      struct iovec *iov, int count);
>  
> +int virtio_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
> +                         struct iovec *iov, int count,
> +                         struct fuse_bufvec *buf, size_t len);
> +
>  #endif
> -- 
> 2.23.0
> 
> 



* Re: [PATCH 039/104] virtiofsd: Plumb fuse_bufvec through to do_write_buf
  2019-12-12 16:37 ` [PATCH 039/104] virtiofsd: Plumb fuse_bufvec through to do_write_buf Dr. David Alan Gilbert (git)
@ 2020-01-17 21:01   ` Masayoshi Mizuma
  0 siblings, 0 replies; 307+ messages in thread
From: Masayoshi Mizuma @ 2020-01-17 21:01 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:59PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Let fuse_session_process_buf_int take a fuse_bufvec * instead of a
> fuse_buf;  and then through to do_write_buf - where in the best
> case it can pass that straight through to op.write_buf without copying
> (other than skipping a header).
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/fuse_i.h        |  2 +-
>  tools/virtiofsd/fuse_lowlevel.c | 61 ++++++++++++++++++++++-----------
>  tools/virtiofsd/fuse_virtio.c   |  3 +-
>  3 files changed, 44 insertions(+), 22 deletions(-)
> 
> diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
> index cb1ca70ffa..d0679508cd 100644
> --- a/tools/virtiofsd/fuse_i.h
> +++ b/tools/virtiofsd/fuse_i.h
> @@ -119,7 +119,7 @@ int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
>  void fuse_free_req(fuse_req_t req);
>  
>  void fuse_session_process_buf_int(struct fuse_session *se,
> -                                  const struct fuse_buf *buf,
> +                                  struct fuse_bufvec *bufv,
>                                    struct fuse_chan *ch);
>  
>  
> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> index bea092b454..7d33bbf539 100644
> --- a/tools/virtiofsd/fuse_lowlevel.c
> +++ b/tools/virtiofsd/fuse_lowlevel.c
> @@ -1006,11 +1006,12 @@ static void do_write(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
>  }
>  
>  static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid, const void *inarg,
> -                         const struct fuse_buf *ibuf)
> +                         struct fuse_bufvec *ibufv)
>  {
>      struct fuse_session *se = req->se;
> -    struct fuse_bufvec bufv = {
> -        .buf[0] = *ibuf,
> +    struct fuse_bufvec *pbufv = ibufv;
> +    struct fuse_bufvec tmpbufv = {
> +        .buf[0] = ibufv->buf[0],
>          .count = 1,
>      };
>      struct fuse_write_in *arg = (struct fuse_write_in *)inarg;
> @@ -1020,22 +1021,31 @@ static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid, const void *inarg,
>      fi.fh = arg->fh;
>      fi.writepage = arg->write_flags & FUSE_WRITE_CACHE;
>  
> -    fi.lock_owner = arg->lock_owner;
> -    fi.flags = arg->flags;
> -    if (!(bufv.buf[0].flags & FUSE_BUF_IS_FD)) {
> -        bufv.buf[0].mem = PARAM(arg);
> -    }
> -
> -    bufv.buf[0].size -=
> -        sizeof(struct fuse_in_header) + sizeof(struct fuse_write_in);
> -    if (bufv.buf[0].size < arg->size) {
> -        fuse_log(FUSE_LOG_ERR, "fuse: do_write_buf: buffer size too small\n");
> -        fuse_reply_err(req, EIO);
> -        return;
> +    if (ibufv->count == 1) {
> +        fi.lock_owner = arg->lock_owner;
> +        fi.flags = arg->flags;
> +        if (!(tmpbufv.buf[0].flags & FUSE_BUF_IS_FD)) {
> +            tmpbufv.buf[0].mem = PARAM(arg);
> +        }
> +        tmpbufv.buf[0].size -=
> +            sizeof(struct fuse_in_header) + sizeof(struct fuse_write_in);
> +        if (tmpbufv.buf[0].size < arg->size) {
> +            fuse_log(FUSE_LOG_ERR,
> +                     "fuse: do_write_buf: buffer size too small\n");
> +            fuse_reply_err(req, EIO);
> +            return;
> +        }
> +        tmpbufv.buf[0].size = arg->size;
> +        pbufv = &tmpbufv;
> +    } else {
> +        /*
> +         * Input bufv contains the headers in the first element
> +         * and the data in the rest; we need to skip that first element
> +         */
> +        ibufv->buf[0].size = 0;
>      }
> -    bufv.buf[0].size = arg->size;
>  
> -    se->op.write_buf(req, nodeid, &bufv, arg->offset, &fi);
> +    se->op.write_buf(req, nodeid, pbufv, arg->offset, &fi);
>  }
>  
>  static void do_flush(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
> @@ -2027,13 +2037,24 @@ static const char *opname(enum fuse_opcode opcode)
>  void fuse_session_process_buf(struct fuse_session *se,
>                                const struct fuse_buf *buf)
>  {
> -    fuse_session_process_buf_int(se, buf, NULL);
> +    struct fuse_bufvec bufv = { .buf[0] = *buf, .count = 1 };
> +    fuse_session_process_buf_int(se, &bufv, NULL);
>  }
>  
> +/*
> + * Restriction:
> + *   bufv is normally a single entry buffer, except for a write
> + *   where (if it's in memory) then the bufv may be multiple entries,
> + *   where the first entry contains all headers and subsequent entries
> + *   contain data
> + *   bufv shall not use any offsets etc to make the data anything
> + *   other than contiguous starting from 0.
> + */
>  void fuse_session_process_buf_int(struct fuse_session *se,
> -                                  const struct fuse_buf *buf,
> +                                  struct fuse_bufvec *bufv,
>                                    struct fuse_chan *ch)
>  {
> +    const struct fuse_buf *buf = bufv->buf;
>      struct fuse_in_header *in;
>      const void *inarg;
>      struct fuse_req *req;
> @@ -2111,7 +2132,7 @@ void fuse_session_process_buf_int(struct fuse_session *se,
>  
>      inarg = (void *)&in[1];
>      if (in->opcode == FUSE_WRITE && se->op.write_buf) {
> -        do_write_buf(req, in->nodeid, inarg, buf);
> +        do_write_buf(req, in->nodeid, inarg, bufv);
>      } else {
>          fuse_ll_ops[in->opcode].func(req, in->nodeid, inarg);
>      }
> diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> index fa6e53e7d0..99c877ea2e 100644
> --- a/tools/virtiofsd/fuse_virtio.c
> +++ b/tools/virtiofsd/fuse_virtio.c
> @@ -499,7 +499,8 @@ static void *fv_queue_thread(void *opaque)
>              /* TODO! Endianness of header */
>  
>              /* TODO: Add checks for fuse_session_exited */
> -            fuse_session_process_buf_int(se, &fbuf, &ch);
> +            struct fuse_bufvec bufv = { .buf[0] = fbuf, .count = 1 };
> +            fuse_session_process_buf_int(se, &bufv, &ch);
>  
>              if (!qi->reply_sent) {
>                  fuse_log(FUSE_LOG_DEBUG, "%s: elem %d no reply sent\n",

Looks good to me.

Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
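
For readers following the header-skipping logic in the quoted do_write_buf()
hunk, here is a minimal standalone sketch of the two cases. It is not the
libfuse code: struct sketch_buf, struct sketch_bufvec and
sketch_skip_headers() are hypothetical stand-ins, only memory buffers are
modelled (no FUSE_BUF_IS_FD), and the extra guard against a buffer shorter
than the headers is an assumption of the sketch rather than something the
patch does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for fuse_buf / fuse_bufvec (memory buffers only). */
struct sketch_buf {
    size_t size;
    void *mem;
};

struct sketch_bufvec {
    size_t count;
    struct sketch_buf buf[4];
};

/*
 * Return a bufvec that exposes only the write payload.
 *
 * hdr_len     - combined length of the request headers
 * payload_len - payload size announced in the write header
 *               (arg->size in the quoted patch)
 * tmp         - caller-provided scratch bufvec for the single-entry case
 *
 * Returns NULL if a single-entry buffer is too small for the payload.
 */
struct sketch_bufvec *sketch_skip_headers(struct sketch_bufvec *in,
                                          struct sketch_bufvec *tmp,
                                          size_t hdr_len, size_t payload_len)
{
    assert(in->count >= 1);

    if (in->count == 1) {
        /* Headers and data share one buffer: work on a trimmed copy. */
        tmp->count = 1;
        tmp->buf[0] = in->buf[0];
        if (tmp->buf[0].size < hdr_len ||
            tmp->buf[0].size - hdr_len < payload_len) {
            return NULL;
        }
        tmp->buf[0].mem = (char *)tmp->buf[0].mem + hdr_len;
        tmp->buf[0].size = payload_len;
        return tmp;
    }

    /*
     * Headers live in the first element and data in the rest:
     * zero the first element's size so a consumer skips it, and
     * pass the caller's bufvec through without copying.
     */
    in->buf[0].size = 0;
    return in;
}
```

In the single-entry case the caller's bufvec is left untouched and the
trimmed copy is handed on; in the multi-entry case the input is modified in
place, which is presumably why the patch drops the const qualifier from the
parameter.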



* Re: [PATCH 033/104] virtiofsd: passthrough_ll: add lo_map for ino/fh indirection
  2019-12-12 16:37 ` [PATCH 033/104] virtiofsd: passthrough_ll: add lo_map for ino/fh indirection Dr. David Alan Gilbert (git)
@ 2020-01-17 21:44   ` Masayoshi Mizuma
  0 siblings, 0 replies; 307+ messages in thread
From: Masayoshi Mizuma @ 2020-01-17 21:44 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:53PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> A layer of indirection is needed because passthrough_ll cannot expose
> pointers or file descriptor numbers to untrusted clients.  Malicious
> clients could send invalid pointers or file descriptors in order to
> crash or exploit the file system daemon.
> 
> lo_map provides an integer key->value mapping.  This will be used for
> ino and fh fields in the patches that follow.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 124 +++++++++++++++++++++++++++++++
>  1 file changed, 124 insertions(+)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 0188cd9ad6..0a94c3e1f2 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -74,6 +74,21 @@ struct _uintptr_to_must_hold_fuse_ino_t_dummy_struct {
>  };
>  #endif
>  
> +struct lo_map_elem {
> +    union {
> +        /* Element values will go here... */
> +        ssize_t freelist;
> +    };
> +    bool in_use;
> +};
> +
> +/* Maps FUSE fh or ino values to internal objects */
> +struct lo_map {
> +    struct lo_map_elem *elems;
> +    size_t nelems;
> +    ssize_t freelist;
> +};
> +
>  struct lo_inode {
>      struct lo_inode *next; /* protected by lo->mutex */
>      struct lo_inode *prev; /* protected by lo->mutex */
> @@ -130,6 +145,115 @@ static struct lo_data *lo_data(fuse_req_t req)
>      return (struct lo_data *)fuse_req_userdata(req);
>  }
>  
> +__attribute__((unused)) static void lo_map_init(struct lo_map *map)
> +{
> +    map->elems = NULL;
> +    map->nelems = 0;
> +    map->freelist = -1;
> +}
> +
> +__attribute__((unused)) static void lo_map_destroy(struct lo_map *map)
> +{
> +    free(map->elems);
> +}
> +
> +static int lo_map_grow(struct lo_map *map, size_t new_nelems)
> +{
> +    struct lo_map_elem *new_elems;
> +    size_t i;
> +
> +    if (new_nelems <= map->nelems) {
> +        return 1;
> +    }
> +
> +    new_elems = realloc(map->elems, sizeof(map->elems[0]) * new_nelems);
> +    if (!new_elems) {
> +        return 0;
> +    }
> +
> +    for (i = map->nelems; i < new_nelems; i++) {
> +        new_elems[i].freelist = i + 1;
> +        new_elems[i].in_use = false;
> +    }
> +    new_elems[new_nelems - 1].freelist = -1;
> +
> +    map->elems = new_elems;
> +    map->freelist = map->nelems;
> +    map->nelems = new_nelems;
> +    return 1;
> +}
> +
> +__attribute__((unused)) static struct lo_map_elem *
> +lo_map_alloc_elem(struct lo_map *map)
> +{
> +    struct lo_map_elem *elem;
> +
> +    if (map->freelist == -1 && !lo_map_grow(map, map->nelems + 256)) {
> +        return NULL;
> +    }
> +
> +    elem = &map->elems[map->freelist];
> +    map->freelist = elem->freelist;
> +
> +    elem->in_use = true;
> +
> +    return elem;
> +}
> +
> +__attribute__((unused)) static struct lo_map_elem *
> +lo_map_reserve(struct lo_map *map, size_t key)
> +{
> +    ssize_t *prev;
> +
> +    if (!lo_map_grow(map, key + 1)) {
> +        return NULL;
> +    }
> +
> +    for (prev = &map->freelist; *prev != -1;
> +         prev = &map->elems[*prev].freelist) {
> +        if (*prev == key) {
> +            struct lo_map_elem *elem = &map->elems[key];
> +
> +            *prev = elem->freelist;
> +            elem->in_use = true;
> +            return elem;
> +        }
> +    }
> +    return NULL;
> +}
> +
> +__attribute__((unused)) static struct lo_map_elem *
> +lo_map_get(struct lo_map *map, size_t key)
> +{
> +    if (key >= map->nelems) {
> +        return NULL;
> +    }
> +    if (!map->elems[key].in_use) {
> +        return NULL;
> +    }
> +    return &map->elems[key];
> +}
> +
> +__attribute__((unused)) static void lo_map_remove(struct lo_map *map,
> +                                                  size_t key)
> +{
> +    struct lo_map_elem *elem;
> +
> +    if (key >= map->nelems) {
> +        return;
> +    }
> +
> +    elem = &map->elems[key];
> +    if (!elem->in_use) {
> +        return;
> +    }
> +
> +    elem->in_use = false;
> +
> +    elem->freelist = map->freelist;
> +    map->freelist = key;
> +}
> +
>  static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
>  {
>      if (ino == FUSE_ROOT_ID) {

Looks good to me.

Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
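
As an illustration of the integer key->value mapping described in the quoted
commit message, the following is a reduced standalone sketch of a
free-list-backed map. The names (struct map, map_add, map_get, map_remove)
are hypothetical and chosen for the sketch; the value is a plain void *
rather than the lo_inode/fd fields added by later patches. One deliberate
difference from the quoted lo_map_grow(): the last new slot is chained to the
previous free-list head instead of being set to -1, which gives the same
result whenever the list is empty at grow time, as it always is here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h> /* ssize_t */

struct map_elem {
    union {
        void *value;      /* valid while in_use is true */
        ssize_t freelist; /* next free index while in_use is false */
    };
    bool in_use;
};

struct map {
    struct map_elem *elems;
    size_t nelems;
    ssize_t freelist; /* index of first free slot, or -1 */
};

void map_init(struct map *m)
{
    m->elems = NULL;
    m->nelems = 0;
    m->freelist = -1;
}

/* Grow the array and thread the new slots onto the free list. */
int map_grow(struct map *m, size_t new_nelems)
{
    struct map_elem *new_elems;
    size_t i;

    if (new_nelems <= m->nelems) {
        return 1;
    }
    new_elems = realloc(m->elems, sizeof(m->elems[0]) * new_nelems);
    if (!new_elems) {
        return 0;
    }
    for (i = m->nelems; i < new_nelems; i++) {
        new_elems[i].freelist = (ssize_t)(i + 1);
        new_elems[i].in_use = false;
    }
    /* Sketch-specific: keep any slots that were already free. */
    new_elems[new_nelems - 1].freelist = m->freelist;

    m->elems = new_elems;
    m->freelist = (ssize_t)m->nelems;
    m->nelems = new_nelems;
    return 1;
}

/* Store value and return its key, or -1 on allocation failure. */
ssize_t map_add(struct map *m, void *value)
{
    struct map_elem *elem;

    if (m->freelist == -1 && !map_grow(m, m->nelems + 256)) {
        return -1;
    }
    assert(m->freelist >= 0);
    elem = &m->elems[m->freelist];
    m->freelist = elem->freelist;
    elem->value = value;
    elem->in_use = true;
    return elem - m->elems;
}

/* Look up a key supplied by an untrusted client; NULL if invalid. */
void *map_get(struct map *m, size_t key)
{
    if (key >= m->nelems || !m->elems[key].in_use) {
        return NULL;
    }
    return m->elems[key].value;
}

/* Release a key and push its slot back onto the free list. */
void map_remove(struct map *m, size_t key)
{
    if (key >= m->nelems || !m->elems[key].in_use) {
        return;
    }
    m->elems[key].in_use = false;
    m->elems[key].freelist = m->freelist;
    m->freelist = (ssize_t)key;
}
```

Because removed slots are pushed onto the head of the free list, the most
recently released key is the first one to be handed out again. Out-of-range
or released keys simply fail the lookup, so a client never gets the daemon to
dereference a value it chose itself.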



* Re: [PATCH 034/104] virtiofsd: passthrough_ll: add ino_map to hide lo_inode pointers
  2019-12-12 16:37 ` [PATCH 034/104] virtiofsd: passthrough_ll: add ino_map to hide lo_inode pointers Dr. David Alan Gilbert (git)
@ 2020-01-17 21:45   ` Masayoshi Mizuma
  0 siblings, 0 replies; 307+ messages in thread
From: Masayoshi Mizuma @ 2020-01-17 21:45 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:54PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Do not expose lo_inode pointers to clients.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 144 ++++++++++++++++++++++++-------
>  1 file changed, 114 insertions(+), 30 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 0a94c3e1f2..fd1d88bddf 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -57,8 +57,8 @@
>  
>  #define HAVE_POSIX_FALLOCATE 1
>  /*
> - * We are re-using pointers to our `struct lo_inode` and `struct
> - * lo_dirp` elements as inodes. This means that we must be able to
> + * We are re-using pointers to our `struct lo_inode`
> + * elements as inodes. This means that we must be able to
>   * store uintptr_t values in a fuse_ino_t variable. The following
>   * incantation checks this condition at compile time.
>   */
> @@ -76,7 +76,7 @@ struct _uintptr_to_must_hold_fuse_ino_t_dummy_struct {
>  
>  struct lo_map_elem {
>      union {
> -        /* Element values will go here... */
> +        struct lo_inode *inode;
>          ssize_t freelist;
>      };
>      bool in_use;
> @@ -97,6 +97,7 @@ struct lo_inode {
>      ino_t ino;
>      dev_t dev;
>      uint64_t refcount; /* protected by lo->mutex */
> +    fuse_ino_t fuse_ino;
>  };
>  
>  struct lo_cred {
> @@ -121,6 +122,7 @@ struct lo_data {
>      int cache;
>      int timeout_set;
>      struct lo_inode root; /* protected by lo->mutex */
> +    struct lo_map ino_map; /* protected by lo->mutex */
>  };
>  
>  static const struct fuse_opt lo_opts[] = {
> @@ -145,14 +147,14 @@ static struct lo_data *lo_data(fuse_req_t req)
>      return (struct lo_data *)fuse_req_userdata(req);
>  }
>  
> -__attribute__((unused)) static void lo_map_init(struct lo_map *map)
> +static void lo_map_init(struct lo_map *map)
>  {
>      map->elems = NULL;
>      map->nelems = 0;
>      map->freelist = -1;
>  }
>  
> -__attribute__((unused)) static void lo_map_destroy(struct lo_map *map)
> +static void lo_map_destroy(struct lo_map *map)
>  {
>      free(map->elems);
>  }
> @@ -183,8 +185,7 @@ static int lo_map_grow(struct lo_map *map, size_t new_nelems)
>      return 1;
>  }
>  
> -__attribute__((unused)) static struct lo_map_elem *
> -lo_map_alloc_elem(struct lo_map *map)
> +static struct lo_map_elem *lo_map_alloc_elem(struct lo_map *map)
>  {
>      struct lo_map_elem *elem;
>  
> @@ -200,8 +201,7 @@ lo_map_alloc_elem(struct lo_map *map)
>      return elem;
>  }
>  
> -__attribute__((unused)) static struct lo_map_elem *
> -lo_map_reserve(struct lo_map *map, size_t key)
> +static struct lo_map_elem *lo_map_reserve(struct lo_map *map, size_t key)
>  {
>      ssize_t *prev;
>  
> @@ -222,8 +222,7 @@ lo_map_reserve(struct lo_map *map, size_t key)
>      return NULL;
>  }
>  
> -__attribute__((unused)) static struct lo_map_elem *
> -lo_map_get(struct lo_map *map, size_t key)
> +static struct lo_map_elem *lo_map_get(struct lo_map *map, size_t key)
>  {
>      if (key >= map->nelems) {
>          return NULL;
> @@ -234,8 +233,7 @@ lo_map_get(struct lo_map *map, size_t key)
>      return &map->elems[key];
>  }
>  
> -__attribute__((unused)) static void lo_map_remove(struct lo_map *map,
> -                                                  size_t key)
> +static void lo_map_remove(struct lo_map *map, size_t key)
>  {
>      struct lo_map_elem *elem;
>  
> @@ -254,18 +252,40 @@ __attribute__((unused)) static void lo_map_remove(struct lo_map *map,
>      map->freelist = key;
>  }
>  
> +/* Assumes lo->mutex is held */
> +static ssize_t lo_add_inode_mapping(fuse_req_t req, struct lo_inode *inode)
> +{
> +    struct lo_map_elem *elem;
> +
> +    elem = lo_map_alloc_elem(&lo_data(req)->ino_map);
> +    if (!elem) {
> +        return -1;
> +    }
> +
> +    elem->inode = inode;
> +    return elem - lo_data(req)->ino_map.elems;
> +}
> +
>  static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
>  {
> -    if (ino == FUSE_ROOT_ID) {
> -        return &lo_data(req)->root;
> -    } else {
> -        return (struct lo_inode *)(uintptr_t)ino;
> +    struct lo_data *lo = lo_data(req);
> +    struct lo_map_elem *elem;
> +
> +    pthread_mutex_lock(&lo->mutex);
> +    elem = lo_map_get(&lo->ino_map, ino);
> +    pthread_mutex_unlock(&lo->mutex);
> +
> +    if (!elem) {
> +        return NULL;
>      }
> +
> +    return elem->inode;
>  }
>  
>  static int lo_fd(fuse_req_t req, fuse_ino_t ino)
>  {
> -    return lo_inode(req, ino)->fd;
> +    struct lo_inode *inode = lo_inode(req, ino);
> +    return inode ? inode->fd : -1;
>  }
>  
>  static bool lo_debug(fuse_req_t req)
> @@ -337,10 +357,18 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>  {
>      int saverr;
>      char procname[64];
> -    struct lo_inode *inode = lo_inode(req, ino);
> -    int ifd = inode->fd;
> +    struct lo_inode *inode;
> +    int ifd;
>      int res;
>  
> +    inode = lo_inode(req, ino);
> +    if (!inode) {
> +        fuse_reply_err(req, EBADF);
> +        return;
> +    }
> +
> +    ifd = inode->fd;
> +
>      if (valid & FUSE_SET_ATTR_MODE) {
>          if (fi) {
>              res = fchmod(fi->fh, attr->st_mode);
> @@ -470,6 +498,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>          inode->dev = e->attr.st_dev;
>  
>          pthread_mutex_lock(&lo->mutex);
> +        inode->fuse_ino = lo_add_inode_mapping(req, inode);
>          prev = &lo->root;
>          next = prev->next;
>          next->prev = inode;
> @@ -478,7 +507,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>          prev->next = inode;
>          pthread_mutex_unlock(&lo->mutex);
>      }
> -    e->ino = (uintptr_t)inode;
> +    e->ino = inode->fuse_ino;
>  
>      if (lo_debug(req)) {
>          fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n",
> @@ -565,10 +594,16 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
>  {
>      int res;
>      int saverr;
> -    struct lo_inode *dir = lo_inode(req, parent);
> +    struct lo_inode *dir;
>      struct fuse_entry_param e;
>      struct lo_cred old = {};
>  
> +    dir = lo_inode(req, parent);
> +    if (!dir) {
> +        fuse_reply_err(req, EBADF);
> +        return;
> +    }
> +
>      saverr = ENOMEM;
>  
>      saverr = lo_change_cred(req, &old);
> @@ -646,10 +681,16 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
>  {
>      int res;
>      struct lo_data *lo = lo_data(req);
> -    struct lo_inode *inode = lo_inode(req, ino);
> +    struct lo_inode *inode;
>      struct fuse_entry_param e;
>      int saverr;
>  
> +    inode = lo_inode(req, ino);
> +    if (!inode) {
> +        fuse_reply_err(req, EBADF);
> +        return;
> +    }
> +
>      memset(&e, 0, sizeof(struct fuse_entry_param));
>      e.attr_timeout = lo->timeout;
>      e.entry_timeout = lo->timeout;
> @@ -667,7 +708,7 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
>      pthread_mutex_lock(&lo->mutex);
>      inode->refcount++;
>      pthread_mutex_unlock(&lo->mutex);
> -    e.ino = (uintptr_t)inode;
> +    e.ino = inode->fuse_ino;
>  
>      if (lo_debug(req)) {
>          fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n",
> @@ -733,10 +774,10 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
>          next->prev = prev;
>          prev->next = next;
>  
> +        lo_map_remove(&lo->ino_map, inode->fuse_ino);
>          pthread_mutex_unlock(&lo->mutex);
>          close(inode->fd);
>          free(inode);
> -
>      } else {
>          pthread_mutex_unlock(&lo->mutex);
>      }
> @@ -745,7 +786,12 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
>  static void lo_forget_one(fuse_req_t req, fuse_ino_t ino, uint64_t nlookup)
>  {
>      struct lo_data *lo = lo_data(req);
> -    struct lo_inode *inode = lo_inode(req, ino);
> +    struct lo_inode *inode;
> +
> +    inode = lo_inode(req, ino);
> +    if (!inode) {
> +        return;
> +    }
>  
>      if (lo_debug(req)) {
>          fuse_log(FUSE_LOG_DEBUG, "  forget %lli %lli -%lli\n",
> @@ -1227,10 +1273,16 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
>  {
>      char *value = NULL;
>      char procname[64];
> -    struct lo_inode *inode = lo_inode(req, ino);
> +    struct lo_inode *inode;
>      ssize_t ret;
>      int saverr;
>  
> +    inode = lo_inode(req, ino);
> +    if (!inode) {
> +        fuse_reply_err(req, EBADF);
> +        return;
> +    }
> +
>      saverr = ENOSYS;
>      if (!lo_data(req)->xattr) {
>          goto out;
> @@ -1289,10 +1341,16 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
>  {
>      char *value = NULL;
>      char procname[64];
> -    struct lo_inode *inode = lo_inode(req, ino);
> +    struct lo_inode *inode;
>      ssize_t ret;
>      int saverr;
>  
> +    inode = lo_inode(req, ino);
> +    if (!inode) {
> +        fuse_reply_err(req, EBADF);
> +        return;
> +    }
> +
>      saverr = ENOSYS;
>      if (!lo_data(req)->xattr) {
>          goto out;
> @@ -1350,10 +1408,16 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
>                          const char *value, size_t size, int flags)
>  {
>      char procname[64];
> -    struct lo_inode *inode = lo_inode(req, ino);
> +    struct lo_inode *inode;
>      ssize_t ret;
>      int saverr;
>  
> +    inode = lo_inode(req, ino);
> +    if (!inode) {
> +        fuse_reply_err(req, EBADF);
> +        return;
> +    }
> +
>      saverr = ENOSYS;
>      if (!lo_data(req)->xattr) {
>          goto out;
> @@ -1383,10 +1447,16 @@ out:
>  static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *name)
>  {
>      char procname[64];
> -    struct lo_inode *inode = lo_inode(req, ino);
> +    struct lo_inode *inode;
>      ssize_t ret;
>      int saverr;
>  
> +    inode = lo_inode(req, ino);
> +    if (!inode) {
> +        fuse_reply_err(req, EBADF);
> +        return;
> +    }
> +
>      saverr = ENOSYS;
>      if (!lo_data(req)->xattr) {
>          goto out;
> @@ -1505,6 +1575,7 @@ int main(int argc, char *argv[])
>      struct fuse_session *se;
>      struct fuse_cmdline_opts opts;
>      struct lo_data lo = { .debug = 0, .writeback = 0 };
> +    struct lo_map_elem *root_elem;
>      int ret = -1;
>  
>      /* Don't mask creation mode, kernel already did that */
> @@ -1513,8 +1584,19 @@ int main(int argc, char *argv[])
>      pthread_mutex_init(&lo.mutex, NULL);
>      lo.root.next = lo.root.prev = &lo.root;
>      lo.root.fd = -1;
> +    lo.root.fuse_ino = FUSE_ROOT_ID;
>      lo.cache = CACHE_NORMAL;
>  
> +    /*
> +     * Set up the ino map like this:
> +     * [0] Reserved (will not be used)
> +     * [1] Root inode
> +     */
> +    lo_map_init(&lo.ino_map);
> +    lo_map_reserve(&lo.ino_map, 0)->in_use = false;
> +    root_elem = lo_map_reserve(&lo.ino_map, lo.root.fuse_ino);
> +    root_elem->inode = &lo.root;
> +
>      if (fuse_parse_cmdline(&args, &opts) != 0) {
>          return 1;
>      }
> @@ -1611,6 +1693,8 @@ err_out2:
>  err_out1:
>      fuse_opt_free_args(&args);
>  
> +    lo_map_destroy(&lo.ino_map);
> +
>      if (lo.root.fd >= 0) {
>          close(lo.root.fd);
>      }

Looks good to me.

Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>



* Re: [PATCH 036/104] virtiofsd: passthrough_ll: add fd_map to hide file descriptors
  2019-12-12 16:37 ` [PATCH 036/104] virtiofsd: passthrough_ll: add fd_map to hide file descriptors Dr. David Alan Gilbert (git)
@ 2020-01-17 22:32   ` Masayoshi Mizuma
  0 siblings, 0 replies; 307+ messages in thread
From: Masayoshi Mizuma @ 2020-01-17 22:32 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:56PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Do not expose file descriptor numbers to clients.  This prevents the
> abuse of internal file descriptors (like stdin/stdout).
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> dgilbert:
>   Added lseek
> ---
>  tools/virtiofsd/passthrough_ll.c | 114 +++++++++++++++++++++++++------
>  1 file changed, 93 insertions(+), 21 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index face8910b0..93e74cce21 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -60,6 +60,7 @@ struct lo_map_elem {
>      union {
>          struct lo_inode *inode;
>          struct lo_dirp *dirp;
> +        int fd;
>          ssize_t freelist;
>      };
>      bool in_use;
> @@ -107,6 +108,7 @@ struct lo_data {
>      struct lo_inode root; /* protected by lo->mutex */
>      struct lo_map ino_map; /* protected by lo->mutex */
>      struct lo_map dirp_map; /* protected by lo->mutex */
> +    struct lo_map fd_map; /* protected by lo->mutex */
>  };
>  
>  static const struct fuse_opt lo_opts[] = {
> @@ -236,6 +238,20 @@ static void lo_map_remove(struct lo_map *map, size_t key)
>      map->freelist = key;
>  }
>  
> +/* Assumes lo->mutex is held */
> +static ssize_t lo_add_fd_mapping(fuse_req_t req, int fd)
> +{
> +    struct lo_map_elem *elem;
> +
> +    elem = lo_map_alloc_elem(&lo_data(req)->fd_map);
> +    if (!elem) {
> +        return -1;
> +    }
> +
> +    elem->fd = fd;
> +    return elem - lo_data(req)->fd_map.elems;
> +}
> +
>  /* Assumes lo->mutex is held */
>  static ssize_t lo_add_dirp_mapping(fuse_req_t req, struct lo_dirp *dirp)
>  {
> @@ -350,6 +366,22 @@ static int utimensat_empty_nofollow(struct lo_inode *inode,
>      return utimensat(AT_FDCWD, procname, tv, 0);
>  }
>  
> +static int lo_fi_fd(fuse_req_t req, struct fuse_file_info *fi)
> +{
> +    struct lo_data *lo = lo_data(req);
> +    struct lo_map_elem *elem;
> +
> +    pthread_mutex_lock(&lo->mutex);
> +    elem = lo_map_get(&lo->fd_map, fi->fh);
> +    pthread_mutex_unlock(&lo->mutex);
> +
> +    if (!elem) {
> +        return -1;
> +    }
> +
> +    return elem->fd;
> +}
> +
>  static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>                         int valid, struct fuse_file_info *fi)
>  {
> @@ -358,6 +390,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>      struct lo_inode *inode;
>      int ifd;
>      int res;
> +    int fd;
>  
>      inode = lo_inode(req, ino);
>      if (!inode) {
> @@ -367,9 +400,14 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>  
>      ifd = inode->fd;
>  
> +    /* If fi->fh is invalid we'll report EBADF later */
> +    if (fi) {
> +        fd = lo_fi_fd(req, fi);
> +    }
> +
>      if (valid & FUSE_SET_ATTR_MODE) {
>          if (fi) {
> -            res = fchmod(fi->fh, attr->st_mode);
> +            res = fchmod(fd, attr->st_mode);
>          } else {
>              sprintf(procname, "/proc/self/fd/%i", ifd);
>              res = chmod(procname, attr->st_mode);
> @@ -389,7 +427,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>      }
>      if (valid & FUSE_SET_ATTR_SIZE) {
>          if (fi) {
> -            res = ftruncate(fi->fh, attr->st_size);
> +            res = ftruncate(fd, attr->st_size);
>          } else {
>              sprintf(procname, "/proc/self/fd/%i", ifd);
>              res = truncate(procname, attr->st_size);
> @@ -419,7 +457,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>          }
>  
>          if (fi) {
> -            res = futimens(fi->fh, tv);
> +            res = futimens(fd, tv);
>          } else {
>              res = utimensat_empty_nofollow(inode, tv);
>          }
> @@ -1079,7 +1117,18 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
>      lo_restore_cred(&old);
>  
>      if (!err) {
> -        fi->fh = fd;
> +        ssize_t fh;
> +
> +        pthread_mutex_lock(&lo->mutex);
> +        fh = lo_add_fd_mapping(req, fd);
> +        pthread_mutex_unlock(&lo->mutex);
> +        if (fh == -1) {
> +            close(fd);
> +            fuse_reply_err(req, ENOMEM);
> +            return;
> +        }
> +
> +        fi->fh = fh;
>          err = lo_do_lookup(req, parent, name, &e);
>      }
>      if (lo->cache == CACHE_NEVER) {
> @@ -1123,6 +1172,7 @@ static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,
>  static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
>  {
>      int fd;
> +    ssize_t fh;
>      char buf[64];
>      struct lo_data *lo = lo_data(req);
>  
> @@ -1158,7 +1208,16 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
>          return (void)fuse_reply_err(req, errno);
>      }
>  
> -    fi->fh = fd;
> +    pthread_mutex_lock(&lo->mutex);
> +    fh = lo_add_fd_mapping(req, fd);
> +    pthread_mutex_unlock(&lo->mutex);
> +    if (fh == -1) {
> +        close(fd);
> +        fuse_reply_err(req, ENOMEM);
> +        return;
> +    }
> +
> +    fi->fh = fh;
>      if (lo->cache == CACHE_NEVER) {
>          fi->direct_io = 1;
>      } else if (lo->cache == CACHE_ALWAYS) {
> @@ -1170,9 +1229,18 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
>  static void lo_release(fuse_req_t req, fuse_ino_t ino,
>                         struct fuse_file_info *fi)
>  {
> +    struct lo_data *lo = lo_data(req);
> +    int fd;
> +
>      (void)ino;
>  
> -    close(fi->fh);
> +    fd = lo_fi_fd(req, fi);
> +
> +    pthread_mutex_lock(&lo->mutex);
> +    lo_map_remove(&lo->fd_map, fi->fh);
> +    pthread_mutex_unlock(&lo->mutex);
> +
> +    close(fd);
>      fuse_reply_err(req, 0);
>  }
>  
> @@ -1180,7 +1248,7 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
>  {
>      int res;
>      (void)ino;
> -    res = close(dup(fi->fh));
> +    res = close(dup(lo_fi_fd(req, fi)));
>      fuse_reply_err(req, res == -1 ? errno : 0);
>  }
>  
> @@ -1207,7 +1275,7 @@ static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
>              return (void)fuse_reply_err(req, errno);
>          }
>      } else {
> -        fd = fi->fh;
> +        fd = lo_fi_fd(req, fi);
>      }
>  
>      if (datasync) {
> @@ -1234,7 +1302,7 @@ static void lo_read(fuse_req_t req, fuse_ino_t ino, size_t size, off_t offset,
>      }
>  
>      buf.buf[0].flags = FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK;
> -    buf.buf[0].fd = fi->fh;
> +    buf.buf[0].fd = lo_fi_fd(req, fi);
>      buf.buf[0].pos = offset;
>  
>      fuse_reply_data(req, &buf, FUSE_BUF_SPLICE_MOVE);
> @@ -1249,7 +1317,7 @@ static void lo_write_buf(fuse_req_t req, fuse_ino_t ino,
>      struct fuse_bufvec out_buf = FUSE_BUFVEC_INIT(fuse_buf_size(in_buf));
>  
>      out_buf.buf[0].flags = FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK;
> -    out_buf.buf[0].fd = fi->fh;
> +    out_buf.buf[0].fd = lo_fi_fd(req, fi);
>      out_buf.buf[0].pos = off;
>  
>      if (lo_debug(req)) {
> @@ -1297,7 +1365,7 @@ static void lo_fallocate(fuse_req_t req, fuse_ino_t ino, int mode, off_t offset,
>          return;
>      }
>  
> -    err = posix_fallocate(fi->fh, offset, length);
> +    err = posix_fallocate(lo_fi_fd(req, fi), offset, length);
>  #endif
>  
>      fuse_reply_err(req, err);
> @@ -1309,7 +1377,7 @@ static void lo_flock(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
>      int res;
>      (void)ino;
>  
> -    res = flock(fi->fh, op);
> +    res = flock(lo_fi_fd(req, fi), op);
>  
>      fuse_reply_err(req, res == -1 ? errno : 0);
>  }
> @@ -1534,17 +1602,19 @@ static void lo_copy_file_range(fuse_req_t req, fuse_ino_t ino_in, off_t off_in,
>                                 off_t off_out, struct fuse_file_info *fi_out,
>                                 size_t len, int flags)
>  {
> +    int in_fd, out_fd;
>      ssize_t res;
>  
> -    if (lo_debug(req))
> -        fuse_log(FUSE_LOG_DEBUG,
> -                 "lo_copy_file_range(ino=%" PRIu64 "/fd=%lu, "
> -                 "off=%lu, ino=%" PRIu64 "/fd=%lu, "
> -                 "off=%lu, size=%zd, flags=0x%x)\n",
> -                 ino_in, fi_in->fh, off_in, ino_out, fi_out->fh, off_out, len,
> -                 flags);
> +    in_fd = lo_fi_fd(req, fi_in);
> +    out_fd = lo_fi_fd(req, fi_out);
> +
> +    fuse_log(FUSE_LOG_DEBUG,
> +             "lo_copy_file_range(ino=%" PRIu64 "/fd=%d, "
> +             "off=%lu, ino=%" PRIu64 "/fd=%d, "
> +             "off=%lu, size=%zd, flags=0x%x)\n",
> +             ino_in, in_fd, off_in, ino_out, out_fd, off_out, len, flags);
>  
> -    res = copy_file_range(fi_in->fh, &off_in, fi_out->fh, &off_out, len, flags);
> +    res = copy_file_range(in_fd, &off_in, out_fd, &off_out, len, flags);
>      if (res < 0) {
>          fuse_reply_err(req, -errno);
>      } else {
> @@ -1559,7 +1629,7 @@ static void lo_lseek(fuse_req_t req, fuse_ino_t ino, off_t off, int whence,
>      off_t res;
>  
>      (void)ino;
> -    res = lseek(fi->fh, off, whence);
> +    res = lseek(lo_fi_fd(req, fi), off, whence);
>      if (res != -1) {
>          fuse_reply_lseek(req, res);
>      } else {
> @@ -1644,6 +1714,7 @@ int main(int argc, char *argv[])
>      root_elem->inode = &lo.root;
>  
>      lo_map_init(&lo.dirp_map);
> +    lo_map_init(&lo.fd_map);
>  
>      if (fuse_parse_cmdline(&args, &opts) != 0) {
>          return 1;
> @@ -1741,6 +1812,7 @@ err_out2:
>  err_out1:
>      fuse_opt_free_args(&args);
>  
> +    lo_map_destroy(&lo.fd_map);
>      lo_map_destroy(&lo.dirp_map);
>      lo_map_destroy(&lo.ino_map);
>  
> -- 

Looks good to me.

Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 037/104] virtiofsd: passthrough_ll: add fallback for racy ops
  2019-12-12 16:37 ` [PATCH 037/104] virtiofsd: passthrough_ll: add fallback for racy ops Dr. David Alan Gilbert (git)
@ 2020-01-18 16:22   ` Masayoshi Mizuma
  2020-01-20 13:26     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Masayoshi Mizuma @ 2020-01-18 16:22 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On Thu, Dec 12, 2019 at 04:37:57PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: Miklos Szeredi <mszeredi@redhat.com>
> 
> We have two operations that cannot be done race-free on a symlink in
> certain cases: utimes and link.
> 
> Add racy fallback for these if the race-free method doesn't work.  We do
> our best to avoid races even in this case:
> 
>   - get absolute path by reading /proc/self/fd/NN symlink
> 
>   - lookup parent directory: after this we are safe against renames in
>     ancestors
> 
>   - lookup name in parent directory, and verify that we got to the original
>     inode,  if not retry the whole thing
> 
> Both utimes(2) and link(2) hold i_lock on the inode across the operation,
> so a racing rename/delete by this fuse instance is not possible, only from
> other entities changing the filesystem.
> 
> If the "norace" option is given, then disable the racy fallbacks.
> 
> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 159 +++++++++++++++++++++++++++----
>  1 file changed, 142 insertions(+), 17 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 93e74cce21..1faae2753f 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -98,6 +98,7 @@ enum {
>  struct lo_data {
>      pthread_mutex_t mutex;
>      int debug;
> +    int norace;
>      int writeback;
>      int flock;
>      int xattr;
> @@ -124,10 +125,15 @@ static const struct fuse_opt lo_opts[] = {
>      { "cache=never", offsetof(struct lo_data, cache), CACHE_NEVER },
>      { "cache=auto", offsetof(struct lo_data, cache), CACHE_NORMAL },
>      { "cache=always", offsetof(struct lo_data, cache), CACHE_ALWAYS },
> -
> +    { "norace", offsetof(struct lo_data, norace), 1 },
>      FUSE_OPT_END
>  };
>  
> +static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n);
> +
> +static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st);
> +
> +
>  static struct lo_data *lo_data(fuse_req_t req)
>  {
>      return (struct lo_data *)fuse_req_userdata(req);
> @@ -347,23 +353,127 @@ static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
>      fuse_reply_attr(req, &buf, lo->timeout);
>  }
>  
> -static int utimensat_empty_nofollow(struct lo_inode *inode,
> -                                    const struct timespec *tv)
> +static int lo_parent_and_name(struct lo_data *lo, struct lo_inode *inode,
> +                              char path[PATH_MAX], struct lo_inode **parent)
>  {
> -    int res;
>      char procname[64];
> +    char *last;
> +    struct stat stat;
> +    struct lo_inode *p;
> +    int retries = 2;
> +    int res;
> +
> +retry:
> +    sprintf(procname, "/proc/self/fd/%i", inode->fd);
> +
> +    res = readlink(procname, path, PATH_MAX);
> +    if (res < 0) {

> +        fuse_log(FUSE_LOG_WARNING, "lo_parent_and_name: readlink failed: %m\n");

I think it's better to use __func__ here, in case the function name
changes in the future.

           fuse_log(FUSE_LOG_WARNING, "%s: readlink failed: %m\n", __func__);

> +        goto fail_noretry;
> +    }
> +
> +    if (res >= PATH_MAX) {

> +        fuse_log(FUSE_LOG_WARNING, "lo_parent_and_name: readlink overflowed\n");

           fuse_log(FUSE_LOG_WARNING, "%s: readlink overflowed\n", __func__);

> +        goto fail_noretry;
> +    }
> +    path[res] = '\0';
> +
> +    last = strrchr(path, '/');
> +    if (last == NULL) {
> +        /* Shouldn't happen */

> +        fuse_log(
> +            FUSE_LOG_WARNING,
> +            "lo_parent_and_name: INTERNAL ERROR: bad path read from proc\n");

           fuse_log(
               FUSE_LOG_WARNING,
               "%s: INTERNAL ERROR: bad path read from proc\n", __func__);

> +        goto fail_noretry;
> +    }
> +    if (last == path) {
> +        p = &lo->root;
> +        pthread_mutex_lock(&lo->mutex);
> +        p->refcount++;
> +        pthread_mutex_unlock(&lo->mutex);
> +    } else {
> +        *last = '\0';
> +        res = fstatat(AT_FDCWD, last == path ? "/" : path, &stat, 0);
> +        if (res == -1) {
> +            if (!retries) {

> +                fuse_log(FUSE_LOG_WARNING,
> +                         "lo_parent_and_name: failed to stat parent: %m\n");

                   fuse_log(FUSE_LOG_WARNING,
                            "%s: failed to stat parent: %m\n", __func__);

> +            }
> +            goto fail;
> +        }
> +        p = lo_find(lo, &stat);
> +        if (p == NULL) {
> +            if (!retries) {

> +                fuse_log(FUSE_LOG_WARNING,
> +                         "lo_parent_and_name: failed to find parent\n");

                   fuse_log(FUSE_LOG_WARNING,
                            "%s: failed to find parent\n", __func__);

> +            }
> +            goto fail;
> +        }
> +    }
> +    last++;
> +    res = fstatat(p->fd, last, &stat, AT_SYMLINK_NOFOLLOW);
> +    if (res == -1) {
> +        if (!retries) {

> +            fuse_log(FUSE_LOG_WARNING,
> +                     "lo_parent_and_name: failed to stat last\n");

               fuse_log(FUSE_LOG_WARNING,
                        "%s: failed to stat last\n", __func__);

> +        }
> +        goto fail_unref;
> +    }
> +    if (stat.st_dev != inode->dev || stat.st_ino != inode->ino) {
> +        if (!retries) {

> +            fuse_log(FUSE_LOG_WARNING,
> +                     "lo_parent_and_name: failed to match last\n");

               fuse_log(FUSE_LOG_WARNING,
                        "%s: failed to match last\n", __func__);

> +        }
> +        goto fail_unref;
> +    }
> +    *parent = p;
> +    memmove(path, last, strlen(last) + 1);
> +
> +    return 0;
> +
> +fail_unref:
> +    unref_inode(lo, p, 1);
> +fail:
> +    if (retries) {
> +        retries--;
> +        goto retry;
> +    }
> +fail_noretry:
> +    errno = EIO;
> +    return -1;
> +}
> +
> +static int utimensat_empty(struct lo_data *lo, struct lo_inode *inode,
> +                           const struct timespec *tv)
> +{
> +    int res;
> +    struct lo_inode *parent;
> +    char path[PATH_MAX];
>  
>      if (inode->is_symlink) {
> -        res = utimensat(inode->fd, "", tv, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
> +        res = utimensat(inode->fd, "", tv, AT_EMPTY_PATH);
>          if (res == -1 && errno == EINVAL) {
>              /* Sorry, no race free way to set times on symlink. */
> -            errno = EPERM;
> +            if (lo->norace) {
> +                errno = EPERM;
> +            } else {
> +                goto fallback;
> +            }
>          }
>          return res;
>      }
> -    sprintf(procname, "/proc/self/fd/%i", inode->fd);
> +    sprintf(path, "/proc/self/fd/%i", inode->fd);
> +
> +    return utimensat(AT_FDCWD, path, tv, 0);
>  
> -    return utimensat(AT_FDCWD, procname, tv, 0);
> +fallback:
> +    res = lo_parent_and_name(lo, inode, path, &parent);
> +    if (res != -1) {
> +        res = utimensat(parent->fd, path, tv, AT_SYMLINK_NOFOLLOW);
> +        unref_inode(lo, parent, 1);
> +    }
> +
> +    return res;
>  }
>  
>  static int lo_fi_fd(fuse_req_t req, struct fuse_file_info *fi)
> @@ -387,6 +497,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>  {
>      int saverr;
>      char procname[64];
> +    struct lo_data *lo = lo_data(req);
>      struct lo_inode *inode;
>      int ifd;
>      int res;
> @@ -459,7 +570,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>          if (fi) {
>              res = futimens(fd, tv);
>          } else {
> -            res = utimensat_empty_nofollow(inode, tv);
> +            res = utimensat_empty(lo, inode, tv);
>          }
>          if (res == -1) {
>              goto out_err;
> @@ -692,24 +803,38 @@ static void lo_symlink(fuse_req_t req, const char *link, fuse_ino_t parent,
>      lo_mknod_symlink(req, parent, name, S_IFLNK, 0, link);
>  }
>  
> -static int linkat_empty_nofollow(struct lo_inode *inode, int dfd,
> -                                 const char *name)
> +static int linkat_empty_nofollow(struct lo_data *lo, struct lo_inode *inode,
> +                                 int dfd, const char *name)
>  {
>      int res;
> -    char procname[64];
> +    struct lo_inode *parent;
> +    char path[PATH_MAX];
>  
>      if (inode->is_symlink) {
>          res = linkat(inode->fd, "", dfd, name, AT_EMPTY_PATH);
>          if (res == -1 && (errno == ENOENT || errno == EINVAL)) {
>              /* Sorry, no race free way to hard-link a symlink. */
> -            errno = EPERM;
> +            if (lo->norace) {
> +                errno = EPERM;
> +            } else {
> +                goto fallback;
> +            }
>          }
>          return res;
>      }
>  
> -    sprintf(procname, "/proc/self/fd/%i", inode->fd);
> +    sprintf(path, "/proc/self/fd/%i", inode->fd);
> +
> +    return linkat(AT_FDCWD, path, dfd, name, AT_SYMLINK_FOLLOW);
> +
> +fallback:
> +    res = lo_parent_and_name(lo, inode, path, &parent);
> +    if (res != -1) {
> +        res = linkat(parent->fd, path, dfd, name, 0);
> +        unref_inode(lo, parent, 1);
> +    }
>  
> -    return linkat(AT_FDCWD, procname, dfd, name, AT_SYMLINK_FOLLOW);
> +    return res;
>  }
>  
>  static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
> @@ -731,7 +856,7 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
>      e.attr_timeout = lo->timeout;
>      e.entry_timeout = lo->timeout;
>  
> -    res = linkat_empty_nofollow(inode, lo_fd(req, parent), name);
> +    res = linkat_empty_nofollow(lo, inode, lo_fd(req, parent), name);
>      if (res == -1) {
>          goto out_err;
>      }
> @@ -1544,7 +1669,7 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
>      }
>  
>      if (inode->is_symlink) {
> -        /* Sorry, no race free way to setxattr on symlink. */
> +        /* Sorry, no race free way to removexattr on symlink. */
>          saverr = EPERM;
>          goto out;
>      }

Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 040/104] virtiofsd: Pass write iov's all the way through
  2019-12-12 16:38 ` [PATCH 040/104] virtiofsd: Pass write iov's all the way through Dr. David Alan Gilbert (git)
@ 2020-01-19  8:08   ` Xiao Yang
  2020-01-20  8:24     ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 307+ messages in thread
From: Xiao Yang @ 2020-01-19  8:08 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On 2019/12/13 0:38, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Pass the write iov pointing to guest RAM all the way through rather
> than copying the data.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   tools/virtiofsd/fuse_virtio.c | 79 ++++++++++++++++++++++++++++++++---
>   1 file changed, 73 insertions(+), 6 deletions(-)
>
> diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> index 99c877ea2e..3c778b6296 100644
> --- a/tools/virtiofsd/fuse_virtio.c
> +++ b/tools/virtiofsd/fuse_virtio.c
> @@ -452,6 +452,10 @@ static void *fv_queue_thread(void *opaque)
>                    __func__, qi->qidx, (size_t)evalue, in_bytes, out_bytes);
>
>           while (1) {
> +            bool allocated_bufv = false;
> +            struct fuse_bufvec bufv;
> +            struct fuse_bufvec *pbufv;
> +
>               /*
>                * An element contains one request and the space to send our
>                * response They're spread over multiple descriptors in a
> @@ -493,14 +497,76 @@ static void *fv_queue_thread(void *opaque)
>                            __func__, elem->index);
>                   assert(0); /* TODO */
>               }
> -            copy_from_iov(&fbuf, out_num, out_sg);
> -            fbuf.size = out_len;
> +            /* Copy just the first element and look at it */
> +            copy_from_iov(&fbuf, 1, out_sg);
> +
> +            if (out_num > 2 &&
> +                out_sg[0].iov_len == sizeof(struct fuse_in_header) &&
> +                ((struct fuse_in_header *)fbuf.mem)->opcode == FUSE_WRITE &&
> +                out_sg[1].iov_len == sizeof(struct fuse_write_in)) {
> +                /*
> +                 * For a write we don't actually need to copy the
> +                 * data, we can just do it straight out of guest memory
> +                 * but we must still copy the headers in case the guest
> +                 * was nasty and changed them while we were using them.
> +                 */
> +                fuse_log(FUSE_LOG_DEBUG, "%s: Write special case\n", __func__);
> +
> +                /* copy the fuse_write_in header after the fuse_in_header */
> +                fbuf.mem += out_sg->iov_len;
> +                copy_from_iov(&fbuf, 1, out_sg + 1);
> +                fbuf.mem -= out_sg->iov_len;
> +                fbuf.size = out_sg[0].iov_len + out_sg[1].iov_len;
> +
> +                /* Allocate the bufv, with space for the rest of the iov */
> +                allocated_bufv = true;
> +                pbufv = malloc(sizeof(struct fuse_bufvec) +
> +                               sizeof(struct fuse_buf) * (out_num - 2));
> +                if (!pbufv) {
> +                    vu_queue_unpop(dev, q, elem, 0);
> +                    free(elem);
> +                    fuse_log(FUSE_LOG_ERR, "%s: pbufv malloc failed\n",
> +                             __func__);
> +                    goto out;
> +                }
> +
> +                pbufv->count = 1;
> +                pbufv->buf[0] = fbuf;
> +
> +                size_t iovindex, pbufvindex;
> +                iovindex = 2; /* 2 headers, separate iovs */
> +                pbufvindex = 1; /* 2 headers, 1 fusebuf */
> +
> +                for (; iovindex < out_num; iovindex++, pbufvindex++) {
> +                    pbufv->count++;
> +                    pbufv->buf[pbufvindex].pos = ~0; /* Dummy */
> +                    pbufv->buf[pbufvindex].flags = 0;
> +                    pbufv->buf[pbufvindex].mem = out_sg[iovindex].iov_base;
> +                    pbufv->buf[pbufvindex].size = out_sg[iovindex].iov_len;
> +                }
> +            } else {
> +                /* Normal (non fast write) path */
> +
> +                /* Copy the rest of the buffer */
> +                fbuf.mem += out_sg->iov_len;
> +                copy_from_iov(&fbuf, out_num - 1, out_sg + 1);
> +                fbuf.mem -= out_sg->iov_len;
> +                fbuf.size = out_len;
>
> -            /* TODO! Endianness of header */
> +                /* TODO! Endianness of header */
>
> -            /* TODO: Add checks for fuse_session_exited */
> -            struct fuse_bufvec bufv = { .buf[0] = fbuf, .count = 1 };
> -            fuse_session_process_buf_int(se, &bufv, &ch);
> +                /* TODO: Add checks for fuse_session_exited */
> +                bufv.buf[0] = fbuf;
> +                bufv.count = 1;
> +                pbufv = &bufv;
> +            }
> +            pbufv->idx = 0;
> +            pbufv->off = 0;
> +            fuse_session_process_buf_int(se, pbufv, &ch);
> +
> +            if (allocated_bufv) {
> +                free(pbufv);
> +            }
>
>               if (!qi->reply_sent) {
>                   fuse_log(FUSE_LOG_DEBUG, "%s: elem %d no reply sent\n",
> @@ -514,6 +580,7 @@ static void *fv_queue_thread(void *opaque)
>               elem = NULL;
>           }
>       }
> +out:
>       pthread_mutex_destroy(&ch.lock);
>       free(fbuf.mem);
>
Hi,

Tested the patch and got the correct data written by guest, so it looks 
fine to me.
Reviewed-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
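
As an aside on the comment in the patch about copying the headers "in
case the guest was nasty and changed them while we were using them":
the idea is to snapshot anything that gets validated into private
memory, while leaving the bulk data where it is. A minimal sketch,
assuming a made-up struct req_header in place of struct fuse_in_header
and a hypothetical parse_request() helper that is not part of
virtiofsd:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical header, standing in for struct fuse_in_header. */
struct req_header {
    uint32_t len;
    uint32_t opcode;
};

/* Hypothetical parsed request: private header, borrowed payload. */
struct parsed_req {
    struct req_header hdr;       /* private snapshot, safe to validate */
    const struct iovec *payload; /* still points into shared memory */
    size_t payload_cnt;
};

/*
 * Copy only the header out of the (potentially guest-writable) first
 * iovec element; the remaining elements are passed through by
 * reference.  Returns 0 on success, -1 if the layout is unexpected.
 */
static int parse_request(const struct iovec *sg, size_t num,
                         struct parsed_req *out)
{
    assert(out != NULL);
    if (num < 1 || sg[0].iov_len != sizeof(struct req_header)) {
        return -1;
    }
    /* After this memcpy, later changes by the other side are harmless. */
    memcpy(&out->hdr, sg[0].iov_base, sizeof(out->hdr));
    out->payload = sg + 1;
    out->payload_cnt = num - 1;
    return 0;
}
```

All checks (opcode, lengths) are then made against out->hdr, never
against the shared buffer, so a concurrent modification cannot change
the outcome of a check after it has been made.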

Best Regards,
Xiao Yang






^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 105/104] virtiofsd: Unref old/new inodes with the same mutex lock in lo_rename()
  2020-01-17 13:32 ` [PATCH 105/104] virtiofsd: Unref old/new inodes with the same mutex lock in lo_rename() Philippe Mathieu-Daudé
@ 2020-01-19  8:35   ` Xiao Yang
  2020-01-20  8:27     ` Philippe Mathieu-Daudé
  2020-01-20 18:52   ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 307+ messages in thread
From: Xiao Yang @ 2020-01-19  8:35 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Vivek Goyal, qemu-devel, stefanha, Dr . David Alan Gilbert

On 2020/1/17 21:32, Philippe Mathieu-Daudé wrote:
> We can unref both old/new inodes with the same mutex lock.
>
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
> Based-on: <20191212163904.159893-1-dgilbert@redhat.com>
> "virtiofs daemon"
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg664652.html
>
>   tools/virtiofsd/passthrough_ll.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 57f58aef26..5c717cb5a1 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -1461,8 +1461,10 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
>       }
>
>   out:
> -    unref_inode_lolocked(lo, oldinode, 1);
> -    unref_inode_lolocked(lo, newinode, 1);
> +    pthread_mutex_lock(&lo->mutex);
> +    unref_inode(lo, oldinode, 1);
> +    unref_inode(lo, newinode, 1);
> +    pthread_mutex_unlock(&lo->mutex);
Hi,

It seems to avoid calling pthread_mutex_lock and pthread_mutex_unlock twice.
Does the change fix some issues or improve the performance?

Best Regards,
Xiao Yang
>       lo_inode_put(lo, &oldinode);
>       lo_inode_put(lo, &newinode);
>       lo_inode_put(lo, &parent_inode);





^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 062/104] virtiofsd: Handle hard reboot
  2019-12-12 16:38 ` [PATCH 062/104] virtiofsd: Handle hard reboot Dr. David Alan Gilbert (git)
  2020-01-07 11:14   ` Daniel P. Berrangé
@ 2020-01-20  6:46   ` Misono Tomohiro
  2020-01-22 18:28     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 307+ messages in thread
From: Misono Tomohiro @ 2020-01-20  6:46 UTC (permalink / raw)
  To: dgilbert; +Cc: misono.tomohiro, qemu-devel, stefanha, vgoyal

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Handle a
>   mount
>   hard reboot (without unmount)
>   mount
> 
> we get another 'init' which FUSE doesn't normally expect.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> index 2d1d1a2e59..45125ef66a 100644
> --- a/tools/virtiofsd/fuse_lowlevel.c
> +++ b/tools/virtiofsd/fuse_lowlevel.c
> @@ -2436,7 +2436,21 @@ void fuse_session_process_buf_int(struct fuse_session *se,
>              goto reply_err;
>          }
>      } else if (in->opcode == FUSE_INIT || in->opcode == CUSE_INIT) {
> -        goto reply_err;
> +        if (fuse_lowlevel_is_virtio(se)) {

> +            /*
> +             * TODO: This is after a hard reboot typically, we need to do
> +             * a destroy, but we can't reply to this request yet so
> +             * we can't use do_destroy
> +             */

Hi,

I wonder what the TODO actually is. Is it just to provide a common
function for both this path and do_destroy(), or is there more to it?

Thanks
Misono

> +            fuse_log(FUSE_LOG_DEBUG, "%s: reinit\n", __func__);
> +            se->got_destroy = 1;
> +            se->got_init = 0;
> +            if (se->op.destroy) {
> +                se->op.destroy(se->userdata);
> +            }
> +        } else {
> +            goto reply_err;
> +        }
>      }
>  
>      err = EACCES;
> -- 
> 2.23.0


^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 040/104] virtiofsd: Pass write iov's all the way through
  2020-01-19  8:08   ` Xiao Yang
@ 2020-01-20  8:24     ` Philippe Mathieu-Daudé
  2020-01-20 13:28       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 307+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-01-20  8:24 UTC (permalink / raw)
  To: Xiao Yang, Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On 1/19/20 9:08 AM, Xiao Yang wrote:
> On 2019/12/13 0:38, Dr. David Alan Gilbert (git) wrote:
>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>
>> Pass the write iov pointing to guest RAM all the way through rather
>> than copying the data.
>>
>> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> ---
>>   tools/virtiofsd/fuse_virtio.c | 79 ++++++++++++++++++++++++++++++++---
>>   1 file changed, 73 insertions(+), 6 deletions(-)
>>
[...]
> Hi,
> 
> Tested the patch and got the correct data written by guest, so it looks 
> fine to me.
> Reviewed-by: Xiao Yang <yangx.jy@cn.fujitsu.com>

So also:
Tested-by: Xiao Yang <yangx.jy@cn.fujitsu.com>

> 
> Best Regards,
> Xiao Yang



^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 105/104] virtiofsd: Unref old/new inodes with the same mutex lock in lo_rename()
  2020-01-19  8:35   ` Xiao Yang
@ 2020-01-20  8:27     ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 307+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-01-20  8:27 UTC (permalink / raw)
  To: Xiao Yang; +Cc: Vivek Goyal, qemu-devel, stefanha, Dr . David Alan Gilbert

On 1/19/20 9:35 AM, Xiao Yang wrote:
> On 2020/1/17 21:32, Philippe Mathieu-Daudé wrote:
>> We can unref both old/new inodes with the same mutex lock.
>>
>> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
>> ---
>> Based-on: <20191212163904.159893-1-dgilbert@redhat.com>
>> "virtiofs daemon"
>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg664652.html
>>
>>   tools/virtiofsd/passthrough_ll.c | 6 ++++--
>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
>> index 57f58aef26..5c717cb5a1 100644
>> --- a/tools/virtiofsd/passthrough_ll.c
>> +++ b/tools/virtiofsd/passthrough_ll.c
>> @@ -1461,8 +1461,10 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
>>       }
>>
>>   out:
>> -    unref_inode_lolocked(lo, oldinode, 1);
>> -    unref_inode_lolocked(lo, newinode, 1);
>> +    pthread_mutex_lock(&lo->mutex);
>> +    unref_inode(lo, oldinode, 1);
>> +    unref_inode(lo, newinode, 1);
>> +    pthread_mutex_unlock(&lo->mutex);
> Hi,
> 
> It seems to avoid calling pthread_mutex_lock and pthread_mutex_unlock 
> twice.
> Does the change fix some issues or improve the performance?

It doesn't fix any issue; it is simply intended to improve performance.
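
To illustrate the pattern (this is a simplified sketch, not the
virtiofsd code: struct obj, table_lock, unref_locked() and unref_pair()
are made-up names), dropping two references under one lock acquisition
looks like this:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical refcounted object, standing in for struct lo_inode. */
struct obj {
    uint64_t refcount;
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Drop n references.  The caller must already hold table_lock. */
static void unref_locked(struct obj *o, uint64_t n)
{
    if (!o) {
        return;
    }
    assert(o->refcount >= n);
    o->refcount -= n;
}

/* Drop one reference on each object with a single lock/unlock pair. */
static void unref_pair(struct obj *a, struct obj *b)
{
    pthread_mutex_lock(&table_lock);
    unref_locked(a, 1);
    unref_locked(b, 1);
    pthread_mutex_unlock(&table_lock);
}
```

The saving is one lock/unlock round trip per call; whether that is
measurable would depend on how contended the mutex is.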

> Best Regards,
> Xiao Yang
>>       lo_inode_put(lo, &oldinode);
>>       lo_inode_put(lo, &newinode);
>>       lo_inode_put(lo, &parent_inode);
> 
> 
> 



^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 070/104] virtiofsd: fail when parent inode isn't known in lo_do_lookup()
  2019-12-12 16:38 ` [PATCH 070/104] virtiofsd: fail when parent inode isn't known in lo_do_lookup() Dr. David Alan Gilbert (git)
  2020-01-16  7:17   ` Misono Tomohiro
@ 2020-01-20 10:08   ` Sergio Lopez
  1 sibling, 0 replies; 307+ messages in thread
From: Sergio Lopez @ 2020-01-20 10:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: vgoyal, stefanha

[-- Attachment #1: Type: text/plain, Size: 634 bytes --]


Dr. David Alan Gilbert (git) <dgilbert@redhat.com> writes:

> From: Miklos Szeredi <mszeredi@redhat.com>
>
> The Linux file handle APIs (struct export_operations) can access inodes
> that are not attached to parents because path name traversal is not
> performed.  Refuse if there is no parent in lo_do_lookup().
>
> Also clean up lo_do_lookup() while we're here.
>
> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)

Reviewed-by: Sergio Lopez <slp@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 071/104] virtiofsd: extract root inode init into setup_root()
  2019-12-12 16:38 ` [PATCH 071/104] virtiofsd: extract root inode init into setup_root() Dr. David Alan Gilbert (git)
  2020-01-16  7:20   ` Misono Tomohiro
@ 2020-01-20 10:09   ` Sergio Lopez
  1 sibling, 0 replies; 307+ messages in thread
From: Sergio Lopez @ 2020-01-20 10:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: vgoyal, stefanha

[-- Attachment #1: Type: text/plain, Size: 438 bytes --]


Dr. David Alan Gilbert (git) <dgilbert@redhat.com> writes:

> From: Miklos Szeredi <mszeredi@redhat.com>
>
> Initialize the root inode in a single place.
>
> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)

Reviewed-by: Sergio Lopez <slp@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 072/104] virtiofsd: passthrough_ll: fix refcounting on remove/rename
  2019-12-12 16:38 ` [PATCH 072/104] virtiofsd: passthrough_ll: fix refcounting on remove/rename Dr. David Alan Gilbert (git)
  2020-01-16 11:56   ` Misono Tomohiro
@ 2020-01-20 10:17   ` Sergio Lopez
  2020-01-20 10:56     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 307+ messages in thread
From: Sergio Lopez @ 2020-01-20 10:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr. David Alan Gilbert, stefanha, vgoyal

[-- Attachment #1: Type: text/plain, Size: 414 bytes --]


Dr. David Alan Gilbert (git) <dgilbert@redhat.com> writes:

> From: Miklos Szeredi <mszeredi@redhat.com>
>
> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 50 +++++++++++++++++++++++++++++++-
>  1 file changed, 49 insertions(+), 1 deletion(-)

This one is missing a commit message, and I think the patch isn't
trivial enough to give it a pass without it.

Sergio.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 084/104] Virtiofsd: fix memory leak on fuse queueinfo
  2019-12-12 16:38 ` [PATCH 084/104] Virtiofsd: fix memory leak on fuse queueinfo Dr. David Alan Gilbert (git)
  2020-01-15 11:20   ` Misono Tomohiro
@ 2020-01-20 10:24   ` Sergio Lopez
  2020-01-20 10:54     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 307+ messages in thread
From: Sergio Lopez @ 2020-01-20 10:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr. David Alan Gilbert, stefanha, vgoyal

[-- Attachment #1: Type: text/plain, Size: 1417 bytes --]


Dr. David Alan Gilbert (git) <dgilbert@redhat.com> writes:

> From: Liu Bo <bo.liu@linux.alibaba.com>
>
> For fuse's queueinfo, both queueinfo array and queueinfos are allocated in
> fv_queue_set_started() but not cleaned up when the daemon process quits.
>
> This fixes the leak in proper places.
>
> Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
> Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
> ---
>  tools/virtiofsd/fuse_virtio.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> index 7b22ae8d4f..a364f23d5d 100644
> --- a/tools/virtiofsd/fuse_virtio.c
> +++ b/tools/virtiofsd/fuse_virtio.c
> @@ -671,6 +671,8 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
>          }
>          close(ourqi->kill_fd);
>          ourqi->kick_fd = -1;
> +        free(vud->qi[qidx]);
> +        vud->qi[qidx] = NULL;
>      }
>  }
>  
> @@ -878,6 +880,13 @@ int virtio_session_mount(struct fuse_session *se)
>  void virtio_session_close(struct fuse_session *se)
>  {
>      close(se->vu_socketfd);
> +
> +    if (!se->virtio_dev) {
> +        return;
> +    }
> +
> +    close(se->vu_socketfd);
> +    free(se->virtio_dev->qi);
>      free(se->virtio_dev);
>      se->virtio_dev = NULL;
>  }

There's a duplicated "close(se->vu_socketfd);" statement here.
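
A second close() on the same descriptor matters in a threaded daemon,
because the fd number may have been reused by another thread in
between. One common way to make cleanup paths safe against that is a
small helper that invalidates the descriptor after closing it. A rough
sketch, where close_once() is a hypothetical name and not something in
the series:

```c
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical helper: close *fd at most once, then mark it invalid
 * so that a repeated call becomes a harmless no-op.
 * Returns the result of close(), or 0 if there was nothing to close.
 */
static int close_once(int *fd)
{
    int res = 0;

    assert(fd != NULL);
    if (*fd >= 0) {
        res = close(*fd);
        *fd = -1;
    }
    return res;
}
```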

Sergio.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 093/104] virtiofsd: introduce inode refcount to prevent use-after-free
  2019-12-12 16:38 ` [PATCH 093/104] virtiofsd: introduce inode refcount to prevent use-after-free Dr. David Alan Gilbert (git)
  2020-01-16 12:25   ` Misono Tomohiro
@ 2020-01-20 10:28   ` Sergio Lopez
  1 sibling, 0 replies; 307+ messages in thread
From: Sergio Lopez @ 2020-01-20 10:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr. David Alan Gilbert, stefanha, vgoyal

[-- Attachment #1: Type: text/plain, Size: 1056 bytes --]


Dr. David Alan Gilbert (git) <dgilbert@redhat.com> writes:

> From: Stefan Hajnoczi <stefanha@redhat.com>
>
> If thread A is using an inode it must not be deleted by thread B when
> processing a FUSE_FORGET request.
>
> The FUSE protocol itself already has a counter called nlookup that is
> used in FUSE_FORGET messages.  We cannot trust this counter since the
> untrusted client can manipulate it via FUSE_FORGET messages.
>
> Introduce a new refcount to keep inodes alive for the required lifespan.
> lo_inode_put() must be called to release a reference.  FUSE's nlookup
> counter holds exactly one reference so that the inode stays alive as
> long as the client still wants to remember it.
>
> Note that the lo_inode->is_symlink field is moved to avoid creating a
> hole in the struct due to struct field alignment.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 168 ++++++++++++++++++++++++++-----
>  1 file changed, 145 insertions(+), 23 deletions(-)

Reviewed-by: Sergio Lopez <slp@redhat.com>
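
To illustrate the scheme the commit message describes, here is a minimal
sketch, not the code from the patch. The names inode_sketch, inode_get,
inode_put and inode_forget are hypothetical stand-ins, the nlookup
handling is shown single-threaded (the real daemon would need its own
locking for it), and clamping the forget count is an assumption made here
to keep the example safe against an untrusted client.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for lo_inode; only the two counters matter here. */
struct inode_sketch {
    atomic_int refcount;   /* keeps the memory alive while anyone uses it */
    unsigned long nlookup; /* client-visible count, driven by FUSE_FORGET */
};

static struct inode_sketch *inode_new(void)
{
    struct inode_sketch *i = calloc(1, sizeof(*i));

    if (i) {
        atomic_init(&i->refcount, 1); /* the single reference owned by nlookup */
        i->nlookup = 1;
    }
    return i;
}

/* Take a reference before using the inode in a request handler. */
static void inode_get(struct inode_sketch *i)
{
    atomic_fetch_add(&i->refcount, 1);
}

/* Drop a reference; returns true if this call freed the inode. */
static bool inode_put(struct inode_sketch *i)
{
    if (atomic_fetch_sub(&i->refcount, 1) == 1) {
        free(i);
        return true;
    }
    return false;
}

/*
 * Handle a forget of n lookups.  The count comes from the client, so it
 * is clamped (an assumption of this sketch).  Only when nlookup reaches
 * zero is the one reference held on its behalf released.
 */
static bool inode_forget(struct inode_sketch *i, unsigned long n)
{
    if (n > i->nlookup) {
        n = i->nlookup;
    }
    i->nlookup -= n;
    return i->nlookup == 0 ? inode_put(i) : false;
}
```

The point of the split is that a forget from the client can only ever
drop one reference, so a thread that took its own reference with
inode_get() still holds a live inode until it calls inode_put().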

^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 094/104] virtiofsd: do not always set FUSE_FLOCK_LOCKS
  2019-12-12 16:38 ` [PATCH 094/104] virtiofsd: do not always set FUSE_FLOCK_LOCKS Dr. David Alan Gilbert (git)
  2020-01-17  8:50   ` Misono Tomohiro
@ 2020-01-20 10:31   ` Sergio Lopez
  1 sibling, 0 replies; 307+ messages in thread
From: Sergio Lopez @ 2020-01-20 10:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr. David Alan Gilbert, stefanha, vgoyal


Dr. David Alan Gilbert (git) <dgilbert@redhat.com> writes:

> From: Peng Tao <tao.peng@linux.alibaba.com>
>
> Right now we always enable it regardless of given commandlines.
> Fix it by setting the flag relying on the lo->flock bit.
>
> Signed-off-by: Peng Tao <tao.peng@linux.alibaba.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)

Reviewed-by: Sergio Lopez <slp@redhat.com>

^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 084/104] Virtiofsd: fix memory leak on fuse queueinfo
  2020-01-20 10:24   ` Sergio Lopez
@ 2020-01-20 10:54     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-20 10:54 UTC (permalink / raw)
  To: Sergio Lopez; +Cc: qemu-devel, stefanha, vgoyal

* Sergio Lopez (slp@redhat.com) wrote:
> 
> Dr. David Alan Gilbert (git) <dgilbert@redhat.com> writes:
> 
> > From: Liu Bo <bo.liu@linux.alibaba.com>
> >
> > For fuse's queueinfo, both queueinfo array and queueinfos are allocated in
> > fv_queue_set_started() but not cleaned up when the daemon process quits.
> >
> > This fixes the leak in proper places.
> >
> > Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
> > Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
> > ---
> >  tools/virtiofsd/fuse_virtio.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> > index 7b22ae8d4f..a364f23d5d 100644
> > --- a/tools/virtiofsd/fuse_virtio.c
> > +++ b/tools/virtiofsd/fuse_virtio.c
> > @@ -671,6 +671,8 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool started)
> >          }
> >          close(ourqi->kill_fd);
> >          ourqi->kick_fd = -1;
> > +        free(vud->qi[qidx]);
> > +        vud->qi[qidx] = NULL;
> >      }
> >  }
> >  
> > @@ -878,6 +880,13 @@ int virtio_session_mount(struct fuse_session *se)
> >  void virtio_session_close(struct fuse_session *se)
> >  {
> >      close(se->vu_socketfd);
> > +
> > +    if (!se->virtio_dev) {
> > +        return;
> > +    }
> > +
> > +    close(se->vu_socketfd);
> > +    free(se->virtio_dev->qi);
> >      free(se->virtio_dev);
> >      se->virtio_dev = NULL;
> >  }
> 
> There's a duplicated "close(se->vu_socketfd);" statement here.

Yep, already spotted/fixed:
  https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg02901.html
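
For readers without the linked message to hand, a rough sketch of the
shape of a close path that closes the socket only once. This is not the
fix from that mail; session_sketch and dev_sketch are hypothetical
stand-ins for fuse_session and the virtio device, and resetting the fd to
-1 is an extra precaution added in this sketch. Closing the same fd
number twice is worth avoiding because another thread may have been
handed that number in between.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-ins for the real structures. */
struct dev_sketch {
    void **qi; /* array of per-queue info pointers */
};

struct session_sketch {
    int vu_socketfd;
    struct dev_sketch *virtio_dev;
};

static void session_close_sketch(struct session_sketch *se)
{
    /* Close the socket exactly once and remember that it is gone. */
    if (se->vu_socketfd >= 0) {
        close(se->vu_socketfd);
        se->vu_socketfd = -1;
    }

    if (!se->virtio_dev) {
        return;
    }

    /* Free the queueinfo array, then the device itself. */
    free(se->virtio_dev->qi);
    free(se->virtio_dev);
    se->virtio_dev = NULL;
}
```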

Dave

> Sergio.


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 072/104] virtiofsd: passthrough_ll: fix refcounting on remove/rename
  2020-01-20 10:17   ` Sergio Lopez
@ 2020-01-20 10:56     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-20 10:56 UTC (permalink / raw)
  To: Sergio Lopez; +Cc: qemu-devel, stefanha, vgoyal

* Sergio Lopez (slp@redhat.com) wrote:
> 
> Dr. David Alan Gilbert (git) <dgilbert@redhat.com> writes:
> 
> > From: Miklos Szeredi <mszeredi@redhat.com>
> >
> > Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> > ---
> >  tools/virtiofsd/passthrough_ll.c | 50 +++++++++++++++++++++++++++++++-
> >  1 file changed, 49 insertions(+), 1 deletion(-)
> 
> This one is missing a commit message, and I think the patch isn't
> trivial enough to give it a pass without it.

Yep, see discussion:
https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg03296.html

> Sergio.


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 026/104] virtiofsd: Fast path for virtio read
  2020-01-17 18:54   ` Masayoshi Mizuma
@ 2020-01-20 12:32     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-20 12:32 UTC (permalink / raw)
  To: Masayoshi Mizuma; +Cc: qemu-devel, stefanha, vgoyal

* Masayoshi Mizuma (msys.mizuma@gmail.com) wrote:
> On Thu, Dec 12, 2019 at 04:37:46PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Readv the data straight into the guests buffer.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > With fix by:
> > Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
> > ---
> >  tools/virtiofsd/fuse_lowlevel.c |   5 +
> >  tools/virtiofsd/fuse_virtio.c   | 159 ++++++++++++++++++++++++++++++++
> >  tools/virtiofsd/fuse_virtio.h   |   4 +
> >  3 files changed, 168 insertions(+)
> > 
> > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > index c2b114cf5b..5f80625652 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.c
> > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > @@ -475,6 +475,11 @@ static int fuse_send_data_iov_fallback(struct fuse_session *se,
> >          return fuse_send_msg(se, ch, iov, iov_count);
> >      }
> >  
> > +    if (fuse_lowlevel_is_virtio(se) && buf->count == 1 &&
> > +        buf->buf[0].flags == (FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK)) {
> > +        return virtio_send_data_iov(se, ch, iov, iov_count, buf, len);
> > +    }
> > +
> >      abort(); /* Will have taken vhost path */
> >      return 0;
> >  }
> > diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> > index c33e0f7e8c..146cd3f702 100644
> > --- a/tools/virtiofsd/fuse_virtio.c
> > +++ b/tools/virtiofsd/fuse_virtio.c
> > @@ -230,6 +230,165 @@ err:
> >      return ret;
> >  }
> >  
> > +/*
> > + * Callback from fuse_send_data_iov_* when it's virtio and the buffer
> > + * is a single FD with FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK
> > + * We need send the iov and then the buffer.
> > + * Return 0 on success
> > + */
> > +int virtio_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
> > +                         struct iovec *iov, int count, struct fuse_bufvec *buf,
> > +                         size_t len)
> > +{
> > +    int ret = 0;
> > +    VuVirtqElement *elem;
> > +    VuVirtq *q;
> > +
> > +    assert(count >= 1);
> > +    assert(iov[0].iov_len >= sizeof(struct fuse_out_header));
> > +
> > +    struct fuse_out_header *out = iov[0].iov_base;
> > +    /* TODO: Endianness! */
> > +
> > +    size_t iov_len = iov_size(iov, count);
> > +    size_t tosend_len = iov_len + len;
> > +
> > +    out->len = tosend_len;
> > +
> > +    fuse_log(FUSE_LOG_DEBUG, "%s: count=%d len=%zd iov_len=%zd\n", __func__,
> > +             count, len, iov_len);
> > +
> > +    /* unique == 0 is notification which we don't support */
> > +    assert(out->unique);
> > +
> > +    /* For virtio we always have ch */
> > +    assert(ch);
> > +    assert(!ch->qi->reply_sent);
> > +    elem = ch->qi->qe;
> > +    q = &ch->qi->virtio_dev->dev.vq[ch->qi->qidx];
> > +
> > +    /* The 'in' part of the elem is to qemu */
> > +    unsigned int in_num = elem->in_num;
> > +    struct iovec *in_sg = elem->in_sg;
> > +    size_t in_len = iov_size(in_sg, in_num);
> > +    fuse_log(FUSE_LOG_DEBUG, "%s: elem %d: with %d in desc of length %zd\n",
> > +             __func__, elem->index, in_num, in_len);
> > +
> > +    /*
> > +     * The elem should have room for a 'fuse_out_header' (out from fuse)
> > +     * plus the data based on the len in the header.
> > +     */
> > +    if (in_len < sizeof(struct fuse_out_header)) {
> > +        fuse_log(FUSE_LOG_ERR, "%s: elem %d too short for out_header\n",
> > +                 __func__, elem->index);
> 
> > +        ret = -E2BIG;
> 
> The ret should be a positive value, right?
> 
>            ret = E2BIG;

Yes, I think so.

> > +        goto err;
> > +    }
> > +    if (in_len < tosend_len) {
> > +        fuse_log(FUSE_LOG_ERR, "%s: elem %d too small for data len %zd\n",
> > +                 __func__, elem->index, tosend_len);
> 
> > +        ret = -E2BIG;
> 
>            ret = E2BIG;
> 
> > +        goto err;
> > +    }
> > +
> > +    /* TODO: Limit to 'len' */
> > +
> > +    /* First copy the header data from iov->in_sg */
> > +    copy_iov(iov, count, in_sg, in_num, iov_len);
> > +
> > +    /*
> > +     * Build a copy of the the in_sg iov so we can skip bits in it,
> > +     * including changing the offsets
> > +     */
> 
> > +    struct iovec *in_sg_cpy = calloc(sizeof(struct iovec), in_num);
> 
>        assert(in_sg_cpy) should be here? in case calloc() fails...

Thanks, added.

> > +    memcpy(in_sg_cpy, in_sg, sizeof(struct iovec) * in_num);
> > +    /* These get updated as we skip */
> > +    struct iovec *in_sg_ptr = in_sg_cpy;
> > +    int in_sg_cpy_count = in_num;
> > +
> > +    /* skip over parts of in_sg that contained the header iov */
> > +    size_t skip_size = iov_len;
> > +
> > +    size_t in_sg_left = 0;
> > +    do {
> > +        while (skip_size != 0 && in_sg_cpy_count) {
> > +            if (skip_size >= in_sg_ptr[0].iov_len) {
> > +                skip_size -= in_sg_ptr[0].iov_len;
> > +                in_sg_ptr++;
> > +                in_sg_cpy_count--;
> > +            } else {
> > +                in_sg_ptr[0].iov_len -= skip_size;
> > +                in_sg_ptr[0].iov_base += skip_size;
> > +                break;
> > +            }
> > +        }
> > +
> > +        int i;
> > +        for (i = 0, in_sg_left = 0; i < in_sg_cpy_count; i++) {
> > +            in_sg_left += in_sg_ptr[i].iov_len;
> > +        }
> > +        fuse_log(FUSE_LOG_DEBUG,
> > +                 "%s: after skip skip_size=%zd in_sg_cpy_count=%d "
> > +                 "in_sg_left=%zd\n",
> > +                 __func__, skip_size, in_sg_cpy_count, in_sg_left);
> > +        ret = preadv(buf->buf[0].fd, in_sg_ptr, in_sg_cpy_count,
> > +                     buf->buf[0].pos);
> > +
> 
> > +        fuse_log(FUSE_LOG_DEBUG, "%s: preadv_res=%d(%m) len=%zd\n",
> > +                 __func__, ret, len);
> 
> Should "%m" be removed? It may show a stale errno even when preadv()
> succeeds. For example:
> 
> [ID: 00000079] virtio_send_data_iov: after skip skip_size=0 in_sg_cpy_count=1 in_sg_left=65536
> [ID: 00000079] virtio_send_data_iov: preadv_res=16000(No such file or directory) len=65536

I think there's another problem; that fuse_log might corrupt errno, so
we return a bad errno below it.
So I'll split it into two separate fuse_log's - one inside the (ret ==
-1) block with the %m and one after without it.
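
A minimal sketch of that split, assuming a hypothetical helper
read_into_iov() rather than the actual function from the patch, and
using fprintf() in place of fuse_log(). errno is captured before any
logging, the error text is printed only on the failure path, and the
helper returns a positive errno value as discussed above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for preadv() */
#endif
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Read from fd at offset pos into iov.
 * Returns 0 on success (byte count stored in *done) or a positive errno.
 */
static int read_into_iov(int fd, const struct iovec *iov, int iovcnt,
                         off_t pos, ssize_t *done)
{
    ssize_t res = preadv(fd, iov, iovcnt, pos);

    if (res == -1) {
        int saved = errno; /* capture before logging can overwrite it */

        fprintf(stderr, "%s: preadv failed: %s\n", __func__,
                strerror(saved));
        return saved;
    }

    /* Success path: no errno text, so a stale value is never printed. */
    fprintf(stderr, "%s: preadv_res=%zd\n", __func__, res);
    *done = res;
    return 0;
}
```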

> Otherwise, looks good to me:
> 
> Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>

Thanks

> 
> Thanks,
> Masa
> 
> > +        if (ret == -1) {
> > +            ret = errno;
> > +            free(in_sg_cpy);
> > +            goto err;
> > +        }
> > +        if (ret < len && ret) {
> > +            fuse_log(FUSE_LOG_DEBUG, "%s: ret < len\n", __func__);
> > +            /* Skip over this much next time around */
> > +            skip_size = ret;
> > +            buf->buf[0].pos += ret;
> > +            len -= ret;
> > +
> > +            /* Lets do another read */
> > +            continue;
> > +        }
> > +        if (!ret) {
> > +            /* EOF case? */
> > +            fuse_log(FUSE_LOG_DEBUG, "%s: !ret in_sg_left=%zd\n", __func__,
> > +                     in_sg_left);
> > +            break;
> > +        }
> > +        if (ret != len) {
> > +            fuse_log(FUSE_LOG_DEBUG, "%s: ret!=len\n", __func__);
> > +            ret = EIO;
> > +            free(in_sg_cpy);
> > +            goto err;
> > +        }
> > +        in_sg_left -= ret;
> > +        len -= ret;
> > +    } while (in_sg_left);
> > +    free(in_sg_cpy);
> > +
> > +    /* Need to fix out->len on EOF */
> > +    if (len) {
> > +        struct fuse_out_header *out_sg = in_sg[0].iov_base;
> > +
> > +        tosend_len -= len;
> > +        out_sg->len = tosend_len;
> > +    }
> > +
> > +    ret = 0;
> > +
> > +    vu_queue_push(&se->virtio_dev->dev, q, elem, tosend_len);
> > +    vu_queue_notify(&se->virtio_dev->dev, q);
> > +
> > +err:
> > +    if (ret == 0) {
> > +        ch->qi->reply_sent = true;
> > +    }
> > +
> > +    return ret;
> > +}
> > +
> >  /* Thread function for individual queues, created when a queue is 'started' */
> >  static void *fv_queue_thread(void *opaque)
> >  {
> > diff --git a/tools/virtiofsd/fuse_virtio.h b/tools/virtiofsd/fuse_virtio.h
> > index 135a14875a..cc676b9193 100644
> > --- a/tools/virtiofsd/fuse_virtio.h
> > +++ b/tools/virtiofsd/fuse_virtio.h
> > @@ -26,4 +26,8 @@ int virtio_loop(struct fuse_session *se);
> >  int virtio_send_msg(struct fuse_session *se, struct fuse_chan *ch,
> >                      struct iovec *iov, int count);
> >  
> > +int virtio_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
> > +                         struct iovec *iov, int count,
> > +                         struct fuse_bufvec *buf, size_t len);
> > +
> >  #endif
> > -- 
> > 2.23.0
> > 
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 100/104] virtiofsd: process requests in a thread pool
  2019-12-12 16:39 ` [PATCH 100/104] virtiofsd: process requests in a thread pool Dr. David Alan Gilbert (git)
@ 2020-01-20 12:54   ` Misono Tomohiro
  0 siblings, 0 replies; 307+ messages in thread
From: Misono Tomohiro @ 2020-01-20 12:54 UTC (permalink / raw)
  To: dgilbert; +Cc: misono.tomohiro, qemu-devel, stefanha, vgoyal

> From: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Introduce a thread pool so that fv_queue_thread() just pops
> VuVirtqElements and hands them to the thread pool.  For the time being
> only one worker thread is allowed since passthrough_ll.c is not
> thread-safe yet.  Future patches will lift this restriction so that
> multiple FUSE requests can be processed in parallel.
> 
> The main new concept is struct FVRequest, which contains both
> VuVirtqElement and struct fuse_chan.  We now have fv_VuDev for a device,
> fv_QueueInfo for a virtqueue, and FVRequest for a request.  Some of
> fv_QueueInfo's fields are moved into FVRequest because they are
> per-request.  The name FVRequest conforms to QEMU coding style and I
> expect the struct fv_* types will be renamed in a future refactoring.
> 
> This patch series is not optimal.  fbuf reuse is dropped so each request
> does malloc(se->bufsize), but there is no clean and cheap way to keep
> this with a thread pool.  The vq_lock mutex is held for longer than
> necessary, especially during the eventfd_write() syscall.  Performance
> can be improved in the future.
> 
> prctl(2) had to be added to the seccomp whitelist because glib invokes
> it.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Looks good to me.
 Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 037/104] virtiofsd: passthrough_ll: add fallback for racy ops
  2020-01-18 16:22   ` Masayoshi Mizuma
@ 2020-01-20 13:26     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-20 13:26 UTC (permalink / raw)
  To: Masayoshi Mizuma; +Cc: qemu-devel, stefanha, vgoyal

* Masayoshi Mizuma (msys.mizuma@gmail.com) wrote:
> On Thu, Dec 12, 2019 at 04:37:57PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: Miklos Szeredi <mszeredi@redhat.com>
> > 
> > We have two operations that cannot be done race-free on a symlink in
> > certain cases: utimes and link.
> > 
> > Add racy fallback for these if the race-free method doesn't work.  We do
> > our best to avoid races even in this case:
> > 
> >   - get absolute path by reading /proc/self/fd/NN symlink
> > 
> >   - lookup parent directory: after this we are safe against renames in
> >     ancestors
> > 
> >   - lookup name in parent directory, and verify that we got to the original
> >     inode,  if not retry the whole thing
> > 
> > Both utimes(2) and link(2) hold i_lock on the inode across the operation,
> > so a racing rename/delete by this fuse instance is not possible, only from
> > other entities changing the filesystem.
> > 
> > If the "norace" option is given, then disable the racy fallbacks.
> > 
> > Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> > ---
> >  tools/virtiofsd/passthrough_ll.c | 159 +++++++++++++++++++++++++++----
> >  1 file changed, 142 insertions(+), 17 deletions(-)
> > 
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index 93e74cce21..1faae2753f 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -98,6 +98,7 @@ enum {
> >  struct lo_data {
> >      pthread_mutex_t mutex;
> >      int debug;
> > +    int norace;
> >      int writeback;
> >      int flock;
> >      int xattr;
> > @@ -124,10 +125,15 @@ static const struct fuse_opt lo_opts[] = {
> >      { "cache=never", offsetof(struct lo_data, cache), CACHE_NEVER },
> >      { "cache=auto", offsetof(struct lo_data, cache), CACHE_NORMAL },
> >      { "cache=always", offsetof(struct lo_data, cache), CACHE_ALWAYS },
> > -
> > +    { "norace", offsetof(struct lo_data, norace), 1 },
> >      FUSE_OPT_END
> >  };
> >  
> > +static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n);
> > +
> > +static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st);
> > +
> > +
> >  static struct lo_data *lo_data(fuse_req_t req)
> >  {
> >      return (struct lo_data *)fuse_req_userdata(req);
> > @@ -347,23 +353,127 @@ static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
> >      fuse_reply_attr(req, &buf, lo->timeout);
> >  }
> >  
> > -static int utimensat_empty_nofollow(struct lo_inode *inode,
> > -                                    const struct timespec *tv)
> > +static int lo_parent_and_name(struct lo_data *lo, struct lo_inode *inode,
> > +                              char path[PATH_MAX], struct lo_inode **parent)
> >  {
> > -    int res;
> >      char procname[64];
> > +    char *last;
> > +    struct stat stat;
> > +    struct lo_inode *p;
> > +    int retries = 2;
> > +    int res;
> > +
> > +retry:
> > +    sprintf(procname, "/proc/self/fd/%i", inode->fd);
> > +
> > +    res = readlink(procname, path, PATH_MAX);
> > +    if (res < 0) {
> 
> > +        fuse_log(FUSE_LOG_WARNING, "lo_parent_and_name: readlink failed: %m\n");
> 
> I think it's better to use __func__ macro in case the function name is
> changed in the future.

Yes, agreed - I've changed them all

>            fuse_log(FUSE_LOG_WARNING, "%s: readlink failed: %m\n", __func__);
> 
> > +        goto fail_noretry;
> > +    }
> > +
> > +    if (res >= PATH_MAX) {
> 
> > +        fuse_log(FUSE_LOG_WARNING, "lo_parent_and_name: readlink overflowed\n");
> 
>            fuse_log(FUSE_LOG_WARNING, "%s: readlink overflowed\n", __func__);
> 
> > +        goto fail_noretry;
> > +    }
> > +    path[res] = '\0';
> > +
> > +    last = strrchr(path, '/');
> > +    if (last == NULL) {
> > +        /* Shouldn't happen */
> 
> > +        fuse_log(
> > +            FUSE_LOG_WARNING,
> > +            "lo_parent_and_name: INTERNAL ERROR: bad path read from proc\n");
> 
>            fuse_log(
>                FUSE_LOG_WARNING,
>             "%s: INTERNAL ERROR: bad path read from proc\n", __func__);
> 
> > +        goto fail_noretry;
> > +    }
> > +    if (last == path) {
> > +        p = &lo->root;
> > +        pthread_mutex_lock(&lo->mutex);
> > +        p->refcount++;
> > +        pthread_mutex_unlock(&lo->mutex);
> > +    } else {
> > +        *last = '\0';
> > +        res = fstatat(AT_FDCWD, last == path ? "/" : path, &stat, 0);
> > +        if (res == -1) {
> > +            if (!retries) {
> 
> > +                fuse_log(FUSE_LOG_WARNING,
> > +                         "lo_parent_and_name: failed to stat parent: %m\n");
> 
>                    fuse_log(FUSE_LOG_WARNING,
>                             "%s: failed to stat parent: %m\n", __func__);
> 
> > +            }
> > +            goto fail;
> > +        }
> > +        p = lo_find(lo, &stat);
> > +        if (p == NULL) {
> > +            if (!retries) {
> 
> > +                fuse_log(FUSE_LOG_WARNING,
> > +                         "lo_parent_and_name: failed to find parent\n");
> 
>                    fuse_log(FUSE_LOG_WARNING,
>                          "%s: failed to find parent\n", __func__);
> 
> > +            }
> > +            goto fail;
> > +        }
> > +    }
> > +    last++;
> > +    res = fstatat(p->fd, last, &stat, AT_SYMLINK_NOFOLLOW);
> > +    if (res == -1) {
> > +        if (!retries) {
> 
> > +            fuse_log(FUSE_LOG_WARNING,
> > +                     "lo_parent_and_name: failed to stat last\n");
> 
>                fuse_log(FUSE_LOG_WARNING,
>                      "%s: failed to stat last\n", __func__);
> 
> > +        }
> > +        goto fail_unref;
> > +    }
> > +    if (stat.st_dev != inode->dev || stat.st_ino != inode->ino) {
> > +        if (!retries) {
> 
> > +            fuse_log(FUSE_LOG_WARNING,
> > +                     "lo_parent_and_name: failed to match last\n");
> 
>                fuse_log(FUSE_LOG_WARNING,
>                         "%s: failed to match last\n", __func__);
> 
> > +        }
> > +        goto fail_unref;
> > +    }
> > +    *parent = p;
> > +    memmove(path, last, strlen(last) + 1);
> > +
> > +    return 0;
> > +
> > +fail_unref:
> > +    unref_inode(lo, p, 1);
> > +fail:
> > +    if (retries) {
> > +        retries--;
> > +        goto retry;
> > +    }
> > +fail_noretry:
> > +    errno = EIO;
> > +    return -1;
> > +}
> > +
> > +static int utimensat_empty(struct lo_data *lo, struct lo_inode *inode,
> > +                           const struct timespec *tv)
> > +{
> > +    int res;
> > +    struct lo_inode *parent;
> > +    char path[PATH_MAX];
> >  
> >      if (inode->is_symlink) {
> > -        res = utimensat(inode->fd, "", tv, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
> > +        res = utimensat(inode->fd, "", tv, AT_EMPTY_PATH);
> >          if (res == -1 && errno == EINVAL) {
> >              /* Sorry, no race free way to set times on symlink. */
> > -            errno = EPERM;
> > +            if (lo->norace) {
> > +                errno = EPERM;
> > +            } else {
> > +                goto fallback;
> > +            }
> >          }
> >          return res;
> >      }
> > -    sprintf(procname, "/proc/self/fd/%i", inode->fd);
> > +    sprintf(path, "/proc/self/fd/%i", inode->fd);
> > +
> > +    return utimensat(AT_FDCWD, path, tv, 0);
> >  
> > -    return utimensat(AT_FDCWD, procname, tv, 0);
> > +fallback:
> > +    res = lo_parent_and_name(lo, inode, path, &parent);
> > +    if (res != -1) {
> > +        res = utimensat(parent->fd, path, tv, AT_SYMLINK_NOFOLLOW);
> > +        unref_inode(lo, parent, 1);
> > +    }
> > +
> > +    return res;
> >  }
> >  
> >  static int lo_fi_fd(fuse_req_t req, struct fuse_file_info *fi)
> > @@ -387,6 +497,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
> >  {
> >      int saverr;
> >      char procname[64];
> > +    struct lo_data *lo = lo_data(req);
> >      struct lo_inode *inode;
> >      int ifd;
> >      int res;
> > @@ -459,7 +570,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
> >          if (fi) {
> >              res = futimens(fd, tv);
> >          } else {
> > -            res = utimensat_empty_nofollow(inode, tv);
> > +            res = utimensat_empty(lo, inode, tv);
> >          }
> >          if (res == -1) {
> >              goto out_err;
> > @@ -692,24 +803,38 @@ static void lo_symlink(fuse_req_t req, const char *link, fuse_ino_t parent,
> >      lo_mknod_symlink(req, parent, name, S_IFLNK, 0, link);
> >  }
> >  
> > -static int linkat_empty_nofollow(struct lo_inode *inode, int dfd,
> > -                                 const char *name)
> > +static int linkat_empty_nofollow(struct lo_data *lo, struct lo_inode *inode,
> > +                                 int dfd, const char *name)
> >  {
> >      int res;
> > -    char procname[64];
> > +    struct lo_inode *parent;
> > +    char path[PATH_MAX];
> >  
> >      if (inode->is_symlink) {
> >          res = linkat(inode->fd, "", dfd, name, AT_EMPTY_PATH);
> >          if (res == -1 && (errno == ENOENT || errno == EINVAL)) {
> >              /* Sorry, no race free way to hard-link a symlink. */
> > -            errno = EPERM;
> > +            if (lo->norace) {
> > +                errno = EPERM;
> > +            } else {
> > +                goto fallback;
> > +            }
> >          }
> >          return res;
> >      }
> >  
> > -    sprintf(procname, "/proc/self/fd/%i", inode->fd);
> > +    sprintf(path, "/proc/self/fd/%i", inode->fd);
> > +
> > +    return linkat(AT_FDCWD, path, dfd, name, AT_SYMLINK_FOLLOW);
> > +
> > +fallback:
> > +    res = lo_parent_and_name(lo, inode, path, &parent);
> > +    if (res != -1) {
> > +        res = linkat(parent->fd, path, dfd, name, 0);
> > +        unref_inode(lo, parent, 1);
> > +    }
> >  
> > -    return linkat(AT_FDCWD, procname, dfd, name, AT_SYMLINK_FOLLOW);
> > +    return res;
> >  }
> >  
> >  static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
> > @@ -731,7 +856,7 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
> >      e.attr_timeout = lo->timeout;
> >      e.entry_timeout = lo->timeout;
> >  
> > -    res = linkat_empty_nofollow(inode, lo_fd(req, parent), name);
> > +    res = linkat_empty_nofollow(lo, inode, lo_fd(req, parent), name);
> >      if (res == -1) {
> >          goto out_err;
> >      }
> > @@ -1544,7 +1669,7 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
> >      }
> >  
> >      if (inode->is_symlink) {
> > -        /* Sorry, no race free way to setxattr on symlink. */
> > +        /* Sorry, no race free way to removexattr on symlink. */

I've undone that change; since it looks bogus.

> >          saverr = EPERM;
> >          goto out;
> >      }
> 
> Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>

Thanks.

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 040/104] virtiofsd: Pass write iov's all the way through
  2020-01-20  8:24     ` Philippe Mathieu-Daudé
@ 2020-01-20 13:28       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-20 13:28 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé; +Cc: vgoyal, Xiao Yang, stefanha, qemu-devel

* Philippe Mathieu-Daudé (philmd@redhat.com) wrote:
> On 1/19/20 9:08 AM, Xiao Yang wrote:
> > On 2019/12/13 0:38, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert"<dgilbert@redhat.com>
> > > 
> > > Pass the write iov pointing to guest RAM all the way through rather
> > > than copying the data.
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert<dgilbert@redhat.com>
> > > ---
> > >   tools/virtiofsd/fuse_virtio.c | 79 ++++++++++++++++++++++++++++++++---
> > >   1 file changed, 73 insertions(+), 6 deletions(-)
> > > 
> [...]
> > Hi,
> > 
> > Tested the patch and got the correct data written by guest, so it looks
> > fine to me.
> > Reviewed-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
> 
> So also:
> Tested-by: Xiao Yang <yangx.jy@cn.fujitsu.com>

I'd take that, but only if it comes directly in a mail from Xiao Yang.

> > 
> > Best Regards,
> > Xiao Yang
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 307+ messages in thread

* Re: [PATCH 091/104] libvhost-user: Fix some memtable remap cases
  2020-01-17 13:58   ` Marc-André Lureau
@ 2020-01-20 15:50     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-20 15:50 UTC (permalink / raw)
  To: Marc-André Lureau; +Cc: QEMU, Stefan Hajnoczi, vgoyal

* Marc-André Lureau (marcandre.lureau@gmail.com) wrote:
> Hi
> 
> On Thu, Dec 12, 2019 at 10:05 PM Dr. David Alan Gilbert (git)
> <dgilbert@redhat.com> wrote:
> >
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > If a new setmemtable command comes in once the vhost threads are
> > running, it will remap the guests address space and the threads
> > will now be looking in the wrong place.
> >
> > Fortunately we're running this command under lock, so we can
> > update the queue mappings so that threads will look in the new-right
> > place.
> >
> > Note: This doesn't fix things that the threads might be doing
> > without a lock (e.g. a readv/writev!)  That's for another time.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  contrib/libvhost-user/libvhost-user.c | 33 ++++++++++++++++++++-------
> >  contrib/libvhost-user/libvhost-user.h |  3 +++
> >  2 files changed, 28 insertions(+), 8 deletions(-)
> >
> > diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
> > index 63e41062a4..b89bf18501 100644
> > --- a/contrib/libvhost-user/libvhost-user.c
> > +++ b/contrib/libvhost-user/libvhost-user.c
> > @@ -564,6 +564,21 @@ vu_reset_device_exec(VuDev *dev, VhostUserMsg *vmsg)
> >      return false;
> >  }
> >
> > +static bool
> > +map_ring(VuDev *dev, VuVirtq *vq)
> > +{
> > +    vq->vring.desc = qva_to_va(dev, vq->vra.desc_user_addr);
> > +    vq->vring.used = qva_to_va(dev, vq->vra.used_user_addr);
> > +    vq->vring.avail = qva_to_va(dev, vq->vra.avail_user_addr);
> > +
> > +    DPRINT("Setting virtq addresses:\n");
> > +    DPRINT("    vring_desc  at %p\n", vq->vring.desc);
> > +    DPRINT("    vring_used  at %p\n", vq->vring.used);
> > +    DPRINT("    vring_avail at %p\n", vq->vring.avail);
> > +
> > +    return !(vq->vring.desc && vq->vring.used && vq->vring.avail);
> > +}
> > +
> >  static bool
> >  vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
> >  {
> > @@ -767,6 +782,14 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
> >          close(vmsg->fds[i]);
> >      }
> >
> > +    for (i = 0; i < dev->max_queues; i++) {
> > +        if (dev->vq[i].vring.desc) {
> 
> The code usually checks for all (vq->vring.desc && vq->vring.used &&
> vq->vring.avail).
> 
> Perhaps we should introduce a VQ_RING_IS_SET() macro?

I'd like to understand why, and what to do in the case where only one
was set.  In this case I intend to make sure that no old mapping is left
in any of the three.
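
To make the two readings concrete, a small sketch with hypothetical
names; ring_sketch stands in for the three vring pointers and neither
macro exists in libvhost-user. An "all set" test is what a
VQ_RING_IS_SET()-style macro would presumably check, while an "any set"
test is one way to express "no old mapping left in any of the three".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the vring pointers inside VuVirtq. */
struct ring_sketch {
    void *desc;
    void *used;
    void *avail;
};

/* True only when all three pointers are mapped. */
#define RING_ALL_SET(r) ((r)->desc && (r)->used && (r)->avail)

/* True when at least one pointer still holds a (possibly stale) mapping. */
#define RING_ANY_SET(r) ((r)->desc || (r)->used || (r)->avail)
```

With only one pointer set the two disagree, which is exactly the case
the question above is about.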

> > +            if (map_ring(dev, &dev->vq[i])) {
> > +                vu_panic(dev, "remaping queue %d during setmemtable", i);
> > +            }
> > +        }
> > +    }
> > +
> >      return false;
> >  }
> >
> > @@ -853,18 +876,12 @@ vu_set_vring_addr_exec(VuDev *dev, VhostUserMsg *vmsg)
> >      DPRINT("    avail_user_addr:  0x%016" PRIx64 "\n", vra->avail_user_addr);
> >      DPRINT("    log_guest_addr:   0x%016" PRIx64 "\n", vra->log_guest_addr);
> >
> > +    vq->vra = *vra;
> >      vq->vring.flags = vra->flags;
> > -    vq->vring.desc = qva_to_va(dev, vra->desc_user_addr);
> > -    vq->vring.used = qva_to_va(dev, vra->used_user_addr);
> > -    vq->vring.avail = qva_to_va(dev, vra->avail_user_addr);
> >      vq->vring.log_guest_addr = vra->log_guest_addr;
> >
> > -    DPRINT("Setting virtq addresses:\n");
> > -    DPRINT("    vring_desc  at %p\n", vq->vring.desc);
> > -    DPRINT("    vring_used  at %p\n", vq->vring.used);
> > -    DPRINT("    vring_avail at %p\n", vq->vring.avail);
> >
> > -    if (!(vq->vring.desc && vq->vring.used && vq->vring.avail)) {
> > +    if (map_ring(dev, vq)) {
> >          vu_panic(dev, "Invalid vring_addr message");
> >          return false;
> >      }
> > diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
> > index 1844b6f8d4..5cb7708559 100644
> > --- a/contrib/libvhost-user/libvhost-user.h
> > +++ b/contrib/libvhost-user/libvhost-user.h
> > @@ -327,6 +327,9 @@ typedef struct VuVirtq {
> >      int err_fd;
> >      unsigned int enable;
> >      bool started;
> > +
> > +    /* Guest addresses of our ring */
> > +    struct vhost_vring_addr vra;
> >  } VuVirtq;
> >
> >  enum VuWatchCondtion {
> > --
> > 2.23.0
> >
> >
> 
> Looks reasonable otherwise (assuming all running threads were flushed
> somehow by other means)

Yeh, well, that's a separate question - which I think there's room for
more caution over; but that is why there's a 'some' in the subject line.

Dave

> -- 
> Marc-André Lureau
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 105/104] virtiofsd: Unref old/new inodes with the same mutex lock in lo_rename()
  2020-01-17 13:32 ` [PATCH 105/104] virtiofsd: Unref old/new inodes with the same mutex lock in lo_rename() Philippe Mathieu-Daudé
  2020-01-19  8:35   ` Xiao Yang
@ 2020-01-20 18:52   ` Dr. David Alan Gilbert
  2020-01-20 18:55     ` Philippe Mathieu-Daudé
  1 sibling, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-20 18:52 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé; +Cc: qemu-devel, stefanha, Vivek Goyal

* Philippe Mathieu-Daudé (philmd@redhat.com) wrote:
> We can unref both old/new inodes with the same mutex lock.
> 
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
> Based-on: <20191212163904.159893-1-dgilbert@redhat.com>
> "virtiofs daemon"
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg664652.html
> 
>  tools/virtiofsd/passthrough_ll.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 57f58aef26..5c717cb5a1 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -1461,8 +1461,10 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
>      }
>  
>  out:
> -    unref_inode_lolocked(lo, oldinode, 1);
> -    unref_inode_lolocked(lo, newinode, 1);
> +    pthread_mutex_lock(&lo->mutex);
> +    unref_inode(lo, oldinode, 1);
> +    unref_inode(lo, newinode, 1);
> +    pthread_mutex_unlock(&lo->mutex);

While that would work, I'd rather keep that code simple and the
same as every other normal operation - we only use the bare unref_inode
in one other place, and that's because we're iterating the hash table
while deleting stuff.
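
For readers following along, a minimal sketch of the two calling conventions being compared. The structures are cut-down and hypothetical (the real unref_inode does more than decrement a counter), and unref_all is an invented name for the bulk case: the _lolocked wrapper takes the mutex itself, while the bare function is for callers that already hold it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical, cut-down versions of the virtiofsd structures. */
struct inode {
    uint64_t nlookup;        /* lookup count, protected by data->mutex */
};

struct data {
    pthread_mutex_t mutex;
};

/* Bare variant: the caller must already hold d->mutex. */
static void unref_inode(struct data *d, struct inode *inode, uint64_t n)
{
    (void)d;
    if (!inode) {
        return;
    }
    assert(inode->nlookup >= n);
    inode->nlookup -= n;
}

/* Normal-operation variant: take the lock, unref, drop the lock. */
static void unref_inode_lolocked(struct data *d, struct inode *inode,
                                 uint64_t n)
{
    if (!inode) {
        return;
    }
    pthread_mutex_lock(&d->mutex);
    unref_inode(d, inode, n);
    pthread_mutex_unlock(&d->mutex);
}

/* Bulk case: one lock acquisition around a walk over many inodes. */
static void unref_all(struct data *d, struct inode **inodes, int count)
{
    pthread_mutex_lock(&d->mutex);
    for (int i = 0; i < count; i++) {
        if (inodes[i]) {
            unref_inode(d, inodes[i], inodes[i]->nlookup);
        }
    }
    pthread_mutex_unlock(&d->mutex);
}
```

Under these assumptions the wrapper costs one extra lock/unlock pair per call, which is the price of keeping every ordinary call site identical.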

Dave

>      lo_inode_put(lo, &oldinode);
>      lo_inode_put(lo, &newinode);
>      lo_inode_put(lo, &parent_inode);
> -- 
> 2.21.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 105/104] virtiofsd: Unref old/new inodes with the same mutex lock in lo_rename()
  2020-01-20 18:52   ` Dr. David Alan Gilbert
@ 2020-01-20 18:55     ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 307+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-01-20 18:55 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: QEMU Developers, Stefan Hajnoczi, Vivek Goyal

On Mon, Jan 20, 2020 at 7:52 PM Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
> * Philippe Mathieu-Daudé (philmd@redhat.com) wrote:
> > We can unref both old/new inodes with the same mutex lock.
> >
> > Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> > ---
> > Based-on: <20191212163904.159893-1-dgilbert@redhat.com>
> > "virtiofs daemon"
> > https://www.mail-archive.com/qemu-devel@nongnu.org/msg664652.html
> >
> >  tools/virtiofsd/passthrough_ll.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index 57f58aef26..5c717cb5a1 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -1461,8 +1461,10 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
> >      }
> >
> >  out:
> > -    unref_inode_lolocked(lo, oldinode, 1);
> > -    unref_inode_lolocked(lo, newinode, 1);
> > +    pthread_mutex_lock(&lo->mutex);
> > +    unref_inode(lo, oldinode, 1);
> > +    unref_inode(lo, newinode, 1);
> > +    pthread_mutex_unlock(&lo->mutex);
>
> While that would work, I'd rather keep that code simple and the
> same as every other normal operation - we only use the bare unref_inode
> in one other place, and that's because we're iterating the hash table
> while deleting stuff.

OK I understand.

> Dave
>
> >      lo_inode_put(lo, &oldinode);
> >      lo_inode_put(lo, &newinode);
> >      lo_inode_put(lo, &parent_inode);
> > --
> > 2.21.1
> >
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 016/104] virtiofsd: Open vhost connection instead of mounting
  2019-12-12 16:37 ` [PATCH 016/104] virtiofsd: Open vhost connection instead of mounting Dr. David Alan Gilbert (git)
  2020-01-03 15:21   ` Daniel P. Berrangé
@ 2020-01-21  6:57   ` Misono Tomohiro
  2020-01-21 11:38     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 307+ messages in thread
From: Misono Tomohiro @ 2020-01-21  6:57 UTC (permalink / raw)
  To: dgilbert; +Cc: misono.tomohiro, qemu-devel, stefanha, vgoyal

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> When run with vhost-user options we connect to the QEMU instead
> via a socket.  Start this off by creating the socket.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---

<snip>
> +    /*
> +     * Poison the fuse FD so we spot if we accidentally use it;
> +     * DO NOT check for this value, check for fuse_lowlevel_is_virtio()
> +     */
> +    se->fd = 0xdaff0d11;
</snip>

As a result of this, se->fd now holds a dummy fd.
So we should remove close(se->fd) in fuse_session_destroy():
https://gitlab.com/virtio-fs/qemu/blob/virtio-fs-dev/tools/virtiofsd/fuse_lowlevel.c#L2663

Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>



* Re: [PATCH 006/104] virtiofsd: Trim down imported files
  2019-12-12 16:37 ` [PATCH 006/104] virtiofsd: Trim down imported files Dr. David Alan Gilbert (git)
  2020-01-03 12:02   ` Daniel P. Berrangé
@ 2020-01-21  9:58   ` Xiao Yang
  2020-01-21 10:51     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 307+ messages in thread
From: Xiao Yang @ 2020-01-21  9:58 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, stefanha, vgoyal

On 2019/12/13 0:37, Dr. David Alan Gilbert (git) wrote:
> -	res = fuse_buf_copy(&pipe_buf, buf,
> -			    FUSE_BUF_FORCE_SPLICE | FUSE_BUF_SPLICE_NONBLOCK);
> -	if (res < 0) {
> -		if (res == -EAGAIN || res == -EINVAL) {
> -			/*
> -			 * Should only get EAGAIN on kernels with
> -			 * broken SPLICE_F_NONBLOCK support (<=
> -			 * 2.6.35) where this error or a short read is
> -			 * returned even if the pipe itself is not
> -			 * full
> -			 *
> -			 * EINVAL might mean that splice can't handle
> -			 * this combination of input and output.
> -			 */
> -			if (res == -EAGAIN)
> -				se->broken_splice_nonblock = 1;
> -
> -			pthread_setspecific(se->pipe_key, NULL);
> -			fuse_ll_pipe_free(llp);
> -			goto fallback;
> -		}
> -		res = -res;
> -		goto clear_pipe;
> -	}
> -
> -	if (res != 0 && res < len) {
> -		struct fuse_bufvec mem_buf = FUSE_BUFVEC_INIT(len);
> -		void *mbuf;
> -		size_t now_len = res;
> -		/*
> -		 * For regular files a short count is either
> -		 *  1) due to EOF, or
> -		 *  2) because of broken SPLICE_F_NONBLOCK (see above)
> -		 *
> -		 * For other inputs it's possible that we overflowed
> -		 * the pipe because of small buffer fragments.
> -		 */
> -
> -		res = posix_memalign(&mbuf, pagesize, len);
> -		if (res != 0)
> -			goto clear_pipe;
> -
> -		mem_buf.buf[0].mem = mbuf;
> -		mem_buf.off = now_len;
> -		res = fuse_buf_copy(&mem_buf, buf, 0);
> -		if (res > 0) {
> -			char *tmpbuf;
> -			size_t extra_len = res;
> -			/*
> -			 * Trickiest case: got more data.  Need to get
> -			 * back the data from the pipe and then fall
> -			 * back to regular write.
> -			 */
> -			tmpbuf = malloc(headerlen);
> -			if (tmpbuf == NULL) {
> -				free(mbuf);
> -				res = ENOMEM;
> -				goto clear_pipe;
> -			}
> -			res = read_back(llp->pipe[0], tmpbuf, headerlen);
> -			free(tmpbuf);
> -			if (res != 0) {
> -				free(mbuf);
> -				goto clear_pipe;
> -			}
> -			res = read_back(llp->pipe[0], mbuf, now_len);
> -			if (res != 0) {
> -				free(mbuf);
> -				goto clear_pipe;
> -			}
> -			len = now_len + extra_len;
> -			iov[iov_count].iov_base = mbuf;
> -			iov[iov_count].iov_len = len;
> -			iov_count++;
> -			res = fuse_send_msg(se, ch, iov, iov_count);
> -			free(mbuf);
> -			return res;
> -		}
> -		free(mbuf);
> -		res = now_len;
> -	}
> -	len = res;
> -	out->len = headerlen + len;
> -
> -	if (se->debug) {
> -		fuse_log(FUSE_LOG_DEBUG,
> -			"   unique: %llu, success, outsize: %i (splice)\n",
> -			(unsigned long long) out->unique, out->len);
> -	}
> -
> -	splice_flags = 0;
> -	if ((flags & FUSE_BUF_SPLICE_MOVE) &&
> -	    (se->conn.want & FUSE_CAP_SPLICE_MOVE))
> -		splice_flags |= SPLICE_F_MOVE;
> -
> -	res = splice(llp->pipe[0], NULL, ch ? ch->fd : se->fd,
> -		     NULL, out->len, splice_flags);
Hi,

1) In buffer.c, fuse_buf_splice() uses splice(2) to copy/move data in
some cases, if the syscall is supported.
2) splice(2) requires one of its file descriptors to be a pipe; without
a pipe it fails and the code falls back to other ways (e.g. use
fuse_buf_fd_to_fd()).
3) fuse_buf_copy() calls fuse_buf_splice() indirectly, and this patch has
removed all pipes used by fuse_buf_copy().

Is it necessary to keep the code related to splice(2)?  Is it going to
be used in the future?
If it is, splice(2) has to be guarded by the correct CONFIG_SPLICE macro.
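
As a rough illustration of the behaviour in 2), here is a sketch of a copy helper with a compile-time guard and a runtime fallback. The names copy_fd and copy_rw are hypothetical and this is not the libfuse code; HAVE_SPLICE is assumed to be the build-time switch. When the macro is not defined, only the plain read/write path is compiled:

```c
#ifdef HAVE_SPLICE
#define _GNU_SOURCE     /* splice(2) is a Linux extension */
#include <fcntl.h>
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Plain read/write copy; works for any pair of descriptors. */
static ssize_t copy_rw(int dst, int src, size_t len)
{
    char buf[4096];
    size_t done = 0;

    while (done < len) {
        size_t want = len - done < sizeof(buf) ? len - done : sizeof(buf);
        ssize_t n = read(src, buf, want);
        ssize_t off = 0;

        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -errno;
        }
        if (n == 0) {
            break;      /* EOF: return the short count */
        }
        while (off < n) {
            ssize_t w = write(dst, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR) {
                    continue;
                }
                return -errno;
            }
            off += w;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Try splice(2) when built with it; otherwise, or on EINVAL, fall back. */
static ssize_t copy_fd(int dst, int src, size_t len)
{
    assert(dst >= 0 && src >= 0);
#ifdef HAVE_SPLICE
    {
        /* splice(2) fails with EINVAL if neither descriptor is a pipe. */
        ssize_t n = splice(src, NULL, dst, NULL, len, 0);
        if (n >= 0 || errno != EINVAL) {
            return n >= 0 ? n : -errno;
        }
    }
#endif
    return copy_rw(dst, src, len);
}
```

If no caller can ever provide a pipe, the guarded branch is dead code and the helper reduces to copy_rw().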

Best Regards,
Xiao Yang





* Re: [PATCH 006/104] virtiofsd: Trim down imported files
  2020-01-21  9:58   ` Xiao Yang
@ 2020-01-21 10:51     ` Dr. David Alan Gilbert
  2020-01-22  0:57       ` Xiao Yang
  0 siblings, 1 reply; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-21 10:51 UTC (permalink / raw)
  To: Xiao Yang; +Cc: qemu-devel, stefanha, vgoyal

* Xiao Yang (yangx.jy@cn.fujitsu.com) wrote:
> On 2019/12/13 0:37, Dr. David Alan Gilbert (git) wrote:
> > -	res = fuse_buf_copy(&pipe_buf, buf,
> > -			    FUSE_BUF_FORCE_SPLICE | FUSE_BUF_SPLICE_NONBLOCK);
> > -	if (res < 0) {
> > -		if (res == -EAGAIN || res == -EINVAL) {
> > -			/*
> > -			 * Should only get EAGAIN on kernels with
> > -			 * broken SPLICE_F_NONBLOCK support (<=
> > -			 * 2.6.35) where this error or a short read is
> > -			 * returned even if the pipe itself is not
> > -			 * full
> > -			 *
> > -			 * EINVAL might mean that splice can't handle
> > -			 * this combination of input and output.
> > -			 */
> > -			if (res == -EAGAIN)
> > -				se->broken_splice_nonblock = 1;
> > -
> > -			pthread_setspecific(se->pipe_key, NULL);
> > -			fuse_ll_pipe_free(llp);
> > -			goto fallback;
> > -		}
> > -		res = -res;
> > -		goto clear_pipe;
> > -	}
> > -
> > -	if (res != 0 && res < len) {
> > -		struct fuse_bufvec mem_buf = FUSE_BUFVEC_INIT(len);
> > -		void *mbuf;
> > -		size_t now_len = res;
> > -		/*
> > -		 * For regular files a short count is either
> > -		 *  1) due to EOF, or
> > -		 *  2) because of broken SPLICE_F_NONBLOCK (see above)
> > -		 *
> > -		 * For other inputs it's possible that we overflowed
> > -		 * the pipe because of small buffer fragments.
> > -		 */
> > -
> > -		res = posix_memalign(&mbuf, pagesize, len);
> > -		if (res != 0)
> > -			goto clear_pipe;
> > -
> > -		mem_buf.buf[0].mem = mbuf;
> > -		mem_buf.off = now_len;
> > -		res = fuse_buf_copy(&mem_buf, buf, 0);
> > -		if (res > 0) {
> > -			char *tmpbuf;
> > -			size_t extra_len = res;
> > -			/*
> > -			 * Trickiest case: got more data.  Need to get
> > -			 * back the data from the pipe and then fall
> > -			 * back to regular write.
> > -			 */
> > -			tmpbuf = malloc(headerlen);
> > -			if (tmpbuf == NULL) {
> > -				free(mbuf);
> > -				res = ENOMEM;
> > -				goto clear_pipe;
> > -			}
> > -			res = read_back(llp->pipe[0], tmpbuf, headerlen);
> > -			free(tmpbuf);
> > -			if (res != 0) {
> > -				free(mbuf);
> > -				goto clear_pipe;
> > -			}
> > -			res = read_back(llp->pipe[0], mbuf, now_len);
> > -			if (res != 0) {
> > -				free(mbuf);
> > -				goto clear_pipe;
> > -			}
> > -			len = now_len + extra_len;
> > -			iov[iov_count].iov_base = mbuf;
> > -			iov[iov_count].iov_len = len;
> > -			iov_count++;
> > -			res = fuse_send_msg(se, ch, iov, iov_count);
> > -			free(mbuf);
> > -			return res;
> > -		}
> > -		free(mbuf);
> > -		res = now_len;
> > -	}
> > -	len = res;
> > -	out->len = headerlen + len;
> > -
> > -	if (se->debug) {
> > -		fuse_log(FUSE_LOG_DEBUG,
> > -			"   unique: %llu, success, outsize: %i (splice)\n",
> > -			(unsigned long long) out->unique, out->len);
> > -	}
> > -
> > -	splice_flags = 0;
> > -	if ((flags & FUSE_BUF_SPLICE_MOVE) &&
> > -	    (se->conn.want & FUSE_CAP_SPLICE_MOVE))
> > -		splice_flags |= SPLICE_F_MOVE;
> > -
> > -	res = splice(llp->pipe[0], NULL, ch ? ch->fd : se->fd,
> > -		     NULL, out->len, splice_flags);
> Hi,
> 
> 1) In buffer.c, fuse_buf_splice() uses splice(2) to copy/move data in some
> cases, if the syscall is supported.
> 2) splice(2) requires one of its file descriptors to be a pipe; without a
> pipe it fails and the code falls back to other ways (e.g. use
> fuse_buf_fd_to_fd()).
> 3) fuse_buf_copy() calls fuse_buf_splice() indirectly, and this patch has
> removed all pipes used by fuse_buf_copy().
> 
> Is it necessary to keep the code related to splice(2)?  Is it going to be
> used in the future?
> If it is, splice(2) has to be guarded by the correct CONFIG_SPLICE macro.

Yes, I think we never set HAVE_SPLICE; so that code can go.
I'll change this patch to remove that as well.

Dave

> Best Regards,
> Xiao Yang
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 016/104] virtiofsd: Open vhost connection instead of mounting
  2020-01-21  6:57   ` Misono Tomohiro
@ 2020-01-21 11:38     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-21 11:38 UTC (permalink / raw)
  To: Misono Tomohiro; +Cc: qemu-devel, stefanha, vgoyal

* Misono Tomohiro (misono.tomohiro@jp.fujitsu.com) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > When run with vhost-user options we connect to the QEMU instead
> > via a socket.  Start this off by creating the socket.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> 
> <snip>
> > +    /*
> > +     * Poison the fuse FD so we spot if we accidentally use it;
> > +     * DO NOT check for this value, check for fuse_lowlevel_is_virtio()
> > +     */
> > +    se->fd = 0xdaff0d11;
> </snip>
> 
> As a result of this, se->fd now holds a dummy fd.
> So we should remove close(se->fd) in fuse_session_destroy():
> https://gitlab.com/virtio-fs/qemu/blob/virtio-fs-dev/tools/virtiofsd/fuse_lowlevel.c#L2663

Thanks; the easier fix here is to drop the dummy 0xdaff0d11 value
and just use -1. The poison value was quite a useful trick when I first
did this, to find places where we were accidentally using the fd when
we shouldn't, but it's not really needed now that we've got it going.
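
For readers following along, a minimal sketch of the -1 convention. The struct and function names are hypothetical, not the virtiofsd code: the virtio transport leaves the fd at -1, and the destroy path closes it only when it is a real descriptor.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical, cut-down session state. */
struct session {
    int fd;          /* -1 when the session has no FUSE device fd */
    int is_virtio;   /* non-zero when using the vhost-user transport */
};

/* The virtio transport never owns a FUSE device fd. */
static void session_init_virtio(struct session *se)
{
    assert(se != NULL);
    se->is_virtio = 1;
    se->fd = -1;
}

/* Returns 1 if a descriptor was closed, 0 if there was nothing to close. */
static int session_destroy(struct session *se)
{
    int closed = 0;

    assert(se != NULL);
    if (se->fd >= 0) {
        close(se->fd);
        closed = 1;
    }
    se->fd = -1;     /* make a repeated destroy harmless */
    return closed;
}
```

With this guard the close() call can stay in the destroy path for the non-virtio case without ever being handed a made-up value.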

> Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>

Thanks.

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 006/104] virtiofsd: Trim down imported files
  2020-01-21 10:51     ` Dr. David Alan Gilbert
@ 2020-01-22  0:57       ` Xiao Yang
  0 siblings, 0 replies; 307+ messages in thread
From: Xiao Yang @ 2020-01-22  0:57 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: qemu-devel, stefanha, vgoyal

On 2020/1/21 18:51, Dr. David Alan Gilbert wrote:
> * Xiao Yang (yangx.jy@cn.fujitsu.com) wrote:
>> On 2019/12/13 0:37, Dr. David Alan Gilbert (git) wrote:
>>> -	res = fuse_buf_copy(&pipe_buf, buf,
>>> -			    FUSE_BUF_FORCE_SPLICE | FUSE_BUF_SPLICE_NONBLOCK);
>>> -	if (res < 0) {
>>> -		if (res == -EAGAIN || res == -EINVAL) {
>>> -			/*
>>> -			 * Should only get EAGAIN on kernels with
>>> -			 * broken SPLICE_F_NONBLOCK support (<=
>>> -			 * 2.6.35) where this error or a short read is
>>> -			 * returned even if the pipe itself is not
>>> -			 * full
>>> -			 *
>>> -			 * EINVAL might mean that splice can't handle
>>> -			 * this combination of input and output.
>>> -			 */
>>> -			if (res == -EAGAIN)
>>> -				se->broken_splice_nonblock = 1;
>>> -
>>> -			pthread_setspecific(se->pipe_key, NULL);
>>> -			fuse_ll_pipe_free(llp);
>>> -			goto fallback;
>>> -		}
>>> -		res = -res;
>>> -		goto clear_pipe;
>>> -	}
>>> -
>>> -	if (res != 0 && res < len) {
>>> -		struct fuse_bufvec mem_buf = FUSE_BUFVEC_INIT(len);
>>> -		void *mbuf;
>>> -		size_t now_len = res;
>>> -		/*
>>> -		 * For regular files a short count is either
>>> -		 *  1) due to EOF, or
>>> -		 *  2) because of broken SPLICE_F_NONBLOCK (see above)
>>> -		 *
>>> -		 * For other inputs it's possible that we overflowed
>>> -		 * the pipe because of small buffer fragments.
>>> -		 */
>>> -
>>> -		res = posix_memalign(&mbuf, pagesize, len);
>>> -		if (res != 0)
>>> -			goto clear_pipe;
>>> -
>>> -		mem_buf.buf[0].mem = mbuf;
>>> -		mem_buf.off = now_len;
>>> -		res = fuse_buf_copy(&mem_buf, buf, 0);
>>> -		if (res > 0) {
>>> -			char *tmpbuf;
>>> -			size_t extra_len = res;
>>> -			/*
>>> -			 * Trickiest case: got more data.  Need to get
>>> -			 * back the data from the pipe and then fall
>>> -			 * back to regular write.
>>> -			 */
>>> -			tmpbuf = malloc(headerlen);
>>> -			if (tmpbuf == NULL) {
>>> -				free(mbuf);
>>> -				res = ENOMEM;
>>> -				goto clear_pipe;
>>> -			}
>>> -			res = read_back(llp->pipe[0], tmpbuf, headerlen);
>>> -			free(tmpbuf);
>>> -			if (res != 0) {
>>> -				free(mbuf);
>>> -				goto clear_pipe;
>>> -			}
>>> -			res = read_back(llp->pipe[0], mbuf, now_len);
>>> -			if (res != 0) {
>>> -				free(mbuf);
>>> -				goto clear_pipe;
>>> -			}
>>> -			len = now_len + extra_len;
>>> -			iov[iov_count].iov_base = mbuf;
>>> -			iov[iov_count].iov_len = len;
>>> -			iov_count++;
>>> -			res = fuse_send_msg(se, ch, iov, iov_count);
>>> -			free(mbuf);
>>> -			return res;
>>> -		}
>>> -		free(mbuf);
>>> -		res = now_len;
>>> -	}
>>> -	len = res;
>>> -	out->len = headerlen + len;
>>> -
>>> -	if (se->debug) {
>>> -		fuse_log(FUSE_LOG_DEBUG,
>>> -			"   unique: %llu, success, outsize: %i (splice)\n",
>>> -			(unsigned long long) out->unique, out->len);
>>> -	}
>>> -
>>> -	splice_flags = 0;
>>> -	if ((flags & FUSE_BUF_SPLICE_MOVE) &&
>>> -	    (se->conn.want & FUSE_CAP_SPLICE_MOVE))
>>> -		splice_flags |= SPLICE_F_MOVE;
>>> -
>>> -	res = splice(llp->pipe[0], NULL, ch ? ch->fd : se->fd,
>>> -		     NULL, out->len, splice_flags);
>> Hi,
>>
>> 1) In buffer.c, fuse_buf_splice() uses splice(2) to copy/move data in some
>> cases, if the syscall is supported.
>> 2) splice(2) requires one of its file descriptors to be a pipe; without a
>> pipe it fails and the code falls back to other ways (e.g. use
>> fuse_buf_fd_to_fd()).
>> 3) fuse_buf_copy() calls fuse_buf_splice() indirectly, and this patch has
>> removed all pipes used by fuse_buf_copy().
>>
>> Is it necessary to keep the code related to splice(2)?  Is it going to be
>> used in the future?
>> If it is, splice(2) has to be guarded by the correct CONFIG_SPLICE macro.
> Yes, I think we never set HAVE_SPLICE; so that code can go.
> I'll change this patch to remove that as well.
>
Hi Dave,

Agreed.
Reviewed-by: Xiao Yang <yangx.jy@cn.fujitsu.com>

Best Regards,
Xiao Yang
> Dave
>
>> Best Regards,
>> Xiao Yang
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
>
>
>






* Re: [PATCH 062/104] virtiofsd: Handle hard reboot
  2020-01-20  6:46   ` Misono Tomohiro
@ 2020-01-22 18:28     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 307+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-22 18:28 UTC (permalink / raw)
  To: Misono Tomohiro; +Cc: qemu-devel, stefanha, vgoyal

* Misono Tomohiro (misono.tomohiro@jp.fujitsu.com) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Handle a
> >   mount
> >   hard reboot (without unmount)
> >   mount
> > 
> > we get another 'init' which FUSE doesn't normally expect.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  tools/virtiofsd/fuse_lowlevel.c | 16 +++++++++++++++-
> >  1 file changed, 15 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > index 2d1d1a2e59..45125ef66a 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.c
> > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > @@ -2436,7 +2436,21 @@ void fuse_session_process_buf_int(struct fuse_session *se,
> >              goto reply_err;
> >          }
> >      } else if (in->opcode == FUSE_INIT || in->opcode == CUSE_INIT) {
> > -        goto reply_err;
> > +        if (fuse_lowlevel_is_virtio(se)) {
> 
> > +            /*
> > +             * TODO: This is after a hard reboot typically, we need to do
> > +             * a destroy, but we can't reply to this request yet so
> > +             * we can't use do_destroy
> > +             */
> 
> Hi,
> 
> I wonder what is the TODO actually. Is this just to provide a common
> function for both here and do_destroy() or more than that?

Yes, we really need to combine it somehow; but do_destroy is based
on responding to a request, and we don't have a normal request at this
point.
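
For readers following along, one possible shape for that combination, as a sketch only. All names here (session_teardown, handle_destroy_request, handle_reinit, count_destroy) are hypothetical and the struct is cut down: the teardown that needs no request is split out, and only the request path sends a reply.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down session state. */
struct session {
    int got_init;
    int got_destroy;
    void (*destroy)(void *userdata);   /* filesystem callback, may be NULL */
    void *userdata;
    int replies_sent;                  /* stand-in for replying to a request */
};

/* Shared teardown: needs no request, so both paths can call it. */
static void session_teardown(struct session *se)
{
    assert(se != NULL);
    se->got_destroy = 1;
    se->got_init = 0;
    if (se->destroy) {
        se->destroy(se->userdata);
    }
}

/* Normal destroy path: tear down, then answer the request. */
static void handle_destroy_request(struct session *se)
{
    session_teardown(se);
    se->replies_sent++;
}

/* Unexpected second init (e.g. after a hard reboot): nothing to answer yet. */
static void handle_reinit(struct session *se)
{
    session_teardown(se);
}

/* Example callback that counts how often it was invoked. */
static void count_destroy(void *userdata)
{
    ++*(int *)userdata;
}
```

The state changes mirror the ones in the quoted hunk (got_destroy set, got_init cleared, op.destroy called); whether the real do_destroy can be split this way is not settled in this thread.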

Dave

> Thanks
> Misono
> 
> > +            fuse_log(FUSE_LOG_DEBUG, "%s: reinit\n", __func__);
> > +            se->got_destroy = 1;
> > +            se->got_init = 0;
> > +            if (se->op.destroy) {
> > +                se->op.destroy(se->userdata);
> > +            }
> > +        } else {
> > +            goto reply_err;
> > +        }
> >      }
> >  
> >      err = EACCES;
> > -- 
> > 2.23.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




end of thread, other threads:[~2020-01-22 18:29 UTC | newest]

Thread overview: 307+ messages
2019-12-12 16:37 [PATCH 000/104] virtiofs daemon [all] Dr. David Alan Gilbert (git)
2019-12-12 16:37 ` [PATCH 001/104] virtiofsd: Pull in upstream headers Dr. David Alan Gilbert (git)
2020-01-03 11:54   ` Daniel P. Berrangé
2020-01-15 17:38   ` Philippe Mathieu-Daudé
2019-12-12 16:37 ` [PATCH 002/104] virtiofsd: Pull in kernel's fuse.h Dr. David Alan Gilbert (git)
2020-01-03 11:56   ` Daniel P. Berrangé
2019-12-12 16:37 ` [PATCH 003/104] virtiofsd: Add auxiliary .c's Dr. David Alan Gilbert (git)
2020-01-03 11:57   ` Daniel P. Berrangé
2019-12-12 16:37 ` [PATCH 004/104] virtiofsd: Add fuse_lowlevel.c Dr. David Alan Gilbert (git)
2020-01-03 11:58   ` Daniel P. Berrangé
2019-12-12 16:37 ` [PATCH 005/104] virtiofsd: Add passthrough_ll Dr. David Alan Gilbert (git)
2020-01-03 12:01   ` Daniel P. Berrangé
2020-01-03 12:15     ` Dr. David Alan Gilbert
2020-01-03 12:33       ` Daniel P. Berrangé
2020-01-03 14:37         ` Dr. David Alan Gilbert
2019-12-12 16:37 ` [PATCH 006/104] virtiofsd: Trim down imported files Dr. David Alan Gilbert (git)
2020-01-03 12:02   ` Daniel P. Berrangé
2020-01-21  9:58   ` Xiao Yang
2020-01-21 10:51     ` Dr. David Alan Gilbert
2020-01-22  0:57       ` Xiao Yang
2019-12-12 16:37 ` [PATCH 007/104] virtiofsd: Format imported files to qemu style Dr. David Alan Gilbert (git)
2020-01-03 12:04   ` Daniel P. Berrangé
2020-01-09 12:21   ` Aleksandar Markovic
2019-12-12 16:37 ` [PATCH 008/104] virtiofsd: remove mountpoint dummy argument Dr. David Alan Gilbert (git)
2020-01-03 12:12   ` Daniel P. Berrangé
2019-12-12 16:37 ` [PATCH 009/104] virtiofsd: remove unused notify reply support Dr. David Alan Gilbert (git)
2020-01-03 12:14   ` Daniel P. Berrangé
2019-12-12 16:37 ` [PATCH 010/104] virtiofsd: Fix fuse_daemonize ignored return values Dr. David Alan Gilbert (git)
2020-01-03 12:18   ` Daniel P. Berrangé
2019-12-12 16:37 ` [PATCH 011/104] virtiofsd: Fix common header and define for QEMU builds Dr. David Alan Gilbert (git)
2020-01-03 12:22   ` Daniel P. Berrangé
2020-01-06 16:29     ` Dr. David Alan Gilbert
2019-12-12 16:37 ` [PATCH 012/104] virtiofsd: Trim out compatibility code Dr. David Alan Gilbert (git)
2020-01-03 12:26   ` Daniel P. Berrangé
2019-12-12 16:37 ` [PATCH 013/104] virtiofsd: Make fsync work even if only inode is passed in Dr. David Alan Gilbert (git)
2020-01-03 15:13   ` Daniel P. Berrangé
2019-12-12 16:37 ` [PATCH 014/104] virtiofsd: Add options for virtio Dr. David Alan Gilbert (git)
2020-01-03 15:18   ` Daniel P. Berrangé
2020-01-10 16:01     ` Dr. David Alan Gilbert
2019-12-12 16:37 ` [PATCH 015/104] virtiofsd: add -o source=PATH to help output Dr. David Alan Gilbert (git)
2020-01-03 15:18   ` Daniel P. Berrangé
2019-12-12 16:37 ` [PATCH 016/104] virtiofsd: Open vhost connection instead of mounting Dr. David Alan Gilbert (git)
2020-01-03 15:21   ` Daniel P. Berrangé
2020-01-21  6:57   ` Misono Tomohiro
2020-01-21 11:38     ` Dr. David Alan Gilbert
2019-12-12 16:37 ` [PATCH 017/104] virtiofsd: Start wiring up vhost-user Dr. David Alan Gilbert (git)
2020-01-03 15:25   ` Daniel P. Berrangé
2019-12-12 16:37 ` [PATCH 018/104] virtiofsd: Add main virtio loop Dr. David Alan Gilbert (git)
2020-01-03 15:26   ` Daniel P. Berrangé
2019-12-12 16:37 ` [PATCH 019/104] virtiofsd: get/set features callbacks Dr. David Alan Gilbert (git)
2020-01-03 15:26   ` Daniel P. Berrangé
2019-12-12 16:37 ` [PATCH 020/104] virtiofsd: Start queue threads Dr. David Alan Gilbert (git)
2020-01-03 15:27   ` Daniel P. Berrangé
2019-12-12 16:37 ` [PATCH 021/104] virtiofsd: Poll kick_fd for queue Dr. David Alan Gilbert (git)
2020-01-03 15:33   ` Daniel P. Berrangé
2019-12-12 16:37 ` [PATCH 022/104] virtiofsd: Start reading commands from queue Dr. David Alan Gilbert (git)
2020-01-03 15:34   ` Daniel P. Berrangé
2019-12-12 16:37 ` [PATCH 023/104] virtiofsd: Send replies to messages Dr. David Alan Gilbert (git)
2020-01-03 15:36   ` Daniel P. Berrangé
2019-12-12 16:37 ` [PATCH 024/104] virtiofsd: Keep track of replies Dr. David Alan Gilbert (git)
2020-01-03 15:41   ` Daniel P. Berrangé
2019-12-12 16:37 ` [PATCH 025/104] virtiofsd: Add Makefile wiring for virtiofsd contrib Dr. David Alan Gilbert (git)
2019-12-13 16:02   ` Liam Merwick
2019-12-13 16:56     ` Dr. David Alan Gilbert
2020-01-03 15:41   ` Daniel P. Berrangé
2019-12-12 16:37 ` [PATCH 026/104] virtiofsd: Fast path for virtio read Dr. David Alan Gilbert (git)
2020-01-17 18:54   ` Masayoshi Mizuma
2020-01-20 12:32     ` Dr. David Alan Gilbert
2019-12-12 16:37 ` [PATCH 027/104] virtiofsd: add --fd=FDNUM fd passing option Dr. David Alan Gilbert (git)
2020-01-06 14:12   ` Daniel P. Berrangé
2019-12-12 16:37 ` [PATCH 028/104] virtiofsd: make -f (foreground) the default Dr. David Alan Gilbert (git)
2020-01-06 14:19   ` Daniel P. Berrangé
2019-12-12 16:37 ` [PATCH 029/104] virtiofsd: add vhost-user.json file Dr. David Alan Gilbert (git)
2020-01-06 14:19   ` Daniel P. Berrangé
2019-12-12 16:37 ` [PATCH 030/104] virtiofsd: add --print-capabilities option Dr. David Alan Gilbert (git)
2020-01-06 14:20   ` Daniel P. Berrangé
2019-12-12 16:37 ` [PATCH 031/104] virtiofs: Add maintainers entry Dr. David Alan Gilbert (git)
2020-01-06 14:21   ` Daniel P. Berrangé
2020-01-15 17:19   ` Philippe Mathieu-Daudé
2019-12-12 16:37 ` [PATCH 032/104] virtiofsd: passthrough_ll: create new files in caller's context Dr. David Alan Gilbert (git)
2020-01-06 14:30   ` Daniel P. Berrangé
2020-01-06 19:00     ` Dr. David Alan Gilbert
2020-01-06 19:08       ` Dr. David Alan Gilbert
2020-01-07  9:22         ` Daniel P. Berrangé
2020-01-10 13:05           ` Dr. David Alan Gilbert
2019-12-12 16:37 ` [PATCH 033/104] virtiofsd: passthrough_ll: add lo_map for ino/fh indirection Dr. David Alan Gilbert (git)
2020-01-17 21:44   ` Masayoshi Mizuma
2019-12-12 16:37 ` [PATCH 034/104] virtiofsd: passthrough_ll: add ino_map to hide lo_inode pointers Dr. David Alan Gilbert (git)
2020-01-17 21:45   ` Masayoshi Mizuma
2019-12-12 16:37 ` [PATCH 035/104] virtiofsd: passthrough_ll: add dirp_map to hide lo_dirp pointers Dr. David Alan Gilbert (git)
2020-01-17 13:58   ` Philippe Mathieu-Daudé
2019-12-12 16:37 ` [PATCH 036/104] virtiofsd: passthrough_ll: add fd_map to hide file descriptors Dr. David Alan Gilbert (git)
2020-01-17 22:32   ` Masayoshi Mizuma
2019-12-12 16:37 ` [PATCH 037/104] virtiofsd: passthrough_ll: add fallback for racy ops Dr. David Alan Gilbert (git)
2020-01-18 16:22   ` Masayoshi Mizuma
2020-01-20 13:26     ` Dr. David Alan Gilbert
2019-12-12 16:37 ` [PATCH 038/104] virtiofsd: validate path components Dr. David Alan Gilbert (git)
2020-01-06 14:32   ` Daniel P. Berrangé
2019-12-12 16:37 ` [PATCH 039/104] virtiofsd: Plumb fuse_bufvec through to do_write_buf Dr. David Alan Gilbert (git)
2020-01-17 21:01   ` Masayoshi Mizuma
2019-12-12 16:38 ` [PATCH 040/104] virtiofsd: Pass write iov's all the way through Dr. David Alan Gilbert (git)
2020-01-19  8:08   ` Xiao Yang
2020-01-20  8:24     ` Philippe Mathieu-Daudé
2020-01-20 13:28       ` Dr. David Alan Gilbert
2019-12-12 16:38 ` [PATCH 041/104] virtiofsd: add fuse_mbuf_iter API Dr. David Alan Gilbert (git)
2020-01-16 14:17   ` Sergio Lopez
2019-12-12 16:38 ` [PATCH 042/104] virtiofsd: validate input buffer sizes in do_write_buf() Dr. David Alan Gilbert (git)
2020-01-16 14:19   ` Sergio Lopez
2019-12-12 16:38 ` [PATCH 043/104] virtiofsd: check input buffer size in fuse_lowlevel.c ops Dr. David Alan Gilbert (git)
2020-01-16 14:25   ` Sergio Lopez
2019-12-12 16:38 ` [PATCH 044/104] virtiofsd: prevent ".." escape in lo_do_lookup() Dr. David Alan Gilbert (git)
2020-01-16 14:33   ` Sergio Lopez
2019-12-12 16:38 ` [PATCH 045/104] virtiofsd: prevent ".." escape in lo_do_readdir() Dr. David Alan Gilbert (git)
2020-01-16 14:35   ` Sergio Lopez
2019-12-12 16:38 ` [PATCH 046/104] virtiofsd: use /proc/self/fd/ O_PATH file descriptor Dr. David Alan Gilbert (git)
2020-01-15 18:09   ` Philippe Mathieu-Daudé
2020-01-17  9:42     ` Dr. David Alan Gilbert
2019-12-12 16:38 ` [PATCH 047/104] virtiofsd: sandbox mount namespace Dr. David Alan Gilbert (git)
2020-01-06 14:36   ` Daniel P. Berrangé
2019-12-12 16:38 ` [PATCH 048/104] virtiofsd: move to an empty network namespace Dr. David Alan Gilbert (git)
2020-01-06 14:37   ` Daniel P. Berrangé
2019-12-12 16:38 ` [PATCH 049/104] virtiofsd: move to a new pid namespace Dr. David Alan Gilbert (git)
2020-01-06 14:40   ` Daniel P. Berrangé
2019-12-12 16:38 ` [PATCH 050/104] virtiofsd: add seccomp whitelist Dr. David Alan Gilbert (git)
2020-01-06 14:56   ` Daniel P. Berrangé
2020-01-06 18:54     ` Dr. David Alan Gilbert
2019-12-12 16:38 ` [PATCH 051/104] virtiofsd: Parse flag FUSE_WRITE_KILL_PRIV Dr. David Alan Gilbert (git)
2020-01-15 12:06   ` Misono Tomohiro
2020-01-15 14:34     ` Dr. David Alan Gilbert
2020-01-16 14:37   ` Sergio Lopez
2019-12-12 16:38 ` [PATCH 052/104] virtiofsd: cap-ng helpers Dr. David Alan Gilbert (git)
2020-01-06 14:58   ` Daniel P. Berrangé
2019-12-12 16:38 ` [PATCH 053/104] virtiofsd: Drop CAP_FSETID if client asked for it Dr. David Alan Gilbert (git)
2020-01-16  4:41   ` Misono Tomohiro
2020-01-16 15:21   ` Sergio Lopez
2019-12-12 16:38 ` [PATCH 054/104] virtiofsd: set maximum RLIMIT_NOFILE limit Dr. David Alan Gilbert (git)
2020-01-06 15:00   ` Daniel P. Berrangé
2020-01-15 17:09   ` Philippe Mathieu-Daudé
2020-01-15 17:38     ` Dr. David Alan Gilbert
2019-12-12 16:38 ` [PATCH 055/104] virtiofsd: fix libfuse information leaks Dr. David Alan Gilbert (git)
2020-01-06 15:01   ` Daniel P. Berrangé
2020-01-15 17:07   ` Philippe Mathieu-Daudé
2019-12-12 16:38 ` [PATCH 056/104] virtiofsd: add security guide document Dr. David Alan Gilbert (git)
2020-01-06 15:03   ` Daniel P. Berrangé
2020-01-06 17:53     ` Dr. David Alan Gilbert
2020-01-07 10:05       ` Daniel P. Berrangé
2020-01-09 17:02         ` Dr. David Alan Gilbert
2019-12-12 16:38 ` [PATCH 057/104] virtiofsd: add --syslog command-line option Dr. David Alan Gilbert (git)
2020-01-06 15:05   ` Daniel P. Berrangé
2019-12-12 16:38 ` [PATCH 058/104] virtiofsd: print log only when priority is high enough Dr. David Alan Gilbert (git)
2020-01-06 15:10   ` Daniel P. Berrangé
2020-01-06 17:05     ` Dr. David Alan Gilbert
2020-01-06 17:20       ` Daniel P. Berrangé
2020-01-06 17:27         ` Dr. David Alan Gilbert
2019-12-12 16:38 ` [PATCH 059/104] virtiofsd: Add ID to the log with FUSE_LOG_DEBUG level Dr. David Alan Gilbert (git)
2020-01-06 15:18   ` Daniel P. Berrangé
2020-01-06 17:47     ` Dr. David Alan Gilbert
2019-12-12 16:38 ` [PATCH 060/104] virtiofsd: Add timestamp " Dr. David Alan Gilbert (git)
2020-01-07 11:11   ` Daniel P. Berrangé
2019-12-12 16:38 ` [PATCH 061/104] virtiofsd: Handle reinit Dr. David Alan Gilbert (git)
2020-01-07 11:12   ` Daniel P. Berrangé
2019-12-12 16:38 ` [PATCH 062/104] virtiofsd: Handle hard reboot Dr. David Alan Gilbert (git)
2020-01-07 11:14   ` Daniel P. Berrangé
2020-01-10 15:43     ` Dr. David Alan Gilbert
2020-01-20  6:46   ` Misono Tomohiro
2020-01-22 18:28     ` Dr. David Alan Gilbert
2019-12-12 16:38 ` [PATCH 063/104] virtiofsd: Kill threads when queues are stopped Dr. David Alan Gilbert (git)
2020-01-07 11:16   ` Daniel P. Berrangé
2019-12-12 16:38 ` [PATCH 064/104] vhost-user: Print unexpected slave message types Dr. David Alan Gilbert (git)
2020-01-07 11:18   ` Daniel P. Berrangé
2019-12-12 16:38 ` [PATCH 065/104] contrib/libvhost-user: Protect slave fd with mutex Dr. David Alan Gilbert (git)
2020-01-07 11:19   ` Daniel P. Berrangé
2019-12-12 16:38 ` [PATCH 066/104] virtiofsd: passthrough_ll: add renameat2 support Dr. David Alan Gilbert (git)
2020-01-07 11:21   ` Daniel P. Berrangé
2020-01-10  9:52     ` Dr. David Alan Gilbert
2020-01-13 20:06       ` Dr. David Alan Gilbert
2020-01-14  8:29         ` Daniel P. Berrangé
2020-01-14 10:07           ` Dr. David Alan Gilbert
2020-01-14 10:12             ` Daniel P. Berrangé
2019-12-12 16:38 ` [PATCH 067/104] virtiofsd: passthrough_ll: disable readdirplus on cache=never Dr. David Alan Gilbert (git)
2020-01-07 11:22   ` Daniel P. Berrangé
2019-12-12 16:38 ` [PATCH 068/104] virtiofsd: passthrough_ll: control readdirplus Dr. David Alan Gilbert (git)
2020-01-07 11:23   ` Daniel P. Berrangé
2020-01-10 15:04     ` Dr. David Alan Gilbert
2020-01-10 15:13       ` Miklos Szeredi
2020-01-10 15:18         ` Daniel P. Berrangé
2020-01-10 15:30           ` Miklos Szeredi
2020-01-10 15:40             ` Vivek Goyal
2020-01-10 16:00               ` Miklos Szeredi
2019-12-12 16:38 ` [PATCH 069/104] virtiofsd: rename unref_inode() to unref_inode_lolocked() Dr. David Alan Gilbert (git)
2020-01-07 11:23   ` Daniel P. Berrangé
2019-12-12 16:38 ` [PATCH 070/104] virtiofsd: fail when parent inode isn't known in lo_do_lookup() Dr. David Alan Gilbert (git)
2020-01-16  7:17   ` Misono Tomohiro
2020-01-20 10:08   ` Sergio Lopez
2019-12-12 16:38 ` [PATCH 071/104] virtiofsd: extract root inode init into setup_root() Dr. David Alan Gilbert (git)
2020-01-16  7:20   ` Misono Tomohiro
2020-01-16 15:51     ` Dr. David Alan Gilbert
2020-01-20 10:09   ` Sergio Lopez
2019-12-12 16:38 ` [PATCH 072/104] virtiofsd: passthrough_ll: fix refcounting on remove/rename Dr. David Alan Gilbert (git)
2020-01-16 11:56   ` Misono Tomohiro
2020-01-16 16:45     ` Dr. David Alan Gilbert
2020-01-17 10:19       ` Miklos Szeredi
2020-01-17 11:37         ` Dr. David Alan Gilbert
2020-01-17 18:43         ` Dr. David Alan Gilbert
2020-01-20 10:17   ` Sergio Lopez
2020-01-20 10:56     ` Dr. David Alan Gilbert
2019-12-12 16:38 ` [PATCH 073/104] virtiofsd: passthrough_ll: clean up cache related options Dr. David Alan Gilbert (git)
2020-01-07 11:24   ` Daniel P. Berrangé
2019-12-12 16:38 ` [PATCH 074/104] virtiofsd: passthrough_ll: use hashtable Dr. David Alan Gilbert (git)
2020-01-07 11:28   ` Daniel P. Berrangé
2019-12-12 16:38 ` [PATCH 075/104] virtiofsd: Clean up inodes on destroy Dr. David Alan Gilbert (git)
2020-01-07 11:29   ` Daniel P. Berrangé
2019-12-12 16:38 ` [PATCH 076/104] virtiofsd: support nanosecond resolution for file timestamp Dr. David Alan Gilbert (git)
2020-01-07 11:30   ` Daniel P. Berrangé
2019-12-12 16:38 ` [PATCH 077/104] virtiofsd: fix error handling in main() Dr. David Alan Gilbert (git)
2020-01-07 11:30   ` Daniel P. Berrangé
2019-12-12 16:38 ` [PATCH 078/104] virtiofsd: cleanup allocated resource in se Dr. David Alan Gilbert (git)
2020-01-07 11:34   ` Daniel P. Berrangé
2019-12-12 16:38 ` [PATCH 079/104] virtiofsd: fix memory leak on lo.source Dr. David Alan Gilbert (git)
2020-01-07 11:37   ` Daniel P. Berrangé
2020-01-09 17:38     ` Dr. David Alan Gilbert
2019-12-12 16:38 ` [PATCH 080/104] virtiofsd: add helper for lo_data cleanup Dr. David Alan Gilbert (git)
2020-01-07 11:40   ` Daniel P. Berrangé
2020-01-09 17:41     ` Dr. David Alan Gilbert
2019-12-12 16:38 ` [PATCH 081/104] virtiofsd: Prevent multiply running with same vhost_user_socket Dr. David Alan Gilbert (git)
2020-01-07 11:43   ` Daniel P. Berrangé
2019-12-12 16:38 ` [PATCH 082/104] virtiofsd: enable PARALLEL_DIROPS during INIT Dr. David Alan Gilbert (git)
2020-01-07 11:44   ` Daniel P. Berrangé
2019-12-12 16:38 ` [PATCH 083/104] virtiofsd: fix incorrect error handling in lo_do_lookup Dr. David Alan Gilbert (git)
2020-01-07 11:45   ` Daniel P. Berrangé
2019-12-12 16:38 ` [PATCH 084/104] Virtiofsd: fix memory leak on fuse queueinfo Dr. David Alan Gilbert (git)
2020-01-15 11:20   ` Misono Tomohiro
2020-01-15 16:57     ` Dr. David Alan Gilbert
2020-01-16  0:54       ` misono.tomohiro
2020-01-16 12:19         ` Dr. David Alan Gilbert
2020-01-20 10:24   ` Sergio Lopez
2020-01-20 10:54     ` Dr. David Alan Gilbert
2019-12-12 16:38 ` [PATCH 085/104] virtiofsd: Support remote posix locks Dr. David Alan Gilbert (git)
2020-01-15 23:38   ` Masayoshi Mizuma
2020-01-16 13:26     ` Vivek Goyal
2020-01-17  9:27       ` Dr. David Alan Gilbert
2019-12-12 16:38 ` [PATCH 086/104] virtiofsd: use fuse_lowlevel_is_virtio() in fuse_session_destroy() Dr. David Alan Gilbert (git)
2020-01-07 12:01   ` Daniel P. Berrangé
2020-01-07 13:24     ` Dr. David Alan Gilbert
2019-12-12 16:38 ` [PATCH 087/104] virtiofsd: prevent fv_queue_thread() vs virtio_loop() races Dr. David Alan Gilbert (git)
2020-01-07 12:02   ` Daniel P. Berrangé
2019-12-12 16:38 ` [PATCH 088/104] virtiofsd: make lo_release() atomic Dr. David Alan Gilbert (git)
2020-01-07 12:03   ` Daniel P. Berrangé
2019-12-12 16:38 ` [PATCH 089/104] virtiofsd: prevent races with lo_dirp_put() Dr. David Alan Gilbert (git)
2020-01-17 13:52   ` Philippe Mathieu-Daudé
2019-12-12 16:38 ` [PATCH 090/104] virtiofsd: rename inode->refcount to inode->nlookup Dr. David Alan Gilbert (git)
2020-01-17 13:54   ` Philippe Mathieu-Daudé
2019-12-12 16:38 ` [PATCH 091/104] libvhost-user: Fix some memtable remap cases Dr. David Alan Gilbert (git)
2020-01-17 13:58   ` Marc-André Lureau
2020-01-20 15:50     ` Dr. David Alan Gilbert
2019-12-12 16:38 ` [PATCH 092/104] virtiofsd: add man page Dr. David Alan Gilbert (git)
2019-12-13 14:33   ` Liam Merwick
2019-12-13 15:26     ` Dr. David Alan Gilbert
2020-01-07 12:13   ` Daniel P. Berrangé
2020-01-09 20:02     ` Dr. David Alan Gilbert
2020-01-10  9:30       ` Daniel P. Berrangé
2020-01-10 11:06         ` Dr. David Alan Gilbert
2019-12-12 16:38 ` [PATCH 093/104] virtiofsd: introduce inode refcount to prevent use-after-free Dr. David Alan Gilbert (git)
2020-01-16 12:25   ` Misono Tomohiro
2020-01-16 17:21     ` Stefan Hajnoczi
2020-01-16 17:42       ` Dr. David Alan Gilbert
2020-01-17  0:47         ` misono.tomohiro
2020-01-20 10:28   ` Sergio Lopez
2019-12-12 16:38 ` [PATCH 094/104] virtiofsd: do not always set FUSE_FLOCK_LOCKS Dr. David Alan Gilbert (git)
2020-01-17  8:50   ` Misono Tomohiro
2020-01-20 10:31   ` Sergio Lopez
2019-12-12 16:38 ` [PATCH 095/104] virtiofsd: convert more fprintf and perror to use fuse log infra Dr. David Alan Gilbert (git)
2020-01-07 12:16   ` Daniel P. Berrangé
2020-01-16 12:29   ` Misono Tomohiro
2020-01-16 16:32     ` Dr. David Alan Gilbert
2019-12-12 16:38 ` [PATCH 096/104] virtiofsd: Reset O_DIRECT flag during file open Dr. David Alan Gilbert (git)
2020-01-07 12:17   ` Daniel P. Berrangé
2019-12-12 16:38 ` [PATCH 097/104] virtiofsd: Fix data corruption with O_APPEND write in writeback mode Dr. David Alan Gilbert (git)
2020-01-07 12:20   ` Daniel P. Berrangé
2020-01-07 13:27     ` Dr. David Alan Gilbert
2019-12-12 16:38 ` [PATCH 098/104] virtiofsd: add definition of fuse_buf_writev() Dr. David Alan Gilbert (git)
2020-01-07 12:21   ` Daniel P. Berrangé
2019-12-12 16:38 ` [PATCH 099/104] virtiofsd: use fuse_buf_writev to replace fuse_buf_write for better performance Dr. David Alan Gilbert (git)
2020-01-07 12:23   ` Daniel P. Berrangé
2020-01-10 13:15     ` Dr. David Alan Gilbert
2019-12-12 16:39 ` [PATCH 100/104] virtiofsd: process requests in a thread pool Dr. David Alan Gilbert (git)
2020-01-20 12:54   ` Misono Tomohiro
2019-12-12 16:39 ` [PATCH 101/104] virtiofsd: prevent FUSE_INIT/FUSE_DESTROY races Dr. David Alan Gilbert (git)
2020-01-15 23:05   ` Masayoshi Mizuma
2020-01-16 12:24     ` Dr. David Alan Gilbert
2020-01-17 13:40   ` Philippe Mathieu-Daudé
2020-01-17 15:28     ` Dr. David Alan Gilbert
2020-01-17 15:30       ` Philippe Mathieu-Daudé
2019-12-12 16:39 ` [PATCH 102/104] virtiofsd: fix lo_destroy() resource leaks Dr. David Alan Gilbert (git)
2020-01-17 13:43   ` Philippe Mathieu-Daudé
2019-12-12 16:39 ` [PATCH 103/104] virtiofsd: add --thread-pool-size=NUM option Dr. David Alan Gilbert (git)
2020-01-07 12:25   ` Daniel P. Berrangé
2020-01-17 13:35   ` Philippe Mathieu-Daudé
2019-12-12 16:39 ` [PATCH 104/104] virtiofsd: Convert lo_destroy to take the lo->mutex lock itself Dr. David Alan Gilbert (git)
2020-01-17 13:33   ` Philippe Mathieu-Daudé
2019-12-12 18:21 ` [PATCH 000/104] virtiofs daemon [all] no-reply
2020-01-17 11:32 ` Dr. David Alan Gilbert
2020-01-17 13:32 ` [PATCH 105/104] virtiofsd: Unref old/new inodes with the same mutex lock in lo_rename() Philippe Mathieu-Daudé
2020-01-19  8:35   ` Xiao Yang
2020-01-20  8:27     ` Philippe Mathieu-Daudé
2020-01-20 18:52   ` Dr. David Alan Gilbert
2020-01-20 18:55     ` Philippe Mathieu-Daudé
