All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/25] virtiofs daemon (security)
@ 2019-10-24 11:26 Dr. David Alan Gilbert (git)
  2019-10-24 11:26 ` [PATCH 01/25] virtiofsd: passthrough_ll: create new files in caller's context Dr. David Alan Gilbert (git)
                   ` (24 more replies)
  0 siblings, 25 replies; 26+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-10-24 11:26 UTC (permalink / raw)
  To: qemu-devel, renzhen, eguan, ganesh.mahalingam, m.mizuma,
	mszeredi, misono.tomohiro, tao.peng, piaojun, stefanha, vgoyal,
	mst, berrange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Hi,
  This is the 2nd set for the virtiofsd - this set sits
on top of the 'base' set recently posted.  Most of the changes
in the set are security related (with a couple more tagging
along because they were hard to separate).

  Stefan's main chunks make the daemon check the input from
the guest;  the upstream fuse code is much more trusting
about what it gets from the kernel; here the security
equation is inverted and the daemon is more trusted.
  In adition the daemon now gets sandboxing/namespacing/seccomp
limited to stop anything escaping.

  With this set virtiofsd is reasonably safe to use; we've
got some bug fixes (including some threading fixes) to send
as well though.

Dave

Dr. David Alan Gilbert (2):
  virtiofsd: Plumb fuse_bufvec through to do_write_buf
  virtiofsd: Pass write iov's all the way through

Eryu Guan (1):
  virtiofsd: print log only when priority is high enough

Miklos Szeredi (1):
  virtiofsd: passthrough_ll: add fallback for racy ops

Stefan Hajnoczi (18):
  virtiofsd: passthrough_ll: add lo_map for ino/fh indirection
  virtiofsd: passthrough_ll: add ino_map to hide lo_inode pointers
  virtiofsd: passthrough_ll: add dirp_map to hide lo_dirp pointers
  virtiofsd: passthrough_ll: add fd_map to hide file descriptors
  virtiofsd: validate path components
  virtiofsd: add fuse_mbuf_iter API
  virtiofsd: validate input buffer sizes in do_write_buf()
  virtiofsd: check input buffer size in fuse_lowlevel.c ops
  virtiofsd: prevent ".." escape in lo_do_lookup()
  virtiofsd: prevent ".." escape in lo_do_readdir()
  virtiofsd: use /proc/self/fd/ O_PATH file descriptor
  virtiofsd: sandbox mount namespace
  virtiofsd: move to an empty network namespace
  virtiofsd: move to a new pid namespace
  virtiofsd: add seccomp whitelist
  virtiofsd: set maximum RLIMIT_NOFILE limit
  virtiofsd: add security guide document
  virtiofsd: add --syslog command-line option

Vivek Goyal (3):
  virtiofsd: passthrough_ll: create new files in caller's context
  virtiofsd: Parse flag FUSE_WRITE_KILL_PRIV
  virtiofsd: Drop CAP_FSETID if client asked for it

 contrib/virtiofsd/Makefile.objs    |    7 +-
 contrib/virtiofsd/buffer.c         |   28 +
 contrib/virtiofsd/fuse_common.h    |   53 +-
 contrib/virtiofsd/fuse_i.h         |    2 +-
 contrib/virtiofsd/fuse_log.c       |    4 +
 contrib/virtiofsd/fuse_lowlevel.c  |  779 +++++++++++-----
 contrib/virtiofsd/fuse_lowlevel.h  |    2 +
 contrib/virtiofsd/fuse_virtio.c    |   72 +-
 contrib/virtiofsd/helper.c         |   11 +-
 contrib/virtiofsd/passthrough_ll.c | 1317 ++++++++++++++++++++++++----
 contrib/virtiofsd/seccomp.c        |  146 +++
 contrib/virtiofsd/seccomp.h        |   16 +
 contrib/virtiofsd/security.rst     |  108 +++
 13 files changed, 2152 insertions(+), 393 deletions(-)
 create mode 100644 contrib/virtiofsd/seccomp.c
 create mode 100644 contrib/virtiofsd/seccomp.h
 create mode 100644 contrib/virtiofsd/security.rst

-- 
2.23.0



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 01/25] virtiofsd: passthrough_ll: create new files in caller's context
  2019-10-24 11:26 [PATCH 00/25] virtiofs daemon (security) Dr. David Alan Gilbert (git)
@ 2019-10-24 11:26 ` Dr. David Alan Gilbert (git)
  2019-10-24 11:26 ` [PATCH 02/25] virtiofsd: passthrough_ll: add lo_map for ino/fh indirection Dr. David Alan Gilbert (git)
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-10-24 11:26 UTC (permalink / raw)
  To: qemu-devel, renzhen, eguan, ganesh.mahalingam, m.mizuma,
	mszeredi, misono.tomohiro, tao.peng, piaojun, stefanha, vgoyal,
	mst, berrange

From: Vivek Goyal <vgoyal@redhat.com>

We need to create files in callers context. Otherwise afer creating file,
caller himself might not be able to do file operations on that file.

Changed effective uid/gid to caller's uid/gid, create file and then
switch backto uid/gid 0.

Use syscall(setresuid, ...) otherwise glibc does some magic to change EUID
in all threads, which is not what we want.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 contrib/virtiofsd/passthrough_ll.c | 75 ++++++++++++++++++++++++++++--
 1 file changed, 71 insertions(+), 4 deletions(-)

diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
index 2a1a880224..2de35d4d3d 100644
--- a/contrib/virtiofsd/passthrough_ll.c
+++ b/contrib/virtiofsd/passthrough_ll.c
@@ -49,6 +49,7 @@
 #include <inttypes.h>
 #include <pthread.h>
 #include <sys/file.h>
+#include <sys/syscall.h>
 #include <sys/xattr.h>
 
 #include "passthrough_helpers.h"
@@ -78,6 +79,11 @@ struct lo_inode {
 	uint64_t refcount; /* protected by lo->mutex */
 };
 
+struct lo_cred {
+	uid_t euid;
+	gid_t egid;
+};
+
 enum {
 	CACHE_NEVER,
 	CACHE_NORMAL,
@@ -376,6 +382,51 @@ static void lo_lookup(fuse_req_t req, fuse_ino_t parent, const char *name)
 		fuse_reply_entry(req, &e);
 }
 
+/*
+ * Change to uid/gid of caller so that file is created with
+ * ownership of caller.
+ * TODO: What about selinux context?
+ */
+static int lo_change_cred(fuse_req_t req, struct lo_cred *old)
+{
+	int res;
+
+	old->euid = geteuid();
+	old->egid = getegid();
+
+	res = syscall(SYS_setresgid, -1,fuse_req_ctx(req)->gid, -1);
+	if (res == -1)
+		return errno;
+
+	res = syscall(SYS_setresuid, -1, fuse_req_ctx(req)->uid, -1);
+	if (res == -1) {
+		int errno_save = errno;
+
+		syscall(SYS_setresgid, -1, old->egid, -1);
+		return errno_save;
+	}
+
+	return 0;
+}
+
+/* Regain Privileges */
+static void lo_restore_cred(struct lo_cred *old)
+{
+	int res;
+
+	res = syscall(SYS_setresuid, -1, old->euid, -1);
+	if (res == -1) {
+		fuse_log(FUSE_LOG_ERR, "seteuid(%u): %m\n", old->euid);
+		exit(1);
+	}
+
+	res = syscall(SYS_setresgid, -1, old->egid, -1);
+	if (res == -1) {
+		fuse_log(FUSE_LOG_ERR, "setegid(%u): %m\n", old->egid);
+		exit(1);
+	}
+}
+
 static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
 			     const char *name, mode_t mode, dev_t rdev,
 			     const char *link)
@@ -384,12 +435,20 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
 	int saverr;
 	struct lo_inode *dir = lo_inode(req, parent);
 	struct fuse_entry_param e;
+	struct lo_cred old = {};
 
 	saverr = ENOMEM;
 
+	saverr = lo_change_cred(req, &old);
+	if (saverr)
+		goto out;
+
 	res = mknod_wrapper(dir->fd, name, link, mode, rdev);
 
 	saverr = errno;
+
+	lo_restore_cred(&old);
+
 	if (res == -1)
 		goto out;
 
@@ -768,23 +827,31 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
 	struct lo_data *lo = lo_data(req);
 	struct fuse_entry_param e;
 	int err;
+	struct lo_cred old = {};
 
 	if (lo_debug(req))
 		fuse_log(FUSE_LOG_DEBUG, "lo_create(parent=%" PRIu64 ", name=%s)\n",
 			parent, name);
 
+	err = lo_change_cred(req, &old);
+	if (err)
+		goto out;
+
 	fd = openat(lo_fd(req, parent), name,
 		    (fi->flags | O_CREAT) & ~O_NOFOLLOW, mode);
-	if (fd == -1)
-		return (void) fuse_reply_err(req, errno);
+	err = fd == -1 ? errno : 0;
+	lo_restore_cred(&old);
 
-	fi->fh = fd;
+	if (!err) {
+		fi->fh = fd;
+		err = lo_do_lookup(req, parent, name, &e);
+	}
 	if (lo->cache == CACHE_NEVER)
 		fi->direct_io = 1;
 	else if (lo->cache == CACHE_ALWAYS)
 		fi->keep_cache = 1;
 
-	err = lo_do_lookup(req, parent, name, &e);
+out:
 	if (err)
 		fuse_reply_err(req, err);
 	else
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 02/25] virtiofsd: passthrough_ll: add lo_map for ino/fh indirection
  2019-10-24 11:26 [PATCH 00/25] virtiofs daemon (security) Dr. David Alan Gilbert (git)
  2019-10-24 11:26 ` [PATCH 01/25] virtiofsd: passthrough_ll: create new files in caller's context Dr. David Alan Gilbert (git)
@ 2019-10-24 11:26 ` Dr. David Alan Gilbert (git)
  2019-10-24 11:26 ` [PATCH 03/25] virtiofsd: passthrough_ll: add ino_map to hide lo_inode pointers Dr. David Alan Gilbert (git)
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-10-24 11:26 UTC (permalink / raw)
  To: qemu-devel, renzhen, eguan, ganesh.mahalingam, m.mizuma,
	mszeredi, misono.tomohiro, tao.peng, piaojun, stefanha, vgoyal,
	mst, berrange

From: Stefan Hajnoczi <stefanha@redhat.com>

A layer of indirection is needed because passthrough_ll cannot expose
pointers or file descriptor numbers to untrusted clients.  Malicious
clients could send invalid pointers or file descriptors in order to
crash or exploit the file system daemon.

lo_map provides an integer key->value mapping.  This will be used for
ino and fh fields in the patches that follow.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 contrib/virtiofsd/passthrough_ll.c | 113 +++++++++++++++++++++++++++++
 1 file changed, 113 insertions(+)

diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
index 2de35d4d3d..3093d1171e 100644
--- a/contrib/virtiofsd/passthrough_ll.c
+++ b/contrib/virtiofsd/passthrough_ll.c
@@ -69,6 +69,21 @@ struct _uintptr_to_must_hold_fuse_ino_t_dummy_struct \
 			((sizeof(fuse_ino_t) >= sizeof(uintptr_t)) ? 1 : -1); };
 #endif
 
+struct lo_map_elem {
+	union {
+		/* Element values will go here... */
+		ssize_t freelist;
+	};
+	bool in_use;
+};
+
+/* Maps FUSE fh or ino values to internal objects */
+struct lo_map {
+	struct lo_map_elem *elems;
+	size_t nelems;
+	ssize_t freelist;
+};
+
 struct lo_inode {
 	struct lo_inode *next; /* protected by lo->mutex */
 	struct lo_inode *prev; /* protected by lo->mutex */
@@ -137,6 +152,104 @@ static struct lo_data *lo_data(fuse_req_t req)
 	return (struct lo_data *) fuse_req_userdata(req);
 }
 
+static void lo_map_init(struct lo_map *map)
+{
+	map->elems = NULL;
+	map->nelems = 0;
+	map->freelist = -1;
+}
+
+static void lo_map_destroy(struct lo_map *map)
+{
+	free(map->elems);
+}
+
+static int lo_map_grow(struct lo_map *map, size_t new_nelems)
+{
+	struct lo_map_elem *new_elems;
+	size_t i;
+
+	if (new_nelems <= map->nelems)
+		return 1;
+
+	new_elems = realloc(map->elems, sizeof(map->elems[0]) * new_nelems);
+	if (!new_elems)
+		return 0;
+
+	for (i = map->nelems; i < new_nelems; i++) {
+		new_elems[i].freelist = i + 1;
+		new_elems[i].in_use = false;
+	}
+	new_elems[new_nelems - 1].freelist = -1;
+
+	map->elems = new_elems;
+	map->freelist = map->nelems;
+	map->nelems = new_nelems;
+	return 1;
+}
+
+static struct lo_map_elem *lo_map_alloc_elem(struct lo_map *map)
+{
+	struct lo_map_elem *elem;
+
+	if (map->freelist == -1 && !lo_map_grow(map, map->nelems + 256))
+		return NULL;
+
+	elem = &map->elems[map->freelist];
+	map->freelist = elem->freelist;
+
+	elem->in_use = true;
+
+	return elem;
+}
+
+static struct lo_map_elem *lo_map_reserve(struct lo_map *map, size_t key)
+{
+	ssize_t *prev;
+
+	if (!lo_map_grow(map, key + 1))
+		return NULL;
+
+	for (prev = &map->freelist;
+	     *prev != -1;
+	     prev = &map->elems[*prev].freelist) {
+		if (*prev == key) {
+			struct lo_map_elem *elem = &map->elems[key];
+
+			*prev = elem->freelist;
+			elem->in_use = true;
+			return elem;
+		}
+	}
+	return NULL;
+}
+
+static struct lo_map_elem *lo_map_get(struct lo_map *map, size_t key)
+{
+	if (key >= map->nelems)
+		return NULL;
+	if (!map->elems[key].in_use)
+		return NULL;
+	return &map->elems[key];
+}
+
+static void lo_map_remove(struct lo_map *map, size_t key)
+{
+	struct lo_map_elem *elem;
+
+	if (key >= map->nelems)
+		return;
+
+	elem = &map->elems[key];
+	if (!elem->in_use)
+		return;
+
+	elem->in_use = false;
+
+	elem->freelist = map->freelist;
+	map->freelist = key;
+}
+
 static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
 {
 	if (ino == FUSE_ROOT_ID)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 03/25] virtiofsd: passthrough_ll: add ino_map to hide lo_inode pointers
  2019-10-24 11:26 [PATCH 00/25] virtiofs daemon (security) Dr. David Alan Gilbert (git)
  2019-10-24 11:26 ` [PATCH 01/25] virtiofsd: passthrough_ll: create new files in caller's context Dr. David Alan Gilbert (git)
  2019-10-24 11:26 ` [PATCH 02/25] virtiofsd: passthrough_ll: add lo_map for ino/fh indirection Dr. David Alan Gilbert (git)
@ 2019-10-24 11:26 ` Dr. David Alan Gilbert (git)
  2019-10-24 11:26 ` [PATCH 04/25] virtiofsd: passthrough_ll: add dirp_map to hide lo_dirp pointers Dr. David Alan Gilbert (git)
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-10-24 11:26 UTC (permalink / raw)
  To: qemu-devel, renzhen, eguan, ganesh.mahalingam, m.mizuma,
	mszeredi, misono.tomohiro, tao.peng, piaojun, stefanha, vgoyal,
	mst, berrange

From: Stefan Hajnoczi <stefanha@redhat.com>

Do not expose lo_inode pointers to clients.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 contrib/virtiofsd/passthrough_ll.c | 125 ++++++++++++++++++++++++-----
 1 file changed, 105 insertions(+), 20 deletions(-)

diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
index 3093d1171e..f718c951f7 100644
--- a/contrib/virtiofsd/passthrough_ll.c
+++ b/contrib/virtiofsd/passthrough_ll.c
@@ -56,8 +56,8 @@
 
 #define HAVE_POSIX_FALLOCATE 1
 
-/* We are re-using pointers to our `struct lo_inode` and `struct
-   lo_dirp` elements as inodes. This means that we must be able to
+/* We are re-using pointers to our `struct lo_dirp`
+   elements as inodes. This means that we must be able to
    store uintptr_t values in a fuse_ino_t variable. The following
    incantation checks this condition at compile time. */
 #if defined(__GNUC__) && (__GNUC__ > 4 || __GNUC__ == 4 && __GNUC_MINOR__ >= 6) && !defined __cplusplus
@@ -71,7 +71,7 @@ struct _uintptr_to_must_hold_fuse_ino_t_dummy_struct \
 
 struct lo_map_elem {
 	union {
-		/* Element values will go here... */
+		struct lo_inode *inode;
 		ssize_t freelist;
 	};
 	bool in_use;
@@ -92,6 +92,7 @@ struct lo_inode {
 	ino_t ino;
 	dev_t dev;
 	uint64_t refcount; /* protected by lo->mutex */
+	fuse_ino_t fuse_ino;
 };
 
 struct lo_cred {
@@ -116,6 +117,7 @@ struct lo_data {
 	int cache;
 	int timeout_set;
 	struct lo_inode root; /* protected by lo->mutex */
+	struct lo_map ino_map; /* protected by lo->mutex */
 };
 
 static const struct fuse_opt lo_opts[] = {
@@ -250,17 +252,38 @@ static void lo_map_remove(struct lo_map *map, size_t key)
 	map->freelist = key;
 }
 
+/* Assumes lo->mutex is held */
+static ssize_t lo_add_inode_mapping(fuse_req_t req, struct lo_inode *inode)
+{
+	struct lo_map_elem *elem;
+
+	elem = lo_map_alloc_elem(&lo_data(req)->ino_map);
+	if (!elem)
+		return -1;
+
+	elem->inode = inode;
+	return elem - lo_data(req)->ino_map.elems;
+}
+
 static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
 {
-	if (ino == FUSE_ROOT_ID)
-		return &lo_data(req)->root;
-	else
-		return (struct lo_inode *) (uintptr_t) ino;
+	struct lo_data *lo = lo_data(req);
+	struct lo_map_elem *elem;
+
+	pthread_mutex_lock(&lo->mutex);
+	elem = lo_map_get(&lo->ino_map, ino);
+	pthread_mutex_unlock(&lo->mutex);
+
+	if (!elem)
+		return NULL;
+
+	return elem->inode;
 }
 
 static int lo_fd(fuse_req_t req, fuse_ino_t ino)
 {
-	return lo_inode(req, ino)->fd;
+	struct lo_inode *inode = lo_inode(req, ino);
+	return inode ? inode->fd : -1;
 }
 
 static bool lo_debug(fuse_req_t req)
@@ -330,10 +353,18 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
 {
 	int saverr;
 	char procname[64];
-	struct lo_inode *inode = lo_inode(req, ino);
-	int ifd = inode->fd;
+	struct lo_inode *inode;
+	int ifd;
 	int res;
 
+	inode = lo_inode(req, ino);
+	if (!inode) {
+		fuse_reply_err(req, EBADF);
+		return;
+	}
+
+	ifd = inode->fd;
+
 	if (valid & FUSE_SET_ATTR_MODE) {
 		if (fi) {
 			res = fchmod(fi->fh, attr->st_mode);
@@ -456,6 +487,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
 		inode->dev = e->attr.st_dev;
 
 		pthread_mutex_lock(&lo->mutex);
+		inode->fuse_ino = lo_add_inode_mapping(req, inode);
 		prev = &lo->root;
 		next = prev->next;
 		next->prev = inode;
@@ -464,7 +496,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
 		prev->next = inode;
 		pthread_mutex_unlock(&lo->mutex);
 	}
-	e->ino = (uintptr_t) inode;
+	e->ino = inode->fuse_ino;
 
 	if (lo_debug(req))
 		fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n",
@@ -546,10 +578,16 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
 {
 	int res;
 	int saverr;
-	struct lo_inode *dir = lo_inode(req, parent);
+	struct lo_inode *dir;
 	struct fuse_entry_param e;
 	struct lo_cred old = {};
 
+	dir = lo_inode(req, parent);
+	if (!dir) {
+		fuse_reply_err(req, EBADF);
+		return;
+	}
+
 	saverr = ENOMEM;
 
 	saverr = lo_change_cred(req, &old);
@@ -623,10 +661,16 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
 {
 	int res;
 	struct lo_data *lo = lo_data(req);
-	struct lo_inode *inode = lo_inode(req, ino);
+	struct lo_inode *inode;
 	struct fuse_entry_param e;
 	int saverr;
 
+	inode = lo_inode(req, ino);
+	if (!inode) {
+		fuse_reply_err(req, EBADF);
+		return;
+	}
+
 	memset(&e, 0, sizeof(struct fuse_entry_param));
 	e.attr_timeout = lo->timeout;
 	e.entry_timeout = lo->timeout;
@@ -642,7 +686,7 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
 	pthread_mutex_lock(&lo->mutex);
 	inode->refcount++;
 	pthread_mutex_unlock(&lo->mutex);
-	e.ino = (uintptr_t) inode;
+	e.ino = inode->fuse_ino;
 
 	if (lo_debug(req))
 		fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n",
@@ -708,10 +752,10 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
 		next->prev = prev;
 		prev->next = next;
 
+		lo_map_remove(&lo->ino_map, inode->fuse_ino);
 		pthread_mutex_unlock(&lo->mutex);
 		close(inode->fd);
 		free(inode);
-
 	} else {
 		pthread_mutex_unlock(&lo->mutex);
 	}
@@ -720,7 +764,11 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
 static void lo_forget_one(fuse_req_t req, fuse_ino_t ino, uint64_t nlookup)
 {
 	struct lo_data *lo = lo_data(req);
-	struct lo_inode *inode = lo_inode(req, ino);
+	struct lo_inode *inode;
+
+	inode = lo_inode(req, ino);
+	if (!inode)
+		return;
 
 	if (lo_debug(req)) {
 		fuse_log(FUSE_LOG_DEBUG, "  forget %lli %lli -%lli\n",
@@ -1161,10 +1209,16 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
 {
 	char *value = NULL;
 	char procname[64];
-	struct lo_inode *inode = lo_inode(req, ino);
+	struct lo_inode *inode;
 	ssize_t ret;
 	int saverr;
 
+	inode = lo_inode(req, ino);
+	if (!inode) {
+		fuse_reply_err(req, EBADF);
+		return;
+	}
+
 	saverr = ENOSYS;
 	if (!lo_data(req)->xattr)
 		goto out;
@@ -1217,10 +1271,16 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
 {
 	char *value = NULL;
 	char procname[64];
-	struct lo_inode *inode = lo_inode(req, ino);
+	struct lo_inode *inode;
 	ssize_t ret;
 	int saverr;
 
+	inode = lo_inode(req, ino);
+	if (!inode) {
+		fuse_reply_err(req, EBADF);
+		return;
+	}
+
 	saverr = ENOSYS;
 	if (!lo_data(req)->xattr)
 		goto out;
@@ -1273,10 +1333,16 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
 			const char *value, size_t size, int flags)
 {
 	char procname[64];
-	struct lo_inode *inode = lo_inode(req, ino);
+	struct lo_inode *inode;
 	ssize_t ret;
 	int saverr;
 
+	inode = lo_inode(req, ino);
+	if (!inode) {
+		fuse_reply_err(req, EBADF);
+		return;
+	}
+
 	saverr = ENOSYS;
 	if (!lo_data(req)->xattr)
 		goto out;
@@ -1304,10 +1370,16 @@ out:
 static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *name)
 {
 	char procname[64];
-	struct lo_inode *inode = lo_inode(req, ino);
+	struct lo_inode *inode;
 	ssize_t ret;
 	int saverr;
 
+	inode = lo_inode(req, ino);
+	if (!inode) {
+		fuse_reply_err(req, EBADF);
+		return;
+	}
+
 	saverr = ENOSYS;
 	if (!lo_data(req)->xattr)
 		goto out;
@@ -1411,6 +1483,7 @@ int main(int argc, char *argv[])
 	struct fuse_cmdline_opts opts;
 	struct lo_data lo = { .debug = 0,
 	                      .writeback = 0 };
+	struct lo_map_elem *root_elem;
 	int ret = -1;
 
 	/* Don't mask creation mode, kernel already did that */
@@ -1419,8 +1492,18 @@ int main(int argc, char *argv[])
 	pthread_mutex_init(&lo.mutex, NULL);
 	lo.root.next = lo.root.prev = &lo.root;
 	lo.root.fd = -1;
+	lo.root.fuse_ino = FUSE_ROOT_ID;
 	lo.cache = CACHE_NORMAL;
 
+	/* Set up the ino map like this:
+	 * [0] Reserved (will not be used)
+	 * [1] Root inode
+	 */
+	lo_map_init(&lo.ino_map);
+	lo_map_reserve(&lo.ino_map, 0)->in_use = false;
+	root_elem = lo_map_reserve(&lo.ino_map, lo.root.fuse_ino);
+	root_elem->inode = &lo.root;
+
 	if (fuse_parse_cmdline(&args, &opts) != 0)
 		return 1;
 	if (opts.show_help) {
@@ -1514,6 +1597,8 @@ err_out2:
 err_out1:
 	fuse_opt_free_args(&args);
 
+	lo_map_destroy(&lo.ino_map);
+
 	if (lo.root.fd >= 0)
 		close(lo.root.fd);
 
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 04/25] virtiofsd: passthrough_ll: add dirp_map to hide lo_dirp pointers
  2019-10-24 11:26 [PATCH 00/25] virtiofs daemon (security) Dr. David Alan Gilbert (git)
                   ` (2 preceding siblings ...)
  2019-10-24 11:26 ` [PATCH 03/25] virtiofsd: passthrough_ll: add ino_map to hide lo_inode pointers Dr. David Alan Gilbert (git)
@ 2019-10-24 11:26 ` Dr. David Alan Gilbert (git)
  2019-10-24 11:26 ` [PATCH 05/25] virtiofsd: passthrough_ll: add fd_map to hide file descriptors Dr. David Alan Gilbert (git)
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-10-24 11:26 UTC (permalink / raw)
  To: qemu-devel, renzhen, eguan, ganesh.mahalingam, m.mizuma,
	mszeredi, misono.tomohiro, tao.peng, piaojun, stefanha, vgoyal,
	mst, berrange

From: Stefan Hajnoczi <stefanha@redhat.com>

Do not expose lo_dirp pointers to clients.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 contrib/virtiofsd/passthrough_ll.c | 100 +++++++++++++++++++++--------
 1 file changed, 75 insertions(+), 25 deletions(-)

diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
index f718c951f7..9f82166079 100644
--- a/contrib/virtiofsd/passthrough_ll.c
+++ b/contrib/virtiofsd/passthrough_ll.c
@@ -56,22 +56,10 @@
 
 #define HAVE_POSIX_FALLOCATE 1
 
-/* We are re-using pointers to our `struct lo_dirp`
-   elements as inodes. This means that we must be able to
-   store uintptr_t values in a fuse_ino_t variable. The following
-   incantation checks this condition at compile time. */
-#if defined(__GNUC__) && (__GNUC__ > 4 || __GNUC__ == 4 && __GNUC_MINOR__ >= 6) && !defined __cplusplus
-_Static_assert(sizeof(fuse_ino_t) >= sizeof(uintptr_t),
-	       "fuse_ino_t too small to hold uintptr_t values!");
-#else
-struct _uintptr_to_must_hold_fuse_ino_t_dummy_struct \
-	{ unsigned _uintptr_to_must_hold_fuse_ino_t:
-			((sizeof(fuse_ino_t) >= sizeof(uintptr_t)) ? 1 : -1); };
-#endif
-
 struct lo_map_elem {
 	union {
 		struct lo_inode *inode;
+		struct lo_dirp *dirp;
 		ssize_t freelist;
 	};
 	bool in_use;
@@ -118,6 +106,7 @@ struct lo_data {
 	int timeout_set;
 	struct lo_inode root; /* protected by lo->mutex */
 	struct lo_map ino_map; /* protected by lo->mutex */
+	struct lo_map dirp_map; /* protected by lo->mutex */
 };
 
 static const struct fuse_opt lo_opts[] = {
@@ -252,6 +241,19 @@ static void lo_map_remove(struct lo_map *map, size_t key)
 	map->freelist = key;
 }
 
+/* Assumes lo->mutex is held */
+static ssize_t lo_add_dirp_mapping(fuse_req_t req, struct lo_dirp *dirp)
+{
+	struct lo_map_elem *elem;
+
+	elem = lo_map_alloc_elem(&lo_data(req)->dirp_map);
+	if (!elem)
+		return -1;
+
+	elem->dirp = dirp;
+	return elem - lo_data(req)->dirp_map.elems;
+}
+
 /* Assumes lo->mutex is held */
 static ssize_t lo_add_inode_mapping(fuse_req_t req, struct lo_inode *inode)
 {
@@ -820,16 +822,28 @@ struct lo_dirp {
 	off_t offset;
 };
 
-static struct lo_dirp *lo_dirp(struct fuse_file_info *fi)
+static struct lo_dirp *lo_dirp(fuse_req_t req, struct fuse_file_info *fi)
 {
-	return (struct lo_dirp *) (uintptr_t) fi->fh;
+	struct lo_data *lo = lo_data(req);
+	struct lo_map_elem *elem;
+
+	pthread_mutex_lock(&lo->mutex);
+	elem = lo_map_get(&lo->dirp_map, fi->fh);
+	pthread_mutex_unlock(&lo->mutex);
+	if (!elem)
+		return NULL;
+
+	return elem->dirp;
 }
 
 static void lo_opendir(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 {
 	int error = ENOMEM;
 	struct lo_data *lo = lo_data(req);
-	struct lo_dirp *d = calloc(1, sizeof(struct lo_dirp));
+	struct lo_dirp *d;
+	ssize_t fh;
+
+	d = calloc(1, sizeof(struct lo_dirp));
 	if (d == NULL)
 		goto out_err;
 
@@ -844,7 +858,13 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi
 	d->offset = 0;
 	d->entry = NULL;
 
-	fi->fh = (uintptr_t) d;
+	pthread_mutex_lock(&lo->mutex);
+	fh = lo_add_dirp_mapping(req, d);
+	pthread_mutex_unlock(&lo->mutex);
+	if (fh == -1)
+		goto out_err;
+
+	fi->fh = fh;
 	if (lo->cache == CACHE_ALWAYS)
 		fi->keep_cache = 1;
 	fuse_reply_open(req, fi);
@@ -854,6 +874,8 @@ out_errno:
 	error = errno;
 out_err:
 	if (d) {
+		if (d->dp)
+			closedir(d->dp);
 		if (d->fd != -1)
 			close(d->fd);
 		free(d);
@@ -870,19 +892,21 @@ static int is_dot_or_dotdot(const char *name)
 static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
 			  off_t offset, struct fuse_file_info *fi, int plus)
 {
-	struct lo_dirp *d = lo_dirp(fi);
-	char *buf;
+	struct lo_dirp *d;
+	char *buf = NULL;
 	char *p;
 	size_t rem = size;
-	int err;
+	int err = ENOMEM;
 
 	(void) ino;
 
+	d = lo_dirp(req, fi);
+	if (!d)
+		goto error;
+
 	buf = calloc(1, size);
-	if (!buf) {
-		err = ENOMEM;
+	if (!buf)
 		goto error;
-	}
 	p = buf;
 
 	if (offset != d->offset) {
@@ -974,8 +998,21 @@ static void lo_readdirplus(fuse_req_t req, fuse_ino_t ino, size_t size,
 
 static void lo_releasedir(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 {
-	struct lo_dirp *d = lo_dirp(fi);
+	struct lo_data *lo = lo_data(req);
+	struct lo_dirp *d;
+
 	(void) ino;
+
+	d = lo_dirp(req, fi);
+	if (!d) {
+		fuse_reply_err(req, EBADF);
+		return;
+	}
+
+	pthread_mutex_lock(&lo->mutex);
+	lo_map_remove(&lo->dirp_map, fi->fh);
+	pthread_mutex_unlock(&lo->mutex);
+
 	closedir(d->dp);
 	free(d);
 	fuse_reply_err(req, 0);
@@ -1023,8 +1060,18 @@ static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,
 			struct fuse_file_info *fi)
 {
 	int res;
-	int fd = dirfd(lo_dirp(fi)->dp);
+	struct lo_dirp *d;
+	int fd;
+
 	(void) ino;
+
+	d = lo_dirp(req, fi);
+	if (!d) {
+		fuse_reply_err(req, EBADF);
+		return;
+	}
+
+	fd = dirfd(d->dp);
 	if (datasync)
 		res = fdatasync(fd);
 	else
@@ -1504,6 +1551,8 @@ int main(int argc, char *argv[])
 	root_elem = lo_map_reserve(&lo.ino_map, lo.root.fuse_ino);
 	root_elem->inode = &lo.root;
 
+	lo_map_init(&lo.dirp_map);
+
 	if (fuse_parse_cmdline(&args, &opts) != 0)
 		return 1;
 	if (opts.show_help) {
@@ -1597,6 +1646,7 @@ err_out2:
 err_out1:
 	fuse_opt_free_args(&args);
 
+	lo_map_destroy(&lo.dirp_map);
 	lo_map_destroy(&lo.ino_map);
 
 	if (lo.root.fd >= 0)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 05/25] virtiofsd: passthrough_ll: add fd_map to hide file descriptors
  2019-10-24 11:26 [PATCH 00/25] virtiofs daemon (security) Dr. David Alan Gilbert (git)
                   ` (3 preceding siblings ...)
  2019-10-24 11:26 ` [PATCH 04/25] virtiofsd: passthrough_ll: add dirp_map to hide lo_dirp pointers Dr. David Alan Gilbert (git)
@ 2019-10-24 11:26 ` Dr. David Alan Gilbert (git)
  2019-10-24 11:26 ` [PATCH 06/25] virtiofsd: passthrough_ll: add fallback for racy ops Dr. David Alan Gilbert (git)
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-10-24 11:26 UTC (permalink / raw)
  To: qemu-devel, renzhen, eguan, ganesh.mahalingam, m.mizuma,
	mszeredi, misono.tomohiro, tao.peng, piaojun, stefanha, vgoyal,
	mst, berrange

From: Stefan Hajnoczi <stefanha@redhat.com>

Do not expose file descriptor numbers to clients.  This prevents the
abuse of internal file descriptors (like stdin/stdout).

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 contrib/virtiofsd/passthrough_ll.c | 113 +++++++++++++++++++++++------
 1 file changed, 92 insertions(+), 21 deletions(-)

diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
index 9f82166079..a71fbff143 100644
--- a/contrib/virtiofsd/passthrough_ll.c
+++ b/contrib/virtiofsd/passthrough_ll.c
@@ -60,6 +60,7 @@ struct lo_map_elem {
 	union {
 		struct lo_inode *inode;
 		struct lo_dirp *dirp;
+		int fd;
 		ssize_t freelist;
 	};
 	bool in_use;
@@ -107,6 +108,7 @@ struct lo_data {
 	struct lo_inode root; /* protected by lo->mutex */
 	struct lo_map ino_map; /* protected by lo->mutex */
 	struct lo_map dirp_map; /* protected by lo->mutex */
+	struct lo_map fd_map; /* protected by lo->mutex */
 };
 
 static const struct fuse_opt lo_opts[] = {
@@ -241,6 +243,19 @@ static void lo_map_remove(struct lo_map *map, size_t key)
 	map->freelist = key;
 }
 
+/* Assumes lo->mutex is held */
+static ssize_t lo_add_fd_mapping(fuse_req_t req, int fd)
+{
+	struct lo_map_elem *elem;
+
+	elem = lo_map_alloc_elem(&lo_data(req)->fd_map);
+	if (!elem)
+		return -1;
+
+	elem->fd = fd;
+	return elem - lo_data(req)->fd_map.elems;
+}
+
 /* Assumes lo->mutex is held */
 static ssize_t lo_add_dirp_mapping(fuse_req_t req, struct lo_dirp *dirp)
 {
@@ -350,6 +365,21 @@ static int utimensat_empty_nofollow(struct lo_inode *inode,
 	return utimensat(AT_FDCWD, procname, tv, 0);
 }
 
+static int lo_fi_fd(fuse_req_t req, struct fuse_file_info *fi)
+{
+	struct lo_data *lo = lo_data(req);
+	struct lo_map_elem *elem;
+
+	pthread_mutex_lock(&lo->mutex);
+	elem = lo_map_get(&lo->fd_map, fi->fh);
+	pthread_mutex_unlock(&lo->mutex);
+
+	if (!elem)
+		return -1;
+
+	return elem->fd;
+}
+
 static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
 		       int valid, struct fuse_file_info *fi)
 {
@@ -358,6 +388,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
 	struct lo_inode *inode;
 	int ifd;
 	int res;
+	int fd;
 
 	inode = lo_inode(req, ino);
 	if (!inode) {
@@ -367,9 +398,13 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
 
 	ifd = inode->fd;
 
+	/* If fi->fh is invalid we'll report EBADF later */
+	if (fi)
+		fd = lo_fi_fd(req, fi);
+
 	if (valid & FUSE_SET_ATTR_MODE) {
 		if (fi) {
-			res = fchmod(fi->fh, attr->st_mode);
+			res = fchmod(fd, attr->st_mode);
 		} else {
 			sprintf(procname, "/proc/self/fd/%i", ifd);
 			res = chmod(procname, attr->st_mode);
@@ -390,7 +425,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
 	}
 	if (valid & FUSE_SET_ATTR_SIZE) {
 		if (fi) {
-			res = ftruncate(fi->fh, attr->st_size);
+			res = ftruncate(fd, attr->st_size);
 		} else {
 			sprintf(procname, "/proc/self/fd/%i", ifd);
 			res = truncate(procname, attr->st_size);
@@ -417,7 +452,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
 			tv[1] = attr->st_mtim;
 
 		if (fi)
-			res = futimens(fi->fh, tv);
+			res = futimens(fd, tv);
 		else
 			res = utimensat_empty_nofollow(inode, tv);
 		if (res == -1)
@@ -1041,7 +1076,18 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
 	lo_restore_cred(&old);
 
 	if (!err) {
-		fi->fh = fd;
+		ssize_t fh;
+
+		pthread_mutex_lock(&lo->mutex);
+		fh = lo_add_fd_mapping(req, fd);
+		pthread_mutex_unlock(&lo->mutex);
+		if (fh == -1) {
+			close(fd);
+			fuse_reply_err(req, ENOMEM);
+			return;
+		}
+
+		fi->fh = fh;
 		err = lo_do_lookup(req, parent, name, &e);
 	}
 	if (lo->cache == CACHE_NEVER)
@@ -1082,6 +1128,7 @@ static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,
 static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 {
 	int fd;
+	ssize_t fh;
 	char buf[64];
 	struct lo_data *lo = lo_data(req);
 
@@ -1110,7 +1157,16 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 	if (fd == -1)
 		return (void) fuse_reply_err(req, errno);
 
-	fi->fh = fd;
+	pthread_mutex_lock(&lo->mutex);
+	fh = lo_add_fd_mapping(req, fd);
+	pthread_mutex_unlock(&lo->mutex);
+	if (fh == -1) {
+		close(fd);
+		fuse_reply_err(req, ENOMEM);
+		return;
+	}
+
+	fi->fh = fh;
 	if (lo->cache == CACHE_NEVER)
 		fi->direct_io = 1;
 	else if (lo->cache == CACHE_ALWAYS)
@@ -1120,9 +1176,18 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 
 static void lo_release(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 {
+	struct lo_data *lo = lo_data(req);
+	int fd;
+
 	(void) ino;
 
-	close(fi->fh);
+	fd = lo_fi_fd(req, fi);
+
+	pthread_mutex_lock(&lo->mutex);
+	lo_map_remove(&lo->fd_map, fi->fh);
+	pthread_mutex_unlock(&lo->mutex);
+
+	close(fd);
 	fuse_reply_err(req, 0);
 }
 
@@ -1130,7 +1195,7 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 {
 	int res;
 	(void) ino;
-	res = close(dup(fi->fh));
+	res = close(dup(lo_fi_fd(req, fi)));
 	fuse_reply_err(req, res == -1 ? errno : 0);
 }
 
@@ -1155,7 +1220,7 @@ static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
 		if (fd == -1)
 			return (void) fuse_reply_err(req, errno);
 	} else
-		fd = fi->fh;
+		fd = lo_fi_fd(req, fi);
 
 	if (datasync)
 		res = fdatasync(fd);
@@ -1176,7 +1241,7 @@ static void lo_read(fuse_req_t req, fuse_ino_t ino, size_t size,
 			"off=%lu)\n", ino, size, (unsigned long) offset);
 
 	buf.buf[0].flags = FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK;
-	buf.buf[0].fd = fi->fh;
+	buf.buf[0].fd = lo_fi_fd(req, fi);
 	buf.buf[0].pos = offset;
 
 	fuse_reply_data(req, &buf, FUSE_BUF_SPLICE_MOVE);
@@ -1191,7 +1256,7 @@ static void lo_write_buf(fuse_req_t req, fuse_ino_t ino,
 	struct fuse_bufvec out_buf = FUSE_BUFVEC_INIT(fuse_buf_size(in_buf));
 
 	out_buf.buf[0].flags = FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK;
-	out_buf.buf[0].fd = fi->fh;
+	out_buf.buf[0].fd = lo_fi_fd(req, fi);
 	out_buf.buf[0].pos = off;
 
 	if (lo_debug(req))
@@ -1234,7 +1299,8 @@ static void lo_fallocate(fuse_req_t req, fuse_ino_t ino, int mode,
 		return;
 	}
 
-	err = posix_fallocate(fi->fh, offset, length);
+	err = posix_fallocate(lo_fi_fd(req, fi), offset,
+			      length);
 #endif
 
 	fuse_reply_err(req, err);
@@ -1246,7 +1312,7 @@ static void lo_flock(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
 	int res;
 	(void) ino;
 
-	res = flock(fi->fh, op);
+	res = flock(lo_fi_fd(req, fi), op);
 
 	fuse_reply_err(req, res == -1 ? errno : 0);
 }
@@ -1458,17 +1524,20 @@ static void lo_copy_file_range(fuse_req_t req, fuse_ino_t ino_in, off_t off_in,
 			       struct fuse_file_info *fi_out, size_t len,
 			       int flags)
 {
+	int in_fd, out_fd;
 	ssize_t res;
 
-	if (lo_debug(req))
-		fuse_log(FUSE_LOG_DEBUG, "lo_copy_file_range(ino=%" PRIu64 "/fd=%lu, "
-				"off=%lu, ino=%" PRIu64 "/fd=%lu, "
-				"off=%lu, size=%zd, flags=0x%x)\n",
-			ino_in, fi_in->fh, off_in, ino_out, fi_out->fh, off_out,
-			len, flags);
-
-	res = copy_file_range(fi_in->fh, &off_in, fi_out->fh, &off_out, len,
-			      flags);
+	in_fd = lo_fi_fd(req, fi_in);
+	out_fd = lo_fi_fd(req, fi_out);
+
+	fuse_log(FUSE_LOG_DEBUG,
+		 "lo_copy_file_range(ino=%" PRIu64 "/fd=%d, "
+		 "off=%lu, ino=%" PRIu64 "/fd=%d, "
+		 "off=%lu, size=%zd, flags=0x%x)\n",
+		 ino_in, in_fd, off_in, ino_out, out_fd, off_out, len,
+		 flags);
+
+	res = copy_file_range(in_fd, &off_in, out_fd, &off_out, len, flags);
 	if (res < 0)
 		fuse_reply_err(req, -errno);
 	else
@@ -1552,6 +1621,7 @@ int main(int argc, char *argv[])
 	root_elem->inode = &lo.root;
 
 	lo_map_init(&lo.dirp_map);
+	lo_map_init(&lo.fd_map);
 
 	if (fuse_parse_cmdline(&args, &opts) != 0)
 		return 1;
@@ -1646,6 +1716,7 @@ err_out2:
 err_out1:
 	fuse_opt_free_args(&args);
 
+	lo_map_destroy(&lo.fd_map);
 	lo_map_destroy(&lo.dirp_map);
 	lo_map_destroy(&lo.ino_map);
 
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 06/25] virtiofsd: passthrough_ll: add fallback for racy ops
  2019-10-24 11:26 [PATCH 00/25] virtiofs daemon (security) Dr. David Alan Gilbert (git)
                   ` (4 preceding siblings ...)
  2019-10-24 11:26 ` [PATCH 05/25] virtiofsd: passthrough_ll: add fd_map to hide file descriptors Dr. David Alan Gilbert (git)
@ 2019-10-24 11:26 ` Dr. David Alan Gilbert (git)
  2019-10-24 11:27 ` [PATCH 07/25] virtiofsd: validate path components Dr. David Alan Gilbert (git)
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-10-24 11:26 UTC (permalink / raw)
  To: qemu-devel, renzhen, eguan, ganesh.mahalingam, m.mizuma,
	mszeredi, misono.tomohiro, tao.peng, piaojun, stefanha, vgoyal,
	mst, berrange

From: Miklos Szeredi <mszeredi@redhat.com>

We have two operations that cannot be done race-free on a symlink in
certain cases: utimes and link.

Add racy fallback for these if the race-free method doesn't work.  We do
our best to avoid races even in this case:

  - get absolute path by reading /proc/self/fd/NN symlink

  - lookup parent directory: after this we are safe against renames in
    ancestors

  - lookup name in parent directory, and verify that we got to the original
    inode,  if not retry the whole thing

Both utimes(2) and link(2) hold i_lock on the inode across the operation,
so a racing rename/delete by this fuse instance is not possible, only from
other entities changing the filesystem.

If the "norace" option is given, then disable the racy fallbacks.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 contrib/virtiofsd/passthrough_ll.c | 149 +++++++++++++++++++++++++----
 1 file changed, 131 insertions(+), 18 deletions(-)

diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
index a71fbff143..9f84419816 100644
--- a/contrib/virtiofsd/passthrough_ll.c
+++ b/contrib/virtiofsd/passthrough_ll.c
@@ -98,6 +98,7 @@ enum {
 struct lo_data {
 	pthread_mutex_t mutex;
 	int debug;
+	int norace;
 	int writeback;
 	int flock;
 	int xattr;
@@ -136,10 +137,16 @@ static const struct fuse_opt lo_opts[] = {
 	  offsetof(struct lo_data, cache), CACHE_NORMAL },
 	{ "cache=always",
 	  offsetof(struct lo_data, cache), CACHE_ALWAYS },
-
+	{ "norace",
+	  offsetof(struct lo_data, norace), 1 },
 	FUSE_OPT_END
 };
 
+static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n);
+
+static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st);
+
+
 static struct lo_data *lo_data(fuse_req_t req)
 {
 	return (struct lo_data *) fuse_req_userdata(req);
@@ -345,24 +352,116 @@ static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
 	fuse_reply_attr(req, &buf, lo->timeout);
 }
 
-static int utimensat_empty_nofollow(struct lo_inode *inode,
-				    const struct timespec *tv)
+static int lo_parent_and_name(struct lo_data *lo, struct lo_inode *inode,
+			      char path[PATH_MAX], struct lo_inode **parent)
 {
-	int res;
 	char procname[64];
+	char *last;
+	struct stat stat;
+	struct lo_inode *p;
+	int retries = 2;
+	int res;
+
+retry:
+	sprintf(procname, "/proc/self/fd/%i", inode->fd);
+
+	res = readlink(procname, path, PATH_MAX);
+	if (res < 0) {
+		fuse_log(FUSE_LOG_WARNING, "lo_parent_and_name: readlink failed: %m\n");
+		goto fail_noretry;
+	}
+
+	if (res >= PATH_MAX) {
+		fuse_log(FUSE_LOG_WARNING, "lo_parent_and_name: readlink overflowed\n");
+		goto fail_noretry;
+	}
+	path[res] = '\0';
+
+	last = strrchr(path, '/');
+	if (last == NULL) {
+		/* Shouldn't happen */
+		fuse_log(FUSE_LOG_WARNING, "lo_parent_and_name: INTERNAL ERROR: bad path read from proc\n");
+		goto fail_noretry;
+	}
+	if (last == path) {
+		p = &lo->root;
+		pthread_mutex_lock(&lo->mutex);
+		p->refcount++;
+		pthread_mutex_unlock(&lo->mutex);
+	} else {
+		*last = '\0';
+		res = fstatat(AT_FDCWD, last == path ? "/" : path, &stat, 0);
+		if (res == -1) {
+			if (!retries)
+				fuse_log(FUSE_LOG_WARNING, "lo_parent_and_name: failed to stat parent: %m\n");
+			goto fail;
+		}
+		p = lo_find(lo, &stat);
+		if (p == NULL) {
+			if (!retries)
+				fuse_log(FUSE_LOG_WARNING, "lo_parent_and_name: failed to find parent\n");
+			goto fail;
+		}
+	}
+	last++;
+	res = fstatat(p->fd, last, &stat, AT_SYMLINK_NOFOLLOW);
+	if (res == -1) {
+		if (!retries)
+			fuse_log(FUSE_LOG_WARNING, "lo_parent_and_name: failed to stat last\n");
+		goto fail_unref;
+	}
+	if (stat.st_dev != inode->dev || stat.st_ino != inode->ino) {
+		if (!retries)
+			fuse_log(FUSE_LOG_WARNING, "lo_parent_and_name: failed to match last\n");
+		goto fail_unref;
+	}
+	*parent = p;
+	memmove(path, last, strlen(last) + 1);
+
+	return 0;
+
+fail_unref:
+	unref_inode(lo, p, 1);
+fail:
+	if (retries) {
+		retries--;
+		goto retry;
+	}
+fail_noretry:
+	errno = EIO;
+	return -1;
+}
+
+static int utimensat_empty(struct lo_data *lo, struct lo_inode *inode,
+			   const struct timespec *tv)
+{
+	int res;
+	struct lo_inode *parent;
+	char path[PATH_MAX];
 
 	if (inode->is_symlink) {
-		res = utimensat(inode->fd, "", tv,
-				AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
+		res = utimensat(inode->fd, "", tv, AT_EMPTY_PATH);
 		if (res == -1 && errno == EINVAL) {
 			/* Sorry, no race free way to set times on symlink. */
-			errno = EPERM;
+			if (lo->norace)
+				errno = EPERM;
+			else
+				goto fallback;
 		}
 		return res;
 	}
-	sprintf(procname, "/proc/self/fd/%i", inode->fd);
+	sprintf(path, "/proc/self/fd/%i", inode->fd);
+
+	return utimensat(AT_FDCWD, path, tv, 0);
+
+fallback:
+	res = lo_parent_and_name(lo, inode, path, &parent);
+	if (res != -1) {
+		res = utimensat(parent->fd, path, tv, AT_SYMLINK_NOFOLLOW);
+		unref_inode(lo, parent, 1);
+	}
 
-	return utimensat(AT_FDCWD, procname, tv, 0);
+	return res;
 }
 
 static int lo_fi_fd(fuse_req_t req, struct fuse_file_info *fi)
@@ -385,6 +484,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
 {
 	int saverr;
 	char procname[64];
+	struct lo_data *lo = lo_data(req);
 	struct lo_inode *inode;
 	int ifd;
 	int res;
@@ -454,7 +554,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
 		if (fi)
 			res = futimens(fd, tv);
 		else
-			res = utimensat_empty_nofollow(inode, tv);
+			res = utimensat_empty(lo, inode, tv);
 		if (res == -1)
 			goto out_err;
 	}
@@ -673,24 +773,37 @@ static void lo_symlink(fuse_req_t req, const char *link,
 	lo_mknod_symlink(req, parent, name, S_IFLNK, 0, link);
 }
 
-static int linkat_empty_nofollow(struct lo_inode *inode, int dfd,
-				 const char *name)
+static int linkat_empty_nofollow(struct lo_data *lo, struct lo_inode *inode,
+				 int dfd, const char *name)
 {
 	int res;
-	char procname[64];
+	struct lo_inode *parent;
+	char path[PATH_MAX];
 
 	if (inode->is_symlink) {
 		res = linkat(inode->fd, "", dfd, name, AT_EMPTY_PATH);
 		if (res == -1 && (errno == ENOENT || errno == EINVAL)) {
 			/* Sorry, no race free way to hard-link a symlink. */
-			errno = EPERM;
+			if (lo->norace)
+				errno = EPERM;
+			else
+				goto fallback;
 		}
 		return res;
 	}
 
-	sprintf(procname, "/proc/self/fd/%i", inode->fd);
+	sprintf(path, "/proc/self/fd/%i", inode->fd);
+
+	return linkat(AT_FDCWD, path, dfd, name, AT_SYMLINK_FOLLOW);
 
-	return linkat(AT_FDCWD, procname, dfd, name, AT_SYMLINK_FOLLOW);
+fallback:
+	res = lo_parent_and_name(lo, inode, path, &parent);
+	if (res != -1) {
+		res = linkat(parent->fd, path, dfd, name, 0);
+		unref_inode(lo, parent, 1);
+	}
+
+	return res;
 }
 
 static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
@@ -712,7 +825,7 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
 	e.attr_timeout = lo->timeout;
 	e.entry_timeout = lo->timeout;
 
-	res = linkat_empty_nofollow(inode, lo_fd(req, parent), name);
+	res = linkat_empty_nofollow(lo, inode, lo_fd(req, parent), name);
 	if (res == -1)
 		goto out_err;
 
@@ -1466,7 +1579,7 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
 	}
 
 	if (inode->is_symlink) {
-		/* Sorry, no race free way to setxattr on symlink. */
+		/* Sorry, no race free way to removexattr on symlink. */
 		saverr = EPERM;
 		goto out;
 	}
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 07/25] virtiofsd: validate path components
  2019-10-24 11:26 [PATCH 00/25] virtiofs daemon (security) Dr. David Alan Gilbert (git)
                   ` (5 preceding siblings ...)
  2019-10-24 11:26 ` [PATCH 06/25] virtiofsd: passthrough_ll: add fallback for racy ops Dr. David Alan Gilbert (git)
@ 2019-10-24 11:27 ` Dr. David Alan Gilbert (git)
  2019-10-24 11:27 ` [PATCH 08/25] virtiofsd: Plumb fuse_bufvec through to do_write_buf Dr. David Alan Gilbert (git)
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-10-24 11:27 UTC (permalink / raw)
  To: qemu-devel, renzhen, eguan, ganesh.mahalingam, m.mizuma,
	mszeredi, misono.tomohiro, tao.peng, piaojun, stefanha, vgoyal,
	mst, berrange

From: Stefan Hajnoczi <stefanha@redhat.com>

Several FUSE requests contain single path components.  A correct FUSE
client sends well-formed path components but there is currently no input
validation in case something went wrong or the client is malicious.

Refuse ".", "..", and paths containing '/' when we expect a path
component.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 contrib/virtiofsd/passthrough_ll.c | 59 +++++++++++++++++++++++++++---
 1 file changed, 53 insertions(+), 6 deletions(-)

diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
index 9f84419816..702fedc38a 100644
--- a/contrib/virtiofsd/passthrough_ll.c
+++ b/contrib/virtiofsd/passthrough_ll.c
@@ -146,6 +146,21 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n);
 
 static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st);
 
+static int is_dot_or_dotdot(const char *name)
+{
+	return name[0] == '.' && (name[1] == '\0' ||
+				  (name[1] == '.' && name[2] == '\0'));
+}
+
+/* Is `path` a single path component that is not "." or ".."? */
+static int is_safe_path_component(const char *path)
+{
+	if (strchr(path, '/')) {
+		return 0;
+	}
+
+	return !is_dot_or_dotdot(path);
+}
 
 static struct lo_data *lo_data(fuse_req_t req)
 {
@@ -657,6 +672,14 @@ static void lo_lookup(fuse_req_t req, fuse_ino_t parent, const char *name)
 		fuse_log(FUSE_LOG_DEBUG, "lo_lookup(parent=%" PRIu64 ", name=%s)\n",
 			parent, name);
 
+	/* Don't use is_safe_path_component(), allow "." and ".." for NFS export
+	 * support.
+	 */
+	if (strchr(name, '/')) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
 	err = lo_do_lookup(req, parent, name, &e);
 	if (err)
 		fuse_reply_err(req, err);
@@ -719,6 +742,11 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
 	struct fuse_entry_param e;
 	struct lo_cred old = {};
 
+	if (!is_safe_path_component(name)) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
 	dir = lo_inode(req, parent);
 	if (!dir) {
 		fuse_reply_err(req, EBADF);
@@ -815,6 +843,11 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
 	struct fuse_entry_param e;
 	int saverr;
 
+	if (!is_safe_path_component(name)) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
 	inode = lo_inode(req, ino);
 	if (!inode) {
 		fuse_reply_err(req, EBADF);
@@ -854,6 +887,10 @@ out_err:
 static void lo_rmdir(fuse_req_t req, fuse_ino_t parent, const char *name)
 {
 	int res;
+	if (!is_safe_path_component(name)) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
 
 	res = unlinkat(lo_fd(req, parent), name, AT_REMOVEDIR);
 
@@ -866,6 +903,12 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
 {
 	int res;
 
+	if (!is_safe_path_component(name) ||
+		!is_safe_path_component(newname)) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
 	if (flags) {
 		fuse_reply_err(req, EINVAL);
 		return;
@@ -881,6 +924,11 @@ static void lo_unlink(fuse_req_t req, fuse_ino_t parent, const char *name)
 {
 	int res;
 
+	if (!is_safe_path_component(name)) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
 	res = unlinkat(lo_fd(req, parent), name, 0);
 
 	fuse_reply_err(req, res == -1 ? errno : 0);
@@ -1031,12 +1079,6 @@ out_err:
 	fuse_reply_err(req, error);
 }
 
-static int is_dot_or_dotdot(const char *name)
-{
-	return name[0] == '.' && (name[1] == '\0' ||
-				  (name[1] == '.' && name[2] == '\0'));
-}
-
 static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
 			  off_t offset, struct fuse_file_info *fi, int plus)
 {
@@ -1179,6 +1221,11 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
 		fuse_log(FUSE_LOG_DEBUG, "lo_create(parent=%" PRIu64 ", name=%s)\n",
 			parent, name);
 
+	if (!is_safe_path_component(name)) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
 	err = lo_change_cred(req, &old);
 	if (err)
 		goto out;
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 08/25] virtiofsd: Plumb fuse_bufvec through to do_write_buf
  2019-10-24 11:26 [PATCH 00/25] virtiofs daemon (security) Dr. David Alan Gilbert (git)
                   ` (6 preceding siblings ...)
  2019-10-24 11:27 ` [PATCH 07/25] virtiofsd: validate path components Dr. David Alan Gilbert (git)
@ 2019-10-24 11:27 ` Dr. David Alan Gilbert (git)
  2019-10-24 11:27 ` [PATCH 09/25] virtiofsd: Pass write iov's all the way through Dr. David Alan Gilbert (git)
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-10-24 11:27 UTC (permalink / raw)
  To: qemu-devel, renzhen, eguan, ganesh.mahalingam, m.mizuma,
	mszeredi, misono.tomohiro, tao.peng, piaojun, stefanha, vgoyal,
	mst, berrange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Let fuse_session_process_buf_int take a fuse_bufvec * instead of a
fuse_buf;  and then through to do_write_buf - where in the best
case it can pass that stragiht through to op.write_buf without copying
(other than ksipping a header).

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 contrib/virtiofsd/fuse_i.h        |  2 +-
 contrib/virtiofsd/fuse_lowlevel.c | 67 +++++++++++++++++++------------
 contrib/virtiofsd/fuse_virtio.c   |  3 +-
 3 files changed, 45 insertions(+), 27 deletions(-)

diff --git a/contrib/virtiofsd/fuse_i.h b/contrib/virtiofsd/fuse_i.h
index 4b718f1aec..06f3d88a44 100644
--- a/contrib/virtiofsd/fuse_i.h
+++ b/contrib/virtiofsd/fuse_i.h
@@ -119,7 +119,7 @@ int fuse_send_reply_iov_nofree(fuse_req_t req, int error, struct iovec *iov,
 void fuse_free_req(fuse_req_t req);
 
 void fuse_session_process_buf_int(struct fuse_session *se,
-				  const struct fuse_buf *buf, struct fuse_chan *ch);
+				  struct fuse_bufvec *bufv, struct fuse_chan *ch);
 
 
 #define FUSE_MAX_MAX_PAGES 256
diff --git a/contrib/virtiofsd/fuse_lowlevel.c b/contrib/virtiofsd/fuse_lowlevel.c
index dd046ecf07..2bd2ba00b9 100644
--- a/contrib/virtiofsd/fuse_lowlevel.c
+++ b/contrib/virtiofsd/fuse_lowlevel.c
@@ -1007,11 +1007,12 @@ static void do_write(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 }
 
 static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid, const void *inarg,
-			 const struct fuse_buf *ibuf)
+			 struct fuse_bufvec *ibufv)
 {
 	struct fuse_session *se = req->se;
-	struct fuse_bufvec bufv = {
-		.buf[0] = *ibuf,
+	struct fuse_bufvec *pbufv = ibufv;
+	struct fuse_bufvec tmpbufv = {
+		.buf[0] = ibufv->buf[0],
 		.count = 1,
 	};
 	struct fuse_write_in *arg = (struct fuse_write_in *) inarg;
@@ -1021,28 +1022,35 @@ static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid, const void *inarg,
 	fi.fh = arg->fh;
 	fi.writepage = arg->write_flags & FUSE_WRITE_CACHE;
 
-	if (se->conn.proto_minor < 9) {
-		bufv.buf[0].mem = ((char *) arg) + FUSE_COMPAT_WRITE_IN_SIZE;
-		bufv.buf[0].size -= sizeof(struct fuse_in_header) +
-			FUSE_COMPAT_WRITE_IN_SIZE;
-		assert(!(bufv.buf[0].flags & FUSE_BUF_IS_FD));
-	} else {
-		fi.lock_owner = arg->lock_owner;
-		fi.flags = arg->flags;
-		if (!(bufv.buf[0].flags & FUSE_BUF_IS_FD))
-			bufv.buf[0].mem = PARAM(arg);
+	if (ibufv->count == 1) {
+		if (se->conn.proto_minor < 9) {
+			tmpbufv.buf[0].mem = ((char *) arg) + FUSE_COMPAT_WRITE_IN_SIZE;
+			tmpbufv.buf[0].size -= sizeof(struct fuse_in_header) +
+				FUSE_COMPAT_WRITE_IN_SIZE;
+			assert(!(tmpbufv.buf[0].flags & FUSE_BUF_IS_FD));
+		} else {
+			fi.lock_owner = arg->lock_owner;
+			fi.flags = arg->flags;
+			if (!(tmpbufv.buf[0].flags & FUSE_BUF_IS_FD))
+				tmpbufv.buf[0].mem = PARAM(arg);
 
-		bufv.buf[0].size -= sizeof(struct fuse_in_header) +
-			sizeof(struct fuse_write_in);
-	}
-	if (bufv.buf[0].size < arg->size) {
-		fuse_log(FUSE_LOG_ERR, "fuse: do_write_buf: buffer size too small\n");
-		fuse_reply_err(req, EIO);
-		return;
+			tmpbufv.buf[0].size -= sizeof(struct fuse_in_header) +
+				sizeof(struct fuse_write_in);
+		}
+		if (tmpbufv.buf[0].size < arg->size) {
+			fuse_log(FUSE_LOG_ERR, "fuse: do_write_buf: buffer size too small\n");
+			fuse_reply_err(req, EIO);
+			return;
+		}
+		tmpbufv.buf[0].size = arg->size;
+		pbufv = &tmpbufv;
+	} else {
+		// Input bufv contains the headers in the first element
+		// and the data in the rest, we need to skip that first element
+		ibufv->buf[0].size = 0;
 	}
-	bufv.buf[0].size = arg->size;
 
-	se->op.write_buf(req, nodeid, &bufv, arg->offset, &fi);
+	se->op.write_buf(req, nodeid, pbufv, arg->offset, &fi);
 }
 
 static void do_flush(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
@@ -1979,12 +1987,21 @@ static const char *opname(enum fuse_opcode opcode)
 void fuse_session_process_buf(struct fuse_session *se,
 			      const struct fuse_buf *buf)
 {
-	fuse_session_process_buf_int(se, buf, NULL);
+	struct fuse_bufvec bufv = { .buf[0] = *buf, .count = 1 };
+	fuse_session_process_buf_int(se, &bufv, NULL);
 }
 
+// Restriction:
+//   bufv is normally a single entry buffer, except for a write
+//   where (if it's in memory) then the bufv may be multiple entries,
+//   where the first entry contains all headers and subsequent entries
+//   contain data
+//   bufv shall not use any offsets etc to make the data anything
+//   other than contiguous starting from 0.
 void fuse_session_process_buf_int(struct fuse_session *se,
-				  const struct fuse_buf *buf, struct fuse_chan *ch)
+				  struct fuse_bufvec *bufv, struct fuse_chan *ch)
 {
+	const struct fuse_buf *buf = bufv->buf;
 	struct fuse_in_header *in;
 	const void *inarg;
 	struct fuse_req *req;
@@ -2057,7 +2074,7 @@ void fuse_session_process_buf_int(struct fuse_session *se,
 
 	inarg = (void *) &in[1];
 	if (in->opcode == FUSE_WRITE && se->op.write_buf)
-		do_write_buf(req, in->nodeid, inarg, buf);
+		do_write_buf(req, in->nodeid, inarg, bufv);
 	else
 		fuse_ll_ops[in->opcode].func(req, in->nodeid, inarg);
 
diff --git a/contrib/virtiofsd/fuse_virtio.c b/contrib/virtiofsd/fuse_virtio.c
index 1f09fad93d..68036e2171 100644
--- a/contrib/virtiofsd/fuse_virtio.c
+++ b/contrib/virtiofsd/fuse_virtio.c
@@ -499,7 +499,8 @@ static void *fv_queue_thread(void *opaque)
             /* TODO! Endianness of header */
 
             /* TODO: Add checks for fuse_session_exited */
-            fuse_session_process_buf_int(se, &fbuf, &ch);
+            struct fuse_bufvec bufv = { .buf[0] = fbuf, .count = 1 };
+            fuse_session_process_buf_int(se, &bufv, &ch);
 
             if (!qi->reply_sent) {
                 fuse_log(FUSE_LOG_DEBUG, "%s: elem %d no reply sent\n",
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 09/25] virtiofsd: Pass write iov's all the way through
  2019-10-24 11:26 [PATCH 00/25] virtiofs daemon (security) Dr. David Alan Gilbert (git)
                   ` (7 preceding siblings ...)
  2019-10-24 11:27 ` [PATCH 08/25] virtiofsd: Plumb fuse_bufvec through to do_write_buf Dr. David Alan Gilbert (git)
@ 2019-10-24 11:27 ` Dr. David Alan Gilbert (git)
  2019-10-24 11:27 ` [PATCH 10/25] virtiofsd: add fuse_mbuf_iter API Dr. David Alan Gilbert (git)
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-10-24 11:27 UTC (permalink / raw)
  To: qemu-devel, renzhen, eguan, ganesh.mahalingam, m.mizuma,
	mszeredi, misono.tomohiro, tao.peng, piaojun, stefanha, vgoyal,
	mst, berrange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Pass the write iov pointing to guest RAM all the way through rather
than copying the data.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 contrib/virtiofsd/fuse_virtio.c | 73 ++++++++++++++++++++++++++++++---
 1 file changed, 67 insertions(+), 6 deletions(-)

diff --git a/contrib/virtiofsd/fuse_virtio.c b/contrib/virtiofsd/fuse_virtio.c
index 68036e2171..98f77d338d 100644
--- a/contrib/virtiofsd/fuse_virtio.c
+++ b/contrib/virtiofsd/fuse_virtio.c
@@ -452,6 +452,10 @@ static void *fv_queue_thread(void *opaque)
                  __func__, qi->qidx, (size_t)evalue, in_bytes, out_bytes);
 
         while (1) {
+            bool allocated_bufv = false;
+            struct fuse_bufvec bufv;
+            struct fuse_bufvec *pbufv;
+
             /*
              * An element contains one request and the space to send our
              * response They're spread over multiple descriptors in a
@@ -493,14 +497,71 @@ static void *fv_queue_thread(void *opaque)
                          __func__, elem->index);
                 assert(0); /* TODO */
             }
-            copy_from_iov(&fbuf, out_num, out_sg);
-            fbuf.size = out_len;
+            /* Copy just the first element and look at it */
+            copy_from_iov(&fbuf, 1, out_sg);
+
+            if (out_num > 2 &&
+                out_sg[0].iov_len == sizeof(struct fuse_in_header) &&
+                ((struct fuse_in_header *)fbuf.mem)->opcode == FUSE_WRITE &&
+                out_sg[1].iov_len == sizeof(struct fuse_write_in)) {
+                /*
+                 * For a write we don't actually need to copy the
+                 * data, we can just do it straight out of guest memory
+                 * but we must sitll copy the headers in case the guest
+                 * was nasty and changed them while we were using them.
+                 */
+                if (se->debug) {
+                    fprintf(stderr, "%s: Write special case\n", __func__);
+                }
+
+                /* copy the fuse_write_in header after the fuse_in_header */
+                fbuf.mem += out_sg->iov_len;
+                copy_from_iov(&fbuf, 1, out_sg + 1);
+                fbuf.mem -= out_sg->iov_len;
+                fbuf.size = out_sg[0].iov_len + out_sg[1].iov_len;
+
+                /* Allocate the bufv, with space for the rest of the iov */
+                allocated_bufv = true;
+                pbufv = malloc(sizeof(struct fuse_bufvec) +
+                               sizeof(struct fuse_buf) * (out_num - 2));
+
+                pbufv->count = 1;
+                pbufv->buf[0] = fbuf;
+
+                size_t iovindex, pbufvindex;
+                iovindex = 2; /* 2 headers, separate iovs */
+                pbufvindex = 1; /* 2 headers, 1 fusebuf */
+
+                for (; iovindex < out_num; iovindex++, pbufvindex++) {
+                    pbufv->count++;
+                    pbufv->buf[pbufvindex].pos = ~0; /* Dummy */
+                    pbufv->buf[pbufvindex].flags = 0;
+                    pbufv->buf[pbufvindex].mem = out_sg[iovindex].iov_base;
+                    pbufv->buf[pbufvindex].size = out_sg[iovindex].iov_len;
+                }
+            } else {
+                /* Normal (non fast write) path */
+
+                /* Copy the rest of the buffer */
+                fbuf.mem += out_sg->iov_len;
+                copy_from_iov(&fbuf, out_num - 1, out_sg + 1);
+                fbuf.mem -= out_sg->iov_len;
+                fbuf.size = out_len;
 
-            /* TODO! Endianness of header */
+                /* TODO! Endianness of header */
 
-            /* TODO: Add checks for fuse_session_exited */
-            struct fuse_bufvec bufv = { .buf[0] = fbuf, .count = 1 };
-            fuse_session_process_buf_int(se, &bufv, &ch);
+                /* TODO: Add checks for fuse_session_exited */
+                bufv.buf[0] = fbuf;
+                bufv.count = 1;
+                pbufv = &bufv;
+            }
+            pbufv->idx = 0;
+            pbufv->off = 0;
+            fuse_session_process_buf_int(se, pbufv, &ch);
+
+            if (allocated_bufv) {
+                free(pbufv);
+            }
 
             if (!qi->reply_sent) {
                 fuse_log(FUSE_LOG_DEBUG, "%s: elem %d no reply sent\n",
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 10/25] virtiofsd: add fuse_mbuf_iter API
  2019-10-24 11:26 [PATCH 00/25] virtiofs daemon (security) Dr. David Alan Gilbert (git)
                   ` (8 preceding siblings ...)
  2019-10-24 11:27 ` [PATCH 09/25] virtiofsd: Pass write iov's all the way through Dr. David Alan Gilbert (git)
@ 2019-10-24 11:27 ` Dr. David Alan Gilbert (git)
  2019-10-24 11:27 ` [PATCH 11/25] virtiofsd: validate input buffer sizes in do_write_buf() Dr. David Alan Gilbert (git)
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-10-24 11:27 UTC (permalink / raw)
  To: qemu-devel, renzhen, eguan, ganesh.mahalingam, m.mizuma,
	mszeredi, misono.tomohiro, tao.peng, piaojun, stefanha, vgoyal,
	mst, berrange

From: Stefan Hajnoczi <stefanha@redhat.com>

Introduce an API for consuming bytes from a buffer with size checks.
All FUSE operations will be converted to use this safe API instead of
void *inarg.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 contrib/virtiofsd/buffer.c      | 28 +++++++++++++++++++
 contrib/virtiofsd/fuse_common.h | 48 +++++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+)

diff --git a/contrib/virtiofsd/buffer.c b/contrib/virtiofsd/buffer.c
index 3c9b443fa0..2a0d123bde 100644
--- a/contrib/virtiofsd/buffer.c
+++ b/contrib/virtiofsd/buffer.c
@@ -316,3 +316,31 @@ ssize_t fuse_buf_copy(struct fuse_bufvec *dstv, struct fuse_bufvec *srcv,
 
 	return copied;
 }
+
+void *fuse_mbuf_iter_advance(struct fuse_mbuf_iter *iter, size_t len)
+{
+	void *ptr;
+
+	if (len > iter->size - iter->pos) {
+		return NULL;
+	}
+
+	ptr = iter->mem + iter->pos;
+	iter->pos += len;
+	return ptr;
+}
+
+const char *fuse_mbuf_iter_advance_str(struct fuse_mbuf_iter *iter)
+{
+	const char *str = iter->mem + iter->pos;
+	size_t remaining = iter->size - iter->pos;
+	size_t i;
+
+	for (i = 0; i < remaining; i++) {
+		if (str[i] == '\0') {
+			iter->pos += i + 1;
+			return str;
+		}
+	}
+	return NULL;
+}
diff --git a/contrib/virtiofsd/fuse_common.h b/contrib/virtiofsd/fuse_common.h
index 2d686b2ac4..b5b6d270fe 100644
--- a/contrib/virtiofsd/fuse_common.h
+++ b/contrib/virtiofsd/fuse_common.h
@@ -760,6 +760,54 @@ size_t fuse_buf_size(const struct fuse_bufvec *bufv);
 ssize_t fuse_buf_copy(struct fuse_bufvec *dst, struct fuse_bufvec *src,
 		      enum fuse_buf_copy_flags flags);
 
+/**
+ * Memory buffer iterator
+ *
+ */
+struct fuse_mbuf_iter {
+	/**
+	 * Data pointer
+	 */
+	void *mem;
+
+	/**
+	 * Total length, in bytes
+	 */
+	size_t size;
+
+	/**
+	 * Offset from start of buffer
+	 */
+	size_t pos;
+};
+
+/* Initialize memory buffer iterator from a fuse_buf */
+#define FUSE_MBUF_ITER_INIT(fbuf) \
+	((struct fuse_mbuf_iter) { \
+		 .mem = fbuf->mem, \
+		 .size = fbuf->size, \
+		 .pos = 0, \
+	})
+
+/**
+ * Consume bytes from a memory buffer iterator
+ *
+ * @param iter memory buffer iterator
+ * @param len number of bytes to consume
+ * @return pointer to start of consumed bytes or
+ *         NULL if advancing beyond end of buffer
+ */
+void *fuse_mbuf_iter_advance(struct fuse_mbuf_iter *iter, size_t len);
+
+/**
+ * Consume a NUL-terminated string from a memory buffer iterator
+ *
+ * @param iter memory buffer iterator
+ * @return pointer to the string or
+ *         NULL if advancing beyond end of buffer or there is no NUL-terminator
+ */
+const char *fuse_mbuf_iter_advance_str(struct fuse_mbuf_iter *iter);
+
 /* ----------------------------------------------------------- *
  * Signal handling					       *
  * ----------------------------------------------------------- */
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 11/25] virtiofsd: validate input buffer sizes in do_write_buf()
  2019-10-24 11:26 [PATCH 00/25] virtiofs daemon (security) Dr. David Alan Gilbert (git)
                   ` (9 preceding siblings ...)
  2019-10-24 11:27 ` [PATCH 10/25] virtiofsd: add fuse_mbuf_iter API Dr. David Alan Gilbert (git)
@ 2019-10-24 11:27 ` Dr. David Alan Gilbert (git)
  2019-10-24 11:27 ` [PATCH 12/25] virtiofsd: check input buffer size in fuse_lowlevel.c ops Dr. David Alan Gilbert (git)
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-10-24 11:27 UTC (permalink / raw)
  To: qemu-devel, renzhen, eguan, ganesh.mahalingam, m.mizuma,
	mszeredi, misono.tomohiro, tao.peng, piaojun, stefanha, vgoyal,
	mst, berrange

From: Stefan Hajnoczi <stefanha@redhat.com>

There is a small change in behavior: if fuse_write_in->size doesn't
match the input buffer size then the request is failed.  Previously
write requests with 1 fuse_buf element would truncate to
fuse_write_in->size.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 contrib/virtiofsd/fuse_lowlevel.c | 62 +++++++++++++++++++------------
 1 file changed, 38 insertions(+), 24 deletions(-)

diff --git a/contrib/virtiofsd/fuse_lowlevel.c b/contrib/virtiofsd/fuse_lowlevel.c
index 2bd2ba00b9..7927348398 100644
--- a/contrib/virtiofsd/fuse_lowlevel.c
+++ b/contrib/virtiofsd/fuse_lowlevel.c
@@ -1006,7 +1006,8 @@ static void do_write(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid, const void *inarg,
+static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid,
+			 struct fuse_mbuf_iter *iter,
 			 struct fuse_bufvec *ibufv)
 {
 	struct fuse_session *se = req->se;
@@ -1015,34 +1016,36 @@ static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid, const void *inarg,
 		.buf[0] = ibufv->buf[0],
 		.count = 1,
 	};
-	struct fuse_write_in *arg = (struct fuse_write_in *) inarg;
+	struct fuse_write_in *arg;
+	size_t arg_size = sizeof(*arg);
 	struct fuse_file_info fi;
 
 	memset(&fi, 0, sizeof(fi));
+
+	if (se->conn.proto_minor < 9) {
+		arg_size = FUSE_COMPAT_WRITE_IN_SIZE;
+	}
+
+	arg = fuse_mbuf_iter_advance(iter, arg_size);
+	if (!arg) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
+	/* Only access non-compat fields here! */
+	if (se->conn.proto_minor >= 9) {
+		fi.lock_owner = arg->lock_owner;
+		fi.flags = arg->flags;
+	}
+
 	fi.fh = arg->fh;
 	fi.writepage = arg->write_flags & FUSE_WRITE_CACHE;
 
 	if (ibufv->count == 1) {
-		if (se->conn.proto_minor < 9) {
-			tmpbufv.buf[0].mem = ((char *) arg) + FUSE_COMPAT_WRITE_IN_SIZE;
-			tmpbufv.buf[0].size -= sizeof(struct fuse_in_header) +
-				FUSE_COMPAT_WRITE_IN_SIZE;
-			assert(!(tmpbufv.buf[0].flags & FUSE_BUF_IS_FD));
-		} else {
-			fi.lock_owner = arg->lock_owner;
-			fi.flags = arg->flags;
-			if (!(tmpbufv.buf[0].flags & FUSE_BUF_IS_FD))
-				tmpbufv.buf[0].mem = PARAM(arg);
-
-			tmpbufv.buf[0].size -= sizeof(struct fuse_in_header) +
-				sizeof(struct fuse_write_in);
-		}
-		if (tmpbufv.buf[0].size < arg->size) {
-			fuse_log(FUSE_LOG_ERR, "fuse: do_write_buf: buffer size too small\n");
-			fuse_reply_err(req, EIO);
-			return;
-		}
-		tmpbufv.buf[0].size = arg->size;
+		assert(!(tmpbufv.buf[0].flags & FUSE_BUF_IS_FD));
+		tmpbufv.buf[0].mem = ((char *) arg) + arg_size;
+		tmpbufv.buf[0].size -= sizeof(struct fuse_in_header) +
+				       arg_size;
 		pbufv = &tmpbufv;
 	} else {
 		// Input bufv contains the headers in the first element
@@ -1050,6 +1053,12 @@ static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid, const void *inarg,
 		ibufv->buf[0].size = 0;
 	}
 
+	if (fuse_buf_size(pbufv) != arg->size) {
+		fuse_log(FUSE_LOG_ERR, "fuse: do_write_buf: buffer size doesn't match arg->size\n");
+		fuse_reply_err(req, EIO);
+		return;
+	}
+
 	se->op.write_buf(req, nodeid, pbufv, arg->offset, &fi);
 }
 
@@ -2002,12 +2011,17 @@ void fuse_session_process_buf_int(struct fuse_session *se,
 				  struct fuse_bufvec *bufv, struct fuse_chan *ch)
 {
 	const struct fuse_buf *buf = bufv->buf;
+	struct fuse_mbuf_iter iter = FUSE_MBUF_ITER_INIT(buf);
 	struct fuse_in_header *in;
 	const void *inarg;
 	struct fuse_req *req;
 	int err;
 
-	in = buf->mem;
+	/* The first buffer must be a memory buffer */
+	assert(!(buf->flags & FUSE_BUF_IS_FD));
+
+	in = fuse_mbuf_iter_advance(&iter, sizeof(*in));
+	assert(in); /* caller guarantees the input buffer is large enough */
 
 	if (se->debug) {
 		fuse_log(FUSE_LOG_DEBUG,
@@ -2074,7 +2088,7 @@ void fuse_session_process_buf_int(struct fuse_session *se,
 
 	inarg = (void *) &in[1];
 	if (in->opcode == FUSE_WRITE && se->op.write_buf)
-		do_write_buf(req, in->nodeid, inarg, bufv);
+		do_write_buf(req, in->nodeid, &iter, bufv);
 	else
 		fuse_ll_ops[in->opcode].func(req, in->nodeid, inarg);
 
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 12/25] virtiofsd: check input buffer size in fuse_lowlevel.c ops
  2019-10-24 11:26 [PATCH 00/25] virtiofs daemon (security) Dr. David Alan Gilbert (git)
                   ` (10 preceding siblings ...)
  2019-10-24 11:27 ` [PATCH 11/25] virtiofsd: validate input buffer sizes in do_write_buf() Dr. David Alan Gilbert (git)
@ 2019-10-24 11:27 ` Dr. David Alan Gilbert (git)
  2019-10-24 11:27 ` [PATCH 13/25] virtiofsd: prevent ".." escape in lo_do_lookup() Dr. David Alan Gilbert (git)
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-10-24 11:27 UTC (permalink / raw)
  To: qemu-devel, renzhen, eguan, ganesh.mahalingam, m.mizuma,
	mszeredi, misono.tomohiro, tao.peng, piaojun, stefanha, vgoyal,
	mst, berrange

From: Stefan Hajnoczi <stefanha@redhat.com>

Each FUSE operation involves parsing the input buffer.  Currently the
code assumes the input buffer is large enough for the expected
arguments.  This patch uses fuse_mbuf_iter to check the size.

Most operations are simple to convert.  Some are more complicated due to
variable-length inputs or different sizes depending on the protocol
version.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 contrib/virtiofsd/fuse_lowlevel.c | 621 +++++++++++++++++++++++-------
 1 file changed, 482 insertions(+), 139 deletions(-)

diff --git a/contrib/virtiofsd/fuse_lowlevel.c b/contrib/virtiofsd/fuse_lowlevel.c
index 7927348398..d032e411e1 100644
--- a/contrib/virtiofsd/fuse_lowlevel.c
+++ b/contrib/virtiofsd/fuse_lowlevel.c
@@ -15,6 +15,7 @@
 #include "fuse_misc.h"
 #include "fuse_virtio.h"
 
+#include <stdbool.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <stddef.h>
@@ -27,7 +28,6 @@
 
 
 
-#define PARAM(inarg) (((char *)(inarg)) + sizeof(*(inarg)))
 #define OFFSET_MAX 0x7fffffffffffffffLL
 
 struct fuse_pollhandle {
@@ -705,9 +705,14 @@ int fuse_reply_poll(fuse_req_t req, unsigned revents)
 	return send_reply_ok(req, &arg, sizeof(arg));
 }
 
-static void do_lookup(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_lookup(fuse_req_t req, fuse_ino_t nodeid,
+		      struct fuse_mbuf_iter *iter)
 {
-	char *name = (char *) inarg;
+	const char *name = fuse_mbuf_iter_advance_str(iter);
+	if (!name) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
 
 	if (req->se->op.lookup)
 		req->se->op.lookup(req, nodeid, name);
@@ -715,9 +720,16 @@ static void do_lookup(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_forget(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_forget(fuse_req_t req, fuse_ino_t nodeid,
+		      struct fuse_mbuf_iter *iter)
 {
-	struct fuse_forget_in *arg = (struct fuse_forget_in *) inarg;
+	struct fuse_forget_in *arg;
+
+	arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+	if (!arg) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
 
 	if (req->se->op.forget)
 		req->se->op.forget(req, nodeid, arg->nlookup);
@@ -726,20 +738,47 @@ static void do_forget(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 }
 
 static void do_batch_forget(fuse_req_t req, fuse_ino_t nodeid,
-			    const void *inarg)
+			    struct fuse_mbuf_iter *iter)
 {
-	struct fuse_batch_forget_in *arg = (void *) inarg;
-	struct fuse_forget_one *param = (void *) PARAM(arg);
-	unsigned int i;
+	struct fuse_batch_forget_in *arg;
+	struct fuse_forget_data *forgets;
+	size_t scount;
 
 	(void) nodeid;
 
+	arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+	if (!arg) {
+		fuse_reply_none(req);
+	}
+
+	/* Prevent integer overflow.  The compiler emits the following warning
+	 * unless we use the scount local variable:
+	 *
+	 * error: comparison is always false due to limited range of data type
+	 * [-Werror=type-limits]
+	 *
+	 * This may be true on 64-bit hosts but we need this check for 32-bit
+	 * hosts.
+	 */
+	scount = arg->count;
+	if (scount > SIZE_MAX / sizeof(forgets[0])) {
+		fuse_reply_none(req);
+		return;
+	}
+
+	forgets = fuse_mbuf_iter_advance(iter, arg->count *
+					 sizeof(forgets[0]));
+	if (!forgets) {
+		fuse_reply_none(req);
+		return;
+	}
+
 	if (req->se->op.forget_multi) {
-		req->se->op.forget_multi(req, arg->count,
-				     (struct fuse_forget_data *) param);
+		req->se->op.forget_multi(req, arg->count, forgets);
 	} else if (req->se->op.forget) {
+		unsigned int i;
+
 		for (i = 0; i < arg->count; i++) {
-			struct fuse_forget_one *forget = &param[i];
 			struct fuse_req *dummy_req;
 
 			dummy_req = fuse_ll_alloc_req(req->se);
@@ -750,8 +789,8 @@ static void do_batch_forget(fuse_req_t req, fuse_ino_t nodeid,
 			dummy_req->ctx = req->ctx;
 			dummy_req->ch = NULL;
 
-			req->se->op.forget(dummy_req, forget->nodeid,
-					  forget->nlookup);
+			req->se->op.forget(dummy_req, forgets[i].ino,
+					   forgets[i].nlookup);
 		}
 		fuse_reply_none(req);
 	} else {
@@ -759,13 +798,20 @@ static void do_batch_forget(fuse_req_t req, fuse_ino_t nodeid,
 	}
 }
 
-static void do_getattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_getattr(fuse_req_t req, fuse_ino_t nodeid,
+		       struct fuse_mbuf_iter *iter)
 {
 	struct fuse_file_info *fip = NULL;
 	struct fuse_file_info fi;
 
 	if (req->se->conn.proto_minor >= 9) {
-		struct fuse_getattr_in *arg = (struct fuse_getattr_in *) inarg;
+		struct fuse_getattr_in *arg;
+
+		arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+		if (!arg) {
+			fuse_reply_err(req, EINVAL);
+			return;
+		}
 
 		if (arg->getattr_flags & FUSE_GETATTR_FH) {
 			memset(&fi, 0, sizeof(fi));
@@ -780,14 +826,21 @@ static void do_getattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_setattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_setattr(fuse_req_t req, fuse_ino_t nodeid,
+		       struct fuse_mbuf_iter *iter)
 {
-	struct fuse_setattr_in *arg = (struct fuse_setattr_in *) inarg;
-
 	if (req->se->op.setattr) {
+		struct fuse_setattr_in *arg;
 		struct fuse_file_info *fi = NULL;
 		struct fuse_file_info fi_store;
 		struct stat stbuf;
+
+		arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+		if (!arg) {
+			fuse_reply_err(req, EINVAL);
+			return;
+		}
+
 		memset(&stbuf, 0, sizeof(stbuf));
 		convert_attr(arg, &stbuf);
 		if (arg->valid & FATTR_FH) {
@@ -812,9 +865,16 @@ static void do_setattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_access(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_access(fuse_req_t req, fuse_ino_t nodeid,
+		      struct fuse_mbuf_iter *iter)
 {
-	struct fuse_access_in *arg = (struct fuse_access_in *) inarg;
+	struct fuse_access_in *arg;
+
+	arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+	if (!arg) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
 
 	if (req->se->op.access)
 		req->se->op.access(req, nodeid, arg->mask);
@@ -822,9 +882,10 @@ static void do_access(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_readlink(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_readlink(fuse_req_t req, fuse_ino_t nodeid,
+			struct fuse_mbuf_iter *iter)
 {
-	(void) inarg;
+	(void) iter;
 
 	if (req->se->op.readlink)
 		req->se->op.readlink(req, nodeid);
@@ -832,15 +893,24 @@ static void do_readlink(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_mknod(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_mknod(fuse_req_t req, fuse_ino_t nodeid,
+		     struct fuse_mbuf_iter *iter)
 {
-	struct fuse_mknod_in *arg = (struct fuse_mknod_in *) inarg;
-	char *name = PARAM(arg);
+	bool compat = req->se->conn.proto_minor < 12;
+	struct fuse_mknod_in *arg;
+	const char *name;
 
-	if (req->se->conn.proto_minor >= 12)
+	arg = fuse_mbuf_iter_advance(iter,
+			compat ? FUSE_COMPAT_MKNOD_IN_SIZE : sizeof(*arg));
+	name = fuse_mbuf_iter_advance_str(iter);
+	if (!arg || !name) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
+	if (!compat) {
 		req->ctx.umask = arg->umask;
-	else
-		name = (char *) inarg + FUSE_COMPAT_MKNOD_IN_SIZE;
+	}
 
 	if (req->se->op.mknod)
 		req->se->op.mknod(req, nodeid, name, arg->mode, arg->rdev);
@@ -848,22 +918,40 @@ static void do_mknod(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_mkdir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_mkdir(fuse_req_t req, fuse_ino_t nodeid,
+		     struct fuse_mbuf_iter *iter)
 {
-	struct fuse_mkdir_in *arg = (struct fuse_mkdir_in *) inarg;
+	bool compat = req->se->conn.proto_minor < 12;
+	struct fuse_mkdir_in *arg;
+	const char *name;
 
-	if (req->se->conn.proto_minor >= 12)
+	arg = fuse_mbuf_iter_advance(iter,
+			compat ? sizeof(uint32_t) : sizeof(*arg));
+	name = fuse_mbuf_iter_advance_str(iter);
+	if (!arg || !name) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
+	if (!compat) {
 		req->ctx.umask = arg->umask;
+	}
 
 	if (req->se->op.mkdir)
-		req->se->op.mkdir(req, nodeid, PARAM(arg), arg->mode);
+		req->se->op.mkdir(req, nodeid, name, arg->mode);
 	else
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_unlink(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_unlink(fuse_req_t req, fuse_ino_t nodeid,
+		      struct fuse_mbuf_iter *iter)
 {
-	char *name = (char *) inarg;
+	const char *name = fuse_mbuf_iter_advance_str(iter);
+
+	if (!name) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
 
 	if (req->se->op.unlink)
 		req->se->op.unlink(req, nodeid, name);
@@ -871,9 +959,15 @@ static void do_unlink(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_rmdir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_rmdir(fuse_req_t req, fuse_ino_t nodeid,
+		     struct fuse_mbuf_iter *iter)
 {
-	char *name = (char *) inarg;
+	const char *name = fuse_mbuf_iter_advance_str(iter);
+
+	if (!name) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
 
 	if (req->se->op.rmdir)
 		req->se->op.rmdir(req, nodeid, name);
@@ -881,10 +975,16 @@ static void do_rmdir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_symlink(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_symlink(fuse_req_t req, fuse_ino_t nodeid,
+		       struct fuse_mbuf_iter *iter)
 {
-	char *name = (char *) inarg;
-	char *linkname = ((char *) inarg) + strlen((char *) inarg) + 1;
+	const char *name = fuse_mbuf_iter_advance_str(iter);
+	const char *linkname = fuse_mbuf_iter_advance_str(iter);
+
+	if (!name || !linkname) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
 
 	if (req->se->op.symlink)
 		req->se->op.symlink(req, linkname, nodeid, name);
@@ -892,11 +992,20 @@ static void do_symlink(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_rename(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_rename(fuse_req_t req, fuse_ino_t nodeid,
+		      struct fuse_mbuf_iter *iter)
 {
-	struct fuse_rename_in *arg = (struct fuse_rename_in *) inarg;
-	char *oldname = PARAM(arg);
-	char *newname = oldname + strlen(oldname) + 1;
+	struct fuse_rename_in *arg;
+	const char *oldname;
+	const char *newname;
+
+	arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+	oldname = fuse_mbuf_iter_advance_str(iter);
+	newname = fuse_mbuf_iter_advance_str(iter);
+	if (!arg || !oldname || !newname) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
 
 	if (req->se->op.rename)
 		req->se->op.rename(req, nodeid, oldname, arg->newdir, newname,
@@ -905,11 +1014,20 @@ static void do_rename(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_rename2(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_rename2(fuse_req_t req, fuse_ino_t nodeid,
+		       struct fuse_mbuf_iter *iter)
 {
-	struct fuse_rename2_in *arg = (struct fuse_rename2_in *) inarg;
-	char *oldname = PARAM(arg);
-	char *newname = oldname + strlen(oldname) + 1;
+	struct fuse_rename2_in *arg;
+	const char *oldname;
+	const char *newname;
+
+	arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+	oldname = fuse_mbuf_iter_advance_str(iter);
+	newname = fuse_mbuf_iter_advance_str(iter);
+	if (!arg || !oldname || !newname) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
 
 	if (req->se->op.rename)
 		req->se->op.rename(req, nodeid, oldname, arg->newdir, newname,
@@ -918,42 +1036,65 @@ static void do_rename2(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_link(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_link(fuse_req_t req, fuse_ino_t nodeid,
+		    struct fuse_mbuf_iter *iter)
 {
-	struct fuse_link_in *arg = (struct fuse_link_in *) inarg;
+	struct fuse_link_in *arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+	const char *name = fuse_mbuf_iter_advance_str(iter);
+
+	if (!arg || !name) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
 
 	if (req->se->op.link)
-		req->se->op.link(req, arg->oldnodeid, nodeid, PARAM(arg));
+		req->se->op.link(req, arg->oldnodeid, nodeid, name);
 	else
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_create(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_create(fuse_req_t req, fuse_ino_t nodeid,
+		      struct fuse_mbuf_iter *iter)
 {
-	struct fuse_create_in *arg = (struct fuse_create_in *) inarg;
-
 	if (req->se->op.create) {
+		bool compat = req->se->conn.proto_minor < 12;
+		struct fuse_create_in *arg;
 		struct fuse_file_info fi;
-		char *name = PARAM(arg);
+		const char *name;
+
+		arg = fuse_mbuf_iter_advance(iter, compat ?
+				sizeof(struct fuse_open_in) :
+				sizeof(*arg));
+		name = fuse_mbuf_iter_advance_str(iter);
+		if (!arg || !name) {
+			fuse_reply_err(req, EINVAL);
+			return;
+		}
 
 		memset(&fi, 0, sizeof(fi));
 		fi.flags = arg->flags;
 
-		if (req->se->conn.proto_minor >= 12)
+		if (!compat) {
 			req->ctx.umask = arg->umask;
-		else
-			name = (char *) inarg + sizeof(struct fuse_open_in);
+		}
 
 		req->se->op.create(req, nodeid, name, arg->mode, &fi);
 	} else
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_open(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_open(fuse_req_t req, fuse_ino_t nodeid,
+		    struct fuse_mbuf_iter *iter)
 {
-	struct fuse_open_in *arg = (struct fuse_open_in *) inarg;
+	struct fuse_open_in *arg;
 	struct fuse_file_info fi;
 
+	arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+	if (!arg) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
 	memset(&fi, 0, sizeof(fi));
 	fi.flags = arg->flags;
 
@@ -963,16 +1104,21 @@ static void do_open(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		fuse_reply_open(req, &fi);
 }
 
-static void do_read(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_read(fuse_req_t req, fuse_ino_t nodeid,
+		    struct fuse_mbuf_iter *iter)
 {
-	struct fuse_read_in *arg = (struct fuse_read_in *) inarg;
-
 	if (req->se->op.read) {
+		bool compat = req->se->conn.proto_minor < 9;
+		struct fuse_read_in *arg;
 		struct fuse_file_info fi;
 
+		arg = fuse_mbuf_iter_advance(iter, compat ?
+				offsetof(struct fuse_read_in, lock_owner) :
+				sizeof(*arg));
+
 		memset(&fi, 0, sizeof(fi));
 		fi.fh = arg->fh;
-		if (req->se->conn.proto_minor >= 9) {
+		if (!compat) {
 			fi.lock_owner = arg->lock_owner;
 			fi.flags = arg->flags;
 		}
@@ -981,22 +1127,34 @@ static void do_read(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_write(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_write(fuse_req_t req, fuse_ino_t nodeid,
+		     struct fuse_mbuf_iter *iter)
 {
-	struct fuse_write_in *arg = (struct fuse_write_in *) inarg;
+	bool compat = req->se->conn.proto_minor < 9;
+	struct fuse_write_in *arg;
 	struct fuse_file_info fi;
-	char *param;
+	const char *param;
+
+	arg = fuse_mbuf_iter_advance(iter, compat ?
+			FUSE_COMPAT_WRITE_IN_SIZE : sizeof(*arg));
+	if (!arg) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
+	param = fuse_mbuf_iter_advance(iter, arg->size);
+	if (!param) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
 
 	memset(&fi, 0, sizeof(fi));
 	fi.fh = arg->fh;
 	fi.writepage = (arg->write_flags & FUSE_WRITE_CACHE) != 0;
 
-	if (req->se->conn.proto_minor < 9) {
-		param = ((char *) arg) + FUSE_COMPAT_WRITE_IN_SIZE;
-	} else {
+	if (!compat) {
 		fi.lock_owner = arg->lock_owner;
 		fi.flags = arg->flags;
-		param = PARAM(arg);
 	}
 
 	if (req->se->op.write)
@@ -1062,16 +1220,27 @@ static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid,
 	se->op.write_buf(req, nodeid, pbufv, arg->offset, &fi);
 }
 
-static void do_flush(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_flush(fuse_req_t req, fuse_ino_t nodeid,
+		     struct fuse_mbuf_iter *iter)
 {
-	struct fuse_flush_in *arg = (struct fuse_flush_in *) inarg;
+	bool compat = req->se->conn.proto_minor < 7;
+	struct fuse_flush_in *arg;
 	struct fuse_file_info fi;
 
+	arg = fuse_mbuf_iter_advance(iter, compat ?
+			offsetof(struct fuse_flush_in, lock_owner) :
+			sizeof(*arg));
+	if (!arg) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
 	memset(&fi, 0, sizeof(fi));
 	fi.fh = arg->fh;
 	fi.flush = 1;
-	if (req->se->conn.proto_minor >= 7)
+	if (!compat) {
 		fi.lock_owner = arg->lock_owner;
+	}
 
 	if (req->se->op.flush)
 		req->se->op.flush(req, nodeid, &fi);
@@ -1079,21 +1248,31 @@ static void do_flush(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_release(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_release(fuse_req_t req, fuse_ino_t nodeid,
+		       struct fuse_mbuf_iter *iter)
 {
-	struct fuse_release_in *arg = (struct fuse_release_in *) inarg;
+	bool compat = req->se->conn.proto_minor < 8;
+	struct fuse_release_in *arg;
 	struct fuse_file_info fi;
 
+	arg = fuse_mbuf_iter_advance(iter, compat ?
+			offsetof(struct fuse_release_in, lock_owner) :
+			sizeof(*arg));
+	if (!arg) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
 	memset(&fi, 0, sizeof(fi));
 	fi.flags = arg->flags;
 	fi.fh = arg->fh;
-	if (req->se->conn.proto_minor >= 8) {
+	if (!compat) {
 		fi.flush = (arg->release_flags & FUSE_RELEASE_FLUSH) ? 1 : 0;
 		fi.lock_owner = arg->lock_owner;
-	}
-	if (arg->release_flags & FUSE_RELEASE_FLOCK_UNLOCK) {
-		fi.flock_release = 1;
-		fi.lock_owner = arg->lock_owner;
+
+		if (arg->release_flags & FUSE_RELEASE_FLOCK_UNLOCK) {
+			fi.flock_release = 1;
+		}
 	}
 
 	if (req->se->op.release)
@@ -1102,11 +1281,19 @@ static void do_release(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		fuse_reply_err(req, 0);
 }
 
-static void do_fsync(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_fsync(fuse_req_t req, fuse_ino_t nodeid,
+		     struct fuse_mbuf_iter *iter)
 {
-	struct fuse_fsync_in *arg = (struct fuse_fsync_in *) inarg;
+	struct fuse_fsync_in *arg;
 	struct fuse_file_info fi;
-	int datasync = arg->fsync_flags & 1;
+	int datasync;
+
+	arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+	if (!arg) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+	datasync = arg->fsync_flags & 1;
 
 	memset(&fi, 0, sizeof(fi));
 	fi.fh = arg->fh;
@@ -1120,11 +1307,18 @@ static void do_fsync(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_opendir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_opendir(fuse_req_t req, fuse_ino_t nodeid,
+		       struct fuse_mbuf_iter *iter)
 {
-	struct fuse_open_in *arg = (struct fuse_open_in *) inarg;
+	struct fuse_open_in *arg;
 	struct fuse_file_info fi;
 
+	arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+	if (!arg) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
 	memset(&fi, 0, sizeof(fi));
 	fi.flags = arg->flags;
 
@@ -1134,11 +1328,18 @@ static void do_opendir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		fuse_reply_open(req, &fi);
 }
 
-static void do_readdir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_readdir(fuse_req_t req, fuse_ino_t nodeid,
+		       struct fuse_mbuf_iter *iter)
 {
-	struct fuse_read_in *arg = (struct fuse_read_in *) inarg;
+	struct fuse_read_in *arg;
 	struct fuse_file_info fi;
 
+	arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+	if (!arg) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
 	memset(&fi, 0, sizeof(fi));
 	fi.fh = arg->fh;
 
@@ -1148,11 +1349,18 @@ static void do_readdir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_readdirplus(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_readdirplus(fuse_req_t req, fuse_ino_t nodeid,
+			   struct fuse_mbuf_iter *iter)
 {
-	struct fuse_read_in *arg = (struct fuse_read_in *) inarg;
+	struct fuse_read_in *arg;
 	struct fuse_file_info fi;
 
+	arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+	if (!arg) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
 	memset(&fi, 0, sizeof(fi));
 	fi.fh = arg->fh;
 
@@ -1162,11 +1370,18 @@ static void do_readdirplus(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_releasedir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_releasedir(fuse_req_t req, fuse_ino_t nodeid,
+			  struct fuse_mbuf_iter *iter)
 {
-	struct fuse_release_in *arg = (struct fuse_release_in *) inarg;
+	struct fuse_release_in *arg;
 	struct fuse_file_info fi;
 
+	arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+	if (!arg) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
 	memset(&fi, 0, sizeof(fi));
 	fi.flags = arg->flags;
 	fi.fh = arg->fh;
@@ -1177,11 +1392,19 @@ static void do_releasedir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		fuse_reply_err(req, 0);
 }
 
-static void do_fsyncdir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_fsyncdir(fuse_req_t req, fuse_ino_t nodeid,
+			struct fuse_mbuf_iter *iter)
 {
-	struct fuse_fsync_in *arg = (struct fuse_fsync_in *) inarg;
+	struct fuse_fsync_in *arg;
 	struct fuse_file_info fi;
-	int datasync = arg->fsync_flags & 1;
+	int datasync;
+
+	arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+	if (!arg) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+	datasync = arg->fsync_flags & 1;
 
 	memset(&fi, 0, sizeof(fi));
 	fi.fh = arg->fh;
@@ -1192,10 +1415,11 @@ static void do_fsyncdir(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_statfs(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_statfs(fuse_req_t req, fuse_ino_t nodeid,
+		      struct fuse_mbuf_iter *iter)
 {
 	(void) nodeid;
-	(void) inarg;
+	(void) iter;
 
 	if (req->se->op.statfs)
 		req->se->op.statfs(req, nodeid);
@@ -1208,11 +1432,25 @@ static void do_statfs(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 	}
 }
 
-static void do_setxattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_setxattr(fuse_req_t req, fuse_ino_t nodeid,
+			struct fuse_mbuf_iter *iter)
 {
-	struct fuse_setxattr_in *arg = (struct fuse_setxattr_in *) inarg;
-	char *name = PARAM(arg);
-	char *value = name + strlen(name) + 1;
+	struct fuse_setxattr_in *arg;
+	const char *name;
+	const char *value;
+
+	arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+	name = fuse_mbuf_iter_advance_str(iter);
+	if (!arg || !name) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
+	value = fuse_mbuf_iter_advance(iter, arg->size);
+	if (!value) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
 
 	if (req->se->op.setxattr)
 		req->se->op.setxattr(req, nodeid, name, value, arg->size,
@@ -1221,19 +1459,35 @@ static void do_setxattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_getxattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_getxattr(fuse_req_t req, fuse_ino_t nodeid,
+			struct fuse_mbuf_iter *iter)
 {
-	struct fuse_getxattr_in *arg = (struct fuse_getxattr_in *) inarg;
+	struct fuse_getxattr_in *arg;
+	const char *name;
+
+	arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+	name = fuse_mbuf_iter_advance_str(iter);
+	if (!arg || !name) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
 
 	if (req->se->op.getxattr)
-		req->se->op.getxattr(req, nodeid, PARAM(arg), arg->size);
+		req->se->op.getxattr(req, nodeid, name, arg->size);
 	else
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_listxattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_listxattr(fuse_req_t req, fuse_ino_t nodeid,
+			 struct fuse_mbuf_iter *iter)
 {
-	struct fuse_getxattr_in *arg = (struct fuse_getxattr_in *) inarg;
+	struct fuse_getxattr_in *arg;
+
+	arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+	if (!arg) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
 
 	if (req->se->op.listxattr)
 		req->se->op.listxattr(req, nodeid, arg->size);
@@ -1241,9 +1495,15 @@ static void do_listxattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_removexattr(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_removexattr(fuse_req_t req, fuse_ino_t nodeid,
+			   struct fuse_mbuf_iter *iter)
 {
-	char *name = (char *) inarg;
+	const char *name = fuse_mbuf_iter_advance_str(iter);
+
+	if (!name) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
 
 	if (req->se->op.removexattr)
 		req->se->op.removexattr(req, nodeid, name);
@@ -1265,12 +1525,19 @@ static void convert_fuse_file_lock(struct fuse_file_lock *fl,
 	flock->l_pid = fl->pid;
 }
 
-static void do_getlk(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_getlk(fuse_req_t req, fuse_ino_t nodeid,
+		     struct fuse_mbuf_iter *iter)
 {
-	struct fuse_lk_in *arg = (struct fuse_lk_in *) inarg;
+	struct fuse_lk_in *arg;
 	struct fuse_file_info fi;
 	struct flock flock;
 
+	arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+	if (!arg) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
 	memset(&fi, 0, sizeof(fi));
 	fi.fh = arg->fh;
 	fi.lock_owner = arg->owner;
@@ -1283,12 +1550,18 @@ static void do_getlk(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 }
 
 static void do_setlk_common(fuse_req_t req, fuse_ino_t nodeid,
-			    const void *inarg, int sleep)
+			    struct fuse_mbuf_iter *iter, int sleep)
 {
-	struct fuse_lk_in *arg = (struct fuse_lk_in *) inarg;
+	struct fuse_lk_in *arg;
 	struct fuse_file_info fi;
 	struct flock flock;
 
+	arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+	if (!arg) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
 	memset(&fi, 0, sizeof(fi));
 	fi.fh = arg->fh;
 	fi.lock_owner = arg->owner;
@@ -1323,14 +1596,16 @@ static void do_setlk_common(fuse_req_t req, fuse_ino_t nodeid,
 	}
 }
 
-static void do_setlk(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_setlk(fuse_req_t req, fuse_ino_t nodeid,
+		     struct fuse_mbuf_iter *iter)
 {
-	do_setlk_common(req, nodeid, inarg, 0);
+	do_setlk_common(req, nodeid, iter, 0);
 }
 
-static void do_setlkw(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_setlkw(fuse_req_t req, fuse_ino_t nodeid,
+		      struct fuse_mbuf_iter *iter)
 {
-	do_setlk_common(req, nodeid, inarg, 1);
+	do_setlk_common(req, nodeid, iter, 1);
 }
 
 static int find_interrupted(struct fuse_session *se, struct fuse_req *req)
@@ -1372,12 +1647,20 @@ static int find_interrupted(struct fuse_session *se, struct fuse_req *req)
 	return 0;
 }
 
-static void do_interrupt(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_interrupt(fuse_req_t req, fuse_ino_t nodeid,
+			 struct fuse_mbuf_iter *iter)
 {
-	struct fuse_interrupt_in *arg = (struct fuse_interrupt_in *) inarg;
+	struct fuse_interrupt_in *arg;
 	struct fuse_session *se = req->se;
 
 	(void) nodeid;
+
+	arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+	if (!arg) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
 	if (se->debug)
 		fuse_log(FUSE_LOG_DEBUG, "INTERRUPT: %llu\n",
 			(unsigned long long) arg->unique);
@@ -1415,9 +1698,15 @@ static struct fuse_req *check_interrupt(struct fuse_session *se,
 		return NULL;
 }
 
-static void do_bmap(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_bmap(fuse_req_t req, fuse_ino_t nodeid,
+		    struct fuse_mbuf_iter *iter)
 {
-	struct fuse_bmap_in *arg = (struct fuse_bmap_in *) inarg;
+	struct fuse_bmap_in *arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+
+	if (!arg) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
 
 	if (req->se->op.bmap)
 		req->se->op.bmap(req, nodeid, arg->blocksize, arg->block);
@@ -1425,19 +1714,35 @@ static void do_bmap(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_ioctl(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_ioctl(fuse_req_t req, fuse_ino_t nodeid,
+		     struct fuse_mbuf_iter *iter)
 {
-	struct fuse_ioctl_in *arg = (struct fuse_ioctl_in *) inarg;
-	unsigned int flags = arg->flags;
-	void *in_buf = arg->in_size ? PARAM(arg) : NULL;
+	struct fuse_ioctl_in *arg;
+	unsigned int flags;
+	void *in_buf = NULL;
 	struct fuse_file_info fi;
 
+	arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+	if (!arg) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
+	flags = arg->flags;
 	if (flags & FUSE_IOCTL_DIR &&
 	    !(req->se->conn.want & FUSE_CAP_IOCTL_DIR)) {
 		fuse_reply_err(req, ENOTTY);
 		return;
 	}
 
+	if (arg->in_size) {
+		in_buf = fuse_mbuf_iter_advance(iter, arg->in_size);
+		if (!in_buf) {
+			fuse_reply_err(req, EINVAL);
+			return;
+		}
+	}
+
 	memset(&fi, 0, sizeof(fi));
 	fi.fh = arg->fh;
 
@@ -1459,11 +1764,18 @@ void fuse_pollhandle_destroy(struct fuse_pollhandle *ph)
 	free(ph);
 }
 
-static void do_poll(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_poll(fuse_req_t req, fuse_ino_t nodeid,
+		    struct fuse_mbuf_iter *iter)
 {
-	struct fuse_poll_in *arg = (struct fuse_poll_in *) inarg;
+	struct fuse_poll_in *arg;
 	struct fuse_file_info fi;
 
+	arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+	if (!arg) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
 	memset(&fi, 0, sizeof(fi));
 	fi.fh = arg->fh;
 	fi.poll_events = arg->events;
@@ -1487,11 +1799,18 @@ static void do_poll(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 	}
 }
 
-static void do_fallocate(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_fallocate(fuse_req_t req, fuse_ino_t nodeid,
+			 struct fuse_mbuf_iter *iter)
 {
-	struct fuse_fallocate_in *arg = (struct fuse_fallocate_in *) inarg;
+	struct fuse_fallocate_in *arg;
 	struct fuse_file_info fi;
 
+	arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+	if (!arg) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
 	memset(&fi, 0, sizeof(fi));
 	fi.fh = arg->fh;
 
@@ -1501,11 +1820,18 @@ static void do_fallocate(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_copy_file_range(fuse_req_t req, fuse_ino_t nodeid_in, const void *inarg)
+static void do_copy_file_range(fuse_req_t req, fuse_ino_t nodeid_in,
+			       struct fuse_mbuf_iter *iter)
 {
-	struct fuse_copy_file_range_in *arg = (struct fuse_copy_file_range_in *) inarg;
+	struct fuse_copy_file_range_in *arg;
 	struct fuse_file_info fi_in, fi_out;
 
+	arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+	if (!arg) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
 	memset(&fi_in, 0, sizeof(fi_in));
 	fi_in.fh = arg->fh_in;
 
@@ -1522,15 +1848,34 @@ static void do_copy_file_range(fuse_req_t req, fuse_ino_t nodeid_in, const void
 		fuse_reply_err(req, ENOSYS);
 }
 
-static void do_init(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_init(fuse_req_t req, fuse_ino_t nodeid,
+		    struct fuse_mbuf_iter *iter)
 {
-	struct fuse_init_in *arg = (struct fuse_init_in *) inarg;
+	size_t compat_size = offsetof(struct fuse_init_in, max_readahead);
+	struct fuse_init_in *arg;
 	struct fuse_init_out outarg;
 	struct fuse_session *se = req->se;
 	size_t bufsize = se->bufsize;
 	size_t outargsize = sizeof(outarg);
 
 	(void) nodeid;
+
+	/* First consume the old fields... */
+	arg = fuse_mbuf_iter_advance(iter, compat_size);
+	if (!arg) {
+		fuse_reply_err(req, EINVAL);
+		return;
+	}
+
+	/* ...and now consume the new fields. */
+	if (arg->major == 7 && arg->minor >= 6) {
+		if (!fuse_mbuf_iter_advance(iter, sizeof(*arg) -
+					    compat_size)) {
+			fuse_reply_err(req, EINVAL);
+			return;
+		}
+	}
+
 	if (se->debug) {
 		fuse_log(FUSE_LOG_DEBUG, "INIT: %u.%u\n", arg->major, arg->minor);
 		if (arg->major == 7 && arg->minor >= 6) {
@@ -1744,12 +2089,13 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
 	send_reply_ok(req, &outarg, outargsize);
 }
 
-static void do_destroy(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
+static void do_destroy(fuse_req_t req, fuse_ino_t nodeid,
+		       struct fuse_mbuf_iter *iter)
 {
 	struct fuse_session *se = req->se;
 
 	(void) nodeid;
-	(void) inarg;
+	(void) iter;
 
 	se->got_destroy = 1;
 	if (se->op.destroy)
@@ -1934,7 +2280,7 @@ int fuse_req_interrupted(fuse_req_t req)
 }
 
 static struct {
-	void (*func)(fuse_req_t, fuse_ino_t, const void *);
+	void (*func)(fuse_req_t, fuse_ino_t, struct fuse_mbuf_iter *);
 	const char *name;
 } fuse_ll_ops[] = {
 	[FUSE_LOOKUP]	   = { do_lookup,      "LOOKUP"	     },
@@ -2013,7 +2359,6 @@ void fuse_session_process_buf_int(struct fuse_session *se,
 	const struct fuse_buf *buf = bufv->buf;
 	struct fuse_mbuf_iter iter = FUSE_MBUF_ITER_INIT(buf);
 	struct fuse_in_header *in;
-	const void *inarg;
 	struct fuse_req *req;
 	int err;
 
@@ -2086,12 +2431,10 @@ void fuse_session_process_buf_int(struct fuse_session *se,
 			fuse_reply_err(intr, EAGAIN);
 	}
 
-	inarg = (void *) &in[1];
 	if (in->opcode == FUSE_WRITE && se->op.write_buf)
 		do_write_buf(req, in->nodeid, &iter, bufv);
 	else
-		fuse_ll_ops[in->opcode].func(req, in->nodeid, inarg);
-
+		fuse_ll_ops[in->opcode].func(req, in->nodeid, &iter);
 	return;
 
 reply_err:
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 13/25] virtiofsd: prevent ".." escape in lo_do_lookup()
  2019-10-24 11:26 [PATCH 00/25] virtiofs daemon (security) Dr. David Alan Gilbert (git)
                   ` (11 preceding siblings ...)
  2019-10-24 11:27 ` [PATCH 12/25] virtiofsd: check input buffer size in fuse_lowlevel.c ops Dr. David Alan Gilbert (git)
@ 2019-10-24 11:27 ` Dr. David Alan Gilbert (git)
  2019-10-24 11:27 ` [PATCH 14/25] virtiofsd: prevent ".." escape in lo_do_readdir() Dr. David Alan Gilbert (git)
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-10-24 11:27 UTC (permalink / raw)
  To: qemu-devel, renzhen, eguan, ganesh.mahalingam, m.mizuma,
	mszeredi, misono.tomohiro, tao.peng, piaojun, stefanha, vgoyal,
	mst, berrange

From: Stefan Hajnoczi <stefanha@redhat.com>

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 contrib/virtiofsd/passthrough_ll.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
index 702fedc38a..7a61bf94fe 100644
--- a/contrib/virtiofsd/passthrough_ll.c
+++ b/contrib/virtiofsd/passthrough_ll.c
@@ -606,12 +606,17 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
 	int res;
 	int saverr;
 	struct lo_data *lo = lo_data(req);
-	struct lo_inode *inode;
+	struct lo_inode *inode, *dir = lo_inode(req, parent);
 
 	memset(e, 0, sizeof(*e));
 	e->attr_timeout = lo->timeout;
 	e->entry_timeout = lo->timeout;
 
+	/* Do not allow escaping root directory */
+	if (dir == &lo->root && strcmp(name, "..") == 0) {
+		name = ".";
+	}
+
 	newfd = openat(lo_fd(req, parent), name, O_PATH | O_NOFOLLOW);
 	if (newfd == -1)
 		goto out_err;
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 14/25] virtiofsd: prevent ".." escape in lo_do_readdir()
  2019-10-24 11:26 [PATCH 00/25] virtiofs daemon (security) Dr. David Alan Gilbert (git)
                   ` (12 preceding siblings ...)
  2019-10-24 11:27 ` [PATCH 13/25] virtiofsd: prevent ".." escape in lo_do_lookup() Dr. David Alan Gilbert (git)
@ 2019-10-24 11:27 ` Dr. David Alan Gilbert (git)
  2019-10-24 11:27 ` [PATCH 15/25] virtiofsd: use /proc/self/fd/ O_PATH file descriptor Dr. David Alan Gilbert (git)
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-10-24 11:27 UTC (permalink / raw)
  To: qemu-devel, renzhen, eguan, ganesh.mahalingam, m.mizuma,
	mszeredi, misono.tomohiro, tao.peng, piaojun, stefanha, vgoyal,
	mst, berrange

From: Stefan Hajnoczi <stefanha@redhat.com>

Construct a fake dirent for the root directory's ".." entry.  This hides
the parent directory from the FUSE client.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 contrib/virtiofsd/passthrough_ll.c | 36 ++++++++++++++++++------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
index 7a61bf94fe..cb01e3f088 100644
--- a/contrib/virtiofsd/passthrough_ll.c
+++ b/contrib/virtiofsd/passthrough_ll.c
@@ -1087,18 +1087,24 @@ out_err:
 static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
 			  off_t offset, struct fuse_file_info *fi, int plus)
 {
+	struct lo_data *lo = lo_data(req);
 	struct lo_dirp *d;
+	struct lo_inode *dinode;
 	char *buf = NULL;
 	char *p;
 	size_t rem = size;
-	int err = ENOMEM;
+	int err = EBADF;
 
-	(void) ino;
+	dinode = lo_inode(req, ino);
+	if (!dinode) {
+		goto error;
+	}
 
 	d = lo_dirp(req, fi);
 	if (!d)
 		goto error;
 
+	err = ENOMEM;
 	buf = calloc(1, size);
 	if (!buf)
 		goto error;
@@ -1128,15 +1134,21 @@ static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
 		}
 		nextoff = d->entry->d_off;
 		name = d->entry->d_name;
+
 		fuse_ino_t entry_ino = 0;
+		struct fuse_entry_param e = (struct fuse_entry_param) {
+			.attr.st_ino = d->entry->d_ino,
+			.attr.st_mode = d->entry->d_type << 12,
+		};
+
+		/* Hide root's parent directory */
+		if (dinode == &lo->root && strcmp(name, "..") == 0) {
+			e.attr.st_ino = lo->root.ino;
+			e.attr.st_mode = DT_DIR << 12;
+		}
+
 		if (plus) {
-			struct fuse_entry_param e;
-			if (is_dot_or_dotdot(name)) {
-				e = (struct fuse_entry_param) {
-					.attr.st_ino = d->entry->d_ino,
-					.attr.st_mode = d->entry->d_type << 12,
-				};
-			} else {
+			if (!is_dot_or_dotdot(name)) {
 				err = lo_do_lookup(req, ino, name, &e);
 				if (err)
 					goto error;
@@ -1146,12 +1158,8 @@ static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
 			entsize = fuse_add_direntry_plus(req, p, rem, name,
 							 &e, nextoff);
 		} else {
-			struct stat st = {
-				.st_ino = d->entry->d_ino,
-				.st_mode = d->entry->d_type << 12,
-			};
 			entsize = fuse_add_direntry(req, p, rem, name,
-						    &st, nextoff);
+						    &e.attr, nextoff);
 		}
 		if (entsize > rem) {
 			if (entry_ino != 0) 
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 15/25] virtiofsd: use /proc/self/fd/ O_PATH file descriptor
  2019-10-24 11:26 [PATCH 00/25] virtiofs daemon (security) Dr. David Alan Gilbert (git)
                   ` (13 preceding siblings ...)
  2019-10-24 11:27 ` [PATCH 14/25] virtiofsd: prevent ".." escape in lo_do_readdir() Dr. David Alan Gilbert (git)
@ 2019-10-24 11:27 ` Dr. David Alan Gilbert (git)
  2019-10-24 11:27 ` [PATCH 16/25] virtiofsd: sandbox mount namespace Dr. David Alan Gilbert (git)
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-10-24 11:27 UTC (permalink / raw)
  To: qemu-devel, renzhen, eguan, ganesh.mahalingam, m.mizuma,
	mszeredi, misono.tomohiro, tao.peng, piaojun, stefanha, vgoyal,
	mst, berrange

From: Stefan Hajnoczi <stefanha@redhat.com>

Sandboxing will remove /proc from the mount namespace so we can no
longer build string paths into "/proc/self/fd/...".

Keep an O_PATH file descriptor so we can still re-open fds via
/proc/self/fd.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 contrib/virtiofsd/passthrough_ll.c | 128 +++++++++++++++++++++++------
 1 file changed, 101 insertions(+), 27 deletions(-)

diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
index cb01e3f088..3ddf22d162 100644
--- a/contrib/virtiofsd/passthrough_ll.c
+++ b/contrib/virtiofsd/passthrough_ll.c
@@ -110,6 +110,9 @@ struct lo_data {
 	struct lo_map ino_map; /* protected by lo->mutex */
 	struct lo_map dirp_map; /* protected by lo->mutex */
 	struct lo_map fd_map; /* protected by lo->mutex */
+
+	/* An O_PATH file descriptor to /proc/self/fd/ */
+	int proc_self_fd;
 };
 
 static const struct fuse_opt lo_opts[] = {
@@ -378,9 +381,9 @@ static int lo_parent_and_name(struct lo_data *lo, struct lo_inode *inode,
 	int res;
 
 retry:
-	sprintf(procname, "/proc/self/fd/%i", inode->fd);
+	sprintf(procname, "%i", inode->fd);
 
-	res = readlink(procname, path, PATH_MAX);
+	res = readlinkat(lo->proc_self_fd, procname, path, PATH_MAX);
 	if (res < 0) {
 		fuse_log(FUSE_LOG_WARNING, "lo_parent_and_name: readlink failed: %m\n");
 		goto fail_noretry;
@@ -465,9 +468,9 @@ static int utimensat_empty(struct lo_data *lo, struct lo_inode *inode,
 		}
 		return res;
 	}
-	sprintf(path, "/proc/self/fd/%i", inode->fd);
+	sprintf(path, "%i", inode->fd);
 
-	return utimensat(AT_FDCWD, path, tv, 0);
+	return utimensat(lo->proc_self_fd, path, tv, 0);
 
 fallback:
 	res = lo_parent_and_name(lo, inode, path, &parent);
@@ -521,8 +524,9 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
 		if (fi) {
 			res = fchmod(fd, attr->st_mode);
 		} else {
-			sprintf(procname, "/proc/self/fd/%i", ifd);
-			res = chmod(procname, attr->st_mode);
+			sprintf(procname, "%i", ifd);
+			res = fchmodat(lo->proc_self_fd, procname,
+				       attr->st_mode, 0);
 		}
 		if (res == -1)
 			goto out_err;
@@ -539,11 +543,23 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
 			goto out_err;
 	}
 	if (valid & FUSE_SET_ATTR_SIZE) {
+		int truncfd;
+
 		if (fi) {
-			res = ftruncate(fd, attr->st_size);
+			truncfd = fd;
 		} else {
-			sprintf(procname, "/proc/self/fd/%i", ifd);
-			res = truncate(procname, attr->st_size);
+			sprintf(procname, "%i", ifd);
+			truncfd = openat(lo->proc_self_fd, procname, O_RDWR);
+			if (truncfd < 0) {
+				goto out_err;
+			}
+		}
+
+		res = ftruncate(truncfd, attr->st_size);
+		if (!fi) {
+			saverr = errno;
+			close(truncfd);
+			errno = saverr;
 		}
 		if (res == -1)
 			goto out_err;
@@ -825,9 +841,9 @@ static int linkat_empty_nofollow(struct lo_data *lo, struct lo_inode *inode,
 		return res;
 	}
 
-	sprintf(path, "/proc/self/fd/%i", inode->fd);
+	sprintf(path, "%i", inode->fd);
 
-	return linkat(AT_FDCWD, path, dfd, name, AT_SYMLINK_FOLLOW);
+	return linkat(lo->proc_self_fd, path, dfd, name, AT_SYMLINK_FOLLOW);
 
 fallback:
 	res = lo_parent_and_name(lo, inode, path, &parent);
@@ -1325,8 +1341,8 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 	if (lo->writeback && (fi->flags & O_APPEND))
 		fi->flags &= ~O_APPEND;
 
-	sprintf(buf, "/proc/self/fd/%i", lo_fd(req, ino));
-	fd = open(buf, fi->flags & ~O_NOFOLLOW);
+	sprintf(buf, "%i", lo_fd(req, ino));
+	fd = openat(lo->proc_self_fd, buf, fi->flags & ~O_NOFOLLOW);
 	if (fd == -1)
 		return (void) fuse_reply_err(req, errno);
 
@@ -1375,8 +1391,8 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
 		     struct fuse_file_info *fi)
 {
+	struct lo_data *lo = lo_data(req);
 	int res;
-	(void) ino;
 	int fd;
 	char *buf;
 
@@ -1384,11 +1400,11 @@ static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
 		 (void *)fi);
 
 	if (!fi) {
-		res = asprintf(&buf, "/proc/self/fd/%i", lo_fd(req, ino));
+		res = asprintf(&buf, "%i", lo_fd(req, ino));
 		if (res == -1)
 			return (void) fuse_reply_err(req, errno);
 
-		fd = open(buf, O_RDWR);
+		fd = openat(lo->proc_self_fd, buf, O_RDWR);
 		free(buf);
 		if (fd == -1)
 			return (void) fuse_reply_err(req, errno);
@@ -1493,11 +1509,13 @@ static void lo_flock(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
 static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
 			size_t size)
 {
+	struct lo_data *lo = lo_data(req);
 	char *value = NULL;
 	char procname[64];
 	struct lo_inode *inode;
 	ssize_t ret;
 	int saverr;
+	int fd = -1;
 
 	inode = lo_inode(req, ino);
 	if (!inode) {
@@ -1520,14 +1538,18 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
 		goto out;
 	}
 
-	sprintf(procname, "/proc/self/fd/%i", inode->fd);
+	sprintf(procname, "%i", inode->fd);
+	fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+	if (fd < 0) {
+		goto out_err;
+	}
 
 	if (size) {
 		value = malloc(size);
 		if (!value)
 			goto out_err;
 
-		ret = getxattr(procname, name, value, size);
+		ret = fgetxattr(fd, name, value, size);
 		if (ret == -1)
 			goto out_err;
 		saverr = 0;
@@ -1536,7 +1558,7 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
 
 		fuse_reply_buf(req, value, ret);
 	} else {
-		ret = getxattr(procname, name, NULL, 0);
+		ret = fgetxattr(fd, name, NULL, 0);
 		if (ret == -1)
 			goto out_err;
 
@@ -1544,6 +1566,10 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
 	}
 out_free:
 	free(value);
+
+	if (fd >= 0) {
+		close(fd);
+	}
 	return;
 
 out_err:
@@ -1555,11 +1581,13 @@ out:
 
 static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
 {
+	struct lo_data *lo = lo_data(req);
 	char *value = NULL;
 	char procname[64];
 	struct lo_inode *inode;
 	ssize_t ret;
 	int saverr;
+	int fd = -1;
 
 	inode = lo_inode(req, ino);
 	if (!inode) {
@@ -1582,14 +1610,18 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
 		goto out;
 	}
 
-	sprintf(procname, "/proc/self/fd/%i", inode->fd);
+	sprintf(procname, "%i", inode->fd);
+	fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+	if (fd < 0) {
+		goto out_err;
+	}
 
 	if (size) {
 		value = malloc(size);
 		if (!value)
 			goto out_err;
 
-		ret = listxattr(procname, value, size);
+		ret = flistxattr(fd, value, size);
 		if (ret == -1)
 			goto out_err;
 		saverr = 0;
@@ -1598,7 +1630,7 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
 
 		fuse_reply_buf(req, value, ret);
 	} else {
-		ret = listxattr(procname, NULL, 0);
+		ret = flistxattr(fd, NULL, 0);
 		if (ret == -1)
 			goto out_err;
 
@@ -1606,6 +1638,10 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
 	}
 out_free:
 	free(value);
+
+	if (fd >= 0) {
+		close(fd);
+	}
 	return;
 
 out_err:
@@ -1619,9 +1655,11 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
 			const char *value, size_t size, int flags)
 {
 	char procname[64];
+	struct lo_data *lo = lo_data(req);
 	struct lo_inode *inode;
 	ssize_t ret;
 	int saverr;
+	int fd = -1;
 
 	inode = lo_inode(req, ino);
 	if (!inode) {
@@ -1644,21 +1682,31 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
 		goto out;
 	}
 
-	sprintf(procname, "/proc/self/fd/%i", inode->fd);
+	sprintf(procname, "%i", inode->fd);
+	fd = openat(lo->proc_self_fd, procname, O_RDWR);
+	if (fd < 0) {
+		saverr = errno;
+		goto out;
+	}
 
-	ret = setxattr(procname, name, value, size, flags);
+	ret = fsetxattr(fd, name, value, size, flags);
 	saverr = ret == -1 ? errno : 0;
 
 out:
+	if (fd >= 0) {
+		close(fd);
+	}
 	fuse_reply_err(req, saverr);
 }
 
 static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *name)
 {
 	char procname[64];
+	struct lo_data *lo = lo_data(req);
 	struct lo_inode *inode;
 	ssize_t ret;
 	int saverr;
+	int fd = -1;
 
 	inode = lo_inode(req, ino);
 	if (!inode) {
@@ -1681,12 +1729,20 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *name)
 		goto out;
 	}
 
-	sprintf(procname, "/proc/self/fd/%i", inode->fd);
+	sprintf(procname, "%i", inode->fd);
+	fd = openat(lo->proc_self_fd, procname, O_RDWR);
+	if (fd < 0) {
+		saverr = errno;
+		goto out;
+	}
 
-	ret = removexattr(procname, name);
+	ret = fremovexattr(fd, name);
 	saverr = ret == -1 ? errno : 0;
 
 out:
+	if (fd >= 0) {
+		close(fd);
+	}
 	fuse_reply_err(req, saverr);
 }
 
@@ -1765,13 +1821,24 @@ static void print_capabilities(void)
 	printf("}\n");
 }
 
+static void setup_proc_self_fd(struct lo_data *lo)
+{
+	lo->proc_self_fd = open("/proc/self/fd", O_PATH);
+	if (lo->proc_self_fd == -1) {
+		fuse_log(FUSE_LOG_ERR, "open(/proc/self/fd, O_PATH): %m\n");
+		exit(1);
+	}
+}
+
 int main(int argc, char *argv[])
 {
 	struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
 	struct fuse_session *se;
 	struct fuse_cmdline_opts opts;
 	struct lo_data lo = { .debug = 0,
-	                      .writeback = 0 };
+	                      .writeback = 0,
+	                      .proc_self_fd = -1,
+	};
 	struct lo_map_elem *root_elem;
 	int ret = -1;
 
@@ -1878,6 +1945,9 @@ int main(int argc, char *argv[])
 
 	fuse_daemonize(opts.foreground);
 
+	/* Must be after daemonize to get the right /proc/self/fd */
+	setup_proc_self_fd(&lo);
+
 	/* Block until ctrl+c or fusermount -u */
 	ret = virtio_loop(se);
 
@@ -1893,6 +1963,10 @@ err_out1:
 	lo_map_destroy(&lo.dirp_map);
 	lo_map_destroy(&lo.ino_map);
 
+	if (lo.proc_self_fd >= 0) {
+		close(lo.proc_self_fd);
+	}
+
 	if (lo.root.fd >= 0)
 		close(lo.root.fd);
 
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 16/25] virtiofsd: sandbox mount namespace
  2019-10-24 11:26 [PATCH 00/25] virtiofs daemon (security) Dr. David Alan Gilbert (git)
                   ` (14 preceding siblings ...)
  2019-10-24 11:27 ` [PATCH 15/25] virtiofsd: use /proc/self/fd/ O_PATH file descriptor Dr. David Alan Gilbert (git)
@ 2019-10-24 11:27 ` Dr. David Alan Gilbert (git)
  2019-10-24 11:27 ` [PATCH 17/25] virtiofsd: move to an empty network namespace Dr. David Alan Gilbert (git)
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-10-24 11:27 UTC (permalink / raw)
  To: qemu-devel, renzhen, eguan, ganesh.mahalingam, m.mizuma,
	mszeredi, misono.tomohiro, tao.peng, piaojun, stefanha, vgoyal,
	mst, berrange

From: Stefan Hajnoczi <stefanha@redhat.com>

Use a mount namespace with the shared directory tree mounted at "/" and
no other mounts.

This prevents symlink escape attacks because symlink targets are
resolved only against the shared directory and cannot go outside it.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Peng Tao <tao.peng@linux.alibaba.com>
---
 contrib/virtiofsd/passthrough_ll.c | 89 ++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)

diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
index 3ddf22d162..20a34d4d83 100644
--- a/contrib/virtiofsd/passthrough_ll.c
+++ b/contrib/virtiofsd/passthrough_ll.c
@@ -51,6 +51,7 @@
 #include <sys/file.h>
 #include <sys/syscall.h>
 #include <sys/xattr.h>
+#include <sys/mount.h>
 
 #include "passthrough_helpers.h"
 
@@ -1821,6 +1822,58 @@ static void print_capabilities(void)
 	printf("}\n");
 }
 
+/* This magic is based on lxc's lxc_pivot_root() */
+static void setup_pivot_root(const char *source)
+{
+	int oldroot;
+	int newroot;
+
+	oldroot = open("/", O_DIRECTORY | O_RDONLY | O_CLOEXEC);
+	if (oldroot < 0) {
+		fuse_log(FUSE_LOG_ERR, "open(/): %m\n");
+		exit(1);
+	}
+
+	newroot = open(source, O_DIRECTORY | O_RDONLY | O_CLOEXEC);
+	if (newroot < 0) {
+		fuse_log(FUSE_LOG_ERR, "open(%s): %m\n", source);
+		exit(1);
+	}
+
+	if (fchdir(newroot) < 0) {
+		fuse_log(FUSE_LOG_ERR, "fchdir(newroot): %m\n");
+		exit(1);
+	}
+
+	if (syscall(__NR_pivot_root, ".", ".") < 0){
+		fuse_log(FUSE_LOG_ERR, "pivot_root(., .): %m\n");
+		exit(1);
+	}
+
+	if (fchdir(oldroot) < 0) {
+		fuse_log(FUSE_LOG_ERR, "fchdir(oldroot): %m\n");
+		exit(1);
+	}
+
+	if (mount("", ".", "", MS_SLAVE | MS_REC, NULL) < 0) {
+		fuse_log(FUSE_LOG_ERR, "mount(., MS_SLAVE | MS_REC): %m\n");
+		exit(1);
+	}
+
+	if (umount2(".", MNT_DETACH) < 0) {
+		fuse_log(FUSE_LOG_ERR, "umount2(., MNT_DETACH): %m\n");
+		exit(1);
+	}
+
+	if (fchdir(newroot) < 0) {
+		fuse_log(FUSE_LOG_ERR, "fchdir(newroot): %m\n");
+		exit(1);
+	}
+
+	close(newroot);
+	close(oldroot);
+}
+
 static void setup_proc_self_fd(struct lo_data *lo)
 {
 	lo->proc_self_fd = open("/proc/self/fd", O_PATH);
@@ -1830,6 +1883,39 @@ static void setup_proc_self_fd(struct lo_data *lo)
 	}
 }
 
+/*
+ * Make the source directory our root so symlinks cannot escape and no other
+ * files are accessible.
+ */
+static void setup_mount_namespace(const char *source)
+{
+	if (unshare(CLONE_NEWNS) != 0) {
+		fuse_log(FUSE_LOG_ERR, "unshare(CLONE_NEWNS): %m\n");
+		exit(1);
+	}
+
+	if (mount(NULL, "/", NULL, MS_REC|MS_SLAVE, NULL) < 0) {
+		fuse_log(FUSE_LOG_ERR, "mount(/, MS_REC|MS_PRIVATE): %m\n");
+		exit(1);
+	}
+
+	if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
+		fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source);
+		exit(1);
+	}
+
+	setup_pivot_root(source);
+}
+
+/*
+ * Lock down this process to prevent access to other processes or files outside
+ * source directory.  This reduces the impact of arbitrary code execution bugs.
+ */
+static void setup_sandbox(struct lo_data *lo)
+{
+	setup_mount_namespace(lo->source);
+}
+
 int main(int argc, char *argv[])
 {
 	struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
@@ -1927,6 +2013,7 @@ int main(int argc, char *argv[])
 	}
 
 	lo.root.fd = open(lo.source, O_PATH);
+
 	if (lo.root.fd == -1) {
 		fuse_log(FUSE_LOG_ERR, "open(\"%s\", O_PATH): %m\n",
 			 lo.source);
@@ -1948,6 +2035,8 @@ int main(int argc, char *argv[])
 	/* Must be after daemonize to get the right /proc/self/fd */
 	setup_proc_self_fd(&lo);
 
+	setup_sandbox(&lo);
+
 	/* Block until ctrl+c or fusermount -u */
 	ret = virtio_loop(se);
 
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 17/25] virtiofsd: move to an empty network namespace
  2019-10-24 11:26 [PATCH 00/25] virtiofs daemon (security) Dr. David Alan Gilbert (git)
                   ` (15 preceding siblings ...)
  2019-10-24 11:27 ` [PATCH 16/25] virtiofsd: sandbox mount namespace Dr. David Alan Gilbert (git)
@ 2019-10-24 11:27 ` Dr. David Alan Gilbert (git)
  2019-10-24 11:27 ` [PATCH 18/25] virtiofsd: move to a new pid namespace Dr. David Alan Gilbert (git)
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-10-24 11:27 UTC (permalink / raw)
  To: qemu-devel, renzhen, eguan, ganesh.mahalingam, m.mizuma,
	mszeredi, misono.tomohiro, tao.peng, piaojun, stefanha, vgoyal,
	mst, berrange

From: Stefan Hajnoczi <stefanha@redhat.com>

If the process is compromised there should be no network access.  Use an
empty network namespace to sandbox networking.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 contrib/virtiofsd/passthrough_ll.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
index 20a34d4d83..58cf82a89f 100644
--- a/contrib/virtiofsd/passthrough_ll.c
+++ b/contrib/virtiofsd/passthrough_ll.c
@@ -1822,6 +1822,19 @@ static void print_capabilities(void)
 	printf("}\n");
 }
 
+/*
+ * Called after our UNIX domain sockets have been created, now we can move to
+ * an empty network namespace to prevent TCP/IP and other network activity in
+ * case this process is compromised.
+ */
+static void setup_net_namespace(void)
+{
+	if (unshare(CLONE_NEWNET) != 0) {
+		fuse_log(FUSE_LOG_ERR, "unshare(CLONE_NEWNET): %m\n");
+		exit(1);
+	}
+}
+
 /* This magic is based on lxc's lxc_pivot_root() */
 static void setup_pivot_root(const char *source)
 {
@@ -1913,6 +1926,7 @@ static void setup_mount_namespace(const char *source)
  */
 static void setup_sandbox(struct lo_data *lo)
 {
+	setup_net_namespace();
 	setup_mount_namespace(lo->source);
 }
 
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 18/25] virtiofsd: move to a new pid namespace
  2019-10-24 11:26 [PATCH 00/25] virtiofs daemon (security) Dr. David Alan Gilbert (git)
                   ` (16 preceding siblings ...)
  2019-10-24 11:27 ` [PATCH 17/25] virtiofsd: move to an empty network namespace Dr. David Alan Gilbert (git)
@ 2019-10-24 11:27 ` Dr. David Alan Gilbert (git)
  2019-10-24 11:27 ` [PATCH 19/25] virtiofsd: add seccomp whitelist Dr. David Alan Gilbert (git)
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-10-24 11:27 UTC (permalink / raw)
  To: qemu-devel, renzhen, eguan, ganesh.mahalingam, m.mizuma,
	mszeredi, misono.tomohiro, tao.peng, piaojun, stefanha, vgoyal,
	mst, berrange

From: Stefan Hajnoczi <stefanha@redhat.com>

virtiofsd needs access to /proc/self/fd.  Let's move to a new pid
namespace so that a compromised process cannot see another other
processes running on the system.

One wrinkle in this approach: unshare(CLONE_NEWPID) affects *child*
processes and not the current process.  Therefore we need to fork the
pid 1 process that will actually run virtiofsd and leave a parent in
waitpid(2).  This is not the same thing as daemonization and parent
processes should not notice a difference.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 contrib/virtiofsd/passthrough_ll.c | 76 ++++++++++++++++++++++++------
 1 file changed, 62 insertions(+), 14 deletions(-)

diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
index 58cf82a89f..c027db64e6 100644
--- a/contrib/virtiofsd/passthrough_ll.c
+++ b/contrib/virtiofsd/passthrough_ll.c
@@ -52,6 +52,8 @@
 #include <sys/syscall.h>
 #include <sys/xattr.h>
 #include <sys/mount.h>
+#include <sys/types.h>
+#include <sys/wait.h>
 
 #include "passthrough_helpers.h"
 
@@ -1835,6 +1837,63 @@ static void setup_net_namespace(void)
 	}
 }
 
+/*
+ * Move to a new pid namespace to prevent access to other processes if this
+ * process is compromised.
+ */
+static void setup_pid_namespace(void)
+{
+	pid_t child;
+
+	/*
+	 * Create a new pid namespace for *child* processes.  We'll have to
+	 * fork in order to enter the new pid namespace.  A new mount namespace
+	 * is also needed so that we can remount /proc for the new pid
+	 * namespace.
+	 */
+	if (unshare(CLONE_NEWPID | CLONE_NEWNS) != 0) {
+		fuse_log(FUSE_LOG_ERR, "unshare(CLONE_NEWPID | CLONE_NEWNS): %m\n");
+		exit(1);
+	}
+
+	child = fork();
+	if (child < 0) {
+		fuse_log(FUSE_LOG_ERR, "fork() failed: %m\n");
+		exit(1);
+	}
+	if (child > 0) {
+		pid_t waited;
+		int wstatus;
+
+		/* The parent waits for the child */
+		do {
+			waited = waitpid(child, &wstatus, 0);
+		} while (waited < 0 && errno == EINTR);
+
+		if (WIFEXITED(wstatus)) {
+			exit(WEXITSTATUS(wstatus));
+		}
+
+		exit(1);
+	}
+
+	/*
+	 * If the mounts have shared propagation then we want to opt out so our
+	 * mount changes don't affect the parent mount namespace.
+	 */
+	if (mount(NULL, "/", NULL, MS_REC|MS_SLAVE, NULL) < 0) {
+		fuse_log(FUSE_LOG_ERR, "mount(/, MS_REC|MS_SLAVE): %m\n");
+		exit(1);
+	}
+
+	/* The child must remount /proc to use the new pid namespace */
+	if (mount("proc", "/proc", "proc",
+		  MS_NODEV | MS_NOEXEC | MS_NOSUID | MS_RELATIME, NULL) < 0) {
+		fuse_log(FUSE_LOG_ERR, "mount(/proc): %m\n");
+		exit(1);
+	}
+}
+
 /* This magic is based on lxc's lxc_pivot_root() */
 static void setup_pivot_root(const char *source)
 {
@@ -1898,20 +1957,10 @@ static void setup_proc_self_fd(struct lo_data *lo)
 
 /*
  * Make the source directory our root so symlinks cannot escape and no other
- * files are accessible.
+ * files are accessible.  Assumes unshare(CLONE_NEWNS) was already called.
  */
 static void setup_mount_namespace(const char *source)
 {
-	if (unshare(CLONE_NEWNS) != 0) {
-		fuse_log(FUSE_LOG_ERR, "unshare(CLONE_NEWNS): %m\n");
-		exit(1);
-	}
-
-	if (mount(NULL, "/", NULL, MS_REC|MS_SLAVE, NULL) < 0) {
-		fuse_log(FUSE_LOG_ERR, "mount(/, MS_REC|MS_PRIVATE): %m\n");
-		exit(1);
-	}
-
 	if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
 		fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source);
 		exit(1);
@@ -1926,6 +1975,8 @@ static void setup_mount_namespace(const char *source)
  */
 static void setup_sandbox(struct lo_data *lo)
 {
+	setup_pid_namespace();
+	setup_proc_self_fd(lo);
 	setup_net_namespace();
 	setup_mount_namespace(lo->source);
 }
@@ -2046,9 +2097,6 @@ int main(int argc, char *argv[])
 
 	fuse_daemonize(opts.foreground);
 
-	/* Must be after daemonize to get the right /proc/self/fd */
-	setup_proc_self_fd(&lo);
-
 	setup_sandbox(&lo);
 
 	/* Block until ctrl+c or fusermount -u */
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 19/25] virtiofsd: add seccomp whitelist
  2019-10-24 11:26 [PATCH 00/25] virtiofs daemon (security) Dr. David Alan Gilbert (git)
                   ` (17 preceding siblings ...)
  2019-10-24 11:27 ` [PATCH 18/25] virtiofsd: move to a new pid namespace Dr. David Alan Gilbert (git)
@ 2019-10-24 11:27 ` Dr. David Alan Gilbert (git)
  2019-10-24 11:27 ` [PATCH 20/25] virtiofsd: Parse flag FUSE_WRITE_KILL_PRIV Dr. David Alan Gilbert (git)
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-10-24 11:27 UTC (permalink / raw)
  To: qemu-devel, renzhen, eguan, ganesh.mahalingam, m.mizuma,
	mszeredi, misono.tomohiro, tao.peng, piaojun, stefanha, vgoyal,
	mst, berrange

From: Stefan Hajnoczi <stefanha@redhat.com>

Only allow system calls that are needed by virtiofsd.  All other system
calls cause SIGSYS to be directed at the thread.

Restricting system calls reduces the kernel attack surface and limits
what the process can do when compromised.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
with additional entries by:
Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: piaojun <piaojun@huawei.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
---
 contrib/virtiofsd/Makefile.objs    |   5 +-
 contrib/virtiofsd/passthrough_ll.c |   2 +
 contrib/virtiofsd/seccomp.c        | 132 +++++++++++++++++++++++++++++
 contrib/virtiofsd/seccomp.h        |  14 +++
 4 files changed, 152 insertions(+), 1 deletion(-)
 create mode 100644 contrib/virtiofsd/seccomp.c
 create mode 100644 contrib/virtiofsd/seccomp.h

diff --git a/contrib/virtiofsd/Makefile.objs b/contrib/virtiofsd/Makefile.objs
index 67be16332c..941b19f18e 100644
--- a/contrib/virtiofsd/Makefile.objs
+++ b/contrib/virtiofsd/Makefile.objs
@@ -6,5 +6,8 @@ virtiofsd-obj-y = buffer.o \
                   fuse_signals.o \
                   fuse_virtio.o \
                   helper.o \
-                  passthrough_ll.o
+                  passthrough_ll.o \
+                  seccomp.o
 
+seccomp.o-cflags := $(SECCOMP_CFLAGS)
+seccomp.o-libs := $(SECCOMP_LIBS)
diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
index c027db64e6..93873bf6f4 100644
--- a/contrib/virtiofsd/passthrough_ll.c
+++ b/contrib/virtiofsd/passthrough_ll.c
@@ -56,6 +56,7 @@
 #include <sys/wait.h>
 
 #include "passthrough_helpers.h"
+#include "seccomp.h"
 
 #define HAVE_POSIX_FALLOCATE 1
 
@@ -1979,6 +1980,7 @@ static void setup_sandbox(struct lo_data *lo)
 	setup_proc_self_fd(lo);
 	setup_net_namespace();
 	setup_mount_namespace(lo->source);
+	setup_seccomp();
 }
 
 int main(int argc, char *argv[])
diff --git a/contrib/virtiofsd/seccomp.c b/contrib/virtiofsd/seccomp.c
new file mode 100644
index 0000000000..df1390d6be
--- /dev/null
+++ b/contrib/virtiofsd/seccomp.c
@@ -0,0 +1,132 @@
+/*
+ * Seccomp sandboxing for virtiofsd
+ *
+ * Copyright (C) 2019 Red Hat, Inc.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "seccomp.h"
+#include "fuse_i.h"
+#include "fuse_log.h"
+#include <errno.h>
+#include <glib.h>
+#include <seccomp.h>
+#include <stdlib.h>
+
+static const int syscall_whitelist[] = {
+    /* TODO ireg sem*() syscalls */
+    SCMP_SYS(brk),
+    SCMP_SYS(capget), /* For CAP_FSETID */
+    SCMP_SYS(capset),
+    SCMP_SYS(clock_gettime),
+    SCMP_SYS(clone),
+    SCMP_SYS(close),
+    SCMP_SYS(copy_file_range),
+    SCMP_SYS(dup),
+    SCMP_SYS(eventfd2),
+    SCMP_SYS(exit),
+    SCMP_SYS(exit_group),
+    SCMP_SYS(fallocate),
+    SCMP_SYS(fchmodat),
+    SCMP_SYS(fchownat),
+    SCMP_SYS(fcntl),
+    SCMP_SYS(fdatasync),
+    SCMP_SYS(fgetxattr),
+    SCMP_SYS(flistxattr),
+    SCMP_SYS(flock),
+    SCMP_SYS(fremovexattr),
+    SCMP_SYS(fsetxattr),
+    SCMP_SYS(fstat),
+    SCMP_SYS(fstatfs),
+    SCMP_SYS(fsync),
+    SCMP_SYS(ftruncate),
+    SCMP_SYS(futex),
+    SCMP_SYS(getdents),
+    SCMP_SYS(getdents64),
+    SCMP_SYS(getegid),
+    SCMP_SYS(geteuid),
+    SCMP_SYS(getpid),
+    SCMP_SYS(gettid),
+    SCMP_SYS(gettimeofday),
+    SCMP_SYS(linkat),
+    SCMP_SYS(lseek),
+    SCMP_SYS(madvise),
+    SCMP_SYS(mkdirat),
+    SCMP_SYS(mknodat),
+    SCMP_SYS(mmap),
+    SCMP_SYS(mprotect),
+    SCMP_SYS(mremap),
+    SCMP_SYS(munmap),
+    SCMP_SYS(newfstatat),
+    SCMP_SYS(open),
+    SCMP_SYS(openat),
+    SCMP_SYS(ppoll),
+    SCMP_SYS(prctl), /* TODO restrict to just PR_SET_NAME? */
+    SCMP_SYS(preadv),
+    SCMP_SYS(pread64),
+    SCMP_SYS(pwritev),
+    SCMP_SYS(pwrite64),
+    SCMP_SYS(read),
+    SCMP_SYS(readlinkat),
+    SCMP_SYS(recvmsg),
+    SCMP_SYS(renameat),
+    SCMP_SYS(renameat2),
+    SCMP_SYS(rt_sigaction),
+    SCMP_SYS(rt_sigprocmask),
+    SCMP_SYS(rt_sigreturn),
+    SCMP_SYS(sendmsg),
+    SCMP_SYS(setresgid),
+    SCMP_SYS(setresuid),
+    SCMP_SYS(set_robust_list),
+    SCMP_SYS(symlinkat),
+    SCMP_SYS(time), /* Rarely needed, except on static builds */
+    SCMP_SYS(tgkill),
+    SCMP_SYS(unlinkat),
+    SCMP_SYS(utimensat),
+    SCMP_SYS(write),
+    SCMP_SYS(writev),
+};
+
+void setup_seccomp(void)
+{
+    scmp_filter_ctx ctx;
+    size_t i;
+
+#ifdef SCMP_ACT_KILL_PROCESS
+    ctx = seccomp_init(SCMP_ACT_KILL_PROCESS);
+    /* Handle a newer libseccomp but an older kernel */
+    if (!ctx && errno == EOPNOTSUPP) {
+        ctx = seccomp_init(SCMP_ACT_KILL);
+    }
+#else
+    ctx = seccomp_init(SCMP_ACT_KILL);
+#endif
+    if (!ctx) {
+        fuse_log(FUSE_LOG_ERR, "seccomp_init() failed\n");
+        exit(1);
+    }
+
+    for (i = 0; i < G_N_ELEMENTS(syscall_whitelist); i++) {
+        if (seccomp_rule_add(ctx, SCMP_ACT_ALLOW,
+                             syscall_whitelist[i], 0) != 0) {
+            fuse_log(FUSE_LOG_ERR, "seccomp_rule_add syscall %d",
+                     syscall_whitelist[i]);
+            exit(1);
+        }
+    }
+
+    /* libvhost-user calls this for post-copy migration, we don't need it */
+    if (seccomp_rule_add(ctx, SCMP_ACT_ERRNO(ENOSYS),
+                         SCMP_SYS(userfaultfd), 0) != 0) {
+        fuse_log(FUSE_LOG_ERR, "seccomp_rule_add userfaultfd failed\n");
+        exit(1);
+    }
+
+    if (seccomp_load(ctx) < 0) {
+        fuse_log(FUSE_LOG_ERR, "seccomp_load() failed\n");
+        exit(1);
+    }
+
+    seccomp_release(ctx);
+}
diff --git a/contrib/virtiofsd/seccomp.h b/contrib/virtiofsd/seccomp.h
new file mode 100644
index 0000000000..86bce72652
--- /dev/null
+++ b/contrib/virtiofsd/seccomp.h
@@ -0,0 +1,14 @@
+/*
+ * Seccomp sandboxing for virtiofsd
+ *
+ * Copyright (C) 2019 Red Hat, Inc.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef VIRTIOFSD_SECCOMP_H
+#define VIRTIOFSD_SECCOMP_H
+
+void setup_seccomp(void);
+
+#endif /* VIRTIOFSD_SECCOMP_H */
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 20/25] virtiofsd: Parse flag FUSE_WRITE_KILL_PRIV
  2019-10-24 11:26 [PATCH 00/25] virtiofs daemon (security) Dr. David Alan Gilbert (git)
                   ` (18 preceding siblings ...)
  2019-10-24 11:27 ` [PATCH 19/25] virtiofsd: add seccomp whitelist Dr. David Alan Gilbert (git)
@ 2019-10-24 11:27 ` Dr. David Alan Gilbert (git)
  2019-10-24 11:27 ` [PATCH 21/25] virtiofsd: Drop CAP_FSETID if client asked for it Dr. David Alan Gilbert (git)
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-10-24 11:27 UTC (permalink / raw)
  To: qemu-devel, renzhen, eguan, ganesh.mahalingam, m.mizuma,
	mszeredi, misono.tomohiro, tao.peng, piaojun, stefanha, vgoyal,
	mst, berrange

From: Vivek Goyal <vgoyal@redhat.com>

Caller can set FUSE_WRITE_KILL_PRIV in write_flags. Parse it and pass it
to the filesystem.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 contrib/virtiofsd/fuse_common.h   | 5 ++++-
 contrib/virtiofsd/fuse_lowlevel.c | 4 +++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/contrib/virtiofsd/fuse_common.h b/contrib/virtiofsd/fuse_common.h
index b5b6d270fe..63ef2390a1 100644
--- a/contrib/virtiofsd/fuse_common.h
+++ b/contrib/virtiofsd/fuse_common.h
@@ -83,8 +83,11 @@ struct fuse_file_info {
 	    nothing when set by open()). */
 	unsigned int cache_readdir : 1;
 
+	/* Indicates that suid/sgid bits should be removed upon write */
+	unsigned int kill_priv : 1;
+
 	/** Padding.  Reserved for future use*/
-	unsigned int padding : 25;
+	unsigned int padding : 24;
 	unsigned int padding2 : 32;
 
 	/** File handle id.  May be filled in by filesystem in create,
diff --git a/contrib/virtiofsd/fuse_lowlevel.c b/contrib/virtiofsd/fuse_lowlevel.c
index d032e411e1..8f9a59a34c 100644
--- a/contrib/virtiofsd/fuse_lowlevel.c
+++ b/contrib/virtiofsd/fuse_lowlevel.c
@@ -1151,6 +1151,7 @@ static void do_write(fuse_req_t req, fuse_ino_t nodeid,
 	memset(&fi, 0, sizeof(fi));
 	fi.fh = arg->fh;
 	fi.writepage = (arg->write_flags & FUSE_WRITE_CACHE) != 0;
+	fi.kill_priv = !!(arg->write_flags & FUSE_WRITE_KILL_PRIV);
 
 	if (!compat) {
 		fi.lock_owner = arg->lock_owner;
@@ -1197,7 +1198,8 @@ static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid,
 	}
 
 	fi.fh = arg->fh;
-	fi.writepage = arg->write_flags & FUSE_WRITE_CACHE;
+	fi.writepage = !!(arg->write_flags & FUSE_WRITE_CACHE);
+	fi.kill_priv = !!(arg->write_flags & FUSE_WRITE_KILL_PRIV);
 
 	if (ibufv->count == 1) {
 		assert(!(tmpbufv.buf[0].flags & FUSE_BUF_IS_FD));
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 21/25] virtiofsd: Drop CAP_FSETID if client asked for it
  2019-10-24 11:26 [PATCH 00/25] virtiofs daemon (security) Dr. David Alan Gilbert (git)
                   ` (19 preceding siblings ...)
  2019-10-24 11:27 ` [PATCH 20/25] virtiofsd: Parse flag FUSE_WRITE_KILL_PRIV Dr. David Alan Gilbert (git)
@ 2019-10-24 11:27 ` Dr. David Alan Gilbert (git)
  2019-10-24 11:27 ` [PATCH 22/25] virtiofsd: set maximum RLIMIT_NOFILE limit Dr. David Alan Gilbert (git)
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-10-24 11:27 UTC (permalink / raw)
  To: qemu-devel, renzhen, eguan, ganesh.mahalingam, m.mizuma,
	mszeredi, misono.tomohiro, tao.peng, piaojun, stefanha, vgoyal,
	mst, berrange

From: Vivek Goyal <vgoyal@redhat.com>

If client requested killing setuid/setgid bits on file being written, drop
CAP_FSETID capability so that setuid/setgid bits are cleared upon write
automatically.

pjdfstest chown/12.t needs this.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 contrib/virtiofsd/Makefile.objs    |   2 +
 contrib/virtiofsd/passthrough_ll.c | 127 +++++++++++++++++++++++++++++
 2 files changed, 129 insertions(+)

diff --git a/contrib/virtiofsd/Makefile.objs b/contrib/virtiofsd/Makefile.objs
index 941b19f18e..25f1e8dd73 100644
--- a/contrib/virtiofsd/Makefile.objs
+++ b/contrib/virtiofsd/Makefile.objs
@@ -11,3 +11,5 @@ virtiofsd-obj-y = buffer.o \
 
 seccomp.o-cflags := $(SECCOMP_CFLAGS)
 seccomp.o-libs := $(SECCOMP_LIBS)
+
+passthrough_ll.o-libs += -lcap
diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
index 93873bf6f4..fe46b25fb6 100644
--- a/contrib/virtiofsd/passthrough_ll.c
+++ b/contrib/virtiofsd/passthrough_ll.c
@@ -51,6 +51,7 @@
 #include <sys/file.h>
 #include <sys/syscall.h>
 #include <sys/xattr.h>
+#include <sys/capability.h>
 #include <sys/mount.h>
 #include <sys/types.h>
 #include <sys/wait.h>
@@ -174,6 +175,115 @@ static struct lo_data *lo_data(fuse_req_t req)
 	return (struct lo_data *) fuse_req_userdata(req);
 }
 
+/* Helpers for dropping and regaining effective capabilities. Returns 0
+ * on success, error otherwise  */
+static int drop_effective_cap(const char *cap_name, bool *cap_dropped)
+{
+	cap_t caps;
+	cap_value_t cap;
+	cap_flag_value_t cap_value;
+	int ret = 0;
+
+	ret = cap_from_name(cap_name, &cap);
+	if (ret == -1) {
+		ret = errno;
+		fuse_log(FUSE_LOG_ERR, "cap_from_name(%s) failed:%s\n", cap_name,
+			 strerror(errno));
+		goto out;
+	}
+
+	if (!CAP_IS_SUPPORTED(cap)) {
+		fuse_log(FUSE_LOG_ERR, "capability(%s) is not supported\n", cap_name);
+		return EINVAL;
+	}
+
+	caps = cap_get_proc();
+	if (caps == NULL) {
+		ret = errno;
+		fuse_log(FUSE_LOG_ERR, "cap_get_proc() failed\n");
+		goto out;
+	}
+
+	if (cap_get_flag(caps, cap, CAP_EFFECTIVE, &cap_value) == -1) {
+		ret = errno;
+		fuse_log(FUSE_LOG_ERR, "cap_get_flag() failed\n");
+		goto out_cap_free;
+	}
+
+	/* We dont have this capability in effective set already. */
+	if (cap_value == CAP_CLEAR) {
+		ret = 0;
+		goto out_cap_free;
+	}
+
+	if (cap_set_flag(caps, CAP_EFFECTIVE, 1, &cap, CAP_CLEAR) == -1) {
+		ret = errno;
+		fuse_log(FUSE_LOG_ERR, "cap_set_flag() failed\n");
+		goto out_cap_free;
+	}
+
+	if (cap_set_proc(caps) == -1) {
+		ret = errno;
+		fuse_log(FUSE_LOG_ERR, "cap_set_procs() failed\n");
+		goto out_cap_free;
+	}
+
+	ret = 0;
+	if (cap_dropped)
+		*cap_dropped = true;
+
+out_cap_free:
+	cap_free(caps);
+out:
+	return ret;
+}
+
+static int gain_effective_cap(const char *cap_name)
+{
+	cap_t caps;
+	cap_value_t cap;
+	int ret = 0;
+
+	ret = cap_from_name(cap_name, &cap);
+	if (ret == -1) {
+		ret = errno;
+		fuse_log(FUSE_LOG_ERR, "cap_from_name(%s) failed:%s\n", cap_name,
+			 strerror(errno));
+		goto out;
+	}
+
+	if (!CAP_IS_SUPPORTED(cap)) {
+		fuse_log(FUSE_LOG_ERR, "capability(%s) is not supported\n", cap_name);
+		return EINVAL;
+	}
+
+	caps = cap_get_proc();
+	if (caps == NULL) {
+		ret = errno;
+		fuse_log(FUSE_LOG_ERR, "cap_get_proc() failed\n");
+		goto out;
+	}
+
+
+	if (cap_set_flag(caps, CAP_EFFECTIVE, 1, &cap, CAP_SET) == -1) {
+		ret = errno;
+		fuse_log(FUSE_LOG_ERR, "cap_set_flag() failed\n");
+		goto out_cap_free;
+	}
+
+	if (cap_set_proc(caps) == -1) {
+		ret = errno;
+		fuse_log(FUSE_LOG_ERR, "cap_set_procs() failed\n");
+		goto out_cap_free;
+	}
+	ret = 0;
+
+out_cap_free:
+	cap_free(caps);
+out:
+	return ret;
+}
+
 static void lo_map_init(struct lo_map *map)
 {
 	map->elems = NULL;
@@ -1447,6 +1557,7 @@ static void lo_write_buf(fuse_req_t req, fuse_ino_t ino,
 	(void) ino;
 	ssize_t res;
 	struct fuse_bufvec out_buf = FUSE_BUFVEC_INIT(fuse_buf_size(in_buf));
+	bool cap_fsetid_dropped = false;
 
 	out_buf.buf[0].flags = FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK;
 	out_buf.buf[0].fd = lo_fi_fd(req, fi);
@@ -1456,11 +1567,27 @@ static void lo_write_buf(fuse_req_t req, fuse_ino_t ino,
 		fuse_log(FUSE_LOG_DEBUG, "lo_write(ino=%" PRIu64 ", size=%zd, off=%lu)\n",
 			ino, out_buf.buf[0].size, (unsigned long) off);
 
+	/*
+	 * If kill_priv is set, drop CAP_FSETID which should lead to kernel
+	 * clearing setuid/setgid on file.
+	 */
+	if (fi->kill_priv) {
+		res = drop_effective_cap("cap_fsetid", &cap_fsetid_dropped);
+		if (res != 0)
+			fuse_reply_err(req, res);
+	}
+
 	res = fuse_buf_copy(&out_buf, in_buf, 0);
 	if(res < 0)
 		fuse_reply_err(req, -res);
 	else
 		fuse_reply_write(req, (size_t) res);
+
+	if (cap_fsetid_dropped) {
+		res = gain_effective_cap("cap_fsetid");
+		if (res)
+			fuse_log(FUSE_LOG_ERR, "Failed to gain CAP_FSETID\n");
+	}
 }
 
 static void lo_statfs(fuse_req_t req, fuse_ino_t ino)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 22/25] virtiofsd: set maximum RLIMIT_NOFILE limit
  2019-10-24 11:26 [PATCH 00/25] virtiofs daemon (security) Dr. David Alan Gilbert (git)
                   ` (20 preceding siblings ...)
  2019-10-24 11:27 ` [PATCH 21/25] virtiofsd: Drop CAP_FSETID if client asked for it Dr. David Alan Gilbert (git)
@ 2019-10-24 11:27 ` Dr. David Alan Gilbert (git)
  2019-10-24 11:27 ` [PATCH 23/25] virtiofsd: add security guide document Dr. David Alan Gilbert (git)
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-10-24 11:27 UTC (permalink / raw)
  To: qemu-devel, renzhen, eguan, ganesh.mahalingam, m.mizuma,
	mszeredi, misono.tomohiro, tao.peng, piaojun, stefanha, vgoyal,
	mst, berrange

From: Stefan Hajnoczi <stefanha@redhat.com>

virtiofsd can exceed the default open file descriptor limit easily on
most systems.  Take advantage of the fact that it runs as root to set up
the maximum open file descriptor limit allowed on the system (the
nr_open sysctl).

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 contrib/virtiofsd/passthrough_ll.c | 34 ++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
index fe46b25fb6..25f7ad854a 100644
--- a/contrib/virtiofsd/passthrough_ll.c
+++ b/contrib/virtiofsd/passthrough_ll.c
@@ -53,9 +53,11 @@
 #include <sys/xattr.h>
 #include <sys/capability.h>
 #include <sys/mount.h>
+#include <sys/resource.h>
 #include <sys/types.h>
 #include <sys/wait.h>
 
+#include <glib.h>
 #include "passthrough_helpers.h"
 #include "seccomp.h"
 
@@ -2110,6 +2112,36 @@ static void setup_sandbox(struct lo_data *lo)
 	setup_seccomp();
 }
 
+/* Raise the maximum number of open file descriptors to the system limit */
+static void setup_nofile_rlimit(void)
+{
+	gchar *nr_open = NULL;
+	struct rlimit rlim;
+	long long max;
+
+	if (!g_file_get_contents("/proc/sys/fs/nr_open", &nr_open, NULL, NULL)) {
+		fuse_log(FUSE_LOG_ERR, "unable to read /proc/sys/fs/nr_open\n");
+		exit(1);
+	}
+
+	errno = 0;
+	max = strtoll(nr_open, NULL, 0);
+	if (errno) {
+		fuse_log(FUSE_LOG_ERR, "strtoll(%s): %m\n", nr_open);
+		exit(1);
+	}
+
+	rlim.rlim_cur = max;
+	rlim.rlim_max = max;
+
+	if (setrlimit(RLIMIT_NOFILE, &rlim) < 0) {
+		fuse_log(FUSE_LOG_ERR, "setrlimit(RLIMIT_NOFILE): %m\n");
+		exit(1);
+	}
+
+	g_free(nr_open);
+}
+
 int main(int argc, char *argv[])
 {
 	struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
@@ -2125,6 +2157,8 @@ int main(int argc, char *argv[])
 	/* Don't mask creation mode, kernel already did that */
 	umask(0);
 
+	setup_nofile_rlimit();
+
 	pthread_mutex_init(&lo.mutex, NULL);
 	lo.root.next = lo.root.prev = &lo.root;
 	lo.root.fd = -1;
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 23/25] virtiofsd: add security guide document
  2019-10-24 11:26 [PATCH 00/25] virtiofs daemon (security) Dr. David Alan Gilbert (git)
                   ` (21 preceding siblings ...)
  2019-10-24 11:27 ` [PATCH 22/25] virtiofsd: set maximum RLIMIT_NOFILE limit Dr. David Alan Gilbert (git)
@ 2019-10-24 11:27 ` Dr. David Alan Gilbert (git)
  2019-10-24 11:27 ` [PATCH 24/25] virtiofsd: add --syslog command-line option Dr. David Alan Gilbert (git)
  2019-10-24 11:27 ` [PATCH 25/25] virtiofsd: print log only when priority is high enough Dr. David Alan Gilbert (git)
  24 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-10-24 11:27 UTC (permalink / raw)
  To: qemu-devel, renzhen, eguan, ganesh.mahalingam, m.mizuma,
	mszeredi, misono.tomohiro, tao.peng, piaojun, stefanha, vgoyal,
	mst, berrange

From: Stefan Hajnoczi <stefanha@redhat.com>

Many people want to know: what's up with virtiofsd and security?  This
document provides the answers!

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 contrib/virtiofsd/security.rst | 108 +++++++++++++++++++++++++++++++++
 1 file changed, 108 insertions(+)
 create mode 100644 contrib/virtiofsd/security.rst

diff --git a/contrib/virtiofsd/security.rst b/contrib/virtiofsd/security.rst
new file mode 100644
index 0000000000..1f3037f4e4
--- /dev/null
+++ b/contrib/virtiofsd/security.rst
@@ -0,0 +1,108 @@
+========================
+Virtiofsd Security Guide
+========================
+
+Introduction
+============
+This document covers security topics for users of virtiofsd, the daemon that
+implements host<->guest file system sharing.  Sharing files between one or more
+guests and the host raises questions about the trust relationships between
+these entities.  By understanding these topics users can safely deploy
+virtiofsd and control access to their data.
+
+Architecture
+============
+The virtiofsd daemon process acts as a vhost-user device backend, implementing
+the virtio-fs device that the corresponding device driver inside the guest
+interacts with.
+
+There is one virtiofsd process per virtio-fs device instance.  For example,
+when two guests have access to the same shared directory there are still two
+virtiofsd processes since there are two virtio-fs device instances.  Similarly,
+if one guest has access to two shared directories, there are two virtiofsd
+processes since there are two virtio-fs device instances.
+
+Files are created on the host with uid/gid values provided by the guest.
+Furthermore, virtiofsd is unable to enforce file permissions since guests have
+the ability to access any file within the shared directory.  File permissions
+are implemented in the guest, just like with traditional local file systems.
+
+Security Requirements
+=====================
+Guests have root access to the shared directory.  This is necessary for root
+file systems on virtio-fs and similar use cases.
+
+When multiple guests have access to the same shared directory, the guests have
+a trust relationship.  A broken or malicious guest could delete or corrupt
+files.  It could exploit symlink or time-of-check-to-time-of-use (TOCTOU) race
+conditions against applications in other guests.  It could plant device nodes
+or setuid executables to gain privileges in other guests.  It could perform
+denial-of-service (DoS) attacks by consuming available space or making the file
+system unavailable to other guests.
+
+Guests are restricted to the shared directory and cannot access other files on
+the host.
+
+Guests should not be able to gain arbitrary code execution inside the virtiofsd
+process.  If they do, the process is sandboxed to prevent escaping into other
+parts of the host.
+
+Daemon Sandboxing
+=================
+The virtiofsd process handles virtio-fs FUSE requests from the untrusted guest.
+This attack surface could give the guest access to host resources and must
+therefore be protected.  Sandboxing mechanisms are integrated into virtiofsd to
+reduce the impact in the event that an attacker gains control of the process.
+
+As a general rule, virtiofsd does not trust inputs from the guest, aside from
+uid/gid values.  Input validation is performed so that the guest cannot corrupt
+memory or otherwise gain arbitrary code execution in the virtiofsd process.
+
+Sandboxing adds restrictions on the virtiofsd so that even if an attacker is
+able to exploit a bug, they will be constrained to the virtiofsd process and
+unable to cause damage on the host.
+
+Seccomp Whitelist
+-----------------
+Many system calls are not required by virtiofsd to perform its function.  For
+example, ptrace(2) and execve(2) are not necessary and attackers are likely to
+use them to further compromise the system.  This is prevented using a seccomp
+whitelist in virtiofsd.
+
+During startup virtiofsd installs a whitelist of allowed system calls.  All
+other system calls are forbidden for the remaining lifetime of the process.
+This list has been built through experience of running virtiofsd on several
+flavors of Linux and observing which system calls were encountered.
+
+It is possible that previously unexplored code paths or newer library versions
+will invoke system calls that have not been whitelisted yet.  In this case the
+process terminates and a seccomp error is captured in the audit log.  The log
+can typically be viewed using ``journalctl -xe`` and searching for ``SECCOMP``.
+
+Should it be necessary to extend the whitelist, system call numbers from the
+audit log can be translated to names through a CPU architecture-specific
+``.tbl`` file in the Linux source tree.  They can then be added to the
+whitelist in ``seccomp.c`` in the virtiofsd source tree.
+
+Mount Namespace
+---------------
+During startup virtiofsd enters a new mount namespace and releases all mounts
+except for the shared directory.  This makes the file system root `/` the
+shared directory.  It is impossible to access files outside the shared
+directory since they cannot be looked up by path resolution.
+
+Several attacks, including `..` traversal and symlink escapes, are prevented by
+the mount namespace.
+
+The current virtiofsd implementation keeps a directory file descriptor to
+/proc/self/fd open in order to implement several FUSE requests.  This file
+descriptor could be used by attackers to access files outside the shared
+directory.  This limitation will be addressed in a future release of virtiofsd.
+
+Deployment Best Practices
+=========================
+If the shared directory is also accessible from a host mount namespace, it is
+recommended to keep a parent directory with rwx------ permissions so that other
+users on the host are unable to access any setuid executables or device nodes
+in the shared directory.  The `nosuid` and `nodev` mount options can also be
+used to prevent this issue.
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 24/25] virtiofsd: add --syslog command-line option
  2019-10-24 11:26 [PATCH 00/25] virtiofs daemon (security) Dr. David Alan Gilbert (git)
                   ` (22 preceding siblings ...)
  2019-10-24 11:27 ` [PATCH 23/25] virtiofsd: add security guide document Dr. David Alan Gilbert (git)
@ 2019-10-24 11:27 ` Dr. David Alan Gilbert (git)
  2019-10-24 11:27 ` [PATCH 25/25] virtiofsd: print log only when priority is high enough Dr. David Alan Gilbert (git)
  24 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-10-24 11:27 UTC (permalink / raw)
  To: qemu-devel, renzhen, eguan, ganesh.mahalingam, m.mizuma,
	mszeredi, misono.tomohiro, tao.peng, piaojun, stefanha, vgoyal,
	mst, berrange

From: Stefan Hajnoczi <stefanha@redhat.com>

Sometimes collecting output from stderr is inconvenient or does not fit
within the overall logging architecture.  Add syslog(3) support for
cases where stderr cannot be used.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
dgilbert: Reworked as a logging function
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 contrib/virtiofsd/fuse_lowlevel.h  |  1 +
 contrib/virtiofsd/helper.c         |  2 ++
 contrib/virtiofsd/passthrough_ll.c | 34 +++++++++++++++++++++++++++---
 contrib/virtiofsd/seccomp.c        | 32 ++++++++++++++++++++--------
 contrib/virtiofsd/seccomp.h        |  4 +++-
 5 files changed, 60 insertions(+), 13 deletions(-)

diff --git a/contrib/virtiofsd/fuse_lowlevel.h b/contrib/virtiofsd/fuse_lowlevel.h
index 13fd9791d5..112596caaf 100644
--- a/contrib/virtiofsd/fuse_lowlevel.h
+++ b/contrib/virtiofsd/fuse_lowlevel.h
@@ -1798,6 +1798,7 @@ struct fuse_cmdline_opts {
 	int show_help;
 	int print_capabilities;
 	int clone_fd;
+	int syslog;
 	unsigned int max_idle_threads;
 };
 
diff --git a/contrib/virtiofsd/helper.c b/contrib/virtiofsd/helper.c
index 08e00c0d13..b1e45aee05 100644
--- a/contrib/virtiofsd/helper.c
+++ b/contrib/virtiofsd/helper.c
@@ -53,6 +53,7 @@ static const struct fuse_opt fuse_helper_opts[] = {
 #endif
 	FUSE_HELPER_OPT("clone_fd",	clone_fd),
 	FUSE_HELPER_OPT("max_idle_threads=%u", max_idle_threads),
+	FUSE_HELPER_OPT("--syslog",	syslog),
 	FUSE_OPT_END
 };
 
@@ -135,6 +136,7 @@ void fuse_cmdline_help(void)
 	       "    -V   --version             print version\n"
 	       "    --print-capabilities       print vhost-user.json\n"
 	       "    -d   -o debug              enable debug output (implies -f)\n"
+	       "    --syslog                   log to syslog (default stderr)\n"
 	       "    -f                         foreground operation\n"
 	       "    --daemonize                run in background\n"
 	       "    -s                         disable multi-threaded operation\n"
diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
index 25f7ad854a..b413687720 100644
--- a/contrib/virtiofsd/passthrough_ll.c
+++ b/contrib/virtiofsd/passthrough_ll.c
@@ -42,6 +42,7 @@
 #include <stddef.h>
 #include <stdbool.h>
 #include <string.h>
+#include <syslog.h>
 #include <limits.h>
 #include <dirent.h>
 #include <assert.h>
@@ -151,6 +152,7 @@ static const struct fuse_opt lo_opts[] = {
 	  offsetof(struct lo_data, norace), 1 },
 	FUSE_OPT_END
 };
+static bool use_syslog = false;
 
 static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n);
 
@@ -2103,13 +2105,13 @@ static void setup_mount_namespace(const char *source)
  * Lock down this process to prevent access to other processes or files outside
  * source directory.  This reduces the impact of arbitrary code execution bugs.
  */
-static void setup_sandbox(struct lo_data *lo)
+static void setup_sandbox(struct lo_data *lo, bool enable_syslog)
 {
 	setup_pid_namespace();
 	setup_proc_self_fd(lo);
 	setup_net_namespace();
 	setup_mount_namespace(lo->source);
-	setup_seccomp();
+	setup_seccomp(enable_syslog);
 }
 
 /* Raise the maximum number of open file descriptors to the system limit */
@@ -2142,6 +2144,27 @@ static void setup_nofile_rlimit(void)
 	g_free(nr_open);
 }
 
+static void log_func(enum fuse_log_level level,
+		     const char *fmt, va_list ap)
+{
+	if (use_syslog) {
+		int priority = LOG_ERR;
+		switch (level) {
+			case FUSE_LOG_EMERG:   priority = LOG_EMERG;   break;
+			case FUSE_LOG_ALERT:   priority = LOG_ALERT;   break;
+			case FUSE_LOG_CRIT:    priority = LOG_CRIT;    break;
+			case FUSE_LOG_ERR:     priority = LOG_ERR;     break;
+			case FUSE_LOG_WARNING: priority = LOG_WARNING; break;
+			case FUSE_LOG_NOTICE:  priority = LOG_NOTICE;  break;
+			case FUSE_LOG_INFO:    priority = LOG_INFO;    break;
+			case FUSE_LOG_DEBUG:   priority = LOG_DEBUG;   break;
+		}
+		vsyslog(priority, fmt, ap);
+	} else {
+		vfprintf(stderr, fmt, ap);
+	}
+}
+
 int main(int argc, char *argv[])
 {
 	struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
@@ -2179,6 +2202,11 @@ int main(int argc, char *argv[])
 
 	if (fuse_parse_cmdline(&args, &opts) != 0)
 		return 1;
+	fuse_set_log_func(log_func);
+	use_syslog = opts.syslog;
+	if (use_syslog) {
+	    openlog("virtiofsd", LOG_PID, LOG_DAEMON);
+	}
 	if (opts.show_help) {
 		printf("usage: %s [options]\n\n", argv[0]);
 		fuse_cmdline_help();
@@ -2260,7 +2288,7 @@ int main(int argc, char *argv[])
 
 	fuse_daemonize(opts.foreground);
 
-	setup_sandbox(&lo);
+	setup_sandbox(&lo, opts.syslog);
 
 	/* Block until ctrl+c or fusermount -u */
 	ret = virtio_loop(se);
diff --git a/contrib/virtiofsd/seccomp.c b/contrib/virtiofsd/seccomp.c
index df1390d6be..77bc405bdc 100644
--- a/contrib/virtiofsd/seccomp.c
+++ b/contrib/virtiofsd/seccomp.c
@@ -88,11 +88,28 @@ static const int syscall_whitelist[] = {
     SCMP_SYS(writev),
 };
 
-void setup_seccomp(void)
+/* Syscalls used when --syslog is enabled */
+static const int syscall_whitelist_syslog[] = {
+    SCMP_SYS(sendto),
+};
+
+static void add_whitelist(scmp_filter_ctx ctx, const int syscalls[], size_t len)
 {
-    scmp_filter_ctx ctx;
     size_t i;
 
+    for (i = 0; i < len; i++) {
+        if (seccomp_rule_add(ctx, SCMP_ACT_ALLOW, syscalls[i], 0) != 0) {
+            fuse_log(FUSE_LOG_ERR, "seccomp_rule_add syscall %d failed\n",
+                     syscalls[i]);
+            exit(1);
+        }
+    }
+}
+
+void setup_seccomp(bool enable_syslog)
+{
+    scmp_filter_ctx ctx;
+
 #ifdef SCMP_ACT_KILL_PROCESS
     ctx = seccomp_init(SCMP_ACT_KILL_PROCESS);
     /* Handle a newer libseccomp but an older kernel */
@@ -107,13 +124,10 @@ void setup_seccomp(void)
         exit(1);
     }
 
-    for (i = 0; i < G_N_ELEMENTS(syscall_whitelist); i++) {
-        if (seccomp_rule_add(ctx, SCMP_ACT_ALLOW,
-                             syscall_whitelist[i], 0) != 0) {
-            fuse_log(FUSE_LOG_ERR, "seccomp_rule_add syscall %d",
-                     syscall_whitelist[i]);
-            exit(1);
-        }
+    add_whitelist(ctx, syscall_whitelist, G_N_ELEMENTS(syscall_whitelist));
+    if (enable_syslog) {
+        add_whitelist(ctx, syscall_whitelist_syslog,
+                      G_N_ELEMENTS(syscall_whitelist_syslog));
     }
 
     /* libvhost-user calls this for post-copy migration, we don't need it */
diff --git a/contrib/virtiofsd/seccomp.h b/contrib/virtiofsd/seccomp.h
index 86bce72652..d47c8eade6 100644
--- a/contrib/virtiofsd/seccomp.h
+++ b/contrib/virtiofsd/seccomp.h
@@ -9,6 +9,8 @@
 #ifndef VIRTIOFSD_SECCOMP_H
 #define VIRTIOFSD_SECCOMP_H
 
-void setup_seccomp(void);
+#include <stdbool.h>
+
+void setup_seccomp(bool enable_syslog);
 
 #endif /* VIRTIOFSD_SECCOMP_H */
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 25/25] virtiofsd: print log only when priority is high enough
  2019-10-24 11:26 [PATCH 00/25] virtiofs daemon (security) Dr. David Alan Gilbert (git)
                   ` (23 preceding siblings ...)
  2019-10-24 11:27 ` [PATCH 24/25] virtiofsd: add --syslog command-line option Dr. David Alan Gilbert (git)
@ 2019-10-24 11:27 ` Dr. David Alan Gilbert (git)
  24 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-10-24 11:27 UTC (permalink / raw)
  To: qemu-devel, renzhen, eguan, ganesh.mahalingam, m.mizuma,
	mszeredi, misono.tomohiro, tao.peng, piaojun, stefanha, vgoyal,
	mst, berrange

From: Eryu Guan <eguan@linux.alibaba.com>

Introduce "-o log_level=" command line option to specify current log
level (priority), valid values are "debug info warn err", e.g.

    ./virtiofsd -o log_level=debug ...

So only log priority higher than "debug" will be printed to
stderr/syslog. And the default level is info.

The "-o debug"/"-d" options are kept, and imply debug log level.

Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
dgilbert: Reworked for libfuse's log_func
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 contrib/virtiofsd/fuse_log.c       |   4 ++
 contrib/virtiofsd/fuse_lowlevel.c  |  79 ++++++++++-------------
 contrib/virtiofsd/fuse_lowlevel.h  |   1 +
 contrib/virtiofsd/helper.c         |   9 ++-
 contrib/virtiofsd/passthrough_ll.c | 100 +++++++++++++----------------
 5 files changed, 91 insertions(+), 102 deletions(-)

diff --git a/contrib/virtiofsd/fuse_log.c b/contrib/virtiofsd/fuse_log.c
index 0d268ab014..b11f6d0c08 100644
--- a/contrib/virtiofsd/fuse_log.c
+++ b/contrib/virtiofsd/fuse_log.c
@@ -8,6 +8,10 @@
   See the file COPYING.LIB
 */
 
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdarg.h>
+#include <syslog.h>
 #include "fuse_log.h"
 
 #include <stdarg.h>
diff --git a/contrib/virtiofsd/fuse_lowlevel.c b/contrib/virtiofsd/fuse_lowlevel.c
index 8f9a59a34c..20f2190194 100644
--- a/contrib/virtiofsd/fuse_lowlevel.c
+++ b/contrib/virtiofsd/fuse_lowlevel.c
@@ -155,20 +155,17 @@ static int fuse_send_msg(struct fuse_session *se, struct fuse_chan *ch,
 	struct fuse_out_header *out = iov[0].iov_base;
 
 	out->len = iov_length(iov, count);
-	if (se->debug) {
-		if (out->unique == 0) {
-			fuse_log(FUSE_LOG_DEBUG, "NOTIFY: code=%d length=%u\n",
-				out->error, out->len);
-		} else if (out->error) {
-			fuse_log(FUSE_LOG_DEBUG,
-				"   unique: %llu, error: %i (%s), outsize: %i\n",
-				(unsigned long long) out->unique, out->error,
-				strerror(-out->error), out->len);
-		} else {
-			fuse_log(FUSE_LOG_DEBUG,
-				"   unique: %llu, success, outsize: %i\n",
-				(unsigned long long) out->unique, out->len);
-		}
+	if (out->unique == 0) {
+		fuse_log(FUSE_LOG_DEBUG, "NOTIFY: code=%d length=%u\n",
+			   out->error, out->len);
+	} else if (out->error) {
+		fuse_log(FUSE_LOG_DEBUG, "   unique: %llu, error: %i (%s), outsize: %i\n",
+			   (unsigned long long) out->unique,
+			   out->error, strerror(-out->error),
+			   out->len);
+	} else {
+		fuse_log(FUSE_LOG_DEBUG, "   unique: %llu, success, outsize: %i\n",
+			   (unsigned long long) out->unique, out->len);
 	}
 
 	if (fuse_lowlevel_is_virtio(se)) {
@@ -1663,9 +1660,8 @@ static void do_interrupt(fuse_req_t req, fuse_ino_t nodeid,
 		return;
 	}
 
-	if (se->debug)
-		fuse_log(FUSE_LOG_DEBUG, "INTERRUPT: %llu\n",
-			(unsigned long long) arg->unique);
+	fuse_log(FUSE_LOG_DEBUG, "INTERRUPT: %llu\n",
+		 (unsigned long long) arg->unique);
 
 	req->u.i.unique = arg->unique;
 
@@ -1878,13 +1874,11 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
 		}
 	}
 
-	if (se->debug) {
-		fuse_log(FUSE_LOG_DEBUG, "INIT: %u.%u\n", arg->major, arg->minor);
-		if (arg->major == 7 && arg->minor >= 6) {
-			fuse_log(FUSE_LOG_DEBUG, "flags=0x%08x\n", arg->flags);
-			fuse_log(FUSE_LOG_DEBUG, "max_readahead=0x%08x\n",
-				arg->max_readahead);
-		}
+	fuse_log(FUSE_LOG_DEBUG, "INIT: %u.%u\n", arg->major, arg->minor);
+	if (arg->major == 7 && arg->minor >= 6) {
+		fuse_log(FUSE_LOG_DEBUG, "flags=0x%08x\n", arg->flags);
+		fuse_log(FUSE_LOG_DEBUG, "max_readahead=0x%08x\n",
+			   arg->max_readahead);
 	}
 	se->conn.proto_major = arg->major;
 	se->conn.proto_minor = arg->minor;
@@ -2070,19 +2064,17 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
 	if (se->conn.proto_minor >= 23)
 		outarg.time_gran = se->conn.time_gran;
 
-	if (se->debug) {
-		fuse_log(FUSE_LOG_DEBUG, "   INIT: %u.%u\n", outarg.major, outarg.minor);
-		fuse_log(FUSE_LOG_DEBUG, "   flags=0x%08x\n", outarg.flags);
-		fuse_log(FUSE_LOG_DEBUG, "   max_readahead=0x%08x\n",
-			outarg.max_readahead);
-		fuse_log(FUSE_LOG_DEBUG, "   max_write=0x%08x\n", outarg.max_write);
-		fuse_log(FUSE_LOG_DEBUG, "   max_background=%i\n",
-			outarg.max_background);
-		fuse_log(FUSE_LOG_DEBUG, "   congestion_threshold=%i\n",
-			outarg.congestion_threshold);
-		fuse_log(FUSE_LOG_DEBUG, "   time_gran=%u\n",
-			outarg.time_gran);
-	}
+	fuse_log(FUSE_LOG_DEBUG, "   INIT: %u.%u\n", outarg.major, outarg.minor);
+	fuse_log(FUSE_LOG_DEBUG, "   flags=0x%08x\n", outarg.flags);
+	fuse_log(FUSE_LOG_DEBUG, "   max_readahead=0x%08x\n",
+		   outarg.max_readahead);
+	fuse_log(FUSE_LOG_DEBUG, "   max_write=0x%08x\n", outarg.max_write);
+	fuse_log(FUSE_LOG_DEBUG, "   max_background=%i\n",
+		   outarg.max_background);
+	fuse_log(FUSE_LOG_DEBUG, "   congestion_threshold=%i\n",
+		   outarg.congestion_threshold);
+	fuse_log(FUSE_LOG_DEBUG, "   time_gran=%u\n",
+		   outarg.time_gran);
 	if (arg->minor < 5)
 		outargsize = FUSE_COMPAT_INIT_OUT_SIZE;
 	else if (arg->minor < 23)
@@ -2370,13 +2362,12 @@ void fuse_session_process_buf_int(struct fuse_session *se,
 	in = fuse_mbuf_iter_advance(&iter, sizeof(*in));
 	assert(in); /* caller guarantees the input buffer is large enough */
 
-	if (se->debug) {
-		fuse_log(FUSE_LOG_DEBUG,
-			"unique: %llu, opcode: %s (%i), nodeid: %llu, insize: %zu, pid: %u\n",
-			(unsigned long long) in->unique,
-			opname((enum fuse_opcode) in->opcode), in->opcode,
-			(unsigned long long) in->nodeid, buf->size, in->pid);
-	}
+	fuse_log(FUSE_LOG_DEBUG,
+		 "unique: %llu, opcode: %s (%i), nodeid: %llu, insize: %zu, pid: %u\n",
+		 (unsigned long long) in->unique,
+		 opname((enum fuse_opcode) in->opcode), in->opcode,
+		 (unsigned long long) in->nodeid, buf->size,
+		 in->pid);
 
 	req = fuse_ll_alloc_req(se);
 	if (req == NULL) {
diff --git a/contrib/virtiofsd/fuse_lowlevel.h b/contrib/virtiofsd/fuse_lowlevel.h
index 112596caaf..49cf5b22ff 100644
--- a/contrib/virtiofsd/fuse_lowlevel.h
+++ b/contrib/virtiofsd/fuse_lowlevel.h
@@ -1799,6 +1799,7 @@ struct fuse_cmdline_opts {
 	int print_capabilities;
 	int clone_fd;
 	int syslog;
+	int log_level;
 	unsigned int max_idle_threads;
 };
 
diff --git a/contrib/virtiofsd/helper.c b/contrib/virtiofsd/helper.c
index b1e45aee05..d9eb9ff62c 100644
--- a/contrib/virtiofsd/helper.c
+++ b/contrib/virtiofsd/helper.c
@@ -29,7 +29,6 @@
 #define FUSE_HELPER_OPT_VALUE(t, p, v) \
 	{ t, offsetof(struct fuse_cmdline_opts, p), v }
 
-
 static const struct fuse_opt fuse_helper_opts[] = {
 	FUSE_HELPER_OPT("-h",		show_help),
 	FUSE_HELPER_OPT("--help",	show_help),
@@ -54,6 +53,10 @@ static const struct fuse_opt fuse_helper_opts[] = {
 	FUSE_HELPER_OPT("clone_fd",	clone_fd),
 	FUSE_HELPER_OPT("max_idle_threads=%u", max_idle_threads),
 	FUSE_HELPER_OPT("--syslog",	syslog),
+	FUSE_HELPER_OPT_VALUE("log_level=debug", log_level, FUSE_LOG_DEBUG),
+	FUSE_HELPER_OPT_VALUE("log_level=info", log_level, FUSE_LOG_INFO),
+	FUSE_HELPER_OPT_VALUE("log_level=warn", log_level, FUSE_LOG_WARNING),
+	FUSE_HELPER_OPT_VALUE("log_level=err", log_level, FUSE_LOG_ERR),
 	FUSE_OPT_END
 };
 
@@ -143,7 +146,9 @@ void fuse_cmdline_help(void)
 	       "    -o clone_fd                use separate fuse device fd for each thread\n"
 	       "                               (may improve performance)\n"
 	       "    -o max_idle_threads        the maximum number of idle worker threads\n"
-	       "                               allowed (default: 10)\n");
+	       "                               allowed (default: 10)\n"
+	       "    -o log_level=<level>       log level, default to \"info\"\n"
+	       "                               level could be one of \"debug, info, warn, err\"\n");
 }
 
 static int fuse_helper_opt_proc(void *data, const char *arg, int key,
diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
index b413687720..8ad41b1072 100644
--- a/contrib/virtiofsd/passthrough_ll.c
+++ b/contrib/virtiofsd/passthrough_ll.c
@@ -36,6 +36,7 @@
 
 #include "fuse_virtio.h"
 #include "fuse_lowlevel.h"
+#include "fuse_log.h"
 #include <unistd.h>
 #include <stdlib.h>
 #include <stdio.h>
@@ -153,6 +154,7 @@ static const struct fuse_opt lo_opts[] = {
 	FUSE_OPT_END
 };
 static bool use_syslog = false;
+static int current_log_level;
 
 static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n);
 
@@ -446,11 +448,6 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino)
 	return inode ? inode->fd : -1;
 }
 
-static bool lo_debug(fuse_req_t req)
-{
-	return lo_data(req)->debug != 0;
-}
-
 static void lo_init(void *userdata,
 		    struct fuse_conn_info *conn)
 {
@@ -461,13 +458,11 @@ static void lo_init(void *userdata,
 
 	if (lo->writeback &&
 	    conn->capable & FUSE_CAP_WRITEBACK_CACHE) {
-		if (lo->debug)
-			fuse_log(FUSE_LOG_DEBUG, "lo_init: activating writeback\n");
+		fuse_log(FUSE_LOG_DEBUG, "lo_init: activating writeback\n");
 		conn->want |= FUSE_CAP_WRITEBACK_CACHE;
 	}
 	if (lo->flock && conn->capable & FUSE_CAP_FLOCK_LOCKS) {
-		if (lo->debug)
-			fuse_log(FUSE_LOG_DEBUG, "lo_init: activating flock locks\n");
+		fuse_log(FUSE_LOG_DEBUG, "lo_init: activating flock locks\n");
 		conn->want |= FUSE_CAP_FLOCK_LOCKS;
 	}
 }
@@ -789,9 +784,9 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
 	}
 	e->ino = inode->fuse_ino;
 
-	if (lo_debug(req))
-		fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n",
-			(unsigned long long) parent, name, (unsigned long long) e->ino);
+	fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n",
+		 (unsigned long long) parent, name,
+		 (unsigned long long) e->ino);
 
 	return 0;
 
@@ -807,9 +802,7 @@ static void lo_lookup(fuse_req_t req, fuse_ino_t parent, const char *name)
 	struct fuse_entry_param e;
 	int err;
 
-	if (lo_debug(req))
-		fuse_log(FUSE_LOG_DEBUG, "lo_lookup(parent=%" PRIu64 ", name=%s)\n",
-			parent, name);
+	fuse_log(FUSE_LOG_DEBUG, "lo_lookup(parent=%" PRIu64 ", name=%s)\n", parent, name);
 
 	/* Don't use is_safe_path_component(), allow "." and ".." for NFS export
 	 * support.
@@ -911,9 +904,9 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
 	if (saverr)
 		goto out;
 
-	if (lo_debug(req))
-		fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n",
-			(unsigned long long) parent, name, (unsigned long long) e.ino);
+	fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n",
+		   (unsigned long long) parent, name,
+		   (unsigned long long) e.ino);
 
 	fuse_reply_entry(req, &e);
 	return;
@@ -1010,10 +1003,9 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
 	pthread_mutex_unlock(&lo->mutex);
 	e.ino = inode->fuse_ino;
 
-	if (lo_debug(req))
-		fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n",
-			(unsigned long long) parent, name,
-			(unsigned long long) e.ino);
+	fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n",
+		   (unsigned long long) parent, name,
+		   (unsigned long long) e.ino);
 
 	fuse_reply_entry(req, &e);
 	return;
@@ -1107,12 +1099,10 @@ static void lo_forget_one(fuse_req_t req, fuse_ino_t ino, uint64_t nlookup)
 	if (!inode)
 		return;
 
-	if (lo_debug(req)) {
-		fuse_log(FUSE_LOG_DEBUG, "  forget %lli %lli -%lli\n",
-			(unsigned long long) ino,
-			(unsigned long long) inode->refcount,
-			(unsigned long long) nlookup);
-	}
+	fuse_log(FUSE_LOG_DEBUG, "  forget %lli %lli -%lli\n",
+		(unsigned long long) ino,
+		(unsigned long long) inode->refcount,
+		(unsigned long long) nlookup);
 
 	unref_inode(lo, inode, nlookup);
 }
@@ -1364,9 +1354,7 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
 	int err;
 	struct lo_cred old = {};
 
-	if (lo_debug(req))
-		fuse_log(FUSE_LOG_DEBUG, "lo_create(parent=%" PRIu64 ", name=%s)\n",
-			parent, name);
+	fuse_log(FUSE_LOG_DEBUG, "lo_create(parent=%" PRIu64 ", name=%s)\n", parent, name);
 
 	if (!is_safe_path_component(name)) {
 		fuse_reply_err(req, EINVAL);
@@ -1439,9 +1427,7 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 	char buf[64];
 	struct lo_data *lo = lo_data(req);
 
-	if (lo_debug(req))
-		fuse_log(FUSE_LOG_DEBUG, "lo_open(ino=%" PRIu64 ", flags=%d)\n",
-			ino, fi->flags);
+	fuse_log(FUSE_LOG_DEBUG, "lo_open(ino=%" PRIu64 ", flags=%d)\n", ino, fi->flags);
 
 	/* With writeback cache, kernel may send read requests even
 	   when userspace opened write-only */
@@ -1543,9 +1529,8 @@ static void lo_read(fuse_req_t req, fuse_ino_t ino, size_t size,
 {
 	struct fuse_bufvec buf = FUSE_BUFVEC_INIT(size);
 
-	if (lo_debug(req))
-		fuse_log(FUSE_LOG_DEBUG, "lo_read(ino=%" PRIu64 ", size=%zd, "
-			"off=%lu)\n", ino, size, (unsigned long) offset);
+	fuse_log(FUSE_LOG_DEBUG, "lo_read(ino=%" PRIu64 ", size=%zd, "
+		   "off=%lu)\n", ino, size, (unsigned long) offset);
 
 	buf.buf[0].flags = FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK;
 	buf.buf[0].fd = lo_fi_fd(req, fi);
@@ -1567,9 +1552,8 @@ static void lo_write_buf(fuse_req_t req, fuse_ino_t ino,
 	out_buf.buf[0].fd = lo_fi_fd(req, fi);
 	out_buf.buf[0].pos = off;
 
-	if (lo_debug(req))
-		fuse_log(FUSE_LOG_DEBUG, "lo_write(ino=%" PRIu64 ", size=%zd, off=%lu)\n",
-			ino, out_buf.buf[0].size, (unsigned long) off);
+	fuse_log(FUSE_LOG_DEBUG, "lo_write_buf(ino=%" PRIu64 ", size=%zd, off=%lu)\n",
+		   ino, out_buf.buf[0].size, (unsigned long) off);
 
 	/*
 	 * If kill_priv is set, drop CAP_FSETID which should lead to kernel
@@ -1662,10 +1646,8 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
 	if (!lo_data(req)->xattr)
 		goto out;
 
-	if (lo_debug(req)) {
-		fuse_log(FUSE_LOG_DEBUG, "lo_getxattr(ino=%" PRIu64 ", name=%s size=%zd)\n",
-			ino, name, size);
-	}
+	fuse_log(FUSE_LOG_DEBUG, "lo_getxattr(ino=%" PRIu64 ", name=%s size=%zd)\n",
+		   ino, name, size);
 
 	if (inode->is_symlink) {
 		/* Sorry, no race free way to getxattr on symlink. */
@@ -1734,10 +1716,7 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
 	if (!lo_data(req)->xattr)
 		goto out;
 
-	if (lo_debug(req)) {
-		fuse_log(FUSE_LOG_DEBUG, "lo_listxattr(ino=%" PRIu64 ", size=%zd)\n",
-			ino, size);
-	}
+	fuse_log(FUSE_LOG_DEBUG, "lo_listxattr(ino=%" PRIu64 ", size=%zd)\n", ino, size);
 
 	if (inode->is_symlink) {
 		/* Sorry, no race free way to listxattr on symlink. */
@@ -1806,10 +1785,9 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *name,
 	if (!lo_data(req)->xattr)
 		goto out;
 
-	if (lo_debug(req)) {
-		fuse_log(FUSE_LOG_DEBUG, "lo_setxattr(ino=%" PRIu64 ", name=%s value=%s size=%zd)\n",
-			ino, name, value, size);
-	}
+	fuse_log(FUSE_LOG_DEBUG, "lo_setxattr(ino=%" PRIu64,
+		 ", name=%s value=%s size=%zd)\n",
+		 ino, name, value, size);
 
 	if (inode->is_symlink) {
 		/* Sorry, no race free way to removexattr on symlink. */
@@ -1853,10 +1831,8 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *name)
 	if (!lo_data(req)->xattr)
 		goto out;
 
-	if (lo_debug(req)) {
-		fuse_log(FUSE_LOG_DEBUG, "lo_removexattr(ino=%" PRIu64 ", name=%s)\n",
-			ino, name);
-	}
+	fuse_log(FUSE_LOG_DEBUG, "lo_removexattr(ino=%" PRIu64 ", name=%s)\n",
+		 ino, name);
 
 	if (inode->is_symlink) {
 		/* Sorry, no race free way to setxattr on symlink. */
@@ -2147,6 +2123,9 @@ static void setup_nofile_rlimit(void)
 static void log_func(enum fuse_log_level level,
 		     const char *fmt, va_list ap)
 {
+	if (current_log_level < level)
+		return;
+
 	if (use_syslog) {
 		int priority = LOG_ERR;
 		switch (level) {
@@ -2227,8 +2206,17 @@ int main(int argc, char *argv[])
 	if (fuse_opt_parse(&args, &lo, lo_opts, NULL)== -1)
 		return 1;
 
+	/*
+	 * log_level is 0 if not configured via cmd options (0 is LOG_EMERG,
+	 * and we don't use this log level).
+	 */
+	if (opts.log_level != 0)
+		current_log_level = opts.log_level;
 	lo.debug = opts.debug;
+	if (lo.debug)
+		current_log_level = FUSE_LOG_DEBUG;
 	lo.root.refcount = 2;
+
 	if (lo.source) {
 		struct stat stat;
 		int res;
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2019-10-24 13:17 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-24 11:26 [PATCH 00/25] virtiofs daemon (security) Dr. David Alan Gilbert (git)
2019-10-24 11:26 ` [PATCH 01/25] virtiofsd: passthrough_ll: create new files in caller's context Dr. David Alan Gilbert (git)
2019-10-24 11:26 ` [PATCH 02/25] virtiofsd: passthrough_ll: add lo_map for ino/fh indirection Dr. David Alan Gilbert (git)
2019-10-24 11:26 ` [PATCH 03/25] virtiofsd: passthrough_ll: add ino_map to hide lo_inode pointers Dr. David Alan Gilbert (git)
2019-10-24 11:26 ` [PATCH 04/25] virtiofsd: passthrough_ll: add dirp_map to hide lo_dirp pointers Dr. David Alan Gilbert (git)
2019-10-24 11:26 ` [PATCH 05/25] virtiofsd: passthrough_ll: add fd_map to hide file descriptors Dr. David Alan Gilbert (git)
2019-10-24 11:26 ` [PATCH 06/25] virtiofsd: passthrough_ll: add fallback for racy ops Dr. David Alan Gilbert (git)
2019-10-24 11:27 ` [PATCH 07/25] virtiofsd: validate path components Dr. David Alan Gilbert (git)
2019-10-24 11:27 ` [PATCH 08/25] virtiofsd: Plumb fuse_bufvec through to do_write_buf Dr. David Alan Gilbert (git)
2019-10-24 11:27 ` [PATCH 09/25] virtiofsd: Pass write iov's all the way through Dr. David Alan Gilbert (git)
2019-10-24 11:27 ` [PATCH 10/25] virtiofsd: add fuse_mbuf_iter API Dr. David Alan Gilbert (git)
2019-10-24 11:27 ` [PATCH 11/25] virtiofsd: validate input buffer sizes in do_write_buf() Dr. David Alan Gilbert (git)
2019-10-24 11:27 ` [PATCH 12/25] virtiofsd: check input buffer size in fuse_lowlevel.c ops Dr. David Alan Gilbert (git)
2019-10-24 11:27 ` [PATCH 13/25] virtiofsd: prevent ".." escape in lo_do_lookup() Dr. David Alan Gilbert (git)
2019-10-24 11:27 ` [PATCH 14/25] virtiofsd: prevent ".." escape in lo_do_readdir() Dr. David Alan Gilbert (git)
2019-10-24 11:27 ` [PATCH 15/25] virtiofsd: use /proc/self/fd/ O_PATH file descriptor Dr. David Alan Gilbert (git)
2019-10-24 11:27 ` [PATCH 16/25] virtiofsd: sandbox mount namespace Dr. David Alan Gilbert (git)
2019-10-24 11:27 ` [PATCH 17/25] virtiofsd: move to an empty network namespace Dr. David Alan Gilbert (git)
2019-10-24 11:27 ` [PATCH 18/25] virtiofsd: move to a new pid namespace Dr. David Alan Gilbert (git)
2019-10-24 11:27 ` [PATCH 19/25] virtiofsd: add seccomp whitelist Dr. David Alan Gilbert (git)
2019-10-24 11:27 ` [PATCH 20/25] virtiofsd: Parse flag FUSE_WRITE_KILL_PRIV Dr. David Alan Gilbert (git)
2019-10-24 11:27 ` [PATCH 21/25] virtiofsd: Drop CAP_FSETID if client asked for it Dr. David Alan Gilbert (git)
2019-10-24 11:27 ` [PATCH 22/25] virtiofsd: set maximum RLIMIT_NOFILE limit Dr. David Alan Gilbert (git)
2019-10-24 11:27 ` [PATCH 23/25] virtiofsd: add security guide document Dr. David Alan Gilbert (git)
2019-10-24 11:27 ` [PATCH 24/25] virtiofsd: add --syslog command-line option Dr. David Alan Gilbert (git)
2019-10-24 11:27 ` [PATCH 25/25] virtiofsd: print log only when priority is high enough Dr. David Alan Gilbert (git)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.