* [PATCH v2 0/6] fuse: Implement FUSE_HANDLE_KILLPRIV_V2 and enable SB_NOSEC
@ 2020-09-16 16:17 Vivek Goyal
  2020-09-16 16:17 ` [PATCH v2 1/6] fuse: Introduce the notion of FUSE_HANDLE_KILLPRIV_V2 Vivek Goyal
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: Vivek Goyal @ 2020-09-16 16:17 UTC (permalink / raw)
  To: linux-fsdevel, miklos; +Cc: vgoyal, virtio-fs

Hi All,

Please find attached V2 of the patches to enable SB_NOSEC for fuse. I
posted V1 here.

https://lore.kernel.org/linux-fsdevel/20200724183812.19573-1-vgoyal@redhat.com/

I have generated these patches on top of.

https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git/log/?h=for-next

Previously I was not keen on implementing FUSE_HANDLE_KILLPRIV_V2 and
implemented another idea to enable SB_NOSEC conditional on server
declaring that filesystem is not shared. But that did not go too
far when it came to requirements for virtiofs.

https://lore.kernel.org/linux-fsdevel/20200901204045.1250822-1-vgoyal@redhat.com/

So I went back to having another look at implementing FUSE_HANDLE_KILLPRIV_V2
and I think it fits nicely and should work well with a wide variety of
use cases.

I have taken care of the feedback from the last round. Random write
performance has jumped from 50MB/s to 250MB/s, so I am really
looking forward to these changes so that fuse/virtiofs performance
can be improved.

Thanks
Vivek 

Vivek Goyal (6):
  fuse: Introduce the notion of FUSE_HANDLE_KILLPRIV_V2
  fuse: Set FUSE_WRITE_KILL_PRIV in cached write path
  fuse: setattr should set FATTR_KILL_PRIV upon size change
  fuse: Kill suid/sgid using ATTR_MODE if it is not truncate
  fuse: Add a flag FUSE_OPEN_KILL_PRIV for open() request
  virtiofs: Support SB_NOSEC flag to improve direct write performance

 fs/fuse/dir.c             | 19 ++++++++++++++++++-
 fs/fuse/file.c            |  7 +++++++
 fs/fuse/fuse_i.h          |  6 ++++++
 fs/fuse/inode.c           | 17 ++++++++++++++++-
 include/uapi/linux/fuse.h | 18 +++++++++++++++++-
 5 files changed, 64 insertions(+), 3 deletions(-)

-- 
2.25.4



* [PATCH v2 1/6] fuse: Introduce the notion of FUSE_HANDLE_KILLPRIV_V2
  2020-09-16 16:17 [PATCH v2 0/6] fuse: Implement FUSE_HANDLE_KILLPRIV_V2 and enable SB_NOSEC Vivek Goyal
@ 2020-09-16 16:17 ` Vivek Goyal
  2020-09-16 16:17 ` [PATCH v2 2/6] fuse: Set FUSE_WRITE_KILL_PRIV in cached write path Vivek Goyal
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Vivek Goyal @ 2020-09-16 16:17 UTC (permalink / raw)
  To: linux-fsdevel, miklos; +Cc: vgoyal, virtio-fs

We already have the FUSE_HANDLE_KILLPRIV flag, which says that the file server
will remove suid/sgid/caps on truncate/chown/write. But that is a little
different from what the Linux VFS implements.

To be consistent with Linux VFS behavior, what we want is:

- caps are always cleared on chown/write/truncate
- suid is always cleared on chown, while for truncate/write it is cleared
  only if caller does not have CAP_FSETID.
- sgid is always cleared on chown, while for truncate/write it is cleared
  only if caller does not have CAP_FSETID as well as file has group execute
  permission.

The previous flag did not provide the above semantics, so implement a V2 of
the protocol with the above constraints.

The server does not know whether the caller has CAP_FSETID. So for
write()/truncate(), the client will send a special flag to indicate
whether to kill privileges. These changes are in subsequent patches.

FUSE_HANDLE_KILLPRIV_V2 relies on WRITE being sent to the server to clear
suid/sgid/security.capability. But with ->writeback_cache, WRITEs are
cached in the guest. So it is not recommended to use FUSE_HANDLE_KILLPRIV_V2
and writeback_cache together, though it might be good enough for a lot of
use cases.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/fuse/fuse_i.h          | 6 ++++++
 fs/fuse/inode.c           | 5 ++++-
 include/uapi/linux/fuse.h | 7 +++++++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index dbaae2f6c73e..3dd1578be405 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -631,6 +631,12 @@ struct fuse_conn {
 	/* show legacy mount options */
 	unsigned int legacy_opts_show:1;
 
+	/** fs kills suid/sgid/cap on write/chown/trunc. suid is
+	    killed on write/trunc only if caller did not have CAP_FSETID.
+	    sgid is killed on write/truncate only if caller did not have
+	    CAP_FSETID as well as file has group execute permission. */
+	unsigned handle_killpriv_v2:1;
+
 	/*
 	 * The following bitfields are only for optimization purposes
 	 * and hence races in setting them will not cause malfunction
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index d252237219bf..20740b61f12b 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -993,6 +993,8 @@ static void process_init_reply(struct fuse_conn *fc, struct fuse_args *args,
 			    !fuse_dax_check_alignment(fc, arg->map_alignment)) {
 				ok = false;
 			}
+			if (arg->flags & FUSE_HANDLE_KILLPRIV_V2)
+				fc->handle_killpriv_v2 = 1;
 		} else {
 			ra_pages = fc->max_read / PAGE_SIZE;
 			fc->no_lock = 1;
@@ -1035,7 +1037,8 @@ void fuse_send_init(struct fuse_conn *fc)
 		FUSE_WRITEBACK_CACHE | FUSE_NO_OPEN_SUPPORT |
 		FUSE_PARALLEL_DIROPS | FUSE_HANDLE_KILLPRIV | FUSE_POSIX_ACL |
 		FUSE_ABORT_ERROR | FUSE_MAX_PAGES | FUSE_CACHE_SYMLINKS |
-		FUSE_NO_OPENDIR_SUPPORT | FUSE_EXPLICIT_INVAL_DATA;
+		FUSE_NO_OPENDIR_SUPPORT | FUSE_EXPLICIT_INVAL_DATA |
+		FUSE_HANDLE_KILLPRIV_V2;
 #ifdef CONFIG_FUSE_DAX
 	if (fc->dax)
 		ia->in.flags |= FUSE_MAP_ALIGNMENT;
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 8899e4862309..3ae3f222a0ed 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -172,6 +172,7 @@
  *  - add FUSE_WRITE_KILL_PRIV flag
  *  - add FUSE_SETUPMAPPING and FUSE_REMOVEMAPPING
  *  - add map_alignment to fuse_init_out, add FUSE_MAP_ALIGNMENT flag
+ *  - add FUSE_HANDLE_KILLPRIV_V2
  */
 
 #ifndef _LINUX_FUSE_H
@@ -316,6 +317,11 @@ struct fuse_file_lock {
  * FUSE_MAP_ALIGNMENT: init_out.map_alignment contains log2(byte alignment) for
  *		       foffset and moffset fields in struct
  *		       fuse_setupmapping_out and fuse_removemapping_one.
+ * FUSE_HANDLE_KILLPRIV_V2: fs kills suid/sgid/cap on write/chown/trunc.
+ * 			Upon write/truncate suid/sgid is only killed if caller
+ * 			does not have CAP_FSETID. Additionally upon
+ * 			write/truncate sgid is killed only if file has group
+ * 			execute permission. (Same as Linux VFS behavior).
  */
 #define FUSE_ASYNC_READ		(1 << 0)
 #define FUSE_POSIX_LOCKS	(1 << 1)
@@ -344,6 +350,7 @@ struct fuse_file_lock {
 #define FUSE_NO_OPENDIR_SUPPORT (1 << 24)
 #define FUSE_EXPLICIT_INVAL_DATA (1 << 25)
 #define FUSE_MAP_ALIGNMENT	(1 << 26)
+#define FUSE_HANDLE_KILLPRIV_V2	(1 << 27)
 
 /**
  * CUSE INIT request/reply flags
-- 
2.25.4



* [PATCH v2 2/6] fuse: Set FUSE_WRITE_KILL_PRIV in cached write path
  2020-09-16 16:17 [PATCH v2 0/6] fuse: Implement FUSE_HANDLE_KILLPRIV_V2 and enable SB_NOSEC Vivek Goyal
  2020-09-16 16:17 ` [PATCH v2 1/6] fuse: Introduce the notion of FUSE_HANDLE_KILLPRIV_V2 Vivek Goyal
@ 2020-09-16 16:17 ` Vivek Goyal
  2020-09-16 16:17 ` [PATCH v2 3/6] fuse: setattr should set FATTR_KILL_PRIV upon size change Vivek Goyal
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Vivek Goyal @ 2020-09-16 16:17 UTC (permalink / raw)
  To: linux-fsdevel, miklos; +Cc: vgoyal, virtio-fs

With HANDLE_KILLPRIV_V2, server will need to kill suid/sgid if caller
does not have CAP_FSETID. We already have a flag FUSE_WRITE_KILL_PRIV
in WRITE request and we already set it in direct I/O path.

To make it work in cached write path also, start setting FUSE_WRITE_KILL_PRIV
in this path too.

Set it only if fc->handle_killpriv_v2 is set. Otherwise the client is
responsible for killing suid/sgid.

In the direct I/O case we set FUSE_WRITE_KILL_PRIV unconditionally because
we don't call file_remove_privs() in that path (with cache=none option).
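
As a rough sketch of the cached-write decision described above (a userspace
model, not the kernel code): struct conn_model and cached_write_flags() are
hypothetical names, has_cap_fsetid stands in for capable(CAP_FSETID), and the
flag value is assumed to match include/uapi/linux/fuse.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value assumed to match include/uapi/linux/fuse.h. */
#define FUSE_WRITE_KILL_PRIV	(1u << 2)

/* Hypothetical stand-in for the relevant fuse_conn state. */
struct conn_model {
	bool handle_killpriv_v2;
};

/*
 * Cached write path: ask the server to kill suid/sgid only when V2 was
 * negotiated and the caller lacks CAP_FSETID. Without V2 the client
 * stays responsible for killing suid/sgid itself.
 */
uint32_t cached_write_flags(const struct conn_model *fc, bool has_cap_fsetid)
{
	uint32_t write_flags = 0;

	if (fc->handle_killpriv_v2 && !has_cap_fsetid)
		write_flags |= FUSE_WRITE_KILL_PRIV;
	return write_flags;
}
```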

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/fuse/file.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 172a0b1aa634..e40428f3d0f1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1095,6 +1095,8 @@ static ssize_t fuse_send_write_pages(struct fuse_io_args *ia,
 
 	fuse_write_args_fill(ia, ff, pos, count);
 	ia->write.in.flags = fuse_write_flags(iocb);
+	if (fc->handle_killpriv_v2 && !capable(CAP_FSETID))
+		ia->write.in.write_flags |= FUSE_WRITE_KILL_PRIV;
 
 	err = fuse_simple_request(fc, &ap->args);
 	if (!err && ia->write.out.size > count)
-- 
2.25.4



* [PATCH v2 3/6] fuse: setattr should set FATTR_KILL_PRIV upon size change
  2020-09-16 16:17 [PATCH v2 0/6] fuse: Implement FUSE_HANDLE_KILLPRIV_V2 and enable SB_NOSEC Vivek Goyal
  2020-09-16 16:17 ` [PATCH v2 1/6] fuse: Introduce the notion of FUSE_HANDLE_KILLPRIV_V2 Vivek Goyal
  2020-09-16 16:17 ` [PATCH v2 2/6] fuse: Set FUSE_WRITE_KILL_PRIV in cached write path Vivek Goyal
@ 2020-09-16 16:17 ` Vivek Goyal
  2020-09-16 16:17 ` [PATCH v2 4/6] fuse: Kill suid/sgid using ATTR_MODE if it is not truncate Vivek Goyal
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Vivek Goyal @ 2020-09-16 16:17 UTC (permalink / raw)
  To: linux-fsdevel, miklos; +Cc: vgoyal, virtio-fs

If fc->handle_killpriv_v2 is enabled, we expect the file server to clear
suid/sgid/security.capability upon chown/truncate/write as appropriate.

Upon truncate (ATTR_SIZE), suid/sgid is cleared only if caller does
not have CAP_FSETID. File server does not know whether caller has
CAP_FSETID or not. Hence set FATTR_KILL_PRIV upon truncate to let
file server know that caller does not have CAP_FSETID and it should
kill suid/sgid as appropriate.

We don't have to send this information for chown (ATTR_UID/ATTR_GID)
as that always clears suid/sgid irrespective of capabilities of
calling process.
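
A minimal sketch of that rule, as a userspace model rather than the patch
itself. setattr_valid() is a hypothetical helper and has_cap_fsetid stands in
for capable(CAP_FSETID); the FATTR_UID and FATTR_SIZE values are assumed to
match include/uapi/linux/fuse.h, while FATTR_KILL_PRIV uses the value
introduced by this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values assumed to match include/uapi/linux/fuse.h. */
#define FATTR_UID	(1u << 1)
#define FATTR_SIZE	(1u << 3)
/* Value introduced by this patch. */
#define FATTR_KILL_PRIV	(1u << 14)

/*
 * Compute the "valid" mask sent in SETATTR. Only a size change
 * (truncate) by a caller without CAP_FSETID carries FATTR_KILL_PRIV;
 * chown needs no hint because it always clears suid/sgid.
 */
uint32_t setattr_valid(uint32_t valid, bool handle_killpriv_v2,
		       bool has_cap_fsetid)
{
	if ((valid & FATTR_SIZE) && handle_killpriv_v2 && !has_cap_fsetid)
		valid |= FATTR_KILL_PRIV;
	return valid;
}
```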

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/fuse/dir.c             | 2 ++
 include/uapi/linux/fuse.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index c4a01290aec6..ecdb7895c156 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1575,6 +1575,8 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,
 		/* For mandatory locking in truncate */
 		inarg.valid |= FATTR_LOCKOWNER;
 		inarg.lock_owner = fuse_lock_owner_id(fc, current->files);
+		if (fc->handle_killpriv_v2 && !capable(CAP_FSETID))
+			inarg.valid |= FATTR_KILL_PRIV;
 	}
 	fuse_setattr_fill(fc, &args, inode, &inarg, &outarg);
 	err = fuse_simple_request(fc, &args);
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 3ae3f222a0ed..7b8da0a2de0d 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -269,6 +269,7 @@ struct fuse_file_lock {
 #define FATTR_MTIME_NOW	(1 << 8)
 #define FATTR_LOCKOWNER	(1 << 9)
 #define FATTR_CTIME	(1 << 10)
+#define FATTR_KILL_PRIV	(1 << 14) /* Matches ATTR_KILL_PRIV */
 
 /**
  * Flags returned by the OPEN request
-- 
2.25.4



* [PATCH v2 4/6] fuse: Kill suid/sgid using ATTR_MODE if it is not truncate
  2020-09-16 16:17 [PATCH v2 0/6] fuse: Implement FUSE_HANDLE_KILLPRIV_V2 and enable SB_NOSEC Vivek Goyal
                   ` (2 preceding siblings ...)
  2020-09-16 16:17 ` [PATCH v2 3/6] fuse: setattr should set FATTR_KILL_PRIV upon size change Vivek Goyal
@ 2020-09-16 16:17 ` Vivek Goyal
  2020-09-22 13:56   ` Miklos Szeredi
  2020-09-16 16:17 ` [PATCH v2 5/6] fuse: Add a flag FUSE_OPEN_KILL_PRIV for open() request Vivek Goyal
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: Vivek Goyal @ 2020-09-16 16:17 UTC (permalink / raw)
  To: linux-fsdevel, miklos; +Cc: vgoyal, virtio-fs

If a truncate is happening while ->handle_killpriv_v2 is enabled, then
we don't have to send ATTR_MODE to kill suid/sgid, as the server will
kill it as part of the protocol.

But if this is non-truncate setattr then server will not kill suid/sgid.
So continue to send ATTR_MODE to kill suid/sgid for non-truncate setattr,
even if ->handle_killpriv_v2 is enabled.

This path is taken when the client does a write on a file which has
suid/sgid set. VFS will first kill suid/sgid and then proceed with WRITE.

One can ask why we do not simply ignore ATTR_MODE, because a WRITE
will follow and ->handle_killpriv_v2 will kill suid/sgid at that time.
I feel this is a safer approach for the following reasons.

- With ->writeback_cache enabled, WRITE will not go to server. I feel
  that for this reason ->writeback_cache mode is not fully compatible
  with ->handle_killpriv_v2. But if we kill suid/sgid now, this will
  solve this particular issue for ->writeback_cache mode too.

  Again, it will not solve all the issues around ->writeback_cache but
  makes things better.

- If we rely on WRITE killing suid/sgid, then the client cache becomes
  out of sync w.r.t. the host. The client will still have suid/sgid set
  but the subsequent WRITE will clear suid/sgid. Well, WRITE will also
  invalidate the client cache, so further access to inode->i_mode should
  result in a ->getattr. Hmm..., for the case of ->writeback_cache, I am
  kind of inclined to send ATTR_MODE.

- We are sending setattr(ATTR_FORCE) anyway (even if we clear ATTR_MODE).
  So if we are not saving on setattr(), why not kill suid/sgid now.
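
The decision this patch makes in fuse_setattr() can be summarised as a small
truth-table helper. This is only a sketch: client_sends_attr_mode() is a
hypothetical name, and it mirrors the kill_sugid variable in the diff below.

```c
#include <stdbool.h>

/*
 * Return true when the client itself must kill suid/sgid by sending
 * ATTR_MODE, false when the server is expected to do it.
 */
bool client_sends_attr_mode(bool handle_killpriv, bool handle_killpriv_v2,
			    bool is_truncate)
{
	/* V1 servers kill suid/sgid on their own for every operation. */
	if (handle_killpriv)
		return false;
	/* V2 servers do so only as part of a truncate. */
	if (handle_killpriv_v2 && is_truncate)
		return false;
	/* Everything else (e.g. the setattr issued before a write). */
	return true;
}
```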

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/fuse/dir.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index ecdb7895c156..4b0fe0828e36 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1655,6 +1655,21 @@ static int fuse_setattr(struct dentry *entry, struct iattr *attr)
 		return -EACCES;
 
 	if (attr->ia_valid & (ATTR_KILL_SUID | ATTR_KILL_SGID)) {
+		bool kill_sugid = true;
+		bool is_truncate = !!(attr->ia_valid & ATTR_SIZE);
+
+		if (fc->handle_killpriv ||
+		    (fc->handle_killpriv_v2 && is_truncate)) {
+			/*
+			 * If this is truncate and ->handle_killpriv_v2 is
+			 * enabled, we don't have to send ATTR_MODE to
+			 * kill suid/sgid as server will do it anyway as
+			 * part of truncate. But if this is not truncate
+			 * then kill suid/sgid by sending ATTR_MODE.
+			 */
+			kill_sugid = false;
+		}
+
 		attr->ia_valid &= ~(ATTR_KILL_SUID | ATTR_KILL_SGID |
 				    ATTR_MODE);
 
@@ -1664,7 +1679,7 @@ static int fuse_setattr(struct dentry *entry, struct iattr *attr)
 		 *
 		 * This should be done on write(), truncate() and chown().
 		 */
-		if (!fc->handle_killpriv) {
+		if (kill_sugid) {
 			/*
 			 * ia_mode calculation may have used stale i_mode.
 			 * Refresh and recalculate.
-- 
2.25.4



* [PATCH v2 5/6] fuse: Add a flag FUSE_OPEN_KILL_PRIV for open() request
  2020-09-16 16:17 [PATCH v2 0/6] fuse: Implement FUSE_HANDLE_KILLPRIV_V2 and enable SB_NOSEC Vivek Goyal
                   ` (3 preceding siblings ...)
  2020-09-16 16:17 ` [PATCH v2 4/6] fuse: Kill suid/sgid using ATTR_MODE if it is not truncate Vivek Goyal
@ 2020-09-16 16:17 ` Vivek Goyal
  2020-09-16 16:17 ` [PATCH v2 6/6] virtiofs: Support SB_NOSEC flag to improve direct write performance Vivek Goyal
  2020-09-16 16:38 ` [PATCH v2 0/6] fuse: Implement FUSE_HANDLE_KILLPRIV_V2 and enable SB_NOSEC Vivek Goyal
  6 siblings, 0 replies; 12+ messages in thread
From: Vivek Goyal @ 2020-09-16 16:17 UTC (permalink / raw)
  To: linux-fsdevel, miklos; +Cc: vgoyal, virtio-fs

With FUSE_HANDLE_KILLPRIV_V2 support, server will need to kill
suid/sgid/security.capability on open(O_TRUNC), if server supports
FUSE_ATOMIC_O_TRUNC.

But server needs to kill suid/sgid only if caller does not have
CAP_FSETID. Given server does not have this information, client
needs to send this info to server.

So add a flag FUSE_OPEN_KILL_PRIV to the fuse_open_in request which tells
the server to kill suid/sgid (sgid only if group execute is set).
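
A hedged userspace sketch of how the open request would be filled under these
rules. struct open_in_model and fill_open() are hypothetical names that only
mirror the shape of fuse_open_in, and has_cap_fsetid stands in for
capable(CAP_FSETID).

```c
#include <fcntl.h>	/* O_CREAT, O_EXCL, O_NOCTTY, O_TRUNC */
#include <stdbool.h>
#include <stdint.h>

/* Value introduced by this patch. */
#define FUSE_OPEN_KILL_PRIV	(1u << 0)

/* Hypothetical mirror of struct fuse_open_in after this patch. */
struct open_in_model {
	uint32_t flags;
	uint32_t open_flags;
};

struct open_in_model fill_open(int f_flags, bool atomic_o_trunc,
			       bool handle_killpriv_v2, bool has_cap_fsetid)
{
	struct open_in_model in = { 0, 0 };

	in.flags = (uint32_t)f_flags & ~(uint32_t)(O_CREAT | O_EXCL | O_NOCTTY);
	/* Without FUSE_ATOMIC_O_TRUNC the truncate is not part of OPEN. */
	if (!atomic_o_trunc)
		in.flags &= ~(uint32_t)O_TRUNC;

	/* Only a truncating open by a caller without CAP_FSETID kills privs. */
	if (handle_killpriv_v2 && (in.flags & O_TRUNC) && !has_cap_fsetid)
		in.open_flags |= FUSE_OPEN_KILL_PRIV;
	return in;
}
```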

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/fuse/file.c            |  5 +++++
 include/uapi/linux/fuse.h | 10 +++++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index e40428f3d0f1..2853f55fd8f7 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -42,6 +42,11 @@ static int fuse_send_open(struct fuse_conn *fc, u64 nodeid, struct file *file,
 	inarg.flags = file->f_flags & ~(O_CREAT | O_EXCL | O_NOCTTY);
 	if (!fc->atomic_o_trunc)
 		inarg.flags &= ~O_TRUNC;
+
+	if (fc->handle_killpriv_v2 && (inarg.flags & O_TRUNC) &&
+	    !capable(CAP_FSETID))
+		inarg.open_flags |= FUSE_OPEN_KILL_PRIV;
+
 	args.opcode = opcode;
 	args.nodeid = nodeid;
 	args.in_numargs = 1;
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 7b8da0a2de0d..e20b3ee9d292 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -173,6 +173,7 @@
  *  - add FUSE_SETUPMAPPING and FUSE_REMOVEMAPPING
  *  - add map_alignment to fuse_init_out, add FUSE_MAP_ALIGNMENT flag
  *  - add FUSE_HANDLE_KILLPRIV_V2
+ *  - add FUSE_OPEN_KILL_PRIV
  */
 
 #ifndef _LINUX_FUSE_H
@@ -427,6 +428,13 @@ struct fuse_file_lock {
  */
 #define FUSE_FSYNC_FDATASYNC	(1 << 0)
 
+/**
+ * Open flags
+ * FUSE_OPEN_KILL_PRIV: Kill suid/sgid/security.capability. sgid is cleared
+ * 			only if file has group execute permission.
+ */
+#define FUSE_OPEN_KILL_PRIV	(1 << 0)
+
 enum fuse_opcode {
 	FUSE_LOOKUP		= 1,
 	FUSE_FORGET		= 2,  /* no reply */
@@ -588,7 +596,7 @@ struct fuse_setattr_in {
 
 struct fuse_open_in {
 	uint32_t	flags;
-	uint32_t	unused;
+	uint32_t	open_flags;
 };
 
 struct fuse_create_in {
-- 
2.25.4



* [PATCH v2 6/6] virtiofs: Support SB_NOSEC flag to improve direct write performance
  2020-09-16 16:17 [PATCH v2 0/6] fuse: Implement FUSE_HANDLE_KILLPRIV_V2 and enable SB_NOSEC Vivek Goyal
                   ` (4 preceding siblings ...)
  2020-09-16 16:17 ` [PATCH v2 5/6] fuse: Add a flag FUSE_OPEN_KILL_PRIV for open() request Vivek Goyal
@ 2020-09-16 16:17 ` Vivek Goyal
  2020-09-16 16:38 ` [PATCH v2 0/6] fuse: Implement FUSE_HANDLE_KILLPRIV_V2 and enable SB_NOSEC Vivek Goyal
  6 siblings, 0 replies; 12+ messages in thread
From: Vivek Goyal @ 2020-09-16 16:17 UTC (permalink / raw)
  To: linux-fsdevel, miklos; +Cc: vgoyal, virtio-fs

virtiofs can be slow with small writes if xattrs are enabled and we are
doing cached writes (no direct I/O). Ganesh Mahalingam noticed this here.

https://github.com/kata-containers/runtime/issues/2815

Some debugging showed that file_remove_privs() is called in the cached
write path on every write. And every time it calls
security_inode_need_killpriv(), which results in a call to
__vfs_getxattr(XATTR_NAME_CAPS). This goes to the file server to fetch the
xattr. This extra round trip for every write slows down writes tremendously.

Normally, to avoid paying this penalty on every write, VFS has the
notion of caching this information in the inode (S_NOSEC). So VFS
sets S_NOSEC if the filesystem opted in using the super block flag
SB_NOSEC. And S_NOSEC is cleared when the setuid/setgid bit is set or
when a security xattr is set on the inode, so that the next time a write
happens, we check the inode again for clearing setuid/setgid bits as well
as clearing any security.capability xattr.

This seems to work well for local file systems, but for remote file
systems it is possible that VFS does not have the full picture: a
different client may set the setuid/setgid bit or security.capability xattr
on a file, and that means the S_NOSEC information cached on another client
will be stale. So for remote filesystems SB_NOSEC was disabled by
default.

commit 9e1f1de02c2275d7172e18dc4e7c2065777611bf
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Fri Jun 3 18:24:58 2011 -0400

    more conservative S_NOSEC handling

That commit mentioned that these filesystems can still make use of
SB_NOSEC as long as they clear S_NOSEC when they are refreshing inode
attributes from the server.

So this patch enables SB_NOSEC on fuse (regular fuse as well
as virtiofs) and clears S_NOSEC when we are refreshing inode attributes.
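
To make the caching effect concrete, here is a toy userspace model. It is an
assumption-laden sketch, not the VFS implementation: struct inode_model,
write_check() and refresh_attrs() are hypothetical, and one counter increment
stands in for one getxattr round trip to the file server.

```c
#include <stdbool.h>

/* Hypothetical model of an inode as seen by one client. */
struct inode_model {
	bool nosec;		/* models S_NOSEC */
	bool has_priv;		/* suid/sgid or security.capability present */
	unsigned round_trips;	/* simulated xattr fetches from the server */
};

/* Models the privilege check made before each write. */
void write_check(struct inode_model *inode, bool sb_nosec)
{
	if (inode->nosec)
		return;			/* cached result: no round trip */
	inode->round_trips++;		/* simulated getxattr to the server */
	inode->has_priv = false;	/* any privileges found are killed */
	if (sb_nosec)
		inode->nosec = true;	/* remember there is nothing to kill */
}

/* Models an attribute refresh: another client may have set privileges. */
void refresh_attrs(struct inode_model *inode)
{
	inode->nosec = false;
}
```

In this model, 100 writes cost 100 round trips without SB_NOSEC and a single
round trip with it; each attribute refresh costs one more check on the next
write.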

This is enabled only if the server supports FUSE_HANDLE_KILLPRIV_V2. This
says that the server will clear setuid/setgid/security.capability on
chown/truncate/write as appropriate.

This should provide tighter coherency because now suid/sgid/security.capability
will be cleared even if fuse client cache has not seen these attrs.

The basic idea is that the fuse client will trigger suid/sgid/security.capability
clearing based on its attr cache. But even if the cache has gone stale,
it is fine because FUSE_HANDLE_KILLPRIV_V2 will make sure WRITE
clears suid/sgid/security.capability.

We make this change only if the server supports FUSE_HANDLE_KILLPRIV_V2.
This should make sure that existing filesystems which might be
relying on security.capability always being queried from the server
are not impacted.

This tighter coherency relies on WRITE showing up on the server (and not
being cached in the guest). So writeback_cache mode will not provide that
tight coherency, and it is not recommended to use the two together. Having
said that, it might work reasonably well for a lot of use cases.

This change improves random write performance very significantly. I
am running virtiofsd with cache=auto and the following fio command.

fio --ioengine=libaio --direct=1  --name=test --filename=/mnt/virtiofs/random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randwrite

Before this patch I get around 50MB/s and after the patch I get around
250MB/s bandwidth. So improvement is very significant.

Reported-by: "Mahalingam, Ganesh" <ganesh.mahalingam@intel.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/fuse/inode.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 20740b61f12b..4b7a043f21ee 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -201,6 +201,16 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
 		inode->i_mode &= ~S_ISVTX;
 
 	fi->orig_ino = attr->ino;
+
+	/*
+	 * We are refreshing inode data and it is possible that another
+	 * client set suid/sgid or security.capability xattr. So clear
+	 * S_NOSEC. Ideally, we could have cleared it only if suid/sgid
+	 * was set or if security.capability xattr was set. But we don't
+	 * know if security.capability has been set or not. So clear it
+	 * anyway. It's less efficient but should be safe.
+	 */
+	inode->i_flags &= ~S_NOSEC;
 }
 
 void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
@@ -993,8 +1003,10 @@ static void process_init_reply(struct fuse_conn *fc, struct fuse_args *args,
 			    !fuse_dax_check_alignment(fc, arg->map_alignment)) {
 				ok = false;
 			}
-			if (arg->flags & FUSE_HANDLE_KILLPRIV_V2)
+			if (arg->flags & FUSE_HANDLE_KILLPRIV_V2) {
 				fc->handle_killpriv_v2 = 1;
+				fc->sb->s_flags |= SB_NOSEC;
+			}
 		} else {
 			ra_pages = fc->max_read / PAGE_SIZE;
 			fc->no_lock = 1;
-- 
2.25.4



* Re: [PATCH v2 0/6] fuse: Implement FUSE_HANDLE_KILLPRIV_V2 and enable SB_NOSEC
  2020-09-16 16:17 [PATCH v2 0/6] fuse: Implement FUSE_HANDLE_KILLPRIV_V2 and enable SB_NOSEC Vivek Goyal
                   ` (5 preceding siblings ...)
  2020-09-16 16:17 ` [PATCH v2 6/6] virtiofs: Support SB_NOSEC flag to improve direct write performance Vivek Goyal
@ 2020-09-16 16:38 ` Vivek Goyal
  6 siblings, 0 replies; 12+ messages in thread
From: Vivek Goyal @ 2020-09-16 16:38 UTC (permalink / raw)
  To: linux-fsdevel, miklos; +Cc: virtio-fs

On Wed, Sep 16, 2020 at 12:17:31PM -0400, Vivek Goyal wrote:
> Hi All,
> 
> Please find attached V2 of the patches to enable SB_NOSEC for fuse. I
> posted V1 here.

I have posted the corresponding qemu/virtiofsd change patch here.

https://www.redhat.com/archives/virtio-fs/2020-September/msg00061.html

Thanks
Vivek

> 
> https://lore.kernel.org/linux-fsdevel/20200724183812.19573-1-vgoyal@redhat.com/
> 
> I have generated these patches on top of.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git/log/?h=for-next
> 
> Previously I was not keen on implementing FUSE_HANDLE_KILLPRIV_V2 and
> implemented another idea to enable SB_NOSEC conditional on server
> declaring that filesystem is not shared. But that did not go too
> far when it came to requirements for virtiofs.
> 
> https://lore.kernel.org/linux-fsdevel/20200901204045.1250822-1-vgoyal@redhat.com/
> 
> So I went back to having another look at implementing FUSE_HANDLE_KILLPRIV_V2
> and I think it fits nicely and should work well with a wide variety of
> use cases.
> 
> I have taken care of the feedback from the last round. Random write
> performance has jumped from 50MB/s to 250MB/s, so I am really
> looking forward to these changes so that fuse/virtiofs performance
> can be improved.
> 
> Thanks
> Vivek 
> 
> Vivek Goyal (6):
>   fuse: Introduce the notion of FUSE_HANDLE_KILLPRIV_V2
>   fuse: Set FUSE_WRITE_KILL_PRIV in cached write path
>   fuse: setattr should set FATTR_KILL_PRIV upon size change
>   fuse: Kill suid/sgid using ATTR_MODE if it is not truncate
>   fuse: Add a flag FUSE_OPEN_KILL_PRIV for open() request
>   virtiofs: Support SB_NOSEC flag to improve direct write performance
> 
>  fs/fuse/dir.c             | 19 ++++++++++++++++++-
>  fs/fuse/file.c            |  7 +++++++
>  fs/fuse/fuse_i.h          |  6 ++++++
>  fs/fuse/inode.c           | 17 ++++++++++++++++-
>  include/uapi/linux/fuse.h | 18 +++++++++++++++++-
>  5 files changed, 64 insertions(+), 3 deletions(-)
> 
> -- 
> 2.25.4
> 



* Re: [PATCH v2 4/6] fuse: Kill suid/sgid using ATTR_MODE if it is not truncate
  2020-09-16 16:17 ` [PATCH v2 4/6] fuse: Kill suid/sgid using ATTR_MODE if it is not truncate Vivek Goyal
@ 2020-09-22 13:56   ` Miklos Szeredi
  2020-09-22 20:08     ` Vivek Goyal
  0 siblings, 1 reply; 12+ messages in thread
From: Miklos Szeredi @ 2020-09-22 13:56 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: linux-fsdevel, virtio-fs-list

On Wed, Sep 16, 2020 at 6:18 PM Vivek Goyal <vgoyal@redhat.com> wrote:

> But if this is non-truncate setattr then server will not kill suid/sgid.
> So continue to send ATTR_MODE to kill suid/sgid for non-truncate setattr,
> even if ->handle_killpriv_v2 is enabled.

Sending ATTR_MODE doesn't make sense, since that is racy.   The
refresh-recalculate makes the race window narrower, but it doesn't
eliminate it.

I think I suggested sending write synchronously if suid/sgid/caps are
set.  Do you see a problem with this?

Does this affect anything other than cached writes?

Thanks,
Miklos


* Re: [PATCH v2 4/6] fuse: Kill suid/sgid using ATTR_MODE if it is not truncate
  2020-09-22 13:56   ` Miklos Szeredi
@ 2020-09-22 20:08     ` Vivek Goyal
  2020-09-22 21:25       ` Miklos Szeredi
  0 siblings, 1 reply; 12+ messages in thread
From: Vivek Goyal @ 2020-09-22 20:08 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-fsdevel, virtio-fs-list

On Tue, Sep 22, 2020 at 03:56:47PM +0200, Miklos Szeredi wrote:
> On Wed, Sep 16, 2020 at 6:18 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> 
> > But if this is non-truncate setattr then server will not kill suid/sgid.
> > So continue to send ATTR_MODE to kill suid/sgid for non-truncate setattr,
> > even if ->handle_killpriv_v2 is enabled.
> 
> Sending ATTR_MODE doesn't make sense, since that is racy.   The
> refresh-recalculate makes the race window narrower, but it doesn't
> eliminate it.

Hi Miklos,

Agreed that it does not eliminate that race.

> 
> I think I suggested sending write synchronously if suid/sgid/caps are
> set.  Do you see a problem with this?

Sorry, I might have missed it. So you are saying that for the case of
->writeback_cache, force a synchronous WRITE if suid/sgid is set. But
this will only work if client sees the suid/sgid bits. If client B
set the suid/sgid which client A does not see then all the WRITEs
will be cached in client A and not clear suid/sgid bits.

Also another problem is that if client sees suid/sgid and we make
WRITE synchronous, client's suid/sgid attrs are still cached till
next refresh (both for ->writeback_cache and non writeback_cache
case). So server is clearing suid/sgid bits but client still
keeps them cached. I hope none of the code paths end up using
this stale value and refresh attrs before using suid/sgid.

Shall we refresh attrs after WRITE if suid/sgid is set and the client
expects it to be cleared once WRITE finishes, to solve this problem? Or
is this something which is actually not a real problem and I am
overdesigning?

Thanks
Vivek

> 
> Does this affect anything other than cached writes?
> 
> Thanks,
> Miklos
> 



* Re: [PATCH v2 4/6] fuse: Kill suid/sgid using ATTR_MODE if it is not truncate
  2020-09-22 20:08     ` Vivek Goyal
@ 2020-09-22 21:25       ` Miklos Szeredi
  2020-09-22 21:31         ` Vivek Goyal
  0 siblings, 1 reply; 12+ messages in thread
From: Miklos Szeredi @ 2020-09-22 21:25 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: linux-fsdevel, virtio-fs-list

On Tue, Sep 22, 2020 at 10:08 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Tue, Sep 22, 2020 at 03:56:47PM +0200, Miklos Szeredi wrote:
> > On Wed, Sep 16, 2020 at 6:18 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > > But if this is non-truncate setattr then server will not kill suid/sgid.
> > > So continue to send ATTR_MODE to kill suid/sgid for non-truncate setattr,
> > > even if ->handle_killpriv_v2 is enabled.
> >
> > Sending ATTR_MODE doesn't make sense, since that is racy.   The
> > refresh-recalculate makes the race window narrower, but it doesn't
> > eliminate it.
>
> Hi Miklos,
>
> Agreed that it does not eliminate that race.
>
> >
> > I think I suggested sending write synchronously if suid/sgid/caps are
> > set.  Do you see a problem with this?
>
> Sorry, I might have missed it. So you are saying that for the case of
> ->writeback_cache, force a synchronous WRITE if suid/sgid is set. But
> this will only work if client sees the suid/sgid bits. If client B
> set the suid/sgid which client A does not see then all the WRITEs
> will be cached in client A and not clear suid/sgid bits.

Unless the attributes are invalidated (either by timeout or
explicitly) there's no way that in that situation the suid/sgid bits
can be cleared.  That's true of your patch as well.

>
> Also another problem is that if client sees suid/sgid and we make
> WRITE synchronous, client's suid/sgid attrs are still cached till
> next refresh (both for ->writeback_cache and non writeback_cache
> case). So server is clearing suid/sgid bits but client still
> keeps them cached. I hope none of the code paths end up using
> this stale value and refresh attrs before using suid/sgid.
>
> Shall we refresh attrs after WRITE if suid/sgid is set and the client
> expects it to be cleared once WRITE finishes, to solve this problem? Or
> is this something which is actually not a real problem and I am
> overdesigning?

The fuse_perform_write() path already has the attribute invalidation,
which will trigger GETATTR from fuse_update_attributes() in the next
write.

So I think all that should work fine.

Thanks,
Miklos


* Re: [PATCH v2 4/6] fuse: Kill suid/sgid using ATTR_MODE if it is not truncate
  2020-09-22 21:25       ` Miklos Szeredi
@ 2020-09-22 21:31         ` Vivek Goyal
  0 siblings, 0 replies; 12+ messages in thread
From: Vivek Goyal @ 2020-09-22 21:31 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-fsdevel, virtio-fs-list

On Tue, Sep 22, 2020 at 11:25:30PM +0200, Miklos Szeredi wrote:
> On Tue, Sep 22, 2020 at 10:08 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Tue, Sep 22, 2020 at 03:56:47PM +0200, Miklos Szeredi wrote:
> > > On Wed, Sep 16, 2020 at 6:18 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> > >
> > > > But if this is non-truncate setattr then server will not kill suid/sgid.
> > > > So continue to send ATTR_MODE to kill suid/sgid for non-truncate setattr,
> > > > even if ->handle_killpriv_v2 is enabled.
> > >
> > > Sending ATTR_MODE doesn't make sense, since that is racy.   The
> > > refresh-recalculate makes the race window narrower, but it doesn't
> > > eliminate it.
> >
> > Hi Miklos,
> >
> > Agreed that it does not eliminate that race.
> >
> > >
> > > I think I suggested sending write synchronously if suid/sgid/caps are
> > > set.  Do you see a problem with this?
> >
> > Sorry, I might have missed it. So you are saying that for the case of
> > ->writeback_cache, force a synchronous WRITE if suid/sgid is set. But
> > this will only work if the client sees the suid/sgid bits. If client B
> > sets suid/sgid and client A does not see it, then all the WRITEs
> > will be cached in client A and will not clear the suid/sgid bits.
> 
> Unless the attributes are invalidated (either by timeout or
> explicitly), there's no way the suid/sgid bits can be cleared in that
> situation.  That's true of your patch as well.

Right. And that's why I mentioned that handle_killpriv_v2 is not fully
compatible with ->writeback_cache.

> 
> >
> > Also another problem is that if client sees suid/sgid and we make
> > WRITE synchronous, client's suid/sgid attrs are still cached till
> > next refresh (both for ->writeback_cache and non writeback_cache
> > case). So server is clearing suid/sgid bits but client still
> > keeps them cached. I hope none of the code paths end up using
> > this stale value and refresh attrs before using suid/sgid.
> >
> > To solve this, shall we refresh attrs after WRITE if suid/sgid is set
> > and the client expects it to be cleared once WRITE finishes? Or is
> > this not actually a real problem, and I am overdesigning?
> 
> The fuse_perform_write() path already has the attribute invalidation,
> which will trigger GETATTR from fuse_update_attributes() in the next
> write.

Ok. So if there is any path which potentially can make use of cached
suid/sgid, we just need to make sure fuse_update_attributes() has been
called in that path.

> 
> So I think all that should work fine.

Sounds good. I will give it a try and see if I notice any other issues.

Thanks
Vivek




Thread overview: 12+ messages
2020-09-16 16:17 [PATCH v2 0/6] fuse: Implement FUSE_HANDLE_KILLPRIV_V2 and enable SB_NOSEC Vivek Goyal
2020-09-16 16:17 ` [PATCH v2 1/6] fuse: Introduce the notion of FUSE_HANDLE_KILLPRIV_V2 Vivek Goyal
2020-09-16 16:17 ` [PATCH v2 2/6] fuse: Set FUSE_WRITE_KILL_PRIV in cached write path Vivek Goyal
2020-09-16 16:17 ` [PATCH v2 3/6] fuse: setattr should set FATTR_KILL_PRIV upon size change Vivek Goyal
2020-09-16 16:17 ` [PATCH v2 4/6] fuse: Kill suid/sgid using ATTR_MODE if it is not truncate Vivek Goyal
2020-09-22 13:56   ` Miklos Szeredi
2020-09-22 20:08     ` Vivek Goyal
2020-09-22 21:25       ` Miklos Szeredi
2020-09-22 21:31         ` Vivek Goyal
2020-09-16 16:17 ` [PATCH v2 5/6] fuse: Add a flag FUSE_OPEN_KILL_PRIV for open() request Vivek Goyal
2020-09-16 16:17 ` [PATCH v2 6/6] virtiofs: Support SB_NOSEC flag to improve direct write performance Vivek Goyal
2020-09-16 16:38 ` [PATCH v2 0/6] fuse: Implement FUSE_HANDLE_KILLPRIV_V2 and enable SB_NOSEC Vivek Goyal
