All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] Prepare for fsnotify pre-content permission events
@ 2023-12-07 12:38 Amir Goldstein
  2023-12-07 12:38 ` [PATCH 1/4] fs: use splice_copy_file_range() inline helper Amir Goldstein
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Amir Goldstein @ 2023-12-07 12:38 UTC (permalink / raw)
  To: Christian Brauner, Jan Kara
  Cc: Jeff Layton, Josef Bacik, Christoph Hellwig, David Howells,
	Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel

Hi Jan & Christian,

I am not planning to post the fanotify pre-content event patches [1]
for 6.8.  Not because they are not ready, but because the usersapce
example is not ready.

Also, I think it is a good idea to let the large permission hooks
cleanup work to mature over the 6.8 cycle, before we introduce the
pre-content events.

However, I would like to include the following vfs prep patches along
with the vfs.rw PR for 6.8, which could be titled as the subject of
this cover letter.

Patch 1 is a variant of a cleanup suggested by Christoph to get rid
of the generic_copy_file_range() exported symbol.

Patches 2,3 add the file_write_not_started() assertion to fsnotify
file permission hooks.  IMO, it is important to merge it along with
vfs.rw because:

1. This assert is how I tested vfs.rw does what it aimed to achieve
2. This will protect us from new callers that break the new order
3. The commit message of patch 3 provides the context for the entire
   series and can be included in the PR message

Patch 4 is the final change of fsnotify permission hook locations/args
and is the last of the vfs prerequsites for pre-content events.

If we merge patch 4 for 6.8, it will be much easier for the development
of fanotify pre-content events in 6.9 dev cycle, which be contained
within the fsnotify subsystem.

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/fan_pre_content

Amir Goldstein (4):
  fs: use splice_copy_file_range() inline helper
  fsnotify: split fsnotify_perm() into two hooks
  fsnotify: assert that file_start_write() is not held in permission
    hooks
  fsnotify: pass access range in file permission hooks

 fs/ceph/file.c           |  4 ++--
 fs/fuse/file.c           |  5 +++--
 fs/nfs/nfs4file.c        |  5 +++--
 fs/open.c                |  4 ++++
 fs/read_write.c          | 44 ++++++++--------------------------------
 fs/readdir.c             |  4 ++++
 fs/remap_range.c         |  8 +++++++-
 fs/smb/client/cifsfs.c   |  5 +++--
 fs/splice.c              |  2 +-
 include/linux/fs.h       |  3 ---
 include/linux/fsnotify.h | 42 ++++++++++++++++++++++++--------------
 include/linux/splice.h   |  8 ++++++++
 security/security.c      | 10 ++-------
 13 files changed, 72 insertions(+), 72 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/4] fs: use splice_copy_file_range() inline helper
  2023-12-07 12:38 [PATCH 0/4] Prepare for fsnotify pre-content permission events Amir Goldstein
@ 2023-12-07 12:38 ` Amir Goldstein
  2023-12-08 17:33   ` Christian Brauner
  2023-12-08 18:27   ` Jan Kara
  2023-12-07 12:38 ` [PATCH 2/4] fsnotify: split fsnotify_perm() into two hooks Amir Goldstein
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 25+ messages in thread
From: Amir Goldstein @ 2023-12-07 12:38 UTC (permalink / raw)
  To: Christian Brauner, Jan Kara
  Cc: Jeff Layton, Josef Bacik, Christoph Hellwig, David Howells,
	Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel

generic_copy_file_range() is just a wrapper around splice_file_range().
Move the wrapper to splice.h and rename it to splice_copy_file_range().

Suggested-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/linux-fsdevel/20231204083849.GC32438@lst.de/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/ceph/file.c         |  4 ++--
 fs/fuse/file.c         |  5 +++--
 fs/nfs/nfs4file.c      |  5 +++--
 fs/read_write.c        | 34 ----------------------------------
 fs/smb/client/cifsfs.c |  5 +++--
 fs/splice.c            |  2 +-
 include/linux/fs.h     |  3 ---
 include/linux/splice.h |  8 ++++++++
 8 files changed, 20 insertions(+), 46 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index f11de6e1f1c1..d380d9dad0e0 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -3090,8 +3090,8 @@ static ssize_t ceph_copy_file_range(struct file *src_file, loff_t src_off,
 				     len, flags);
 
 	if (ret == -EOPNOTSUPP || ret == -EXDEV)
-		ret = generic_copy_file_range(src_file, src_off, dst_file,
-					      dst_off, len, flags);
+		ret = splice_copy_file_range(src_file, src_off, dst_file,
+					     dst_off, len);
 	return ret;
 }
 
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index a660f1f21540..148a71b8b4d0 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -19,6 +19,7 @@
 #include <linux/uio.h>
 #include <linux/fs.h>
 #include <linux/filelock.h>
+#include <linux/splice.h>
 
 static int fuse_send_open(struct fuse_mount *fm, u64 nodeid,
 			  unsigned int open_flags, int opcode,
@@ -3195,8 +3196,8 @@ static ssize_t fuse_copy_file_range(struct file *src_file, loff_t src_off,
 				     len, flags);
 
 	if (ret == -EOPNOTSUPP || ret == -EXDEV)
-		ret = generic_copy_file_range(src_file, src_off, dst_file,
-					      dst_off, len, flags);
+		ret = splice_copy_file_range(src_file, src_off, dst_file,
+					     dst_off, len);
 	return ret;
 }
 
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index 02788c3c85e5..e238abc78a13 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -10,6 +10,7 @@
 #include <linux/mount.h>
 #include <linux/nfs_fs.h>
 #include <linux/nfs_ssc.h>
+#include <linux/splice.h>
 #include "delegation.h"
 #include "internal.h"
 #include "iostat.h"
@@ -195,8 +196,8 @@ static ssize_t nfs4_copy_file_range(struct file *file_in, loff_t pos_in,
 	ret = __nfs4_copy_file_range(file_in, pos_in, file_out, pos_out, count,
 				     flags);
 	if (ret == -EOPNOTSUPP || ret == -EXDEV)
-		ret = generic_copy_file_range(file_in, pos_in, file_out,
-					      pos_out, count, flags);
+		ret = splice_copy_file_range(file_in, pos_in, file_out,
+					     pos_out, count);
 	return ret;
 }
 
diff --git a/fs/read_write.c b/fs/read_write.c
index 01a14570015b..97a9d5c7ad96 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1396,40 +1396,6 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd,
 }
 #endif
 
-/**
- * generic_copy_file_range - copy data between two files
- * @file_in:	file structure to read from
- * @pos_in:	file offset to read from
- * @file_out:	file structure to write data to
- * @pos_out:	file offset to write data to
- * @len:	amount of data to copy
- * @flags:	copy flags
- *
- * This is a generic filesystem helper to copy data from one file to another.
- * It has no constraints on the source or destination file owners - the files
- * can belong to different superblocks and different filesystem types. Short
- * copies are allowed.
- *
- * This should be called from the @file_out filesystem, as per the
- * ->copy_file_range() method.
- *
- * Returns the number of bytes copied or a negative error indicating the
- * failure.
- */
-
-ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
-				struct file *file_out, loff_t pos_out,
-				size_t len, unsigned int flags)
-{
-	/* May only be called from within ->copy_file_range() methods */
-	if (WARN_ON_ONCE(flags))
-		return -EINVAL;
-
-	return splice_file_range(file_in, &pos_in, file_out, &pos_out,
-				 min_t(size_t, len, MAX_RW_COUNT));
-}
-EXPORT_SYMBOL(generic_copy_file_range);
-
 /*
  * Performs necessary checks before doing a file copy
  *
diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
index ea3a7a668b45..ee461bf0ef63 100644
--- a/fs/smb/client/cifsfs.c
+++ b/fs/smb/client/cifsfs.c
@@ -25,6 +25,7 @@
 #include <linux/freezer.h>
 #include <linux/namei.h>
 #include <linux/random.h>
+#include <linux/splice.h>
 #include <linux/uuid.h>
 #include <linux/xattr.h>
 #include <uapi/linux/magic.h>
@@ -1362,8 +1363,8 @@ static ssize_t cifs_copy_file_range(struct file *src_file, loff_t off,
 	free_xid(xid);
 
 	if (rc == -EOPNOTSUPP || rc == -EXDEV)
-		rc = generic_copy_file_range(src_file, off, dst_file,
-					     destoff, len, flags);
+		rc = splice_copy_file_range(src_file, off, dst_file,
+					    destoff, len);
 	return rc;
 }
 
diff --git a/fs/splice.c b/fs/splice.c
index 7cda013e5a1e..24bd93f8e4c3 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1243,7 +1243,7 @@ EXPORT_SYMBOL(do_splice_direct);
  * @len:	number of bytes to splice
  *
  * Description:
- *    For use by generic_copy_file_range() and ->copy_file_range() methods.
+ *    For use by ->copy_file_range() methods.
  *    Like do_splice_direct(), but vfs_copy_file_range() already holds
  *    start_file_write() on @out file.
  *
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 04422a0eccdd..900d0cd55b50 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2090,9 +2090,6 @@ extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
 extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
 extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *,
 				   loff_t, size_t, unsigned int);
-extern ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
-				       struct file *file_out, loff_t pos_out,
-				       size_t len, unsigned int flags);
 int __generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
 				    struct file *file_out, loff_t pos_out,
 				    loff_t *len, unsigned int remap_flags,
diff --git a/include/linux/splice.h b/include/linux/splice.h
index 49532d5dda52..b92c4676c59b 100644
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -89,6 +89,14 @@ long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
 long splice_file_range(struct file *in, loff_t *ppos, struct file *out,
 		       loff_t *opos, size_t len);
 
+static inline long splice_copy_file_range(struct file *in, loff_t pos_in,
+					  struct file *out, loff_t pos_out,
+					  size_t len)
+{
+	return splice_file_range(in, &pos_in, out, &pos_out,
+				      min_t(size_t, len, MAX_RW_COUNT));
+}
+
 extern long do_tee(struct file *in, struct file *out, size_t len,
 		   unsigned int flags);
 extern ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/4] fsnotify: split fsnotify_perm() into two hooks
  2023-12-07 12:38 [PATCH 0/4] Prepare for fsnotify pre-content permission events Amir Goldstein
  2023-12-07 12:38 ` [PATCH 1/4] fs: use splice_copy_file_range() inline helper Amir Goldstein
@ 2023-12-07 12:38 ` Amir Goldstein
  2023-12-08 18:33   ` Jan Kara
  2023-12-07 12:38 ` [PATCH 3/4] fsnotify: assert that file_start_write() is not held in permission hooks Amir Goldstein
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Amir Goldstein @ 2023-12-07 12:38 UTC (permalink / raw)
  To: Christian Brauner, Jan Kara
  Cc: Jeff Layton, Josef Bacik, Christoph Hellwig, David Howells,
	Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel

We would like to make changes to the fsnotify access permission hook -
add file range arguments and add the pre modify event.

In preparation for these changes, split the fsnotify_perm() hook into
fsnotify_open_perm() and fsnotify_file_perm().

This is needed for fanotify "pre content" events.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 include/linux/fsnotify.h | 34 +++++++++++++++++++---------------
 security/security.c      |  4 ++--
 2 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index bcb6609b54b3..926bb4461b9e 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -100,29 +100,33 @@ static inline int fsnotify_file(struct file *file, __u32 mask)
 	return fsnotify_parent(path->dentry, mask, path, FSNOTIFY_EVENT_PATH);
 }
 
-/* Simple call site for access decisions */
-static inline int fsnotify_perm(struct file *file, int mask)
+/*
+ * fsnotify_file_perm - permission hook before file access
+ */
+static inline int fsnotify_file_perm(struct file *file, int perm_mask)
 {
-	int ret;
-	__u32 fsnotify_mask = 0;
+	__u32 fsnotify_mask = FS_ACCESS_PERM;
 
-	if (!(mask & (MAY_READ | MAY_OPEN)))
+	if (!(perm_mask & MAY_READ))
 		return 0;
 
-	if (mask & MAY_OPEN) {
-		fsnotify_mask = FS_OPEN_PERM;
+	return fsnotify_file(file, fsnotify_mask);
+}
 
-		if (file->f_flags & __FMODE_EXEC) {
-			ret = fsnotify_file(file, FS_OPEN_EXEC_PERM);
+/*
+ * fsnotify_open_perm - permission hook before file open
+ */
+static inline int fsnotify_open_perm(struct file *file)
+{
+	int ret;
 
-			if (ret)
-				return ret;
-		}
-	} else if (mask & MAY_READ) {
-		fsnotify_mask = FS_ACCESS_PERM;
+	if (file->f_flags & __FMODE_EXEC) {
+		ret = fsnotify_file(file, FS_OPEN_EXEC_PERM);
+		if (ret)
+			return ret;
 	}
 
-	return fsnotify_file(file, fsnotify_mask);
+	return fsnotify_file(file, FS_OPEN_PERM);
 }
 
 /*
diff --git a/security/security.c b/security/security.c
index dcb3e7014f9b..d7f3703c5905 100644
--- a/security/security.c
+++ b/security/security.c
@@ -2586,7 +2586,7 @@ int security_file_permission(struct file *file, int mask)
 	if (ret)
 		return ret;
 
-	return fsnotify_perm(file, mask);
+	return fsnotify_file_perm(file, mask);
 }
 
 /**
@@ -2837,7 +2837,7 @@ int security_file_open(struct file *file)
 	if (ret)
 		return ret;
 
-	return fsnotify_perm(file, MAY_OPEN);
+	return fsnotify_open_perm(file);
 }
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/4] fsnotify: assert that file_start_write() is not held in permission hooks
  2023-12-07 12:38 [PATCH 0/4] Prepare for fsnotify pre-content permission events Amir Goldstein
  2023-12-07 12:38 ` [PATCH 1/4] fs: use splice_copy_file_range() inline helper Amir Goldstein
  2023-12-07 12:38 ` [PATCH 2/4] fsnotify: split fsnotify_perm() into two hooks Amir Goldstein
@ 2023-12-07 12:38 ` Amir Goldstein
  2023-12-08 18:46   ` Jan Kara
  2023-12-07 12:38 ` [PATCH 4/4] fsnotify: pass access range in file " Amir Goldstein
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Amir Goldstein @ 2023-12-07 12:38 UTC (permalink / raw)
  To: Christian Brauner, Jan Kara
  Cc: Jeff Layton, Josef Bacik, Christoph Hellwig, David Howells,
	Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel

filesystem may be modified in the context of fanotify permission events
(e.g. by HSM service), so assert that sb freeze protection is not held.

If the assertion fails, then the following deadlock would be possible:

CPU0				CPU1			CPU2
-------------------------------------------------------------------------
file_start_write()#0
...
  fsnotify_perm()
    fanotify_get_response() =>	(read event and fill file)
				...
				...			freeze_super()
				...			  sb_wait_write()
				...
				vfs_write()
				  file_start_write()#1

This example demonstrates a use case of an hierarchical storage management
(HSM) service that uses fanotify permission events to fill the content of
a file before access, while a 3rd process starts fsfreeze.

This creates a circular dependeny:
  file_start_write()#0 => fanotify_get_response =>
    file_start_write()#1 =>
      sb_wait_write() =>
        file_end_write()#0

Where file_end_write()#0 can never be called and none of the threads can
make progress.

The assertion is checked for both MAY_READ and MAY_WRITE permission
hooks in preparation for a pre-modify permission event.

The assertion is not checked for an open permission event, because
do_open() takes mnt_want_write() in O_TRUNC case, meaning that it is not
safe to write to filesystem in the content of an open permission event.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 include/linux/fsnotify.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 926bb4461b9e..0a9d6a8a747a 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -107,6 +107,13 @@ static inline int fsnotify_file_perm(struct file *file, int perm_mask)
 {
 	__u32 fsnotify_mask = FS_ACCESS_PERM;
 
+	/*
+	 * filesystem may be modified in the context of permission events
+	 * (e.g. by HSM filling a file on access), so sb freeze protection
+	 * must not be held.
+	 */
+	lockdep_assert_once(file_write_not_started(file));
+
 	if (!(perm_mask & MAY_READ))
 		return 0;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/4] fsnotify: pass access range in file permission hooks
  2023-12-07 12:38 [PATCH 0/4] Prepare for fsnotify pre-content permission events Amir Goldstein
                   ` (2 preceding siblings ...)
  2023-12-07 12:38 ` [PATCH 3/4] fsnotify: assert that file_start_write() is not held in permission hooks Amir Goldstein
@ 2023-12-07 12:38 ` Amir Goldstein
  2023-12-08 17:52   ` Christian Brauner
  2023-12-08 18:53   ` Jan Kara
  2023-12-07 21:51 ` [PATCH 0/4] Prepare for fsnotify pre-content permission events Josef Bacik
  2023-12-08 17:54 ` Christian Brauner
  5 siblings, 2 replies; 25+ messages in thread
From: Amir Goldstein @ 2023-12-07 12:38 UTC (permalink / raw)
  To: Christian Brauner, Jan Kara
  Cc: Jeff Layton, Josef Bacik, Christoph Hellwig, David Howells,
	Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel

In preparation for pre-content permission events with file access range,
move fsnotify_file_perm() hook out of security_file_permission() and into
the callers that have the access range information and pass the access
range to fsnotify_file_perm().

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/open.c                |  4 ++++
 fs/read_write.c          | 10 ++++++++--
 fs/readdir.c             |  4 ++++
 fs/remap_range.c         |  8 +++++++-
 include/linux/fsnotify.h |  3 ++-
 security/security.c      |  8 +-------
 6 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 02dc608d40d8..530f70da69e1 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -304,6 +304,10 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	if (ret)
 		return ret;
 
+	ret = fsnotify_file_perm(file, MAY_WRITE, &offset, len);
+	if (ret)
+		return ret;
+
 	if (S_ISFIFO(inode->i_mode))
 		return -ESPIPE;
 
diff --git a/fs/read_write.c b/fs/read_write.c
index 97a9d5c7ad96..1b5b0883edba 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -354,6 +354,9 @@ SYSCALL_DEFINE5(llseek, unsigned int, fd, unsigned long, offset_high,
 
 int rw_verify_area(int read_write, struct file *file, const loff_t *ppos, size_t count)
 {
+	int mask = read_write == READ ? MAY_READ : MAY_WRITE;
+	int ret;
+
 	if (unlikely((ssize_t) count < 0))
 		return -EINVAL;
 
@@ -371,8 +374,11 @@ int rw_verify_area(int read_write, struct file *file, const loff_t *ppos, size_t
 		}
 	}
 
-	return security_file_permission(file,
-				read_write == READ ? MAY_READ : MAY_WRITE);
+	ret = security_file_permission(file, mask);
+	if (ret)
+		return ret;
+
+	return fsnotify_file_perm(file, mask, ppos, count);
 }
 EXPORT_SYMBOL(rw_verify_area);
 
diff --git a/fs/readdir.c b/fs/readdir.c
index c8c46e294431..684ae75d94a4 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -96,6 +96,10 @@ int iterate_dir(struct file *file, struct dir_context *ctx)
 	if (res)
 		goto out;
 
+	res = fsnotify_file_perm(file, MAY_READ, NULL, 0);
+	if (res)
+		goto out;
+
 	res = down_read_killable(&inode->i_rwsem);
 	if (res)
 		goto out;
diff --git a/fs/remap_range.c b/fs/remap_range.c
index 12131f2a6c9e..ee4729aafbde 100644
--- a/fs/remap_range.c
+++ b/fs/remap_range.c
@@ -102,7 +102,9 @@ static int generic_remap_checks(struct file *file_in, loff_t pos_in,
 static int remap_verify_area(struct file *file, loff_t pos, loff_t len,
 			     bool write)
 {
+	int mask = write ? MAY_WRITE : MAY_READ;
 	loff_t tmp;
+	int ret;
 
 	if (unlikely(pos < 0 || len < 0))
 		return -EINVAL;
@@ -110,7 +112,11 @@ static int remap_verify_area(struct file *file, loff_t pos, loff_t len,
 	if (unlikely(check_add_overflow(pos, len, &tmp)))
 		return -EINVAL;
 
-	return security_file_permission(file, write ? MAY_WRITE : MAY_READ);
+	ret = security_file_permission(file, mask);
+	if (ret)
+		return ret;
+
+	return fsnotify_file_perm(file, mask, &pos, len);
 }
 
 /*
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 0a9d6a8a747a..45e6ecbca057 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -103,7 +103,8 @@ static inline int fsnotify_file(struct file *file, __u32 mask)
 /*
  * fsnotify_file_perm - permission hook before file access
  */
-static inline int fsnotify_file_perm(struct file *file, int perm_mask)
+static inline int fsnotify_file_perm(struct file *file, int perm_mask,
+				     const loff_t *ppos, size_t count)
 {
 	__u32 fsnotify_mask = FS_ACCESS_PERM;
 
diff --git a/security/security.c b/security/security.c
index d7f3703c5905..2a7fc7881cbc 100644
--- a/security/security.c
+++ b/security/security.c
@@ -2580,13 +2580,7 @@ int security_kernfs_init_security(struct kernfs_node *kn_dir,
  */
 int security_file_permission(struct file *file, int mask)
 {
-	int ret;
-
-	ret = call_int_hook(file_permission, 0, file, mask);
-	if (ret)
-		return ret;
-
-	return fsnotify_file_perm(file, mask);
+	return call_int_hook(file_permission, 0, file, mask);
 }
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] Prepare for fsnotify pre-content permission events
  2023-12-07 12:38 [PATCH 0/4] Prepare for fsnotify pre-content permission events Amir Goldstein
                   ` (3 preceding siblings ...)
  2023-12-07 12:38 ` [PATCH 4/4] fsnotify: pass access range in file " Amir Goldstein
@ 2023-12-07 21:51 ` Josef Bacik
  2023-12-08  7:34   ` Amir Goldstein
  2023-12-08 17:54 ` Christian Brauner
  5 siblings, 1 reply; 25+ messages in thread
From: Josef Bacik @ 2023-12-07 21:51 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Christian Brauner, Jan Kara, Jeff Layton, Christoph Hellwig,
	David Howells, Jens Axboe, Miklos Szeredi, Al Viro,
	linux-fsdevel

On Thu, Dec 07, 2023 at 02:38:21PM +0200, Amir Goldstein wrote:
> Hi Jan & Christian,
> 
> I am not planning to post the fanotify pre-content event patches [1]
> for 6.8.  Not because they are not ready, but because the usersapce
> example is not ready.
> 
> Also, I think it is a good idea to let the large permission hooks
> cleanup work to mature over the 6.8 cycle, before we introduce the
> pre-content events.
> 
> However, I would like to include the following vfs prep patches along
> with the vfs.rw PR for 6.8, which could be titled as the subject of
> this cover letter.
> 
> Patch 1 is a variant of a cleanup suggested by Christoph to get rid
> of the generic_copy_file_range() exported symbol.
> 
> Patches 2,3 add the file_write_not_started() assertion to fsnotify
> file permission hooks.  IMO, it is important to merge it along with
> vfs.rw because:
> 
> 1. This assert is how I tested vfs.rw does what it aimed to achieve
> 2. This will protect us from new callers that break the new order
> 3. The commit message of patch 3 provides the context for the entire
>    series and can be included in the PR message
> 
> Patch 4 is the final change of fsnotify permission hook locations/args
> and is the last of the vfs prerequsites for pre-content events.
> 
> If we merge patch 4 for 6.8, it will be much easier for the development
> of fanotify pre-content events in 6.9 dev cycle, which be contained
> within the fsnotify subsystem.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Can you get an fstest added that exercises the freeze deadlock?  I feel like
we're going to break that at some point and I'd rather find out in testing than
in production.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] Prepare for fsnotify pre-content permission events
  2023-12-07 21:51 ` [PATCH 0/4] Prepare for fsnotify pre-content permission events Josef Bacik
@ 2023-12-08  7:34   ` Amir Goldstein
  2023-12-15 17:00     ` Amir Goldstein
  0 siblings, 1 reply; 25+ messages in thread
From: Amir Goldstein @ 2023-12-08  7:34 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Christian Brauner, Jan Kara, Jeff Layton, Christoph Hellwig,
	David Howells, Jens Axboe, Miklos Szeredi, Al Viro,
	linux-fsdevel

On Thu, Dec 7, 2023 at 11:51 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On Thu, Dec 07, 2023 at 02:38:21PM +0200, Amir Goldstein wrote:
> > Hi Jan & Christian,
> >
> > I am not planning to post the fanotify pre-content event patches [1]
> > for 6.8.  Not because they are not ready, but because the usersapce
> > example is not ready.
> >
> > Also, I think it is a good idea to let the large permission hooks
> > cleanup work to mature over the 6.8 cycle, before we introduce the
> > pre-content events.
> >
> > However, I would like to include the following vfs prep patches along
> > with the vfs.rw PR for 6.8, which could be titled as the subject of
> > this cover letter.
> >
> > Patch 1 is a variant of a cleanup suggested by Christoph to get rid
> > of the generic_copy_file_range() exported symbol.
> >
> > Patches 2,3 add the file_write_not_started() assertion to fsnotify
> > file permission hooks.  IMO, it is important to merge it along with
> > vfs.rw because:
> >
> > 1. This assert is how I tested vfs.rw does what it aimed to achieve
> > 2. This will protect us from new callers that break the new order
> > 3. The commit message of patch 3 provides the context for the entire
> >    series and can be included in the PR message
> >
> > Patch 4 is the final change of fsnotify permission hook locations/args
> > and is the last of the vfs prerequsites for pre-content events.
> >
> > If we merge patch 4 for 6.8, it will be much easier for the development
> > of fanotify pre-content events in 6.9 dev cycle, which be contained
> > within the fsnotify subsystem.
>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
>
> Can you get an fstest added that exercises the freeze deadlock?

I suppose that you mean a test that exercises the lockdep assertion?
This is much easier to do, so I don't see the point in actually testing
the deadlock. The only thing is that the assertion will not be backported
so this test would protect us from future regression, but will not nudge
stable kernel users to backport the deadlock fix, which I don't think they
should be doing anyway.

It is actually already exercised by tests overlay/068,069, but I can add
a generic test to get wider testing coverage.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/4] fs: use splice_copy_file_range() inline helper
  2023-12-07 12:38 ` [PATCH 1/4] fs: use splice_copy_file_range() inline helper Amir Goldstein
@ 2023-12-08 17:33   ` Christian Brauner
  2023-12-10 10:07     ` Amir Goldstein
  2023-12-08 18:27   ` Jan Kara
  1 sibling, 1 reply; 25+ messages in thread
From: Christian Brauner @ 2023-12-08 17:33 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Jeff Layton, Josef Bacik, Christoph Hellwig,
	David Howells, Jens Axboe, Miklos Szeredi, Al Viro,
	linux-fsdevel

> +static inline long splice_copy_file_range(struct file *in, loff_t pos_in,
> +					  struct file *out, loff_t pos_out,
> +					  size_t len)
> +{
> +	return splice_file_range(in, &pos_in, out, &pos_out,
> +				      min_t(size_t, len, MAX_RW_COUNT));
> +}

We should really cleanup the return values of the all the splice
helpers. Most callers of splice_file_range() use ssize_t already. So
does splice_direct_to_actor() and splice_to_socket(). IMO, all of the
splice helpers should just be changed to return ssize_t instead of long.
Doesn't have to be in this series though.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] fsnotify: pass access range in file permission hooks
  2023-12-07 12:38 ` [PATCH 4/4] fsnotify: pass access range in file " Amir Goldstein
@ 2023-12-08 17:52   ` Christian Brauner
  2023-12-08 18:53   ` Jan Kara
  1 sibling, 0 replies; 25+ messages in thread
From: Christian Brauner @ 2023-12-08 17:52 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Jeff Layton, Josef Bacik, Christoph Hellwig,
	David Howells, Jens Axboe, Miklos Szeredi, Al Viro,
	linux-fsdevel

On Thu, Dec 07, 2023 at 02:38:25PM +0200, Amir Goldstein wrote:
> In preparation for pre-content permission events with file access range,
> move fsnotify_file_perm() hook out of security_file_permission() and into
> the callers that have the access range information and pass the access
> range to fsnotify_file_perm().

Not pleasant tbh. I really dislike that we have all this extra hook
machinery. In some places we have LSM hook, then IMA, then EVM, then
fsnotify. Luckily there's a push to move IMA and EVM into the LSM hooks
which is at least better and gets rid of some of that stuff. But not too
fond that we're moving fsnotify out of the hooks. But since there's no
obvious way to consolidate fsnotify and that LSM stuff it's ok as far as
I'm concerned.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] Prepare for fsnotify pre-content permission events
  2023-12-07 12:38 [PATCH 0/4] Prepare for fsnotify pre-content permission events Amir Goldstein
                   ` (4 preceding siblings ...)
  2023-12-07 21:51 ` [PATCH 0/4] Prepare for fsnotify pre-content permission events Josef Bacik
@ 2023-12-08 17:54 ` Christian Brauner
  5 siblings, 0 replies; 25+ messages in thread
From: Christian Brauner @ 2023-12-08 17:54 UTC (permalink / raw)
  To: Jan Kara, Amir Goldstein
  Cc: Christian Brauner, Jeff Layton, Josef Bacik, Christoph Hellwig,
	David Howells, Jens Axboe, Miklos Szeredi, Al Viro,
	linux-fsdevel

On Thu, 07 Dec 2023 14:38:21 +0200, Amir Goldstein wrote:
> I am not planning to post the fanotify pre-content event patches [1]
> for 6.8.  Not because they are not ready, but because the usersapce
> example is not ready.
> 
> Also, I think it is a good idea to let the large permission hooks
> cleanup work to mature over the 6.8 cycle, before we introduce the
> pre-content events.
> 
> [...]

Picking this up to get it into -next rather sooner than later. But @Jan,
I'll wait for your Acks.

---

Applied to the vfs.rw branch of the vfs/vfs.git tree.
Patches in the vfs.rw branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.rw

[1/4] fs: use splice_copy_file_range() inline helper
      https://git.kernel.org/vfs/vfs/c/4955c918c9e5
[2/4] fsnotify: split fsnotify_perm() into two hooks
      https://git.kernel.org/vfs/vfs/c/d2fc40363ab1
[3/4] fsnotify: assert that file_start_write() is not held in permission hooks
      https://git.kernel.org/vfs/vfs/c/24065342b941
[4/4] fsnotify: pass access range in file permission hooks
      https://git.kernel.org/vfs/vfs/c/e5c56a33657b

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/4] fs: use splice_copy_file_range() inline helper
  2023-12-07 12:38 ` [PATCH 1/4] fs: use splice_copy_file_range() inline helper Amir Goldstein
  2023-12-08 17:33   ` Christian Brauner
@ 2023-12-08 18:27   ` Jan Kara
  1 sibling, 0 replies; 25+ messages in thread
From: Jan Kara @ 2023-12-08 18:27 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Christian Brauner, Jan Kara, Jeff Layton, Josef Bacik,
	Christoph Hellwig, David Howells, Jens Axboe, Miklos Szeredi,
	Al Viro, linux-fsdevel

On Thu 07-12-23 14:38:22, Amir Goldstein wrote:
> generic_copy_file_range() is just a wrapper around splice_file_range().
> Move the wrapper to splice.h and rename it to splice_copy_file_range().
> 
> Suggested-by: Christoph Hellwig <hch@lst.de>
> Link: https://lore.kernel.org/linux-fsdevel/20231204083849.GC32438@lst.de/
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ceph/file.c         |  4 ++--
>  fs/fuse/file.c         |  5 +++--
>  fs/nfs/nfs4file.c      |  5 +++--
>  fs/read_write.c        | 34 ----------------------------------
>  fs/smb/client/cifsfs.c |  5 +++--
>  fs/splice.c            |  2 +-
>  include/linux/fs.h     |  3 ---
>  include/linux/splice.h |  8 ++++++++
>  8 files changed, 20 insertions(+), 46 deletions(-)
> 
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index f11de6e1f1c1..d380d9dad0e0 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -3090,8 +3090,8 @@ static ssize_t ceph_copy_file_range(struct file *src_file, loff_t src_off,
>  				     len, flags);
>  
>  	if (ret == -EOPNOTSUPP || ret == -EXDEV)
> -		ret = generic_copy_file_range(src_file, src_off, dst_file,
> -					      dst_off, len, flags);
> +		ret = splice_copy_file_range(src_file, src_off, dst_file,
> +					     dst_off, len);
>  	return ret;
>  }
>  
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index a660f1f21540..148a71b8b4d0 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -19,6 +19,7 @@
>  #include <linux/uio.h>
>  #include <linux/fs.h>
>  #include <linux/filelock.h>
> +#include <linux/splice.h>
>  
>  static int fuse_send_open(struct fuse_mount *fm, u64 nodeid,
>  			  unsigned int open_flags, int opcode,
> @@ -3195,8 +3196,8 @@ static ssize_t fuse_copy_file_range(struct file *src_file, loff_t src_off,
>  				     len, flags);
>  
>  	if (ret == -EOPNOTSUPP || ret == -EXDEV)
> -		ret = generic_copy_file_range(src_file, src_off, dst_file,
> -					      dst_off, len, flags);
> +		ret = splice_copy_file_range(src_file, src_off, dst_file,
> +					     dst_off, len);
>  	return ret;
>  }
>  
> diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
> index 02788c3c85e5..e238abc78a13 100644
> --- a/fs/nfs/nfs4file.c
> +++ b/fs/nfs/nfs4file.c
> @@ -10,6 +10,7 @@
>  #include <linux/mount.h>
>  #include <linux/nfs_fs.h>
>  #include <linux/nfs_ssc.h>
> +#include <linux/splice.h>
>  #include "delegation.h"
>  #include "internal.h"
>  #include "iostat.h"
> @@ -195,8 +196,8 @@ static ssize_t nfs4_copy_file_range(struct file *file_in, loff_t pos_in,
>  	ret = __nfs4_copy_file_range(file_in, pos_in, file_out, pos_out, count,
>  				     flags);
>  	if (ret == -EOPNOTSUPP || ret == -EXDEV)
> -		ret = generic_copy_file_range(file_in, pos_in, file_out,
> -					      pos_out, count, flags);
> +		ret = splice_copy_file_range(file_in, pos_in, file_out,
> +					     pos_out, count);
>  	return ret;
>  }
>  
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 01a14570015b..97a9d5c7ad96 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1396,40 +1396,6 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd,
>  }
>  #endif
>  
> -/**
> - * generic_copy_file_range - copy data between two files
> - * @file_in:	file structure to read from
> - * @pos_in:	file offset to read from
> - * @file_out:	file structure to write data to
> - * @pos_out:	file offset to write data to
> - * @len:	amount of data to copy
> - * @flags:	copy flags
> - *
> - * This is a generic filesystem helper to copy data from one file to another.
> - * It has no constraints on the source or destination file owners - the files
> - * can belong to different superblocks and different filesystem types. Short
> - * copies are allowed.
> - *
> - * This should be called from the @file_out filesystem, as per the
> - * ->copy_file_range() method.
> - *
> - * Returns the number of bytes copied or a negative error indicating the
> - * failure.
> - */
> -
> -ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
> -				struct file *file_out, loff_t pos_out,
> -				size_t len, unsigned int flags)
> -{
> -	/* May only be called from within ->copy_file_range() methods */
> -	if (WARN_ON_ONCE(flags))
> -		return -EINVAL;
> -
> -	return splice_file_range(file_in, &pos_in, file_out, &pos_out,
> -				 min_t(size_t, len, MAX_RW_COUNT));
> -}
> -EXPORT_SYMBOL(generic_copy_file_range);
> -
>  /*
>   * Performs necessary checks before doing a file copy
>   *
> diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
> index ea3a7a668b45..ee461bf0ef63 100644
> --- a/fs/smb/client/cifsfs.c
> +++ b/fs/smb/client/cifsfs.c
> @@ -25,6 +25,7 @@
>  #include <linux/freezer.h>
>  #include <linux/namei.h>
>  #include <linux/random.h>
> +#include <linux/splice.h>
>  #include <linux/uuid.h>
>  #include <linux/xattr.h>
>  #include <uapi/linux/magic.h>
> @@ -1362,8 +1363,8 @@ static ssize_t cifs_copy_file_range(struct file *src_file, loff_t off,
>  	free_xid(xid);
>  
>  	if (rc == -EOPNOTSUPP || rc == -EXDEV)
> -		rc = generic_copy_file_range(src_file, off, dst_file,
> -					     destoff, len, flags);
> +		rc = splice_copy_file_range(src_file, off, dst_file,
> +					    destoff, len);
>  	return rc;
>  }
>  
> diff --git a/fs/splice.c b/fs/splice.c
> index 7cda013e5a1e..24bd93f8e4c3 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -1243,7 +1243,7 @@ EXPORT_SYMBOL(do_splice_direct);
>   * @len:	number of bytes to splice
>   *
>   * Description:
> - *    For use by generic_copy_file_range() and ->copy_file_range() methods.
> + *    For use by ->copy_file_range() methods.
>   *    Like do_splice_direct(), but vfs_copy_file_range() already holds
>   *    start_file_write() on @out file.
>   *
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 04422a0eccdd..900d0cd55b50 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2090,9 +2090,6 @@ extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
>  extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
>  extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *,
>  				   loff_t, size_t, unsigned int);
> -extern ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
> -				       struct file *file_out, loff_t pos_out,
> -				       size_t len, unsigned int flags);
>  int __generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
>  				    struct file *file_out, loff_t pos_out,
>  				    loff_t *len, unsigned int remap_flags,
> diff --git a/include/linux/splice.h b/include/linux/splice.h
> index 49532d5dda52..b92c4676c59b 100644
> --- a/include/linux/splice.h
> +++ b/include/linux/splice.h
> @@ -89,6 +89,14 @@ long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
>  long splice_file_range(struct file *in, loff_t *ppos, struct file *out,
>  		       loff_t *opos, size_t len);
>  
> +static inline long splice_copy_file_range(struct file *in, loff_t pos_in,
> +					  struct file *out, loff_t pos_out,
> +					  size_t len)
> +{
> +	return splice_file_range(in, &pos_in, out, &pos_out,
> +				      min_t(size_t, len, MAX_RW_COUNT));
> +}
> +
>  extern long do_tee(struct file *in, struct file *out, size_t len,
>  		   unsigned int flags);
>  extern ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out,
> -- 
> 2.34.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/4] fsnotify: split fsnotify_perm() into two hooks
  2023-12-07 12:38 ` [PATCH 2/4] fsnotify: split fsnotify_perm() into two hooks Amir Goldstein
@ 2023-12-08 18:33   ` Jan Kara
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2023-12-08 18:33 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Christian Brauner, Jan Kara, Jeff Layton, Josef Bacik,
	Christoph Hellwig, David Howells, Jens Axboe, Miklos Szeredi,
	Al Viro, linux-fsdevel

On Thu 07-12-23 14:38:23, Amir Goldstein wrote:
> We would like to make changes to the fsnotify access permission hook -
> add file range arguments and add the pre modify event.
> 
> In preparation for these changes, split the fsnotify_perm() hook into
> fsnotify_open_perm() and fsnotify_file_perm().
> 
> This is needed for fanotify "pre content" events.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  include/linux/fsnotify.h | 34 +++++++++++++++++++---------------
>  security/security.c      |  4 ++--
>  2 files changed, 21 insertions(+), 17 deletions(-)
> 
> diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
> index bcb6609b54b3..926bb4461b9e 100644
> --- a/include/linux/fsnotify.h
> +++ b/include/linux/fsnotify.h
> @@ -100,29 +100,33 @@ static inline int fsnotify_file(struct file *file, __u32 mask)
>  	return fsnotify_parent(path->dentry, mask, path, FSNOTIFY_EVENT_PATH);
>  }
>  
> -/* Simple call site for access decisions */
> -static inline int fsnotify_perm(struct file *file, int mask)
> +/*
> + * fsnotify_file_perm - permission hook before file access
> + */
> +static inline int fsnotify_file_perm(struct file *file, int perm_mask)
>  {
> -	int ret;
> -	__u32 fsnotify_mask = 0;
> +	__u32 fsnotify_mask = FS_ACCESS_PERM;
>  
> -	if (!(mask & (MAY_READ | MAY_OPEN)))
> +	if (!(perm_mask & MAY_READ))
>  		return 0;
>  
> -	if (mask & MAY_OPEN) {
> -		fsnotify_mask = FS_OPEN_PERM;
> +	return fsnotify_file(file, fsnotify_mask);
> +}
>  
> -		if (file->f_flags & __FMODE_EXEC) {
> -			ret = fsnotify_file(file, FS_OPEN_EXEC_PERM);
> +/*
> + * fsnotify_open_perm - permission hook before file open
> + */
> +static inline int fsnotify_open_perm(struct file *file)
> +{
> +	int ret;
>  
> -			if (ret)
> -				return ret;
> -		}
> -	} else if (mask & MAY_READ) {
> -		fsnotify_mask = FS_ACCESS_PERM;
> +	if (file->f_flags & __FMODE_EXEC) {
> +		ret = fsnotify_file(file, FS_OPEN_EXEC_PERM);
> +		if (ret)
> +			return ret;
>  	}
>  
> -	return fsnotify_file(file, fsnotify_mask);
> +	return fsnotify_file(file, FS_OPEN_PERM);
>  }
>  
>  /*
> diff --git a/security/security.c b/security/security.c
> index dcb3e7014f9b..d7f3703c5905 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -2586,7 +2586,7 @@ int security_file_permission(struct file *file, int mask)
>  	if (ret)
>  		return ret;
>  
> -	return fsnotify_perm(file, mask);
> +	return fsnotify_file_perm(file, mask);
>  }
>  
>  /**
> @@ -2837,7 +2837,7 @@ int security_file_open(struct file *file)
>  	if (ret)
>  		return ret;
>  
> -	return fsnotify_perm(file, MAY_OPEN);
> +	return fsnotify_open_perm(file);
>  }
>  
>  /**
> -- 
> 2.34.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] fsnotify: assert that file_start_write() is not held in permission hooks
  2023-12-07 12:38 ` [PATCH 3/4] fsnotify: assert that file_start_write() is not held in permission hooks Amir Goldstein
@ 2023-12-08 18:46   ` Jan Kara
  2023-12-08 21:02     ` Amir Goldstein
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Kara @ 2023-12-08 18:46 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Christian Brauner, Jan Kara, Jeff Layton, Josef Bacik,
	Christoph Hellwig, David Howells, Jens Axboe, Miklos Szeredi,
	Al Viro, linux-fsdevel

On Thu 07-12-23 14:38:24, Amir Goldstein wrote:
> filesystem may be modified in the context of fanotify permission events
> (e.g. by HSM service), so assert that sb freeze protection is not held.
> 
> If the assertion fails, then the following deadlock would be possible:
> 
> CPU0				CPU1			CPU2
> -------------------------------------------------------------------------
> file_start_write()#0
> ...
>   fsnotify_perm()
>     fanotify_get_response() =>	(read event and fill file)
> 				...
> 				...			freeze_super()
> 				...			  sb_wait_write()
> 				...
> 				vfs_write()
> 				  file_start_write()#1
> 
> This example demonstrates a use case of an hierarchical storage management
> (HSM) service that uses fanotify permission events to fill the content of
> a file before access, while a 3rd process starts fsfreeze.
> 
> This creates a circular dependeny:
>   file_start_write()#0 => fanotify_get_response =>
>     file_start_write()#1 =>
>       sb_wait_write() =>
>         file_end_write()#0
> 
> Where file_end_write()#0 can never be called and none of the threads can
> make progress.
> 
> The assertion is checked for both MAY_READ and MAY_WRITE permission
> hooks in preparation for a pre-modify permission event.
> 
> The assertion is not checked for an open permission event, because
> do_open() takes mnt_want_write() in O_TRUNC case, meaning that it is not
> safe to write to filesystem in the content of an open permission event.
				     ^^^^^ context

BTW, isn't this a bit inconvenient? I mean filling file contents on open
looks quite natural... Do you plan to fill files only on individual read /
write events? I was under the impression simple HSM handlers would be doing
it on open.
 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

Anyway this particular change looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  include/linux/fsnotify.h | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
> index 926bb4461b9e..0a9d6a8a747a 100644
> --- a/include/linux/fsnotify.h
> +++ b/include/linux/fsnotify.h
> @@ -107,6 +107,13 @@ static inline int fsnotify_file_perm(struct file *file, int perm_mask)
>  {
>  	__u32 fsnotify_mask = FS_ACCESS_PERM;
>  
> +	/*
> +	 * filesystem may be modified in the context of permission events
> +	 * (e.g. by HSM filling a file on access), so sb freeze protection
> +	 * must not be held.
> +	 */
> +	lockdep_assert_once(file_write_not_started(file));
> +
>  	if (!(perm_mask & MAY_READ))
>  		return 0;
>  
> -- 
> 2.34.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] fsnotify: pass access range in file permission hooks
  2023-12-07 12:38 ` [PATCH 4/4] fsnotify: pass access range in file " Amir Goldstein
  2023-12-08 17:52   ` Christian Brauner
@ 2023-12-08 18:53   ` Jan Kara
  2023-12-08 21:34     ` Amir Goldstein
  1 sibling, 1 reply; 25+ messages in thread
From: Jan Kara @ 2023-12-08 18:53 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Christian Brauner, Jan Kara, Jeff Layton, Josef Bacik,
	Christoph Hellwig, David Howells, Jens Axboe, Miklos Szeredi,
	Al Viro, linux-fsdevel

On Thu 07-12-23 14:38:25, Amir Goldstein wrote:
> In preparation for pre-content permission events with file access range,
> move fsnotify_file_perm() hook out of security_file_permission() and into
> the callers that have the access range information and pass the access
> range to fsnotify_file_perm().
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

So why don't you pass the range into security_file_permission() instead of
pulling fsnotify out of the hook? I mean conceptually the accessed range
makes sense for the hook as well although no LSM currently cares about it.
Also it would address the Christian's concern.

> diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
> index 0a9d6a8a747a..45e6ecbca057 100644
> --- a/include/linux/fsnotify.h
> +++ b/include/linux/fsnotify.h
> @@ -103,7 +103,8 @@ static inline int fsnotify_file(struct file *file, __u32 mask)
>  /*
>   * fsnotify_file_perm - permission hook before file access
>   */
> -static inline int fsnotify_file_perm(struct file *file, int perm_mask)
> +static inline int fsnotify_file_perm(struct file *file, int perm_mask,
> +				     const loff_t *ppos, size_t count)
>  {
>  	__u32 fsnotify_mask = FS_ACCESS_PERM;

Also why do you actually pass in loff_t * instead of plain loff_t? You
don't plan to change it, do you?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] fsnotify: assert that file_start_write() is not held in permission hooks
  2023-12-08 18:46   ` Jan Kara
@ 2023-12-08 21:02     ` Amir Goldstein
  2023-12-11 10:30       ` Jan Kara
  0 siblings, 1 reply; 25+ messages in thread
From: Amir Goldstein @ 2023-12-08 21:02 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christian Brauner, Jeff Layton, Josef Bacik, Christoph Hellwig,
	David Howells, Jens Axboe, Miklos Szeredi, Al Viro,
	linux-fsdevel

On Fri, Dec 8, 2023 at 8:46 PM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 07-12-23 14:38:24, Amir Goldstein wrote:
> > filesystem may be modified in the context of fanotify permission events
> > (e.g. by HSM service), so assert that sb freeze protection is not held.
> >
> > If the assertion fails, then the following deadlock would be possible:
> >
> > CPU0                          CPU1                    CPU2
> > -------------------------------------------------------------------------
> > file_start_write()#0
> > ...
> >   fsnotify_perm()
> >     fanotify_get_response() =>        (read event and fill file)
> >                               ...
> >                               ...                     freeze_super()
> >                               ...                       sb_wait_write()
> >                               ...
> >                               vfs_write()
> >                                 file_start_write()#1
> >
> > This example demonstrates a use case of an hierarchical storage management
> > (HSM) service that uses fanotify permission events to fill the content of
> > a file before access, while a 3rd process starts fsfreeze.
> >
> > This creates a circular dependeny:
> >   file_start_write()#0 => fanotify_get_response =>
> >     file_start_write()#1 =>
> >       sb_wait_write() =>
> >         file_end_write()#0
> >
> > Where file_end_write()#0 can never be called and none of the threads can
> > make progress.
> >
> > The assertion is checked for both MAY_READ and MAY_WRITE permission
> > hooks in preparation for a pre-modify permission event.
> >
> > The assertion is not checked for an open permission event, because
> > do_open() takes mnt_want_write() in O_TRUNC case, meaning that it is not
> > safe to write to filesystem in the content of an open permission event.
>                                      ^^^^^ context
>
> BTW, isn't this a bit inconvenient? I mean filling file contents on open
> looks quite natural... Do you plan to fill files only on individual read /
> write events? I was under the impression simple HSM handlers would be doing
> it on open.
>

Naive HSMs perhaps... The problem with filling on open is that the pattern
open();fstat();close() is quite common with applications and we found open()
to be a sub-optimal predicate for near future read().

Filling the file on first read() access or directory on first
readdir() access does
a better job in "swapping in" the correct files.
A simple HSM would just fill the entire file/dir on the first PRE_ACCESS event.
that's not any more or less simple than filling it on an OPEN_PERM event.

Another point that could get lost when reading to above deadlock is that
filling the file content before open(O_TRUNC) would be really dumb,
because swap in is costly and you are going to throw away the data.
If we really wanted to provide HSM with a safe way to fill files on open,
we would probably need to report the open flags with the open event.
I actually think that reporting the open flags would be nice even with
an async open event.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] fsnotify: pass access range in file permission hooks
  2023-12-08 18:53   ` Jan Kara
@ 2023-12-08 21:34     ` Amir Goldstein
  2023-12-10 13:24       ` Amir Goldstein
  0 siblings, 1 reply; 25+ messages in thread
From: Amir Goldstein @ 2023-12-08 21:34 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christian Brauner, Jeff Layton, Josef Bacik, Christoph Hellwig,
	David Howells, Jens Axboe, Miklos Szeredi, Al Viro,
	linux-fsdevel

On Fri, Dec 8, 2023 at 8:53 PM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 07-12-23 14:38:25, Amir Goldstein wrote:
> > In preparation for pre-content permission events with file access range,
> > move fsnotify_file_perm() hook out of security_file_permission() and into
> > the callers that have the access range information and pass the access
> > range to fsnotify_file_perm().
> >
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>
> So why don't you pass the range into security_file_permission() instead of
> pulling fsnotify out of the hook? I mean conceptually the accessed range
> makes sense for the hook as well although no LSM currently cares about it.
> Also it would address the Christian's concern.
>

I don't think it would be nice if the signature of
security_file_permission() did not match the signature of the LSM
methods of the same name (i.e. call_int_hook(file_permission, 0, file, mask);

I think that calling fsnotify_perm(), which is not an LSM,
from security_file_permission()/security_file_open() was a hack to
begin with and that decoupling fsnotify_perm() hooks from security
hooks is a good thing.

Also, it is needed for future work. If you look at commit
"fs: hold s_write_srcu for pre-modify permission events on write"
on my sb_write_barrier branch, you will see that the fsnotify
modify permission hook moves (again) into start_file_write_area().

So yes, fsnotify permission hooks become first class vfs hooks and
not LSM hooks, but then again, fsnotify_xxx hooks for async events
are already first class vfs hooks, and many times in the same
vfs_xxx helpers that will have the fsnotify permission hooks in them.

> > diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
> > index 0a9d6a8a747a..45e6ecbca057 100644
> > --- a/include/linux/fsnotify.h
> > +++ b/include/linux/fsnotify.h
> > @@ -103,7 +103,8 @@ static inline int fsnotify_file(struct file *file, __u32 mask)
> >  /*
> >   * fsnotify_file_perm - permission hook before file access
> >   */
> > -static inline int fsnotify_file_perm(struct file *file, int perm_mask)
> > +static inline int fsnotify_file_perm(struct file *file, int perm_mask,
> > +                                  const loff_t *ppos, size_t count)
> >  {
> >       __u32 fsnotify_mask = FS_ACCESS_PERM;
>
> Also why do you actually pass in loff_t * instead of plain loff_t? You
> don't plan to change it, do you?

No I don't.

I used NULL to communicate "no range info" to fanotify.
It is currently only used from iterate_dir(), but filesystems may need to
use that to report other cases of pre-content access with no clear range info.

I could leave fsnotify_file_perm(file, mask) for reporting events without
range info and add fsnotify_file_area(file, mask, pos, count) for reporting
access permission with range info.

Do you think that would be better?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/4] fs: use splice_copy_file_range() inline helper
  2023-12-08 17:33   ` Christian Brauner
@ 2023-12-10 10:07     ` Amir Goldstein
  0 siblings, 0 replies; 25+ messages in thread
From: Amir Goldstein @ 2023-12-10 10:07 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jan Kara, Jeff Layton, Josef Bacik, Christoph Hellwig,
	David Howells, Jens Axboe, Miklos Szeredi, Al Viro,
	linux-fsdevel

On Fri, Dec 8, 2023 at 7:33 PM Christian Brauner <brauner@kernel.org> wrote:
>
> > +static inline long splice_copy_file_range(struct file *in, loff_t pos_in,
> > +                                       struct file *out, loff_t pos_out,
> > +                                       size_t len)
> > +{
> > +     return splice_file_range(in, &pos_in, out, &pos_out,
> > +                                   min_t(size_t, len, MAX_RW_COUNT));
> > +}
>
> We should really cleanup the return values of the all the splice
> helpers. Most callers of splice_file_range() use ssize_t already. So
> does splice_direct_to_actor() and splice_to_socket(). IMO, all of the
> splice helpers should just be changed to return ssize_t instead of long.
> Doesn't have to be in this series though.

I agree. This is very annoying. I will add this cleanup patch to v2.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] fsnotify: pass access range in file permission hooks
  2023-12-08 21:34     ` Amir Goldstein
@ 2023-12-10 13:24       ` Amir Goldstein
  2023-12-11 11:49         ` Jan Kara
  0 siblings, 1 reply; 25+ messages in thread
From: Amir Goldstein @ 2023-12-10 13:24 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christian Brauner, Jeff Layton, Josef Bacik, Christoph Hellwig,
	David Howells, Jens Axboe, Miklos Szeredi, Al Viro,
	linux-fsdevel

> > > diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
> > > index 0a9d6a8a747a..45e6ecbca057 100644
> > > --- a/include/linux/fsnotify.h
> > > +++ b/include/linux/fsnotify.h
> > > @@ -103,7 +103,8 @@ static inline int fsnotify_file(struct file *file, __u32 mask)
> > >  /*
> > >   * fsnotify_file_perm - permission hook before file access
> > >   */
> > > -static inline int fsnotify_file_perm(struct file *file, int perm_mask)
> > > +static inline int fsnotify_file_perm(struct file *file, int perm_mask,
> > > +                                  const loff_t *ppos, size_t count)
> > >  {
> > >       __u32 fsnotify_mask = FS_ACCESS_PERM;
> >
> > Also why do you actually pass in loff_t * instead of plain loff_t? You
> > don't plan to change it, do you?
>
> No I don't.

Please note that the pointer is to const loff_t.

>
> I used NULL to communicate "no range info" to fanotify.
> It is currently only used from iterate_dir(), but filesystems may need to
> use that to report other cases of pre-content access with no clear range info.

Correction. iterate_dir() is not the only case.
The callers that use file_ppos(), namely ksys_{read,write}, do_{readv,writev}()
will pass a NULL ppos for an FMODE_STREAM file.
The only sane behavior I could come up with for those cases
is to not report range_info with the FAN_PRE_ACCESS event.

>
> I could leave fsnotify_file_perm(file, mask) for reporting events without
> range info and add fsnotify_file_area(file, mask, pos, count) for reporting
> access permission with range info.
>

I renamed the hook in v2 to fsnotify_file_area_perm() and added a wrapper:

static inline int fsnotify_file_perm(struct file *file, int perm_mask)
{
        return fsnotify_file_area_perm(file, perm_mask, NULL, 0);
}

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] fsnotify: assert that file_start_write() is not held in permission hooks
  2023-12-08 21:02     ` Amir Goldstein
@ 2023-12-11 10:30       ` Jan Kara
  2023-12-11 10:57         ` Amir Goldstein
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Kara @ 2023-12-11 10:30 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Christian Brauner, Jeff Layton, Josef Bacik,
	Christoph Hellwig, David Howells, Jens Axboe, Miklos Szeredi,
	Al Viro, linux-fsdevel

On Fri 08-12-23 23:02:35, Amir Goldstein wrote:
> On Fri, Dec 8, 2023 at 8:46 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Thu 07-12-23 14:38:24, Amir Goldstein wrote:
> > > filesystem may be modified in the context of fanotify permission events
> > > (e.g. by HSM service), so assert that sb freeze protection is not held.
> > >
> > > If the assertion fails, then the following deadlock would be possible:
> > >
> > > CPU0                          CPU1                    CPU2
> > > -------------------------------------------------------------------------
> > > file_start_write()#0
> > > ...
> > >   fsnotify_perm()
> > >     fanotify_get_response() =>        (read event and fill file)
> > >                               ...
> > >                               ...                     freeze_super()
> > >                               ...                       sb_wait_write()
> > >                               ...
> > >                               vfs_write()
> > >                                 file_start_write()#1
> > >
> > > This example demonstrates a use case of an hierarchical storage management
> > > (HSM) service that uses fanotify permission events to fill the content of
> > > a file before access, while a 3rd process starts fsfreeze.
> > >
> > > This creates a circular dependeny:
> > >   file_start_write()#0 => fanotify_get_response =>
> > >     file_start_write()#1 =>
> > >       sb_wait_write() =>
> > >         file_end_write()#0
> > >
> > > Where file_end_write()#0 can never be called and none of the threads can
> > > make progress.
> > >
> > > The assertion is checked for both MAY_READ and MAY_WRITE permission
> > > hooks in preparation for a pre-modify permission event.
> > >
> > > The assertion is not checked for an open permission event, because
> > > do_open() takes mnt_want_write() in O_TRUNC case, meaning that it is not
> > > safe to write to filesystem in the content of an open permission event.
> >                                      ^^^^^ context
> >
> > BTW, isn't this a bit inconvenient? I mean filling file contents on open
> > looks quite natural... Do you plan to fill files only on individual read /
> > write events? I was under the impression simple HSM handlers would be doing
> > it on open.
> >
> 
> Naive HSMs perhaps... The problem with filling on open is that the pattern
> open();fstat();close() is quite common with applications and we found open()
> to be a sub-optimal predicate for near future read().
> 
> Filling the file on first read() access or directory on first
> readdir() access does
> a better job in "swapping in" the correct files.
> A simple HSM would just fill the entire file/dir on the first PRE_ACCESS event.
> that's not any more or less simple than filling it on an OPEN_PERM event.
> 
> Another point that could get lost when reading to above deadlock is that
> filling the file content before open(O_TRUNC) would be really dumb,
> because swap in is costly and you are going to throw away the data.
> If we really wanted to provide HSM with a safe way to fill files on open,
> we would probably need to report the open flags with the open event.
> I actually think that reporting the open flags would be nice even with
> an async open event.

OK, thanks for explanation!

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] fsnotify: assert that file_start_write() is not held in permission hooks
  2023-12-11 10:30       ` Jan Kara
@ 2023-12-11 10:57         ` Amir Goldstein
  0 siblings, 0 replies; 25+ messages in thread
From: Amir Goldstein @ 2023-12-11 10:57 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christian Brauner, Jeff Layton, Josef Bacik, Christoph Hellwig,
	David Howells, Jens Axboe, Miklos Szeredi, Al Viro,
	linux-fsdevel

On Mon, Dec 11, 2023 at 12:30 PM Jan Kara <jack@suse.cz> wrote:
>
> On Fri 08-12-23 23:02:35, Amir Goldstein wrote:
> > On Fri, Dec 8, 2023 at 8:46 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > On Thu 07-12-23 14:38:24, Amir Goldstein wrote:
> > > > filesystem may be modified in the context of fanotify permission events
> > > > (e.g. by HSM service), so assert that sb freeze protection is not held.
> > > >
> > > > If the assertion fails, then the following deadlock would be possible:
> > > >
> > > > CPU0                          CPU1                    CPU2
> > > > -------------------------------------------------------------------------
> > > > file_start_write()#0
> > > > ...
> > > >   fsnotify_perm()
> > > >     fanotify_get_response() =>        (read event and fill file)
> > > >                               ...
> > > >                               ...                     freeze_super()
> > > >                               ...                       sb_wait_write()
> > > >                               ...
> > > >                               vfs_write()
> > > >                                 file_start_write()#1
> > > >
> > > > This example demonstrates a use case of an hierarchical storage management
> > > > (HSM) service that uses fanotify permission events to fill the content of
> > > > a file before access, while a 3rd process starts fsfreeze.
> > > >
> > > > This creates a circular dependeny:
> > > >   file_start_write()#0 => fanotify_get_response =>
> > > >     file_start_write()#1 =>
> > > >       sb_wait_write() =>
> > > >         file_end_write()#0
> > > >
> > > > Where file_end_write()#0 can never be called and none of the threads can
> > > > make progress.
> > > >
> > > > The assertion is checked for both MAY_READ and MAY_WRITE permission
> > > > hooks in preparation for a pre-modify permission event.
> > > >
> > > > The assertion is not checked for an open permission event, because
> > > > do_open() takes mnt_want_write() in O_TRUNC case, meaning that it is not
> > > > safe to write to filesystem in the content of an open permission event.
> > >                                      ^^^^^ context
> > >
> > > BTW, isn't this a bit inconvenient? I mean filling file contents on open
> > > looks quite natural... Do you plan to fill files only on individual read /
> > > write events? I was under the impression simple HSM handlers would be doing
> > > it on open.
> > >
> >
> > Naive HSMs perhaps... The problem with filling on open is that the pattern
> > open();fstat();close() is quite common with applications and we found open()
> > to be a sub-optimal predicate for near future read().
> >
> > Filling the file on first read() access or directory on first
> > readdir() access does
> > a better job in "swapping in" the correct files.
> > A simple HSM would just fill the entire file/dir on the first PRE_ACCESS event.
> > that's not any more or less simple than filling it on an OPEN_PERM event.
> >
> > Another point that could get lost when reading to above deadlock is that
> > filling the file content before open(O_TRUNC) would be really dumb,
> > because swap in is costly and you are going to throw away the data.
> > If we really wanted to provide HSM with a safe way to fill files on open,
> > we would probably need to report the open flags with the open event.
> > I actually think that reporting the open flags would be nice even with
> > an async open event.
>
> OK, thanks for explanation!

Anyway, I will try to find a solution also for the OPEN_PERM deadlock.
I have some sketches, but ->atomic_open() complicates things...

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] fsnotify: pass access range in file permission hooks
  2023-12-10 13:24       ` Amir Goldstein
@ 2023-12-11 11:49         ` Jan Kara
  2023-12-11 12:00           ` Amir Goldstein
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Kara @ 2023-12-11 11:49 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Christian Brauner, Jeff Layton, Josef Bacik,
	Christoph Hellwig, David Howells, Jens Axboe, Miklos Szeredi,
	Al Viro, linux-fsdevel

On Sun 10-12-23 15:24:00, Amir Goldstein wrote:
> > > > diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
> > > > index 0a9d6a8a747a..45e6ecbca057 100644
> > > > --- a/include/linux/fsnotify.h
> > > > +++ b/include/linux/fsnotify.h
> > > > @@ -103,7 +103,8 @@ static inline int fsnotify_file(struct file *file, __u32 mask)
> > > >  /*
> > > >   * fsnotify_file_perm - permission hook before file access
> > > >   */
> > > > -static inline int fsnotify_file_perm(struct file *file, int perm_mask)
> > > > +static inline int fsnotify_file_perm(struct file *file, int perm_mask,
> > > > +                                  const loff_t *ppos, size_t count)
> > > >  {
> > > >       __u32 fsnotify_mask = FS_ACCESS_PERM;
> > >
> > > Also why do you actually pass in loff_t * instead of plain loff_t? You
> > > don't plan to change it, do you?
> >
> > No I don't.
> 
> Please note that the pointer is to const loff_t.
> 
> >
> > I used NULL to communicate "no range info" to fanotify.
> > It is currently only used from iterate_dir(), but filesystems may need to
> > use that to report other cases of pre-content access with no clear range info.
> 
> Correction. iterate_dir() is not the only case.
> The callers that use file_ppos(), namely ksys_{read,write}, do_{readv,writev}()
> will pass a NULL ppos for an FMODE_STREAM file.
> The only sane behavior I could come up with for those cases
> is to not report range_info with the FAN_PRE_ACCESS event.

OK, understood. But isn't anything with len == 0 in fact "no valid range
provided" case? So we could use that to identify a case where we simply
don't report any range with the event without a need to pass the pointer?

> > I could leave fsnotify_file_perm(file, mask) for reporting events without
> > range info and add fsnotify_file_area(file, mask, pos, count) for reporting
> > access permission with range info.
> >
> 
> I renamed the hook in v2 to fsnotify_file_area_perm() and added a wrapper:
> 
> static inline int fsnotify_file_perm(struct file *file, int perm_mask)
> {
>         return fsnotify_file_area_perm(file, perm_mask, NULL, 0);
> }

Otherwise this works for me as well.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] fsnotify: pass access range in file permission hooks
  2023-12-11 11:49         ` Jan Kara
@ 2023-12-11 12:00           ` Amir Goldstein
  2023-12-11 14:53             ` Jan Kara
  0 siblings, 1 reply; 25+ messages in thread
From: Amir Goldstein @ 2023-12-11 12:00 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christian Brauner, Jeff Layton, Josef Bacik, Christoph Hellwig,
	David Howells, Jens Axboe, Miklos Szeredi, Al Viro,
	linux-fsdevel

On Mon, Dec 11, 2023 at 1:49 PM Jan Kara <jack@suse.cz> wrote:
>
> On Sun 10-12-23 15:24:00, Amir Goldstein wrote:
> > > > > diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
> > > > > index 0a9d6a8a747a..45e6ecbca057 100644
> > > > > --- a/include/linux/fsnotify.h
> > > > > +++ b/include/linux/fsnotify.h
> > > > > @@ -103,7 +103,8 @@ static inline int fsnotify_file(struct file *file, __u32 mask)
> > > > >  /*
> > > > >   * fsnotify_file_perm - permission hook before file access
> > > > >   */
> > > > > -static inline int fsnotify_file_perm(struct file *file, int perm_mask)
> > > > > +static inline int fsnotify_file_perm(struct file *file, int perm_mask,
> > > > > +                                  const loff_t *ppos, size_t count)
> > > > >  {
> > > > >       __u32 fsnotify_mask = FS_ACCESS_PERM;
> > > >
> > > > Also why do you actually pass in loff_t * instead of plain loff_t? You
> > > > don't plan to change it, do you?
> > >
> > > No I don't.
> >
> > Please note that the pointer is to const loff_t.
> >
> > >
> > > I used NULL to communicate "no range info" to fanotify.
> > > It is currently only used from iterate_dir(), but filesystems may need to
> > > use that to report other cases of pre-content access with no clear range info.
> >
> > Correction. iterate_dir() is not the only case.
> > The callers that use file_ppos(), namely ksys_{read,write}, do_{readv,writev}()
> > will pass a NULL ppos for an FMODE_STREAM file.
> > The only sane behavior I could come up with for those cases
> > is to not report range_info with the FAN_PRE_ACCESS event.
>
> OK, understood. But isn't anything with len == 0 in fact "no valid range
> provided" case? So we could use that to identify a case where we simply
> don't report any range with the event without a need to pass the pointer?
>

IDK. read(2) and pread(2) with count=0 is a valid use case.
and we have reported FAN_ACCESS_PERM for those 0 length calls so far.
reporting those call with no range would be bad for HSM, because all
HSM can do with these events is to fill the entire file content.

Filling the entire file content is a valid action if the backing file does not
support seek or if it is a directory, but it is not a valid action for
an application
induced pread() with zero length.

Did I misunderstand what you meant?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] fsnotify: pass access range in file permission hooks
  2023-12-11 12:00           ` Amir Goldstein
@ 2023-12-11 14:53             ` Jan Kara
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2023-12-11 14:53 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Christian Brauner, Jeff Layton, Josef Bacik,
	Christoph Hellwig, David Howells, Jens Axboe, Miklos Szeredi,
	Al Viro, linux-fsdevel

On Mon 11-12-23 14:00:58, Amir Goldstein wrote:
> On Mon, Dec 11, 2023 at 1:49 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Sun 10-12-23 15:24:00, Amir Goldstein wrote:
> > > > > > diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
> > > > > > index 0a9d6a8a747a..45e6ecbca057 100644
> > > > > > --- a/include/linux/fsnotify.h
> > > > > > +++ b/include/linux/fsnotify.h
> > > > > > @@ -103,7 +103,8 @@ static inline int fsnotify_file(struct file *file, __u32 mask)
> > > > > >  /*
> > > > > >   * fsnotify_file_perm - permission hook before file access
> > > > > >   */
> > > > > > -static inline int fsnotify_file_perm(struct file *file, int perm_mask)
> > > > > > +static inline int fsnotify_file_perm(struct file *file, int perm_mask,
> > > > > > +                                  const loff_t *ppos, size_t count)
> > > > > >  {
> > > > > >       __u32 fsnotify_mask = FS_ACCESS_PERM;
> > > > >
> > > > > Also why do you actually pass in loff_t * instead of plain loff_t? You
> > > > > don't plan to change it, do you?
> > > >
> > > > No I don't.
> > >
> > > Please note that the pointer is to const loff_t.
> > >
> > > >
> > > > I used NULL to communicate "no range info" to fanotify.
> > > > It is currently only used from iterate_dir(), but filesystems may need to
> > > > use that to report other cases of pre-content access with no clear range info.
> > >
> > > Correction. iterate_dir() is not the only case.
> > > The callers that use file_ppos(), namely ksys_{read,write}, do_{readv,writev}()
> > > will pass a NULL ppos for an FMODE_STREAM file.
> > > The only sane behavior I could come up with for those cases
> > > is to not report range_info with the FAN_PRE_ACCESS event.
> >
> > OK, understood. But isn't anything with len == 0 in fact "no valid range
> > provided" case? So we could use that to identify a case where we simply
> > don't report any range with the event without a need to pass the pointer?
> >
> 
> IDK. read(2) and pread(2) with count=0 is a valid use case.
> and we have reported FAN_ACCESS_PERM for those 0 length calls so far.
> reporting those call with no range would be bad for HSM, because all
> HSM can do with these events is to fill the entire file content.

Yes, reading or writing 0 bytes is valid but I was wondering whether we
are issuing event for that. Now that I've checked we indeed do so let's not
complicate this further...

> Filling the entire file content is a valid action if the backing file does not
> support seek or if it is a directory, but it is not a valid action for
> an application
> induced pread() with zero length.
> 
> Did I misunderstand what you meant?

Yeah, ok, let's leave the calling convention as you have it now. We can 
always change it when we come up with something more elegant.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] Prepare for fsnotify pre-content permission events
  2023-12-08  7:34   ` Amir Goldstein
@ 2023-12-15 17:00     ` Amir Goldstein
  2023-12-15 20:04       ` Josef Bacik
  0 siblings, 1 reply; 25+ messages in thread
From: Amir Goldstein @ 2023-12-15 17:00 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Christian Brauner, Jan Kara, Jeff Layton, Christoph Hellwig,
	David Howells, Jens Axboe, Miklos Szeredi, Al Viro,
	linux-fsdevel

On Fri, Dec 8, 2023 at 9:34 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Dec 7, 2023 at 11:51 PM Josef Bacik <josef@toxicpanda.com> wrote:
> >
> > On Thu, Dec 07, 2023 at 02:38:21PM +0200, Amir Goldstein wrote:
> > > Hi Jan & Christian,
> > >
> > > I am not planning to post the fanotify pre-content event patches [1]
> > > for 6.8.  Not because they are not ready, but because the usersapce
> > > example is not ready.
> > >
> > > Also, I think it is a good idea to let the large permission hooks
> > > cleanup work to mature over the 6.8 cycle, before we introduce the
> > > pre-content events.
> > >
> > > However, I would like to include the following vfs prep patches along
> > > with the vfs.rw PR for 6.8, which could be titled as the subject of
> > > this cover letter.
> > >
> > > Patch 1 is a variant of a cleanup suggested by Christoph to get rid
> > > of the generic_copy_file_range() exported symbol.
> > >
> > > Patches 2,3 add the file_write_not_started() assertion to fsnotify
> > > file permission hooks.  IMO, it is important to merge it along with
> > > vfs.rw because:
> > >
> > > 1. This assert is how I tested vfs.rw does what it aimed to achieve
> > > 2. This will protect us from new callers that break the new order
> > > 3. The commit message of patch 3 provides the context for the entire
> > >    series and can be included in the PR message
> > >
> > > Patch 4 is the final change of fsnotify permission hook locations/args
> > > and is the last of the vfs prerequsites for pre-content events.
> > >
> > > If we merge patch 4 for 6.8, it will be much easier for the development
> > > of fanotify pre-content events in 6.9 dev cycle, which be contained
> > > within the fsnotify subsystem.
> >
> > Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> >
> > Can you get an fstest added that exercises the freeze deadlock?
>
> I suppose that you mean a test that exercises the lockdep assertion?
> This is much easier to do, so I don't see the point in actually testing
> the deadlock. The only thing is that the assertion will not be backported
> so this test would protect us from future regression, but will not nudge
> stable kernel users to backport the deadlock fix, which I don't think they
> should be doing anyway.
>
> It is actually already exercised by tests overlay/068,069, but I can add
> a generic test to get wider testing coverage.

Here is a WIP test:
https://github.com/amir73il/xfstests/commits/start-write-safe

I tested it by reverting "fs: move file_start_write() into
direct_splice_actor()"
and seeing that it triggers the assert.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] Prepare for fsnotify pre-content permission events
  2023-12-15 17:00     ` Amir Goldstein
@ 2023-12-15 20:04       ` Josef Bacik
  0 siblings, 0 replies; 25+ messages in thread
From: Josef Bacik @ 2023-12-15 20:04 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Christian Brauner, Jan Kara, Jeff Layton, Christoph Hellwig,
	David Howells, Jens Axboe, Miklos Szeredi, Al Viro,
	linux-fsdevel

On Fri, Dec 15, 2023 at 07:00:08PM +0200, Amir Goldstein wrote:
> On Fri, Dec 8, 2023 at 9:34 AM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Thu, Dec 7, 2023 at 11:51 PM Josef Bacik <josef@toxicpanda.com> wrote:
> > >
> > > On Thu, Dec 07, 2023 at 02:38:21PM +0200, Amir Goldstein wrote:
> > > > Hi Jan & Christian,
> > > >
> > > > I am not planning to post the fanotify pre-content event patches [1]
> > > > for 6.8.  Not because they are not ready, but because the usersapce
> > > > example is not ready.
> > > >
> > > > Also, I think it is a good idea to let the large permission hooks
> > > > cleanup work to mature over the 6.8 cycle, before we introduce the
> > > > pre-content events.
> > > >
> > > > However, I would like to include the following vfs prep patches along
> > > > with the vfs.rw PR for 6.8, which could be titled as the subject of
> > > > this cover letter.
> > > >
> > > > Patch 1 is a variant of a cleanup suggested by Christoph to get rid
> > > > of the generic_copy_file_range() exported symbol.
> > > >
> > > > Patches 2,3 add the file_write_not_started() assertion to fsnotify
> > > > file permission hooks.  IMO, it is important to merge it along with
> > > > vfs.rw because:
> > > >
> > > > 1. This assert is how I tested vfs.rw does what it aimed to achieve
> > > > 2. This will protect us from new callers that break the new order
> > > > 3. The commit message of patch 3 provides the context for the entire
> > > >    series and can be included in the PR message
> > > >
> > > > Patch 4 is the final change of fsnotify permission hook locations/args
> > > > and is the last of the vfs prerequsites for pre-content events.
> > > >
> > > > If we merge patch 4 for 6.8, it will be much easier for the development
> > > > of fanotify pre-content events in 6.9 dev cycle, which be contained
> > > > within the fsnotify subsystem.
> > >
> > > Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> > >
> > > Can you get an fstest added that exercises the freeze deadlock?
> >
> > I suppose that you mean a test that exercises the lockdep assertion?
> > This is much easier to do, so I don't see the point in actually testing
> > the deadlock. The only thing is that the assertion will not be backported
> > so this test would protect us from future regression, but will not nudge
> > stable kernel users to backport the deadlock fix, which I don't think they
> > should be doing anyway.
> >
> > It is actually already exercised by tests overlay/068,069, but I can add
> > a generic test to get wider testing coverage.
> 
> Here is a WIP test:
> https://github.com/amir73il/xfstests/commits/start-write-safe
> 
> I tested it by reverting "fs: move file_start_write() into
> direct_splice_actor()"
> and seeing that it triggers the assert.

Perfect, this is exactly the sort of thing I was hoping for.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2023-12-15 20:04 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-07 12:38 [PATCH 0/4] Prepare for fsnotify pre-content permission events Amir Goldstein
2023-12-07 12:38 ` [PATCH 1/4] fs: use splice_copy_file_range() inline helper Amir Goldstein
2023-12-08 17:33   ` Christian Brauner
2023-12-10 10:07     ` Amir Goldstein
2023-12-08 18:27   ` Jan Kara
2023-12-07 12:38 ` [PATCH 2/4] fsnotify: split fsnotify_perm() into two hooks Amir Goldstein
2023-12-08 18:33   ` Jan Kara
2023-12-07 12:38 ` [PATCH 3/4] fsnotify: assert that file_start_write() is not held in permission hooks Amir Goldstein
2023-12-08 18:46   ` Jan Kara
2023-12-08 21:02     ` Amir Goldstein
2023-12-11 10:30       ` Jan Kara
2023-12-11 10:57         ` Amir Goldstein
2023-12-07 12:38 ` [PATCH 4/4] fsnotify: pass access range in file " Amir Goldstein
2023-12-08 17:52   ` Christian Brauner
2023-12-08 18:53   ` Jan Kara
2023-12-08 21:34     ` Amir Goldstein
2023-12-10 13:24       ` Amir Goldstein
2023-12-11 11:49         ` Jan Kara
2023-12-11 12:00           ` Amir Goldstein
2023-12-11 14:53             ` Jan Kara
2023-12-07 21:51 ` [PATCH 0/4] Prepare for fsnotify pre-content permission events Josef Bacik
2023-12-08  7:34   ` Amir Goldstein
2023-12-15 17:00     ` Amir Goldstein
2023-12-15 20:04       ` Josef Bacik
2023-12-08 17:54 ` Christian Brauner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.