linux-fsdevel.vger.kernel.org archive mirror
* [PATCH v2 00/16] Fanotify event with name info
@ 2020-02-17 13:14 Amir Goldstein
  2020-02-17 13:14 ` [PATCH v2 01/16] fsnotify: tidy up FS_ and FAN_ constants Amir Goldstein
                   ` (16 more replies)
  0 siblings, 17 replies; 65+ messages in thread
From: Amir Goldstein @ 2020-02-17 13:14 UTC
  To: Jan Kara; +Cc: linux-fsdevel, linux-api, Matthew Bobrowski

Jan,

This is v2 of the fanotify name info series.
The user requirement for the name info feature, as well as early UAPI
discussions, can be found in this lore thread [1].
The "prep" part of v1 was posted to the list [2] and includes two
minor bug fixes, but I decided not to split the submission into
two series in this posting.

The patches are also available on my GitHub branch fanotify_name [3],
along with LTP tests [4], a man page draft [5] and a demo [6].

Patches 1-7 are cleanup and minor re-factoring in prep for the name
info patches.

Patches 8-9 are fixes for minor bugs that I found during the work.
The referenced LTP branch [4] includes improvements to LTP tests
fanotify09 and fanotify15 to cover these bugs.  I did not mark those
patches for stable, because backporting is not trivial and the bugs are
really minor.  For the same reason, I did not bother to provide bug fix
patches that are not dependent on the cleanup patches.

Patches 10-13 implement the new event type FAN_DIR_MODIFY per your
suggestion, which includes the directory fid and entry name info.

Patches 14-15 implement the FAN_REPORT_NAME init flag for reporting
name info on path type events.

Patch 16 is a "bonus" patch that implements an unprivileged fanotify
watch.  It is not proposed for merging at this time, but is provided
in order to demonstrate how name info reporting is applicable for an
unprivileged watcher, should we decide to implement the feature.
LTP tests and a man page draft for unprivileged fanotify, written by
Matthew Bobrowski, are available on the fanotify_unpriv branches in the
respective trees.

The inotify-tools demo branch [6] includes a script, test_demo.sh, whose
output [7] is shown below.  The demo generates filesystem events,
including file and directory renames, sleeps for 2 seconds, then reads
the events from the queue and generates a report on changes in the
filesystem.

At event report time, the watcher uses open_by_handle_at(2) to report
up-to-date paths for parent dirs.  The last name element in the path
is reported as it was recorded at event time, but the watcher uses
fstatat(2) to check whether the reported entry is negative or positive.
Negative entry paths are annotated with "(deleted)" postfix.

The idea is that file change monitors will use this information to
query the content of modified directories and files and update a
secondary data structure or take other actions.

The demo script can run as root or as a non-root user.  When run as a
non-root user, if the bonus FAN_UNPRIVILEGED patch is applied, it
demonstrates the unprivileged fanotify recursive watcher and produces
exactly the same report information as the privileged filesystem watcher.

Thanks,
Amir.

Changes since v1:
- A few more cleanup patches
- Drop the abstract take_name_snapshot() vfs interface change
- Do not obfuscate event type for path type events
- Deal with the corner cases of event on root and disconnected dentry
- Bonus FAN_UNPRIVILEGED patch

[1] https://lore.kernel.org/linux-fsdevel/CADKPpc2RuncyN+ZONkwBqtW7iBb5ep_3yQN7PKe7ASn8DpNvBw@mail.gmail.com/
[2] https://lore.kernel.org/linux-fsdevel/20200114151655.29473-1-amir73il@gmail.com/
[3] https://github.com/amir73il/linux/commits/fanotify_name
[4] https://github.com/amir73il/ltp/commits/fanotify_name
[5] https://github.com/amir73il/man-pages/commits/fanotify_name
[6] https://github.com/amir73il/inotify-tools/commits/fanotify_name
[7] Demo run of inotifywatch race free monitor
==============================================
~# ./test_demo.sh /vdf
+ WD=/vdf
+ cd /vdf
+ rm -rf a
+ mkdir -p a/b/c/d/e/f/g/
+ touch a/b/c/0 a/b/c/1 a/b/c/d/e/f/g/0
+ id -u
+ [ 0 = 0 ]
+ MODE=--global
+ EVENTS=-e dir_modify  -e modify -e attrib  -e close_write
+ sleep 1
+ inotifywatch --global -e dir_modify -e modify -e attrib -e close_write --timeout -2 /vdf
Establishing filesystem global watch...
Finished establishing watches, now collecting statistics.
Sleeping for 2 seconds...
+
+ t=Create files and dirs...
+ touch a/0 a/1 a/2 a/3
+ mkdir a/dir0 a/dir1 a/dir2
+
+ t=Rename files and dirs...
+ mv a/0 a/3
+ mv a/dir0 a/dir3
+
+ t=Delete files and dirs...
+ rm a/1
+ rmdir a/dir1
+
+ t=Modify files and dirs...
+ chmod +x a/b/c/d
+ echo
+
+ t=Move files and dirs...
+ mv a/b/c/1 a/b/c/d/e/f/g/1
+ mv a/b/c/d/e/f/g a/b/c/d/e/G
+
[fid=fd50.0.2007403;name='0'] /vdf/a/0 (deleted)
[fid=fd50.0.2007403;name='1'] /vdf/a/1 (deleted)
[fid=fd50.0.2007403;name='2'] /vdf/a/2
[fid=fd50.0.2007403;name='3'] /vdf/a/3
[fid=fd50.0.2007403;name='dir0'] /vdf/a/dir0 (deleted)
[fid=fd50.0.2007403;name='dir1'] /vdf/a/dir1 (deleted)
[fid=fd50.0.2007403;name='dir2'] /vdf/a/dir2
[fid=fd50.0.2007403;name='dir3'] /vdf/a/dir3
[fid=fd50.0.86;name='d'] /vdf/a/b/c/d
[fid=fd50.0.86;name='0'] /vdf/a/b/c/0
[fid=fd50.0.86;name='1'] /vdf/a/b/c/1 (deleted)
[fid=fd50.0.87;name='1'] /vdf/a/b/c/d/e/G/1
[fid=fd50.0.3000083;name='g'] /vdf/a/b/c/d/e/f/g (deleted)
[fid=fd50.0.2007404;name='G'] /vdf/a/b/c/d/e/G
total  modify  attrib  close_write  dir_modify  filename
3      0       1       1            2           /vdf/a/0 (deleted)
3      0       1       1            2           /vdf/a/1 (deleted)
3      0       1       1            2           /vdf/a/3
2      0       1       1            1           /vdf/a/2
2      0       0       0            2           /vdf/a/dir0 (deleted)
2      0       0       0            2           /vdf/a/dir1 (deleted)
1      0       0       0            1           /vdf/a/dir2
1      0       0       0            1           /vdf/a/dir3
1      0       1       0            0           /vdf/a/b/c/d
1      1       0       1            0           /vdf/a/b/c/0
1      0       0       0            1           /vdf/a/b/c/1 (deleted)
1      0       0       0            1           /vdf/a/b/c/d/e/G/1
1      0       0       0            1           /vdf/a/b/c/d/e/f/g (deleted)
1      0       0       0            1           /vdf/a/b/c/d/e/G
==============================================

Amir Goldstein (16):
  fsnotify: tidy up FS_ and FAN_ constants
  fsnotify: factor helpers fsnotify_dentry() and fsnotify_file()
  fsnotify: funnel all dirent events through fsnotify_name()
  fsnotify: use helpers to access data by data_type
  fsnotify: simplify arguments passing to fsnotify_parent()
  fsnotify: pass dentry instead of inode for events possible on child
  fsnotify: replace inode pointer with tag
  fanotify: merge duplicate events on parent and child
  fanotify: fix merging marks masks with FAN_ONDIR
  fanotify: send FAN_DIR_MODIFY event flavor with dir inode and name
  fanotify: prepare to encode both parent and child fid's
  fanotify: record name info for FAN_DIR_MODIFY event
  fanotify: report name info for FAN_DIR_MODIFY event
  fanotify: report parent fid + name with FAN_REPORT_NAME
  fanotify: refine rules for when name is reported
  fanotify: support limited functionality for unprivileged users

 fs/notify/fanotify/fanotify.c        | 231 +++++++++++++++++++++------
 fs/notify/fanotify/fanotify.h        | 111 ++++++++++---
 fs/notify/fanotify/fanotify_user.c   | 182 +++++++++++++++++----
 fs/notify/fsnotify.c                 |  22 +--
 fs/notify/inotify/inotify_fsnotify.c |  10 +-
 include/linux/fanotify.h             |  21 ++-
 include/linux/fsnotify.h             | 135 +++++++---------
 include/linux/fsnotify_backend.h     |  87 +++++++---
 include/uapi/linux/fanotify.h        |  11 +-
 kernel/audit_fsnotify.c              |  13 +-
 kernel/audit_watch.c                 |  16 +-
 11 files changed, 584 insertions(+), 255 deletions(-)


base-commit: 11a48a5a18c63fd7621bb050228cebf13566e4d8
-- 
2.17.1



* [PATCH v2 01/16] fsnotify: tidy up FS_ and FAN_ constants
  2020-02-17 13:14 [PATCH v2 00/16] Fanotify event with name info Amir Goldstein
@ 2020-02-17 13:14 ` Amir Goldstein
  2020-02-17 13:14 ` [PATCH v2 02/16] fsnotify: factor helpers fsnotify_dentry() and fsnotify_file() Amir Goldstein
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 65+ messages in thread
From: Amir Goldstein @ 2020-02-17 13:14 UTC
  To: Jan Kara; +Cc: linux-fsdevel

Order by value, so the free value ranges are easier to find.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
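
Why ordering by value helps can be sketched as below: once the values
are sorted, the unused bits in a range fall out of a single OR and mask.
This is an illustration only.  The array holds just the six high-range
FS_* values visible in this hunk, and the helper names free_bits() and
hunk_free_bits() are hypothetical, so the result says nothing about
values defined elsewhere in the header.

```c
#include <stddef.h>
#include <stdint.h>

/* The six high-range FS_* values visible in this hunk, sorted by value. */
static const uint32_t hunk_flags[] = {
	0x04000000u,	/* FS_EXCL_UNLINK */
	0x08000000u,	/* FS_EVENT_ON_CHILD */
	0x10000000u,	/* FS_DN_RENAME */
	0x20000000u,	/* FS_DN_MULTISHOT */
	0x40000000u,	/* FS_ISDIR */
	0x80000000u,	/* FS_IN_ONESHOT */
};

/* OR together every used value, then return the bits of `range` left over. */
uint32_t free_bits(const uint32_t *used, size_t n, uint32_t range)
{
	uint32_t taken = 0;
	size_t i;

	for (i = 0; i < n; i++)
		taken |= used[i];
	return range & ~taken;
}

/* Convenience wrapper (hypothetical) over the values listed above. */
uint32_t hunk_free_bits(uint32_t range)
{
	return free_bits(hunk_flags,
			 sizeof(hunk_flags) / sizeof(hunk_flags[0]), range);
}
```

For the top byte, hunk_free_bits(0xff000000u) leaves 0x03000000, i.e. the
two bits below FS_EXCL_UNLINK are not used by any value in this hunk.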
 include/linux/fsnotify_backend.h | 11 +++++------
 include/uapi/linux/fanotify.h    |  4 ++--
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 1915bdba2fad..db3cabb4600e 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -49,16 +49,15 @@
 #define FS_OPEN_EXEC_PERM	0x00040000	/* open/exec event in a permission hook */
 
 #define FS_EXCL_UNLINK		0x04000000	/* do not send events if object is unlinked */
-#define FS_ISDIR		0x40000000	/* event occurred against dir */
-#define FS_IN_ONESHOT		0x80000000	/* only send event once */
-
-#define FS_DN_RENAME		0x10000000	/* file renamed */
-#define FS_DN_MULTISHOT		0x20000000	/* dnotify multishot */
-
 /* This inode cares about things that happen to its children.  Always set for
  * dnotify and inotify. */
 #define FS_EVENT_ON_CHILD	0x08000000
 
+#define FS_DN_RENAME		0x10000000	/* file renamed */
+#define FS_DN_MULTISHOT		0x20000000	/* dnotify multishot */
+#define FS_ISDIR		0x40000000	/* event occurred against dir */
+#define FS_IN_ONESHOT		0x80000000	/* only send event once */
+
 #define FS_MOVE			(FS_MOVED_FROM | FS_MOVED_TO)
 
 /*
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index b9effa6f8503..2a1844edda47 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -25,9 +25,9 @@
 #define FAN_ACCESS_PERM		0x00020000	/* File accessed in perm check */
 #define FAN_OPEN_EXEC_PERM	0x00040000	/* File open/exec in perm check */
 
-#define FAN_ONDIR		0x40000000	/* event occurred against dir */
+#define FAN_EVENT_ON_CHILD	0x08000000	/* Interested in child events */
 
-#define FAN_EVENT_ON_CHILD	0x08000000	/* interested in child events */
+#define FAN_ONDIR		0x40000000	/* Event occurred against dir */
 
 /* helper events */
 #define FAN_CLOSE		(FAN_CLOSE_WRITE | FAN_CLOSE_NOWRITE) /* close */
-- 
2.17.1



* [PATCH v2 02/16] fsnotify: factor helpers fsnotify_dentry() and fsnotify_file()
  2020-02-17 13:14 [PATCH v2 00/16] Fanotify event with name info Amir Goldstein
  2020-02-17 13:14 ` [PATCH v2 01/16] fsnotify: tidy up FS_ and FAN_ constants Amir Goldstein
@ 2020-02-17 13:14 ` Amir Goldstein
  2020-02-25 13:46   ` Jan Kara
  2020-02-17 13:14 ` [PATCH v2 03/16] fsnotify: funnel all dirent events through fsnotify_name() Amir Goldstein
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 65+ messages in thread
From: Amir Goldstein @ 2020-02-17 13:14 UTC
  To: Jan Kara; +Cc: linux-fsdevel

Most of the code in the fsnotify hooks is boilerplate of one kind or the
other: an event on a dentry or an event on a file.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 include/linux/fsnotify.h | 96 +++++++++++++++-------------------------
 1 file changed, 36 insertions(+), 60 deletions(-)

diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index a2d5d175d3c1..a87d4ab98da7 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -41,16 +41,36 @@ static inline int fsnotify_parent(const struct path *path,
 }
 
 /*
- * Simple wrapper to consolidate calls fsnotify_parent()/fsnotify() when
- * an event is on a path.
+ * Simple wrappers to consolidate calls fsnotify_parent()/fsnotify() when
+ * an event is on a file/dentry.
  */
-static inline int fsnotify_path(struct inode *inode, const struct path *path,
-				__u32 mask)
+static inline void fsnotify_dentry(struct dentry *dentry, __u32 mask)
 {
-	int ret = fsnotify_parent(path, NULL, mask);
+	struct inode *inode = d_inode(dentry);
 
+	if (S_ISDIR(inode->i_mode))
+		mask |= FS_ISDIR;
+
+	fsnotify_parent(NULL, dentry, mask);
+	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL, 0);
+}
+
+static inline int fsnotify_file(struct file *file, __u32 mask)
+{
+	const struct path *path = &file->f_path;
+	struct inode *inode = file_inode(file);
+	int ret;
+
+	if (file->f_mode & FMODE_NONOTIFY)
+		return 0;
+
+	if (S_ISDIR(inode->i_mode))
+		mask |= FS_ISDIR;
+
+	ret = fsnotify_parent(path, NULL, mask);
 	if (ret)
 		return ret;
+
 	return fsnotify(inode, mask, path, FSNOTIFY_EVENT_PATH, NULL, 0);
 }
 
@@ -58,8 +78,6 @@ static inline int fsnotify_path(struct inode *inode, const struct path *path,
 static inline int fsnotify_perm(struct file *file, int mask)
 {
 	int ret;
-	const struct path *path = &file->f_path;
-	struct inode *inode = file_inode(file);
 	__u32 fsnotify_mask = 0;
 
 	if (file->f_mode & FMODE_NONOTIFY)
@@ -70,7 +88,7 @@ static inline int fsnotify_perm(struct file *file, int mask)
 		fsnotify_mask = FS_OPEN_PERM;
 
 		if (file->f_flags & __FMODE_EXEC) {
-			ret = fsnotify_path(inode, path, FS_OPEN_EXEC_PERM);
+			ret = fsnotify_file(file, FS_OPEN_EXEC_PERM);
 
 			if (ret)
 				return ret;
@@ -79,10 +97,7 @@ static inline int fsnotify_perm(struct file *file, int mask)
 		fsnotify_mask = FS_ACCESS_PERM;
 	}
 
-	if (S_ISDIR(inode->i_mode))
-		fsnotify_mask |= FS_ISDIR;
-
-	return fsnotify_path(inode, path, fsnotify_mask);
+	return fsnotify_file(file, fsnotify_mask);
 }
 
 /*
@@ -229,15 +244,7 @@ static inline void fsnotify_rmdir(struct inode *dir, struct dentry *dentry)
  */
 static inline void fsnotify_access(struct file *file)
 {
-	const struct path *path = &file->f_path;
-	struct inode *inode = file_inode(file);
-	__u32 mask = FS_ACCESS;
-
-	if (S_ISDIR(inode->i_mode))
-		mask |= FS_ISDIR;
-
-	if (!(file->f_mode & FMODE_NONOTIFY))
-		fsnotify_path(inode, path, mask);
+	fsnotify_file(file, FS_ACCESS);
 }
 
 /*
@@ -245,15 +252,7 @@ static inline void fsnotify_access(struct file *file)
  */
 static inline void fsnotify_modify(struct file *file)
 {
-	const struct path *path = &file->f_path;
-	struct inode *inode = file_inode(file);
-	__u32 mask = FS_MODIFY;
-
-	if (S_ISDIR(inode->i_mode))
-		mask |= FS_ISDIR;
-
-	if (!(file->f_mode & FMODE_NONOTIFY))
-		fsnotify_path(inode, path, mask);
+	fsnotify_file(file, FS_MODIFY);
 }
 
 /*
@@ -261,16 +260,12 @@ static inline void fsnotify_modify(struct file *file)
  */
 static inline void fsnotify_open(struct file *file)
 {
-	const struct path *path = &file->f_path;
-	struct inode *inode = file_inode(file);
 	__u32 mask = FS_OPEN;
 
-	if (S_ISDIR(inode->i_mode))
-		mask |= FS_ISDIR;
 	if (file->f_flags & __FMODE_EXEC)
 		mask |= FS_OPEN_EXEC;
 
-	fsnotify_path(inode, path, mask);
+	fsnotify_file(file, mask);
 }
 
 /*
@@ -278,16 +273,10 @@ static inline void fsnotify_open(struct file *file)
  */
 static inline void fsnotify_close(struct file *file)
 {
-	const struct path *path = &file->f_path;
-	struct inode *inode = file_inode(file);
-	fmode_t mode = file->f_mode;
-	__u32 mask = (mode & FMODE_WRITE) ? FS_CLOSE_WRITE : FS_CLOSE_NOWRITE;
-
-	if (S_ISDIR(inode->i_mode))
-		mask |= FS_ISDIR;
+	__u32 mask = (file->f_mode & FMODE_WRITE) ? FS_CLOSE_WRITE :
+						    FS_CLOSE_NOWRITE;
 
-	if (!(file->f_mode & FMODE_NONOTIFY))
-		fsnotify_path(inode, path, mask);
+	fsnotify_file(file, mask);
 }
 
 /*
@@ -295,14 +284,7 @@ static inline void fsnotify_close(struct file *file)
  */
 static inline void fsnotify_xattr(struct dentry *dentry)
 {
-	struct inode *inode = dentry->d_inode;
-	__u32 mask = FS_ATTRIB;
-
-	if (S_ISDIR(inode->i_mode))
-		mask |= FS_ISDIR;
-
-	fsnotify_parent(NULL, dentry, mask);
-	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL, 0);
+	fsnotify_dentry(dentry, FS_ATTRIB);
 }
 
 /*
@@ -311,7 +293,6 @@ static inline void fsnotify_xattr(struct dentry *dentry)
  */
 static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
 {
-	struct inode *inode = dentry->d_inode;
 	__u32 mask = 0;
 
 	if (ia_valid & ATTR_UID)
@@ -332,13 +313,8 @@ static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
 	if (ia_valid & ATTR_MODE)
 		mask |= FS_ATTRIB;
 
-	if (mask) {
-		if (S_ISDIR(inode->i_mode))
-			mask |= FS_ISDIR;
-
-		fsnotify_parent(NULL, dentry, mask);
-		fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL, 0);
-	}
+	if (mask)
+		fsnotify_dentry(dentry, mask);
 }
 
 #endif	/* _LINUX_FS_NOTIFY_H */
-- 
2.17.1



* [PATCH v2 03/16] fsnotify: funnel all dirent events through fsnotify_name()
  2020-02-17 13:14 [PATCH v2 00/16] Fanotify event with name info Amir Goldstein
  2020-02-17 13:14 ` [PATCH v2 01/16] fsnotify: tidy up FS_ and FAN_ constants Amir Goldstein
  2020-02-17 13:14 ` [PATCH v2 02/16] fsnotify: factor helpers fsnotify_dentry() and fsnotify_file() Amir Goldstein
@ 2020-02-17 13:14 ` Amir Goldstein
  2020-02-17 13:14 ` [PATCH v2 04/16] fsnotify: use helpers to access data by data_type Amir Goldstein
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 65+ messages in thread
From: Amir Goldstein @ 2020-02-17 13:14 UTC
  To: Jan Kara; +Cc: linux-fsdevel

Factor out fsnotify_name() from fsnotify_dirent(), so it can also serve
link and rename events and use this helper to report all directory entry
change events.

Both helpers return void because no caller checks their return value.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 include/linux/fsnotify.h | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index a87d4ab98da7..420aca9fd5f4 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -18,16 +18,24 @@
 #include <linux/bug.h>
 
 /*
- * Notify this @dir inode about a change in the directory entry @dentry.
+ * Notify this @dir inode about a change in a child directory entry.
+ * The directory entry may have turned positive or negative or its inode may
+ * have changed (i.e. renamed over).
  *
  * Unlike fsnotify_parent(), the event will be reported regardless of the
  * FS_EVENT_ON_CHILD mask on the parent inode.
  */
-static inline int fsnotify_dirent(struct inode *dir, struct dentry *dentry,
-				  __u32 mask)
+static inline void fsnotify_name(struct inode *dir, __u32 mask,
+				 struct inode *child,
+				 const struct qstr *name, u32 cookie)
 {
-	return fsnotify(dir, mask, d_inode(dentry), FSNOTIFY_EVENT_INODE,
-			&dentry->d_name, 0);
+	fsnotify(dir, mask, child, FSNOTIFY_EVENT_INODE, name, cookie);
+}
+
+static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry,
+				   __u32 mask)
+{
+	fsnotify_name(dir, mask, d_inode(dentry), &dentry->d_name, 0);
 }
 
 /* Notify this dentry's parent about a child's events. */
@@ -137,10 +145,8 @@ static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir,
 		mask |= FS_ISDIR;
 	}
 
-	fsnotify(old_dir, old_dir_mask, source, FSNOTIFY_EVENT_INODE, old_name,
-		 fs_cookie);
-	fsnotify(new_dir, new_dir_mask, source, FSNOTIFY_EVENT_INODE, new_name,
-		 fs_cookie);
+	fsnotify_name(old_dir, old_dir_mask, source, old_name, fs_cookie);
+	fsnotify_name(new_dir, new_dir_mask, source, new_name, fs_cookie);
 
 	if (target)
 		fsnotify_link_count(target);
@@ -195,12 +201,13 @@ static inline void fsnotify_create(struct inode *inode, struct dentry *dentry)
  * Note: We have to pass also the linked inode ptr as some filesystems leave
  *   new_dentry->d_inode NULL and instantiate inode pointer later
  */
-static inline void fsnotify_link(struct inode *dir, struct inode *inode, struct dentry *new_dentry)
+static inline void fsnotify_link(struct inode *dir, struct inode *inode,
+				 struct dentry *new_dentry)
 {
 	fsnotify_link_count(inode);
 	audit_inode_child(dir, new_dentry, AUDIT_TYPE_CHILD_CREATE);
 
-	fsnotify(dir, FS_CREATE, inode, FSNOTIFY_EVENT_INODE, &new_dentry->d_name, 0);
+	fsnotify_name(dir, FS_CREATE, inode, &new_dentry->d_name, 0);
 }
 
 /*
-- 
2.17.1



* [PATCH v2 04/16] fsnotify: use helpers to access data by data_type
  2020-02-17 13:14 [PATCH v2 00/16] Fanotify event with name info Amir Goldstein
                   ` (2 preceding siblings ...)
  2020-02-17 13:14 ` [PATCH v2 03/16] fsnotify: funnel all dirent events through fsnotify_name() Amir Goldstein
@ 2020-02-17 13:14 ` Amir Goldstein
  2020-02-17 13:14 ` [PATCH v2 05/16] fsnotify: simplify arguments passing to fsnotify_parent() Amir Goldstein
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 65+ messages in thread
From: Amir Goldstein @ 2020-02-17 13:14 UTC
  To: Jan Kara; +Cc: linux-fsdevel

Create helpers to access path and inode from different data types.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
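
The accessor pattern can be tried out in userspace with a model like the
one below.  It is a sketch under stated assumptions: the mock_* types and
names are hypothetical, and struct mock_path points directly at an inode
instead of going through a dentry as the kernel's struct path does.

```c
#include <stddef.h>

/* Hypothetical userspace mirror of enum fsnotify_data_type. */
enum mock_data_type {
	MOCK_EVENT_NONE,
	MOCK_EVENT_PATH,
	MOCK_EVENT_INODE,
};

struct mock_inode {
	unsigned long i_ino;
};

/* Simplified: the kernel reaches the inode via path->dentry. */
struct mock_path {
	const struct mock_inode *inode;
};

/* Resolve an inode from either kind of payload, NULL if there is none. */
const struct mock_inode *mock_data_inode(const void *data, int data_type)
{
	switch (data_type) {
	case MOCK_EVENT_INODE:
		return data;
	case MOCK_EVENT_PATH:
		return ((const struct mock_path *)data)->inode;
	default:
		return NULL;
	}
}

/* Only a path payload yields a path; callers test the result for NULL. */
const struct mock_path *mock_data_path(const void *data, int data_type)
{
	switch (data_type) {
	case MOCK_EVENT_PATH:
		return data;
	default:
		return NULL;
	}
}
```

Callers then replace "data_type == ...PATH" comparisons and open-coded
casts with a NULL test on the accessor's return value, which is the
shape of the conversions in the diff.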
 fs/notify/fanotify/fanotify.c        | 18 +++++++--------
 fs/notify/fsnotify.c                 |  5 ++--
 fs/notify/inotify/inotify_fsnotify.c |  8 +++----
 include/linux/fsnotify_backend.h     | 34 ++++++++++++++++++++++++----
 kernel/audit_fsnotify.c              | 13 ++---------
 kernel/audit_watch.c                 | 16 ++-----------
 6 files changed, 48 insertions(+), 46 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 5778d1347b35..19ec7a4f4d50 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -151,7 +151,7 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
 {
 	__u32 marks_mask = 0, marks_ignored_mask = 0;
 	__u32 test_mask, user_mask = FANOTIFY_OUTGOING_EVENTS;
-	const struct path *path = data;
+	const struct path *path = fsnotify_data_path(data, data_type);
 	struct fsnotify_mark *mark;
 	int type;
 
@@ -160,7 +160,7 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
 
 	if (!FAN_GROUP_FLAG(group, FAN_REPORT_FID)) {
 		/* Do we have path to open a file descriptor? */
-		if (data_type != FSNOTIFY_EVENT_PATH)
+		if (!path)
 			return 0;
 		/* Path type events are only relevant for files and dirs */
 		if (!d_is_reg(path->dentry) && !d_can_lookup(path->dentry))
@@ -269,11 +269,8 @@ static struct inode *fanotify_fid_inode(struct inode *to_tell, u32 event_mask,
 {
 	if (event_mask & ALL_FSNOTIFY_DIRENT_EVENTS)
 		return to_tell;
-	else if (data_type == FSNOTIFY_EVENT_INODE)
-		return (struct inode *)data;
-	else if (data_type == FSNOTIFY_EVENT_PATH)
-		return d_inode(((struct path *)data)->dentry);
-	return NULL;
+
+	return (struct inode *)fsnotify_data_inode(data, data_type);
 }
 
 struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
@@ -284,6 +281,7 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 	struct fanotify_event *event = NULL;
 	gfp_t gfp = GFP_KERNEL_ACCOUNT;
 	struct inode *id = fanotify_fid_inode(inode, mask, data, data_type);
+	const struct path *path = fsnotify_data_path(data, data_type);
 
 	/*
 	 * For queues with unlimited length lost events are not expected and
@@ -324,10 +322,10 @@ init: __maybe_unused
 	if (id && FAN_GROUP_FLAG(group, FAN_REPORT_FID)) {
 		/* Report the event without a file identifier on encode error */
 		event->fh_type = fanotify_encode_fid(event, id, gfp, fsid);
-	} else if (data_type == FSNOTIFY_EVENT_PATH) {
+	} else if (path) {
 		event->fh_type = FILEID_ROOT;
-		event->path = *((struct path *)data);
-		path_get(&event->path);
+		event->path = *path;
+		path_get(path);
 	} else {
 		event->fh_type = FILEID_INVALID;
 		event->path.mnt = NULL;
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 46f225580009..a5d6467f89a0 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -318,6 +318,7 @@ static void fsnotify_iter_next(struct fsnotify_iter_info *iter_info)
 int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
 	     const struct qstr *file_name, u32 cookie)
 {
+	const struct path *path = fsnotify_data_path(data, data_is);
 	struct fsnotify_iter_info iter_info = {};
 	struct super_block *sb = to_tell->i_sb;
 	struct mount *mnt = NULL;
@@ -325,8 +326,8 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
 	int ret = 0;
 	__u32 test_mask = (mask & ALL_FSNOTIFY_EVENTS);
 
-	if (data_is == FSNOTIFY_EVENT_PATH) {
-		mnt = real_mount(((const struct path *)data)->mnt);
+	if (path) {
+		mnt = real_mount(path->mnt);
 		mnt_or_sb_mask |= mnt->mnt_fsnotify_mask;
 	}
 	/* An event "on child" is not intended for a mount/sb mark */
diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
index d510223d302c..6bb98522bbfd 100644
--- a/fs/notify/inotify/inotify_fsnotify.c
+++ b/fs/notify/inotify/inotify_fsnotify.c
@@ -61,6 +61,7 @@ int inotify_handle_event(struct fsnotify_group *group,
 			 const struct qstr *file_name, u32 cookie,
 			 struct fsnotify_iter_info *iter_info)
 {
+	const struct path *path = fsnotify_data_path(data, data_type);
 	struct fsnotify_mark *inode_mark = fsnotify_iter_inode_mark(iter_info);
 	struct inotify_inode_mark *i_mark;
 	struct inotify_event_info *event;
@@ -73,12 +74,9 @@ int inotify_handle_event(struct fsnotify_group *group,
 		return 0;
 
 	if ((inode_mark->mask & FS_EXCL_UNLINK) &&
-	    (data_type == FSNOTIFY_EVENT_PATH)) {
-		const struct path *path = data;
+	    path && d_unlinked(path->dentry))
+		return 0;
 
-		if (d_unlinked(path->dentry))
-			return 0;
-	}
 	if (file_name) {
 		len = file_name->len;
 		alloc_len += len + 1;
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index db3cabb4600e..5cc838db422a 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -212,10 +212,36 @@ struct fsnotify_group {
 	};
 };
 
-/* when calling fsnotify tell it if the data is a path or inode */
-#define FSNOTIFY_EVENT_NONE	0
-#define FSNOTIFY_EVENT_PATH	1
-#define FSNOTIFY_EVENT_INODE	2
+/* When calling fsnotify tell it if the data is a path or inode */
+enum fsnotify_data_type {
+	FSNOTIFY_EVENT_NONE,
+	FSNOTIFY_EVENT_PATH,
+	FSNOTIFY_EVENT_INODE,
+};
+
+static inline const struct inode *fsnotify_data_inode(const void *data,
+						      int data_type)
+{
+	switch (data_type) {
+	case FSNOTIFY_EVENT_INODE:
+		return data;
+	case FSNOTIFY_EVENT_PATH:
+		return d_inode(((const struct path *)data)->dentry);
+	default:
+		return NULL;
+	}
+}
+
+static inline const struct path *fsnotify_data_path(const void *data,
+						    int data_type)
+{
+	switch (data_type) {
+	case FSNOTIFY_EVENT_PATH:
+		return data;
+	default:
+		return NULL;
+	}
+}
 
 enum fsnotify_obj_type {
 	FSNOTIFY_OBJ_TYPE_INODE,
diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c
index f0d243318452..3596448bfdab 100644
--- a/kernel/audit_fsnotify.c
+++ b/kernel/audit_fsnotify.c
@@ -160,23 +160,14 @@ static int audit_mark_handle_event(struct fsnotify_group *group,
 {
 	struct fsnotify_mark *inode_mark = fsnotify_iter_inode_mark(iter_info);
 	struct audit_fsnotify_mark *audit_mark;
-	const struct inode *inode = NULL;
+	const struct inode *inode = fsnotify_data_inode(data, data_type);
 
 	audit_mark = container_of(inode_mark, struct audit_fsnotify_mark, mark);
 
 	BUG_ON(group != audit_fsnotify_group);
 
-	switch (data_type) {
-	case (FSNOTIFY_EVENT_PATH):
-		inode = ((const struct path *)data)->dentry->d_inode;
-		break;
-	case (FSNOTIFY_EVENT_INODE):
-		inode = (const struct inode *)data;
-		break;
-	default:
-		BUG();
+	if (WARN_ON(!inode))
 		return 0;
-	}
 
 	if (mask & (FS_CREATE|FS_MOVED_TO|FS_DELETE|FS_MOVED_FROM)) {
 		if (audit_compare_dname_path(dname, audit_mark->path, AUDIT_NAME_FULL))
diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
index 4508d5e0cf69..dcfbb44c6720 100644
--- a/kernel/audit_watch.c
+++ b/kernel/audit_watch.c
@@ -473,25 +473,13 @@ static int audit_watch_handle_event(struct fsnotify_group *group,
 				    struct fsnotify_iter_info *iter_info)
 {
 	struct fsnotify_mark *inode_mark = fsnotify_iter_inode_mark(iter_info);
-	const struct inode *inode;
+	const struct inode *inode = fsnotify_data_inode(data, data_type);
 	struct audit_parent *parent;
 
 	parent = container_of(inode_mark, struct audit_parent, mark);
 
 	BUG_ON(group != audit_watch_group);
-
-	switch (data_type) {
-	case (FSNOTIFY_EVENT_PATH):
-		inode = d_backing_inode(((const struct path *)data)->dentry);
-		break;
-	case (FSNOTIFY_EVENT_INODE):
-		inode = (const struct inode *)data;
-		break;
-	default:
-		BUG();
-		inode = NULL;
-		break;
-	}
+	WARN_ON(!inode);
 
 	if (mask & (FS_CREATE|FS_MOVED_TO) && inode)
 		audit_update_watch(parent, dname, inode->i_sb->s_dev, inode->i_ino, 0);
-- 
2.17.1



* [PATCH v2 05/16] fsnotify: simplify arguments passing to fsnotify_parent()
  2020-02-17 13:14 [PATCH v2 00/16] Fanotify event with name info Amir Goldstein
                   ` (3 preceding siblings ...)
  2020-02-17 13:14 ` [PATCH v2 04/16] fsnotify: use helpers to access data by data_type Amir Goldstein
@ 2020-02-17 13:14 ` Amir Goldstein
  2020-02-19 10:50   ` kbuild test robot
  2020-02-19 11:11   ` Amir Goldstein
  2020-02-17 13:14 ` [PATCH v2 06/16] fsnotify: pass dentry instead of inode for events possible on child Amir Goldstein
                   ` (11 subsequent siblings)
  16 siblings, 2 replies; 65+ messages in thread
From: Amir Goldstein @ 2020-02-17 13:14 UTC
  To: Jan Kara; +Cc: linux-fsdevel

Instead of passing both dentry and path and having to figure out which
one to use, pass data/data_type to simplify the code.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fsnotify.c             | 15 ++++-----------
 include/linux/fsnotify.h         | 14 ++------------
 include/linux/fsnotify_backend.h | 13 +++++++------
 3 files changed, 13 insertions(+), 29 deletions(-)

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index a5d6467f89a0..193530f57963 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -143,15 +143,13 @@ void __fsnotify_update_child_dentry_flags(struct inode *inode)
 }
 
 /* Notify this dentry's parent about a child's events. */
-int __fsnotify_parent(const struct path *path, struct dentry *dentry, __u32 mask)
+int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
+		    int data_type)
 {
 	struct dentry *parent;
 	struct inode *p_inode;
 	int ret = 0;
 
-	if (!dentry)
-		dentry = path->dentry;
-
 	if (!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED))
 		return 0;
 
@@ -168,12 +166,7 @@ int __fsnotify_parent(const struct path *path, struct dentry *dentry, __u32 mask
 		mask |= FS_EVENT_ON_CHILD;
 
 		take_dentry_name_snapshot(&name, dentry);
-		if (path)
-			ret = fsnotify(p_inode, mask, path, FSNOTIFY_EVENT_PATH,
-				       &name.name, 0);
-		else
-			ret = fsnotify(p_inode, mask, dentry->d_inode, FSNOTIFY_EVENT_INODE,
-				       &name.name, 0);
+		ret = fsnotify(p_inode, mask, data, data_type, &name.name, 0);
 		release_dentry_name_snapshot(&name);
 	}
 
@@ -181,7 +174,7 @@ int __fsnotify_parent(const struct path *path, struct dentry *dentry, __u32 mask
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(__fsnotify_parent);
+EXPORT_SYMBOL_GPL(fsnotify_parent);
 
 static int send_to_group(struct inode *to_tell,
 			 __u32 mask, const void *data,
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 420aca9fd5f4..af30e0a56f2e 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -38,16 +38,6 @@ static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry,
 	fsnotify_name(dir, mask, d_inode(dentry), &dentry->d_name, 0);
 }
 
-/* Notify this dentry's parent about a child's events. */
-static inline int fsnotify_parent(const struct path *path,
-				  struct dentry *dentry, __u32 mask)
-{
-	if (!dentry)
-		dentry = path->dentry;
-
-	return __fsnotify_parent(path, dentry, mask);
-}
-
 /*
  * Simple wrappers to consolidate calls fsnotify_parent()/fsnotify() when
  * an event is on a file/dentry.
@@ -59,7 +49,7 @@ static inline void fsnotify_dentry(struct dentry *dentry, __u32 mask)
 	if (S_ISDIR(inode->i_mode))
 		mask |= FS_ISDIR;
 
-	fsnotify_parent(NULL, dentry, mask);
+	fsnotify_parent(dentry, mask, inode, FSNOTIFY_EVENT_INODE);
 	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL, 0);
 }
 
@@ -75,7 +65,7 @@ static inline int fsnotify_file(struct file *file, __u32 mask)
 	if (S_ISDIR(inode->i_mode))
 		mask |= FS_ISDIR;
 
-	ret = fsnotify_parent(path, NULL, mask);
+	ret = fsnotify_parent(path->dentry, mask, path, FSNOTIFY_EVENT_PATH);
 	if (ret)
 		return ret;
 
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 5cc838db422a..b1f418cc28e1 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -376,9 +376,10 @@ struct fsnotify_mark {
 /* called from the vfs helpers */
 
 /* main fsnotify call to send events */
-extern int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
-		    const struct qstr *name, u32 cookie);
-extern int __fsnotify_parent(const struct path *path, struct dentry *dentry, __u32 mask);
+extern int fsnotify(struct inode *to_tell, __u32 mask, const void *data,
+		    int data_type, const struct qstr *name, u32 cookie);
+extern int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
+			   int data_type);
 extern void __fsnotify_inode_delete(struct inode *inode);
 extern void __fsnotify_vfsmount_delete(struct vfsmount *mnt);
 extern void fsnotify_sb_delete(struct super_block *sb);
@@ -533,13 +534,13 @@ static inline void fsnotify_init_event(struct fsnotify_event *event,
 
 #else
 
-static inline int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
-			   const struct qstr *name, u32 cookie)
+static inline int fsnotify(struct inode *to_tell, __u32 mask, const void *data,
+			   int data_type, const struct qstr *name, u32 cookie)
 {
 	return 0;
 }
 
-static inline int __fsnotify_parent(const struct path *path, struct dentry *dentry, __u32 mask)
+static inline int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, int data_type)
 {
 	return 0;
 }
-- 
2.17.1


* [PATCH v2 06/16] fsnotify: pass dentry instead of inode for events possible on child
  2020-02-17 13:14 [PATCH v2 00/16] Fanotify event with name info Amir Goldstein
                   ` (4 preceding siblings ...)
  2020-02-17 13:14 ` [PATCH v2 05/16] fsnotify: simplify arguments passing to fsnotify_parent() Amir Goldstein
@ 2020-02-17 13:14 ` Amir Goldstein
  2020-02-17 13:14 ` [PATCH v2 07/16] fsnotify: replace inode pointer with tag Amir Goldstein
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 65+ messages in thread
From: Amir Goldstein @ 2020-02-17 13:14 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

Most events that can be reported to watching parent pass
FSNOTIFY_EVENT_PATH as event data, except for FS_ATTRIB and FS_MODIFY
as a result of truncate.

Define a new data type to pass for event - FSNOTIFY_EVENT_DENTRY
and use it to pass the dentry instead of its ->d_inode for those events.

Soon, we are going to use the dentry data type to report events
with name info in fanotify backend.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
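Not part of the patch: a minimal userspace sketch of the data/data_type
dispatch described above, assuming simplified stand-in structs.  The names
data_inode(), data_dentry() and the EVENT_* enum are hypothetical and only
mirror the shape of the kernel helpers; they are not the kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel objects. */
struct inode { unsigned long i_ino; };
struct dentry { struct inode *d_inode; const char *d_name; };
struct path { struct dentry *dentry; };

enum data_type { EVENT_NONE, EVENT_PATH, EVENT_INODE, EVENT_DENTRY };

/* Resolve the inode from whichever object the event carries. */
static const struct inode *data_inode(const void *data, enum data_type type)
{
	switch (type) {
	case EVENT_INODE:
		return data;
	case EVENT_DENTRY:
		return ((const struct dentry *)data)->d_inode;
	case EVENT_PATH:
		return ((const struct path *)data)->dentry->d_inode;
	default:
		return NULL;
	}
}

/* Resolve the dentry; a bare inode cannot name its dentry, so return NULL. */
static const struct dentry *data_dentry(const void *data, enum data_type type)
{
	switch (type) {
	case EVENT_DENTRY:
		return data;
	case EVENT_PATH:
		return ((const struct path *)data)->dentry;
	default:
		return NULL;
	}
}
```

The point of the sketch: passing a dentry loses nothing compared to passing
its inode, because the inode is still reachable, while the reverse lookup is
not possible.
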
 include/linux/fsnotify.h         |  4 ++--
 include/linux/fsnotify_backend.h | 17 +++++++++++++++++
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index af30e0a56f2e..7ba40c19bc7e 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -49,8 +49,8 @@ static inline void fsnotify_dentry(struct dentry *dentry, __u32 mask)
 	if (S_ISDIR(inode->i_mode))
 		mask |= FS_ISDIR;
 
-	fsnotify_parent(dentry, mask, inode, FSNOTIFY_EVENT_INODE);
-	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL, 0);
+	fsnotify_parent(dentry, mask, dentry, FSNOTIFY_EVENT_DENTRY);
+	fsnotify(inode, mask, dentry, FSNOTIFY_EVENT_DENTRY, NULL, 0);
 }
 
 static inline int fsnotify_file(struct file *file, __u32 mask)
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index b1f418cc28e1..bd3f6114a7a9 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -217,6 +217,7 @@ enum fsnotify_data_type {
 	FSNOTIFY_EVENT_NONE,
 	FSNOTIFY_EVENT_PATH,
 	FSNOTIFY_EVENT_INODE,
+	FSNOTIFY_EVENT_DENTRY,
 };
 
 static inline const struct inode *fsnotify_data_inode(const void *data,
@@ -225,6 +226,8 @@ static inline const struct inode *fsnotify_data_inode(const void *data,
 	switch (data_type) {
 	case FSNOTIFY_EVENT_INODE:
 		return data;
+	case FSNOTIFY_EVENT_DENTRY:
+		return d_inode(data);
 	case FSNOTIFY_EVENT_PATH:
 		return d_inode(((const struct path *)data)->dentry);
 	default:
@@ -232,6 +235,20 @@ static inline const struct inode *fsnotify_data_inode(const void *data,
 	}
 }
 
+static inline struct dentry *fsnotify_data_dentry(const void *data,
+						  int data_type)
+{
+	switch (data_type) {
+	case FSNOTIFY_EVENT_DENTRY:
+		/* Non const is needed for dget() */
+		return (struct dentry *)data;
+	case FSNOTIFY_EVENT_PATH:
+		return ((const struct path *)data)->dentry;
+	default:
+		return NULL;
+	}
+}
+
 static inline const struct path *fsnotify_data_path(const void *data,
 						    int data_type)
 {
-- 
2.17.1


* [PATCH v2 07/16] fsnotify: replace inode pointer with tag
  2020-02-17 13:14 [PATCH v2 00/16] Fanotify event with name info Amir Goldstein
                   ` (5 preceding siblings ...)
  2020-02-17 13:14 ` [PATCH v2 06/16] fsnotify: pass dentry instead of inode for events possible on child Amir Goldstein
@ 2020-02-17 13:14 ` Amir Goldstein
  2020-02-26  8:20   ` Jan Kara
  2020-02-26  8:52   ` Jan Kara
  2020-02-17 13:14 ` [PATCH v2 08/16] fanotify: merge duplicate events on parent and child Amir Goldstein
                   ` (9 subsequent siblings)
  16 siblings, 2 replies; 65+ messages in thread
From: Amir Goldstein @ 2020-02-17 13:14 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

The event inode field is used only for comparison in queue merges and
cannot be dereferenced after handle_event(), because it does not hold a
refcount on the inode.

Replace it with an abstract tag to do the same thing. We are going to
set this tag to values other than an inode pointer in fanotify.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
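Not part of the patch: a minimal userspace sketch of the idea, assuming a
simplified event struct.  The names event_init() and event_same_object()
are hypothetical.  The sketch stores the tag as uintptr_t for portability,
where the patch uses unsigned long.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct event {
	uintptr_t tag;		/* opaque identity: compared, never dereferenced */
	unsigned int mask;
};

/* Record only the address of the object as an integer tag. */
static void event_init(struct event *ev, const void *tag, unsigned int mask)
{
	ev->tag = (uintptr_t)tag;
	ev->mask = mask;
}

/* Two events refer to the same object if their tags are equal. */
static bool event_same_object(const struct event *a, const struct event *b)
{
	return a->tag == b->tag;
}
```

Because the field is an integer rather than a pointer type, code that runs
after handle_event() cannot dereference it by accident.
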
 fs/notify/fanotify/fanotify.c        | 2 +-
 fs/notify/inotify/inotify_fsnotify.c | 2 +-
 include/linux/fsnotify_backend.h     | 8 +++-----
 3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 19ec7a4f4d50..98c3cbf29003 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -26,7 +26,7 @@ static bool should_merge(struct fsnotify_event *old_fsn,
 	old = FANOTIFY_E(old_fsn);
 	new = FANOTIFY_E(new_fsn);
 
-	if (old_fsn->inode != new_fsn->inode || old->pid != new->pid ||
+	if (old_fsn->tag != new_fsn->tag || old->pid != new->pid ||
 	    old->fh_type != new->fh_type || old->fh_len != new->fh_len)
 		return false;
 
diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
index 6bb98522bbfd..4f42ea7b7fdd 100644
--- a/fs/notify/inotify/inotify_fsnotify.c
+++ b/fs/notify/inotify/inotify_fsnotify.c
@@ -39,7 +39,7 @@ static bool event_compare(struct fsnotify_event *old_fsn,
 	if (old->mask & FS_IN_IGNORED)
 		return false;
 	if ((old->mask == new->mask) &&
-	    (old_fsn->inode == new_fsn->inode) &&
+	    (old_fsn->tag == new_fsn->tag) &&
 	    (old->name_len == new->name_len) &&
 	    (!old->name_len || !strcmp(old->name, new->name)))
 		return true;
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index bd3f6114a7a9..cd106b5c87a4 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -132,8 +132,7 @@ struct fsnotify_ops {
  */
 struct fsnotify_event {
 	struct list_head list;
-	/* inode may ONLY be dereferenced during handle_event(). */
-	struct inode *inode;	/* either the inode the event happened to or its parent */
+	unsigned long tag;	/* identifier for queue merges */
 };
 
 /*
@@ -542,11 +541,10 @@ extern void fsnotify_put_mark(struct fsnotify_mark *mark);
 extern void fsnotify_finish_user_wait(struct fsnotify_iter_info *iter_info);
 extern bool fsnotify_prepare_user_wait(struct fsnotify_iter_info *iter_info);
 
-static inline void fsnotify_init_event(struct fsnotify_event *event,
-				       struct inode *inode)
+static inline void fsnotify_init_event(struct fsnotify_event *event, void *tag)
 {
 	INIT_LIST_HEAD(&event->list);
-	event->inode = inode;
+	event->tag = (unsigned long)tag;
 }
 
 #else
-- 
2.17.1


* [PATCH v2 08/16] fanotify: merge duplicate events on parent and child
  2020-02-17 13:14 [PATCH v2 00/16] Fanotify event with name info Amir Goldstein
                   ` (6 preceding siblings ...)
  2020-02-17 13:14 ` [PATCH v2 07/16] fsnotify: replace inode pointer with tag Amir Goldstein
@ 2020-02-17 13:14 ` Amir Goldstein
  2020-02-26  9:18   ` Jan Kara
  2020-02-17 13:14 ` [PATCH v2 09/16] fanotify: fix merging marks masks with FAN_ONDIR Amir Goldstein
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 65+ messages in thread
From: Amir Goldstein @ 2020-02-17 13:14 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

With inotify, when a watch is set on a directory and on its child, an
event on the child is reported twice, once with wd of the parent watch
and once with wd of the child watch without the filename.

With fanotify, when a watch is set on a directory and on its child, an
event on the child is reported twice, but it has the exact same
information - either an open file descriptor of the child or an encoded
fid of the child.

The two identical events are not merged because the tag used for
merging events in the queue is the child inode in one event and the
parent inode in the other.

For events with path or dentry data, use the dentry instead of inode as
the tag for event merging, so that the event reported on parent will be
merged with the event reported on the child.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
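Not part of the patch: a minimal userspace sketch of why the choice of tag
decides whether the two events merge.  It assumes a toy fixed-size queue
that only compares a new event against the queue tail, which is simpler
than the real merge logic.  The names queue_event() and QUEUE_MAX are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_MAX 8

struct event { uintptr_t tag; unsigned int mask; };
struct queue { struct event ev[QUEUE_MAX]; size_t len; };

/*
 * Queue an event identified by tag.  Returns 1 if it was merged into the
 * tail event, 0 if it was appended, -1 if the queue is full.
 */
static int queue_event(struct queue *q, const void *tag, unsigned int mask)
{
	if (q->len && q->ev[q->len - 1].tag == (uintptr_t)tag) {
		q->ev[q->len - 1].mask |= mask;
		return 1;
	}
	if (q->len == QUEUE_MAX)
		return -1;
	q->ev[q->len].tag = (uintptr_t)tag;
	q->ev[q->len].mask = mask;
	q->len++;
	return 0;
}
```

With the inode reported to each watch as the tag, the parent watch and the
child watch queue different tags and both events stay in the queue.  With
the child dentry as the tag, both watches queue the same tag and the second
event is folded into the first.
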
 fs/notify/fanotify/fanotify.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 98c3cbf29003..dab7e9895e02 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -282,6 +282,7 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 	gfp_t gfp = GFP_KERNEL_ACCOUNT;
 	struct inode *id = fanotify_fid_inode(inode, mask, data, data_type);
 	const struct path *path = fsnotify_data_path(data, data_type);
+	struct dentry *dentry = fsnotify_data_dentry(data, data_type);
 
 	/*
 	 * For queues with unlimited length lost events are not expected and
@@ -312,7 +313,12 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 	if (!event)
 		goto out;
 init: __maybe_unused
-	fsnotify_init_event(&event->fse, inode);
+	/*
+	 * Use the dentry instead of inode as tag for event queue, so event
+	 * reported on parent is merged with event reported on child when both
+	 * directory and child watches exist.
+	 */
+	fsnotify_init_event(&event->fse, (void *)dentry ?: inode);
 	event->mask = mask;
 	if (FAN_GROUP_FLAG(group, FAN_REPORT_TID))
 		event->pid = get_pid(task_pid(current));
-- 
2.17.1


* [PATCH v2 09/16] fanotify: fix merging marks masks with FAN_ONDIR
  2020-02-17 13:14 [PATCH v2 00/16] Fanotify event with name info Amir Goldstein
                   ` (7 preceding siblings ...)
  2020-02-17 13:14 ` [PATCH v2 08/16] fanotify: merge duplicate events on parent and child Amir Goldstein
@ 2020-02-17 13:14 ` Amir Goldstein
  2020-02-17 13:14 ` [PATCH v2 10/16] fanotify: send FAN_DIR_MODIFY event flavor with dir inode and name Amir Goldstein
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 65+ messages in thread
From: Amir Goldstein @ 2020-02-17 13:14 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

Change the logic of FAN_ONDIR in two ways that are similar to the logic
of FAN_EVENT_ON_CHILD, that was fixed in commit 54a307ba8d3c ("fanotify:
fix logic of events on child"):

1. The flag is meaningless in ignore mask
2. The flag refers only to events in the mask of the mark where it is set

This is what the fanotify_mark.2 man page says about FAN_ONDIR:
"Without this flag, only events for files are created."  It doesn't
say anything about setting this flag in ignore mask to stop getting
events on directories nor can I think of any setup where this capability
would be useful.

Currently, when marks masks are merged, the FAN_ONDIR flag set in one
mark affects the events that are set in another mark's mask and this
behavior causes unexpected results.  For example, a user adds a mark on a
directory with mask FAN_ATTRIB | FAN_ONDIR and a mount mark with mask
FAN_OPEN (without FAN_ONDIR).  An opendir() of that directory (which is
inside that mount) generates a FAN_OPEN event even though neither of the
marks requested to get open events on directories.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
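Not part of the patch: a minimal userspace sketch of the example above,
comparing the mask computation before and after the change.  It assumes a
simplified model with only mask and ignored_mask per mark and returns only
the event type bits.  The X_* names, struct mark, mask_before() and
mask_after() are hypothetical; the bit values mirror the uapi header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values mirror include/uapi/linux/fanotify.h; the X_ names are local. */
#define X_ATTRIB	0x00000004u
#define X_OPEN		0x00000020u
#define X_ONDIR		0x40000000u

struct mark {
	uint32_t mask;
	uint32_t ignored_mask;
};

/* Before: merge every mark's mask, then test the ONDIR bit once. */
static uint32_t mask_before(const struct mark *m, size_t n, uint32_t event)
{
	uint32_t marks = 0, ignored = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		marks |= m[i].mask;
		ignored |= m[i].ignored_mask;
	}
	if ((event & X_ONDIR) && !(marks & X_ONDIR & ~ignored))
		return 0;
	return event & marks & ~ignored & ~X_ONDIR;
}

/* After: a mark without ONDIR is skipped entirely for a directory event. */
static uint32_t mask_after(const struct mark *m, size_t n, uint32_t event)
{
	uint32_t marks = 0, ignored = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if ((event & X_ONDIR) && !(m[i].mask & X_ONDIR))
			continue;
		marks |= m[i].mask;
		ignored |= m[i].ignored_mask;
	}
	return event & marks & ~ignored & ~X_ONDIR;
}
```

For a directory mark with X_ATTRIB | X_ONDIR and a mount mark with X_OPEN,
an open of the directory passes through mask_before() as X_OPEN but is
dropped by mask_after(), while an attribute change on the directory is
still reported.
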
 fs/notify/fanotify/fanotify.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index dab7e9895e02..36903542aa57 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -171,6 +171,13 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
 		if (!fsnotify_iter_should_report_type(iter_info, type))
 			continue;
 		mark = iter_info->marks[type];
+		/*
+		 * If the event is on dir and this mark doesn't care about
+		 * events on dir, don't send it!
+		 */
+		if (event_mask & FS_ISDIR && !(mark->mask & FS_ISDIR))
+			continue;
+
 		/*
 		 * If the event is for a child and this mark doesn't care about
 		 * events on a child, don't send it!
@@ -203,10 +210,6 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
 		user_mask &= ~FAN_ONDIR;
 	}
 
-	if (event_mask & FS_ISDIR &&
-	    !(marks_mask & FS_ISDIR & ~marks_ignored_mask))
-		return 0;
-
 	return test_mask & user_mask;
 }
 
-- 
2.17.1


* [PATCH v2 10/16] fanotify: send FAN_DIR_MODIFY event flavor with dir inode and name
  2020-02-17 13:14 [PATCH v2 00/16] Fanotify event with name info Amir Goldstein
                   ` (8 preceding siblings ...)
  2020-02-17 13:14 ` [PATCH v2 09/16] fanotify: fix merging marks masks with FAN_ONDIR Amir Goldstein
@ 2020-02-17 13:14 ` Amir Goldstein
  2020-02-17 13:14 ` [PATCH v2 11/16] fanotify: prepare to encode both parent and child fid's Amir Goldstein
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 65+ messages in thread
From: Amir Goldstein @ 2020-02-17 13:14 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, linux-api

Dirent events are going to be supported in two flavors:

1. Directory fid info + mask that includes the specific event types
   (e.g. FAN_CREATE) and an optional FAN_ONDIR flag.
2. Directory fid info + name + mask that includes only FAN_DIR_MODIFY.

To request the second event flavor, user needs to set the event type
FAN_DIR_MODIFY in the mark mask.

The first flavor is supported since kernel v5.1 for groups initialized
with flag FAN_REPORT_FID.  It is intended to be used for watching
directories in "batch mode" - the watcher is notified when directory is
changed and re-scans the directory content in response.  This event
flavor is stored more compactly in the event queue, so it is optimal
for workloads with frequent directory changes.

The second event flavor is intended to be used for watching large
directories, where the cost of re-scan of the directory on every change
is considered too high.  The watcher getting the event with the directory
fid and entry name is expected to call fstatat(2) to query the content of
the entry after the change.

Legacy inotify events are reported with name and event mask (e.g. "foo",
FAN_CREATE | FAN_ONDIR).  That can lead users to the conclusion that
there is *currently* an entry "foo" that is a sub-directory, when in fact
"foo" may be negative or non-dir by the time user gets the event.

To make it clear that the current state of the named entry is unknown,
when reporting an event with name info, fanotify obfuscates the specific
event types (e.g. create,delete,rename) and uses a common event type -
FAN_DIR_MODIFY to describe the change.  This should make it harder for
users to make wrong assumptions and write buggy filesystem monitors.

At this point, name info reporting is not yet implemented, so trying to
set FAN_DIR_MODIFY in mark mask will return -EINVAL.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
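Not part of the patch: a minimal userspace sketch of what a watcher might do
on receiving the second event flavor, as described above.  It assumes the
watcher already holds an open fd for the reported directory (for example
obtained from the directory fid with open_by_handle_at(2)).  The names
query_entry() and enum entry_state are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>

enum entry_state { ENTRY_GONE, ENTRY_DIR, ENTRY_OTHER, ENTRY_ERROR };

/*
 * Query the current state of the named entry relative to dirfd.
 * The event says only that the entry changed, so the answer may be
 * "gone", "directory" or "something else" regardless of what happened.
 */
static enum entry_state query_entry(int dirfd, const char *name)
{
	struct stat st;

	if (fstatat(dirfd, name, &st, AT_SYMLINK_NOFOLLOW) < 0)
		return errno == ENOENT ? ENTRY_GONE : ENTRY_ERROR;
	return S_ISDIR(st.st_mode) ? ENTRY_DIR : ENTRY_OTHER;
}
```

AT_SYMLINK_NOFOLLOW is used so that a symlink entry is reported as the
entry itself rather than as its target.
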
 fs/notify/fanotify/fanotify.c    | 7 ++++---
 fs/notify/fsnotify.c             | 2 +-
 include/linux/fsnotify.h         | 6 ++++++
 include/linux/fsnotify_backend.h | 4 +++-
 include/uapi/linux/fanotify.h    | 1 +
 5 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 36903542aa57..1f60823931b7 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -194,9 +194,9 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
 	test_mask = event_mask & marks_mask & ~marks_ignored_mask;
 
 	/*
-	 * dirent modification events (create/delete/move) do not carry the
-	 * child entry name/inode information. Instead, we report FAN_ONDIR
-	 * for mkdir/rmdir so user can differentiate them from creat/unlink.
+	 * For dirent modification events (create/delete/move) that do not carry
+	 * the child entry name information, we report FAN_ONDIR for mkdir/rmdir
+	 * so user can differentiate them from creat/unlink.
 	 *
 	 * For backward compatibility and consistency, do not report FAN_ONDIR
 	 * to user in legacy fanotify mode (reporting fd) and report FAN_ONDIR
@@ -399,6 +399,7 @@ static int fanotify_handle_event(struct fsnotify_group *group,
 	BUILD_BUG_ON(FAN_MOVED_FROM != FS_MOVED_FROM);
 	BUILD_BUG_ON(FAN_CREATE != FS_CREATE);
 	BUILD_BUG_ON(FAN_DELETE != FS_DELETE);
+	BUILD_BUG_ON(FAN_DIR_MODIFY != FS_DIR_MODIFY);
 	BUILD_BUG_ON(FAN_DELETE_SELF != FS_DELETE_SELF);
 	BUILD_BUG_ON(FAN_MOVE_SELF != FS_MOVE_SELF);
 	BUILD_BUG_ON(FAN_EVENT_ON_CHILD != FS_EVENT_ON_CHILD);
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 193530f57963..72d332ce8e12 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -383,7 +383,7 @@ static __init int fsnotify_init(void)
 {
 	int ret;
 
-	BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 25);
+	BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 26);
 
 	ret = init_srcu_struct(&fsnotify_mark_srcu);
 	if (ret)
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 7ba40c19bc7e..fb54d9d70552 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -30,6 +30,12 @@ static inline void fsnotify_name(struct inode *dir, __u32 mask,
 				 const struct qstr *name, u32 cookie)
 {
 	fsnotify(dir, mask, child, FSNOTIFY_EVENT_INODE, name, cookie);
+	/*
+	 * Send another flavor of the event without child inode data and
+	 * without the specific event type (e.g. FS_CREATE|FS_ISDIR).
+	 * The name is relative to the dir inode the event is reported to.
+	 */
+	fsnotify(dir, FS_DIR_MODIFY, dir, FSNOTIFY_EVENT_INODE, name, 0);
 }
 
 static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry,
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index cd106b5c87a4..310c639de04e 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -47,6 +47,7 @@
 #define FS_OPEN_PERM		0x00010000	/* open event in an permission hook */
 #define FS_ACCESS_PERM		0x00020000	/* access event in a permissions hook */
 #define FS_OPEN_EXEC_PERM	0x00040000	/* open/exec event in a permission hook */
+#define FS_DIR_MODIFY		0x00080000	/* Directory entry was modified */
 
 #define FS_EXCL_UNLINK		0x04000000	/* do not send events if object is unlinked */
 /* This inode cares about things that happen to its children.  Always set for
@@ -66,7 +67,8 @@
  * The watching parent may get an FS_ATTRIB|FS_EVENT_ON_CHILD event
  * when a directory entry inside a child subdir changes.
  */
-#define ALL_FSNOTIFY_DIRENT_EVENTS	(FS_CREATE | FS_DELETE | FS_MOVE)
+#define ALL_FSNOTIFY_DIRENT_EVENTS	(FS_CREATE | FS_DELETE | FS_MOVE | \
+					 FS_DIR_MODIFY)
 
 #define ALL_FSNOTIFY_PERM_EVENTS (FS_OPEN_PERM | FS_ACCESS_PERM | \
 				  FS_OPEN_EXEC_PERM)
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index 2a1844edda47..615fa2c87179 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -24,6 +24,7 @@
 #define FAN_OPEN_PERM		0x00010000	/* File open in perm check */
 #define FAN_ACCESS_PERM		0x00020000	/* File accessed in perm check */
 #define FAN_OPEN_EXEC_PERM	0x00040000	/* File open/exec in perm check */
+#define FAN_DIR_MODIFY		0x00080000	/* Directory entry was modified */
 
 #define FAN_EVENT_ON_CHILD	0x08000000	/* Interested in child events */
 
-- 
2.17.1


* [PATCH v2 11/16] fanotify: prepare to encode both parent and child fid's
  2020-02-17 13:14 [PATCH v2 00/16] Fanotify event with name info Amir Goldstein
                   ` (9 preceding siblings ...)
  2020-02-17 13:14 ` [PATCH v2 10/16] fanotify: send FAN_DIR_MODIFY event flavor with dir inode and name Amir Goldstein
@ 2020-02-17 13:14 ` Amir Goldstein
  2020-02-26 10:23   ` Jan Kara
  2020-02-17 13:14 ` [PATCH v2 12/16] fanotify: record name info for FAN_DIR_MODIFY event Amir Goldstein
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 65+ messages in thread
From: Amir Goldstein @ 2020-02-17 13:14 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

For some events, we are going to encode both child and parent fid's,
so we need to do a little refactoring of struct fanotify_event and fid
helper functions.

Move fsid member from struct fanotify_fid out to struct fanotify_event,
so we can store fsid once for two encoded fid's (we will only encode
parent if it is on the same filesystem).

This does not change the size of struct fanotify_event because struct
fanotify_fid is still bigger than struct path on 32bit arch and is the
same size as struct path (16 bytes) on 64bit arch.

Group fh_len and fh_type as struct fanotify_fid_hdr.
Pass struct fanotify_fid and struct fanotify_fid_hdr to helpers
fanotify_encode_fid() and copy_fid_to_user() instead of passing the
containing fanotify_event struct.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
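Not part of the patch: a minimal userspace sketch of the layout described
above, with the fsid kept outside the fid and the handle stored either
inline or behind a pointer depending on its length.  INLINE_FH_LEN is fixed
at 16 here as an assumption (the patch uses 12 or 16 depending on word
size), and the names struct fsid, struct fid, fid_fh(), fsid_equal() and
fid_equal() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define INLINE_FH_LEN 16	/* assumption for the sketch */

struct fsid { int val[2]; };

struct fid_hdr {
	unsigned char type;
	unsigned char len;
};

struct fid {
	union {
		unsigned char fh[INLINE_FH_LEN];	/* short handles live inline */
		unsigned char *ext_fh;			/* long handles are external */
	};
};

/* Pick the buffer that holds a handle of the given length. */
static unsigned char *fid_fh(struct fid *fid, unsigned int len)
{
	return len <= INLINE_FH_LEN ? fid->fh : fid->ext_fh;
}

/* Compare both halves of the fsid, each against the other fsid. */
static bool fsid_equal(const struct fsid *a, const struct fsid *b)
{
	return a->val[0] == b->val[0] && a->val[1] == b->val[1];
}

/* Compare handle bytes only; the fsid is compared separately. */
static bool fid_equal(struct fid *a, struct fid *b, unsigned int len)
{
	return !memcmp(fid_fh(a, len), fid_fh(b, len), len);
}
```

Keeping the fsid out of struct fid means one fsid can serve two encoded
handles in the same event, and callers that compare fids must also compare
the fsid.
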
 fs/notify/fanotify/fanotify.c      | 48 +++++++++++++------------
 fs/notify/fanotify/fanotify.h      | 58 ++++++++++++++++--------------
 fs/notify/fanotify/fanotify_user.c | 35 ++++++++++--------
 3 files changed, 78 insertions(+), 63 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 1f60823931b7..3bc28f08aad1 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -27,7 +27,7 @@ static bool should_merge(struct fsnotify_event *old_fsn,
 	new = FANOTIFY_E(new_fsn);
 
 	if (old_fsn->tag != new_fsn->tag || old->pid != new->pid ||
-	    old->fh_type != new->fh_type || old->fh_len != new->fh_len)
+	    old->fh.type != new->fh.type || old->fh.len != new->fh.len)
 		return false;
 
 	if (fanotify_event_has_path(old)) {
@@ -43,7 +43,8 @@ static bool should_merge(struct fsnotify_event *old_fsn,
 		 * unlink pair or rmdir+create pair of events.
 		 */
 		return (old->mask & FS_ISDIR) == (new->mask & FS_ISDIR) &&
-			fanotify_fid_equal(&old->fid, &new->fid, old->fh_len);
+			fanotify_fsid_equal(&old->fsid, &new->fsid) &&
+			fanotify_fid_equal(&old->fid, &new->fid, old->fh.len);
 	}
 
 	/* Do not merge events if we failed to encode fid */
@@ -213,18 +214,18 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
 	return test_mask & user_mask;
 }
 
-static int fanotify_encode_fid(struct fanotify_event *event,
-			       struct inode *inode, gfp_t gfp,
-			       __kernel_fsid_t *fsid)
+static struct fanotify_fid_hdr fanotify_encode_fid(struct fanotify_fid *fid,
+						   struct inode *inode,
+						   gfp_t gfp)
 {
-	struct fanotify_fid *fid = &event->fid;
+	struct fanotify_fid_hdr fh = { };
 	int dwords, bytes = 0;
-	int err, type;
+	int err;
 
 	fid->ext_fh = NULL;
 	dwords = 0;
 	err = -ENOENT;
-	type = exportfs_encode_inode_fh(inode, NULL, &dwords, NULL);
+	fh.type = exportfs_encode_inode_fh(inode, NULL, &dwords, NULL);
 	if (!dwords)
 		goto out_err;
 
@@ -237,26 +238,25 @@ static int fanotify_encode_fid(struct fanotify_event *event,
 			goto out_err;
 	}
 
-	type = exportfs_encode_inode_fh(inode, fanotify_fid_fh(fid, bytes),
-					&dwords, NULL);
+	fh.type = exportfs_encode_inode_fh(inode, fanotify_fid_fh(fid, bytes),
+					   &dwords, NULL);
 	err = -EINVAL;
-	if (!type || type == FILEID_INVALID || bytes != dwords << 2)
+	if (!fh.type || fh.type == FILEID_INVALID || bytes != dwords << 2)
 		goto out_err;
 
-	fid->fsid = *fsid;
-	event->fh_len = bytes;
+	fh.len = bytes;
 
-	return type;
+	return fh;
 
 out_err:
-	pr_warn_ratelimited("fanotify: failed to encode fid (fsid=%x.%x, "
-			    "type=%d, bytes=%d, err=%i)\n",
-			    fsid->val[0], fsid->val[1], type, bytes, err);
+	pr_warn_ratelimited("fanotify: failed to encode fid (type=%d, len=%d, err=%i)\n",
+			    fh.type, bytes, err);
 	kfree(fid->ext_fh);
 	fid->ext_fh = NULL;
-	event->fh_len = 0;
+	fh.type = FILEID_INVALID;
+	fh.len = 0;
 
-	return FILEID_INVALID;
+	return fh;
 }
 
 /*
@@ -327,16 +327,18 @@ init: __maybe_unused
 		event->pid = get_pid(task_pid(current));
 	else
 		event->pid = get_pid(task_tgid(current));
-	event->fh_len = 0;
+	event->fh.len = 0;
+	if (fsid)
+		event->fsid = *fsid;
 	if (id && FAN_GROUP_FLAG(group, FAN_REPORT_FID)) {
 		/* Report the event without a file identifier on encode error */
-		event->fh_type = fanotify_encode_fid(event, id, gfp, fsid);
+		event->fh = fanotify_encode_fid(&event->fid, id, gfp);
 	} else if (path) {
-		event->fh_type = FILEID_ROOT;
+		event->fh.type = FILEID_ROOT;
 		event->path = *path;
 		path_get(path);
 	} else {
-		event->fh_type = FILEID_INVALID;
+		event->fh.type = FILEID_INVALID;
 		event->path.mnt = NULL;
 		event->path.dentry = NULL;
 	}
@@ -485,7 +487,7 @@ static void fanotify_free_event(struct fsnotify_event *fsn_event)
 	event = FANOTIFY_E(fsn_event);
 	if (fanotify_event_has_path(event))
 		path_put(&event->path);
-	else if (fanotify_event_has_ext_fh(event))
+	else if (fanotify_fid_has_ext_fh(&event->fh))
 		kfree(event->fid.ext_fh);
 	put_pid(event->pid);
 	if (fanotify_is_perm_event(event->mask)) {
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 68b30504284c..4fee002235b6 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -18,10 +18,10 @@ enum {
 
 /*
  * 3 dwords are sufficient for most local fs (64bit ino, 32bit generation).
- * For 32bit arch, fid increases the size of fanotify_event by 12 bytes and
- * fh_* fields increase the size of fanotify_event by another 4 bytes.
- * For 64bit arch, fid increases the size of fanotify_fid by 8 bytes and
- * fh_* fields are packed in a hole after mask.
+ * For 32bit arch, fsid and fid increase the size of fanotify_event by 12 bytes
+ * and fh.* fields increase the size of fanotify_event by another 4 bytes.
+ * For 64bit arch, fanotify_fid is the same size as struct path, fsid increases
+ * fanotify_event by 8 bytes and fh.* fields are packed in a hole after mask.
  */
 #if BITS_PER_LONG == 32
 #define FANOTIFY_INLINE_FH_LEN	(3 << 2)
@@ -29,28 +29,46 @@ enum {
 #define FANOTIFY_INLINE_FH_LEN	(4 << 2)
 #endif
 
+struct fanotify_fid_hdr {
+	u8 type;
+	u8 len;
+};
+
 struct fanotify_fid {
-	__kernel_fsid_t fsid;
 	union {
 		unsigned char fh[FANOTIFY_INLINE_FH_LEN];
 		unsigned char *ext_fh;
 	};
 };
 
+static inline bool fanotify_fid_has_fh(struct fanotify_fid_hdr *fh)
+{
+	return fh->type != FILEID_ROOT && fh->type != FILEID_INVALID;
+}
+
+static inline bool fanotify_fid_has_ext_fh(struct fanotify_fid_hdr *fh)
+{
+	return fanotify_fid_has_fh(fh) && fh->len > FANOTIFY_INLINE_FH_LEN;
+}
+
 static inline void *fanotify_fid_fh(struct fanotify_fid *fid,
 				    unsigned int fh_len)
 {
 	return fh_len <= FANOTIFY_INLINE_FH_LEN ? fid->fh : fid->ext_fh;
 }
 
+static inline bool fanotify_fsid_equal(__kernel_fsid_t *fsid1,
+				       __kernel_fsid_t *fsid2)
+{
+	return fsid1->val[0] == fsid2->val[0] && fsid1->val[1] == fsid2->val[1];
+}
+
 static inline bool fanotify_fid_equal(struct fanotify_fid *fid1,
 				      struct fanotify_fid *fid2,
 				      unsigned int fh_len)
 {
-	return fid1->fsid.val[0] == fid2->fsid.val[0] &&
-		fid1->fsid.val[1] == fid2->fsid.val[1] &&
-		!memcmp(fanotify_fid_fh(fid1, fh_len),
-			fanotify_fid_fh(fid2, fh_len), fh_len);
+	return !memcmp(fanotify_fid_fh(fid1, fh_len),
+		       fanotify_fid_fh(fid2, fh_len), fh_len);
 }
 
 /*
@@ -63,13 +81,13 @@ struct fanotify_event {
 	u32 mask;
 	/*
 	 * Those fields are outside fanotify_fid to pack fanotify_event nicely
-	 * on 64bit arch and to use fh_type as an indication of whether path
+	 * on 64bit arch and to use fh.type as an indication of whether path
 	 * or fid are used in the union:
 	 * FILEID_ROOT (0) for path, > 0 for fid, FILEID_INVALID for neither.
 	 */
-	u8 fh_type;
-	u8 fh_len;
+	struct fanotify_fid_hdr fh;
 	u16 pad;
+	__kernel_fsid_t fsid;
 	union {
 		/*
 		 * We hold ref to this path so it may be dereferenced at any
@@ -88,24 +106,12 @@ struct fanotify_event {
 
 static inline bool fanotify_event_has_path(struct fanotify_event *event)
 {
-	return event->fh_type == FILEID_ROOT;
+	return event->fh.type == FILEID_ROOT;
 }
 
 static inline bool fanotify_event_has_fid(struct fanotify_event *event)
 {
-	return event->fh_type != FILEID_ROOT &&
-		event->fh_type != FILEID_INVALID;
-}
-
-static inline bool fanotify_event_has_ext_fh(struct fanotify_event *event)
-{
-	return fanotify_event_has_fid(event) &&
-		event->fh_len > FANOTIFY_INLINE_FH_LEN;
-}
-
-static inline void *fanotify_event_fh(struct fanotify_event *event)
-{
-	return fanotify_fid_fh(&event->fid, event->fh_len);
+	return fanotify_fid_has_fh(&event->fh);
 }
 
 /*
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 0aa362b88550..beb9f0661a7c 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -51,14 +51,19 @@ struct kmem_cache *fanotify_perm_event_cachep __read_mostly;
 
 #define FANOTIFY_EVENT_ALIGN 4
 
+static int fanotify_fid_info_len(struct fanotify_fid_hdr *fh)
+{
+	return roundup(sizeof(struct fanotify_event_info_fid) +
+		       sizeof(struct file_handle) + fh->len,
+		       FANOTIFY_EVENT_ALIGN);
+}
+
 static int fanotify_event_info_len(struct fanotify_event *event)
 {
 	if (!fanotify_event_has_fid(event))
 		return 0;
 
-	return roundup(sizeof(struct fanotify_event_info_fid) +
-		       sizeof(struct file_handle) + event->fh_len,
-		       FANOTIFY_EVENT_ALIGN);
+	return fanotify_fid_info_len(&event->fh);
 }
 
 /*
@@ -204,13 +209,14 @@ static int process_access_response(struct fsnotify_group *group,
 	return -ENOENT;
 }
 
-static int copy_fid_to_user(struct fanotify_event *event, char __user *buf)
+static int copy_fid_to_user(__kernel_fsid_t *fsid, struct fanotify_fid_hdr *fh,
+			    struct fanotify_fid *fid, char __user *buf)
 {
 	struct fanotify_event_info_fid info = { };
 	struct file_handle handle = { };
-	unsigned char bounce[FANOTIFY_INLINE_FH_LEN], *fh;
-	size_t fh_len = event->fh_len;
-	size_t len = fanotify_event_info_len(event);
+	unsigned char bounce[FANOTIFY_INLINE_FH_LEN], *data;
+	size_t fh_len = fh->len;
+	size_t len = fanotify_fid_info_len(fh);
 
 	if (!len)
 		return 0;
@@ -221,13 +227,13 @@ static int copy_fid_to_user(struct fanotify_event *event, char __user *buf)
 	/* Copy event info fid header followed by vaiable sized file handle */
 	info.hdr.info_type = FAN_EVENT_INFO_TYPE_FID;
 	info.hdr.len = len;
-	info.fsid = event->fid.fsid;
+	info.fsid = *fsid;
 	if (copy_to_user(buf, &info, sizeof(info)))
 		return -EFAULT;
 
 	buf += sizeof(info);
 	len -= sizeof(info);
-	handle.handle_type = event->fh_type;
+	handle.handle_type = fh->type;
 	handle.handle_bytes = fh_len;
 	if (copy_to_user(buf, &handle, sizeof(handle)))
 		return -EFAULT;
@@ -238,12 +244,12 @@ static int copy_fid_to_user(struct fanotify_event *event, char __user *buf)
 	 * For an inline fh, copy through stack to exclude the copy from
 	 * usercopy hardening protections.
 	 */
-	fh = fanotify_event_fh(event);
+	data = fanotify_fid_fh(fid, fh_len);
 	if (fh_len <= FANOTIFY_INLINE_FH_LEN) {
-		memcpy(bounce, fh, fh_len);
-		fh = bounce;
+		memcpy(bounce, data, fh_len);
+		data = bounce;
 	}
-	if (copy_to_user(buf, fh, fh_len))
+	if (copy_to_user(buf, data, fh_len))
 		return -EFAULT;
 
 	/* Pad with 0's */
@@ -301,7 +307,8 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
 	if (fanotify_event_has_path(event)) {
 		fd_install(fd, f);
 	} else if (fanotify_event_has_fid(event)) {
-		ret = copy_fid_to_user(event, buf + FAN_EVENT_METADATA_LEN);
+		ret = copy_fid_to_user(&event->fsid, &event->fh, &event->fid,
+				       buf + FAN_EVENT_METADATA_LEN);
 		if (ret < 0)
 			return ret;
 	}
-- 
2.17.1



* [PATCH v2 12/16] fanotify: record name info for FAN_DIR_MODIFY event
  2020-02-17 13:14 [PATCH v2 00/16] Fanotify event with name info Amir Goldstein
                   ` (10 preceding siblings ...)
  2020-02-17 13:14 ` [PATCH v2 11/16] fanotify: prepare to encode both parent and child fid's Amir Goldstein
@ 2020-02-17 13:14 ` Amir Goldstein
  2020-02-17 13:14 ` [PATCH v2 13/16] fanotify: report " Amir Goldstein
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 65+ messages in thread
From: Amir Goldstein @ 2020-02-17 13:14 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

For a FAN_DIR_MODIFY event, allocate a larger event struct to store the
dir entry name alongside the directory fid.

We are going to add support for reporting the parent fid, name and child
fid for events reported on children.  A FAN_DIR_MODIFY event neither
records nor reports the child fid, but in order to stay consistent with
events "on child", we store the directory fid in struct
fanotify_name_event and not in the base struct fanotify_event as we do
for other event types.

This leaves 16 bytes of memory unused per FAN_DIR_MODIFY event,
but keeps the code simpler and avoids creating a custom kmem_cache pool
just for FAN_DIR_MODIFY events.
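
The inline-versus-external name storage described above can be sketched
in user space as follows.  This is only an illustration under stated
assumptions: struct name_event, INLINE_NAME_LEN (assumed to be 32 here)
and the name_event_*() helpers are hypothetical stand-ins, not the
kernel's struct fanotify_name_event, DNAME_INLINE_LEN or allocator calls.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed inline capacity; the patch uses DNAME_INLINE_LEN instead. */
#define INLINE_NAME_LEN 32

/* Hypothetical user-space stand-in for struct fanotify_name_event. */
struct name_event {
	const char *name;	/* points at inline_name or at a heap copy */
	size_t len;		/* name length without the terminating NUL */
	char inline_name[INLINE_NAME_LEN];
};

/* Same test as fanotify_event_has_ext_name() in the patch. */
int name_event_has_ext_name(const struct name_event *ne)
{
	return ne->len + 1 > INLINE_NAME_LEN;
}

/* Store a copy of name in ne; returns 0 on success, -1 if malloc fails. */
int name_event_set_name(struct name_event *ne, const char *name, size_t len)
{
	char *dst = ne->inline_name;

	/* Names that do not fit inline (with their NUL) go to the heap. */
	if (len + 1 > INLINE_NAME_LEN) {
		dst = malloc(len + 1);
		if (!dst)
			return -1;
	}
	memcpy(dst, name, len);
	dst[len] = '\0';
	ne->name = dst;
	ne->len = len;
	return 0;
}

/* Free the heap copy, if any; an inline name needs no cleanup. */
void name_event_free_name(struct name_event *ne)
{
	if (name_event_has_ext_name(ne))
		free((void *)ne->name);
	ne->name = NULL;
	ne->len = 0;
}
```

Short names cost no extra allocation, and the free path can tell the two
cases apart from the stored length alone, which is the same check the
patch uses in fanotify_free_event().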

At this point, name info reporting is not yet implemented, so trying to
set FAN_DIR_MODIFY in mark mask will return -EINVAL.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify.c      | 87 +++++++++++++++++++++++++++---
 fs/notify/fanotify/fanotify.h      | 65 +++++++++++++++++++++-
 fs/notify/fanotify/fanotify_user.c |  5 +-
 3 files changed, 149 insertions(+), 8 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 3bc28f08aad1..fc75dc53a218 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -33,7 +33,22 @@ static bool should_merge(struct fsnotify_event *old_fsn,
 	if (fanotify_event_has_path(old)) {
 		return old->path.mnt == new->path.mnt &&
 			old->path.dentry == new->path.dentry;
-	} else if (fanotify_event_has_fid(old)) {
+	}
+
+	if (!fanotify_fsid_equal(&old->fsid, &new->fsid))
+		return false;
+
+	if (fanotify_event_has_dfid_name(old)) {
+		if (!fanotify_dfid_name_equal(FANOTIFY_NE(old_fsn),
+					      FANOTIFY_NE(new_fsn)))
+			return false;
+
+		/* FAN_DIR_MODIFY does not encode the "child" fid */
+		if (!fanotify_event_has_fid(old))
+			return true;
+	}
+
+	if (fanotify_event_has_fid(old)) {
 		/*
 		 * We want to merge many dirent events in the same dir (i.e.
 		 * creates/unlinks/renames), but we do not want to merge dirent
@@ -43,7 +58,6 @@ static bool should_merge(struct fsnotify_event *old_fsn,
 		 * unlink pair or rmdir+create pair of events.
 		 */
 		return (old->mask & FS_ISDIR) == (new->mask & FS_ISDIR) &&
-			fanotify_fsid_equal(&old->fsid, &new->fsid) &&
 			fanotify_fid_equal(&old->fid, &new->fid, old->fh.len);
 	}
 
@@ -279,13 +293,16 @@ static struct inode *fanotify_fid_inode(struct inode *to_tell, u32 event_mask,
 struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 					    struct inode *inode, u32 mask,
 					    const void *data, int data_type,
+					    const struct qstr *file_name,
 					    __kernel_fsid_t *fsid)
 {
 	struct fanotify_event *event = NULL;
+	struct fanotify_name_event *fne = NULL;
 	gfp_t gfp = GFP_KERNEL_ACCOUNT;
 	struct inode *id = fanotify_fid_inode(inode, mask, data, data_type);
 	const struct path *path = fsnotify_data_path(data, data_type);
 	struct dentry *dentry = fsnotify_data_dentry(data, data_type);
+	struct inode *dir = NULL;
 
 	/*
 	 * For queues with unlimited length lost events are not expected and
@@ -310,12 +327,56 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 		event = &pevent->fae;
 		pevent->response = 0;
 		pevent->state = FAN_EVENT_INIT;
+		/*
+		 * Make sure that fanotify_event_has_fid() and
+		 * fanotify_event_has_name() are false for permission events.
+		 */
+		id = NULL;
+		event->dfh.type = FILEID_ROOT;
+		goto init;
+	}
+
+	/*
+	 * For FAN_DIR_MODIFY event, we report the fid of the directory and
+	 * the name of the modified entry.
+	 * Allocate an fanotify_name_event struct and copy the name.
+	 */
+	if (mask & FAN_DIR_MODIFY && !(WARN_ON_ONCE(!file_name))) {
+		char *name = NULL;
+
+		/*
+		 * Make sure that fanotify_event_has_name() is true and that
+		 * fanotify_event_has_fid() is false for FAN_DIR_MODIFY events.
+		 */
+		id = NULL;
+		dir = inode;
+		if (file_name->len + 1 > FANOTIFY_INLINE_NAME_LEN) {
+			name = kmalloc(file_name->len + 1, gfp);
+			if (!name)
+				goto out;
+		}
+
+		fne = kmem_cache_alloc(fanotify_name_event_cachep, gfp);
+		if (!fne)
+			goto out;
+
+		event = &fne->fae;
+		if (!name)
+			name = fne->inline_name;
+		strcpy(name, file_name->name);
+		fne->name.name = name;
+		fne->name.len = file_name->len;
+		event->fh.type = FILEID_INVALID;
+		event->dfh.type = FILEID_INVALID;
 		goto init;
 	}
+
 	event = kmem_cache_alloc(fanotify_event_cachep, gfp);
 	if (!event)
 		goto out;
-init: __maybe_unused
+
+	event->dfh.type = FILEID_ROOT;
+init:
 	/*
 	 * Use the dentry instead of inode as tag for event queue, so event
 	 * reported on parent is merged with event reported on child when both
@@ -328,11 +389,16 @@ init: __maybe_unused
 	else
 		event->pid = get_pid(task_tgid(current));
 	event->fh.len = 0;
+	event->dfh.len = 0;
 	if (fsid)
 		event->fsid = *fsid;
-	if (id && FAN_GROUP_FLAG(group, FAN_REPORT_FID)) {
+	if (FAN_GROUP_FLAG(group, FAN_REPORT_FID)) {
 		/* Report the event without a file identifier on encode error */
-		event->fh_type = fanotify_encode_fid(event, id, gfp, fsid);
+		if (id)
+			event->fh = fanotify_encode_fid(&event->fid, id, gfp);
+		/* The reported name is relative to 'dir' */
+		if (fne)
+			event->dfh = fanotify_encode_fid(&fne->dfid, dir, gfp);
 	} else if (path) {
 		event->fh.type = FILEID_ROOT;
 		event->path = *path;
@@ -439,7 +505,7 @@ static int fanotify_handle_event(struct fsnotify_group *group,
 	}
 
 	event = fanotify_alloc_event(group, inode, mask, data, data_type,
-				     &fsid);
+				     file_name, &fsid);
 	ret = -ENOMEM;
 	if (unlikely(!event)) {
 		/*
@@ -494,6 +560,15 @@ static void fanotify_free_event(struct fsnotify_event *fsn_event)
 		kmem_cache_free(fanotify_perm_event_cachep,
 				FANOTIFY_PE(fsn_event));
 		return;
+	} else if (fanotify_event_has_dfid_name(event)) {
+		struct fanotify_name_event *fne = FANOTIFY_NE(fsn_event);
+
+		if (fanotify_fid_has_ext_fh(&event->dfh))
+			kfree(fne->dfid.ext_fh);
+		if (fanotify_event_has_ext_name(fne))
+			kfree(fne->name.name);
+		kmem_cache_free(fanotify_name_event_cachep, fne);
+		return;
 	}
 	kmem_cache_free(fanotify_event_cachep, event);
 }
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 4fee002235b6..e4a67a2d77b8 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -6,6 +6,7 @@
 
 extern struct kmem_cache *fanotify_mark_cache;
 extern struct kmem_cache *fanotify_event_cachep;
+extern struct kmem_cache *fanotify_name_event_cachep;
 extern struct kmem_cache *fanotify_perm_event_cachep;
 
 /* Possible states of the permission event */
@@ -84,9 +85,10 @@ struct fanotify_event {
 	 * on 64bit arch and to use fh.type as an indication of whether path
 	 * or fid are used in the union:
 	 * FILEID_ROOT (0) for path, > 0 for fid, FILEID_INVALID for neither.
+	 * Non zero dfh.type indicates embedded in an fanotify_name_event.
 	 */
 	struct fanotify_fid_hdr fh;
-	u16 pad;
+	struct fanotify_fid_hdr dfh;
 	__kernel_fsid_t fsid;
 	union {
 		/*
@@ -114,6 +116,66 @@ static inline bool fanotify_event_has_fid(struct fanotify_event *event)
 	return fanotify_fid_has_fh(&event->fh);
 }
 
+/*
+ * Structure for fanotify events with name info.
+ * DNAME_INLINE_LEN is good enough for dentry name, so it's good enough for us.
+ * It also happens to bring the size of this struct to 128 bytes on 64bit arch.
+ */
+#define FANOTIFY_INLINE_NAME_LEN DNAME_INLINE_LEN
+
+struct fanotify_name_event {
+	struct fanotify_event fae;
+	struct fanotify_fid  dfid;
+	struct qstr name;
+	unsigned char inline_name[FANOTIFY_INLINE_NAME_LEN];
+};
+
+static inline struct fanotify_name_event *
+FANOTIFY_NE(struct fsnotify_event *fse)
+{
+	return container_of(fse, struct fanotify_name_event, fae.fse);
+}
+
+static inline bool fanotify_event_has_dfid_name(struct fanotify_event *event)
+{
+	return event->dfh.type != FILEID_ROOT;
+}
+
+static inline unsigned int fanotify_event_name_len(struct fanotify_event *event)
+{
+	return event->dfh.type != FILEID_ROOT ?
+		FANOTIFY_NE(&event->fse)->name.len : 0;
+}
+
+static inline bool fanotify_event_has_ext_name(struct fanotify_name_event *fne)
+{
+	return fne->name.len + 1 > FANOTIFY_INLINE_NAME_LEN;
+}
+
+static inline bool fanotify_dfid_name_equal(struct fanotify_name_event *fne1,
+					    struct fanotify_name_event *fne2)
+{
+	struct qstr *name1 = &fne1->name;
+	struct qstr *name2 = &fne2->name;
+	struct fanotify_fid_hdr *dfh1 = &fne1->fae.dfh;
+	struct fanotify_fid_hdr *dfh2 = &fne2->fae.dfh;
+
+	if (dfh1->type != dfh2->type || dfh1->len != dfh2->len ||
+	    name1->len != name2->len)
+		return false;
+
+	/* Could be pointing to same external_name */
+	if (name1->len && name1->name != name2->name &&
+	    strcmp(name1->name, name2->name))
+		return false;
+
+	/* No dfid means that encoding failed */
+	if (!dfh1->len)
+		return true;
+
+	return fanotify_fid_equal(&fne1->dfid, &fne2->dfid, dfh1->len);
+}
+
 /*
  * Structure for permission fanotify events. It gets allocated and freed in
  * fanotify_handle_event() since we wait there for user response. When the
@@ -148,4 +210,5 @@ static inline struct fanotify_event *FANOTIFY_E(struct fsnotify_event *fse)
 struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 					    struct inode *inode, u32 mask,
 					    const void *data, int data_type,
+					    const struct qstr *file_name,
 					    __kernel_fsid_t *fsid);
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index beb9f0661a7c..284f3548bb79 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -47,6 +47,7 @@ extern const struct fsnotify_ops fanotify_fsnotify_ops;
 
 struct kmem_cache *fanotify_mark_cache __read_mostly;
 struct kmem_cache *fanotify_event_cachep __read_mostly;
+struct kmem_cache *fanotify_name_event_cachep __read_mostly;
 struct kmem_cache *fanotify_perm_event_cachep __read_mostly;
 
 #define FANOTIFY_EVENT_ALIGN 4
@@ -831,7 +832,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 	group->memcg = get_mem_cgroup_from_mm(current->mm);
 
 	oevent = fanotify_alloc_event(group, NULL, FS_Q_OVERFLOW, NULL,
-				      FSNOTIFY_EVENT_NONE, NULL);
+				      FSNOTIFY_EVENT_NONE, NULL, NULL);
 	if (unlikely(!oevent)) {
 		fd = -ENOMEM;
 		goto out_destroy_group;
@@ -1147,6 +1148,8 @@ static int __init fanotify_user_setup(void)
 	fanotify_mark_cache = KMEM_CACHE(fsnotify_mark,
 					 SLAB_PANIC|SLAB_ACCOUNT);
 	fanotify_event_cachep = KMEM_CACHE(fanotify_event, SLAB_PANIC);
+	fanotify_name_event_cachep = KMEM_CACHE(fanotify_name_event,
+						SLAB_PANIC);
 	if (IS_ENABLED(CONFIG_FANOTIFY_ACCESS_PERMISSIONS)) {
 		fanotify_perm_event_cachep =
 			KMEM_CACHE(fanotify_perm_event, SLAB_PANIC);
-- 
2.17.1



* [PATCH v2 13/16] fanotify: report name info for FAN_DIR_MODIFY event
  2020-02-17 13:14 [PATCH v2 00/16] Fanotify event with name info Amir Goldstein
                   ` (11 preceding siblings ...)
  2020-02-17 13:14 ` [PATCH v2 12/16] fanotify: record name info for FAN_DIR_MODIFY event Amir Goldstein
@ 2020-02-17 13:14 ` Amir Goldstein
  2020-02-19  9:43   ` kbuild test robot
                     ` (3 more replies)
  2020-02-17 13:14 ` [PATCH v2 14/16] fanotify: report parent fid + name with FAN_REPORT_NAME Amir Goldstein
                   ` (3 subsequent siblings)
  16 siblings, 4 replies; 65+ messages in thread
From: Amir Goldstein @ 2020-02-17 13:14 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, linux-api

Report the FAN_DIR_MODIFY event with the entry name in a variable length
info record, similar to how fids are reported.  With name info reporting
implemented, setting FAN_DIR_MODIFY in a mark mask is now allowed.

When events are reported with name, the reported fid identifies the
directory and the name follows the fid. The info record type for this
event info is FAN_EVENT_INFO_TYPE_DFID_NAME.

For now, all reported events have at most one info record which is
either FAN_EVENT_INFO_TYPE_FID or FAN_EVENT_INFO_TYPE_DFID_NAME (for
FAN_DIR_MODIFY).  Later on, events "on child" will report both records.
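
As a rough illustration of how large such a record is, the length
computation done by fanotify_fid_info_len() in this patch can be
mirrored in user space.  The constant INFO_HDR_LEN below assumes that
sizeof(struct fanotify_event_info_fid) is 12 and sizeof(struct
file_handle) is 8, which holds on common ABIs but is an assumption of
this sketch; fid_info_len() and round_up() are illustrative names.

```c
#include <stddef.h>

#define EVENT_ALIGN	4		/* FANOTIFY_EVENT_ALIGN in the patch */
#define INFO_HDR_LEN	(12 + 8)	/* assumed sizes, see the text above */

/* Round n up to the next multiple of align. */
static size_t round_up(size_t n, size_t align)
{
	return (n + align - 1) / align * align;
}

/*
 * Length of one info record: fixed headers, the file handle bytes and,
 * when a name is present, the name plus its terminating NUL, padded to
 * the event alignment.
 */
size_t fid_info_len(size_t fh_len, size_t name_len)
{
	size_t info_len = fh_len;

	if (name_len)
		info_len += name_len + 1;

	return round_up(INFO_HDR_LEN + info_len, EVENT_ALIGN);
}
```

For example, an 8 byte file handle with the name "foo" gives
20 + 8 + 4 = 32 bytes, while the same handle without a name gives 28.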

There are several ways that an application can use this information:

1. When watching a single directory, the name is always relative to
the watched directory, so the application needs to call fstatat(2) on
the name relative to the watched directory.

2. When watching a set of directories, the application could keep a map
of dirfds for all watched directories, keyed by the fid obtained with
name_to_handle_at(2).  When getting a name event, the fid in the event
info could be used to look up the base dirfd in the map and then call
fstatat(2) with that dirfd.

3. When watching a filesystem (FAN_MARK_FILESYSTEM) or a large set of
directories, the application could use open_by_handle_at(2) with the fid
in the event info to obtain a dirfd for the directory where the event
happened and call fstatat(2) with this dirfd.

The last option scales better for a large number of watched directories.
Only the first two options may become available to unprivileged fanotify
watchers in the future, because the last option relies on
open_by_handle_at(2), which requires the CAP_DAC_READ_SEARCH capability.
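
Whichever option is used, the application first has to pull the
directory file handle and the entry name out of the info record.  A
minimal sketch of that step follows.  The structs are hypothetical
fixed-width mirrors of the UAPI layout described in this series (info
header, fsid, file handle header, handle bytes, NUL terminated name),
and dfid_name_lookup() is an illustrative helper, not an existing API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INFO_TYPE_DFID_NAME 2	/* FAN_EVENT_INFO_TYPE_DFID_NAME in this patch */

/* Hypothetical mirrors of the UAPI structs, fixed header parts only. */
struct info_hdr { uint8_t info_type; uint8_t pad; uint16_t len; };
struct info_fid { struct info_hdr hdr; int32_t fsid[2]; };
struct handle_hdr { uint32_t handle_bytes; int32_t handle_type; };

/*
 * Return the entry name stored in a "dir fid + name" record, or NULL if
 * the record has another type or is malformed.  On success *fh describes
 * the directory file handle and *fh_data points at its opaque bytes.
 */
const char *dfid_name_lookup(const unsigned char *rec, size_t avail,
			     struct handle_hdr *fh,
			     const unsigned char **fh_data)
{
	struct info_fid info;
	size_t off = sizeof(info) + sizeof(*fh);

	if (avail < off)
		return NULL;
	/* memcpy avoids unaligned loads from the read buffer. */
	memcpy(&info, rec, sizeof(info));
	memcpy(fh, rec + sizeof(info), sizeof(*fh));

	if (info.hdr.info_type != INFO_TYPE_DFID_NAME ||
	    info.hdr.len > avail || info.hdr.len <= off ||
	    fh->handle_bytes >= info.hdr.len - off)
		return NULL;

	*fh_data = rec + off;
	off += fh->handle_bytes;
	/* The name must be NUL terminated inside the record. */
	if (!memchr(rec + off, '\0', info.hdr.len - off))
		return NULL;

	return (const char *)rec + off;
}
```

The returned name would then be passed to fstatat(2) together with a
dirfd obtained by one of the three methods listed above.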

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify.c      |   2 +-
 fs/notify/fanotify/fanotify_user.c | 120 ++++++++++++++++++++++-------
 include/linux/fanotify.h           |   3 +-
 include/uapi/linux/fanotify.h      |   1 +
 4 files changed, 98 insertions(+), 28 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index fc75dc53a218..b651c18d3a93 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -478,7 +478,7 @@ static int fanotify_handle_event(struct fsnotify_group *group,
 	BUILD_BUG_ON(FAN_OPEN_EXEC != FS_OPEN_EXEC);
 	BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM);
 
-	BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 19);
+	BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 20);
 
 	mask = fanotify_group_event_mask(group, iter_info, mask, data,
 					 data_type);
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 284f3548bb79..a1bafc21ebbb 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -51,20 +51,32 @@ struct kmem_cache *fanotify_name_event_cachep __read_mostly;
 struct kmem_cache *fanotify_perm_event_cachep __read_mostly;
 
 #define FANOTIFY_EVENT_ALIGN 4
+#define FANOTIFY_INFO_HDR_LEN \
+	(sizeof(struct fanotify_event_info_fid) + sizeof(struct file_handle))
 
-static int fanotify_fid_info_len(struct fanotify_fid_hdr *fh)
+static int fanotify_fid_info_len(int fh_len, int name_len)
 {
-	return roundup(sizeof(struct fanotify_event_info_fid) +
-		       sizeof(struct file_handle) + fh->len,
-		       FANOTIFY_EVENT_ALIGN);
+	int info_len = fh_len;
+
+	if (name_len)
+		info_len += name_len + 1;
+
+	return roundup(FANOTIFY_INFO_HDR_LEN + info_len, FANOTIFY_EVENT_ALIGN);
 }
 
 static int fanotify_event_info_len(struct fanotify_event *event)
 {
-	if (!fanotify_event_has_fid(event))
-		return 0;
+	int info_len = 0;
+
+	if (fanotify_event_has_fid(event))
+		info_len += fanotify_fid_info_len(event->fh.len, 0);
+
+	if (fanotify_event_has_dfid_name(event)) {
+		info_len += fanotify_fid_info_len(event->dfh.len,
+					fanotify_event_name_len(event));
+	}
 
-	return fanotify_fid_info_len(&event->fh);
+	return info_len;
 }
 
 /*
@@ -210,23 +222,34 @@ static int process_access_response(struct fsnotify_group *group,
 	return -ENOENT;
 }
 
-static int copy_fid_to_user(__kernel_fsid_t *fsid, struct fanotify_fid_hdr *fh,
-			    struct fanotify_fid *fid, char __user *buf)
+static int copy_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fid_hdr *fh,
+			     struct fanotify_fid *fid, const struct qstr *name,
+			     char __user *buf, size_t count)
 {
 	struct fanotify_event_info_fid info = { };
 	struct file_handle handle = { };
-	unsigned char bounce[FANOTIFY_INLINE_FH_LEN], *data;
+	unsigned char bounce[max(FANOTIFY_INLINE_FH_LEN, DNAME_INLINE_LEN)];
+	const unsigned char *data;
 	size_t fh_len = fh->len;
-	size_t len = fanotify_fid_info_len(fh);
+	size_t name_len = name ? name->len : 0;
+	size_t info_len = fanotify_fid_info_len(fh_len, name_len);
+	size_t len = info_len;
+
+	pr_debug("%s: fh_len=%zu name_len=%zu, info_len=%zu, count=%zu\n",
+		 __func__, fh_len, name_len, info_len, count);
 
-	if (!len)
+	if (!fh_len || (name && !name_len))
 		return 0;
 
-	if (WARN_ON_ONCE(len < sizeof(info) + sizeof(handle) + fh_len))
+	if (WARN_ON_ONCE(len < sizeof(info) || len > count))
 		return -EFAULT;
 
-	/* Copy event info fid header followed by vaiable sized file handle */
-	info.hdr.info_type = FAN_EVENT_INFO_TYPE_FID;
+	/*
+	 * Copy event info fid header followed by variable sized file handle
+	 * and optionally followed by variable sized filename.
+	 */
+	info.hdr.info_type = name_len ? FAN_EVENT_INFO_TYPE_DFID_NAME :
+					FAN_EVENT_INFO_TYPE_FID;
 	info.hdr.len = len;
 	info.fsid = *fsid;
 	if (copy_to_user(buf, &info, sizeof(info)))
@@ -234,6 +257,9 @@ static int copy_fid_to_user(__kernel_fsid_t *fsid, struct fanotify_fid_hdr *fh,
 
 	buf += sizeof(info);
 	len -= sizeof(info);
+	if (WARN_ON_ONCE(len < sizeof(handle)))
+		return -EFAULT;
+
 	handle.handle_type = fh->type;
 	handle.handle_bytes = fh_len;
 	if (copy_to_user(buf, &handle, sizeof(handle)))
@@ -241,9 +267,12 @@ static int copy_fid_to_user(__kernel_fsid_t *fsid, struct fanotify_fid_hdr *fh,
 
 	buf += sizeof(handle);
 	len -= sizeof(handle);
+	if (WARN_ON_ONCE(len < fh_len))
+		return -EFAULT;
+
 	/*
-	 * For an inline fh, copy through stack to exclude the copy from
-	 * usercopy hardening protections.
+	 * For an inline fh and inline file name, copy through stack to exclude
+	 * the copy from usercopy hardening protections.
 	 */
 	data = fanotify_fid_fh(fid, fh_len);
 	if (fh_len <= FANOTIFY_INLINE_FH_LEN) {
@@ -253,14 +282,33 @@ static int copy_fid_to_user(__kernel_fsid_t *fsid, struct fanotify_fid_hdr *fh,
 	if (copy_to_user(buf, data, fh_len))
 		return -EFAULT;
 
-	/* Pad with 0's */
 	buf += fh_len;
 	len -= fh_len;
+
+	if (name_len) {
+		/* Copy the filename with terminating null */
+		name_len++;
+		if (WARN_ON_ONCE(len < name_len))
+			return -EFAULT;
+
+		data = name->name;
+		if (name_len <= DNAME_INLINE_LEN) {
+			memcpy(bounce, data, name_len);
+			data = bounce;
+		}
+		if (copy_to_user(buf, data, name_len))
+			return -EFAULT;
+
+		buf += name_len;
+		len -= name_len;
+	}
+
+	/* Pad with 0's */
 	WARN_ON_ONCE(len < 0 || len >= FANOTIFY_EVENT_ALIGN);
 	if (len > 0 && clear_user(buf, len))
 		return -EFAULT;
 
-	return 0;
+	return info_len;
 }
 
 static ssize_t copy_event_to_user(struct fsnotify_group *group,
@@ -282,12 +330,12 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
 	metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS;
 	metadata.pid = pid_vnr(event->pid);
 
-	if (fanotify_event_has_path(event)) {
+	if (FAN_GROUP_FLAG(group, FAN_REPORT_FID)) {
+		metadata.event_len += fanotify_event_info_len(event);
+	} else if (fanotify_event_has_path(event)) {
 		fd = create_fd(group, event, &f);
 		if (fd < 0)
 			return fd;
-	} else if (fanotify_event_has_fid(event)) {
-		metadata.event_len += fanotify_event_info_len(event);
 	}
 	metadata.fd = fd;
 
@@ -302,16 +350,36 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
 	if (copy_to_user(buf, &metadata, FAN_EVENT_METADATA_LEN))
 		goto out_close_fd;
 
+	buf += FAN_EVENT_METADATA_LEN;
+	count -= FAN_EVENT_METADATA_LEN;
+
 	if (fanotify_is_perm_event(event->mask))
 		FANOTIFY_PE(fsn_event)->fd = fd;
 
-	if (fanotify_event_has_path(event)) {
+	if (f)
 		fd_install(fd, f);
-	} else if (fanotify_event_has_fid(event)) {
-		ret = copy_fid_to_user(&event->fsid, &event->fh, &event->fid,
-				       buf + FAN_EVENT_METADATA_LEN);
+
+	/* Event info records order is: dir fid + name, child fid */
+	if (fanotify_event_has_dfid_name(event)) {
+		struct fanotify_name_event *fne = FANOTIFY_NE(fsn_event);
+
+		ret = copy_info_to_user(&event->fsid, &event->dfh, &fne->dfid,
+					&fne->name, buf, count);
 		if (ret < 0)
 			return ret;
+
+		buf += ret;
+		count -= ret;
+	}
+
+	if (fanotify_event_has_fid(event)) {
+		ret = copy_info_to_user(&event->fsid, &event->fh, &event->fid,
+					NULL, buf, count);
+		if (ret < 0)
+			return ret;
+
+		buf += ret;
+		count -= ret;
 	}
 
 	return metadata.event_len;
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index b79fa9bb7359..3049a6c06d9e 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -47,7 +47,8 @@
  * Directory entry modification events - reported only to directory
  * where entry is modified and not to a watching parent.
  */
-#define FANOTIFY_DIRENT_EVENTS	(FAN_MOVE | FAN_CREATE | FAN_DELETE)
+#define FANOTIFY_DIRENT_EVENTS	(FAN_MOVE | FAN_CREATE | FAN_DELETE | \
+				 FAN_DIR_MODIFY)
 
 /* Events that can only be reported with data type FSNOTIFY_EVENT_INODE */
 #define FANOTIFY_INODE_EVENTS	(FANOTIFY_DIRENT_EVENTS | \
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index 615fa2c87179..2b56e194b858 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -117,6 +117,7 @@ struct fanotify_event_metadata {
 };
 
 #define FAN_EVENT_INFO_TYPE_FID		1
+#define FAN_EVENT_INFO_TYPE_DFID_NAME	2
 
 /* Variable length info record following event metadata */
 struct fanotify_event_info_header {
-- 
2.17.1



* [PATCH v2 14/16] fanotify: report parent fid + name with FAN_REPORT_NAME
  2020-02-17 13:14 [PATCH v2 00/16] Fanotify event with name info Amir Goldstein
                   ` (12 preceding siblings ...)
  2020-02-17 13:14 ` [PATCH v2 13/16] fanotify: report " Amir Goldstein
@ 2020-02-17 13:14 ` Amir Goldstein
  2020-02-17 13:14 ` [PATCH v2 15/16] fanotify: refine rules for when name is reported Amir Goldstein
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 65+ messages in thread
From: Amir Goldstein @ 2020-02-17 13:14 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, linux-api

For a group with fanotify_init() flag FAN_REPORT_NAME, we report the
parent fid and name for events possible "on child" (e.g. FAN_MODIFY)
in addition to reporting the child fid.

The flag FAN_REPORT_NAME requires the flag FAN_REPORT_FID and there is
a constant for setting both flags named FAN_REPORT_FID_NAME.
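
The flag dependency can be expressed as a small check.  The sketch below
copies the flag values from this patch into local REPORT_* macros so
that it stays self-contained; check_report_flags() is a hypothetical
helper that mirrors the test added to fanotify_init(), not a kernel
function.

```c
#include <errno.h>

/* Values of FAN_REPORT_FID and FAN_REPORT_NAME as defined in this patch. */
#define REPORT_FID	0x00000200u
#define REPORT_NAME	0x00000400u
#define REPORT_FID_NAME	(REPORT_FID | REPORT_NAME)

/* Returns 0 if the flag combination is allowed, -EINVAL otherwise. */
int check_report_flags(unsigned int flags)
{
	/* A name is reported relative to a parent fid, so a fid is required. */
	if ((flags & REPORT_NAME) && !(flags & REPORT_FID))
		return -EINVAL;

	return 0;
}
```

An application that passes the combined constant never hits the error
path, which is the reason for providing FAN_REPORT_FID_NAME.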

The parent fid and name are reported with an info record of type
FAN_EVENT_INFO_TYPE_DFID_NAME, similar to the way that name info is
reported for FAN_DIR_MODIFY events.

The child fid is reported with another info record of type
FAN_EVENT_INFO_TYPE_FID that follows the first info record, with the
same fid info that is reported to a group with FAN_REPORT_FID flag.
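
Since an event may now carry two info records, a reader has to step
through them using the length stored in each record header.  The
following sketch shows one way to do that, assuming the records are
packed back to back after the event metadata.  struct rec_hdr is a
hypothetical mirror of struct fanotify_event_info_header and
walk_info_records() is an illustrative helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of struct fanotify_event_info_header. */
struct rec_hdr { uint8_t info_type; uint8_t pad; uint16_t len; };

/*
 * Walk the info records in buf[0..len).  Stores up to max_types record
 * types in order of appearance and returns the number of records seen,
 * or -1 if a record header is truncated or its length is inconsistent.
 */
int walk_info_records(const unsigned char *buf, size_t len,
		      uint8_t *types, int max_types)
{
	int n = 0;

	while (len > 0) {
		struct rec_hdr hdr;

		if (len < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf, sizeof(hdr));
		if (hdr.len < sizeof(hdr) || hdr.len > len)
			return -1;

		if (n < max_types)
			types[n] = hdr.info_type;
		n++;

		/* hdr.len covers the whole record, padding included. */
		buf += hdr.len;
		len -= hdr.len;
	}

	return n;
}
```

For an event "on child" in a FAN_REPORT_NAME group, the types seen would
be FAN_EVENT_INFO_TYPE_DFID_NAME followed by FAN_EVENT_INFO_TYPE_FID.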

Events with name are reported the same way when reported to sb, mount
or inode marks and when reported to a directory watching children.

Events not possible "on child" (e.g. FAN_DELETE_SELF) are reported
with a single FAN_EVENT_INFO_TYPE_FID record, same as they are reported
to a group with FAN_REPORT_FID flag.

If the parent is unknown (dentry is disconnected) or the parent is not on
the same filesystem as the child (dentry is sb root), the event is also
reported with a single FAN_EVENT_INFO_TYPE_FID record.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify.c      | 25 +++++++++++++++++++++++--
 fs/notify/fanotify/fanotify_user.c |  6 +++++-
 include/linux/fanotify.h           |  2 +-
 include/uapi/linux/fanotify.h      |  4 ++++
 4 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index b651c18d3a93..43c338a8a2f1 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -302,6 +302,8 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 	struct inode *id = fanotify_fid_inode(inode, mask, data, data_type);
 	const struct path *path = fsnotify_data_path(data, data_type);
 	struct dentry *dentry = fsnotify_data_dentry(data, data_type);
+	struct dentry *parent = NULL;
+	struct name_snapshot child_name;
 	struct inode *dir = NULL;
 
 	/*
@@ -339,17 +341,32 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 	/*
 	 * For FAN_DIR_MODIFY event, we report the fid of the directory and
 	 * the name of the modified entry.
+	 * With flag FAN_REPORT_NAME, we report the parent fid and name for
+	 * events possible "on child" in addition to reporting the child fid.
+	 * If parent is unknown (dentry is disconnected) or parent is not on the
+	 * same filesystem as child (dentry is sb root), only "child" fid is
+	 * reported. Events are reported the same way when reported to sb, mount
+	 * or inode marks and when reported to a directory watching children.
 	 * Allocate an fanotify_name_event struct and copy the name.
 	 */
 	if (mask & FAN_DIR_MODIFY && !(WARN_ON_ONCE(!file_name))) {
-		char *name = NULL;
-
 		/*
 		 * Make sure that fanotify_event_has_name() is true and that
 		 * fanotify_event_has_fid() is false for FAN_DIR_MODIFY events.
 		 */
 		id = NULL;
 		dir = inode;
+	} else if (FAN_GROUP_FLAG(group, FAN_REPORT_NAME) &&
+		   mask & FS_EVENTS_POSS_ON_CHILD &&
+		   likely(dentry && !IS_ROOT(dentry))) {
+		parent = dget_parent(dentry);
+		dir = d_inode(parent);
+		take_dentry_name_snapshot(&child_name, dentry);
+		file_name = &child_name.name;
+	}
+	if (dir) {
+		char *name = NULL;
+
 		if (file_name->len + 1 > FANOTIFY_INLINE_NAME_LEN) {
 			name = kmalloc(file_name->len + 1, gfp);
 			if (!name)
@@ -409,6 +426,10 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 		event->path.dentry = NULL;
 	}
 out:
+	if (parent) {
+		dput(parent);
+		release_dentry_name_snapshot(&child_name);
+	}
 	memalloc_unuse_memcg();
 	return event;
 }
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index a1bafc21ebbb..5d369aa5d1bc 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -875,6 +875,10 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 	    (flags & FANOTIFY_CLASS_BITS) != FAN_CLASS_NOTIF)
 		return -EINVAL;
 
+	/* Child name is reported with parent fid */
+	if ((flags & FAN_REPORT_NAME) && !(flags & FAN_REPORT_FID))
+		return -EINVAL;
+
 	user = get_current_user();
 	if (atomic_read(&user->fanotify_listeners) > FANOTIFY_DEFAULT_MAX_LISTENERS) {
 		free_uid(user);
@@ -1210,7 +1214,7 @@ COMPAT_SYSCALL_DEFINE6(fanotify_mark,
  */
 static int __init fanotify_user_setup(void)
 {
-	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 8);
+	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 9);
 	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9);
 
 	fanotify_mark_cache = KMEM_CACHE(fsnotify_mark,
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 3049a6c06d9e..5412a25c54c0 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -19,7 +19,7 @@
 				 FAN_CLASS_PRE_CONTENT)
 
 #define FANOTIFY_INIT_FLAGS	(FANOTIFY_CLASS_BITS | \
-				 FAN_REPORT_TID | FAN_REPORT_FID | \
+				 FAN_REPORT_TID | FAN_REPORT_FID_NAME | \
 				 FAN_CLOEXEC | FAN_NONBLOCK | \
 				 FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS)
 
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index 2b56e194b858..04181769bb50 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -54,6 +54,10 @@
 /* Flags to determine fanotify event format */
 #define FAN_REPORT_TID		0x00000100	/* event->pid is thread id */
 #define FAN_REPORT_FID		0x00000200	/* Report unique file id */
+#define FAN_REPORT_NAME		0x00000400	/* Report events with name */
+
+/* Convenience macro - FAN_REPORT_NAME requires FAN_REPORT_FID */
+#define FAN_REPORT_FID_NAME	(FAN_REPORT_FID | FAN_REPORT_NAME)
 
 /* Deprecated - do not use this in programs and do not add new flags here! */
 #define FAN_ALL_INIT_FLAGS	(FAN_CLOEXEC | FAN_NONBLOCK | \
-- 
2.17.1



* [PATCH v2 15/16] fanotify: refine rules for when name is reported
  2020-02-17 13:14 [PATCH v2 00/16] Fanotify event with name info Amir Goldstein
                   ` (13 preceding siblings ...)
  2020-02-17 13:14 ` [PATCH v2 14/16] fanotify: report parent fid + name with FAN_REPORT_NAME Amir Goldstein
@ 2020-02-17 13:14 ` Amir Goldstein
  2020-02-17 13:14 ` [BONUS][PATCH v2 16/16] fanotify: support limited functionality for unprivileged users Amir Goldstein
  2020-02-20 22:10 ` [PATCH v2 00/16] Fanotify event with name info Matthew Bobrowski
  16 siblings, 0 replies; 65+ messages in thread
From: Amir Goldstein @ 2020-02-17 13:14 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

With FAN_REPORT_NAME, name will be reported if event is in the mask of a
watching parent or filesystem mark.

Name will not be reported if event is only in the mask of a mark on the
victim inode itself.

If event is only in the mask of a marked mount, name will be reported if
the victim inode is not the mount's root.  Note that the mount's root
could be a non-directory in case of bind mount.
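
The rules above can be read as a small decision function.  The following
is only a user-space sketch for illustration: the enum, the struct and
should_report_name() are hypothetical names, not kernel identifiers, and
the sketch assumes the group was created with FAN_REPORT_NAME.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mark kinds; they only loosely mirror the kernel's mark types. */
enum mark_kind {
	MARK_VICTIM_INODE,	/* mark on the victim inode itself */
	MARK_PARENT_DIR,	/* mark on the parent, watching children */
	MARK_MOUNT,		/* mark on a mount */
	MARK_FILESYSTEM,	/* mark on a filesystem (sb) */
};

/* One mark whose mask contains the event (hypothetical type). */
struct matching_mark {
	enum mark_kind kind;
};

/*
 * Return true if the name should be reported, given every mark whose mask
 * contains the event and whether the victim is the root of the marked mount.
 */
static bool should_report_name(const struct matching_mark *marks, int nr,
			       bool victim_is_mount_root)
{
	assert(nr >= 0);

	for (int i = 0; i < nr; i++) {
		switch (marks[i].kind) {
		case MARK_PARENT_DIR:
		case MARK_FILESYSTEM:
			return true;
		case MARK_MOUNT:
			if (!victim_is_mount_root)
				return true;
			break;
		case MARK_VICTIM_INODE:
			break;	/* a mark on the victim alone never adds a name */
		}
	}
	return false;
}
```

For example, an event matched only by a mark on the victim inode yields
no name, while the same event also matched by a parent mark does.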

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify.c | 37 +++++++++++++++++++++++++++++------
 1 file changed, 31 insertions(+), 6 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 43c338a8a2f1..45203c1484b9 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -202,6 +202,32 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
 		     !(mark->mask & FS_EVENT_ON_CHILD)))
 			continue;
 
+		/*
+		 * fanotify_alloc_event() uses the "on child" flag as indication
+		 * for reporting name, but the flag will be masked out before
+		 * reporting to user.
+		 *
+		 * With FAN_REPORT_NAME, name will be reported if event is in
+		 * the mask of a watching parent or filesystem mark.
+		 * name will not be reported if event is only in the mask of a
+		 * mark on the victim inode itself.
+		 * If event is only in the mask of a marked mount, name will be
+		 * reported if the victim inode is not the mount's root. Note
+		 * that the mount's root could be a non-directory in case of
+		 * bind mount.
+		 */
+		if (FAN_GROUP_FLAG(group, FAN_REPORT_NAME) &&
+		    event_mask & mark->mask & FS_EVENTS_POSS_ON_CHILD) {
+			user_mask |= FS_EVENT_ON_CHILD;
+			if (type == FSNOTIFY_OBJ_TYPE_SB ||
+			    (type == FSNOTIFY_OBJ_TYPE_VFSMOUNT &&
+			     !WARN_ON_ONCE(data_type != FSNOTIFY_EVENT_PATH) &&
+			     path->dentry != path->mnt->mnt_root)) {
+				event_mask |= FS_EVENT_ON_CHILD;
+				marks_mask |= FS_EVENT_ON_CHILD;
+			}
+		}
+
 		marks_mask |= mark->mask;
 		marks_ignored_mask |= mark->ignored_mask;
 	}
@@ -344,9 +370,8 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 	 * With flag FAN_REPORT_NAME, we report the parent fid and name for
 	 * events possible "on child" in addition to reporting the child fid.
 	 * If parent is unknown (dentry is disconnected) or parent is not on the
-	 * same filesystem as child (dentry is sb root), only "child" fid is
-	 * reported. Events are reported the same way when reported to sb, mount
-	 * or inode marks and when reported to a directory watching children.
+	 * same filesystem/mount as child (dentry is sb/mount root), only the
+	 * "child" fid is reported.
 	 * Allocate an fanotify_name_event struct and copy the name.
 	 */
 	if (mask & FAN_DIR_MODIFY && !(WARN_ON_ONCE(!file_name))) {
@@ -357,7 +382,7 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 		id = NULL;
 		dir = inode;
 	} else if (FAN_GROUP_FLAG(group, FAN_REPORT_NAME) &&
-		   mask & FS_EVENTS_POSS_ON_CHILD &&
+		   mask & FS_EVENT_ON_CHILD &&
 		   likely(dentry && !IS_ROOT(dentry))) {
 		parent = dget_parent(dentry);
 		dir = d_inode(parent);
@@ -400,7 +425,7 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 	 * directory and child watches exist.
 	 */
 	fsnotify_init_event(&event->fse, (void *)dentry ?: inode);
-	event->mask = mask;
+	event->mask = mask & FANOTIFY_OUTGOING_EVENTS;
 	if (FAN_GROUP_FLAG(group, FAN_REPORT_TID))
 		event->pid = get_pid(task_pid(current));
 	else
@@ -503,7 +528,7 @@ static int fanotify_handle_event(struct fsnotify_group *group,
 
 	mask = fanotify_group_event_mask(group, iter_info, mask, data,
 					 data_type);
-	if (!mask)
+	if (!(mask & FANOTIFY_OUTGOING_EVENTS))
 		return 0;
 
 	pr_debug("%s: group=%p inode=%p mask=%x\n", __func__, group, inode,
-- 
2.17.1



* [BONUS][PATCH v2 16/16] fanotify: support limited functionality for unprivileged users
  2020-02-17 13:14 [PATCH v2 00/16] Fanotify event with name info Amir Goldstein
                   ` (14 preceding siblings ...)
  2020-02-17 13:14 ` [PATCH v2 15/16] fanotify: refine rules for when name is reported Amir Goldstein
@ 2020-02-17 13:14 ` Amir Goldstein
  2020-02-20 22:10 ` [PATCH v2 00/16] Fanotify event with name info Matthew Bobrowski
  16 siblings, 0 replies; 65+ messages in thread
From: Amir Goldstein @ 2020-02-17 13:14 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, linux-api, Matthew Bobrowski

Add support for new fanotify_init() flag FAN_UNPRIVILEGED.
User may request an unprivileged event listener using this flag even if
user is privileged.

An unprivileged event listener does not get an open file descriptor in
the event nor the process pid of another process.  An unprivileged event
listener cannot request permission events, cannot set mount/filesystem
marks and cannot request unlimited queue/marks.
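
As a rough illustration of these restrictions, the init-time check can
be modelled in user space as below.  The flag values are copied from
include/uapi/linux/fanotify.h as modified by this series (the values of
FAN_UNPRIVILEGED and FAN_REPORT_NAME are proposals of this posting and
should not be assumed to match any released header).  check_init_flags()
and its has_cap_sys_admin parameter are hypothetical stand-ins for the
logic in fanotify_init() and capable(CAP_SYS_ADMIN).

```c
#include <assert.h>
#include <errno.h>

/* Values as they appear in the UAPI header after this series is applied. */
#define FAN_CLOEXEC		0x00000001
#define FAN_NONBLOCK		0x00000002
#define FAN_CLASS_NOTIF		0x00000000
#define FAN_CLASS_CONTENT	0x00000004
#define FAN_CLASS_PRE_CONTENT	0x00000008
#define FAN_UNLIMITED_QUEUE	0x00000010
#define FAN_UNLIMITED_MARKS	0x00000020
#define FAN_UNPRIVILEGED	0x00000080	/* proposed by patch 16 */
#define FAN_REPORT_TID		0x00000100
#define FAN_REPORT_FID		0x00000200
#define FAN_REPORT_NAME		0x00000400	/* proposed by patch 14 */
#define FAN_REPORT_FID_NAME	(FAN_REPORT_FID | FAN_REPORT_NAME)

#define FANOTIFY_CLASS_BITS	(FAN_CLASS_NOTIF | FAN_CLASS_CONTENT | \
				 FAN_CLASS_PRE_CONTENT)
#define FANOTIFY_UNPRIV_INIT_FLAGS	(FAN_CLOEXEC | FAN_NONBLOCK | \
					 FAN_CLASS_NOTIF | FAN_UNPRIVILEGED | \
					 FAN_REPORT_FID_NAME)

/*
 * Hypothetical model of the permission check: returns 0 if the flags are
 * acceptable, -EINVAL for a flag an unprivileged listener may not use,
 * and -EPERM for a privileged request without CAP_SYS_ADMIN.
 */
static int check_init_flags(unsigned int flags, int has_cap_sys_admin)
{
	unsigned int fan_class = flags & FANOTIFY_CLASS_BITS;

	if (flags & FAN_UNPRIVILEGED) {
		if ((flags & ~FANOTIFY_UNPRIV_INIT_FLAGS) ||
		    fan_class != FAN_CLASS_NOTIF)
			return -EINVAL;
	} else if (!has_cap_sys_admin) {
		return -EPERM;
	}
	return 0;
}
```

Note that a privileged caller that passes FAN_UNPRIVILEGED is held to
the same reduced flag set, which matches the first paragraph above.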

This enables limited functionality, similar to inotify, when watching a
set of files and directories for OPEN/ACCESS/MODIFY/CLOSE events, without
requiring CAP_SYS_ADMIN privileges.

The FAN_DIR_MODIFY event and the FAN_REPORT_FID_NAME init flag provide a
method for an unprivileged event listener watching a set of directories
(with FAN_EVENT_ON_CHILD) to monitor all changes inside those directories.

This typically requires that the listener keeps a map of watched
directory fid to dirfd (O_PATH), where fid is obtained with
name_to_handle_at() before starting to watch for changes.

When getting an event, the reported fid of the parent should be resolved
to dirfd, and fstatat(2) with dirfd and name should be used to query the
state of the filesystem entry.

Note that even though events do not report the event creator pid,
fanotify does not merge similar events on the same object that were
generated by different processes. This is aligned with existing behavior
when the generating processes are outside of the listener pidns (which
results in reporting 0 pid to listener).

Cc: <linux-api@vger.kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify_user.c | 42 ++++++++++++++++++++++++++----
 include/linux/fanotify.h           | 16 +++++++++++-
 include/uapi/linux/fanotify.h      |  1 +
 3 files changed, 53 insertions(+), 6 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 5d369aa5d1bc..ac2cdb5287fe 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -328,11 +328,21 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
 	metadata.vers = FANOTIFY_METADATA_VERSION;
 	metadata.reserved = 0;
 	metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS;
-	metadata.pid = pid_vnr(event->pid);
+	/*
+	 * An unprivileged event listener does not get an open file descriptor
+	 * in the event nor another generating process pid. If the event was
+	 * generated by the unprivileged process itself, self pid is reported.
+	 */
+	if (!FAN_GROUP_FLAG(group, FAN_UNPRIVILEGED) ||
+	    task_tgid(current) == event->pid)
+		metadata.pid = pid_vnr(event->pid);
+	else
+		metadata.pid = 0;
 
 	if (FAN_GROUP_FLAG(group, FAN_REPORT_FID)) {
 		metadata.event_len += fanotify_event_info_len(event);
-	} else if (fanotify_event_has_path(event)) {
+	} else if (!FAN_GROUP_FLAG(group, FAN_UNPRIVILEGED) &&
+		   fanotify_event_has_path(event)) {
 		fd = create_fd(group, event, &f);
 		if (fd < 0)
 			return fd;
@@ -845,12 +855,26 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 	int f_flags, fd;
 	struct user_struct *user;
 	struct fanotify_event *oevent;
+	unsigned int class = flags & FANOTIFY_CLASS_BITS;
 
 	pr_debug("%s: flags=%x event_f_flags=%x\n",
 		 __func__, flags, event_f_flags);
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (flags & FAN_UNPRIVILEGED) {
+		/*
+		 * User can request an unprivileged event listener even if
+		 * user is privileged. An unprivileged event listener does not
+		 * get an open file descriptor in the event nor the process id
+		 * of another process. An unprivileged event listener cannot
+		 * request permission events, cannot set mount/filesystem marks
+		 * and cannot request unlimited queue/marks.
+		 */
+		if ((flags & ~FANOTIFY_UNPRIV_INIT_FLAGS) ||
+		    class != FAN_CLASS_NOTIF)
+			return -EINVAL;
+	} else if (!capable(CAP_SYS_ADMIN)) {
 		return -EPERM;
+	}
 
 #ifdef CONFIG_AUDITSYSCALL
 	if (flags & ~(FANOTIFY_INIT_FLAGS | FAN_ENABLE_AUDIT))
@@ -916,7 +940,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 	group->fanotify_data.f_flags = event_f_flags;
 	init_waitqueue_head(&group->fanotify_data.access_waitq);
 	INIT_LIST_HEAD(&group->fanotify_data.access_list);
-	switch (flags & FANOTIFY_CLASS_BITS) {
+	switch (class) {
 	case FAN_CLASS_NOTIF:
 		group->priority = FS_PRIO_0;
 		break;
@@ -1101,6 +1125,14 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	    group->priority == FS_PRIO_0)
 		goto fput_and_out;
 
+	/*
+	 * An unprivileged event listener is not allowed to watch a mount
+	 * point nor a filesystem.
+	 */
+	if (FAN_GROUP_FLAG(group, FAN_UNPRIVILEGED) &&
+	    mark_type != FAN_MARK_INODE)
+		goto fput_and_out;
+
 	/*
 	 * Events with data type inode do not carry enough information to report
 	 * event->fd, so we do not allow setting a mask for inode events unless
@@ -1214,7 +1246,7 @@ COMPAT_SYSCALL_DEFINE6(fanotify_mark,
  */
 static int __init fanotify_user_setup(void)
 {
-	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 9);
+	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 10);
 	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9);
 
 	fanotify_mark_cache = KMEM_CACHE(fsnotify_mark,
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 5412a25c54c0..93107b44e4e1 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -21,7 +21,21 @@
 #define FANOTIFY_INIT_FLAGS	(FANOTIFY_CLASS_BITS | \
 				 FAN_REPORT_TID | FAN_REPORT_FID_NAME | \
 				 FAN_CLOEXEC | FAN_NONBLOCK | \
-				 FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS)
+				 FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS | \
+				 FAN_UNPRIVILEGED)
+
+/*
+ * fanotify_init() flags allowed for unprivileged listener.
+ * FAN_CLASS_NOTIF in this mask is purely semantic because it is zero,
+ * but it is the only class we allow for unprivileged listener.
+ * Since unprivileged listener does not provide file descriptors in events,
+ * FAN_REPORT_FID_NAME makes sense, but it is not a must.
+ * FAN_REPORT_TID does not make sense for unprivileged listener, which uses
+ * event->pid only to filter out events generated by listener process itself.
+ */
+#define FANOTIFY_UNPRIV_INIT_FLAGS	(FAN_CLOEXEC | FAN_NONBLOCK | \
+					 FAN_CLASS_NOTIF | FAN_UNPRIVILEGED | \
+					 FAN_REPORT_FID_NAME)
 
 #define FANOTIFY_MARK_TYPE_BITS	(FAN_MARK_INODE | FAN_MARK_MOUNT | \
 				 FAN_MARK_FILESYSTEM)
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index 04181769bb50..2be673862a43 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -50,6 +50,7 @@
 #define FAN_UNLIMITED_QUEUE	0x00000010
 #define FAN_UNLIMITED_MARKS	0x00000020
 #define FAN_ENABLE_AUDIT	0x00000040
+#define FAN_UNPRIVILEGED	0x00000080
 
 /* Flags to determine fanotify event format */
 #define FAN_REPORT_TID		0x00000100	/* event->pid is thread id */
-- 
2.17.1



* Re: [PATCH v2 13/16] fanotify: report name info for FAN_DIR_MODIFY event
  2020-02-17 13:14 ` [PATCH v2 13/16] fanotify: report " Amir Goldstein
@ 2020-02-19  9:43   ` kbuild test robot
  2020-02-19 10:17   ` kbuild test robot
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 65+ messages in thread
From: kbuild test robot @ 2020-02-19  9:43 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: kbuild-all, Jan Kara, linux-fsdevel, linux-api

[-- Attachment #1: Type: text/plain, Size: 8632 bytes --]

Hi Amir,

I love your patch! Perhaps something to improve:

[auto build test WARNING on 11a48a5a18c63fd7621bb050228cebf13566e4d8]

url:    https://github.com/0day-ci/linux/commits/Amir-Goldstein/Fanotify-event-with-name-info/20200219-160517
base:    11a48a5a18c63fd7621bb050228cebf13566e4d8
config: c6x-randconfig-a001-20200219 (attached as .config)
compiler: c6x-elf-gcc (GCC) 7.5.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.5.0 make.cross ARCH=c6x 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from include/linux/kernel.h:15:0,
                    from include/linux/list.h:9,
                    from include/linux/preempt.h:11,
                    from include/linux/spinlock.h:51,
                    from include/linux/seqlock.h:36,
                    from include/linux/time.h:6,
                    from include/linux/stat.h:19,
                    from include/linux/fcntl.h:5,
                    from fs/notify/fanotify/fanotify_user.c:3:
   fs/notify/fanotify/fanotify_user.c: In function 'copy_info_to_user':
>> fs/notify/fanotify/fanotify_user.c:238:11: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t {aka unsigned int}' [-Wformat=]
     pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",
              ^
   include/linux/printk.h:288:21: note: in definition of macro 'pr_fmt'
    #define pr_fmt(fmt) fmt
                        ^~~
   include/linux/dynamic_debug.h:143:2: note: in expansion of macro '__dynamic_func_call'
     __dynamic_func_call(__UNIQUE_ID(ddebug), fmt, func, ##__VA_ARGS__)
     ^~~~~~~~~~~~~~~~~~~
   include/linux/dynamic_debug.h:153:2: note: in expansion of macro '_dynamic_func_call'
     _dynamic_func_call(fmt, __dynamic_pr_debug,  \
     ^~~~~~~~~~~~~~~~~~
   include/linux/printk.h:335:2: note: in expansion of macro 'dynamic_pr_debug'
     dynamic_pr_debug(fmt, ##__VA_ARGS__)
     ^~~~~~~~~~~~~~~~
   fs/notify/fanotify/fanotify_user.c:238:2: note: in expansion of macro 'pr_debug'
     pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",
     ^~~~~~~~
   fs/notify/fanotify/fanotify_user.c:238:11: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'size_t {aka unsigned int}' [-Wformat=]
     pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",
              ^
   include/linux/printk.h:288:21: note: in definition of macro 'pr_fmt'
    #define pr_fmt(fmt) fmt
                        ^~~
   include/linux/dynamic_debug.h:143:2: note: in expansion of macro '__dynamic_func_call'
     __dynamic_func_call(__UNIQUE_ID(ddebug), fmt, func, ##__VA_ARGS__)
     ^~~~~~~~~~~~~~~~~~~
   include/linux/dynamic_debug.h:153:2: note: in expansion of macro '_dynamic_func_call'
     _dynamic_func_call(fmt, __dynamic_pr_debug,  \
     ^~~~~~~~~~~~~~~~~~
   include/linux/printk.h:335:2: note: in expansion of macro 'dynamic_pr_debug'
     dynamic_pr_debug(fmt, ##__VA_ARGS__)
     ^~~~~~~~~~~~~~~~
   fs/notify/fanotify/fanotify_user.c:238:2: note: in expansion of macro 'pr_debug'
     pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",
     ^~~~~~~~
   fs/notify/fanotify/fanotify_user.c:238:11: warning: format '%lu' expects argument of type 'long unsigned int', but argument 6 has type 'size_t {aka unsigned int}' [-Wformat=]
     pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",
              ^
   include/linux/printk.h:288:21: note: in definition of macro 'pr_fmt'
    #define pr_fmt(fmt) fmt
                        ^~~
   include/linux/dynamic_debug.h:143:2: note: in expansion of macro '__dynamic_func_call'
     __dynamic_func_call(__UNIQUE_ID(ddebug), fmt, func, ##__VA_ARGS__)
     ^~~~~~~~~~~~~~~~~~~
   include/linux/dynamic_debug.h:153:2: note: in expansion of macro '_dynamic_func_call'
     _dynamic_func_call(fmt, __dynamic_pr_debug,  \
     ^~~~~~~~~~~~~~~~~~
   include/linux/printk.h:335:2: note: in expansion of macro 'dynamic_pr_debug'
     dynamic_pr_debug(fmt, ##__VA_ARGS__)
     ^~~~~~~~~~~~~~~~
   fs/notify/fanotify/fanotify_user.c:238:2: note: in expansion of macro 'pr_debug'
     pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",
     ^~~~~~~~
   fs/notify/fanotify/fanotify_user.c:238:11: warning: format '%lu' expects argument of type 'long unsigned int', but argument 7 has type 'size_t {aka unsigned int}' [-Wformat=]
     pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",
              ^
   include/linux/printk.h:288:21: note: in definition of macro 'pr_fmt'
    #define pr_fmt(fmt) fmt
                        ^~~
   include/linux/dynamic_debug.h:143:2: note: in expansion of macro '__dynamic_func_call'
     __dynamic_func_call(__UNIQUE_ID(ddebug), fmt, func, ##__VA_ARGS__)
     ^~~~~~~~~~~~~~~~~~~
   include/linux/dynamic_debug.h:153:2: note: in expansion of macro '_dynamic_func_call'
     _dynamic_func_call(fmt, __dynamic_pr_debug,  \
     ^~~~~~~~~~~~~~~~~~
   include/linux/printk.h:335:2: note: in expansion of macro 'dynamic_pr_debug'
     dynamic_pr_debug(fmt, ##__VA_ARGS__)
     ^~~~~~~~~~~~~~~~
   fs/notify/fanotify/fanotify_user.c:238:2: note: in expansion of macro 'pr_debug'
     pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",
     ^~~~~~~~

vim +238 fs/notify/fanotify/fanotify_user.c

   224	
   225	static int copy_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fid_hdr *fh,
   226				     struct fanotify_fid *fid, const struct qstr *name,
   227				     char __user *buf, size_t count)
   228	{
   229		struct fanotify_event_info_fid info = { };
   230		struct file_handle handle = { };
   231		unsigned char bounce[max(FANOTIFY_INLINE_FH_LEN, DNAME_INLINE_LEN)];
   232		const unsigned char *data;
   233		size_t fh_len = fh->len;
   234		size_t name_len = name ? name->len : 0;
   235		size_t info_len = fanotify_fid_info_len(fh_len, name_len);
   236		size_t len = info_len;
   237	
 > 238		pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",
   239			 __func__, fh_len, name_len, info_len, count);
   240	
   241		if (!fh_len || (name && !name_len))
   242			return 0;
   243	
   244		if (WARN_ON_ONCE(len < sizeof(info) || len > count))
   245			return -EFAULT;
   246	
   247		/*
   248		 * Copy event info fid header followed by variable sized file handle
   249		 * and optionally followed by variable sized filename.
   250		 */
   251		info.hdr.info_type = name_len ? FAN_EVENT_INFO_TYPE_DFID_NAME :
   252						FAN_EVENT_INFO_TYPE_FID;
   253		info.hdr.len = len;
   254		info.fsid = *fsid;
   255		if (copy_to_user(buf, &info, sizeof(info)))
   256			return -EFAULT;
   257	
   258		buf += sizeof(info);
   259		len -= sizeof(info);
   260		if (WARN_ON_ONCE(len < sizeof(handle)))
   261			return -EFAULT;
   262	
   263		handle.handle_type = fh->type;
   264		handle.handle_bytes = fh_len;
   265		if (copy_to_user(buf, &handle, sizeof(handle)))
   266			return -EFAULT;
   267	
   268		buf += sizeof(handle);
   269		len -= sizeof(handle);
   270		if (WARN_ON_ONCE(len < fh_len))
   271			return -EFAULT;
   272	
   273		/*
   274		 * For an inline fh and inline file name, copy through stack to exclude
   275		 * the copy from usercopy hardening protections.
   276		 */
   277		data = fanotify_fid_fh(fid, fh_len);
   278		if (fh_len <= FANOTIFY_INLINE_FH_LEN) {
   279			memcpy(bounce, data, fh_len);
   280			data = bounce;
   281		}
   282		if (copy_to_user(buf, data, fh_len))
   283			return -EFAULT;
   284	
   285		buf += fh_len;
   286		len -= fh_len;
   287	
   288		if (name_len) {
   289			/* Copy the filename with terminating null */
   290			name_len++;
   291			if (WARN_ON_ONCE(len < name_len))
   292				return -EFAULT;
   293	
   294			data = name->name;
   295			if (name_len <= DNAME_INLINE_LEN) {
   296				memcpy(bounce, data, name_len);
   297				data = bounce;
   298			}
   299			if (copy_to_user(buf, data, name_len))
   300				return -EFAULT;
   301	
   302			buf += name_len;
   303			len -= name_len;
   304		}
   305	
   306		/* Pad with 0's */
   307		WARN_ON_ONCE(len < 0 || len >= FANOTIFY_EVENT_ALIGN);
   308		if (len > 0 && clear_user(buf, len))
   309			return -EFAULT;
   310	
   311		return info_len;
   312	}
   313	
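
The warnings above arise because size_t is unsigned int on this 32-bit
target, while %lu expects unsigned long.  One likely fix, shown here as
a user-space sketch and not as the actual follow-up patch, is the %zu
length modifier, which matches size_t regardless of its width and is
also accepted by the kernel's printk formats.  format_lens() is a
hypothetical helper that stands in for the pr_debug() call at line 238.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the same four size_t values as the pr_debug() call, but with
 * %zu so that the format matches size_t on 32-bit and 64-bit targets.
 */
static int format_lens(char *out, size_t outsz, size_t fh_len,
		       size_t name_len, size_t info_len, size_t count)
{
	return snprintf(out, outsz,
			"fh_len=%zu name_len=%zu, info_len=%zu, count=%zu",
			fh_len, name_len, info_len, count);
}
```

Casting each argument to unsigned long would also silence the warning,
but %zu avoids the casts and keeps the call site shorter.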

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 28991 bytes --]


* Re: [PATCH v2 13/16] fanotify: report name info for FAN_DIR_MODIFY event
  2020-02-17 13:14 ` [PATCH v2 13/16] fanotify: report " Amir Goldstein
  2020-02-19  9:43   ` kbuild test robot
@ 2020-02-19 10:17   ` kbuild test robot
  2020-02-19 11:22   ` Amir Goldstein
  2020-04-16 12:16   ` Michael Kerrisk (man-pages)
  3 siblings, 0 replies; 65+ messages in thread
From: kbuild test robot @ 2020-02-19 10:17 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: kbuild-all, Jan Kara, linux-fsdevel, linux-api

[-- Attachment #1: Type: text/plain, Size: 14178 bytes --]

Hi Amir,

I love your patch! Perhaps something to improve:

[auto build test WARNING on 11a48a5a18c63fd7621bb050228cebf13566e4d8]

url:    https://github.com/0day-ci/linux/commits/Amir-Goldstein/Fanotify-event-with-name-info/20200219-160517
base:    11a48a5a18c63fd7621bb050228cebf13566e4d8
config: microblaze-randconfig-a001-20200219 (attached as .config)
compiler: microblaze-linux-gcc (GCC) 7.5.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.5.0 make.cross ARCH=microblaze 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from include/linux/kernel.h:15:0,
                    from include/linux/list.h:9,
                    from include/linux/preempt.h:11,
                    from include/linux/spinlock.h:51,
                    from include/linux/seqlock.h:36,
                    from include/linux/time.h:6,
                    from include/linux/stat.h:19,
                    from include/linux/fcntl.h:5,
                    from fs/notify/fanotify/fanotify_user.c:3:
   fs/notify/fanotify/fanotify_user.c: In function 'copy_info_to_user':
>> include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/printk.h:137:10: note: in definition of macro 'no_printk'
      printk(fmt, ##__VA_ARGS__);  \
             ^~~
   include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
    #define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
                       ^~~~~~~~
   include/linux/printk.h:341:12: note: in expansion of macro 'KERN_DEBUG'
     no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
               ^~~~~~~~~~
   fs/notify/fanotify/fanotify_user.c:238:2: note: in expansion of macro 'pr_debug'
     pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",
     ^~~~~~~~
   fs/notify/fanotify/fanotify_user.c:238:25: note: format string is defined here
     pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",
                          ~~^
                          %u
   In file included from include/linux/kernel.h:15:0,
                    from include/linux/list.h:9,
                    from include/linux/preempt.h:11,
                    from include/linux/spinlock.h:51,
                    from include/linux/seqlock.h:36,
                    from include/linux/time.h:6,
                    from include/linux/stat.h:19,
                    from include/linux/fcntl.h:5,
                    from fs/notify/fanotify/fanotify_user.c:3:
   include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/printk.h:137:10: note: in definition of macro 'no_printk'
      printk(fmt, ##__VA_ARGS__);  \
             ^~~
   include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
    #define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
                       ^~~~~~~~
   include/linux/printk.h:341:12: note: in expansion of macro 'KERN_DEBUG'
     no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
               ^~~~~~~~~~
   fs/notify/fanotify/fanotify_user.c:238:2: note: in expansion of macro 'pr_debug'
     pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",
     ^~~~~~~~
   fs/notify/fanotify/fanotify_user.c:238:38: note: format string is defined here
     pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",
                                       ~~^
                                       %u
   In file included from include/linux/kernel.h:15:0,
                    from include/linux/list.h:9,
                    from include/linux/preempt.h:11,
                    from include/linux/spinlock.h:51,
                    from include/linux/seqlock.h:36,
                    from include/linux/time.h:6,
                    from include/linux/stat.h:19,
                    from include/linux/fcntl.h:5,
                    from fs/notify/fanotify/fanotify_user.c:3:
   include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/printk.h:137:10: note: in definition of macro 'no_printk'
      printk(fmt, ##__VA_ARGS__);  \
             ^~~
   include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
    #define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
                       ^~~~~~~~
   include/linux/printk.h:341:12: note: in expansion of macro 'KERN_DEBUG'
     no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
               ^~~~~~~~~~
   fs/notify/fanotify/fanotify_user.c:238:2: note: in expansion of macro 'pr_debug'
     pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",
     ^~~~~~~~
   fs/notify/fanotify/fanotify_user.c:238:52: note: format string is defined here
     pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",
                                                     ~~^
                                                     %u
   In file included from include/linux/kernel.h:15:0,
                    from include/linux/list.h:9,
                    from include/linux/preempt.h:11,
                    from include/linux/spinlock.h:51,
                    from include/linux/seqlock.h:36,
                    from include/linux/time.h:6,
                    from include/linux/stat.h:19,
                    from include/linux/fcntl.h:5,
                    from fs/notify/fanotify/fanotify_user.c:3:
   include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 6 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/printk.h:137:10: note: in definition of macro 'no_printk'
      printk(fmt, ##__VA_ARGS__);  \
             ^~~
   include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
    #define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
                       ^~~~~~~~
   include/linux/printk.h:341:12: note: in expansion of macro 'KERN_DEBUG'
     no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
               ^~~~~~~~~~
   fs/notify/fanotify/fanotify_user.c:238:2: note: in expansion of macro 'pr_debug'
     pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",
     ^~~~~~~~
   fs/notify/fanotify/fanotify_user.c:238:63: note: format string is defined here
     pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",
--
   In file included from include/linux/kernel.h:15:0,
                    from include/linux/list.h:9,
                    from include/linux/preempt.h:11,
                    from include/linux/spinlock.h:51,
                    from include/linux/seqlock.h:36,
                    from include/linux/time.h:6,
                    from include/linux/stat.h:19,
                    from include/linux/fcntl.h:5,
                    from fs/notify//fanotify/fanotify_user.c:3:
   fs/notify//fanotify/fanotify_user.c: In function 'copy_info_to_user':
>> include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/printk.h:137:10: note: in definition of macro 'no_printk'
      printk(fmt, ##__VA_ARGS__);  \
             ^~~
   include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
    #define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
                       ^~~~~~~~
   include/linux/printk.h:341:12: note: in expansion of macro 'KERN_DEBUG'
     no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
               ^~~~~~~~~~
   fs/notify//fanotify/fanotify_user.c:238:2: note: in expansion of macro 'pr_debug'
     pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",
     ^~~~~~~~
   fs/notify//fanotify/fanotify_user.c:238:25: note: format string is defined here
     pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",
                          ~~^
                          %u
   In file included from include/linux/kernel.h:15:0,
                    from include/linux/list.h:9,
                    from include/linux/preempt.h:11,
                    from include/linux/spinlock.h:51,
                    from include/linux/seqlock.h:36,
                    from include/linux/time.h:6,
                    from include/linux/stat.h:19,
                    from include/linux/fcntl.h:5,
                    from fs/notify//fanotify/fanotify_user.c:3:
   include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/printk.h:137:10: note: in definition of macro 'no_printk'
      printk(fmt, ##__VA_ARGS__);  \
             ^~~
   include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
    #define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
                       ^~~~~~~~
   include/linux/printk.h:341:12: note: in expansion of macro 'KERN_DEBUG'
     no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
               ^~~~~~~~~~
   fs/notify//fanotify/fanotify_user.c:238:2: note: in expansion of macro 'pr_debug'
     pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",
     ^~~~~~~~
   fs/notify//fanotify/fanotify_user.c:238:38: note: format string is defined here
     pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",
                                       ~~^
                                       %u
   In file included from include/linux/kernel.h:15:0,
                    from include/linux/list.h:9,
                    from include/linux/preempt.h:11,
                    from include/linux/spinlock.h:51,
                    from include/linux/seqlock.h:36,
                    from include/linux/time.h:6,
                    from include/linux/stat.h:19,
                    from include/linux/fcntl.h:5,
                    from fs/notify//fanotify/fanotify_user.c:3:
   include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/printk.h:137:10: note: in definition of macro 'no_printk'
      printk(fmt, ##__VA_ARGS__);  \
             ^~~
   include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
    #define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
                       ^~~~~~~~
   include/linux/printk.h:341:12: note: in expansion of macro 'KERN_DEBUG'
     no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
               ^~~~~~~~~~
   fs/notify//fanotify/fanotify_user.c:238:2: note: in expansion of macro 'pr_debug'
     pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",
     ^~~~~~~~
   fs/notify//fanotify/fanotify_user.c:238:52: note: format string is defined here
     pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",
                                                     ~~^
                                                     %u
   In file included from include/linux/kernel.h:15:0,
                    from include/linux/list.h:9,
                    from include/linux/preempt.h:11,
                    from include/linux/spinlock.h:51,
                    from include/linux/seqlock.h:36,
                    from include/linux/time.h:6,
                    from include/linux/stat.h:19,
                    from include/linux/fcntl.h:5,
                    from fs/notify//fanotify/fanotify_user.c:3:
   include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 6 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/printk.h:137:10: note: in definition of macro 'no_printk'
      printk(fmt, ##__VA_ARGS__);  \
             ^~~
   include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
    #define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
                       ^~~~~~~~
   include/linux/printk.h:341:12: note: in expansion of macro 'KERN_DEBUG'
     no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
               ^~~~~~~~~~
   fs/notify//fanotify/fanotify_user.c:238:2: note: in expansion of macro 'pr_debug'
     pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",
     ^~~~~~~~
   fs/notify//fanotify/fanotify_user.c:238:63: note: format string is defined here
     pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",

vim +5 include/linux/kern_levels.h

314ba3520e513a Joe Perches 2012-07-30  4  
04d2c8c83d0e3a Joe Perches 2012-07-30 @5  #define KERN_SOH	"\001"		/* ASCII Start Of Header */
04d2c8c83d0e3a Joe Perches 2012-07-30  6  #define KERN_SOH_ASCII	'\001'
04d2c8c83d0e3a Joe Perches 2012-07-30  7  

:::::: The code at line 5 was first introduced by commit
:::::: 04d2c8c83d0e3ac5f78aeede51babb3236200112 printk: convert the format for KERN_<LEVEL> to a 2 byte pattern

:::::: TO: Joe Perches <joe@perches.com>
:::::: CC: Linus Torvalds <torvalds@linux-foundation.org>

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 38740 bytes --]


* Re: [PATCH v2 05/16] fsnotify: simplify arguments passing to fsnotify_parent()
  2020-02-17 13:14 ` [PATCH v2 05/16] fsnotify: simplify arguments passing to fsnotify_parent() Amir Goldstein
@ 2020-02-19 10:50   ` kbuild test robot
  2020-02-19 11:11   ` Amir Goldstein
  1 sibling, 0 replies; 65+ messages in thread
From: kbuild test robot @ 2020-02-19 10:50 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: kbuild-all, Jan Kara, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 6241 bytes --]

Hi Amir,

I love your patch! Yet something to improve:

[auto build test ERROR on 11a48a5a18c63fd7621bb050228cebf13566e4d8]

url:    https://github.com/0day-ci/linux/commits/Amir-Goldstein/Fanotify-event-with-name-info/20200219-160517
base:    11a48a5a18c63fd7621bb050228cebf13566e4d8
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.5.0-5) 7.5.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   In file included from fs///attr.c:15:0:
   include/linux/fsnotify.h: In function 'fsnotify_dentry':
>> include/linux/fsnotify.h:52:18: warning: passing argument 1 of 'fsnotify_parent' makes integer from pointer without a cast [-Wint-conversion]
     fsnotify_parent(dentry, mask, inode, FSNOTIFY_EVENT_INODE);
                     ^~~~~~
   In file included from include/linux/fsnotify.h:15:0,
                    from fs///attr.c:15:
   include/linux/fsnotify_backend.h:543:19: note: expected '__u32 {aka unsigned int}' but argument is of type 'struct dentry *'
    static inline int fsnotify_parent(__u32 mask, const void *data, int data_type)
                      ^~~~~~~~~~~~~~~
   In file included from fs///attr.c:15:0:
>> include/linux/fsnotify.h:52:26: warning: passing argument 2 of 'fsnotify_parent' makes pointer from integer without a cast [-Wint-conversion]
     fsnotify_parent(dentry, mask, inode, FSNOTIFY_EVENT_INODE);
                             ^~~~
   In file included from include/linux/fsnotify.h:15:0,
                    from fs///attr.c:15:
   include/linux/fsnotify_backend.h:543:19: note: expected 'const void *' but argument is of type '__u32 {aka unsigned int}'
    static inline int fsnotify_parent(__u32 mask, const void *data, int data_type)
                      ^~~~~~~~~~~~~~~
   In file included from fs///attr.c:15:0:
   include/linux/fsnotify.h:52:32: warning: passing argument 3 of 'fsnotify_parent' makes integer from pointer without a cast [-Wint-conversion]
     fsnotify_parent(dentry, mask, inode, FSNOTIFY_EVENT_INODE);
                                   ^~~~~
   In file included from include/linux/fsnotify.h:15:0,
                    from fs///attr.c:15:
   include/linux/fsnotify_backend.h:543:19: note: expected 'int' but argument is of type 'struct inode *'
    static inline int fsnotify_parent(__u32 mask, const void *data, int data_type)
                      ^~~~~~~~~~~~~~~
   In file included from fs///attr.c:15:0:
>> include/linux/fsnotify.h:52:2: error: too many arguments to function 'fsnotify_parent'
     fsnotify_parent(dentry, mask, inode, FSNOTIFY_EVENT_INODE);
     ^~~~~~~~~~~~~~~
   In file included from include/linux/fsnotify.h:15:0,
                    from fs///attr.c:15:
   include/linux/fsnotify_backend.h:543:19: note: declared here
    static inline int fsnotify_parent(__u32 mask, const void *data, int data_type)
                      ^~~~~~~~~~~~~~~
   In file included from fs///attr.c:15:0:
   include/linux/fsnotify.h: In function 'fsnotify_file':
   include/linux/fsnotify.h:68:24: warning: passing argument 1 of 'fsnotify_parent' makes integer from pointer without a cast [-Wint-conversion]
     ret = fsnotify_parent(path->dentry, mask, path, FSNOTIFY_EVENT_PATH);
                           ^~~~
   In file included from include/linux/fsnotify.h:15:0,
                    from fs///attr.c:15:
   include/linux/fsnotify_backend.h:543:19: note: expected '__u32 {aka unsigned int}' but argument is of type 'struct dentry * const'
    static inline int fsnotify_parent(__u32 mask, const void *data, int data_type)
                      ^~~~~~~~~~~~~~~
   In file included from fs///attr.c:15:0:
   include/linux/fsnotify.h:68:38: warning: passing argument 2 of 'fsnotify_parent' makes pointer from integer without a cast [-Wint-conversion]
     ret = fsnotify_parent(path->dentry, mask, path, FSNOTIFY_EVENT_PATH);
                                         ^~~~
   In file included from include/linux/fsnotify.h:15:0,
                    from fs///attr.c:15:
   include/linux/fsnotify_backend.h:543:19: note: expected 'const void *' but argument is of type '__u32 {aka unsigned int}'
    static inline int fsnotify_parent(__u32 mask, const void *data, int data_type)
                      ^~~~~~~~~~~~~~~
   In file included from fs///attr.c:15:0:
   include/linux/fsnotify.h:68:44: warning: passing argument 3 of 'fsnotify_parent' makes integer from pointer without a cast [-Wint-conversion]
     ret = fsnotify_parent(path->dentry, mask, path, FSNOTIFY_EVENT_PATH);
                                               ^~~~
   In file included from include/linux/fsnotify.h:15:0,
                    from fs///attr.c:15:
   include/linux/fsnotify_backend.h:543:19: note: expected 'int' but argument is of type 'const struct path *'
    static inline int fsnotify_parent(__u32 mask, const void *data, int data_type)
                      ^~~~~~~~~~~~~~~
   In file included from fs///attr.c:15:0:
   include/linux/fsnotify.h:68:8: error: too many arguments to function 'fsnotify_parent'
     ret = fsnotify_parent(path->dentry, mask, path, FSNOTIFY_EVENT_PATH);
           ^~~~~~~~~~~~~~~
   In file included from include/linux/fsnotify.h:15:0,
                    from fs///attr.c:15:
   include/linux/fsnotify_backend.h:543:19: note: declared here
    static inline int fsnotify_parent(__u32 mask, const void *data, int data_type)
                      ^~~~~~~~~~~~~~~

vim +/fsnotify_parent +52 include/linux/fsnotify.h

    40	
    41	/*
    42	 * Simple wrappers to consolidate calls fsnotify_parent()/fsnotify() when
    43	 * an event is on a file/dentry.
    44	 */
    45	static inline void fsnotify_dentry(struct dentry *dentry, __u32 mask)
    46	{
    47		struct inode *inode = d_inode(dentry);
    48	
    49		if (S_ISDIR(inode->i_mode))
    50			mask |= FS_ISDIR;
    51	
  > 52		fsnotify_parent(dentry, mask, inode, FSNOTIFY_EVENT_INODE);
    53		fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL, 0);
    54	}
    55	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 7238 bytes --]


* Re: [PATCH v2 05/16] fsnotify: simplify arguments passing to fsnotify_parent()
  2020-02-17 13:14 ` [PATCH v2 05/16] fsnotify: simplify arguments passing to fsnotify_parent() Amir Goldstein
  2020-02-19 10:50   ` kbuild test robot
@ 2020-02-19 11:11   ` Amir Goldstein
  1 sibling, 0 replies; 65+ messages in thread
From: Amir Goldstein @ 2020-02-19 11:11 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

On Mon, Feb 17, 2020 at 3:15 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> Instead of passing both dentry and path and having to figure out which
> one to use, pass data/data_type to simplify the code.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/notify/fsnotify.c             | 15 ++++-----------
>  include/linux/fsnotify.h         | 14 ++------------
>  include/linux/fsnotify_backend.h | 13 +++++++------
>  3 files changed, 13 insertions(+), 29 deletions(-)
>
> diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> index a5d6467f89a0..193530f57963 100644
> --- a/fs/notify/fsnotify.c
> +++ b/fs/notify/fsnotify.c
> @@ -143,15 +143,13 @@ void __fsnotify_update_child_dentry_flags(struct inode *inode)
>  }
>
>  /* Notify this dentry's parent about a child's events. */
> -int __fsnotify_parent(const struct path *path, struct dentry *dentry, __u32 mask)
> +int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
> +                   int data_type)
>  {
>         struct dentry *parent;
>         struct inode *p_inode;
>         int ret = 0;
>
> -       if (!dentry)
> -               dentry = path->dentry;
> -
>         if (!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED))
>                 return 0;
>
> @@ -168,12 +166,7 @@ int __fsnotify_parent(const struct path *path, struct dentry *dentry, __u32 mask
>                 mask |= FS_EVENT_ON_CHILD;
>
>                 take_dentry_name_snapshot(&name, dentry);
> -               if (path)
> -                       ret = fsnotify(p_inode, mask, path, FSNOTIFY_EVENT_PATH,
> -                                      &name.name, 0);
> -               else
> -                       ret = fsnotify(p_inode, mask, dentry->d_inode, FSNOTIFY_EVENT_INODE,
> -                                      &name.name, 0);
> +               ret = fsnotify(p_inode, mask, data, data_type, &name.name, 0);
>                 release_dentry_name_snapshot(&name);
>         }
>
> @@ -181,7 +174,7 @@ int __fsnotify_parent(const struct path *path, struct dentry *dentry, __u32 mask
>
>         return ret;
>  }
> -EXPORT_SYMBOL_GPL(__fsnotify_parent);
> +EXPORT_SYMBOL_GPL(fsnotify_parent);
>
>  static int send_to_group(struct inode *to_tell,
>                          __u32 mask, const void *data,
> diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
> index 420aca9fd5f4..af30e0a56f2e 100644
> --- a/include/linux/fsnotify.h
> +++ b/include/linux/fsnotify.h
> @@ -38,16 +38,6 @@ static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry,
>         fsnotify_name(dir, mask, d_inode(dentry), &dentry->d_name, 0);
>  }
>
> -/* Notify this dentry's parent about a child's events. */
> -static inline int fsnotify_parent(const struct path *path,
> -                                 struct dentry *dentry, __u32 mask)
> -{
> -       if (!dentry)
> -               dentry = path->dentry;
> -
> -       return __fsnotify_parent(path, dentry, mask);
> -}
> -
>  /*
>   * Simple wrappers to consolidate calls fsnotify_parent()/fsnotify() when
>   * an event is on a file/dentry.
> @@ -59,7 +49,7 @@ static inline void fsnotify_dentry(struct dentry *dentry, __u32 mask)
>         if (S_ISDIR(inode->i_mode))
>                 mask |= FS_ISDIR;
>
> -       fsnotify_parent(NULL, dentry, mask);
> +       fsnotify_parent(dentry, mask, inode, FSNOTIFY_EVENT_INODE);
>         fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL, 0);
>  }
>
> @@ -75,7 +65,7 @@ static inline int fsnotify_file(struct file *file, __u32 mask)
>         if (S_ISDIR(inode->i_mode))
>                 mask |= FS_ISDIR;
>
> -       ret = fsnotify_parent(path, NULL, mask);
> +       ret = fsnotify_parent(path->dentry, mask, path, FSNOTIFY_EVENT_PATH);
>         if (ret)
>                 return ret;
>
> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index 5cc838db422a..b1f418cc28e1 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -376,9 +376,10 @@ struct fsnotify_mark {
>  /* called from the vfs helpers */
>
>  /* main fsnotify call to send events */
> -extern int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
> -                   const struct qstr *name, u32 cookie);
> -extern int __fsnotify_parent(const struct path *path, struct dentry *dentry, __u32 mask);
> +extern int fsnotify(struct inode *to_tell, __u32 mask, const void *data,
> +                   int data_type, const struct qstr *name, u32 cookie);
> +extern int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
> +                          int data_type);
>  extern void __fsnotify_inode_delete(struct inode *inode);
>  extern void __fsnotify_vfsmount_delete(struct vfsmount *mnt);
>  extern void fsnotify_sb_delete(struct super_block *sb);
> @@ -533,13 +534,13 @@ static inline void fsnotify_init_event(struct fsnotify_event *event,
>
>  #else
>
> -static inline int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
> -                          const struct qstr *name, u32 cookie)
> +static inline int fsnotify(struct inode *to_tell, __u32 mask, const void *data,
> +                          int data_type, const struct qstr *name, u32 cookie)
>  {
>         return 0;
>  }
>
> -static inline int __fsnotify_parent(const struct path *path, struct dentry *dentry, __u32 mask)
> +static inline int fsnotify_parent(__u32 mask, const void *data, int data_type)

This should be:

+static inline int fsnotify_parent(struct dentry *dentry, __u32 mask,
+                                 const void *data, int data_type)

Will squash the fix.

Thanks kbuild test robot,
Amir.
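
For illustration, the class of bug the robot caught can be reproduced in a
minimal userspace sketch. This is not the kernel code: DEMO_CONFIG_FSNOTIFY
is a hypothetical stand-in for the real config option, u32_t stands in for
__u32, and demo_notify() is an invented caller. The point is only that the
stub used when the feature is compiled out must take the same arguments as
the real prototype, otherwise callers break in that configuration alone.

```c
#include <stddef.h>

typedef unsigned int u32_t;   /* stand-in for the kernel's __u32 */
struct dentry;                /* opaque here; only pointers are passed */

#ifdef DEMO_CONFIG_FSNOTIFY   /* hypothetical stand-in for the config option */
int fsnotify_parent(struct dentry *dentry, u32_t mask, const void *data,
                    int data_type);
#else
/* Stub: same four arguments as the real prototype above, always succeeds. */
static inline int fsnotify_parent(struct dentry *dentry, u32_t mask,
                                  const void *data, int data_type)
{
    (void)dentry;
    (void)mask;
    (void)data;
    (void)data_type;
    return 0;
}
#endif

/* Invented caller: compiles identically in both configurations. */
static inline int demo_notify(struct dentry *dentry, u32_t mask)
{
    return fsnotify_parent(dentry, mask, NULL, 0);
}
```

A tinyconfig build, as used by the robot here, is a cheap way to exercise
the compiled-out branch.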


* Re: [PATCH v2 13/16] fanotify: report name info for FAN_DIR_MODIFY event
  2020-02-17 13:14 ` [PATCH v2 13/16] fanotify: report " Amir Goldstein
  2020-02-19  9:43   ` kbuild test robot
  2020-02-19 10:17   ` kbuild test robot
@ 2020-02-19 11:22   ` Amir Goldstein
  2020-04-16 12:16   ` Michael Kerrisk (man-pages)
  3 siblings, 0 replies; 65+ messages in thread
From: Amir Goldstein @ 2020-02-19 11:22 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Linux API

On Mon, Feb 17, 2020 at 3:15 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> Report event FAN_DIR_MODIFY with name in a variable length record similar
> to how fid's are reported.  With name info reporting implemented, setting
> FAN_DIR_MODIFY in mark mask is now allowed.
>
> When events are reported with name, the reported fid identifies the
> directory and the name follows the fid. The info record type for this
> event info is FAN_EVENT_INFO_TYPE_DFID_NAME.
>
> For now, all reported events have at most one info record which is
> either FAN_EVENT_INFO_TYPE_FID or FAN_EVENT_INFO_TYPE_DFID_NAME (for
> FAN_DIR_MODIFY).  Later on, events "on child" will report both records.
>
> There are several ways that an application can use this information:
>
> 1. When watching a single directory, the name is always relative to
> the watched directory, so the application needs to call fstatat(2) on
> the name relative to the watched directory.
>
> 2. When watching a set of directories, the application could keep a map
> of dirfd for all watched directories and hash the map by fid obtained
> with name_to_handle_at(2).  When getting a name event, the fid in the
> event info could be used to lookup the base dirfd in the map and then
> call fstatat(2) with that dirfd.
>
> 3. When watching a filesystem (FAN_MARK_FILESYSTEM) or a large set of
> directories, the application could use open_by_handle_at(2) with the fid
> in event info to obtain dirfd for the directory where event happened and
> call fstatat(2) with this dirfd.
>
> The last option scales better for a large number of watched directories.
> The first two options may also become available to unprivileged
> fanotify watchers in the future, because they do not rely on
> open_by_handle_at(2), which requires the CAP_DAC_READ_SEARCH capability.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/notify/fanotify/fanotify.c      |   2 +-
>  fs/notify/fanotify/fanotify_user.c | 120 ++++++++++++++++++++++-------
>  include/linux/fanotify.h           |   3 +-
>  include/uapi/linux/fanotify.h      |   1 +
>  4 files changed, 98 insertions(+), 28 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index fc75dc53a218..b651c18d3a93 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -478,7 +478,7 @@ static int fanotify_handle_event(struct fsnotify_group *group,
>         BUILD_BUG_ON(FAN_OPEN_EXEC != FS_OPEN_EXEC);
>         BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM);
>
> -       BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 19);
> +       BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 20);
>
>         mask = fanotify_group_event_mask(group, iter_info, mask, data,
>                                          data_type);
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 284f3548bb79..a1bafc21ebbb 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -51,20 +51,32 @@ struct kmem_cache *fanotify_name_event_cachep __read_mostly;
>  struct kmem_cache *fanotify_perm_event_cachep __read_mostly;
>
>  #define FANOTIFY_EVENT_ALIGN 4
> +#define FANOTIFY_INFO_HDR_LEN \
> +       (sizeof(struct fanotify_event_info_fid) + sizeof(struct file_handle))
>
> -static int fanotify_fid_info_len(struct fanotify_fid_hdr *fh)
> +static int fanotify_fid_info_len(int fh_len, int name_len)
>  {
> -       return roundup(sizeof(struct fanotify_event_info_fid) +
> -                      sizeof(struct file_handle) + fh->len,
> -                      FANOTIFY_EVENT_ALIGN);
> +       int info_len = fh_len;
> +
> +       if (name_len)
> +               info_len += name_len + 1;
> +
> +       return roundup(FANOTIFY_INFO_HDR_LEN + info_len, FANOTIFY_EVENT_ALIGN);
>  }
>
>  static int fanotify_event_info_len(struct fanotify_event *event)
>  {
> -       if (!fanotify_event_has_fid(event))
> -               return 0;
> +       int info_len = 0;
> +
> +       if (fanotify_event_has_fid(event))
> +               info_len += fanotify_fid_info_len(event->fh.len, 0);
> +
> +       if (fanotify_event_has_dfid_name(event)) {
> +               info_len += fanotify_fid_info_len(event->dfh.len,
> +                                       fanotify_event_name_len(event));
> +       }
>
> -       return fanotify_fid_info_len(&event->fh);
> +       return info_len;
>  }
>
>  /*
> @@ -210,23 +222,34 @@ static int process_access_response(struct fsnotify_group *group,
>         return -ENOENT;
>  }
>
> -static int copy_fid_to_user(__kernel_fsid_t *fsid, struct fanotify_fid_hdr *fh,
> -                           struct fanotify_fid *fid, char __user *buf)
> +static int copy_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fid_hdr *fh,
> +                            struct fanotify_fid *fid, const struct qstr *name,
> +                            char __user *buf, size_t count)
>  {
>         struct fanotify_event_info_fid info = { };
>         struct file_handle handle = { };
> -       unsigned char bounce[FANOTIFY_INLINE_FH_LEN], *data;
> +       unsigned char bounce[max(FANOTIFY_INLINE_FH_LEN, DNAME_INLINE_LEN)];
> +       const unsigned char *data;
>         size_t fh_len = fh->len;
> -       size_t len = fanotify_fid_info_len(fh);
> +       size_t name_len = name ? name->len : 0;
> +       size_t info_len = fanotify_fid_info_len(fh_len, name_len);
> +       size_t len = info_len;
> +
> +       pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",
> +                __func__, fh_len, name_len, info_len, count);
>

Changed all %lu above to %zu to print size_t without a warning.
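
As a userspace illustration of the same fix (a sketch, not the kernel code;
format_info_sizes() is an invented helper and snprintf() stands in for
pr_debug()), %zu is the standard conversion for size_t and is correct on
both 32-bit and 64-bit targets, whereas %lu only matches where size_t
happens to be unsigned long:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format the four sizes with %zu, the conversion specifier for size_t.
 * Returns what snprintf() returns: the length the full string needs.
 */
static int format_info_sizes(char *out, size_t out_len, size_t fh_len,
                             size_t name_len, size_t info_len, size_t count)
{
    return snprintf(out, out_len,
                    "fh_len=%zu name_len=%zu, info_len=%zu, count=%zu",
                    fh_len, name_len, info_len, count);
}
```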

Thanks kbuild test robot,
Amir.
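
The record length computed by fanotify_fid_info_len() in the quoted diff can
be checked with a small standalone sketch. The assumptions are labeled: the
header length is passed in as a parameter (hdr_len) instead of being derived
from the kernel structs, and round_up_to() is a local replacement for the
kernel's roundup() macro. The logic otherwise follows the patch: file handle
bytes, plus the name and its terminating NUL when a name is present, rounded
up to a 4-byte boundary.

```c
#include <stddef.h>

#define EVENT_ALIGN 4   /* mirrors FANOTIFY_EVENT_ALIGN in the quoted diff */

/* Round n up to the next multiple of align (align must be non-zero). */
static size_t round_up_to(size_t n, size_t align)
{
    return ((n + align - 1) / align) * align;
}

/*
 * hdr_len is an assumption of this sketch: the caller supplies the fixed
 * header size rather than this code computing it from kernel structs.
 */
static size_t fid_info_len(size_t hdr_len, size_t fh_len, size_t name_len)
{
    size_t info_len = fh_len;

    if (name_len)
        info_len += name_len + 1;   /* name plus terminating NUL */

    return round_up_to(hdr_len + info_len, EVENT_ALIGN);
}
```

With a hypothetical 20-byte header and an 8-byte handle, a 4-character name
needs 20 + 8 + 5 = 33 bytes, which rounds up to 36.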


* Re: [PATCH v2 00/16] Fanotify event with name info
  2020-02-17 13:14 [PATCH v2 00/16] Fanotify event with name info Amir Goldstein
                   ` (15 preceding siblings ...)
  2020-02-17 13:14 ` [BONUS][PATCH v2 16/16] fanotify: support limited functionality for unprivileged users Amir Goldstein
@ 2020-02-20 22:10 ` Matthew Bobrowski
  16 siblings, 0 replies; 65+ messages in thread
From: Matthew Bobrowski @ 2020-02-20 22:10 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel, linux-api

On Mon, Feb 17, 2020 at 03:14:39PM +0200, Amir Goldstein wrote:
> This is v2 of the fanotify name info series.
>
> The user requirement for the name info feature, as well as early UAPI
> discussions can be found in this [1] lore thread.

Oh, wonderful. I'm keen to have this feature come to fruition.

After my wedding, which is this Saturday (tomorrow), I'll come around
to reviewing this series.

/M


* Re: [PATCH v2 02/16] fsnotify: factor helpers fsnotify_dentry() and fsnotify_file()
  2020-02-17 13:14 ` [PATCH v2 02/16] fsnotify: factor helpers fsnotify_dentry() and fsnotify_file() Amir Goldstein
@ 2020-02-25 13:46   ` Jan Kara
  2020-02-25 14:27     ` Amir Goldstein
  0 siblings, 1 reply; 65+ messages in thread
From: Jan Kara @ 2020-02-25 13:46 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel

On Mon 17-02-20 15:14:41, Amir Goldstein wrote:
> Most of the code in fsnotify hooks is boilerplate of one or the other.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  include/linux/fsnotify.h | 96 +++++++++++++++-------------------------
>  1 file changed, 36 insertions(+), 60 deletions(-)

Nice cleanup. Just two comments below.

> @@ -58,8 +78,6 @@ static inline int fsnotify_path(struct inode *inode, const struct path *path,
>  static inline int fsnotify_perm(struct file *file, int mask)
>  {
>  	int ret;
> -	const struct path *path = &file->f_path;
> -	struct inode *inode = file_inode(file);
>  	__u32 fsnotify_mask = 0;
>  
>  	if (file->f_mode & FMODE_NONOTIFY)

I guess you can drop the NONOTIFY check from here. You've moved it to
fsnotify_file() and there's not much done in this function to be worth
skipping...

> @@ -70,7 +88,7 @@ static inline int fsnotify_perm(struct file *file, int mask)
>  		fsnotify_mask = FS_OPEN_PERM;
>  
>  		if (file->f_flags & __FMODE_EXEC) {
> -			ret = fsnotify_path(inode, path, FS_OPEN_EXEC_PERM);
> +			ret = fsnotify_file(file, FS_OPEN_EXEC_PERM);
>  
>  			if (ret)
>  				return ret;

Hum, I think we could simplify fsnotify_perm() even further by having:

	if (mask & MAY_OPEN) {
		if (file->f_flags & __FMODE_EXEC)
			fsnotify_mask = FS_OPEN_EXEC_PERM;
		else
			fsnotify_mask = FS_OPEN_PERM;
	} ...

Otherwise the patch looks good to me.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v2 02/16] fsnotify: factor helpers fsnotify_dentry() and fsnotify_file()
  2020-02-25 13:46   ` Jan Kara
@ 2020-02-25 14:27     ` Amir Goldstein
  2020-02-26 13:59       ` Jan Kara
  0 siblings, 1 reply; 65+ messages in thread
From: Amir Goldstein @ 2020-02-25 14:27 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Matthew Bobrowski

On Tue, Feb 25, 2020 at 3:46 PM Jan Kara <jack@suse.cz> wrote:
>
> On Mon 17-02-20 15:14:41, Amir Goldstein wrote:
> > Most of the code in fsnotify hooks is boilerplate of one or the other.
> >
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >  include/linux/fsnotify.h | 96 +++++++++++++++-------------------------
> >  1 file changed, 36 insertions(+), 60 deletions(-)
>
> Nice cleanup. Just two comments below.
>
> > @@ -58,8 +78,6 @@ static inline int fsnotify_path(struct inode *inode, const struct path *path,
> >  static inline int fsnotify_perm(struct file *file, int mask)
> >  {
> >       int ret;
> > -     const struct path *path = &file->f_path;
> > -     struct inode *inode = file_inode(file);
> >       __u32 fsnotify_mask = 0;
> >
> >       if (file->f_mode & FMODE_NONOTIFY)
>
> I guess you can drop the NONOTIFY check from here. You've moved it to
> fsnotify_file() and there's not much done in this function to be worth
> skipping...

True.

>
> > @@ -70,7 +88,7 @@ static inline int fsnotify_perm(struct file *file, int mask)
> >               fsnotify_mask = FS_OPEN_PERM;
> >
> >               if (file->f_flags & __FMODE_EXEC) {
> > -                     ret = fsnotify_path(inode, path, FS_OPEN_EXEC_PERM);
> > +                     ret = fsnotify_file(file, FS_OPEN_EXEC_PERM);
> >
> >                       if (ret)
> >                               return ret;
>
> Hum, I think we could simplify fsnotify_perm() even further by having:
>
>         if (mask & MAY_OPEN) {
>                 if (file->f_flags & __FMODE_EXEC)
>                         fsnotify_mask = FS_OPEN_EXEC_PERM;
>                 else
>                         fsnotify_mask = FS_OPEN_PERM;
>         } ...
>

But the current code sends both FS_OPEN_EXEC_PERM and FS_OPEN_PERM
on an open for exec. I believe that is what was discussed when Matthew wrote
the OPEN_EXEC patches, so that existing receivers of the OPEN_PERM event on
exec will not regress.

Thanks,
Amir.
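
The behavioural difference under discussion can be modelled in a few lines.
This is a simplified sketch, not the kernel's fsnotify_perm(): the DEMO_*
bit values and both function names are invented, and it ignores the early
return when the first permission event is denied. It only shows which
events a listener would see for an open-for-exec under each variant.

```c
#include <stdbool.h>

#define DEMO_OPEN_PERM      0x1u   /* illustrative bit, not the kernel value */
#define DEMO_OPEN_EXEC_PERM 0x2u   /* illustrative bit, not the kernel value */

/* Current behaviour as described above: an exec open reports both events. */
static unsigned int events_current(bool is_exec)
{
    unsigned int sent = 0;

    if (is_exec)
        sent |= DEMO_OPEN_EXEC_PERM;
    sent |= DEMO_OPEN_PERM;

    return sent;
}

/* Proposed if/else variant: an exec open reports only the exec event. */
static unsigned int events_if_else(bool is_exec)
{
    return is_exec ? DEMO_OPEN_EXEC_PERM : DEMO_OPEN_PERM;
}
```

Under the if/else variant, a listener that only watches the plain open
permission event would stop seeing opens for exec, which is the regression
being pointed out.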


* Re: [PATCH v2 07/16] fsnotify: replace inode pointer with tag
  2020-02-17 13:14 ` [PATCH v2 07/16] fsnotify: replace inode pointer with tag Amir Goldstein
@ 2020-02-26  8:20   ` Jan Kara
  2020-02-26  9:34     ` Amir Goldstein
  2020-02-26  8:52   ` Jan Kara
  1 sibling, 1 reply; 65+ messages in thread
From: Jan Kara @ 2020-02-26  8:20 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel

On Mon 17-02-20 15:14:46, Amir Goldstein wrote:
> The event inode field is used only for comparison in queue merges and
> cannot be dereferenced after handle_event(), because it does not hold a
> refcount on the inode.
> 
> Replace it with an abstract tag to do the same thing. We are going to
> set this tag for values other than inode pointer in fanotify.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

I like this but can we call it say 'objectid' or something like that? 'tag'
seems too generic to me and it isn't clear why we should merge or not merge
events with different tags...

								Honza

> ---
>  fs/notify/fanotify/fanotify.c        | 2 +-
>  fs/notify/inotify/inotify_fsnotify.c | 2 +-
>  include/linux/fsnotify_backend.h     | 8 +++-----
>  3 files changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 19ec7a4f4d50..98c3cbf29003 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -26,7 +26,7 @@ static bool should_merge(struct fsnotify_event *old_fsn,
>  	old = FANOTIFY_E(old_fsn);
>  	new = FANOTIFY_E(new_fsn);
>  
> -	if (old_fsn->inode != new_fsn->inode || old->pid != new->pid ||
> +	if (old_fsn->tag != new_fsn->tag || old->pid != new->pid ||
>  	    old->fh_type != new->fh_type || old->fh_len != new->fh_len)
>  		return false;
>  
> diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
> index 6bb98522bbfd..4f42ea7b7fdd 100644
> --- a/fs/notify/inotify/inotify_fsnotify.c
> +++ b/fs/notify/inotify/inotify_fsnotify.c
> @@ -39,7 +39,7 @@ static bool event_compare(struct fsnotify_event *old_fsn,
>  	if (old->mask & FS_IN_IGNORED)
>  		return false;
>  	if ((old->mask == new->mask) &&
> -	    (old_fsn->inode == new_fsn->inode) &&
> +	    (old_fsn->tag == new_fsn->tag) &&
>  	    (old->name_len == new->name_len) &&
>  	    (!old->name_len || !strcmp(old->name, new->name)))
>  		return true;
> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index bd3f6114a7a9..cd106b5c87a4 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -132,8 +132,7 @@ struct fsnotify_ops {
>   */
>  struct fsnotify_event {
>  	struct list_head list;
> -	/* inode may ONLY be dereferenced during handle_event(). */
> -	struct inode *inode;	/* either the inode the event happened to or its parent */
> +	unsigned long tag;	/* identifier for queue merges */
>  };
>  
>  /*
> @@ -542,11 +541,10 @@ extern void fsnotify_put_mark(struct fsnotify_mark *mark);
>  extern void fsnotify_finish_user_wait(struct fsnotify_iter_info *iter_info);
>  extern bool fsnotify_prepare_user_wait(struct fsnotify_iter_info *iter_info);
>  
> -static inline void fsnotify_init_event(struct fsnotify_event *event,
> -				       struct inode *inode)
> +static inline void fsnotify_init_event(struct fsnotify_event *event, void *tag)
>  {
>  	INIT_LIST_HEAD(&event->list);
> -	event->inode = inode;
> +	event->tag = (unsigned long)tag;
>  }
>  
>  #else
> -- 
> 2.17.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v2 07/16] fsnotify: replace inode pointer with tag
  2020-02-17 13:14 ` [PATCH v2 07/16] fsnotify: replace inode pointer with tag Amir Goldstein
  2020-02-26  8:20   ` Jan Kara
@ 2020-02-26  8:52   ` Jan Kara
  1 sibling, 0 replies; 65+ messages in thread
From: Jan Kara @ 2020-02-26  8:52 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel

On Mon 17-02-20 15:14:46, Amir Goldstein wrote:
> The event inode field is used only for comparison in queue merges and
> cannot be dereferenced after handle_event(), because it does not hold a
> refcount on the inode.
> 
> Replace it with an abstract tag to do the same thing. We are going to
> set this tag for values other than inode pointer in fanotify.
...
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> -static inline void fsnotify_init_event(struct fsnotify_event *event,
> -				       struct inode *inode)
> +static inline void fsnotify_init_event(struct fsnotify_event *event, void *tag)
>  {
>  	INIT_LIST_HEAD(&event->list);
> -	event->inode = inode;
> +	event->tag = (unsigned long)tag;
>  }

Oh, and why not make the argument to fsnotify_init_event() unsigned long
from the start? It would be IMHO cleaner and using void * doesn't really
save us many type casts...
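
A minimal userspace sketch of the suggestion above, under stated assumptions:
struct demo_event, demo_init_event() and demo_same_object() are hypothetical
stand-ins, not the kernel definitions. It only shows the identifier being
passed as an unsigned long, so the value is never round-tripped through void *.

```c
#include <assert.h>

/* Hypothetical stand-in for struct fsnotify_event; not the kernel struct. */
struct demo_event {
	unsigned long objectid;	/* opaque identifier, only ever compared */
};

/* The identifier arrives as an integer, so no cast is needed in here. */
static inline void demo_init_event(struct demo_event *event,
				   unsigned long objectid)
{
	assert(event);
	event->objectid = objectid;
}

/* Two events are merge candidates only if their identifiers match. */
static inline int demo_same_object(const struct demo_event *a,
				   const struct demo_event *b)
{
	return a->objectid == b->objectid;
}
```

A caller that holds a pointer would cast it once at the call site, for
example demo_init_event(&ev, (unsigned long)ptr).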

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 08/16] fanotify: merge duplicate events on parent and child
  2020-02-17 13:14 ` [PATCH v2 08/16] fanotify: merge duplicate events on parent and child Amir Goldstein
@ 2020-02-26  9:18   ` Jan Kara
  2020-02-26 12:14     ` Amir Goldstein
  0 siblings, 1 reply; 65+ messages in thread
From: Jan Kara @ 2020-02-26  9:18 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel

On Mon 17-02-20 15:14:47, Amir Goldstein wrote:
> With inotify, when a watch is set on a directory and on its child, an
> event on the child is reported twice, once with wd of the parent watch
> and once with wd of the child watch without the filename.
> 
> With fanotify, when a watch is set on a directory and on its child, an
> event on the child is reported twice, but it has the exact same
> information - either an open file descriptor of the child or an encoded
> fid of the child.
> 
> The reason that the two identical events are not merged is because the
> tag used for merging events in the queue is the child inode in one event
> and parent inode in the other.
> 
> For events with path or dentry data, use the dentry instead of inode as
> the tag for event merging, so that the event reported on parent will be
> merged with the event reported on the child.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

I agree that reporting identical event twice seems wasteful but ...

> @@ -312,7 +313,12 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
>  	if (!event)
>  		goto out;
>  init: __maybe_unused
> -	fsnotify_init_event(&event->fse, inode);
> +	/*
> +	 * Use the dentry instead of inode as tag for event queue, so event
> +	 * reported on parent is merged with event reported on child when both
> +	 * directory and child watches exist.
> +	 */
> +	fsnotify_init_event(&event->fse, (void *)dentry ?: inode);

... this seems quite ugly and also previously we could merge 'inode' events
with others and now we cannot because some will carry "dentry where event
happened" and other ones "inode with watch" as object identifier. So if you
want to do this, I'd use "inode where event happened" as object identifier
for fanotify.

Hum, now thinking about this, maybe we could clean this up even a bit more.
event->inode is currently used only by inotify and fanotify for merging
purposes. Now inotify could use its 'wd' instead of inode with exactly the
same results, fanotify path or fid check is at least as strong as the inode
check. So only for the case of pure "inode" events, we need to store inode
identifier in struct fanotify_event - and we can do that in the union with
struct path and completely remove the 'inode' member from fsnotify_event.
Am I missing something?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 07/16] fsnotify: replace inode pointer with tag
  2020-02-26  8:20   ` Jan Kara
@ 2020-02-26  9:34     ` Amir Goldstein
  0 siblings, 0 replies; 65+ messages in thread
From: Amir Goldstein @ 2020-02-26  9:34 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

On Wed, Feb 26, 2020 at 10:20 AM Jan Kara <jack@suse.cz> wrote:
>
> On Mon 17-02-20 15:14:46, Amir Goldstein wrote:
> > The event inode field is used only for comparison in queue merges and
> > cannot be dereferenced after handle_event(), because it does not hold a
> > refcount on the inode.
> >
> > Replace it with an abstract tag to do the same thing. We are going to
> > set this tag for values other than inode pointer in fanotify.
> >
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>
> I like this but can we call it say 'objectid' or something like that? 'tag'
> seems too generic to me and it isn't clear why we should merge or not merge
> events with different tags...

Sounds good to me.

And I agree to the comment about fsnotify_init_event()

Apropos event merging, I ran across a simple create/delete files
workload where fanotify_merge() was hogging the CPU.
I currently carry a small private patch to limit the merge depth to 128
recent events. I have not yet had time to think about the best way to
deal with this properly.
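
A rough userspace sketch of what such a depth limit could look like. This is
not the private patch mentioned above: the array-based queue, struct
demo_event, demo_merge() and DEMO_MERGE_DEPTH are all assumptions made for
illustration, and "merging" is modelled simply as ORing the event masks.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MERGE_DEPTH 128	/* assumed limit, taken from the text above */

/* Hypothetical queued event: only the fields needed for merging. */
struct demo_event {
	unsigned long objectid;
	unsigned int mask;
};

/*
 * Scan at most DEMO_MERGE_DEPTH of the most recent events in queue[0..len)
 * (newest at the highest index) and OR the new mask into a matching one.
 * Returns 1 if merged, 0 if the caller should queue a new event instead.
 */
static int demo_merge(struct demo_event *queue, size_t len,
		      const struct demo_event *new_event)
{
	size_t scanned = 0;

	assert(new_event);
	while (len > 0 && scanned < DEMO_MERGE_DEPTH) {
		struct demo_event *old = &queue[--len];

		scanned++;
		if (old->objectid == new_event->objectid) {
			old->mask |= new_event->mask;
			return 1;
		}
	}
	return 0;
}
```

The trade-off in this sketch is that an older duplicate beyond the scan
window is no longer merged, in exchange for a bounded cost per queued event.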

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 11/16] fanotify: prepare to encode both parent and child fid's
  2020-02-17 13:14 ` [PATCH v2 11/16] fanotify: prepare to encode both parent and child fid's Amir Goldstein
@ 2020-02-26 10:23   ` Jan Kara
  2020-02-26 11:53     ` Amir Goldstein
  0 siblings, 1 reply; 65+ messages in thread
From: Jan Kara @ 2020-02-26 10:23 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel

On Mon 17-02-20 15:14:50, Amir Goldstein wrote:
> For some events, we are going to encode both child and parent fid's,
> so we need to do a little refactoring of struct fanotify_event and fid
> helper functions.
> 
> Move fsid member from struct fanotify_fid out to struct fanotify_event,
> so we can store fsid once for two encoded fid's (we will only encode
> parent if it is on the same filesystem).
> 
> This does not change the size of struct fanotify_event because struct
> fanotify_fid is still bigger than struct path on 32bit arch and is the
> same size as struct path (16 bytes) on 64bit arch.
> 
> Group fh_len and fh_type as struct fanotify_fid_hdr.
> Pass struct fanotify_fid and struct fanotify_fid_hdr to helpers
> fanotify_encode_fid() and copy_fid_to_user() instead of passing the
> containing fanotify_event struct.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

...

> @@ -327,16 +327,18 @@ init: __maybe_unused
>  		event->pid = get_pid(task_pid(current));
>  	else
>  		event->pid = get_pid(task_tgid(current));
> -	event->fh_len = 0;
> +	event->fh.len = 0;
> +	if (fsid)
> +		event->fsid = *fsid;
>  	if (id && FAN_GROUP_FLAG(group, FAN_REPORT_FID)) {
>  		/* Report the event without a file identifier on encode error */
>  		event->fh_type = fanotify_encode_fid(event, id, gfp, fsid);
			^^^^
This should be event->fh, shouldn't it? I wonder how come 0-day didn't
catch this...

> +struct fanotify_fid_hdr {
> +	u8 type;
> +	u8 len;
> +};
> +
>  struct fanotify_fid {
> -	__kernel_fsid_t fsid;
>  	union {
>  		unsigned char fh[FANOTIFY_INLINE_FH_LEN];
>  		unsigned char *ext_fh;
>  	};
>  };
...
> @@ -63,13 +81,13 @@ struct fanotify_event {
>  	u32 mask;
>  	/*
>  	 * Those fields are outside fanotify_fid to pack fanotify_event nicely
> -	 * on 64bit arch and to use fh_type as an indication of whether path
> +	 * on 64bit arch and to use fh.type as an indication of whether path
>  	 * or fid are used in the union:
>  	 * FILEID_ROOT (0) for path, > 0 for fid, FILEID_INVALID for neither.
>  	 */
> -	u8 fh_type;
> -	u8 fh_len;
> +	struct fanotify_fid_hdr fh;
>  	u16 pad;

The 'pad' here now looks rather bogus. Let's remove it and leave padding on
the compiler. It's in-memory struct anyway...

> +	__kernel_fsid_t fsid;
>  	union {
>  		/*
>  		 * We hold ref to this path so it may be dereferenced at any

Here I disagree. IMO 'fsid' should be still part of the union below because
the "object identification" is either struct path or (fsid + fh). I
understand that you want to reuse fsid for the other file handle. But then
I believe it should rather be done like:

struct fanotify_fh {
  	union {
  		unsigned char fh[FANOTIFY_INLINE_FH_LEN];
  		unsigned char *ext_fh;
  	};
};

struct fanotify_fid {
	__kernel_fsid_t fsid;
	struct fanotify_fh object;
	struct fanotify_fh dir;
};

BTW, is file handle length and type guaranteed to be the same for 'object' and
'dir'? Given how filehandles try to be rather opaque sequences of bytes,
I'm not sure we are safe to assume that... 

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 11/16] fanotify: prepare to encode both parent and child fid's
  2020-02-26 10:23   ` Jan Kara
@ 2020-02-26 11:53     ` Amir Goldstein
  2020-02-26 17:07       ` Jan Kara
  0 siblings, 1 reply; 65+ messages in thread
From: Amir Goldstein @ 2020-02-26 11:53 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

On Wed, Feb 26, 2020 at 12:23 PM Jan Kara <jack@suse.cz> wrote:
>
> On Mon 17-02-20 15:14:50, Amir Goldstein wrote:
> > For some events, we are going to encode both child and parent fid's,
> > so we need to do a little refactoring of struct fanotify_event and fid
> > helper functions.
> >
> > Move fsid member from struct fanotify_fid out to struct fanotify_event,
> > so we can store fsid once for two encoded fid's (we will only encode
> > parent if it is on the same filesystem).
> >
> > This does not change the size of struct fanotify_event because struct
> > fanotify_fid is still bigger than struct path on 32bit arch and is the
> > same size as struct path (16 bytes) on 64bit arch.
> >
> > Group fh_len and fh_type as struct fanotify_fid_hdr.
> > Pass struct fanotify_fid and struct fanotify_fid_hdr to helpers
> > fanotify_encode_fid() and copy_fid_to_user() instead of passing the
> > containing fanotify_event struct.
> >
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>
> ...
>
> > @@ -327,16 +327,18 @@ init: __maybe_unused
> >               event->pid = get_pid(task_pid(current));
> >       else
> >               event->pid = get_pid(task_tgid(current));
> > -     event->fh_len = 0;
> > +     event->fh.len = 0;
> > +     if (fsid)
> > +             event->fsid = *fsid;
> >       if (id && FAN_GROUP_FLAG(group, FAN_REPORT_FID)) {
> >               /* Report the event without a file identifier on encode error */
> >               event->fh_type = fanotify_encode_fid(event, id, gfp, fsid);
>                         ^^^^
> This should be event->fh, shouldn't it? I wonder how come 0-day didn't
> catch this...

Maybe 0-day is on vacation...

>
> > +struct fanotify_fid_hdr {
> > +     u8 type;
> > +     u8 len;
> > +};
> > +
> >  struct fanotify_fid {
> > -     __kernel_fsid_t fsid;
> >       union {
> >               unsigned char fh[FANOTIFY_INLINE_FH_LEN];
> >               unsigned char *ext_fh;
> >       };
> >  };
> ...
> > @@ -63,13 +81,13 @@ struct fanotify_event {
> >       u32 mask;
> >       /*
> >        * Those fields are outside fanotify_fid to pack fanotify_event nicely
> > -      * on 64bit arch and to use fh_type as an indication of whether path
> > +      * on 64bit arch and to use fh.type as an indication of whether path
> >        * or fid are used in the union:
> >        * FILEID_ROOT (0) for path, > 0 for fid, FILEID_INVALID for neither.
> >        */
> > -     u8 fh_type;
> > -     u8 fh_len;
> > +     struct fanotify_fid_hdr fh;
> >       u16 pad;
>
> The 'pad' here now looks rather bogus. Let's remove it and leave padding on
> the compiler. It's in-memory struct anyway...

ok.

>
> > +     __kernel_fsid_t fsid;
> >       union {
> >               /*
> >                * We hold ref to this path so it may be dereferenced at any
>
> Here I disagree. IMO 'fsid' should be still part of the union below because
> the "object identification" is either struct path or (fsid + fh). I
> understand that you want to reuse fsid for the other file handle. But then
> I believe it should rather be done like:
>
> struct fanotify_fh {
>         union {
>                 unsigned char fh[FANOTIFY_INLINE_FH_LEN];
>                 unsigned char *ext_fh;
>         };
> };
>

This I will do.

> struct fanotify_fid {
>         __kernel_fsid_t fsid;
>         struct fanotify_fh object;
>         struct fanotify_fh dir;
> };
>

object and dir do not end up in the same struct.
object is in fanotify_event
dir is in the extended fanotify_name_event, but I can do:

struct fanotify_fid {
        __kernel_fsid_t fsid;
        struct fanotify_fh fh;
};

 struct fanotify_event {
        struct fsnotify_event fse;
        u32 mask;
        struct fanotify_fid_hdr fh;
        struct fanotify_fid_hdr dfh;
        union {
                struct path path;
                struct fanotify_fid object;
        };
        struct pid *pid;
};

struct fanotify_name_event {
        struct fanotify_event fae;
        struct fanotify_fh  dir;
        struct qstr name;
        unsigned char inline_name[FANOTIFY_INLINE_NAME_LEN];
};



> BTW, is file handle length and type guaranteed to be the same for 'object' and
> 'dir'? Given how filehandles try to be rather opaque sequences of bytes,
> I'm not sure we are safe to assume that...

No, and as you can see in the final struct above, we are not assuming that.
We are only sharing the same fsid.

Looking at the final result, do you agree to leave fsid outside of struct
fanotify_fid as I did, or do you still dislike it being outside of the union?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 08/16] fanotify: merge duplicate events on parent and child
  2020-02-26  9:18   ` Jan Kara
@ 2020-02-26 12:14     ` Amir Goldstein
  2020-02-26 14:38       ` Jan Kara
  0 siblings, 1 reply; 65+ messages in thread
From: Amir Goldstein @ 2020-02-26 12:14 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

On Wed, Feb 26, 2020 at 11:18 AM Jan Kara <jack@suse.cz> wrote:
>
> On Mon 17-02-20 15:14:47, Amir Goldstein wrote:
> > With inotify, when a watch is set on a directory and on its child, an
> > event on the child is reported twice, once with wd of the parent watch
> > and once with wd of the child watch without the filename.
> >
> > With fanotify, when a watch is set on a directory and on its child, an
> > event on the child is reported twice, but it has the exact same
> > information - either an open file descriptor of the child or an encoded
> > fid of the child.
> >
> > The reason that the two identical events are not merged is because the
> > tag used for merging events in the queue is the child inode in one event
> > and parent inode in the other.
> >
> > For events with path or dentry data, use the dentry instead of inode as
> > the tag for event merging, so that the event reported on parent will be
> > merged with the event reported on the child.
> >
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>
> I agree that reporting identical event twice seems wasteful but ...
>
> > @@ -312,7 +313,12 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
> >       if (!event)
> >               goto out;
> >  init: __maybe_unused
> > -     fsnotify_init_event(&event->fse, inode);
> > +     /*
> > +      * Use the dentry instead of inode as tag for event queue, so event
> > +      * reported on parent is merged with event reported on child when both
> > +      * directory and child watches exist.
> > +      */
> > +     fsnotify_init_event(&event->fse, (void *)dentry ?: inode);
>
> ... this seems quite ugly and also previously we could merge 'inode' events
> with others and now we cannot because some will carry "dentry where event
> happened" and other ones "inode with watch" as object identifier. So if you
> want to do this, I'd use "inode where event happened" as object identifier
> for fanotify.

<scratch head> Why didn't I think of that?...

I suppose you mean to just use:

     fsnotify_init_event(&event->fse, id);


>
> Hum, now thinking about this, maybe we could clean this up even a bit more.
> event->inode is currently used only by inotify and fanotify for merging
> purposes. Now inotify could use its 'wd' instead of inode with exactly the
> same results, fanotify path or fid check is at least as strong as the inode
> check. So only for the case of pure "inode" events, we need to store inode
> identifier in struct fanotify_event - and we can do that in the union with
> struct path and completely remove the 'inode' member from fsnotify_event.
> Am I missing something?

That generally sounds good and I did notice it is strange that wd is not
being compared.
However, I think I was worried that comparing fid+name (in following patches)
would be more expensive than comparing dentry (or object inode) as a
"rule out first" in merge, so I preferred to keep the tag/dentry/id comparison
for fanotify_fid case.

Given this analysis (and assuming it is correct), would you like me to just
go ahead with the change suggested above? Or anything beyond that?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 02/16] fsnotify: factor helpers fsnotify_dentry() and fsnotify_file()
  2020-02-25 14:27     ` Amir Goldstein
@ 2020-02-26 13:59       ` Jan Kara
  0 siblings, 0 replies; 65+ messages in thread
From: Jan Kara @ 2020-02-26 13:59 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel, Matthew Bobrowski

On Tue 25-02-20 16:27:02, Amir Goldstein wrote:
> On Tue, Feb 25, 2020 at 3:46 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Mon 17-02-20 15:14:41, Amir Goldstein wrote:
> > > Most of the code in fsnotify hooks is boiler plate of one or the other.
> > >
> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > ---
> > >  include/linux/fsnotify.h | 96 +++++++++++++++-------------------------
> > >  1 file changed, 36 insertions(+), 60 deletions(-)
> >
> > Nice cleanup. Just two comments below.
> >
> > > @@ -58,8 +78,6 @@ static inline int fsnotify_path(struct inode *inode, const struct path *path,
> > >  static inline int fsnotify_perm(struct file *file, int mask)
> > >  {
> > >       int ret;
> > > -     const struct path *path = &file->f_path;
> > > -     struct inode *inode = file_inode(file);
> > >       __u32 fsnotify_mask = 0;
> > >
> > >       if (file->f_mode & FMODE_NONOTIFY)
> >
> > I guess you can drop the NONOTIFY check from here. You've moved it to
> > fsnotify_file() and there's not much done in this function to be worth
> > skipping...
> 
> True.
> 
> >
> > > @@ -70,7 +88,7 @@ static inline int fsnotify_perm(struct file *file, int mask)
> > >               fsnotify_mask = FS_OPEN_PERM;
> > >
> > >               if (file->f_flags & __FMODE_EXEC) {
> > > -                     ret = fsnotify_path(inode, path, FS_OPEN_EXEC_PERM);
> > > +                     ret = fsnotify_file(file, FS_OPEN_EXEC_PERM);
> > >
> > >                       if (ret)
> > >                               return ret;
> >
> > Hum, I think we could simplify fsnotify_perm() even further by having:
> >
> >         if (mask & MAY_OPEN) {
> >                 if (file->f_flags & __FMODE_EXEC)
> >                         fsnotify_mask = FS_OPEN_EXEC_PERM;
> >                 else
> >                         fsnotify_mask = FS_OPEN_PERM;
> >         } ...
> >
> 
> But the current code sends both FS_OPEN_EXEC_PERM and FS_OPEN_PERM
> on an open for exec. I believe that is what was discussed when Matthew wrote
> the OPEN_EXEC patches, so existing receivers of OPEN_PERM event on exec
> will not regress.

Ah, my bad. You're right.
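
To illustrate the behaviour described above, here is a small userspace
sketch. The DEMO_* constants and demo_open_perm_events() are hypothetical
stand-ins for this example only; the point is just that an open for exec
reports the exec-specific permission event and still reports the plain open
permission event.

```c
#include <assert.h>

/* Illustrative stand-ins; names and values are assumptions of this sketch. */
#define DEMO_MAY_OPEN		0x20
#define DEMO_FMODE_EXEC		0x20
#define DEMO_OPEN_PERM		0x00010000u
#define DEMO_OPEN_EXEC_PERM	0x00040000u

/*
 * Fill out[] with the permission events to send, in order, for an open.
 * Returns the number of events written (0, 1 or 2).
 */
static int demo_open_perm_events(int may_mask, unsigned int f_flags,
				 unsigned int out[2])
{
	int n = 0;

	assert(out);
	if (!(may_mask & DEMO_MAY_OPEN))
		return 0;
	/* An exec open reports the exec-specific event first ... */
	if (f_flags & DEMO_FMODE_EXEC)
		out[n++] = DEMO_OPEN_EXEC_PERM;
	/* ... and still reports the plain open event for existing listeners. */
	out[n++] = DEMO_OPEN_PERM;
	return n;
}
```

With an if/else on the exec flag, the second event would be dropped for exec
opens, which is the regression the reply above is concerned about.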

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 08/16] fanotify: merge duplicate events on parent and child
  2020-02-26 12:14     ` Amir Goldstein
@ 2020-02-26 14:38       ` Jan Kara
  2021-01-22 13:59         ` fanotify_merge improvements Amir Goldstein
  0 siblings, 1 reply; 65+ messages in thread
From: Jan Kara @ 2020-02-26 14:38 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel

On Wed 26-02-20 14:14:50, Amir Goldstein wrote:
> On Wed, Feb 26, 2020 at 11:18 AM Jan Kara <jack@suse.cz> wrote:
> >
> > On Mon 17-02-20 15:14:47, Amir Goldstein wrote:
> > > With inotify, when a watch is set on a directory and on its child, an
> > > event on the child is reported twice, once with wd of the parent watch
> > > and once with wd of the child watch without the filename.
> > >
> > > With fanotify, when a watch is set on a directory and on its child, an
> > > event on the child is reported twice, but it has the exact same
> > > information - either an open file descriptor of the child or an encoded
> > > fid of the child.
> > >
> > > The reason that the two identical events are not merged is because the
> > > tag used for merging events in the queue is the child inode in one event
> > > and parent inode in the other.
> > >
> > > For events with path or dentry data, use the dentry instead of inode as
> > > the tag for event merging, so that the event reported on parent will be
> > > merged with the event reported on the child.
> > >
> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> >
> > I agree that reporting identical event twice seems wasteful but ...
> >
> > > @@ -312,7 +313,12 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
> > >       if (!event)
> > >               goto out;
> > >  init: __maybe_unused
> > > -     fsnotify_init_event(&event->fse, inode);
> > > +     /*
> > > +      * Use the dentry instead of inode as tag for event queue, so event
> > > +      * reported on parent is merged with event reported on child when both
> > > +      * directory and child watches exist.
> > > +      */
> > > +     fsnotify_init_event(&event->fse, (void *)dentry ?: inode);
> >
> > ... this seems quite ugly and also previously we could merge 'inode' events
> > with others and now we cannot because some will carry "dentry where event
> > happened" and other ones "inode with watch" as object identifier. So if you
> > want to do this, I'd use "inode where event happened" as object identifier
> > for fanotify.
> 
> <scratch head> Why didn't I think of that?...
> 
> I suppose you mean to just use:
> 
>      fsnotify_init_event(&event->fse, id);

Yes.

> > Hum, now thinking about this, maybe we could clean this up even a bit more.
> > event->inode is currently used only by inotify and fanotify for merging
> > purposes. Now inotify could use its 'wd' instead of inode with exactly the
> > same results, fanotify path or fid check is at least as strong as the inode
> > check. So only for the case of pure "inode" events, we need to store inode
> > identifier in struct fanotify_event - and we can do that in the union with
> > struct path and completely remove the 'inode' member from fsnotify_event.
> > Am I missing something?
> 
> That generally sounds good and I did notice it is strange that wd is not
> being compared.  However, I think I was worried that comparing fid+name
> (in following patches) would be more expensive than comparing dentry (or
> object inode) as a "rule out first" in merge, so I preferred to keep the
> tag/dentry/id comparison for fanotify_fid case.

Yes, that could be a concern.
 
> Given this analysis (and assuming it is correct), would you like me to
> just go ahead with the change suggested above? Or anything beyond that?

Let's go just with the change suggested above for now. We can work on this
later (probably with optimizing of the fanotify merging code).
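
The "rule out first" idea discussed above can be sketched in userspace as
follows. Everything here is hypothetical (struct demo_name_event,
demo_should_merge(), the 12-byte handle buffer); it only shows a cheap
identifier comparison placed before the more expensive handle and name
comparisons.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical event carrying a cheap key plus more expensive identity. */
struct demo_name_event {
	unsigned long objectid;	/* cheap key: object where the event happened */
	unsigned char fh[12];	/* stand-in for an encoded file handle */
	unsigned char fh_len;	/* number of valid bytes in fh */
	const char *name;	/* entry name, may be NULL */
};

/* Returns 1 if the two events describe the same object and name. */
static int demo_should_merge(const struct demo_name_event *a,
			     const struct demo_name_event *b)
{
	assert(a && b);
	assert(a->fh_len <= sizeof(a->fh) && b->fh_len <= sizeof(b->fh));

	/* Cheap check first: most non-matching pairs stop here. */
	if (a->objectid != b->objectid)
		return 0;
	/* Only then pay for the handle comparison ... */
	if (a->fh_len != b->fh_len || memcmp(a->fh, b->fh, a->fh_len))
		return 0;
	/* ... and the name comparison. */
	if (!a->name || !b->name)
		return a->name == b->name;
	return strcmp(a->name, b->name) == 0;
}
```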

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 11/16] fanotify: prepare to encode both parent and child fid's
  2020-02-26 11:53     ` Amir Goldstein
@ 2020-02-26 17:07       ` Jan Kara
  2020-02-26 17:50         ` Amir Goldstein
  0 siblings, 1 reply; 65+ messages in thread
From: Jan Kara @ 2020-02-26 17:07 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel

On Wed 26-02-20 13:53:06, Amir Goldstein wrote:
> On Wed, Feb 26, 2020 at 12:23 PM Jan Kara <jack@suse.cz> wrote:
> >
> > > +     __kernel_fsid_t fsid;
> > >       union {
> > >               /*
> > >                * We hold ref to this path so it may be dereferenced at any
> >
> > Here I disagree. IMO 'fsid' should be still part of the union below because
> > the "object identification" is either struct path or (fsid + fh). I
> > understand that you want to reuse fsid for the other file handle. But then
> > I believe it should rather be done like:
> >
> > struct fanotify_fh {
> >         union {
> >                 unsigned char fh[FANOTIFY_INLINE_FH_LEN];
> >                 unsigned char *ext_fh;
> >         };
> > };
> >
> 
> This I will do.
> 
> > struct fanotify_fid {
> >         __kernel_fsid_t fsid;
> >         struct fanotify_fh object;
> >         struct fanotify_fh dir;
> > };
> >
> 
> object and dir do not end up in the same struct.

Right, ok.

> object is in fanotify_event
> dir is in the extended fanotify_name_event, but I can do:
> 
> struct fanotify_fid {
>         __kernel_fsid_t fsid;
>         struct fanotify_fh fh;
> };
> 
>  struct fanotify_event {
>         struct fsnotify_event fse;
>         u32 mask;
>         struct fanotify_fid_hdr fh;
>         struct fanotify_fid_hdr dfh;
>         union {
>                 struct path path;
>                 struct fanotify_fid object;
>         };
>         struct pid *pid;
> };
> 
> struct fanotify_name_event {
>         struct fanotify_event fae;
>         struct fanotify_fh  dir;
>         struct qstr name;
>         unsigned char inline_name[FANOTIFY_INLINE_NAME_LEN];
> };

Looking at this I'm not quite happy either :-| E.g. 'dfh' contents here
somewhat magically tells that this is not fanotify_event but
fanotify_name_event. Also I agree that fsid hidden in 'object' is not ideal
although I still dislike having it directly in fanotify_event as for path
events it will not be filled and that can lead to confusion.

I understand this is so convoluted because there are several constraints:
1) We don't want to grow event size unnecessarily.
2) We prefer allocating from dedicated slab cache
3) We have events of several types needing to store different kinds of
information.

But seeing how things evolve I think we should consider relaxing some of
the constraints to make the code easier to follow. How about having
something like:

struct fanotify_event {
	struct fsnotify_event fse;
	u32 mask;
	enum fanotify_event_type type;
	struct pid *pid;
};

where type would identify what kind of event we have. Then we would have

struct fanotify_path_event {
	struct fanotify_event fae;
	struct path path;
};

struct fanotify_perm_path_event {
	struct fanotify_event fae;
	struct path path;
	unsigned short response;
	unsigned short state;
	int fd;
};

struct fanotify_fh {
	u8 type;
	u8 len;
	union {
		unsigned char fh[FANOTIFY_INLINE_FH_LEN];
		unsigned char *ext_fh;
	};
};

struct fanotify_fid_event {
	struct fanotify_event fae;
	__kernel_fsid_t fsid;
	struct fanotify_fh object_fh;
};

struct fanotify_name_event {
	struct fanotify_event fae;
	__kernel_fsid_t fsid;
	struct fanotify_fh object_fh;
	struct fanotify_fh dir_fh;
	u8 name_len;
	char name[0];
};

WRT size, this would grow fanotify_fid_event by 1 long on 64-bits,
fanotify_path_event would actually be smaller by 1 long, fanotify_name_event
would be smaller but that's not really comparable because you chose a
solution with fixed-inline length while I'd just go with allocating from
kmalloc when we have to store the name.

In terms of kmalloc caches, we would need three: for path, perm_path, fid
events, I'd allocate name events from generic kmalloc caches.

So overall I think this would be better. The question is whether the
resulting code will really be more readable. I hope so because the
structures are definitely nicer this way and things belonging logically
together are now together. But you never know until you convert the code...
Would you be willing to try this refactoring?
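
As a rough illustration of the layout proposed above, the following userspace
sketch shows a common base struct with a type field and per-type containers
recovered through an offsetof()-based cast. All demo_* names are
hypothetical, and the path and fsid members are simplified stand-ins for
struct path and __kernel_fsid_t.

```c
#include <assert.h>
#include <stddef.h>

enum demo_event_type {
	DEMO_EVENT_TYPE_PATH,
	DEMO_EVENT_TYPE_FID,
};

/* Common part shared by every event; 'type' says which container holds it. */
struct demo_event {
	unsigned int mask;
	enum demo_event_type type;
};

struct demo_path_event {
	struct demo_event fae;
	const char *path;	/* stand-in for struct path */
};

struct demo_fid_event {
	struct demo_event fae;
	unsigned long fsid;	/* stand-in for __kernel_fsid_t */
};

/* Userspace equivalent of the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Return the containing path event, or NULL if 'event' is another type. */
static struct demo_path_event *demo_to_path_event(struct demo_event *event)
{
	assert(event);
	if (event->type != DEMO_EVENT_TYPE_PATH)
		return NULL;
	return demo_container_of(event, struct demo_path_event, fae);
}

/* Return the containing fid event, or NULL if 'event' is another type. */
static struct demo_fid_event *demo_to_fid_event(struct demo_event *event)
{
	assert(event);
	if (event->type != DEMO_EVENT_TYPE_FID)
		return NULL;
	return demo_container_of(event, struct demo_fid_event, fae);
}
```

Generic code would pass around struct demo_event pointers and only convert to
a specific container where the type-specific fields are needed.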

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 11/16] fanotify: prepare to encode both parent and child fid's
  2020-02-26 17:07       ` Jan Kara
@ 2020-02-26 17:50         ` Amir Goldstein
  2020-02-27  9:06           ` Amir Goldstein
  2020-02-27 11:01           ` Jan Kara
  0 siblings, 2 replies; 65+ messages in thread
From: Amir Goldstein @ 2020-02-26 17:50 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

On Wed, Feb 26, 2020 at 7:07 PM Jan Kara <jack@suse.cz> wrote:
>
> On Wed 26-02-20 13:53:06, Amir Goldstein wrote:
> > On Wed, Feb 26, 2020 at 12:23 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > > +     __kernel_fsid_t fsid;
> > > >       union {
> > > >               /*
> > > >                * We hold ref to this path so it may be dereferenced at any
> > >
> > > Here I disagree. IMO 'fsid' should be still part of the union below because
> > > the "object identification" is either struct path or (fsid + fh). I
> > > understand that you want to reuse fsid for the other file handle. But then
> > > I believe it should rather be done like:
> > >
> > > struct fanotify_fh {
> > >         union {
> > >                 unsigned char fh[FANOTIFY_INLINE_FH_LEN];
> > >                 unsigned char *ext_fh;
> > >         };
> > > };
> > >
> >
> > This I will do.
> >
> > > struct fanotify_fid {
> > >         __kernel_fsid_t fsid;
> > >         struct fanotify_fh object;
> > >         struct fanotify_fh dir;
> > > };
> > >
> >
> > object and dir do not end up in the same struct.
>
> Right, ok.
>
> > object is in fanotify_event
> > dir is in the extended fanotify_name_event, but I can do:
> >
> > struct fanotify_fid {
> >         __kernel_fsid_t fsid;
> >         struct fanotify_fh fh;
> > };
> >
> >  struct fanotify_event {
> >         struct fsnotify_event fse;
> >         u32 mask;
> >         struct fanotify_fid_hdr fh;
> >         struct fanotify_fid_hdr dfh;
> >         union {
> >                 struct path path;
> >                 struct fanotify_fid object;
> >         };
> >         struct pid *pid;
> > };
> >
> > struct fanotify_name_event {
> >         struct fanotify_event fae;
> >         struct fanotify_fh  dir;
> >         struct qstr name;
> >         unsigned char inline_name[FANOTIFY_INLINE_NAME_LEN];
> > };
>
> Looking at this I'm not quite happy either :-| E.g. 'dfh' contents here
> somewhat magically tells that this is not fanotify_event but
> fanotify_name_event. Also I agree that fsid hidden in 'object' is not ideal
> although I still dislike having it directly in fanotify_event as for path
> events it will not be filled and that can lead to confusion.
>
> I understand this is so convoluted because there are several constraints:
> 1) We don't want to grow event size unnecessarily.
> 2) We prefer allocating from dedicated slab cache
> 3) We have events of several types needing to store different kinds of
> information.
>
> But seeing how things evolve I think we should consider relaxing some of
> the constraints to make the code easier to follow. How about having
> something like:
>
> struct fanotify_event {
>         struct fsnotify_event fse;
>         u32 mask;
>         enum fanotify_event_type type;
>         struct pid *pid;
> };
>
> where type would identify what kind of event we have. Then we would have
>
> struct fanotify_path_event {
>         struct fanotify_event fae;
>         struct path path;
> };
>
> struct fanotify_perm_path_event {
>         struct fanotify_event fae;
>         struct path path;

Any reason not to "inherit" from fanotify_path_event?
There is code that is generic to permission and non-permission path
events that accesses event->path and I wouldn't
want to make that code two cases instead of just one.


>         unsigned short response;
>         unsigned short state;
>         int fd;
> };
>
> struct fanotify_fh {
>         u8 type;
>         u8 len;

That's a 6-byte hole! And then there are two of those,
in object_fh and dir_fh.
That is why I stored the header separately from the fh itself,
so that two headers could pack up nicely, and yes,
I also used the headers as an event type indication.

>         union {
>                 unsigned char fh[FANOTIFY_INLINE_FH_LEN];
>                 unsigned char *ext_fh;
>         };
> };
>
> struct fanotify_fid_event {
>         struct fanotify_event fae;
>         __kernel_fsid_t fsid;
>         struct fanotify_fh object_fh;
> };
>
> struct fanofify_name_event {
>         struct fanotify_event fae;
>         __kernel_fsid_t fsid;
>         struct fanotify_fh object_fh;

Again, any reason not to "inherit" from fanotify_fid_event?
There is plenty of code that is common to fid and name events
because name events are also fid events.

>         struct fanotify_fh dir_fh;
>         u8 name_len;
>         char name[0];
> };
>
> WRT size, this would grow fanotify_fid_event by 1 long on 64-bits,
> fanotify_path_event would be actually smaller by 1 long, fanofify_name_event
> would be smaller but that's not really comparable because you chose a
> solution with fixed-inline length while I'd just go with allocating from
> kmalloc when we have to store the name.

OK. Same as inotify.
I guess I started with the name_snapshot thing that was really fixed-size
event and then reused the same construct without the snapshot, but I
guess we can do away with the inline name.

>
> In terms of kmalloc caches, we would need three: for path, perm_path, fid
> events, I'd allocate name events from generic kmalloc caches.
>
> So overall I think this would be better. The question is whether the
> resulting code will really be more readable. I hope so because the
> structures are definitely nicer this way and things belonging logically
> together are now together. But you never know until you convert the code...
> Would you be willing to try this refactoring?

Yes, but I would like to know what you think about the two 6-byte holes.
Just let that space be wasted for the sake of nicer abstraction?
It seems like too much to me.

Thanks,
Amir.

* Re: [PATCH v2 11/16] fanotify: prepare to encode both parent and child fid's
  2020-02-26 17:50         ` Amir Goldstein
@ 2020-02-27  9:06           ` Amir Goldstein
  2020-02-27 11:27             ` Jan Kara
  2020-02-27 11:01           ` Jan Kara
  1 sibling, 1 reply; 65+ messages in thread
From: Amir Goldstein @ 2020-02-27  9:06 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

> > So overall I think this would be better. The question is whether the
> > resulting code will really be more readable. I hope so because the
> > structures are definitely nicer this way and things belonging logically
> > together are now together. But you never know until you convert the code...
> > Would you be willing to try this refactoring?
>
> Yes, but I would like to know what you think about the two 6 byte holes
> Just let that space be wasted for the sake of nicer abstraction?
> It seems like too much to me.
>

What if we unite the fh and name into one struct and keep a 32bit hash of
fh+name inside?

This will allow us to mitigate the cost of memcmp of fh+name in merge
and get rid of objectid in fsnotify_event as you suggested.

struct fanotify_fh_name {
         union {
                struct {
                       u8 fh_type;
                       u8 fh_len;
                       u8 name_len;
                       u32 hash;
                };
                u64 hash_len;
        };
        union {
                unsigned char fh[FANOTIFY_INLINE_FH_LEN];
                unsigned char *ext_fh;
        };
        char name[0];
};

struct fanotify_fid_event {
        struct fanotify_event fae;
        __kernel_fsid_t fsid;
        struct fanotify_fh_name object_fh; /* name is empty */
};

struct fanotify_name_event {
        struct fanotify_fid_event ffe;
        struct fanotify_fh_name dirent;
};

So the only anomaly is that we use struct fanotify_fh_name
to describe object_fh which never has a name.

I think we can live with that and trying to beat that would be
over abstraction.

Thoughts?

Thanks,
Amir.

* Re: [PATCH v2 11/16] fanotify: prepare to encode both parent and child fid's
  2020-02-26 17:50         ` Amir Goldstein
  2020-02-27  9:06           ` Amir Goldstein
@ 2020-02-27 11:01           ` Jan Kara
  1 sibling, 0 replies; 65+ messages in thread
From: Jan Kara @ 2020-02-27 11:01 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel

On Wed 26-02-20 19:50:30, Amir Goldstein wrote:
> On Wed, Feb 26, 2020 at 7:07 PM Jan Kara <jack@suse.cz> wrote:
> > Looking at this I'm not quite happy either :-| E.g. 'dfh' contents here
> > somewhat magically tells that this is not fanotify_event but
> > fanotify_name_event. Also I agree that fsid hidden in 'object' is not ideal
> > although I still dislike having it directly in fanotify_event as for path
> > events it will not be filled and that can lead to confusion.
> >
> > I understand this is so convoluted because there are several constraints:
> > 1) We don't want to grow event size unnecessarily.
> > 2) We prefer allocating from dedicated slab cache
> > 3) We have events of several types needing to store different kind of
> > information.
> >
> > But seeing how things evolve I think we should consider relaxing some of
> > the constraints to make the code easier to follow. How about having
> > something like:
> >
> > struct fanotify_event {
> >         struct fsnotify_event fse;
> >         u32 mask;
> >         enum fanotify_event_type type;
> >         struct pid *pid;
> > };
> >
> > where type would identify what kind of event we have. Then we would have
> >
> > struct fanotify_path_event {
> >         struct fanotify_event fae;
> >         struct path path;
> > };
> >
> > struct fanotify_perm_path_event {
> >         struct fanotify_event fae;
> >         struct path path;
> 
> Any reason not to "inherit" from fanotify_path_event?
> There is code that is generic to permission and non-permission path
> events that accesses event->path and I wouldn't
> want to make that code two cases instead of just one.

I'm OK with that if it works better for you. I was just thinking that we'll
have a helper like:

struct path *fanotify_event_path(struct fanotify_event *event)
{
	/* Return a pointer to the embedded path, or NULL for non-path events */
	if (event->type == FA_PATH_EVENT)
		return &((struct fanotify_path_event *)event)->path;
	else if (event->type == FA_PERM_PATH_EVENT)
		return &((struct fanotify_perm_path_event *)event)->path;
	else
		return NULL;
}

and thus in most of code all the type details could be abstracted by this
helper and so there won't be reason for "intermediate" types. But as I
wrote above if you find good use for them, I'm OK with that.
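
For illustration only, a minimal userspace sketch of the type-tagged
lookup described above. The struct bodies are reduced stand-ins (the
kernel's struct path and fsnotify_event are not reproduced), and
event_path() and the local container_of() macro are hypothetical names
for this sketch, not code from the series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct path. */
struct path { const char *name; };

enum fanotify_event_type {
	FA_PATH_EVENT,
	FA_PERM_PATH_EVENT,
	FA_FID_EVENT,
};

/* Common header shared by all event flavors (reduced for the sketch). */
struct fanotify_event {
	unsigned int mask;
	enum fanotify_event_type type;
};

struct fanotify_path_event {
	struct fanotify_event fae;
	struct path path;
};

struct fanotify_perm_path_event {
	struct fanotify_event fae;
	struct path path;
	unsigned short response;
};

/* Local copy of the usual container_of idiom. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Callers only see the common header; the type tag selects the layout. */
static struct path *event_path(struct fanotify_event *event)
{
	assert(event != NULL);

	switch (event->type) {
	case FA_PATH_EVENT:
		return &container_of(event, struct fanotify_path_event, fae)->path;
	case FA_PERM_PATH_EVENT:
		return &container_of(event, struct fanotify_perm_path_event, fae)->path;
	default:
		return NULL;	/* fid and name events carry no path */
	}
}
```

With a helper of this shape, code that is generic to permission and
non-permission path events stays a single case without an intermediate
"inherited" struct.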

> >         unsigned short response;
> >         unsigned short state;
> >         int fd;
> > };
> >
> > struct fanotify_fh {
> >         u8 type;
> >         u8 len;
> 
> That's a 6 bytes hole! and then there are two of those
> in object_fh and dir_fh.
> That is why I stored the header in separate from the fh itself
> so that two headers could pack up nicely and yes,
> I also used the headers as an event type indication.

Yes, I know but this packing of loosely related things is exactly what makes
the code difficult to follow... 

> >         union {
> >                 unsigned char fh[FANOTIFY_INLINE_FH_LEN];
> >                 unsigned char *ext_fh;
> >         };
> > };
> >
> > struct fanotify_fid_event {
> >         struct fanotify_event fae;
> >         __kernel_fsid_t fsid;
> >         struct fanotify_fh object_fh;
> > };
> >
> > struct fanofify_name_event {
> >         struct fanotify_event fae;
> >         __kernel_fsid_t fsid;
> >         struct fanotify_fh object_fh;
> 
> Again, any reason not to "inherit" from fanotify_fid_event?
> There is plenty of code that is common to fid and name events
> because name events are also fid events.

We could if the helper functions do not abstract the difference enough...

> >         struct fanotify_fh dir_fh;
> >         u8 name_len;
> >         char name[0];
> > };
> >
> > WRT size, this would grow fanotify_fid_event by 1 long on 64-bits,
> > fanotify_path_event would be actually smaller by 1 long, fanofify_name_event
> > would be smaller but that's not really comparable because you chose a
> > solution with fixed-inline length while I'd just go with allocating from
> > kmalloc when we have to store the name.
> 
> OK. Same an inotify.
> I guess I started with the name_snapshot thing that was really fixed-size
> event and then reused the same construct without the snapshot, but I
> guess we can do away with the inline name.
> 
> > In terms of kmalloc caches, we would need three: for path, perm_path, fid
> > events, I'd allocate name events from generic kmalloc caches.
> >
> > So overall I think this would be better. The question is whether the
> > resulting code will really be more readable. I hope so because the
> > structures are definitely nicer this way and things belonging logically
> > together are now together. But you never know until you convert the code...
> > Would you be willing to try this refactoring?
> 
> Yes, but I would like to know what you think about the two 6 byte holes
> Just let that space be wasted for the sake of nicer abstraction?
> It seems like too much to me.

Well, it's wasting 1 long per FID event (i.e., 72 vs 64 bytes on 64-bits if
I'm counting right) compared to the tight packing we had previously. I'd
say that's bearable.

For name events we are wasting two longs per event compared to the tightest
packing I can imagine (i.e., 97+name vs 81+name). That's bad enough but I
can live with that for now...

We could actually improve packing of name events by declaring handle as:

struct fanotify_fh {
	u8 type;
	u8 len;
	u8 fh[FANOTIFY_INLINE_FH_LEN];
};

This is a structure that has no padding requirements and so if we place two
next to each other they will use just 36 bytes instead of 48. But then we
have to play games with hiding pointer inside 'fh' like:

char **fh_ext_ptr(struct fanotify_fh *fh)
{
	return (char **)ALIGN((unsigned long)(fh->fh), __alignof__(char *));
}

Probably it's worth it but I wouldn't bother for this series if you don't
want to.
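
For illustration only, a userspace sketch of the two layouts compared
above. INLINE_FH_LEN is assumed to be 16 here because that is the value
the 36-vs-48 figures imply; the real FANOTIFY_INLINE_FH_LEN may differ.
The names fh_union, fh_bytes, fh_ext_slot(), fh_set_ext() and
fh_get_ext() are hypothetical. The sketch loads and stores the hidden
pointer with memcpy() to stay within ISO C aliasing rules, whereas the
helper above dereferences the aligned address directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed value: 2 + 16 = 18 bytes per handle gives 36 bytes for two. */
#define INLINE_FH_LEN 16

/* Pointer in the union: the union needs pointer alignment, so a hole
 * follows the two u8 header fields on 64-bit. */
struct fh_union {
	uint8_t type;
	uint8_t len;
	union {
		unsigned char fh[INLINE_FH_LEN];
		unsigned char *ext_fh;
	};
};

/* Bytes only: alignment 1, so no padding inside or between instances. */
struct fh_bytes {
	uint8_t type;
	uint8_t len;
	uint8_t fh[INLINE_FH_LEN];
};

struct two_union { struct fh_union a, b; };
struct two_bytes { struct fh_bytes a, b; };

/* First pointer-aligned address inside fh->fh. */
static void *fh_ext_slot(struct fh_bytes *fh)
{
	uintptr_t addr = (uintptr_t)fh->fh;
	uintptr_t align = _Alignof(char *);

	return (void *)((addr + align - 1) & ~(align - 1));
}

/* Hide a pointer to an external buffer inside the inline bytes. */
static void fh_set_ext(struct fh_bytes *fh, char *ext)
{
	unsigned char *slot = fh_ext_slot(fh);

	/* The aligned slot must still fit inside the inline buffer. */
	assert(slot + sizeof(ext) <= fh->fh + INLINE_FH_LEN);
	memcpy(slot, &ext, sizeof(ext));
}

static char *fh_get_ext(struct fh_bytes *fh)
{
	char *ext;

	memcpy(&ext, fh_ext_slot(fh), sizeof(ext));
	return ext;
}
```

The price of the byte-only layout is that up to seven bytes of the
inline buffer may be skipped to reach an aligned slot, so the buffer
must be at least 15 bytes long to hold an 8-byte pointer.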

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH v2 11/16] fanotify: prepare to encode both parent and child fid's
  2020-02-27  9:06           ` Amir Goldstein
@ 2020-02-27 11:27             ` Jan Kara
  2020-02-27 12:12               ` Amir Goldstein
  0 siblings, 1 reply; 65+ messages in thread
From: Jan Kara @ 2020-02-27 11:27 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel

On Thu 27-02-20 11:06:18, Amir Goldstein wrote:
> > > So overall I think this would be better. The question is whether the
> > > resulting code will really be more readable. I hope so because the
> > > structures are definitely nicer this way and things belonging logically
> > > together are now together. But you never know until you convert the code...
> > > Would you be willing to try this refactoring?
> >
> > Yes, but I would like to know what you think about the two 6 byte holes
> > Just let that space be wasted for the sake of nicer abstraction?
> > It seems like too much to me.
> >
> 
> What if we unite the fh and name into one struct and keep a 32bit hash of
> fh+name inside?
> 
> This will allow us to mitigate the cost of memcmp of fh+name in merge
> and get rid of objectid in fsnotify_event as you suggested.

I definitely want to get rid of objectid in the long run but I wouldn't
necessarily tie it to this series.

What I had in mind to do for fanotify to speed up merging (in the light of
your report) was to associate a hash with each fanotify event based on
values we care about most (probably store it in the same word as fanotify
event type) and compare based on this hash first. Possibly, we could also
add a small hash table (say 128 entries) to each fanotify group based on this
hash to speed up looking up candidates for merging.
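
For illustration only, a userspace sketch of the hash-first merge check
and the small per-group table described above. Every name here
(sketch_event, sketch_group, sketch_queue() and so on) is hypothetical,
the key layout is an assumption, and FNV-1a is just a stand-in hash; the
sketch also scans a whole bucket, which is not necessarily how merging
would pick its candidates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MERGE_HASH_BITS 7			/* 128 buckets */
#define MERGE_HASH_SIZE (1u << MERGE_HASH_BITS)

/* Hypothetical event holding only what the merge check needs. */
struct sketch_event {
	uint32_t hash;			/* hash of key[0..key_len) */
	uint8_t key_len;
	unsigned char key[32];		/* e.g. fsid + file handle + name */
	struct sketch_event *next;	/* chain within one bucket */
};

struct sketch_group {
	struct sketch_event *buckets[MERGE_HASH_SIZE];
};

/* FNV-1a, used only as a stand-in hash function. */
static uint32_t sketch_hash(const unsigned char *data, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= data[i];
		h *= 16777619u;
	}
	return h;
}

static void sketch_event_init(struct sketch_event *ev, const char *key)
{
	size_t len = strlen(key);

	assert(ev != NULL);
	if (len > sizeof(ev->key))
		len = sizeof(ev->key);	/* truncate over-long keys */
	ev->key_len = (uint8_t)len;
	memcpy(ev->key, key, len);
	ev->hash = sketch_hash(ev->key, len);
	ev->next = NULL;
}

/* Cheap comparisons first; memcmp only runs when the hashes match. */
static bool sketch_should_merge(const struct sketch_event *a,
				const struct sketch_event *b)
{
	if (a->hash != b->hash || a->key_len != b->key_len)
		return false;
	return memcmp(a->key, b->key, a->key_len) == 0;
}

/* Returns true if an equal event is already queued, else queues ev. */
static bool sketch_queue(struct sketch_group *g, struct sketch_event *ev)
{
	struct sketch_event **head =
		&g->buckets[ev->hash & (MERGE_HASH_SIZE - 1)];

	for (struct sketch_event *old = *head; old; old = old->next)
		if (sketch_should_merge(old, ev))
			return true;
	ev->next = *head;
	*head = ev;
	return false;
}
```

The hash comparison rejects most non-matching candidates without
touching the key bytes, and the bucket lookup limits how many candidates
are examined at all.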

> struct fanotify_fh_name {
>          union {
>                 struct {
>                        u8 fh_type;
>                        u8 fh_len;
>                        u8 name_len;
>                        u32 hash;
>                 };
>                 u64 hash_len;
>         };
>         union {
>                 unsigned char fh[FANOTIFY_INLINE_FH_LEN];
>                 unsigned char *ext_fh;
>         };
>         char name[0];
> };

So based on the above I wouldn't add just name hash to fanotify_fh_name at
this point...

								Honza

> struct fanotify_fid_event {
>         struct fanotify_event fae;
>         __kernel_fsid_t fsid;
>         struct fanotify_fh_name object_fh; /* name is empty */
> };
> 
> struct fanofify_name_event {
>         struct fanotify_fid_event ffe;
>         struct fanotify_fh_name dirent;
> };
> 
> So the only anomaly is that we use struct fanotify_fh_name
> to describe object_fh which never has a name.
> 
> I think we can live with that and trying to beat that would be
> over abstraction.
> 
> Thoughts?
> 
> Thanks,
> Amir.
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH v2 11/16] fanotify: prepare to encode both parent and child fid's
  2020-02-27 11:27             ` Jan Kara
@ 2020-02-27 12:12               ` Amir Goldstein
  2020-02-27 13:30                 ` Jan Kara
  0 siblings, 1 reply; 65+ messages in thread
From: Amir Goldstein @ 2020-02-27 12:12 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

>
> > struct fanotify_fh_name {
> >          union {
> >                 struct {
> >                        u8 fh_type;
> >                        u8 fh_len;
> >                        u8 name_len;
> >                        u32 hash;
> >                 };
> >                 u64 hash_len;
> >         };
> >         union {
> >                 unsigned char fh[FANOTIFY_INLINE_FH_LEN];
> >                 unsigned char *ext_fh;
> >         };
> >         char name[0];
> > };
>
> So based on the above I wouldn't add just name hash to fanotify_fh_name at
> this point...
>

OK, but what do you think about tying the name to the fh as above?
At least name_len gets to use the hole this way.

I am trying this out now and it is really hard for me not to call the struct
above fanotify_fid.
IMO code looks much better when it is called this way.
The problem is inconsistency with struct fanotify_event_info_fid which
does include fsid, but I think we can live with that.

Anyway, I'll prepare a version or two of the end result and let you see
how it looks.

Thanks,
Amir.

* Re: [PATCH v2 11/16] fanotify: prepare to encode both parent and child fid's
  2020-02-27 12:12               ` Amir Goldstein
@ 2020-02-27 13:30                 ` Jan Kara
  2020-02-27 14:06                   ` Amir Goldstein
  0 siblings, 1 reply; 65+ messages in thread
From: Jan Kara @ 2020-02-27 13:30 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel

On Thu 27-02-20 14:12:30, Amir Goldstein wrote:
> >
> > > struct fanotify_fh_name {
> > >          union {
> > >                 struct {
> > >                        u8 fh_type;
> > >                        u8 fh_len;
> > >                        u8 name_len;
> > >                        u32 hash;
> > >                 };
> > >                 u64 hash_len;
> > >         };
> > >         union {
> > >                 unsigned char fh[FANOTIFY_INLINE_FH_LEN];
> > >                 unsigned char *ext_fh;
> > >         };
> > >         char name[0];
> > > };
> >
> > So based on the above I wouldn't add just name hash to fanotify_fh_name at
> > this point...
> >
> 
> OK. but what do you think about tying name with fh as above?
> At least name_len gets to use the hole this way.

Is saving that one byte for name_len really worth the packing? If anything,
I'd rather do the fanotify_fh padding optimization I outlined in another
email. That would save one long without any packing and the following u8
name_len would get packed tightly after the fanotify_fh by the compiler.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH v2 11/16] fanotify: prepare to encode both parent and child fid's
  2020-02-27 13:30                 ` Jan Kara
@ 2020-02-27 14:06                   ` Amir Goldstein
  2020-03-01 16:26                     ` Amir Goldstein
  0 siblings, 1 reply; 65+ messages in thread
From: Amir Goldstein @ 2020-02-27 14:06 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

On Thu, Feb 27, 2020 at 3:30 PM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 27-02-20 14:12:30, Amir Goldstein wrote:
> > >
> > > > struct fanotify_fh_name {
> > > >          union {
> > > >                 struct {
> > > >                        u8 fh_type;
> > > >                        u8 fh_len;
> > > >                        u8 name_len;
> > > >                        u32 hash;
> > > >                 };
> > > >                 u64 hash_len;
> > > >         };
> > > >         union {
> > > >                 unsigned char fh[FANOTIFY_INLINE_FH_LEN];
> > > >                 unsigned char *ext_fh;
> > > >         };
> > > >         char name[0];
> > > > };
> > >
> > > So based on the above I wouldn't add just name hash to fanotify_fh_name at
> > > this point...
> > >
> >
> > OK. but what do you think about tying name with fh as above?
> > At least name_len gets to use the hole this way.
>
> Is saving that one byte for name_len really worth the packing? If anything,
> I'd rather do the fanotity_fh padding optimization I outlined in another
> email. That would save one long without any packing and the following u8
> name_len would get packed tightly after the fanotify_fh by the compiler.
>

OK. I will try that and the non-inherited variant of the perm/name event
structs and see how it looks.

Thanks,
Amir.

* Re: [PATCH v2 11/16] fanotify: prepare to encode both parent and child fid's
  2020-02-27 14:06                   ` Amir Goldstein
@ 2020-03-01 16:26                     ` Amir Goldstein
  2020-03-05 15:49                       ` Jan Kara
  0 siblings, 1 reply; 65+ messages in thread
From: Amir Goldstein @ 2020-03-01 16:26 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

> > I'd rather do the fanotity_fh padding optimization I outlined in another
> > email. That would save one long without any packing and the following u8
> > name_len would get packed tightly after the fanotify_fh by the compiler.
> >
>
> OK. I will try that and the non-inherited variant of perm/name event struct
> and see how it looks like.
>

Pushed sample code to branch fanotify_name-wip:

b5e56d3e1358 fanotify: fanotify_perm_event inherits from fanotify_path_event
55041285b3b7 fanotify: divorce fanotify_path_event and fanotify_fid_event

I opted to have fanotify_name_event inherit from fanotify_fid_event,
because it felt better this way.
I wasn't sure about having fanotify_perm_event inherit from
fanotify_path_event, so I did that in a separate patch so you can judge
both variants.
IMO, neither variant is that good or bad, so I could go with either.

I do like the end result with your suggestions better than fanotify_name-v2.
If you like this version, I will work the changes into the series.

Thanks,
Amir.

* Re: [PATCH v2 11/16] fanotify: prepare to encode both parent and child fid's
  2020-03-01 16:26                     ` Amir Goldstein
@ 2020-03-05 15:49                       ` Jan Kara
  2020-03-06 11:19                         ` Amir Goldstein
  0 siblings, 1 reply; 65+ messages in thread
From: Jan Kara @ 2020-03-05 15:49 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel

Hi Amir!

On Sun 01-03-20 18:26:25, Amir Goldstein wrote:
> > > I'd rather do the fanotity_fh padding optimization I outlined in another
> > > email. That would save one long without any packing and the following u8
> > > name_len would get packed tightly after the fanotify_fh by the compiler.
> > >
> >
> > OK. I will try that and the non-inherited variant of perm/name event struct
> > and see how it looks like.
> >
> 
> Pushed sample code to branch fanotify_name-wip:
> 
> b5e56d3e1358 fanotify: fanotify_perm_event inherits from fanotify_path_event
> 55041285b3b7 fanotify: divorce fanotify_path_event and fanotify_fid_event

Thanks for the work!

> I opted for fanotify_name_event inherits from fanotify_fid_event,
> because it felt better this way.

I've commented on github in the patches - I'm not sure the inheritance
really brings a significant benefit and it costs 6 bytes per name event.
Maybe there can be more simplifications gained from the inheritance (but I
think the move of fsid out of fanotify_fid mostly precludes that) but at
this point it doesn't seem to be worth it to me.

> I wasn't sure about fanotify_perm_event inherits from fanotify_path_event,
> so did that is a separate patch so you can judge both variants.
> IMO, neither variant is that good or bad, so I could go with either.

Yeah, I don't think the inheritance is really worth the churn.

> I do like the end result with your suggestions better than fanotify_name-v2.
> If you like this version, I will work the changes into the series.

Yes, overall the code looks better! Thanks!

									Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH v2 11/16] fanotify: prepare to encode both parent and child fid's
  2020-03-05 15:49                       ` Jan Kara
@ 2020-03-06 11:19                         ` Amir Goldstein
  2020-03-08  7:29                           ` Amir Goldstein
  0 siblings, 1 reply; 65+ messages in thread
From: Amir Goldstein @ 2020-03-06 11:19 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

On Thu, Mar 5, 2020 at 5:49 PM Jan Kara <jack@suse.cz> wrote:
>
> Hi Amir!
>
> On Sun 01-03-20 18:26:25, Amir Goldstein wrote:
> > > > I'd rather do the fanotity_fh padding optimization I outlined in another
> > > > email. That would save one long without any packing and the following u8
> > > > name_len would get packed tightly after the fanotify_fh by the compiler.
> > > >
> > >
> > > OK. I will try that and the non-inherited variant of perm/name event struct
> > > and see how it looks like.
> > >
> >
> > Pushed sample code to branch fanotify_name-wip:
> >
> > b5e56d3e1358 fanotify: fanotify_perm_event inherits from fanotify_path_event
> > 55041285b3b7 fanotify: divorce fanotify_path_event and fanotify_fid_event
>
> Thanks for the work!
>
> > I opted for fanotify_name_event inherits from fanotify_fid_event,
> > because it felt better this way.
>
> I've commented on github in the patches - I'm not sure the inheritance
> really brings a significant benefit and it costs 6 bytes per name event.
> Maybe there can be more simplifications gained from the inheritance (but I
> think the move of fsid out of fanotify_fid mostly precludes that) but at
> this point it doesn't seem to be worth it to me.
>

As agreed in the github discussion, the padding is a non-issue.
To see what the benefit of inheriting from fanotify_fid_event is, I did a
test patch to get rid of it and pushed the result to fanotify_name-wip:

* b7eb8314c61b - fanotify: do not inherit fanotify_name_event from
fanotify_fid_event

IMO, the removal of inheritance in this struct is artificial and
brings no benefit.
IMO, not a single line of code improved, while several helpers were added
that abstract something that is pretty obvious.

That said, I don't mind going with this variant.
Let me know what your final call is.

Thanks,
Amir.

* Re: [PATCH v2 11/16] fanotify: prepare to encode both parent and child fid's
  2020-03-06 11:19                         ` Amir Goldstein
@ 2020-03-08  7:29                           ` Amir Goldstein
  2020-03-18 17:51                             ` Jan Kara
  0 siblings, 1 reply; 65+ messages in thread
From: Amir Goldstein @ 2020-03-08  7:29 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

> To see what the benefit of inherit fanotify_fid_event is, I did a test patch
> to get rid of it and pushed the result to fanotify_name-wip:
>
> * b7eb8314c61b - fanotify: do not inherit fanotify_name_event from
> fanotify_fid_event
>
> IMO, the removal of inheritance in this struct is artificial and
> brings no benefit.
> There is not a single line of code that improved IMO vs. several added
> helpers which abstract something that is pretty obvious.
>
> That said, I don't mind going with this variant.
> Let me you what your final call is.
>

In the end, it was easier to work the non-inherited variant into the series,
because the helpers aid with abstracting things as the series progresses and
because object_fh is added to fanotify_name_event late in the series.
So I went with your preference.

Pushed the work to fanotify_name branch.
Let me know if you want me to post v3.

Thanks,
Amir.

* Re: [PATCH v2 11/16] fanotify: prepare to encode both parent and child fid's
  2020-03-08  7:29                           ` Amir Goldstein
@ 2020-03-18 17:51                             ` Jan Kara
  2020-03-18 18:50                               ` Amir Goldstein
  0 siblings, 1 reply; 65+ messages in thread
From: Jan Kara @ 2020-03-18 17:51 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel

On Sun 08-03-20 09:29:08, Amir Goldstein wrote:
> > To see what the benefit of inherit fanotify_fid_event is, I did a test patch
> > to get rid of it and pushed the result to fanotify_name-wip:
> >
> > * b7eb8314c61b - fanotify: do not inherit fanotify_name_event from
> > fanotify_fid_event
> >
> > IMO, the removal of inheritance in this struct is artificial and
> > brings no benefit.
> > There is not a single line of code that improved IMO vs. several added
> > helpers which abstract something that is pretty obvious.
> >
> > That said, I don't mind going with this variant.
> > Let me you what your final call is.
> >
> 
> Eventually, it was easier to work the non-inherited variant into the series
> as the helpers aid with abstracting things as the series progresses and
> because object_fh is added to fanotify_name_event late in the series.
> So I went with your preference.
> 
> Pushed the work to fanotify_name branch.
> Let me know if you want me to post v3.

So I went through the patches and had only minor comments for most of them.
Can you post the next revision by email? I'll pick up at least the
obvious preparatory patches into my tree. Thanks!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH v2 11/16] fanotify: prepare to encode both parent and child fid's
  2020-03-18 17:51                             ` Jan Kara
@ 2020-03-18 18:50                               ` Amir Goldstein
  2020-03-19  9:30                                 ` Jan Kara
  2020-03-30 19:29                                 ` Amir Goldstein
  0 siblings, 2 replies; 65+ messages in thread
From: Amir Goldstein @ 2020-03-18 18:50 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

> > Pushed the work to fanotify_name branch.
> > Let me know if you want me to post v3.
>
> So I went through the patches - had only minor comments for most of them.
> Can you post the next revision by email and I'll pickup at least the
> obvious preparatory patches to my tree. Thanks!
>

Will do.
Most of your comments were minor, but the last comments on the
FAN_REPORT_NAME patch sent me to do some homework.

I wonder if you would like me to post only the FAN_DIR_MODIFY
patches, which seem ready for prime time and defer the
FAN_REPORT_NAME changes to the next merge window?
Or do you prefer to merge all the planned event name info API
changes at once?

Thanks,
Amir.

* Re: [PATCH v2 11/16] fanotify: prepare to encode both parent and child fid's
  2020-03-18 18:50                               ` Amir Goldstein
@ 2020-03-19  9:30                                 ` Jan Kara
  2020-03-19 10:07                                   ` Amir Goldstein
  2020-03-30 19:29                                 ` Amir Goldstein
  1 sibling, 1 reply; 65+ messages in thread
From: Jan Kara @ 2020-03-19  9:30 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel

On Wed 18-03-20 20:50:39, Amir Goldstein wrote:
> > > Pushed the work to fanotify_name branch.
> > > Let me know if you want me to post v3.
> >
> > So I went through the patches - had only minor comments for most of them.
> > Can you post the next revision by email and I'll pickup at least the
> > obvious preparatory patches to my tree. Thanks!
> >
> 
> Will do.
> Most of your comments were minor, but the last comments on
> FAN_REPORT_NAME patch send me to do some homework.
> 
> I wonder if you would like me to post only the FAN_DIR_MODIFY
> patches, which seem ready for prime time and defer the
> FAN_REPORT_NAME changes to the next merge window?

Yes, that's certainly one option. AFAIU the patches, FAN_DIR_MODIFY is
completely independent of the new feature of groups with FAN_REPORT_NAME,
isn't it?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH v2 11/16] fanotify: prepare to encode both parent and child fid's
  2020-03-19  9:30                                 ` Jan Kara
@ 2020-03-19 10:07                                   ` Amir Goldstein
  0 siblings, 0 replies; 65+ messages in thread
From: Amir Goldstein @ 2020-03-19 10:07 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

On Thu, Mar 19, 2020 at 11:30 AM Jan Kara <jack@suse.cz> wrote:
>
> On Wed 18-03-20 20:50:39, Amir Goldstein wrote:
> > > > Pushed the work to fanotify_name branch.
> > > > Let me know if you want me to post v3.
> > >
> > > So I went through the patches - had only minor comments for most of them.
> > > Can you post the next revision by email and I'll pickup at least the
> > > obvious preparatory patches to my tree. Thanks!
> > >
> >
> > Will do.
> > Most of your comments were minor, but the last comments on
> > FAN_REPORT_NAME patch send me to do some homework.
> >
> > I wonder if you would like me to post only the FAN_DIR_MODIFY
> > patches, which seem ready for prime time and defer the
> > FAN_REPORT_NAME changes to the next merge window?
>
> Yes, that's certainly one option. AFAIU the patches, the FAN_DIR_MODIFY is
> completely independent of the new feature of groups with FAN_REPORT_NAME,
> isn't it?
>

That is correct.
From a UAPI perspective, I wanted to tell the final story and show that
it looks coherent, but even the man-page draft changes are in two
separate patches:
https://github.com/amir73il/man-pages/commits/fanotify_name

So there is no problem with merging them separately.

Will post only FAN_DIR_MODIFY for v3.

Thanks,
Amir.

* Re: [PATCH v2 11/16] fanotify: prepare to encode both parent and child fid's
  2020-03-18 18:50                               ` Amir Goldstein
  2020-03-19  9:30                                 ` Jan Kara
@ 2020-03-30 19:29                                 ` Amir Goldstein
  1 sibling, 0 replies; 65+ messages in thread
From: Amir Goldstein @ 2020-03-30 19:29 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

On Wed, Mar 18, 2020 at 8:50 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> > > Pushed the work to fanotify_name branch.
> > > Let me know if you want me to post v3.
> >
> > So I went through the patches - had only minor comments for most of them.
> > Can you post the next revision by email and I'll pickup at least the
> > obvious preparatory patches to my tree. Thanks!
> >
>
> Will do.
> Most of your comments were minor, but the last comments on
> FAN_REPORT_NAME patch send me to do some homework.
>

I know this patch is for the release after next, but I was just
investigating and wanted to publish the results.
For the record, your question about the FAN_REPORT_NAME
patch was: "... this seems to be somewhat duplicating the functionality
of __fsnotify_parent(). Can't we somehow join these paths?"

I remembered that I started with this approach and moved to
taking name snapshots inside fanotify event handler for a reason,
but did not remember what it was. So I went digging back and
found that I wanted to avoid the situation where, for mount/sb
marks, events are reported in two flavors, one with name and
one without name. I ended up with something that works, but the
logic is quite hard to follow and to document.

So I decided it is best to go back to the fsnotify_parent() approach and
let the two flavors of events be reported for sb/mount marks.
I pushed the end result to branch fanotify_name and adjusted the
LTP test to expect the extra events.

I will see how that ends up looking in the man page.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 13/16] fanotify: report name info for FAN_DIR_MODIFY event
  2020-02-17 13:14 ` [PATCH v2 13/16] fanotify: report " Amir Goldstein
                     ` (2 preceding siblings ...)
  2020-02-19 11:22   ` Amir Goldstein
@ 2020-04-16 12:16   ` Michael Kerrisk (man-pages)
  2020-04-20 15:53     ` Jan Kara
  2020-04-20 18:45     ` Amir Goldstein
  3 siblings, 2 replies; 65+ messages in thread
From: Michael Kerrisk (man-pages) @ 2020-04-16 12:16 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel, Linux API, Michael Kerrisk

Hello Amir,

On Mon, 17 Feb 2020 at 15:10, Amir Goldstein <amir73il@gmail.com> wrote:
>
> Report event FAN_DIR_MODIFY with name in a variable length record similar
> to how fid's are reported.  With name info reporting implemented, setting
> FAN_DIR_MODIFY in mark mask is now allowed.

I see this was merged for 5.7. Would you be able to send a man-pages
patch that documents this new feature please.

Cheers,

Michael

>
> When events are reported with name, the reported fid identifies the
> directory and the name follows the fid. The info record type for this
> event info is FAN_EVENT_INFO_TYPE_DFID_NAME.
>
> For now, all reported events have at most one info record which is
> either FAN_EVENT_INFO_TYPE_FID or FAN_EVENT_INFO_TYPE_DFID_NAME (for
> FAN_DIR_MODIFY).  Later on, events "on child" will report both records.
>
> There are several ways that an application can use this information:
>
> 1. When watching a single directory, the name is always relative to
> the watched directory, so application need to fstatat(2) the name
> relative to the watched directory.
>
> 2. When watching a set of directories, the application could keep a map
> of dirfd for all watched directories and hash the map by fid obtained
> with name_to_handle_at(2).  When getting a name event, the fid in the
> event info could be used to lookup the base dirfd in the map and then
> call fstatat(2) with that dirfd.
>
> 3. When watching a filesystem (FAN_MARK_FILESYSTEM) or a large set of
> directories, the application could use open_by_handle_at(2) with the fid
> in event info to obtain dirfd for the directory where event happened and
> call fstatat(2) with this dirfd.
>
> The last option scales better for a large number of watched directories.
> The first two options may be available in the future also for non
> privileged fanotify watchers, because open_by_handle_at(2) requires
> the CAP_DAC_READ_SEARCH capability.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/notify/fanotify/fanotify.c      |   2 +-
>  fs/notify/fanotify/fanotify_user.c | 120 ++++++++++++++++++++++-------
>  include/linux/fanotify.h           |   3 +-
>  include/uapi/linux/fanotify.h      |   1 +
>  4 files changed, 98 insertions(+), 28 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index fc75dc53a218..b651c18d3a93 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -478,7 +478,7 @@ static int fanotify_handle_event(struct fsnotify_group *group,
>         BUILD_BUG_ON(FAN_OPEN_EXEC != FS_OPEN_EXEC);
>         BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM);
>
> -       BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 19);
> +       BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 20);
>
>         mask = fanotify_group_event_mask(group, iter_info, mask, data,
>                                          data_type);
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 284f3548bb79..a1bafc21ebbb 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -51,20 +51,32 @@ struct kmem_cache *fanotify_name_event_cachep __read_mostly;
>  struct kmem_cache *fanotify_perm_event_cachep __read_mostly;
>
>  #define FANOTIFY_EVENT_ALIGN 4
> +#define FANOTIFY_INFO_HDR_LEN \
> +       (sizeof(struct fanotify_event_info_fid) + sizeof(struct file_handle))
>
> -static int fanotify_fid_info_len(struct fanotify_fid_hdr *fh)
> +static int fanotify_fid_info_len(int fh_len, int name_len)
>  {
> -       return roundup(sizeof(struct fanotify_event_info_fid) +
> -                      sizeof(struct file_handle) + fh->len,
> -                      FANOTIFY_EVENT_ALIGN);
> +       int info_len = fh_len;
> +
> +       if (name_len)
> +               info_len += name_len + 1;
> +
> +       return roundup(FANOTIFY_INFO_HDR_LEN + info_len, FANOTIFY_EVENT_ALIGN);
>  }
>
>  static int fanotify_event_info_len(struct fanotify_event *event)
>  {
> -       if (!fanotify_event_has_fid(event))
> -               return 0;
> +       int info_len = 0;
> +
> +       if (fanotify_event_has_fid(event))
> +               info_len += fanotify_fid_info_len(event->fh.len, 0);
> +
> +       if (fanotify_event_has_dfid_name(event)) {
> +               info_len += fanotify_fid_info_len(event->dfh.len,
> +                                       fanotify_event_name_len(event));
> +       }
>
> -       return fanotify_fid_info_len(&event->fh);
> +       return info_len;
>  }
>
>  /*
> @@ -210,23 +222,34 @@ static int process_access_response(struct fsnotify_group *group,
>         return -ENOENT;
>  }
>
> -static int copy_fid_to_user(__kernel_fsid_t *fsid, struct fanotify_fid_hdr *fh,
> -                           struct fanotify_fid *fid, char __user *buf)
> +static int copy_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fid_hdr *fh,
> +                            struct fanotify_fid *fid, const struct qstr *name,
> +                            char __user *buf, size_t count)
>  {
>         struct fanotify_event_info_fid info = { };
>         struct file_handle handle = { };
> -       unsigned char bounce[FANOTIFY_INLINE_FH_LEN], *data;
> +       unsigned char bounce[max(FANOTIFY_INLINE_FH_LEN, DNAME_INLINE_LEN)];
> +       const unsigned char *data;
>         size_t fh_len = fh->len;
> -       size_t len = fanotify_fid_info_len(fh);
> +       size_t name_len = name ? name->len : 0;
> +       size_t info_len = fanotify_fid_info_len(fh_len, name_len);
> +       size_t len = info_len;
> +
> +       pr_debug("%s: fh_len=%lu name_len=%lu, info_len=%lu, count=%lu\n",
> +                __func__, fh_len, name_len, info_len, count);
>
> -       if (!len)
> +       if (!fh_len || (name && !name_len))
>                 return 0;
>
> -       if (WARN_ON_ONCE(len < sizeof(info) + sizeof(handle) + fh_len))
> +       if (WARN_ON_ONCE(len < sizeof(info) || len > count))
>                 return -EFAULT;
>
> -       /* Copy event info fid header followed by vaiable sized file handle */
> -       info.hdr.info_type = FAN_EVENT_INFO_TYPE_FID;
> +       /*
> +        * Copy event info fid header followed by vaiable sized file handle
> +        * and optionally followed by vaiable sized filename.
> +        */
> +       info.hdr.info_type = name_len ? FAN_EVENT_INFO_TYPE_DFID_NAME :
> +                                       FAN_EVENT_INFO_TYPE_FID;
>         info.hdr.len = len;
>         info.fsid = *fsid;
>         if (copy_to_user(buf, &info, sizeof(info)))
> @@ -234,6 +257,9 @@ static int copy_fid_to_user(__kernel_fsid_t *fsid, struct fanotify_fid_hdr *fh,
>
>         buf += sizeof(info);
>         len -= sizeof(info);
> +       if (WARN_ON_ONCE(len < sizeof(handle)))
> +               return -EFAULT;
> +
>         handle.handle_type = fh->type;
>         handle.handle_bytes = fh_len;
>         if (copy_to_user(buf, &handle, sizeof(handle)))
> @@ -241,9 +267,12 @@ static int copy_fid_to_user(__kernel_fsid_t *fsid, struct fanotify_fid_hdr *fh,
>
>         buf += sizeof(handle);
>         len -= sizeof(handle);
> +       if (WARN_ON_ONCE(len < fh_len))
> +               return -EFAULT;
> +
>         /*
> -        * For an inline fh, copy through stack to exclude the copy from
> -        * usercopy hardening protections.
> +        * For an inline fh and inline file name, copy through stack to exclude
> +        * the copy from usercopy hardening protections.
>          */
>         data = fanotify_fid_fh(fid, fh_len);
>         if (fh_len <= FANOTIFY_INLINE_FH_LEN) {
> @@ -253,14 +282,33 @@ static int copy_fid_to_user(__kernel_fsid_t *fsid, struct fanotify_fid_hdr *fh,
>         if (copy_to_user(buf, data, fh_len))
>                 return -EFAULT;
>
> -       /* Pad with 0's */
>         buf += fh_len;
>         len -= fh_len;
> +
> +       if (name_len) {
> +               /* Copy the filename with terminating null */
> +               name_len++;
> +               if (WARN_ON_ONCE(len < name_len))
> +                       return -EFAULT;
> +
> +               data = name->name;
> +               if (name_len <= DNAME_INLINE_LEN) {
> +                       memcpy(bounce, data, name_len);
> +                       data = bounce;
> +               }
> +               if (copy_to_user(buf, data, name_len))
> +                       return -EFAULT;
> +
> +               buf += name_len;
> +               len -= name_len;
> +       }
> +
> +       /* Pad with 0's */
>         WARN_ON_ONCE(len < 0 || len >= FANOTIFY_EVENT_ALIGN);
>         if (len > 0 && clear_user(buf, len))
>                 return -EFAULT;
>
> -       return 0;
> +       return info_len;
>  }
>
>  static ssize_t copy_event_to_user(struct fsnotify_group *group,
> @@ -282,12 +330,12 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
>         metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS;
>         metadata.pid = pid_vnr(event->pid);
>
> -       if (fanotify_event_has_path(event)) {
> +       if (FAN_GROUP_FLAG(group, FAN_REPORT_FID)) {
> +               metadata.event_len += fanotify_event_info_len(event);
> +       } else if (fanotify_event_has_path(event)) {
>                 fd = create_fd(group, event, &f);
>                 if (fd < 0)
>                         return fd;
> -       } else if (fanotify_event_has_fid(event)) {
> -               metadata.event_len += fanotify_event_info_len(event);
>         }
>         metadata.fd = fd;
>
> @@ -302,16 +350,36 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
>         if (copy_to_user(buf, &metadata, FAN_EVENT_METADATA_LEN))
>                 goto out_close_fd;
>
> +       buf += FAN_EVENT_METADATA_LEN;
> +       count -= FAN_EVENT_METADATA_LEN;
> +
>         if (fanotify_is_perm_event(event->mask))
>                 FANOTIFY_PE(fsn_event)->fd = fd;
>
> -       if (fanotify_event_has_path(event)) {
> +       if (f)
>                 fd_install(fd, f);
> -       } else if (fanotify_event_has_fid(event)) {
> -               ret = copy_fid_to_user(&event->fsid, &event->fh, &event->fid,
> -                                      buf + FAN_EVENT_METADATA_LEN);
> +
> +       /* Event info records order is: dir fid + name, child fid */
> +       if (fanotify_event_has_dfid_name(event)) {
> +               struct fanotify_name_event *fne = FANOTIFY_NE(fsn_event);
> +
> +               ret = copy_info_to_user(&event->fsid, &event->dfh, &fne->dfid,
> +                                       &fne->name, buf, count);
>                 if (ret < 0)
>                         return ret;
> +
> +               buf += ret;
> +               count -= ret;
> +       }
> +
> +       if (fanotify_event_has_fid(event)) {
> +               ret = copy_info_to_user(&event->fsid, &event->fh, &event->fid,
> +                                       NULL, buf, count);
> +               if (ret < 0)
> +                       return ret;
> +
> +               buf += ret;
> +               count -= ret;
>         }
>
>         return metadata.event_len;
> diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
> index b79fa9bb7359..3049a6c06d9e 100644
> --- a/include/linux/fanotify.h
> +++ b/include/linux/fanotify.h
> @@ -47,7 +47,8 @@
>   * Directory entry modification events - reported only to directory
>   * where entry is modified and not to a watching parent.
>   */
> -#define FANOTIFY_DIRENT_EVENTS (FAN_MOVE | FAN_CREATE | FAN_DELETE)
> +#define FANOTIFY_DIRENT_EVENTS (FAN_MOVE | FAN_CREATE | FAN_DELETE | \
> +                                FAN_DIR_MODIFY)
>
>  /* Events that can only be reported with data type FSNOTIFY_EVENT_INODE */
>  #define FANOTIFY_INODE_EVENTS  (FANOTIFY_DIRENT_EVENTS | \
> diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
> index 615fa2c87179..2b56e194b858 100644
> --- a/include/uapi/linux/fanotify.h
> +++ b/include/uapi/linux/fanotify.h
> @@ -117,6 +117,7 @@ struct fanotify_event_metadata {
>  };
>
>  #define FAN_EVENT_INFO_TYPE_FID                1
> +#define FAN_EVENT_INFO_TYPE_DFID_NAME  2
>
>  /* Variable length info record following event metadata */
>  struct fanotify_event_info_header {
> --
> 2.17.1
>


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
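
The record layout described in the quoted patch (fid header, then file
handle, then a NUL-terminated name, zero-padded to 4 bytes) can be modeled
in user space. The following is only a sketch under stated assumptions: the
struct and function names (info_fid_hdr, fh_hdr, info_record_len,
build_dfid_name, dfid_name) are hypothetical local mirrors written for this
illustration, not the UAPI definitions, and the handle_type value is an
arbitrary placeholder. Only the type value 2, the 4-byte alignment and the
length arithmetic are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the record layout, for illustration only.  A real
 * listener would use struct fanotify_event_info_fid from <linux/fanotify.h>
 * and struct file_handle from <fcntl.h>. */
struct info_fid_hdr {
	uint8_t  info_type;
	uint8_t  pad;
	uint16_t len;		/* whole record, including padding */
	int32_t  fsid[2];
};

struct fh_hdr {
	uint32_t handle_bytes;
	int32_t  handle_type;
};

#define INFO_TYPE_DFID_NAME	2	/* FAN_EVENT_INFO_TYPE_DFID_NAME in the patch */
#define INFO_ALIGN		4	/* FANOTIFY_EVENT_ALIGN in the patch */

/* Same arithmetic as fanotify_fid_info_len() in the quoted patch. */
size_t info_record_len(size_t fh_len, size_t name_len)
{
	size_t len = sizeof(struct info_fid_hdr) + sizeof(struct fh_hdr) + fh_len;

	if (name_len)
		len += name_len + 1;	/* name plus terminating NUL */
	return (len + INFO_ALIGN - 1) & ~(size_t)(INFO_ALIGN - 1);
}

/* Hypothetical helper: build a record the way the patch lays it out. */
size_t build_dfid_name(unsigned char *buf, size_t cap, const void *fh,
		       size_t fh_len, const char *name)
{
	size_t name_len = strlen(name);
	size_t len = info_record_len(fh_len, name_len);
	struct info_fid_hdr info = { .info_type = INFO_TYPE_DFID_NAME,
				     .len = (uint16_t)len };
	struct fh_hdr h = { .handle_bytes = (uint32_t)fh_len,
			    .handle_type = 1 /* placeholder */ };

	assert(name_len > 0 && len <= cap && len <= UINT16_MAX);
	memset(buf, 0, len);		/* zero padding, like clear_user() */
	memcpy(buf, &info, sizeof(info));
	memcpy(buf + sizeof(info), &h, sizeof(h));
	memcpy(buf + sizeof(info) + sizeof(h), fh, fh_len);
	memcpy(buf + sizeof(info) + sizeof(h) + fh_len, name, name_len);
	return len;
}

/* Return the entry name inside a DFID_NAME record, or NULL if malformed. */
const char *dfid_name(const unsigned char *rec, size_t avail)
{
	struct info_fid_hdr info;
	struct fh_hdr h;
	size_t off = sizeof(info) + sizeof(h);

	if (avail < off)
		return NULL;
	memcpy(&info, rec, sizeof(info));
	memcpy(&h, rec + sizeof(info), sizeof(h));
	if (info.info_type != INFO_TYPE_DFID_NAME || info.len > avail ||
	    info.len < off || h.handle_bytes >= info.len - off)
		return NULL;
	off += h.handle_bytes;
	if (!memchr(rec + off, '\0', info.len - off))
		return NULL;		/* name not terminated inside record */
	return (const char *)(rec + off);
}
```

With the name recovered this way, a listener would pass it to fstatat(2)
relative to a directory fd obtained by one of the three methods listed in
the patch description.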

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 13/16] fanotify: report name info for FAN_DIR_MODIFY event
  2020-04-16 12:16   ` Michael Kerrisk (man-pages)
@ 2020-04-20 15:53     ` Jan Kara
  2020-04-20 18:45     ` Amir Goldstein
  1 sibling, 0 replies; 65+ messages in thread
From: Jan Kara @ 2020-04-20 15:53 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Amir Goldstein, Jan Kara, linux-fsdevel, Linux API

On Thu 16-04-20 14:16:40, Michael Kerrisk (man-pages) wrote:
> Hello Amir,
> 
> On Mon, 17 Feb 2020 at 15:10, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > Report event FAN_DIR_MODIFY with name in a variable length record similar
> > to how fid's are reported.  With name info reporting implemented, setting
> > FAN_DIR_MODIFY in mark mask is now allowed.
> 
> I see this was merged for 5.7. Would you be able to send a man-pages
> patch that documents this new feature please.

I know that Amir has the manpage ready. I'm not sure when he plans to
submit it.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 13/16] fanotify: report name info for FAN_DIR_MODIFY event
  2020-04-16 12:16   ` Michael Kerrisk (man-pages)
  2020-04-20 15:53     ` Jan Kara
@ 2020-04-20 18:45     ` Amir Goldstein
  2020-04-20 18:47       ` Michael Kerrisk (man-pages)
  1 sibling, 1 reply; 65+ messages in thread
From: Amir Goldstein @ 2020-04-20 18:45 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages); +Cc: Jan Kara, linux-fsdevel, Linux API

On Thu, Apr 16, 2020 at 3:16 PM Michael Kerrisk (man-pages)
<mtk.manpages@gmail.com> wrote:
>
> Hello Amir,
>
> On Mon, 17 Feb 2020 at 15:10, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > Report event FAN_DIR_MODIFY with name in a variable length record similar
> > to how fid's are reported.  With name info reporting implemented, setting
> > FAN_DIR_MODIFY in mark mask is now allowed.
>
> I see this was merged for 5.7. Would you be able to send a man-pages
> patch that documents this new feature please.
>

Sorry, I missed your email.
Just posted the patches.
I never know when in the development cycle you expect to get the man page patches...

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 13/16] fanotify: report name info for FAN_DIR_MODIFY event
  2020-04-20 18:45     ` Amir Goldstein
@ 2020-04-20 18:47       ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 65+ messages in thread
From: Michael Kerrisk (man-pages) @ 2020-04-20 18:47 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel, Linux API

On Mon, 20 Apr 2020 at 20:45, Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Apr 16, 2020 at 3:16 PM Michael Kerrisk (man-pages)
> <mtk.manpages@gmail.com> wrote:
> >
> > Hello Amir,
> >
> > On Mon, 17 Feb 2020 at 15:10, Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > Report event FAN_DIR_MODIFY with name in a variable length record similar
> > > to how fid's are reported.  With name info reporting implemented, setting
> > > FAN_DIR_MODIFY in mark mask is now allowed.
> >
> > I see this was merged for 5.7. Would you be able to send a man-pages
> > patch that documents this new feature please.
> >
>
> Sorry, I missed your email.
> Just posted the patches.
> I never know when in development cycle you expect to get the man page patches...

Ideally, the manual page patches are posted in parallel with the
kernel patches, with a note saying that the feature is not yet merged.

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: fanotify_merge improvements
  2020-02-26 14:38       ` Jan Kara
@ 2021-01-22 13:59         ` Amir Goldstein
  2021-01-23 13:30           ` Amir Goldstein
  0 siblings, 1 reply; 65+ messages in thread
From: Amir Goldstein @ 2021-01-22 13:59 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

> > > Hum, now thinking about this, maybe we could clean this up even a bit more.
> > > event->inode is currently used only by inotify and fanotify for merging
> > > purposes. Now inotify could use its 'wd' instead of inode with exactly the
> > > same results, fanotify path or fid check is at least as strong as the inode
> > > check. So only for the case of pure "inode" events, we need to store inode
> > > identifier in struct fanotify_event - and we can do that in the union with
> > > struct path and completely remove the 'inode' member from fsnotify_event.
> > > Am I missing something?
> >
> > That generally sounds good and I did notice it is strange that wd is not
> > being compared.  However, I think I was worried that comparing fid+name
> > (in following patches) would be more expensive than comparing dentry (or
> > object inode) as a "rule out first" in merge, so I preferred to keep the
> > tag/dentry/id comparison for fanotify_fid case.
>
> Yes, that could be a concern.
>
> > Given this analysis (and assuming it is correct), would you like me to
> > just go a head with the change suggested above? or anything beyond that?
>
> Let's go just with the change suggested above for now. We can work on this
> later (probably with optimizing of the fanotify merging code).
>

Hi Jan,

Recap:
- fanotify_merge is very inefficient and uses extensive CPU if the queue
  contains many events, so it is rather easy for a poorly written listener
  to cripple the system
- You had an idea to store in event->objectid a hash of all the compared
  fields (e.g. fid+name)
- I think you had an idea to keep a hash table of events in the queue to
  find the merge candidates faster
- For internal uses, I carry a patch that limits the linear search to the
  last 128 events, which is enough to relieve the CPU overuse in case of
  unattended long queues

I tried looking into implementing the hash table idea, assuming I understood you
correctly and I struggled to choose appropriate table sizes. It seemed to make
sense to use a global hash table, such as inode/dentry cache for all the groups
but that would add complexity to locking rules of queue/dequeue and
group cleanup.

A simpler solution I considered, similar to my 128 events limit patch, is
to limit the linear search to events queued in the last X seconds.
The rationale is that event merging is not supposed to be long term at all.
If a listener fails to perform read from the queue, it is not fsnotify's job to
try and keep the queue compact. I think merging events mechanism was
mainly meant to merge short bursts of events on objects, which are quite
common and surely can happen concurrently on several objects.

My intuition is that making event->objectid into event->hash in addition
to limiting the age of events to merge would address the real life workloads.
One question if we do choose this approach is what should the age limit be?
Should it be configurable? Default to infinity and let distro cap the age or
provide a sane default by kernel while slightly changing behavior (yes please).

What are your thoughts about this?
Do you have a better idea maybe?

Thanks,
Amir.
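
The 128-event search limit mentioned in the recap above can be illustrated
with a small user-space model. This is only a sketch and not the patch
referred to in the message: the struct names, the fixed-size array queue and
the single `key` field (standing in for whatever fields the real merge code
compares) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_CAP		1024	/* arbitrary size for the model */
#define MERGE_SCAN_LIMIT	128	/* look back at most this many events */

struct event {
	uint32_t key;	/* stand-in for the fields compared on merge */
	uint32_t mask;	/* event bits, OR-ed together when merging */
};

struct queue {
	struct event ev[QUEUE_CAP];
	size_t len;
};

/*
 * Queue an event, merging it into a matching event if one is found
 * among the MERGE_SCAN_LIMIT most recently queued events.
 * Returns 1 if merged, 0 if appended, -1 if the queue is full.
 */
int queue_event(struct queue *q, uint32_t key, uint32_t mask)
{
	size_t scanned = 0;

	assert(q != NULL);
	/* Walk backwards from the tail, newest event first. */
	for (size_t i = q->len; i > 0 && scanned < MERGE_SCAN_LIMIT;
	     i--, scanned++) {
		if (q->ev[i - 1].key == key) {
			q->ev[i - 1].mask |= mask;
			return 1;
		}
	}
	if (q->len == QUEUE_CAP)
		return -1;
	q->ev[q->len].key = key;
	q->ev[q->len].mask = mask;
	q->len++;
	return 0;
}
```

The cost of queueing one event is bounded by MERGE_SCAN_LIMIT regardless of
queue length; the trade-off is that an event older than the window is no
longer a merge candidate and a duplicate gets appended instead.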

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: fanotify_merge improvements
  2021-01-22 13:59         ` fanotify_merge improvements Amir Goldstein
@ 2021-01-23 13:30           ` Amir Goldstein
  2021-01-25 13:01             ` Jan Kara
  0 siblings, 1 reply; 65+ messages in thread
From: Amir Goldstein @ 2021-01-23 13:30 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

On Fri, Jan 22, 2021 at 3:59 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> > > > Hum, now thinking about this, maybe we could clean this up even a bit more.
> > > > event->inode is currently used only by inotify and fanotify for merging
> > > > purposes. Now inotify could use its 'wd' instead of inode with exactly the
> > > > same results, fanotify path or fid check is at least as strong as the inode
> > > > check. So only for the case of pure "inode" events, we need to store inode
> > > > identifier in struct fanotify_event - and we can do that in the union with
> > > > struct path and completely remove the 'inode' member from fsnotify_event.
> > > > Am I missing something?
> > >
> > > That generally sounds good and I did notice it is strange that wd is not
> > > being compared.  However, I think I was worried that comparing fid+name
> > > (in following patches) would be more expensive than comparing dentry (or
> > > object inode) as a "rule out first" in merge, so I preferred to keep the
> > > tag/dentry/id comparison for fanotify_fid case.
> >
> > Yes, that could be a concern.
> >
> > > Given this analysis (and assuming it is correct), would you like me to
> > > just go a head with the change suggested above? or anything beyond that?
> >
> > Let's go just with the change suggested above for now. We can work on this
> > later (probably with optimizing of the fanotify merging code).
> >
>
> Hi Jan,
>
> Recap:
> - fanotify_merge is very inefficient and uses extensive CPU if queue contains
>   many events, so it is rather easy for a poorly written listener to
> cripple the system
> - You had an idea to store in event->objectid a hash of all the compared
>   fields (e.g. fid+name)
> - I think you had an idea to keep a hash table of events in the queue
> to find the
>   merge candidates faster
> - For internal uses, I carry a patch that limits the linear search for
> last 128 events
>   which is enough to relieve the CPU overuse in case of unattended long queues
>
> I tried looking into implementing the hash table idea, assuming I understood you
> correctly and I struggled to choose appropriate table sizes. It seemed to make
> sense to use a global hash table, such as inode/dentry cache for all the groups
> but that would add complexity to locking rules of queue/dequeue and
> group cleanup.
>
> A simpler solution I considered, similar to my 128 events limit patch,
> is to limit
> the linear search to events queued in the last X seconds.
> The rationale is that event merging is not supposed to be long term at all.
> If a listener fails to perform read from the queue, it is not fsnotify's job to
> try and keep the queue compact. I think merging events mechanism was
> mainly meant to merge short bursts of events on objects, which are quite
> common and surely can happen concurrently on several objects.
>
> My intuition is that making event->objectid into event->hash in addition
> to limiting the age of events to merge would address the real life workloads.
> One question if we do choose this approach is what should the age limit be?
> Should it be configurable? Default to infinity and let distro cap the age or
> provide a sane default by kernel while slightly changing behavior (yes please).
>
> What are your thoughts about this?

Aha! found it:
https://lore.kernel.org/linux-fsdevel/20200227112755.GZ10728@quack2.suse.cz/
You suggested a small hash table per group (128 slots).

My intuition is that this will not be good enough for the worst case, which is
not that hard to hit in real life:
1. Listener sets FAN_UNLIMITED_QUEUE
2. Listener adds a FAN_MARK_FILESYSTEM watch
3. Many thousands of events are queued
4. Listener lingers (due to bad implementation?) in reading events
5. Every single event now incurs a huge fanotify_merge() cost

Reducing the cost of merge from O(N) to O(N/128) doesn't really fix the problem.

> Do you have a better idea maybe?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: fanotify_merge improvements
  2021-01-23 13:30           ` Amir Goldstein
@ 2021-01-25 13:01             ` Jan Kara
  2021-01-26 16:21               ` Amir Goldstein
  0 siblings, 1 reply; 65+ messages in thread
From: Jan Kara @ 2021-01-25 13:01 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel

On Sat 23-01-21 15:30:59, Amir Goldstein wrote:
> On Fri, Jan 22, 2021 at 3:59 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > > > > Hum, now thinking about this, maybe we could clean this up even a bit more.
> > > > > event->inode is currently used only by inotify and fanotify for merging
> > > > > purposes. Now inotify could use its 'wd' instead of inode with exactly the
> > > > > same results, fanotify path or fid check is at least as strong as the inode
> > > > > check. So only for the case of pure "inode" events, we need to store inode
> > > > > identifier in struct fanotify_event - and we can do that in the union with
> > > > > struct path and completely remove the 'inode' member from fsnotify_event.
> > > > > Am I missing something?
> > > >
> > > > That generally sounds good and I did notice it is strange that wd is not
> > > > being compared.  However, I think I was worried that comparing fid+name
> > > > (in following patches) would be more expensive than comparing dentry (or
> > > > object inode) as a "rule out first" in merge, so I preferred to keep the
> > > > tag/dentry/id comparison for fanotify_fid case.
> > >
> > > Yes, that could be a concern.
> > >
> > > > Given this analysis (and assuming it is correct), would you like me to
> > > > just go a head with the change suggested above? or anything beyond that?
> > >
> > > Let's go just with the change suggested above for now. We can work on this
> > > later (probably with optimizing of the fanotify merging code).
> > >
> >
> > Hi Jan,
> >
> > Recap:
> > - fanotify_merge is very inefficient and uses extensive CPU if queue contains
> >   many events, so it is rather easy for a poorly written listener to
> > cripple the system
> > - You had an idea to store in event->objectid a hash of all the compared
> >   fields (e.g. fid+name)
> > - I think you had an idea to keep a hash table of events in the queue
> > to find the
> >   merge candidates faster
> > - For internal uses, I carry a patch that limits the linear search for
> > last 128 events
> >   which is enough to relieve the CPU overuse in case of unattended long queues
> >
> > I tried looking into implementing the hash table idea, assuming I understood you
> > correctly and I struggled to choose appropriate table sizes. It seemed to make
> > sense to use a global hash table, such as inode/dentry cache for all the groups
> > but that would add complexity to locking rules of queue/dequeue and
> > group cleanup.
> >
> > A simpler solution I considered, similar to my 128 events limit patch,
> > is to limit
> > the linear search to events queued in the last X seconds.
> > The rationale is that event merging is not supposed to be long term at all.
> > If a listener fails to perform read from the queue, it is not fsnotify's job to
> > try and keep the queue compact. I think merging events mechanism was
> > mainly meant to merge short bursts of events on objects, which are quite
> > common and surely can happen concurrently on several objects.
> >
> > My intuition is that making event->objectid into event->hash in addition
> > to limiting the age of events to merge would address the real life workloads.
> > One question if we do choose this approach is what should the age limit be?
> > Should it be configurable? Default to infinity and let distro cap the age or
> > provide a sane default by kernel while slightly changing behavior (yes please).
> >
> > What are your thoughts about this?
> 
> Aha! found it:
> https://lore.kernel.org/linux-fsdevel/20200227112755.GZ10728@quack2.suse.cz/
> You suggested a small hash table per group (128 slots).
> 
> My intuition is that this will not be good enough for the worst case, which is
> not that hard to hit is real life:
> 1. Listener sets FAN_UNLIMITED_QUEUE
> 2. Listener adds a FAN_MARK_FILESYSTEM watch
> 3. Many thousands of events are queued
> 4. Listener lingers (due to bad implementation?) in reading events
> 5. Every single event now incurs a huge fanotify_merge() cost
> 
> Reducing the cost of merge from O(N) to O(N/128) doesn't really fix the
> problem.

So my thought was that indeed reducing the overhead of merging by a factor
of 128 should be enough for any practical case as much as I agree that in
principle the computational complexity remains the same. And I've picked
per-group hash table to avoid interferences among notification groups and
to keep locking simple. That being said I'm not opposed to combining this
with a limit on the number of elements traversed in a hash chain (e.g.
those 128 you use yourself) - it will be naturally ordered by queue order
if we are a bit careful. This will provide efficient and effective merging
for ~8k queued events which seems enough to me. I find time based limits
not really worth it. Yes, they provide more predictable behavior but less
predictable runtime and overall I don't find the complexity worth the
benefit.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
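
The combination discussed above, a per-group hash table of 128 buckets with
a cap of 128 on the number of chain elements visited, might look roughly
like the following user-space model. It is a sketch under assumptions, not
kernel code: the struct and function names are hypothetical, the hash is
assumed to be precomputed over the compared fields, events come from a
fixed pool, and dequeueing is left out. Inserting at the head keeps each
chain ordered newest first, so the cap discards the oldest candidates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MERGE_HASH_BITS		7
#define MERGE_HASH_SIZE		(1u << MERGE_HASH_BITS)	/* 128 buckets */
#define MERGE_CHAIN_LIMIT	128	/* max chain elements visited */
#define POOL_SIZE		4096	/* arbitrary size for the model */

struct event {
	uint32_t hash;		/* hash of the fields compared on merge */
	uint32_t mask;		/* event bits, OR-ed together when merging */
	struct event *next;	/* next (older) event in the same bucket */
};

struct group {
	struct event *bucket[MERGE_HASH_SIZE];
	struct event pool[POOL_SIZE];
	size_t used;
};

/*
 * Queue an event in the group, merging into a matching event if one is
 * found within the first MERGE_CHAIN_LIMIT elements of its bucket.
 * Returns 1 if merged, 0 if a new event was queued, -1 if out of events.
 */
int group_queue_event(struct group *g, uint32_t hash, uint32_t mask)
{
	struct event **head;
	struct event *e;
	unsigned int depth = 0;

	assert(g != NULL);
	head = &g->bucket[hash & (MERGE_HASH_SIZE - 1)];
	for (e = *head; e && depth < MERGE_CHAIN_LIMIT; e = e->next, depth++) {
		if (e->hash == hash) {
			e->mask |= mask;
			return 1;
		}
	}
	if (g->used == POOL_SIZE)
		return -1;
	e = &g->pool[g->used++];
	e->hash = hash;
	e->mask = mask;
	e->next = *head;	/* head insertion: newest first */
	*head = e;
	return 0;
}
```

Because each group owns its table, groups do not interfere with one another
and no locking beyond what already protects the group's queue would be
needed in this model.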

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: fanotify_merge improvements
  2021-01-25 13:01             ` Jan Kara
@ 2021-01-26 16:21               ` Amir Goldstein
  2021-01-27 11:24                 ` Jan Kara
  0 siblings, 1 reply; 65+ messages in thread
From: Amir Goldstein @ 2021-01-26 16:21 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

On Mon, Jan 25, 2021 at 3:01 PM Jan Kara <jack@suse.cz> wrote:
>
> On Sat 23-01-21 15:30:59, Amir Goldstein wrote:
> > On Fri, Jan 22, 2021 at 3:59 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > > > > Hum, now thinking about this, maybe we could clean this up even a bit more.
> > > > > > event->inode is currently used only by inotify and fanotify for merging
> > > > > > purposes. Now inotify could use its 'wd' instead of inode with exactly the
> > > > > > same results, fanotify path or fid check is at least as strong as the inode
> > > > > > check. So only for the case of pure "inode" events, we need to store inode
> > > > > > identifier in struct fanotify_event - and we can do that in the union with
> > > > > > struct path and completely remove the 'inode' member from fsnotify_event.
> > > > > > Am I missing something?
> > > > >
> > > > > That generally sounds good and I did notice it is strange that wd is not
> > > > > being compared.  However, I think I was worried that comparing fid+name
> > > > > (in following patches) would be more expensive than comparing dentry (or
> > > > > object inode) as a "rule out first" in merge, so I preferred to keep the
> > > > > tag/dentry/id comparison for fanotify_fid case.
> > > >
> > > > Yes, that could be a concern.
> > > >
> > > > > Given this analysis (and assuming it is correct), would you like me to
> > > > > just go ahead with the change suggested above? or anything beyond that?
> > > >
> > > > Let's go just with the change suggested above for now. We can work on this
> > > > later (probably with optimizing of the fanotify merging code).
> > > >
> > >
> > > Hi Jan,
> > >
> > > Recap:
> > > - fanotify_merge is very inefficient and uses extensive CPU if queue contains
> > >   many events, so it is rather easy for a poorly written listener to
> > > cripple the system
> > > - You had an idea to store in event->objectid a hash of all the compared
> > >   fields (e.g. fid+name)
> > > - I think you had an idea to keep a hash table of events in the queue
> > > to find the
> > >   merge candidates faster
> > > - For internal uses, I carry a patch that limits the linear search for
> > > last 128 events
> > >   which is enough to relieve the CPU overuse in case of unattended long queues
> > >
> > > I tried looking into implementing the hash table idea, assuming I understood you
> > > correctly and I struggled to choose appropriate table sizes. It seemed to make
> > > sense to use a global hash table, such as inode/dentry cache for all the groups
> > > but that would add complexity to locking rules of queue/dequeue and
> > > group cleanup.
> > >
> > > A simpler solution I considered, similar to my 128 events limit patch,
> > > is to limit
> > > the linear search to events queued in the last X seconds.
> > > The rationale is that event merging is not supposed to be long term at all.
> > > If a listener fails to perform read from the queue, it is not fsnotify's job to
> > > try and keep the queue compact. I think merging events mechanism was
> > > mainly meant to merge short bursts of events on objects, which are quite
> > > common and surely can happen concurrently on several objects.
> > >
> > > My intuition is that making event->objectid into event->hash in addition
> > > to limiting the age of events to merge would address the real life workloads.
> > > One question if we do choose this approach is what should the age limit be?
> > > Should it be configurable? Default to infinity and let distro cap the age or
> > > provide a sane default by kernel while slightly changing behavior (yes please).
> > >
> > > What are your thoughts about this?
> >
> > Aha! found it:
> > https://lore.kernel.org/linux-fsdevel/20200227112755.GZ10728@quack2.suse.cz/
> > You suggested a small hash table per group (128 slots).
> >
> > My intuition is that this will not be good enough for the worst case, which is
> > not that hard to hit in real life:
> > 1. Listener sets FAN_UNLIMITED_QUEUE
> > 2. Listener adds a FAN_MARK_FILESYSTEM watch
> > 3. Many thousands of events are queued
> > 4. Listener lingers (due to bad implementation?) in reading events
> > 5. Every single event now incurs a huge fanotify_merge() cost
> >
> > Reducing the cost of merge from O(N) to O(N/128) doesn't really fix the
> > problem.
>
> So my thought was that indeed reducing the overhead of merging by a factor
> of 128 should be enough for any practical case as much as I agree that in
> principle the computational complexity remains the same. And I've picked
> per-group hash table to avoid interferences among notification groups and
> to keep locking simple. That being said I'm not opposed to combining this
> with a limit on the number of elements traversed in a hash chain (e.g.
> those 128 you use yourself) - it will be naturally ordered by queue order
> if we are a bit careful. This will provide efficient and effective merging
> for ~8k queued events which seems enough to me. I find time based limits
> not really worth it. Yes, they provide more predictable behavior but less
> predictable runtime and overall I don't find the complexity worth the
> benefit.
>

Sounds reasonable.
If you have time, please take a look at this WIP branch:
https://github.com/amir73il/linux/commits/fanotify_merge
and let me know if you like the direction it is taking.

This branch is only compile tested, but I am asking w.r.t. the chosen
data structures.
So far it is just an array of queues selected by (yet unmodified) objectid.
Reading is just from any available queue.

My goal was to avoid having to hang the event on multiple list/hlist and
the idea is to implement read by order of events as follows:

- With multi queue, the high bits of objectid will be masked for merge compare.
- Instead, they will be used to store the next_qid to read from

For example:
- event #1 is added to queue 6
- set group->last_qid = 6
- set group->next_qid = 6 (because group->num_events == 1)
- event #2 is added to queue 13
- the next_qid bits of the last event in last_qid (6) queue are set to 13
- set group->last_qid = 13

- read() checks value of group->next_qid and reads the first event
from queue 6 (event #1)
- event #1 has 13 stored in next_qid bits so set group->next_qid = 13
- read() reads first event from queue 13 (event #2)
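
The walkthrough above can be sketched as follows. This is only an
illustration under stated assumptions: struct event, struct group,
group_add() and group_read() are hypothetical names, NUM_QUEUES is
arbitrary, and next_qid is kept in its own field rather than in the
high bits of objectid to keep the sketch readable. Merging and
permission events are left out.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_QUEUES 16  /* arbitrary for the sketch */

/* Hypothetical, simplified event. */
struct event {
	int id;
	unsigned int next_qid;  /* queue holding the event inserted right after this one */
	struct event *next;     /* FIFO link within one queue */
};

struct queue {
	struct event *head, *tail;
};

struct group {
	struct queue q[NUM_QUEUES];
	unsigned int num_events;
	unsigned int next_qid;  /* queue to read from next */
	unsigned int last_qid;  /* queue that received the most recent event */
};

static void group_add(struct group *g, struct event *ev, unsigned int qid)
{
	struct queue *q;

	assert(qid < NUM_QUEUES);
	q = &g->q[qid];
	ev->next = NULL;
	ev->next_qid = qid;

	if (g->num_events == 0)
		g->next_qid = qid;  /* first event: reader starts here */
	else
		g->q[g->last_qid].tail->next_qid = qid;  /* chain previous event to us */

	if (q->tail)
		q->tail->next = ev;
	else
		q->head = ev;
	q->tail = ev;

	g->last_qid = qid;
	g->num_events++;
}

/* Returns the oldest queued event, or NULL if the group is empty. */
static struct event *group_read(struct group *g)
{
	struct queue *q;
	struct event *ev;

	if (g->num_events == 0)
		return NULL;

	q = &g->q[g->next_qid];
	ev = q->head;
	q->head = ev->next;
	if (!q->head)
		q->tail = NULL;

	g->next_qid = ev->next_qid;  /* follow the insert-order chain */
	g->num_events--;
	return ev;
}
```

Events are always removed from the head of the queue named by
group->next_qid, which is why removing an event from the middle of the
chain needs extra care in this scheme.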

Permission events require special care, but that is the idea of a simple singly
linked list using qid's for reading events by insert order and merging by hashed
queue.

The advantage of this method is that most of the generic code remains unaware
of the multi queue changes (see WIP) and I think it gets the job done without
complicating the code too much.

Thoughts?

Thanks,
Amir.

* Re: fanotify_merge improvements
  2021-01-26 16:21               ` Amir Goldstein
@ 2021-01-27 11:24                 ` Jan Kara
  2021-01-27 12:57                   ` Amir Goldstein
  0 siblings, 1 reply; 65+ messages in thread
From: Jan Kara @ 2021-01-27 11:24 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel

On Tue 26-01-21 18:21:26, Amir Goldstein wrote:
> On Mon, Jan 25, 2021 at 3:01 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Sat 23-01-21 15:30:59, Amir Goldstein wrote:
> > > On Fri, Jan 22, 2021 at 3:59 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > >
> > > > > > > Hum, now thinking about this, maybe we could clean this up even a bit more.
> > > > > > > event->inode is currently used only by inotify and fanotify for merging
> > > > > > > purposes. Now inotify could use its 'wd' instead of inode with exactly the
> > > > > > > same results, fanotify path or fid check is at least as strong as the inode
> > > > > > > check. So only for the case of pure "inode" events, we need to store inode
> > > > > > > identifier in struct fanotify_event - and we can do that in the union with
> > > > > > > struct path and completely remove the 'inode' member from fsnotify_event.
> > > > > > > Am I missing something?
> > > > > >
> > > > > > That generally sounds good and I did notice it is strange that wd is not
> > > > > > being compared.  However, I think I was worried that comparing fid+name
> > > > > > (in following patches) would be more expensive than comparing dentry (or
> > > > > > object inode) as a "rule out first" in merge, so I preferred to keep the
> > > > > > tag/dentry/id comparison for fanotify_fid case.
> > > > >
> > > > > Yes, that could be a concern.
> > > > >
> > > > > > Given this analysis (and assuming it is correct), would you like me to
> > > > > > just go ahead with the change suggested above? or anything beyond that?
> > > > >
> > > > > Let's go just with the change suggested above for now. We can work on this
> > > > > later (probably with optimizing of the fanotify merging code).
> > > > >
> > > >
> > > > Hi Jan,
> > > >
> > > > Recap:
> > > > - fanotify_merge is very inefficient and uses extensive CPU if queue contains
> > > >   many events, so it is rather easy for a poorly written listener to
> > > > cripple the system
> > > > - You had an idea to store in event->objectid a hash of all the compared
> > > >   fields (e.g. fid+name)
> > > > - I think you had an idea to keep a hash table of events in the queue
> > > > to find the
> > > >   merge candidates faster
> > > > - For internal uses, I carry a patch that limits the linear search for
> > > > last 128 events
> > > >   which is enough to relieve the CPU overuse in case of unattended long queues
> > > >
> > > > I tried looking into implementing the hash table idea, assuming I understood you
> > > > correctly and I struggled to choose appropriate table sizes. It seemed to make
> > > > sense to use a global hash table, such as inode/dentry cache for all the groups
> > > > but that would add complexity to locking rules of queue/dequeue and
> > > > group cleanup.
> > > >
> > > > A simpler solution I considered, similar to my 128 events limit patch,
> > > > is to limit
> > > > the linear search to events queued in the last X seconds.
> > > > The rationale is that event merging is not supposed to be long term at all.
> > > > If a listener fails to perform read from the queue, it is not fsnotify's job to
> > > > try and keep the queue compact. I think merging events mechanism was
> > > > mainly meant to merge short bursts of events on objects, which are quite
> > > > common and surely can happen concurrently on several objects.
> > > >
> > > > My intuition is that making event->objectid into event->hash in addition
> > > > to limiting the age of events to merge would address the real life workloads.
> > > > One question if we do choose this approach is what should the age limit be?
> > > > Should it be configurable? Default to infinity and let distro cap the age or
> > > > provide a sane default by kernel while slightly changing behavior (yes please).
> > > >
> > > > What are your thoughts about this?
> > >
> > > Aha! found it:
> > > https://lore.kernel.org/linux-fsdevel/20200227112755.GZ10728@quack2.suse.cz/
> > > You suggested a small hash table per group (128 slots).
> > >
> > > My intuition is that this will not be good enough for the worst case, which is
> > > not that hard to hit in real life:
> > > 1. Listener sets FAN_UNLIMITED_QUEUE
> > > 2. Listener adds a FAN_MARK_FILESYSTEM watch
> > > 3. Many thousands of events are queued
> > > 4. Listener lingers (due to bad implementation?) in reading events
> > > 5. Every single event now incurs a huge fanotify_merge() cost
> > >
> > > Reducing the cost of merge from O(N) to O(N/128) doesn't really fix the
> > > problem.
> >
> > So my thought was that indeed reducing the overhead of merging by a factor
> > of 128 should be enough for any practical case as much as I agree that in
> > principle the computational complexity remains the same. And I've picked
> > per-group hash table to avoid interferences among notification groups and
> > to keep locking simple. That being said I'm not opposed to combining this
> > with a limit on the number of elements traversed in a hash chain (e.g.
> > those 128 you use yourself) - it will be naturally ordered by queue order
> > if we are a bit careful. This will provide efficient and effective merging
> > for ~8k queued events which seems enough to me. I find time based limits
> > not really worth it. Yes, they provide more predictable behavior but less
> > predictable runtime and overall I don't find the complexity worth the
> > benefit.
> >
> 
> Sounds reasonable.
> If you have time, please take a look at this WIP branch:
> https://github.com/amir73il/linux/commits/fanotify_merge
> and let me know if you like the direction it is taking.
> 
> This branch is only compile tested, but I am asking w.r.t. the chosen
> data structures.  So far it is just an array of queues selected by (yet
> unmodified) objectid.  Reading is just from any available queue.
> 
> My goal was to avoid having to hang the event on multiple list/hlist and
> the idea is to implement read by order of events as follows:

As a side note, since we use notification_list as a strict queue, we could
actually use a singly linked list for linking all the events (implemented
in include/linux/llist.h). That way we can save one pointer in
fsnotify_event if we wish without too much complication AFAICT. But I'm not
sure we really care.

> - With multi queue, the high bits of objectid will be masked for merge compare.
> - Instead, they will be used to store the next_qid to read from
> 
> For example:
> - event #1 is added to queue 6
> - set group->last_qid = 6
> - set group->next_qid = 6 (because group->num_events == 1)
> - event #2 is added to queue 13
> - the next_qid bits of the last event in last_qid (6) queue are set to 13
> - set group->last_qid = 13
>
> - read() checks value of group->next_qid and reads the first event
> from queue 6 (event #1)
> - event #1 has 13 stored in next_qid bits so set group->next_qid = 13
> - read() reads first event from queue 13 (event #2)

That's an interesting idea. I like it and I think it would work. Just
instead of masking, I'd use bitfields. Or we could just restrict objectid
to 32-bits and use remaining 32-bits for the next_qid pointer. I know it
will waste some bits but 32-bits of objectid should provide us with enough
space to avoid doing full event comparison in most cases - BTW WRT naming I
find 'qid' somewhat confusing. Can we call it say 'next_bucket' or
something like that?
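
For illustration, the 32/32 split suggested above could look like the
sketch below. struct event_key and keys_may_merge() are hypothetical
names, not existing kernel definitions; the point is only that two
plain 32-bit fields avoid any masking when comparing events.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: a 64-bit slot split into two plain 32-bit fields. */
struct event_key {
	uint32_t objectid;     /* 32-bit hash used for the merge comparison */
	uint32_t next_bucket;  /* bucket holding the next event in insert order */
};

/* The pair occupies the same space as a single 64-bit objectid. */
static_assert(sizeof(struct event_key) == sizeof(uint64_t),
	      "event_key must fit in 64 bits");

/*
 * Cheap "rule out first" check: only the hash is compared, the link
 * field is ignored, so no masking is needed. A match would still have
 * to be confirmed by a full event comparison.
 */
static int keys_may_merge(const struct event_key *a,
			  const struct event_key *b)
{
	return a->objectid == b->objectid;
}
```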

> Permission events require special care, but that is the idea of a simple
> singly linked list using qid's for reading events by insert order and
> merging by hashed queue.

Why are permission events special in this regard?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: fanotify_merge improvements
  2021-01-27 11:24                 ` Jan Kara
@ 2021-01-27 12:57                   ` Amir Goldstein
  2021-01-27 15:15                     ` Jan Kara
  0 siblings, 1 reply; 65+ messages in thread
From: Amir Goldstein @ 2021-01-27 12:57 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

On Wed, Jan 27, 2021 at 1:24 PM Jan Kara <jack@suse.cz> wrote:
>
> On Tue 26-01-21 18:21:26, Amir Goldstein wrote:
> > On Mon, Jan 25, 2021 at 3:01 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > On Sat 23-01-21 15:30:59, Amir Goldstein wrote:
> > > > On Fri, Jan 22, 2021 at 3:59 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > >
> > > > > > > > Hum, now thinking about this, maybe we could clean this up even a bit more.
> > > > > > > > event->inode is currently used only by inotify and fanotify for merging
> > > > > > > > purposes. Now inotify could use its 'wd' instead of inode with exactly the
> > > > > > > > same results, fanotify path or fid check is at least as strong as the inode
> > > > > > > > check. So only for the case of pure "inode" events, we need to store inode
> > > > > > > > identifier in struct fanotify_event - and we can do that in the union with
> > > > > > > > struct path and completely remove the 'inode' member from fsnotify_event.
> > > > > > > > Am I missing something?
> > > > > > >
> > > > > > > That generally sounds good and I did notice it is strange that wd is not
> > > > > > > being compared.  However, I think I was worried that comparing fid+name
> > > > > > > (in following patches) would be more expensive than comparing dentry (or
> > > > > > > object inode) as a "rule out first" in merge, so I preferred to keep the
> > > > > > > tag/dentry/id comparison for fanotify_fid case.
> > > > > >
> > > > > > Yes, that could be a concern.
> > > > > >
> > > > > > > Given this analysis (and assuming it is correct), would you like me to
> > > > > > > just go ahead with the change suggested above? or anything beyond that?
> > > > > >
> > > > > > Let's go just with the change suggested above for now. We can work on this
> > > > > > later (probably with optimizing of the fanotify merging code).
> > > > > >
> > > > >
> > > > > Hi Jan,
> > > > >
> > > > > Recap:
> > > > > - fanotify_merge is very inefficient and uses extensive CPU if queue contains
> > > > >   many events, so it is rather easy for a poorly written listener to
> > > > > cripple the system
> > > > > - You had an idea to store in event->objectid a hash of all the compared
> > > > >   fields (e.g. fid+name)
> > > > > - I think you had an idea to keep a hash table of events in the queue
> > > > > to find the
> > > > >   merge candidates faster
> > > > > - For internal uses, I carry a patch that limits the linear search for
> > > > > last 128 events
> > > > >   which is enough to relieve the CPU overuse in case of unattended long queues
> > > > >
> > > > > I tried looking into implementing the hash table idea, assuming I understood you
> > > > > correctly and I struggled to choose appropriate table sizes. It seemed to make
> > > > > sense to use a global hash table, such as inode/dentry cache for all the groups
> > > > > but that would add complexity to locking rules of queue/dequeue and
> > > > > group cleanup.
> > > > >
> > > > > A simpler solution I considered, similar to my 128 events limit patch,
> > > > > is to limit
> > > > > the linear search to events queued in the last X seconds.
> > > > > The rationale is that event merging is not supposed to be long term at all.
> > > > > If a listener fails to perform read from the queue, it is not fsnotify's job to
> > > > > try and keep the queue compact. I think merging events mechanism was
> > > > > mainly meant to merge short bursts of events on objects, which are quite
> > > > > common and surely can happen concurrently on several objects.
> > > > >
> > > > > My intuition is that making event->objectid into event->hash in addition
> > > > > to limiting the age of events to merge would address the real life workloads.
> > > > > One question if we do choose this approach is what should the age limit be?
> > > > > Should it be configurable? Default to infinity and let distro cap the age or
> > > > > provide a sane default by kernel while slightly changing behavior (yes please).
> > > > >
> > > > > What are your thoughts about this?
> > > >
> > > > Aha! found it:
> > > > https://lore.kernel.org/linux-fsdevel/20200227112755.GZ10728@quack2.suse.cz/
> > > > You suggested a small hash table per group (128 slots).
> > > >
> > > > My intuition is that this will not be good enough for the worst case, which is
> > > > not that hard to hit in real life:
> > > > 1. Listener sets FAN_UNLIMITED_QUEUE
> > > > 2. Listener adds a FAN_MARK_FILESYSTEM watch
> > > > 3. Many thousands of events are queued
> > > > 4. Listener lingers (due to bad implementation?) in reading events
> > > > 5. Every single event now incurs a huge fanotify_merge() cost
> > > >
> > > > Reducing the cost of merge from O(N) to O(N/128) doesn't really fix the
> > > > problem.
> > >
> > > So my thought was that indeed reducing the overhead of merging by a factor
> > > of 128 should be enough for any practical case as much as I agree that in
> > > principle the computational complexity remains the same. And I've picked
> > > per-group hash table to avoid interferences among notification groups and
> > > to keep locking simple. That being said I'm not opposed to combining this
> > > with a limit on the number of elements traversed in a hash chain (e.g.
> > > those 128 you use yourself) - it will be naturally ordered by queue order
> > > if we are a bit careful. This will provide efficient and effective merging
> > > for ~8k queued events which seems enough to me. I find time based limits
> > > not really worth it. Yes, they provide more predictable behavior but less
> > > predictable runtime and overall I don't find the complexity worth the
> > > benefit.
> > >
> >
> > Sounds reasonable.
> > If you have time, please take a look at this WIP branch:
> > https://github.com/amir73il/linux/commits/fanotify_merge
> > and let me know if you like the direction it is taking.
> >
> > This branch is only compile tested, but I am asking w.r.t. the chosen
> > data structures.  So far it is just an array of queues selected by (yet
> > unmodified) objectid.  Reading is just from any available queue.
> >
> > My goal was to avoid having to hang the event on multiple list/hlist and
> > the idea is to implement read by order of events as follows:
>
> As a side note, since we use notification_list as a strict queue, we could
> actually use a singly linked list for linking all the events (implemented
> in include/linux/llist.h). That way we can save one pointer in
> fsnotify_event if we wish without too much complication AFAICT. But I'm not
> sure we really care.
>

Handling of the overflow event is going to be a bit subtle and permission
events are not following the strict FIFO.
Anyway, I'd rather not get into that change.

> > - With multi queue, the high bits of objectid will be masked for merge compare.
> > - Instead, they will be used to store the next_qid to read from
> >
> > For example:
> > - event #1 is added to queue 6
> > - set group->last_qid = 6
> > - set group->next_qid = 6 (because group->num_events == 1)
> > - event #2 is added to queue 13
> > - the next_qid bits of the last event in last_qid (6) queue are set to 13
> > - set group->last_qid = 13
> >
> > - read() checks value of group->next_qid and reads the first event
> > from queue 6 (event #1)
> > - event #1 has 13 stored in next_qid bits so set group->next_qid = 13
> > - read() reads first event from queue 13 (event #2)
>
> That's an interesting idea. I like it and I think it would work. Just
> instead of masking, I'd use bitfields. Or we could just restrict objectid
> to 32-bits and use remaining 32-bits for the next_qid pointer. I know it
> will waste some bits but 32-bits of objectid should provide us with enough
> space to avoid doing full event comparison in most cases

Certainly.
The entire set of objects to compare is going to be limited to 128*128,
so 32bit should be plenty of hash bits.
Simplicity is preferred.

>  - BTW WRT naming I
> find 'qid' somewhat confusing. Can we call it say 'next_bucket' or
> something like that?
>

Sure. If it's going to be 32-bit, I can just call it next_key for simplicity
and store the next event key instead of the next event bucket.

> > Permission events require special care, but that is the idea of a simple
> > singly linked list using qid's for reading events by insert order and
> > merging by hashed queue.
>
> Why are permission events special in this regard?
>

They are not removed from the head of the queue, so
middle event next_key may need to be updated when they
are removed.

I guess since permission events are not merged, they could
use their own queue. If we do not care about ordering of
permission events and non-permission events, we can treat this
as a priority queue and it will simplify things considerably.
Boosting priority of blocking hooks seems like the right thing to do.
I wonder if we could make that change?

Thanks,
Amir.

* Re: fanotify_merge improvements
  2021-01-27 12:57                   ` Amir Goldstein
@ 2021-01-27 15:15                     ` Jan Kara
  2021-01-27 18:03                       ` Amir Goldstein
  0 siblings, 1 reply; 65+ messages in thread
From: Jan Kara @ 2021-01-27 15:15 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel

On Wed 27-01-21 14:57:56, Amir Goldstein wrote:
> On Wed, Jan 27, 2021 at 1:24 PM Jan Kara <jack@suse.cz> wrote:
> > > - With multi queue, the high bits of objectid will be masked for merge compare.
> > > - Instead, they will be used to store the next_qid to read from
> > >
> > > For example:
> > > - event #1 is added to queue 6
> > > - set group->last_qid = 6
> > > - set group->next_qid = 6 (because group->num_events == 1)
> > > - event #2 is added to queue 13
> > > - the next_qid bits of the last event in last_qid (6) queue are set to 13
> > > - set group->last_qid = 13
> > >
> > > - read() checks value of group->next_qid and reads the first event
> > > from queue 6 (event #1)
> > > - event #1 has 13 stored in next_qid bits so set group->next_qid = 13
> > > - read() reads first event from queue 13 (event #2)
> >
> > That's an interesting idea. I like it and I think it would work. Just
> > instead of masking, I'd use bitfields. Or we could just restrict objectid
> > to 32-bits and use remaining 32-bits for the next_qid pointer. I know it
> > will waste some bits but 32-bits of objectid should provide us with enough
> > space to avoid doing full event comparison in most cases
> 
> Certainly.
> The entire set of objects to compare is going to be limited to 128*128,
> so 32bit should be plenty of hash bits.
> Simplicity is preferred.
> 
> >  - BTW WRT naming I
> > find 'qid' somewhat confusing. Can we call it say 'next_bucket' or
> > something like that?
> >
> 
> Sure. If it's going to be 32-bit, I can just call it next_key for simplicity
> and store the next event key instead of the next event bucket.
> 
> > > Permission events require special care, but that is the idea of a simple
> > > singly linked list using qid's for reading events by insert order and
> > > merging by hashed queue.
> >
> > Why are permission events special in this regard?
> >
> 
> They are not removed from the head of the queue, so
> middle event next_key may need to be updated when they
> are removed.

Oh, you mean the special case when we receive a signal and thus remove
permission event from a notification queue? I forgot about that one and
yes, it needs a special handling...

> I guess since permission events are not merged, they could
> use their own queue. If we do not care about ordering of
> permission events and non-permission events, we can treat this
> as a priority queue and it will simplify things considerably.
> Boosting priority of blocking hooks seems like the right thing to do.
> I wonder if we could make that change?

Yes, permission events are not merged and I'm not aware of any users
actually mixing permission and other events in a notification group. OTOH
I'm somewhat reluctant to reorder events that much. It could break
someone, it could starve notification events, etc. AFAIU the pain with
permission events is updating the ->next_key field in case we want to remove
unreported permission event. Finding previous entry with this scheme is
indeed somewhat painful (we'd have to walk the queue which requires
maintaining 'cur' pointer for every queue). So maybe growing fsnotify_event
by one pointer to contain a singly linked list for a hash chain would be
simplest in the end? Then removing from the hash chain in the corner case of
tearing permission event out is simple enough...
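
A rough sketch of why an explicit chain pointer makes that corner case
easy: with a per-event link, unlinking an arbitrary event is a short
pointer-to-pointer walk over one hash chain. struct event, hash_next
and chain_remove() are hypothetical names used only for this
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event carrying one extra pointer for its hash chain. */
struct event {
	int id;
	struct event *hash_next;  /* singly linked hash chain */
};

/*
 * Unlink victim from the chain starting at *head.
 * Returns 1 if the event was found and removed, 0 otherwise.
 */
static int chain_remove(struct event **head, struct event *victim)
{
	struct event **pp;

	assert(head != NULL);
	for (pp = head; *pp != NULL; pp = &(*pp)->hash_next) {
		if (*pp == victim) {
			*pp = victim->hash_next;  /* bypass the victim */
			victim->hash_next = NULL;
			return 1;
		}
	}
	return 0;
}
```

The walk is limited to a single chain, so no per-queue 'cur' pointer
has to be maintained.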

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: fanotify_merge improvements
  2021-01-27 15:15                     ` Jan Kara
@ 2021-01-27 18:03                       ` Amir Goldstein
  2021-01-28 10:27                         ` Jan Kara
  0 siblings, 1 reply; 65+ messages in thread
From: Amir Goldstein @ 2021-01-27 18:03 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

On Wed, Jan 27, 2021 at 5:15 PM Jan Kara <jack@suse.cz> wrote:
>
> On Wed 27-01-21 14:57:56, Amir Goldstein wrote:
> > On Wed, Jan 27, 2021 at 1:24 PM Jan Kara <jack@suse.cz> wrote:
> > > > - With multi queue, the high bits of objectid will be masked for merge compare.
> > > > - Instead, they will be used to store the next_qid to read from
> > > >
> > > > For example:
> > > > - event #1 is added to queue 6
> > > > - set group->last_qid = 6
> > > > - set group->next_qid = 6 (because group->num_events == 1)
> > > > - event #2 is added to queue 13
> > > > - the next_qid bits of the last event in last_qid (6) queue are set to 13
> > > > - set group->last_qid = 13
> > > >
> > > > - read() checks value of group->next_qid and reads the first event
> > > > from queue 6 (event #1)
> > > > - event #1 has 13 stored in next_qid bits so set group->next_qid = 13
> > > > - read() reads first event from queue 13 (event #2)
> > >
> > > That's an interesting idea. I like it and I think it would work. Just
> > > instead of masking, I'd use bitfields. Or we could just restrict objectid
> > > to 32-bits and use remaining 32-bits for the next_qid pointer. I know it
> > > will waste some bits but 32-bits of objectid should provide us with enough
> > > space to avoid doing full event comparison in most cases
> >
> > Certainly.
> > The entire set of objects to compare is going to be limited to 128*128,
> > so 32bit should be plenty of hash bits.
> > Simplicity is preferred.
> >
> > >  - BTW WRT naming I
> > > find 'qid' somewhat confusing. Can we call it say 'next_bucket' or
> > > something like that?
> > >
> >
> > Sure. If it's going to be 32-bit, I can just call it next_key for simplicity
> > and store the next event key instead of the next event bucket.
> >
> > > > Permission events require special care, but that is the idea of a simple
> > > > singly linked list using qid's for reading events by insert order and
> > > > merging by hashed queue.
> > >
> > > Why are permission events special in this regard?
> > >
> >
> > They are not removed from the head of the queue, so
> > middle event next_key may need to be updated when they
> > are removed.
>
> Oh, you mean the special case when we receive a signal and thus remove
> permission event from a notification queue? I forgot about that one and
> yes, it needs a special handling...
>
> > I guess since permission events are not merged, they could
> > use their own queue. If we do not care about ordering of
> > permission events and non-permission events, we can treat this
> > as a priority queue and it will simplify things considerably.
> > Boosting priority of blocking hooks seems like the right thing to do.
> > I wonder if we could make that change?
>
> Yes, permission events are not merged and I'm not aware of any users
> actually mixing permission and other events in a notification group. OTOH
> I'm somewhat reluctant to reorder events that much. It could break
> someone, it could starve notification events, etc. AFAIU the pain with
> permission events is updating the ->next_key field in case we want to remove
> unreported permission event. Finding previous entry with this scheme is
> indeed somewhat painful (we'd have to walk the queue which requires
> maintaining 'cur' pointer for every queue). So maybe growing fsnotify_event
> by one pointer to contain a singly linked list for a hash chain would be
> simplest in the end? Then removing from the hash chain in the corner case of
> tearing permission event out is simple enough...
>

Better to disable the multi queue for the very uninteresting corner case (mixing
permission and non-permission events). The simplest thing to do is to enable multi
queue only for FAN_CLASS_NOTIF. I suppose users do not use high priority
classes for non-permission event use case and if they do, they will get less
merged events - no big deal.

The important things we get are:
1. Code remains simple
2. Deterministic CPU usage (linear search is capped to 128 events)
3. In the most common use case of async change listener we can merge
    events on up to 16K unique objects which should be sufficient
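
As a rough illustration of the three points above, here is a minimal
userspace sketch, not the kernel code. The sketch_* names, the 128-bucket
table and the fallback to bucket 0 for groups that are not FAN_CLASS_NOTIF
are all assumptions made for the example. The 128 * 128 = 16K figure falls
out of the bucket count times the search cap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MERGE_HASH_BITS 7                      /* assumed: 128 buckets */
#define MERGE_HASH_SIZE (1u << MERGE_HASH_BITS)
#define MERGE_MAX_DEPTH 128                    /* cap on the linear search */

/* Hypothetical event: only the fields needed to show merging. */
struct sketch_event {
	uint32_t key;                    /* hash of the merge-compared fields */
	uint32_t mask;                   /* event bits, OR-ed together on merge */
	struct sketch_event *hash_next;  /* bucket chain, newest event first */
};

/* Hypothetical group: notif_class stands in for FAN_CLASS_NOTIF. */
struct sketch_group {
	bool notif_class;
	struct sketch_event *bucket[MERGE_HASH_SIZE];
};

/* Multiplicative hash of the key down to a bucket index. */
static unsigned int sketch_bucket(uint32_t key)
{
	return (key * 0x61C88647u) >> (32 - MERGE_HASH_BITS);
}

/*
 * Try to merge ev into a queued event with the same key, looking at no
 * more than MERGE_MAX_DEPTH events. Returns true if merged, false if ev
 * was queued as a new event. A real implementation would still compare
 * the full events when the keys match.
 */
static bool sketch_merge_or_queue(struct sketch_group *g,
				  struct sketch_event *ev)
{
	/* Groups outside the notification class use one queue only. */
	unsigned int idx = g->notif_class ? sketch_bucket(ev->key) : 0;
	struct sketch_event **head = &g->bucket[idx];
	struct sketch_event *old;
	int depth = 0;

	assert(ev != NULL);

	for (old = *head; old && depth < MERGE_MAX_DEPTH;
	     old = old->hash_next, depth++) {
		if (old->key == ev->key) {
			old->mask |= ev->mask;
			return true;
		}
	}

	ev->hash_next = *head;
	*head = ev;
	return false;
}
```

With this shape, a group in a higher priority class still merges, but only
against the most recent 128 events of its single queue, which is the
"fewer merged events" trade-off mentioned above.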

I'll try to write this up.

Thanks,
Amir.

* Re: fanotify_merge improvements
  2021-01-27 18:03                       ` Amir Goldstein
@ 2021-01-28 10:27                         ` Jan Kara
  2021-01-28 18:50                           ` Amir Goldstein
  0 siblings, 1 reply; 65+ messages in thread
From: Jan Kara @ 2021-01-28 10:27 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel

On Wed 27-01-21 20:03:23, Amir Goldstein wrote:
> On Wed, Jan 27, 2021 at 5:15 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Wed 27-01-21 14:57:56, Amir Goldstein wrote:
> > > On Wed, Jan 27, 2021 at 1:24 PM Jan Kara <jack@suse.cz> wrote:
> > > > > - With multi queue, high bits of objectid will be masked for merge compare.
> > > > > - Instead, they will be used to store the next_qid to read from
> > > > >
> > > > > For example:
> > > > > - event #1 is added to queue 6
> > > > > - set group->last_qid = 6
> > > > > - set group->next_qid = 6 (because group->num_events == 1)
> > > > > - event #2 is added to queue 13
> > > > > - the next_qid bits of the last event in last_qid (6) queue are set to 13
> > > > > - set group->last_qid = 13
> > > > >
> > > > > - read() checks value of group->next_qid and reads the first event
> > > > > from queue 6 (event #1)
> > > > > - event #1 has 13 stored in next_qid bits so set group->next_qid = 13
> > > > > - read() reads first event from queue 13 (event #2)
> > > >
> > > > That's an interesting idea. I like it and I think it would work. Just
> > > > instead of masking, I'd use bitfields. Or we could just restrict objectid
> > > > to 32-bits and use remaining 32-bits for the next_qid pointer. I know it
> > > > will waste some bits but 32-bits of objectid should provide us with enough
> > > > space to avoid doing full event comparison in most cases
> > >
> > > Certainly.
> > > The entire set of objects to compare is going to be limited to 128*128,
> > > so 32bit should be plenty of hash bits.
> > > Simplicity is preferred.
> > >
> > > >  - BTW WRT naming I
> > > > find 'qid' somewhat confusing. Can we call it say 'next_bucket' or
> > > > something like that?
> > > >
> > >
> > > Sure. If it's going to be 32bit, I can just call it next_key for simplicity
> > > and store the next event key instead of the next event bucket.
> > >
> > > > > Permission events require special care, but that is the idea of a simple
> > > > > singly linked list using qid's for reading events by insert order and
> > > > > merging by hashed queue.
> > > >
> > > > Why are permission events special in this regard?
> > > >
> > >
> > > They are not removed from the head of the queue, so
> > > middle event next_key may need to be updated when they
> > > are removed.
> >
> > Oh, you mean the special case when we receive a signal and thus remove
> > permission event from a notification queue? I forgot about that one and
> > yes, it needs a special handling...
> >
> > > I guess since permission events are not merged, they could
> > > use their own queue. If we do not care about ordering of
> > > permission events and non-permission events, we can treat this
> > > as a priority queue and it will simplify things considerably.
> > > Boosting priority of blocking hooks seems like the right thing to do.
> > > I wonder if we could make that change?
> >
> > Yes, permission events are not merged and I'm not aware of any users
> > actually mixing permission and other events in a notification group. OTOH
> > I'm somewhat reluctant to reorder events that much. It could break
> > someone, it could starve notification events, etc. AFAIU the pain with
> > permission events is updating the ->next_key field in case we want to remove
> > unreported permission event. Finding previous entry with this scheme is
> > indeed somewhat painful (we'd have to walk the queue which requires
> > maintaining 'cur' pointer for every queue). So maybe growing fsnotify_event
> > by one pointer to contain single linked list for a hash chain would be
> > simplest in the end? Then removing from the hash chain in the corner case of
> > tearing permission event out is simple enough...
> >
> 
> Better to disable the multi queue for the very uninteresting corner case (mixing
> permissions and non permissions) . The simplest thing to do is to enable multi
> queue only for FAN_CLASS_NOTIF. I suppose users do not use high priority
> classes for non-permission event use case and if they do, they will get less
> merged events - no big deal.
> 
> The important things we get are:
> 1. Code remains simple
> 2. Deterministic CPU usage (linear search is capped to 128 events)
> 3. In the most common use case of async change listener we can merge
>     events on up to 16K unique objects which should be sufficient
> 
> I'll try to write this up.

OK, sounds fine to me. We can always reconsider if some users come back
complaining...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: fanotify_merge improvements
  2021-01-28 10:27                         ` Jan Kara
@ 2021-01-28 18:50                           ` Amir Goldstein
  0 siblings, 0 replies; 65+ messages in thread
From: Amir Goldstein @ 2021-01-28 18:50 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

On Thu, Jan 28, 2021 at 12:27 PM Jan Kara <jack@suse.cz> wrote:
>
> On Wed 27-01-21 20:03:23, Amir Goldstein wrote:
> > On Wed, Jan 27, 2021 at 5:15 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > On Wed 27-01-21 14:57:56, Amir Goldstein wrote:
> > > > On Wed, Jan 27, 2021 at 1:24 PM Jan Kara <jack@suse.cz> wrote:
> > > > > > - With multi queue, high bits of objectid will be masked for merge compare.
> > > > > > - Instead, they will be used to store the next_qid to read from
> > > > > >
> > > > > > For example:
> > > > > > - event #1 is added to queue 6
> > > > > > - set group->last_qid = 6
> > > > > > - set group->next_qid = 6 (because group->num_events == 1)
> > > > > > - event #2 is added to queue 13
> > > > > > - the next_qid bits of the last event in last_qid (6) queue are set to 13
> > > > > > - set group->last_qid = 13
> > > > > >
> > > > > > - read() checks value of group->next_qid and reads the first event
> > > > > > from queue 6 (event #1)
> > > > > > - event #1 has 13 stored in next_qid bits so set group->next_qid = 13
> > > > > > - read() reads first event from queue 13 (event #2)
> > > > >
> > > > > That's an interesting idea. I like it and I think it would work. Just
> > > > > instead of masking, I'd use bitfields. Or we could just restrict objectid
> > > > > to 32-bits and use remaining 32-bits for the next_qid pointer. I know it
> > > > > will waste some bits but 32-bits of objectid should provide us with enough
> > > > > space to avoid doing full event comparison in most cases
> > > >
> > > > Certainly.
> > > > The entire set of objects to compare is going to be limited to 128*128,
> > > > so 32bit should be plenty of hash bits.
> > > > Simplicity is preferred.
> > > >
> > > > >  - BTW WRT naming I
> > > > > find 'qid' somewhat confusing. Can we call it say 'next_bucket' or
> > > > > something like that?
> > > > >
> > > >
> > > > Sure. If it's going to be 32bit, I can just call it next_key for simplicity
> > > > and store the next event key instead of the next event bucket.
> > > >
> > > > > > Permission events require special care, but that is the idea of a simple
> > > > > > singly linked list using qid's for reading events by insert order and
> > > > > > merging by hashed queue.
> > > > >
> > > > > Why are permission events special in this regard?
> > > > >
> > > >
> > > > They are not removed from the head of the queue, so
> > > > middle event next_key may need to be updated when they
> > > > are removed.
> > >
> > > Oh, you mean the special case when we receive a signal and thus remove
> > > permission event from a notification queue? I forgot about that one and
> > > yes, it needs a special handling...
> > >
> > > > I guess since permission events are not merged, they could
> > > > use their own queue. If we do not care about ordering of
> > > > permission events and non-permission events, we can treat this
> > > > as a priority queue and it will simplify things considerably.
> > > > Boosting priority of blocking hooks seems like the right thing to do.
> > > > I wonder if we could make that change?
> > >
> > > Yes, permission events are not merged and I'm not aware of any users
> > > actually mixing permission and other events in a notification group. OTOH
> > > I'm somewhat reluctant to reorder events that much. It could break
> > > someone, it could starve notification events, etc. AFAIU the pain with
> > > permission events is updating the ->next_key field in case we want to remove
> > > unreported permission event. Finding previous entry with this scheme is
> > > indeed somewhat painful (we'd have to walk the queue which requires
> > > maintaining 'cur' pointer for every queue). So maybe growing fsnotify_event
> > > by one pointer to contain single linked list for a hash chain would be
> > > simplest in the end? Then removing from the hash chain in the corner case of
> > > tearing permission event out is simple enough...
> > >
> >
> > Better to disable the multi queue for the very uninteresting corner case (mixing
> > permissions and non permissions) . The simplest thing to do is to enable multi
> > queue only for FAN_CLASS_NOTIF. I suppose users do not use high priority
> > classes for non-permission event use case and if they do, they will get less
> > merged events - no big deal.
> >
> > The important things we get are:
> > 1. Code remains simple
> > 2. Deterministic CPU usage (linear search is capped to 128 events)
> > 3. In the most common use case of async change listener we can merge
> >     events on up to 16K unique objects which should be sufficient
> >
> > I'll try to write this up.
>
> OK, sounds fine to me. We can always reconsider if some users come back
> complaining...
>

Pushed that to fanotify_merge branch.
It's even working ;-)

Please let me know if this looks fine so far and I will complete the rest,
test performance and post the patches.

Remaining:
- Let event->key be a hash of all merge compared fields
- Limit linear search per chain
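
For the first remaining item, a key computed from the merge-compared
fields could look roughly like the sketch below. The field set (type, pid,
objectid) and the sketch_* names are assumptions for illustration only, and
the FNV-1a style mixing is just one possible choice, not what the branch
does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical set of fields that are compared when merging events. */
struct sketch_fields {
	uint32_t type;
	uint32_t pid;
	uint64_t objectid;
};

/* One FNV-1a style step, applied to a whole 32-bit word at a time. */
static uint32_t sketch_mix(uint32_t h, uint32_t v)
{
	return (h ^ v) * 16777619u;
}

/* Fold every compared field into a single 32-bit key. */
static uint32_t sketch_event_key(const struct sketch_fields *f)
{
	uint32_t h = 2166136261u;

	assert(f != NULL);

	h = sketch_mix(h, f->type);
	h = sketch_mix(h, f->pid);
	h = sketch_mix(h, (uint32_t)f->objectid);
	h = sketch_mix(h, (uint32_t)(f->objectid >> 32));
	return h;
}
```

Events that are equal in all compared fields always get the same key, so a
key mismatch is enough to skip the full comparison; a key match still needs
the full comparison because unrelated events can collide.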

Thanks,
Amir.

Thread overview: 65+ messages
2020-02-17 13:14 [PATCH v2 00/16] Fanotify event with name info Amir Goldstein
2020-02-17 13:14 ` [PATCH v2 01/16] fsnotify: tidy up FS_ and FAN_ constants Amir Goldstein
2020-02-17 13:14 ` [PATCH v2 02/16] fsnotify: factor helpers fsnotify_dentry() and fsnotify_file() Amir Goldstein
2020-02-25 13:46   ` Jan Kara
2020-02-25 14:27     ` Amir Goldstein
2020-02-26 13:59       ` Jan Kara
2020-02-17 13:14 ` [PATCH v2 03/16] fsnotify: funnel all dirent events through fsnotify_name() Amir Goldstein
2020-02-17 13:14 ` [PATCH v2 04/16] fsnotify: use helpers to access data by data_type Amir Goldstein
2020-02-17 13:14 ` [PATCH v2 05/16] fsnotify: simplify arguments passing to fsnotify_parent() Amir Goldstein
2020-02-19 10:50   ` kbuild test robot
2020-02-19 11:11   ` Amir Goldstein
2020-02-17 13:14 ` [PATCH v2 06/16] fsnotify: pass dentry instead of inode for events possible on child Amir Goldstein
2020-02-17 13:14 ` [PATCH v2 07/16] fsnotify: replace inode pointer with tag Amir Goldstein
2020-02-26  8:20   ` Jan Kara
2020-02-26  9:34     ` Amir Goldstein
2020-02-26  8:52   ` Jan Kara
2020-02-17 13:14 ` [PATCH v2 08/16] fanotify: merge duplicate events on parent and child Amir Goldstein
2020-02-26  9:18   ` Jan Kara
2020-02-26 12:14     ` Amir Goldstein
2020-02-26 14:38       ` Jan Kara
2021-01-22 13:59         ` fanotify_merge improvements Amir Goldstein
2021-01-23 13:30           ` Amir Goldstein
2021-01-25 13:01             ` Jan Kara
2021-01-26 16:21               ` Amir Goldstein
2021-01-27 11:24                 ` Jan Kara
2021-01-27 12:57                   ` Amir Goldstein
2021-01-27 15:15                     ` Jan Kara
2021-01-27 18:03                       ` Amir Goldstein
2021-01-28 10:27                         ` Jan Kara
2021-01-28 18:50                           ` Amir Goldstein
2020-02-17 13:14 ` [PATCH v2 09/16] fanotify: fix merging marks masks with FAN_ONDIR Amir Goldstein
2020-02-17 13:14 ` [PATCH v2 10/16] fanotify: send FAN_DIR_MODIFY event flavor with dir inode and name Amir Goldstein
2020-02-17 13:14 ` [PATCH v2 11/16] fanotify: prepare to encode both parent and child fid's Amir Goldstein
2020-02-26 10:23   ` Jan Kara
2020-02-26 11:53     ` Amir Goldstein
2020-02-26 17:07       ` Jan Kara
2020-02-26 17:50         ` Amir Goldstein
2020-02-27  9:06           ` Amir Goldstein
2020-02-27 11:27             ` Jan Kara
2020-02-27 12:12               ` Amir Goldstein
2020-02-27 13:30                 ` Jan Kara
2020-02-27 14:06                   ` Amir Goldstein
2020-03-01 16:26                     ` Amir Goldstein
2020-03-05 15:49                       ` Jan Kara
2020-03-06 11:19                         ` Amir Goldstein
2020-03-08  7:29                           ` Amir Goldstein
2020-03-18 17:51                             ` Jan Kara
2020-03-18 18:50                               ` Amir Goldstein
2020-03-19  9:30                                 ` Jan Kara
2020-03-19 10:07                                   ` Amir Goldstein
2020-03-30 19:29                                 ` Amir Goldstein
2020-02-27 11:01           ` Jan Kara
2020-02-17 13:14 ` [PATCH v2 12/16] fanotify: record name info for FAN_DIR_MODIFY event Amir Goldstein
2020-02-17 13:14 ` [PATCH v2 13/16] fanotify: report " Amir Goldstein
2020-02-19  9:43   ` kbuild test robot
2020-02-19 10:17   ` kbuild test robot
2020-02-19 11:22   ` Amir Goldstein
2020-04-16 12:16   ` Michael Kerrisk (man-pages)
2020-04-20 15:53     ` Jan Kara
2020-04-20 18:45     ` Amir Goldstein
2020-04-20 18:47       ` Michael Kerrisk (man-pages)
2020-02-17 13:14 ` [PATCH v2 14/16] fanotify: report parent fid + name with FAN_REPORT_NAME Amir Goldstein
2020-02-17 13:14 ` [PATCH v2 15/16] fanotify: refine rules for when name is reported Amir Goldstein
2020-02-17 13:14 ` [BONUS][PATCH v2 16/16] fanotify: support limited functionality for unprivileged users Amir Goldstein
2020-02-20 22:10 ` [PATCH v2 00/16] Fanotify event with name info Matthew Bobrowski
