linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/20] Prep work for fanotify named events
@ 2020-06-12  9:33 Amir Goldstein
  2020-06-12  9:33 ` [PATCH 01/20] fsnotify: Rearrange fast path to minimise overhead when there is no watcher Amir Goldstein
                   ` (19 more replies)
  0 siblings, 20 replies; 35+ messages in thread
From: Amir Goldstein @ 2020-06-12  9:33 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

Hi Jan,

As you know, the fanotify named events series grew quite large due to
a lot of prep work I needed to do and some minor bugs and optimizations
I encoutered along the way, including Mel's optimization patch that
needed massive conclict resolving with my series.

Most of this series is harmless re-factoring, including some changes
that were suggested by you in the last posting of fanotify named events.
There are some minor bug fixes that change behavior, hopefully for
the better, like the patch to kernfs. But anyway, there should be no
issue with merging any of those patches independently from the rest of
the work.

Most of this series should be fairly easy to review, with the exception
of last two patches that change the way we store a variable size event
struct.

It would be great if you can provide me feedback as early as possible
about the design choices intoduces herein, such as the "implicit event
flags" infrastructure that is needed for requesting events on child for
sb/mount marks.

I was hoping that we could get those changes out of the way and into
linux-next as early as possible after rc1 to get wider testing coverage,
before we move on to reviewing the new feature.

The full work is available on my github [1] including LTP tests [2]
and man page [3]. More on these when I post the patches.

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/fanotify_name_fid
[2] https://github.com/amir73il/ltp/commits/fanotify_name_fid
[3] https://github.com/amir73il/man-pages/commits/fanotify_name_fid

Amir Goldstein (19):
  fsnotify: fold fsnotify() call into fsnotify_parent()
  fsnotify: return non const from fsnotify_data_inode()
  nfsd: use fsnotify_data_inode() to get the unlinked inode
  kernfs: do not call fsnotify() with name without a parent
  inotify: do not use objectid when comparing events
  fanotify: create overflow event type
  fanotify: break up fanotify_alloc_event()
  fsnotify: pass dir argument to handle_event() callback
  fanotify: generalize the handling of extra event flags
  fanotify: generalize merge logic of events on dir
  fanotify: distinguish between fid encode error and null fid
  fanotify: generalize test for FAN_REPORT_FID
  fanotify: mask out special event flags from ignored mask
  fanotify: prepare for implicit event flags in mark mask
  fanotify: use FAN_EVENT_ON_CHILD as implicit flag on sb/mount/non-dir
    marks
  fanotify: remove event FAN_DIR_MODIFY
  fsnotify: add object type "child" to object type iterator
  fanotify: move event name into fanotify_fh
  fanotify: no external fh buffer in fanotify_name_event

Mel Gorman (1):
  fsnotify: Rearrange fast path to minimise overhead when there is no
    watcher

 fs/kernfs/file.c                     |   2 +-
 fs/nfsd/filecache.c                  |   8 +-
 fs/notify/dnotify/dnotify.c          |   8 +-
 fs/notify/fanotify/fanotify.c        | 319 +++++++++++++++------------
 fs/notify/fanotify/fanotify.h        |  81 +++++--
 fs/notify/fanotify/fanotify_user.c   | 108 +++++----
 fs/notify/fsnotify.c                 |  82 ++++---
 fs/notify/inotify/inotify.h          |   6 +-
 fs/notify/inotify/inotify_fsnotify.c |  11 +-
 fs/notify/inotify/inotify_user.c     |   4 +-
 include/linux/fanotify.h             |   6 +-
 include/linux/fsnotify.h             |  44 ++--
 include/linux/fsnotify_backend.h     |  35 ++-
 include/uapi/linux/fanotify.h        |   1 -
 kernel/audit_fsnotify.c              |  10 +-
 kernel/audit_tree.c                  |   6 +-
 kernel/audit_watch.c                 |   6 +-
 17 files changed, 438 insertions(+), 299 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 01/20] fsnotify: Rearrange fast path to minimise overhead when there is no watcher
  2020-06-12  9:33 [PATCH 00/20] Prep work for fanotify named events Amir Goldstein
@ 2020-06-12  9:33 ` Amir Goldstein
  2020-07-03 14:03   ` Jan Kara
  2020-06-12  9:33 ` [PATCH 02/20] fsnotify: fold fsnotify() call into fsnotify_parent() Amir Goldstein
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 35+ messages in thread
From: Amir Goldstein @ 2020-06-12  9:33 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Mel Gorman

From: Mel Gorman <mgorman@techsingularity.net>

The fsnotify paths are trivial to hit even when there are no watchers and
they are surprisingly expensive. For example, every successful vfs_write()
hits fsnotify_modify which calls both fsnotify_parent and fsnotify unless
FMODE_NONOTIFY is set which is an internal flag invisible to userspace.
As it stands, fsnotify_parent is a guaranteed functional call even if there
are no watchers and fsnotify() does a substantial amount of unnecessary
work before it checks if there are any watchers. A perf profile showed
that applying mnt->mnt_fsnotify_mask in fnotify() was almost half of the
total samples taken in that function during a test. This patch rearranges
the fast paths to reduce the amount of work done when there are no
watchers.

The test motivating this was "perf bench sched messaging --pipe". Despite
the fact the pipes are anonymous, fsnotify is still called a lot and
the overhead is noticeable even though it's completely pointless. It's
likely the overhead is negligible for real IO so this is an extreme
example. This is a comparison of hackbench using processes and pipes on
a 1-socket machine with 8 CPU threads without fanotify watchers.

                              5.7.0                  5.7.0
                            vanilla      fastfsnotify-v1r1
Amean     1       0.4837 (   0.00%)      0.4630 *   4.27%*
Amean     3       1.5447 (   0.00%)      1.4557 (   5.76%)
Amean     5       2.6037 (   0.00%)      2.4363 (   6.43%)
Amean     7       3.5987 (   0.00%)      3.4757 (   3.42%)
Amean     12      5.8267 (   0.00%)      5.6983 (   2.20%)
Amean     18      8.4400 (   0.00%)      8.1327 (   3.64%)
Amean     24     11.0187 (   0.00%)     10.0290 *   8.98%*
Amean     30     13.1013 (   0.00%)     12.8510 (   1.91%)
Amean     32     13.9190 (   0.00%)     13.2410 (   4.87%)

                       5.7.0       5.7.0
                     vanilla fastfsnotify-v1r1
Duration User         157.05      152.79
Duration System      1279.98     1219.32
Duration Elapsed      182.81      174.52

This is showing that the latencies are improved by roughly 2-9%. The
variability is not shown but some of these results are within the noise
as this workload heavily overloads the machine. That said, the system CPU
usage is reduced by quite a bit so it makes sense to avoid the overhead
even if it is a bit tricky to detect at times. A perf profile of just 1
group of tasks showed that 5.14% of samples taken were in either fsnotify()
or fsnotify_parent(). With the patch, 2.8% of samples were in fsnotify,
mostly function entry and the initial check for watchers.  The check for
watchers is complicated enough that inlining it may be controversial.

[Amir] Slightly simplify with mnt_or_sb_mask => marks_mask

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fsnotify.c             | 27 +++++++++++++++------------
 include/linux/fsnotify.h         | 10 ++++++++++
 include/linux/fsnotify_backend.h |  4 ++--
 3 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 72d332ce8e12..d59a58d10b84 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -143,7 +143,7 @@ void __fsnotify_update_child_dentry_flags(struct inode *inode)
 }
 
 /* Notify this dentry's parent about a child's events. */
-int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
+int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
 		    int data_type)
 {
 	struct dentry *parent;
@@ -174,7 +174,7 @@ int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(fsnotify_parent);
+EXPORT_SYMBOL_GPL(__fsnotify_parent);
 
 static int send_to_group(struct inode *to_tell,
 			 __u32 mask, const void *data,
@@ -315,17 +315,11 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
 	struct fsnotify_iter_info iter_info = {};
 	struct super_block *sb = to_tell->i_sb;
 	struct mount *mnt = NULL;
-	__u32 mnt_or_sb_mask = sb->s_fsnotify_mask;
 	int ret = 0;
-	__u32 test_mask = (mask & ALL_FSNOTIFY_EVENTS);
+	__u32 test_mask, marks_mask;
 
-	if (path) {
+	if (path)
 		mnt = real_mount(path->mnt);
-		mnt_or_sb_mask |= mnt->mnt_fsnotify_mask;
-	}
-	/* An event "on child" is not intended for a mount/sb mark */
-	if (mask & FS_EVENT_ON_CHILD)
-		mnt_or_sb_mask = 0;
 
 	/*
 	 * Optimization: srcu_read_lock() has a memory barrier which can
@@ -337,13 +331,22 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
 	if (!to_tell->i_fsnotify_marks && !sb->s_fsnotify_marks &&
 	    (!mnt || !mnt->mnt_fsnotify_marks))
 		return 0;
+
+	/* An event "on child" is not intended for a mount/sb mark */
+	marks_mask = to_tell->i_fsnotify_mask;
+	if (!(mask & FS_EVENT_ON_CHILD)) {
+		marks_mask |= sb->s_fsnotify_mask;
+		if (mnt)
+			marks_mask |= mnt->mnt_fsnotify_mask;
+	}
+
 	/*
 	 * if this is a modify event we may need to clear the ignored masks
 	 * otherwise return if neither the inode nor the vfsmount/sb care about
 	 * this type of event.
 	 */
-	if (!(mask & FS_MODIFY) &&
-	    !(test_mask & (to_tell->i_fsnotify_mask | mnt_or_sb_mask)))
+	test_mask = (mask & ALL_FSNOTIFY_EVENTS);
+	if (!(mask & FS_MODIFY) && !(test_mask & marks_mask))
 		return 0;
 
 	iter_info.srcu_idx = srcu_read_lock(&fsnotify_mark_srcu);
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 5ab28f6c7d26..508f6bb0b06b 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -44,6 +44,16 @@ static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry,
 	fsnotify_name(dir, mask, d_inode(dentry), &dentry->d_name, 0);
 }
 
+/* Notify this dentry's parent about a child's events. */
+static inline int fsnotify_parent(struct dentry *dentry, __u32 mask,
+				  const void *data, int data_type)
+{
+	if (!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED))
+		return 0;
+
+	return __fsnotify_parent(dentry, mask, data, data_type);
+}
+
 /*
  * Simple wrappers to consolidate calls fsnotify_parent()/fsnotify() when
  * an event is on a file/dentry.
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index f0c506405b54..1626fa7d10ff 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -379,7 +379,7 @@ struct fsnotify_mark {
 /* main fsnotify call to send events */
 extern int fsnotify(struct inode *to_tell, __u32 mask, const void *data,
 		    int data_type, const struct qstr *name, u32 cookie);
-extern int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
+extern int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
 			   int data_type);
 extern void __fsnotify_inode_delete(struct inode *inode);
 extern void __fsnotify_vfsmount_delete(struct vfsmount *mnt);
@@ -541,7 +541,7 @@ static inline int fsnotify(struct inode *to_tell, __u32 mask, const void *data,
 	return 0;
 }
 
-static inline int fsnotify_parent(struct dentry *dentry, __u32 mask,
+static inline int __fsnotify_parent(struct dentry *dentry, __u32 mask,
 				  const void *data, int data_type)
 {
 	return 0;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 02/20] fsnotify: fold fsnotify() call into fsnotify_parent()
  2020-06-12  9:33 [PATCH 00/20] Prep work for fanotify named events Amir Goldstein
  2020-06-12  9:33 ` [PATCH 01/20] fsnotify: Rearrange fast path to minimise overhead when there is no watcher Amir Goldstein
@ 2020-06-12  9:33 ` Amir Goldstein
  2020-06-12  9:33 ` [PATCH 03/20] fsnotify: return non const from fsnotify_data_inode() Amir Goldstein
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Amir Goldstein @ 2020-06-12  9:33 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

All (two) callers of fsnotify_parent() also call fsnotify() to notify
the child inode. Move the second fsnotify() call into fsnotify_parent().

This will allow more flexibility in making decisions about which of the
two event falvors should be sent.

Using 'goto notify_child' in the inline helper seems a bit strange, but
it mimics the code in __fsnotify_parent() for clarity and the goto
pattern will become less strage after following patches are applied.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fsnotify.c     | 27 ++++++++++++++++++---------
 include/linux/fsnotify.h | 34 ++++++++++++++--------------------
 2 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index d59a58d10b84..30628a72ca01 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -142,16 +142,20 @@ void __fsnotify_update_child_dentry_flags(struct inode *inode)
 	spin_unlock(&inode->i_lock);
 }
 
-/* Notify this dentry's parent about a child's events. */
+/*
+ * Notify this dentry's parent about a child's events with child name info
+ * if parent is watching.
+ * Notify also the child without name info if child inode is watching.
+ */
 int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
-		    int data_type)
+		      int data_type)
 {
 	struct dentry *parent;
 	struct inode *p_inode;
 	int ret = 0;
 
 	if (!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED))
-		return 0;
+		goto notify_child;
 
 	parent = dget_parent(dentry);
 	p_inode = parent->d_inode;
@@ -161,18 +165,23 @@ int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
 	} else if (p_inode->i_fsnotify_mask & mask & ALL_FSNOTIFY_EVENTS) {
 		struct name_snapshot name;
 
-		/* we are notifying a parent so come up with the new mask which
-		 * specifies these are events which came from a child. */
-		mask |= FS_EVENT_ON_CHILD;
-
+		/*
+		 * We are notifying a parent, so set a flag in mask to inform
+		 * backend that event has information about a child entry.
+		 */
 		take_dentry_name_snapshot(&name, dentry);
-		ret = fsnotify(p_inode, mask, data, data_type, &name.name, 0);
+		ret = fsnotify(p_inode, mask | FS_EVENT_ON_CHILD, data,
+			       data_type, &name.name, 0);
 		release_dentry_name_snapshot(&name);
 	}
 
 	dput(parent);
 
-	return ret;
+	if (ret)
+		return ret;
+
+notify_child:
+	return fsnotify(d_inode(dentry), mask, data, data_type, NULL, 0);
 }
 EXPORT_SYMBOL_GPL(__fsnotify_parent);
 
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 508f6bb0b06b..e73ae6117a61 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -48,44 +48,37 @@ static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry,
 static inline int fsnotify_parent(struct dentry *dentry, __u32 mask,
 				  const void *data, int data_type)
 {
+	struct inode *inode = d_inode(dentry);
+
+	if (S_ISDIR(inode->i_mode))
+		mask |= FS_ISDIR;
+
 	if (!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED))
-		return 0;
+		goto notify_child;
 
 	return __fsnotify_parent(dentry, mask, data, data_type);
+
+notify_child:
+	return fsnotify(inode, mask, data, data_type, NULL, 0);
 }
 
 /*
- * Simple wrappers to consolidate calls fsnotify_parent()/fsnotify() when
- * an event is on a file/dentry.
+ * Simple wrappers to consolidate calls to fsnotify_parent() when an event
+ * is on a file/dentry.
  */
 static inline void fsnotify_dentry(struct dentry *dentry, __u32 mask)
 {
-	struct inode *inode = d_inode(dentry);
-
-	if (S_ISDIR(inode->i_mode))
-		mask |= FS_ISDIR;
-
-	fsnotify_parent(dentry, mask, inode, FSNOTIFY_EVENT_INODE);
-	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL, 0);
+	fsnotify_parent(dentry, mask, d_inode(dentry), FSNOTIFY_EVENT_INODE);
 }
 
 static inline int fsnotify_file(struct file *file, __u32 mask)
 {
 	const struct path *path = &file->f_path;
-	struct inode *inode = file_inode(file);
-	int ret;
 
 	if (file->f_mode & FMODE_NONOTIFY)
 		return 0;
 
-	if (S_ISDIR(inode->i_mode))
-		mask |= FS_ISDIR;
-
-	ret = fsnotify_parent(path->dentry, mask, path, FSNOTIFY_EVENT_PATH);
-	if (ret)
-		return ret;
-
-	return fsnotify(inode, mask, path, FSNOTIFY_EVENT_PATH, NULL, 0);
+	return fsnotify_parent(path->dentry, mask, path, FSNOTIFY_EVENT_PATH);
 }
 
 /* Simple call site for access decisions */
@@ -158,6 +151,7 @@ static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir,
 
 	if (source)
 		fsnotify(source, mask, source, FSNOTIFY_EVENT_INODE, NULL, 0);
+
 	audit_inode_child(new_dir, moved, AUDIT_TYPE_CHILD_CREATE);
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 03/20] fsnotify: return non const from fsnotify_data_inode()
  2020-06-12  9:33 [PATCH 00/20] Prep work for fanotify named events Amir Goldstein
  2020-06-12  9:33 ` [PATCH 01/20] fsnotify: Rearrange fast path to minimise overhead when there is no watcher Amir Goldstein
  2020-06-12  9:33 ` [PATCH 02/20] fsnotify: fold fsnotify() call into fsnotify_parent() Amir Goldstein
@ 2020-06-12  9:33 ` Amir Goldstein
  2020-06-12  9:33 ` [PATCH 04/20] nfsd: use fsnotify_data_inode() to get the unlinked inode Amir Goldstein
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Amir Goldstein @ 2020-06-12  9:33 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

Return non const inode pointer from fsnotify_data_inode().
None of the fsnotify hooks pass const inode pointer as data and
callers often need to cast to a non const pointer.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify.c    | 2 +-
 include/linux/fsnotify_backend.h | 5 ++---
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 85eda539b35f..d9fc83dd994a 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -341,7 +341,7 @@ static struct inode *fanotify_fid_inode(struct inode *to_tell, u32 event_mask,
 	if (event_mask & ALL_FSNOTIFY_DIRENT_EVENTS)
 		return to_tell;
 
-	return (struct inode *)fsnotify_data_inode(data, data_type);
+	return fsnotify_data_inode(data, data_type);
 }
 
 struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 1626fa7d10ff..97300f3b8ff0 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -220,12 +220,11 @@ enum fsnotify_data_type {
 	FSNOTIFY_EVENT_INODE,
 };
 
-static inline const struct inode *fsnotify_data_inode(const void *data,
-						      int data_type)
+static inline struct inode *fsnotify_data_inode(const void *data, int data_type)
 {
 	switch (data_type) {
 	case FSNOTIFY_EVENT_INODE:
-		return data;
+		return (struct inode *)data;
 	case FSNOTIFY_EVENT_PATH:
 		return d_inode(((const struct path *)data)->dentry);
 	default:
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 04/20] nfsd: use fsnotify_data_inode() to get the unlinked inode
  2020-06-12  9:33 [PATCH 00/20] Prep work for fanotify named events Amir Goldstein
                   ` (2 preceding siblings ...)
  2020-06-12  9:33 ` [PATCH 03/20] fsnotify: return non const from fsnotify_data_inode() Amir Goldstein
@ 2020-06-12  9:33 ` Amir Goldstein
  2020-06-12 10:25   ` Amir Goldstein
  2020-06-12  9:33 ` [PATCH 05/20] kernfs: do not call fsnotify() with name without a parent Amir Goldstein
                   ` (15 subsequent siblings)
  19 siblings, 1 reply; 35+ messages in thread
From: Amir Goldstein @ 2020-06-12  9:33 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

The inode argument to handle_event() is about to become obsolete.
Return non const inode pointer from fsnotify_data_inode(), fsnotify
hooks do not pass const inode pointer as data.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/nfsd/filecache.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 82198d747c4c..ace8e5c30952 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -599,11 +599,13 @@ static struct notifier_block nfsd_file_lease_notifier = {
 
 static int
 nfsd_file_fsnotify_handle_event(struct fsnotify_group *group,
-				struct inode *inode,
+				struct inode *to_tell,
 				u32 mask, const void *data, int data_type,
 				const struct qstr *file_name, u32 cookie,
 				struct fsnotify_iter_info *iter_info)
 {
+	struct inode *inode = fsnotify_data_inode(data, data_type);
+
 	trace_nfsd_file_fsnotify_handle_event(inode, mask);
 
 	/* Should be no marks on non-regular files */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 05/20] kernfs: do not call fsnotify() with name without a parent
  2020-06-12  9:33 [PATCH 00/20] Prep work for fanotify named events Amir Goldstein
                   ` (3 preceding siblings ...)
  2020-06-12  9:33 ` [PATCH 04/20] nfsd: use fsnotify_data_inode() to get the unlinked inode Amir Goldstein
@ 2020-06-12  9:33 ` Amir Goldstein
  2020-06-29 13:27   ` Tejun Heo
  2020-06-29 16:11   ` Greg Kroah-Hartman
  2020-06-12  9:33 ` [PATCH 06/20] inotify: do not use objectid when comparing events Amir Goldstein
                   ` (14 subsequent siblings)
  19 siblings, 2 replies; 35+ messages in thread
From: Amir Goldstein @ 2020-06-12  9:33 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Tejun Heo, Greg Kroah-Hartman

When creating an FS_MODIFY event on inode itself (not on parent)
the file_name argument should be NULL.

The change to send a non NULL name to inode itself was done on purpuse
as part of another commit, as Tejun writes: "...While at it, supply the
target file name to fsnotify() from kernfs_node->name.".

But this is wrong practice and inconsistent with inotify behavior when
watching a single file.  When a child is being watched (as opposed to the
parent directory) the inotify event should contain the watch descriptor,
but not the file name.

Fixes: df6a58c5c5aa ("kernfs: don't depend on d_find_any_alias()...")
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/kernfs/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index 06b342d8462b..e23b3f62483c 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -912,7 +912,7 @@ static void kernfs_notify_workfn(struct work_struct *work)
 		}
 
 		fsnotify(inode, FS_MODIFY, inode, FSNOTIFY_EVENT_INODE,
-			 &name, 0);
+			 NULL, 0);
 		iput(inode);
 	}
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 06/20] inotify: do not use objectid when comparing events
  2020-06-12  9:33 [PATCH 00/20] Prep work for fanotify named events Amir Goldstein
                   ` (4 preceding siblings ...)
  2020-06-12  9:33 ` [PATCH 05/20] kernfs: do not call fsnotify() with name without a parent Amir Goldstein
@ 2020-06-12  9:33 ` Amir Goldstein
  2020-06-12  9:33 ` [PATCH 07/20] fanotify: create overflow event type Amir Goldstein
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Amir Goldstein @ 2020-06-12  9:33 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

inotify's event->wd is the object identifier.
Compare that instead of the common fsnotidy event objectid, so
we can get rid of the objectid field later.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/inotify/inotify_fsnotify.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
index 2ebc89047153..9b481460a2dc 100644
--- a/fs/notify/inotify/inotify_fsnotify.c
+++ b/fs/notify/inotify/inotify_fsnotify.c
@@ -39,7 +39,7 @@ static bool event_compare(struct fsnotify_event *old_fsn,
 	if (old->mask & FS_IN_IGNORED)
 		return false;
 	if ((old->mask == new->mask) &&
-	    (old_fsn->objectid == new_fsn->objectid) &&
+	    (old->wd == new->wd) &&
 	    (old->name_len == new->name_len) &&
 	    (!old->name_len || !strcmp(old->name, new->name)))
 		return true;
@@ -116,7 +116,7 @@ int inotify_handle_event(struct fsnotify_group *group,
 		mask &= ~IN_ISDIR;
 
 	fsn_event = &event->fse;
-	fsnotify_init_event(fsn_event, (unsigned long)inode);
+	fsnotify_init_event(fsn_event, 0);
 	event->mask = mask;
 	event->wd = i_mark->wd;
 	event->sync_cookie = cookie;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 07/20] fanotify: create overflow event type
  2020-06-12  9:33 [PATCH 00/20] Prep work for fanotify named events Amir Goldstein
                   ` (5 preceding siblings ...)
  2020-06-12  9:33 ` [PATCH 06/20] inotify: do not use objectid when comparing events Amir Goldstein
@ 2020-06-12  9:33 ` Amir Goldstein
  2020-06-12  9:33 ` [PATCH 08/20] fanotify: break up fanotify_alloc_event() Amir Goldstein
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Amir Goldstein @ 2020-06-12  9:33 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

The special overflow event is allocated as struct fanotify_path_event,
but with a null path.

Use a special event type to identify the overflow event, so the helper
fanotify_has_event_path() will always indicate a non null path.

Allocating the overflow event doesn't need any of the fancy stuff in
fanotify_alloc_event(), so create a simplified helper for allocating the
overflow event.

There is also no need to store and report the pid with an overflow event.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify.c      | 27 +++++++++++----------------
 fs/notify/fanotify/fanotify.h      | 15 +++++++++------
 fs/notify/fanotify/fanotify_user.c | 21 ++++++++++++++++-----
 3 files changed, 36 insertions(+), 27 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index d9fc83dd994a..921ff05e1877 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -344,11 +344,11 @@ static struct inode *fanotify_fid_inode(struct inode *to_tell, u32 event_mask,
 	return fsnotify_data_inode(data, data_type);
 }
 
-struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
-					    struct inode *inode, u32 mask,
-					    const void *data, int data_type,
-					    const struct qstr *file_name,
-					    __kernel_fsid_t *fsid)
+static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
+						struct inode *inode, u32 mask,
+						const void *data, int data_type,
+						const struct qstr *file_name,
+						__kernel_fsid_t *fsid)
 {
 	struct fanotify_event *event = NULL;
 	struct fanotify_fid_event *ffe = NULL;
@@ -426,8 +426,7 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 	 * event queue, so event reported on parent is merged with event
 	 * reported on child when both directory and child watches exist.
 	 */
-	fsnotify_init_event(&event->fse, (unsigned long)id);
-	event->mask = mask;
+	fanotify_init_event(event, (unsigned long)id, mask);
 	if (FAN_GROUP_FLAG(group, FAN_REPORT_TID))
 		event->pid = get_pid(task_pid(current));
 	else
@@ -443,15 +442,8 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 		fanotify_encode_fh(fanotify_event_dir_fh(event), id, gfp);
 
 	if (fanotify_event_has_path(event)) {
-		struct path *p = fanotify_event_path(event);
-
-		if (path) {
-			*p = *path;
-			path_get(path);
-		} else {
-			p->mnt = NULL;
-			p->dentry = NULL;
-		}
+		*fanotify_event_path(event) = *path;
+		path_get(path);
 	}
 out:
 	memalloc_unuse_memcg();
@@ -640,6 +632,9 @@ static void fanotify_free_event(struct fsnotify_event *fsn_event)
 	case FANOTIFY_EVENT_TYPE_FID_NAME:
 		fanotify_free_name_event(event);
 		break;
+	case FANOTIFY_EVENT_TYPE_OVERFLOW:
+		kfree(event);
+		break;
 	default:
 		WARN_ON_ONCE(1);
 	}
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 8ce7ccfc4b0d..1b2a3bbe6008 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -63,6 +63,7 @@ enum fanotify_event_type {
 	FANOTIFY_EVENT_TYPE_FID_NAME, /* variable length */
 	FANOTIFY_EVENT_TYPE_PATH,
 	FANOTIFY_EVENT_TYPE_PATH_PERM,
+	FANOTIFY_EVENT_TYPE_OVERFLOW, /* struct fanotify_event */
 };
 
 struct fanotify_event {
@@ -72,6 +73,14 @@ struct fanotify_event {
 	struct pid *pid;
 };
 
+static inline void fanotify_init_event(struct fanotify_event *event,
+				       unsigned long id, u32 mask)
+{
+	fsnotify_init_event(&event->fse, id);
+	event->mask = mask;
+	event->pid = NULL;
+}
+
 struct fanotify_fid_event {
 	struct fanotify_event fae;
 	__kernel_fsid_t fsid;
@@ -202,9 +211,3 @@ static inline struct path *fanotify_event_path(struct fanotify_event *event)
 	else
 		return NULL;
 }
-
-struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
-					    struct inode *inode, u32 mask,
-					    const void *data, int data_type,
-					    const struct qstr *file_name,
-					    __kernel_fsid_t *fsid);
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 63b5dffdca9e..8f3c70873598 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -831,13 +831,26 @@ static int fanotify_add_inode_mark(struct fsnotify_group *group,
 				 FSNOTIFY_OBJ_TYPE_INODE, mask, flags, fsid);
 }
 
+static struct fsnotify_event *fanotify_alloc_overflow_event(void)
+{
+	struct fanotify_event *oevent;
+
+	oevent = kmalloc(sizeof(*oevent), GFP_KERNEL_ACCOUNT);
+	if (!oevent)
+		return NULL;
+
+	fanotify_init_event(oevent, 0, FS_Q_OVERFLOW);
+	oevent->type = FANOTIFY_EVENT_TYPE_OVERFLOW;
+
+	return &oevent->fse;
+}
+
 /* fanotify syscalls */
 SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 {
 	struct fsnotify_group *group;
 	int f_flags, fd;
 	struct user_struct *user;
-	struct fanotify_event *oevent;
 
 	pr_debug("%s: flags=%x event_f_flags=%x\n",
 		 __func__, flags, event_f_flags);
@@ -892,13 +905,11 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 	atomic_inc(&user->fanotify_listeners);
 	group->memcg = get_mem_cgroup_from_mm(current->mm);
 
-	oevent = fanotify_alloc_event(group, NULL, FS_Q_OVERFLOW, NULL,
-				      FSNOTIFY_EVENT_NONE, NULL, NULL);
-	if (unlikely(!oevent)) {
+	group->overflow_event = fanotify_alloc_overflow_event();
+	if (unlikely(!group->overflow_event)) {
 		fd = -ENOMEM;
 		goto out_destroy_group;
 	}
-	group->overflow_event = &oevent->fse;
 
 	if (force_o_largefile())
 		event_f_flags |= O_LARGEFILE;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 08/20] fanotify: break up fanotify_alloc_event()
  2020-06-12  9:33 [PATCH 00/20] Prep work for fanotify named events Amir Goldstein
                   ` (6 preceding siblings ...)
  2020-06-12  9:33 ` [PATCH 07/20] fanotify: create overflow event type Amir Goldstein
@ 2020-06-12  9:33 ` Amir Goldstein
  2020-06-12  9:33 ` [PATCH 09/20] fsnotify: pass dir argument to handle_event() callback Amir Goldstein
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Amir Goldstein @ 2020-06-12  9:33 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

Break up fanotify_alloc_event() into helpers by event struct type.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify.c | 146 ++++++++++++++++++++--------------
 1 file changed, 85 insertions(+), 61 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 921ff05e1877..c4ada3501014 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -344,6 +344,77 @@ static struct inode *fanotify_fid_inode(struct inode *to_tell, u32 event_mask,
 	return fsnotify_data_inode(data, data_type);
 }
 
+struct fanotify_event *fanotify_alloc_path_event(const struct path *path,
+						 gfp_t gfp)
+{
+	struct fanotify_path_event *pevent;
+
+	pevent = kmem_cache_alloc(fanotify_path_event_cachep, gfp);
+	if (!pevent)
+		return NULL;
+
+	pevent->fae.type = FANOTIFY_EVENT_TYPE_PATH;
+	pevent->path = *path;
+	path_get(path);
+
+	return &pevent->fae;
+}
+
+struct fanotify_event *fanotify_alloc_perm_event(const struct path *path,
+						 gfp_t gfp)
+{
+	struct fanotify_perm_event *pevent;
+
+	pevent = kmem_cache_alloc(fanotify_perm_event_cachep, gfp);
+	if (!pevent)
+		return NULL;
+
+	pevent->fae.type = FANOTIFY_EVENT_TYPE_PATH_PERM;
+	pevent->response = 0;
+	pevent->state = FAN_EVENT_INIT;
+	pevent->path = *path;
+	path_get(path);
+
+	return &pevent->fae;
+}
+
+struct fanotify_event *fanotify_alloc_fid_event(struct inode *id,
+						__kernel_fsid_t *fsid,
+						gfp_t gfp)
+{
+	struct fanotify_fid_event *ffe;
+
+	ffe = kmem_cache_alloc(fanotify_fid_event_cachep, gfp);
+	if (!ffe)
+		return NULL;
+
+	ffe->fae.type = FANOTIFY_EVENT_TYPE_FID;
+	ffe->fsid = *fsid;
+	fanotify_encode_fh(&ffe->object_fh, id, gfp);
+
+	return &ffe->fae;
+}
+
+struct fanotify_event *fanotify_alloc_name_event(struct inode *id,
+						 __kernel_fsid_t *fsid,
+						 const struct qstr *file_name,
+						 gfp_t gfp)
+{
+	struct fanotify_name_event *fne;
+
+	fne = kmalloc(sizeof(*fne) + file_name->len + 1, gfp);
+	if (!fne)
+		return NULL;
+
+	fne->fae.type = FANOTIFY_EVENT_TYPE_FID_NAME;
+	fne->fsid = *fsid;
+	fanotify_encode_fh(&fne->dir_fh, id, gfp);
+	fne->name_len = file_name->len;
+	strcpy(fne->name, file_name->name);
+
+	return &fne->fae;
+}
+
 static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 						struct inode *inode, u32 mask,
 						const void *data, int data_type,
@@ -351,8 +422,6 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 						__kernel_fsid_t *fsid)
 {
 	struct fanotify_event *event = NULL;
-	struct fanotify_fid_event *ffe = NULL;
-	struct fanotify_name_event *fne = NULL;
 	gfp_t gfp = GFP_KERNEL_ACCOUNT;
 	struct inode *id = fanotify_fid_inode(inode, mask, data, data_type);
 	const struct path *path = fsnotify_data_path(data, data_type);
@@ -372,55 +441,23 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 	memalloc_use_memcg(group->memcg);
 
 	if (fanotify_is_perm_event(mask)) {
-		struct fanotify_perm_event *pevent;
-
-		pevent = kmem_cache_alloc(fanotify_perm_event_cachep, gfp);
-		if (!pevent)
-			goto out;
-
-		event = &pevent->fae;
-		event->type = FANOTIFY_EVENT_TYPE_PATH_PERM;
-		pevent->response = 0;
-		pevent->state = FAN_EVENT_INIT;
-		goto init;
-	}
-
-	/*
-	 * For FAN_DIR_MODIFY event, we report the fid of the directory and
-	 * the name of the modified entry.
-	 * Allocate an fanotify_name_event struct and copy the name.
-	 */
-	if (mask & FAN_DIR_MODIFY && !(WARN_ON_ONCE(!file_name))) {
-		fne = kmalloc(sizeof(*fne) + file_name->len + 1, gfp);
-		if (!fne)
-			goto out;
-
-		event = &fne->fae;
-		event->type = FANOTIFY_EVENT_TYPE_FID_NAME;
-		fne->name_len = file_name->len;
-		strcpy(fne->name, file_name->name);
-		goto init;
-	}
-
-	if (FAN_GROUP_FLAG(group, FAN_REPORT_FID)) {
-		ffe = kmem_cache_alloc(fanotify_fid_event_cachep, gfp);
-		if (!ffe)
-			goto out;
-
-		event = &ffe->fae;
-		event->type = FANOTIFY_EVENT_TYPE_FID;
+		event = fanotify_alloc_perm_event(path, gfp);
+	} else if (mask & FAN_DIR_MODIFY && !(WARN_ON_ONCE(!file_name))) {
+		/*
+		 * For FAN_DIR_MODIFY event, we report the fid of the directory
+		 * and the name of the modified entry.
+		 * Allocate an fanotify_name_event struct and copy the name.
+		 */
+		event = fanotify_alloc_name_event(id, fsid, file_name, gfp);
+	} else if (FAN_GROUP_FLAG(group, FAN_REPORT_FID)) {
+		event = fanotify_alloc_fid_event(id, fsid, gfp);
 	} else {
-		struct fanotify_path_event *pevent;
-
-		pevent = kmem_cache_alloc(fanotify_path_event_cachep, gfp);
-		if (!pevent)
-			goto out;
-
-		event = &pevent->fae;
-		event->type = FANOTIFY_EVENT_TYPE_PATH;
+		event = fanotify_alloc_path_event(path, gfp);
 	}
 
-init:
+	if (!event)
+		goto out;
+
 	/*
 	 * Use the victim inode instead of the watching inode as the id for
 	 * event queue, so event reported on parent is merged with event
@@ -432,19 +469,6 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 	else
 		event->pid = get_pid(task_tgid(current));
 
-	if (fsid && fanotify_event_fsid(event))
-		*fanotify_event_fsid(event) = *fsid;
-
-	if (fanotify_event_object_fh(event))
-		fanotify_encode_fh(fanotify_event_object_fh(event), id, gfp);
-
-	if (fanotify_event_dir_fh(event))
-		fanotify_encode_fh(fanotify_event_dir_fh(event), id, gfp);
-
-	if (fanotify_event_has_path(event)) {
-		*fanotify_event_path(event) = *path;
-		path_get(path);
-	}
 out:
 	memalloc_unuse_memcg();
 	return event;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 09/20] fsnotify: pass dir argument to handle_event() callback
  2020-06-12  9:33 [PATCH 00/20] Prep work for fanotify named events Amir Goldstein
                   ` (7 preceding siblings ...)
  2020-06-12  9:33 ` [PATCH 08/20] fanotify: break up fanotify_alloc_event() Amir Goldstein
@ 2020-06-12  9:33 ` Amir Goldstein
  2020-07-03 14:49   ` Jan Kara
  2020-06-12  9:33 ` [PATCH 10/20] fanotify: generalize the handling of extra event flags Amir Goldstein
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 35+ messages in thread
From: Amir Goldstein @ 2020-06-12  9:33 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

The 'inode' argument to handle_event(), sometimes referred to as
'to_tell' is somewhat obsolete.
It is a remnant from the times when a group could only have an inode mark
associated with an event.

We now pass an iter_info array to the callback, with all marks associated
with an event.

Most backends ignore this argument, with two expections:
1. dnotify uses it for sanity check that event is on directory
2. fanotify uses it to report fid of directory on directory entry
   modification events

Remove the 'inode' argument and add a 'dir' argument.
The callback function signature is deliberately changed, because
the meaning of the argument has changed and the arguments have
been documented.

The 'dir' argument is NULL when "sending" to a non-dir inode.
When 'file_name' argument is non NULL, 'dir' is always referring to
the directory that the 'file_name' entry belongs to.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/nfsd/filecache.c                  |  6 +++---
 fs/notify/dnotify/dnotify.c          |  8 ++++----
 fs/notify/fanotify/fanotify.c        | 23 +++++++++++------------
 fs/notify/fsnotify.c                 | 26 ++++++++++++--------------
 fs/notify/inotify/inotify.h          |  6 +++---
 fs/notify/inotify/inotify_fsnotify.c |  7 +++----
 fs/notify/inotify/inotify_user.c     |  4 ++--
 include/linux/fsnotify_backend.h     | 19 ++++++++++++++++---
 kernel/audit_fsnotify.c              | 10 +++++-----
 kernel/audit_tree.c                  |  6 +++---
 kernel/audit_watch.c                 |  6 +++---
 11 files changed, 65 insertions(+), 56 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index ace8e5c30952..bbc7892d2928 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -598,9 +598,9 @@ static struct notifier_block nfsd_file_lease_notifier = {
 };
 
 static int
-nfsd_file_fsnotify_handle_event(struct fsnotify_group *group,
-				struct inode *to_tell,
-				u32 mask, const void *data, int data_type,
+nfsd_file_fsnotify_handle_event(struct fsnotify_group *group, u32 mask,
+				const void *data, int data_type,
+				struct inode *dir,
 				const struct qstr *file_name, u32 cookie,
 				struct fsnotify_iter_info *iter_info)
 {
diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c
index 7a42c2ebe28d..608c3e70e81f 100644
--- a/fs/notify/dnotify/dnotify.c
+++ b/fs/notify/dnotify/dnotify.c
@@ -70,9 +70,9 @@ static void dnotify_recalc_inode_mask(struct fsnotify_mark *fsn_mark)
  * destroy the dnotify struct if it was not registered to receive multiple
  * events.
  */
-static int dnotify_handle_event(struct fsnotify_group *group,
-				struct inode *inode,
-				u32 mask, const void *data, int data_type,
+static int dnotify_handle_event(struct fsnotify_group *group, u32 mask,
+				const void *data, int data_type,
+				struct inode *dir,
 				const struct qstr *file_name, u32 cookie,
 				struct fsnotify_iter_info *iter_info)
 {
@@ -84,7 +84,7 @@ static int dnotify_handle_event(struct fsnotify_group *group,
 	__u32 test_mask = mask & ~FS_EVENT_ON_CHILD;
 
 	/* not a dir, dnotify doesn't care */
-	if (!S_ISDIR(inode->i_mode))
+	if (!dir)
 		return 0;
 
 	if (WARN_ON(fsnotify_iter_vfsmount_mark(iter_info)))
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index c4ada3501014..e68a9fad98bd 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -335,11 +335,11 @@ static void fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
  * FS_ATTRIB reports the child inode even if reported on a watched parent.
  * FS_CREATE reports the modified dir inode and not the created inode.
  */
-static struct inode *fanotify_fid_inode(struct inode *to_tell, u32 event_mask,
-					const void *data, int data_type)
+static struct inode *fanotify_fid_inode(u32 event_mask, const void *data,
+					int data_type, struct inode *dir)
 {
 	if (event_mask & ALL_FSNOTIFY_DIRENT_EVENTS)
-		return to_tell;
+		return dir;
 
 	return fsnotify_data_inode(data, data_type);
 }
@@ -416,14 +416,14 @@ struct fanotify_event *fanotify_alloc_name_event(struct inode *id,
 }
 
 static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
-						struct inode *inode, u32 mask,
-						const void *data, int data_type,
+						u32 mask, const void *data,
+						int data_type, struct inode *dir,
 						const struct qstr *file_name,
 						__kernel_fsid_t *fsid)
 {
 	struct fanotify_event *event = NULL;
 	gfp_t gfp = GFP_KERNEL_ACCOUNT;
-	struct inode *id = fanotify_fid_inode(inode, mask, data, data_type);
+	struct inode *id = fanotify_fid_inode(mask, data, data_type, dir);
 	const struct path *path = fsnotify_data_path(data, data_type);
 
 	/*
@@ -507,9 +507,9 @@ static __kernel_fsid_t fanotify_get_fsid(struct fsnotify_iter_info *iter_info)
 	return fsid;
 }
 
-static int fanotify_handle_event(struct fsnotify_group *group,
-				 struct inode *inode,
-				 u32 mask, const void *data, int data_type,
+static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
+				 const void *data, int data_type,
+				 struct inode *dir,
 				 const struct qstr *file_name, u32 cookie,
 				 struct fsnotify_iter_info *iter_info)
 {
@@ -546,8 +546,7 @@ static int fanotify_handle_event(struct fsnotify_group *group,
 	if (!mask)
 		return 0;
 
-	pr_debug("%s: group=%p inode=%p mask=%x\n", __func__, group, inode,
-		 mask);
+	pr_debug("%s: group=%p mask=%x\n", __func__, group, mask);
 
 	if (fanotify_is_perm_event(mask)) {
 		/*
@@ -565,7 +564,7 @@ static int fanotify_handle_event(struct fsnotify_group *group,
 			return 0;
 	}
 
-	event = fanotify_alloc_event(group, inode, mask, data, data_type,
+	event = fanotify_alloc_event(group, mask, data, data_type, dir,
 				     file_name, &fsid);
 	ret = -ENOMEM;
 	if (unlikely(!event)) {
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 30628a72ca01..e05f3b2cf664 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -185,11 +185,9 @@ int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
 }
 EXPORT_SYMBOL_GPL(__fsnotify_parent);
 
-static int send_to_group(struct inode *to_tell,
-			 __u32 mask, const void *data,
-			 int data_is, u32 cookie,
-			 const struct qstr *file_name,
-			 struct fsnotify_iter_info *iter_info)
+static int send_to_group(__u32 mask, const void *data, int data_type,
+			 struct inode *dir, const struct qstr *file_name,
+			 u32 cookie, struct fsnotify_iter_info *iter_info)
 {
 	struct fsnotify_group *group = NULL;
 	__u32 test_mask = (mask & ALL_FSNOTIFY_EVENTS);
@@ -225,15 +223,14 @@ static int send_to_group(struct inode *to_tell,
 		}
 	}
 
-	pr_debug("%s: group=%p to_tell=%p mask=%x marks_mask=%x marks_ignored_mask=%x"
-		 " data=%p data_is=%d cookie=%d\n",
-		 __func__, group, to_tell, mask, marks_mask, marks_ignored_mask,
-		 data, data_is, cookie);
+	pr_debug("%s: group=%p mask=%x marks_mask=%x marks_ignored_mask=%x data=%p data_type=%d dir=%p cookie=%d\n",
+		 __func__, group, mask, marks_mask, marks_ignored_mask,
+		 data, data_type, dir, cookie);
 
 	if (!(test_mask & marks_mask & ~marks_ignored_mask))
 		return 0;
 
-	return group->ops->handle_event(group, to_tell, mask, data, data_is,
+	return group->ops->handle_event(group, mask, data, data_type, dir,
 					file_name, cookie, iter_info);
 }
 
@@ -317,12 +314,13 @@ static void fsnotify_iter_next(struct fsnotify_iter_info *iter_info)
  * out to all of the registered fsnotify_group.  Those groups can then use the
  * notification event in whatever means they feel necessary.
  */
-int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
+int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_type,
 	     const struct qstr *file_name, u32 cookie)
 {
-	const struct path *path = fsnotify_data_path(data, data_is);
+	const struct path *path = fsnotify_data_path(data, data_type);
 	struct fsnotify_iter_info iter_info = {};
 	struct super_block *sb = to_tell->i_sb;
+	struct inode *dir = S_ISDIR(to_tell->i_mode) ? to_tell : NULL;
 	struct mount *mnt = NULL;
 	int ret = 0;
 	__u32 test_mask, marks_mask;
@@ -375,8 +373,8 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
 	 * That's why this traversal is so complicated...
 	 */
 	while (fsnotify_iter_select_report_types(&iter_info)) {
-		ret = send_to_group(to_tell, mask, data, data_is, cookie,
-				    file_name, &iter_info);
+		ret = send_to_group(mask, data, data_type, dir, file_name,
+				    cookie, &iter_info);
 
 		if (ret && (mask & ALL_FSNOTIFY_PERM_EVENTS))
 			goto out;
diff --git a/fs/notify/inotify/inotify.h b/fs/notify/inotify/inotify.h
index 3f246f7b8a92..4327d0e9c364 100644
--- a/fs/notify/inotify/inotify.h
+++ b/fs/notify/inotify/inotify.h
@@ -24,9 +24,9 @@ static inline struct inotify_event_info *INOTIFY_E(struct fsnotify_event *fse)
 
 extern void inotify_ignored_and_remove_idr(struct fsnotify_mark *fsn_mark,
 					   struct fsnotify_group *group);
-extern int inotify_handle_event(struct fsnotify_group *group,
-				struct inode *inode,
-				u32 mask, const void *data, int data_type,
+extern int inotify_handle_event(struct fsnotify_group *group, u32 mask,
+				const void *data, int data_type,
+				struct inode *dir,
 				const struct qstr *file_name, u32 cookie,
 				struct fsnotify_iter_info *iter_info);
 
diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
index 9b481460a2dc..dfd455798a1b 100644
--- a/fs/notify/inotify/inotify_fsnotify.c
+++ b/fs/notify/inotify/inotify_fsnotify.c
@@ -55,9 +55,8 @@ static int inotify_merge(struct list_head *list,
 	return event_compare(last_event, event);
 }
 
-int inotify_handle_event(struct fsnotify_group *group,
-			 struct inode *inode,
-			 u32 mask, const void *data, int data_type,
+int inotify_handle_event(struct fsnotify_group *group, u32 mask,
+			 const void *data, int data_type, struct inode *dir,
 			 const struct qstr *file_name, u32 cookie,
 			 struct fsnotify_iter_info *iter_info)
 {
@@ -82,7 +81,7 @@ int inotify_handle_event(struct fsnotify_group *group,
 		alloc_len += len + 1;
 	}
 
-	pr_debug("%s: group=%p inode=%p mask=%x\n", __func__, group, inode,
+	pr_debug("%s: group=%p mark=%p mask=%x\n", __func__, group, inode_mark,
 		 mask);
 
 	i_mark = container_of(inode_mark, struct inotify_inode_mark,
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index f88bbcc9efeb..5385d5817dd9 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -490,8 +490,8 @@ void inotify_ignored_and_remove_idr(struct fsnotify_mark *fsn_mark,
 					   fsn_mark);
 
 	/* Queue ignore event for the watch */
-	inotify_handle_event(group, NULL, FS_IN_IGNORED, NULL,
-			     FSNOTIFY_EVENT_NONE, NULL, 0, &iter_info);
+	inotify_handle_event(group, FS_IN_IGNORED, NULL, FSNOTIFY_EVENT_NONE,
+			     NULL, NULL, 0, &iter_info);
 
 	i_mark = container_of(fsn_mark, struct inotify_inode_mark, fsn_mark);
 	/* remove this mark from the idr */
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 97300f3b8ff0..f5dd6a03f869 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -108,16 +108,29 @@ struct mem_cgroup;
  * these operations for each relevant group.
  *
  * handle_event - main call for a group to handle an fs event
+ * @group:	group to notify
+ * @mask:	event type and flags
+ * @data:	object that event happened on
+ * @data_type:	type of object for fanotify_data_XXX() accessors
+ * @dir:	optional directory associated with event -
+ *		if @file_name is not NULL, this is the directory that
+ *		@file_name is relative to. Otherwise, @dir is the object
+ *		inode if event happened on directory and NULL if event
+ *		happenned on a non-directory.
+ * @file_name:	optional file name associated with event
+ * @cookie:	inotify rename cookie
+ * @iter_info:	array of marks from this group that are interested in the event
+ *
  * free_group_priv - called when a group refcnt hits 0 to clean up the private union
  * freeing_mark - called when a mark is being destroyed for some reason.  The group
  * 		MUST be holding a reference on each mark and that reference must be
  * 		dropped in this function.  inotify uses this function to send
  * 		userspace messages that marks have been removed.
+ *
  */
 struct fsnotify_ops {
-	int (*handle_event)(struct fsnotify_group *group,
-			    struct inode *inode,
-			    u32 mask, const void *data, int data_type,
+	int (*handle_event)(struct fsnotify_group *group, u32 mask,
+			    const void *data, int data_type, struct inode *dir,
 			    const struct qstr *file_name, u32 cookie,
 			    struct fsnotify_iter_info *iter_info);
 	void (*free_group_priv)(struct fsnotify_group *group);
diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c
index 3596448bfdab..30ca239285a3 100644
--- a/kernel/audit_fsnotify.c
+++ b/kernel/audit_fsnotify.c
@@ -152,11 +152,11 @@ static void audit_autoremove_mark_rule(struct audit_fsnotify_mark *audit_mark)
 }
 
 /* Update mark data in audit rules based on fsnotify events. */
-static int audit_mark_handle_event(struct fsnotify_group *group,
-				    struct inode *to_tell,
-				    u32 mask, const void *data, int data_type,
-				    const struct qstr *dname, u32 cookie,
-				    struct fsnotify_iter_info *iter_info)
+static int audit_mark_handle_event(struct fsnotify_group *group, u32 mask,
+				   const void *data, int data_type,
+				   struct inode *dir,
+				   const struct qstr *dname, u32 cookie,
+				   struct fsnotify_iter_info *iter_info)
 {
 	struct fsnotify_mark *inode_mark = fsnotify_iter_inode_mark(iter_info);
 	struct audit_fsnotify_mark *audit_mark;
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index e49c912f862d..2ce2ac1ce100 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -1037,9 +1037,9 @@ static void evict_chunk(struct audit_chunk *chunk)
 		audit_schedule_prune();
 }
 
-static int audit_tree_handle_event(struct fsnotify_group *group,
-				   struct inode *to_tell,
-				   u32 mask, const void *data, int data_type,
+static int audit_tree_handle_event(struct fsnotify_group *group, u32 mask,
+				   const void *data, int data_type,
+				   struct inode *dir,
 				   const struct qstr *file_name, u32 cookie,
 				   struct fsnotify_iter_info *iter_info)
 {
diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
index e09c551ae52d..61fd601f1edf 100644
--- a/kernel/audit_watch.c
+++ b/kernel/audit_watch.c
@@ -464,9 +464,9 @@ void audit_remove_watch_rule(struct audit_krule *krule)
 }
 
 /* Update watch data in audit rules based on fsnotify events. */
-static int audit_watch_handle_event(struct fsnotify_group *group,
-				    struct inode *to_tell,
-				    u32 mask, const void *data, int data_type,
+static int audit_watch_handle_event(struct fsnotify_group *group, u32 mask,
+				    const void *data, int data_type,
+				    struct inode *dir,
 				    const struct qstr *dname, u32 cookie,
 				    struct fsnotify_iter_info *iter_info)
 {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 10/20] fanotify: generalize the handling of extra event flags
  2020-06-12  9:33 [PATCH 00/20] Prep work for fanotify named events Amir Goldstein
                   ` (8 preceding siblings ...)
  2020-06-12  9:33 ` [PATCH 09/20] fsnotify: pass dir argument to handle_event() callback Amir Goldstein
@ 2020-06-12  9:33 ` Amir Goldstein
  2020-06-12  9:33 ` [PATCH 11/20] fanotify: generalize merge logic of events on dir Amir Goldstein
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Amir Goldstein @ 2020-06-12  9:33 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

In fanotify_group_event_mask() there is logic in place to make sure we
are not going to handle an event with no type and just FAN_ONDIR flag.
Generalize this logic to any FANOTIFY_EVENT_FLAGS.

There is only one more flag in this group at the moment -
FAN_EVENT_ON_CHILD. We never report it to user, but we do pass it in to
fanotify_alloc_event() when group is reporting fid as indication that
event happened on child. We will have use for this indication later on.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index e68a9fad98bd..f5997186c8b8 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -211,7 +211,8 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
 				     int data_type)
 {
 	__u32 marks_mask = 0, marks_ignored_mask = 0;
-	__u32 test_mask, user_mask = FANOTIFY_OUTGOING_EVENTS;
+	__u32 test_mask, user_mask = FANOTIFY_OUTGOING_EVENTS |
+				     FANOTIFY_EVENT_FLAGS;
 	const struct path *path = fsnotify_data_path(data, data_type);
 	struct fsnotify_mark *mark;
 	int type;
@@ -265,13 +266,17 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
 	 * For backward compatibility and consistency, do not report FAN_ONDIR
 	 * to user in legacy fanotify mode (reporting fd) and report FAN_ONDIR
 	 * to user in FAN_REPORT_FID mode for all event types.
+	 *
+	 * We never report FAN_EVENT_ON_CHILD to user, but we do pass it in to
+	 * fanotify_alloc_event() when group is reporting fid as indication
+	 * that event happened on child.
 	 */
 	if (FAN_GROUP_FLAG(group, FAN_REPORT_FID)) {
-		/* Do not report FAN_ONDIR without any event */
-		if (!(test_mask & ~FAN_ONDIR))
+		/* Do not report event flags without any event */
+		if (!(test_mask & ~FANOTIFY_EVENT_FLAGS))
 			return 0;
 	} else {
-		user_mask &= ~FAN_ONDIR;
+		user_mask &= ~FANOTIFY_EVENT_FLAGS;
 	}
 
 	return test_mask & user_mask;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 11/20] fanotify: generalize merge logic of events on dir
  2020-06-12  9:33 [PATCH 00/20] Prep work for fanotify named events Amir Goldstein
                   ` (9 preceding siblings ...)
  2020-06-12  9:33 ` [PATCH 10/20] fanotify: generalize the handling of extra event flags Amir Goldstein
@ 2020-06-12  9:33 ` Amir Goldstein
  2020-06-12  9:33 ` [PATCH 12/20] fanotify: distinguish between fid encode error and null fid Amir Goldstein
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Amir Goldstein @ 2020-06-12  9:33 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

An event on directory should never be merged with an event on
non-directory regardless of the event struct type.

This change has no visible effect, because currently, with struct
fanotify_path_event, the relevant events will not be merged because
event path of dir will be different than event path of non-dir.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index f5997186c8b8..63865c5373e5 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -83,22 +83,22 @@ static bool fanotify_should_merge(struct fsnotify_event *old_fsn,
 	    old->type != new->type || old->pid != new->pid)
 		return false;
 
+	/*
+	 * We want to merge many dirent events in the same dir (i.e.
+	 * creates/unlinks/renames), but we do not want to merge dirent
+	 * events referring to subdirs with dirent events referring to
+	 * non subdirs, otherwise, user won't be able to tell from a
+	 * mask FAN_CREATE|FAN_DELETE|FAN_ONDIR if it describes mkdir+
+	 * unlink pair or rmdir+create pair of events.
+	 */
+	if ((old->mask & FS_ISDIR) != (new->mask & FS_ISDIR))
+		return false;
+
 	switch (old->type) {
 	case FANOTIFY_EVENT_TYPE_PATH:
 		return fanotify_path_equal(fanotify_event_path(old),
 					   fanotify_event_path(new));
 	case FANOTIFY_EVENT_TYPE_FID:
-		/*
-		 * We want to merge many dirent events in the same dir (i.e.
-		 * creates/unlinks/renames), but we do not want to merge dirent
-		 * events referring to subdirs with dirent events referring to
-		 * non subdirs, otherwise, user won't be able to tell from a
-		 * mask FAN_CREATE|FAN_DELETE|FAN_ONDIR if it describes mkdir+
-		 * unlink pair or rmdir+create pair of events.
-		 */
-		if ((old->mask & FS_ISDIR) != (new->mask & FS_ISDIR))
-			return false;
-
 		return fanotify_fid_event_equal(FANOTIFY_FE(old),
 						FANOTIFY_FE(new));
 	case FANOTIFY_EVENT_TYPE_FID_NAME:
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 12/20] fanotify: distinguish between fid encode error and null fid
  2020-06-12  9:33 [PATCH 00/20] Prep work for fanotify named events Amir Goldstein
                   ` (10 preceding siblings ...)
  2020-06-12  9:33 ` [PATCH 11/20] fanotify: generalize merge logic of events on dir Amir Goldstein
@ 2020-06-12  9:33 ` Amir Goldstein
  2020-06-12  9:33 ` [PATCH 13/20] fanotify: generalize test for FAN_REPORT_FID Amir Goldstein
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Amir Goldstein @ 2020-06-12  9:33 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

In fanotify_encode_fh(), both cases of NULL inode and failure to encode
ended up with fh type FILEID_INVALID.

Distiguish the case of NULL inode, by setting fh type to FILEID_ROOT.
This is just a semantic difference at this point.

Remove stale comment and unneeded check from fid event compare helpers.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 63865c5373e5..a982594ebca7 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -34,10 +34,6 @@ static bool fanotify_fh_equal(struct fanotify_fh *fh1,
 	if (fh1->type != fh2->type || fh1->len != fh2->len)
 		return false;
 
-	/* Do not merge events if we failed to encode fh */
-	if (fh1->type == FILEID_INVALID)
-		return false;
-
 	return !fh1->len ||
 		!memcmp(fanotify_fh_buf(fh1), fanotify_fh_buf(fh2), fh1->len);
 }
@@ -56,10 +52,7 @@ static bool fanotify_fid_event_equal(struct fanotify_fid_event *ffe1,
 static bool fanotify_name_event_equal(struct fanotify_name_event *fne1,
 				      struct fanotify_name_event *fne2)
 {
-	/*
-	 * Do not merge name events without dir fh.
-	 * FAN_DIR_MODIFY does not encode object fh, so it may be empty.
-	 */
+	/* Do not merge name events without dir fh */
 	if (!fne1->dir_fh.len)
 		return false;
 
@@ -290,8 +283,10 @@ static void fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
 	void *buf = fh->buf;
 	int err;
 
+	fh->type = FILEID_ROOT;
+	fh->len = 0;
 	if (!inode)
-		goto out;
+		return;
 
 	dwords = 0;
 	err = -ENOENT;
@@ -326,7 +321,6 @@ static void fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
 			    type, bytes, err);
 	kfree(ext_buf);
 	*fanotify_fh_ext_buf_ptr(fh) = NULL;
-out:
 	/* Report the event without a file identifier on encode error */
 	fh->type = FILEID_INVALID;
 	fh->len = 0;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 13/20] fanotify: generalize test for FAN_REPORT_FID
  2020-06-12  9:33 [PATCH 00/20] Prep work for fanotify named events Amir Goldstein
                   ` (11 preceding siblings ...)
  2020-06-12  9:33 ` [PATCH 12/20] fanotify: distinguish between fid encode error and null fid Amir Goldstein
@ 2020-06-12  9:33 ` Amir Goldstein
  2020-06-12  9:33 ` [PATCH 14/20] fanotify: mask out special event flags from ignored mask Amir Goldstein
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Amir Goldstein @ 2020-06-12  9:33 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

As preparation to new flags that report fids, define a bit set
of flags for a group reporting fids, currently containing the
only bit FAN_REPORT_FID.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify.c      | 12 +++++++-----
 fs/notify/fanotify/fanotify_user.c | 12 ++++++------
 include/linux/fanotify.h           |  6 ++++--
 3 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index a982594ebca7..3a82ddb63196 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -207,13 +207,14 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
 	__u32 test_mask, user_mask = FANOTIFY_OUTGOING_EVENTS |
 				     FANOTIFY_EVENT_FLAGS;
 	const struct path *path = fsnotify_data_path(data, data_type);
+	unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
 	struct fsnotify_mark *mark;
 	int type;
 
 	pr_debug("%s: report_mask=%x mask=%x data=%p data_type=%d\n",
 		 __func__, iter_info->report_mask, event_mask, data, data_type);
 
-	if (!FAN_GROUP_FLAG(group, FAN_REPORT_FID)) {
+	if (!fid_mode) {
 		/* Do we have path to open a file descriptor? */
 		if (!path)
 			return 0;
@@ -258,13 +259,13 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
 	 *
 	 * For backward compatibility and consistency, do not report FAN_ONDIR
 	 * to user in legacy fanotify mode (reporting fd) and report FAN_ONDIR
-	 * to user in FAN_REPORT_FID mode for all event types.
+	 * to user in fid mode for all event types.
 	 *
 	 * We never report FAN_EVENT_ON_CHILD to user, but we do pass it in to
 	 * fanotify_alloc_event() when group is reporting fid as indication
 	 * that event happened on child.
 	 */
-	if (FAN_GROUP_FLAG(group, FAN_REPORT_FID)) {
+	if (fid_mode) {
 		/* Do not report event flags without any event */
 		if (!(test_mask & ~FANOTIFY_EVENT_FLAGS))
 			return 0;
@@ -424,6 +425,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 	gfp_t gfp = GFP_KERNEL_ACCOUNT;
 	struct inode *id = fanotify_fid_inode(mask, data, data_type, dir);
 	const struct path *path = fsnotify_data_path(data, data_type);
+	unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
 
 	/*
 	 * For queues with unlimited length lost events are not expected and
@@ -448,7 +450,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 		 * Allocate an fanotify_name_event struct and copy the name.
 		 */
 		event = fanotify_alloc_name_event(id, fsid, file_name, gfp);
-	} else if (FAN_GROUP_FLAG(group, FAN_REPORT_FID)) {
+	} else if (fid_mode) {
 		event = fanotify_alloc_fid_event(id, fsid, gfp);
 	} else {
 		event = fanotify_alloc_path_event(path, gfp);
@@ -556,7 +558,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
 			return 0;
 	}
 
-	if (FAN_GROUP_FLAG(group, FAN_REPORT_FID)) {
+	if (FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS)) {
 		fsid = fanotify_get_fsid(iter_info);
 		/* Racing with mark destruction or creation? */
 		if (!fsid.val[0] && !fsid.val[1])
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 8f3c70873598..92bb885b98b6 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -100,7 +100,7 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group,
 	if (fsnotify_notify_queue_is_empty(group))
 		goto out;
 
-	if (FAN_GROUP_FLAG(group, FAN_REPORT_FID)) {
+	if (FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS)) {
 		event_size += fanotify_event_info_len(
 			FANOTIFY_E(fsnotify_peek_first_event(group)));
 	}
@@ -877,7 +877,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 		return -EINVAL;
 	}
 
-	if ((flags & FAN_REPORT_FID) &&
+	if ((flags & FANOTIFY_FID_BITS) &&
 	    (flags & FANOTIFY_CLASS_BITS) != FAN_CLASS_NOTIF)
 		return -EINVAL;
 
@@ -1035,7 +1035,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	__kernel_fsid_t __fsid, *fsid = NULL;
 	u32 valid_mask = FANOTIFY_EVENTS | FANOTIFY_EVENT_FLAGS;
 	unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS;
-	unsigned int obj_type;
+	unsigned int obj_type, fid_mode;
 	int ret;
 
 	pr_debug("%s: fanotify_fd=%d flags=%x dfd=%d pathname=%p mask=%llx\n",
@@ -1108,9 +1108,9 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	 * inode events are not supported on a mount mark, because they do not
 	 * carry enough information (i.e. path) to be filtered by mount point.
 	 */
+	fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
 	if (mask & FANOTIFY_INODE_EVENTS &&
-	    (!FAN_GROUP_FLAG(group, FAN_REPORT_FID) ||
-	     mark_type == FAN_MARK_MOUNT))
+	    (!fid_mode || mark_type == FAN_MARK_MOUNT))
 		goto fput_and_out;
 
 	if (flags & FAN_MARK_FLUSH) {
@@ -1135,7 +1135,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 			goto path_put_and_out;
 	}
 
-	if (FAN_GROUP_FLAG(group, FAN_REPORT_FID)) {
+	if (fid_mode) {
 		ret = fanotify_test_fid(&path, &__fsid);
 		if (ret)
 			goto path_put_and_out;
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index b79fa9bb7359..bbbee11d2521 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -18,8 +18,10 @@
 #define FANOTIFY_CLASS_BITS	(FAN_CLASS_NOTIF | FAN_CLASS_CONTENT | \
 				 FAN_CLASS_PRE_CONTENT)
 
-#define FANOTIFY_INIT_FLAGS	(FANOTIFY_CLASS_BITS | \
-				 FAN_REPORT_TID | FAN_REPORT_FID | \
+#define FANOTIFY_FID_BITS	(FAN_REPORT_FID)
+
+#define FANOTIFY_INIT_FLAGS	(FANOTIFY_CLASS_BITS | FANOTIFY_FID_BITS | \
+				 FAN_REPORT_TID | \
 				 FAN_CLOEXEC | FAN_NONBLOCK | \
 				 FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS)
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 14/20] fanotify: mask out special event flags from ignored mask
  2020-06-12  9:33 [PATCH 00/20] Prep work for fanotify named events Amir Goldstein
                   ` (12 preceding siblings ...)
  2020-06-12  9:33 ` [PATCH 13/20] fanotify: generalize test for FAN_REPORT_FID Amir Goldstein
@ 2020-06-12  9:33 ` Amir Goldstein
  2020-06-12  9:33 ` [PATCH 15/20] fanotify: prepare for implicit event flags in mark mask Amir Goldstein
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Amir Goldstein @ 2020-06-12  9:33 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

The special event flags (FAN_ONDIR, FAN_EVENT_ON_CHILD) never had
any meaning in ignored mask. Mask them out explicitly.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify_user.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 92bb885b98b6..27bbd67270d8 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1035,6 +1035,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	__kernel_fsid_t __fsid, *fsid = NULL;
 	u32 valid_mask = FANOTIFY_EVENTS | FANOTIFY_EVENT_FLAGS;
 	unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS;
+	bool ignored = flags & FAN_MARK_IGNORED_MASK;
 	unsigned int obj_type, fid_mode;
 	int ret;
 
@@ -1082,6 +1083,10 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	if (mask & ~valid_mask)
 		return -EINVAL;
 
+	/* Event flags (ONDIR, ON_CHILD) are meaningless in ignored mask */
+	if (ignored)
+		mask &= ~FANOTIFY_EVENT_FLAGS;
+
 	f = fdget(fanotify_fd);
 	if (unlikely(!f.file))
 		return -EBADF;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 15/20] fanotify: prepare for implicit event flags in mark mask
  2020-06-12  9:33 [PATCH 00/20] Prep work for fanotify named events Amir Goldstein
                   ` (13 preceding siblings ...)
  2020-06-12  9:33 ` [PATCH 14/20] fanotify: mask out special event flags from ignored mask Amir Goldstein
@ 2020-06-12  9:33 ` Amir Goldstein
  2020-06-12  9:33 ` [PATCH 16/20] fanotify: use FAN_EVENT_ON_CHILD as implicit flag on sb/mount/non-dir marks Amir Goldstein
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Amir Goldstein @ 2020-06-12  9:33 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

So far, all flags that can be set in an fanotify mark mask can be set
explicitly by a call to fanotify_mark(2).

Prepare for defining implicit event flags that cannot be set by user with
fanotify_mark(2), similar to how inotify/dnotify implicitly set the
FS_EVENT_ON_CHILD flag.

Implicit event flags cannot be removed by user and mark gets destroyed
when only implicit event flags remain in the mask.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify_user.c | 40 ++++++++++++++++++------------
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 27bbd67270d8..66d663baa4a6 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -651,12 +651,13 @@ static int fanotify_find_path(int dfd, const char __user *filename,
 }
 
 static __u32 fanotify_mark_remove_from_mask(struct fsnotify_mark *fsn_mark,
-					    __u32 mask,
-					    unsigned int flags,
-					    int *destroy)
+					    __u32 mask, unsigned int flags,
+					    __u32 umask, int *destroy)
 {
 	__u32 oldmask = 0;
 
+	/* umask bits cannot be removed by user */
+	mask &= ~umask;
 	spin_lock(&fsn_mark->lock);
 	if (!(flags & FAN_MARK_IGNORED_MASK)) {
 		oldmask = fsn_mark->mask;
@@ -664,7 +665,13 @@ static __u32 fanotify_mark_remove_from_mask(struct fsnotify_mark *fsn_mark,
 	} else {
 		fsn_mark->ignored_mask &= ~mask;
 	}
-	*destroy = !(fsn_mark->mask | fsn_mark->ignored_mask);
+	/*
+	 * We need to keep the mark around even if remaining mask cannot
+	 * result in any events (e.g. mask == FAN_ONDIR) to support incremenal
+	 * changes to the mask.
+	 * Destroy mark when only umask bits remain.
+	 */
+	*destroy = !((fsn_mark->mask | fsn_mark->ignored_mask) & ~umask);
 	spin_unlock(&fsn_mark->lock);
 
 	return mask & oldmask;
@@ -672,7 +679,7 @@ static __u32 fanotify_mark_remove_from_mask(struct fsnotify_mark *fsn_mark,
 
 static int fanotify_remove_mark(struct fsnotify_group *group,
 				fsnotify_connp_t *connp, __u32 mask,
-				unsigned int flags)
+				unsigned int flags, __u32 umask)
 {
 	struct fsnotify_mark *fsn_mark = NULL;
 	__u32 removed;
@@ -686,7 +693,7 @@ static int fanotify_remove_mark(struct fsnotify_group *group,
 	}
 
 	removed = fanotify_mark_remove_from_mask(fsn_mark, mask, flags,
-						 &destroy_mark);
+						 umask, &destroy_mark);
 	if (removed & fsnotify_conn_mask(fsn_mark->connector))
 		fsnotify_recalc_mask(fsn_mark->connector);
 	if (destroy_mark)
@@ -702,25 +709,26 @@ static int fanotify_remove_mark(struct fsnotify_group *group,
 
 static int fanotify_remove_vfsmount_mark(struct fsnotify_group *group,
 					 struct vfsmount *mnt, __u32 mask,
-					 unsigned int flags)
+					 unsigned int flags, __u32 umask)
 {
 	return fanotify_remove_mark(group, &real_mount(mnt)->mnt_fsnotify_marks,
-				    mask, flags);
+				    mask, flags, umask);
 }
 
 static int fanotify_remove_sb_mark(struct fsnotify_group *group,
-				      struct super_block *sb, __u32 mask,
-				      unsigned int flags)
+				   struct super_block *sb, __u32 mask,
+				   unsigned int flags, __u32 umask)
 {
-	return fanotify_remove_mark(group, &sb->s_fsnotify_marks, mask, flags);
+	return fanotify_remove_mark(group, &sb->s_fsnotify_marks, mask,
+				    flags, umask);
 }
 
 static int fanotify_remove_inode_mark(struct fsnotify_group *group,
 				      struct inode *inode, __u32 mask,
-				      unsigned int flags)
+				      unsigned int flags, __u32 umask)
 {
 	return fanotify_remove_mark(group, &inode->i_fsnotify_marks, mask,
-				    flags);
+				    flags, umask);
 }
 
 static __u32 fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark,
@@ -1170,13 +1178,13 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	case FAN_MARK_REMOVE:
 		if (mark_type == FAN_MARK_MOUNT)
 			ret = fanotify_remove_vfsmount_mark(group, mnt, mask,
-							    flags);
+							    flags, 0);
 		else if (mark_type == FAN_MARK_FILESYSTEM)
 			ret = fanotify_remove_sb_mark(group, mnt->mnt_sb, mask,
-						      flags);
+						      flags, 0);
 		else
 			ret = fanotify_remove_inode_mark(group, inode, mask,
-							 flags);
+							 flags, 0);
 		break;
 	default:
 		ret = -EINVAL;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 16/20] fanotify: use FAN_EVENT_ON_CHILD as implicit flag on sb/mount/non-dir marks
  2020-06-12  9:33 [PATCH 00/20] Prep work for fanotify named events Amir Goldstein
                   ` (14 preceding siblings ...)
  2020-06-12  9:33 ` [PATCH 15/20] fanotify: prepare for implicit event flags in mark mask Amir Goldstein
@ 2020-06-12  9:33 ` Amir Goldstein
  2020-06-12  9:33 ` [PATCH 17/20] fanotify: remove event FAN_DIR_MODIFY Amir Goldstein
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Amir Goldstein @ 2020-06-12  9:33 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

Up to now, fanotify allowed to set the FAN_EVENT_ON_CHILD flag on
sb/mount marks and non-directory inode mask, but the flag was ignored.

Mask out the flag if it is provided by user on sb/mount/non-dir marks
and define it as an implicit flag that cannot be removed by user.

This flag is going to be used internally to request for events with
parent and name info.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify_user.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 66d663baa4a6..42b8cc51cb3f 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1045,6 +1045,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS;
 	bool ignored = flags & FAN_MARK_IGNORED_MASK;
 	unsigned int obj_type, fid_mode;
+	u32 umask = 0;
 	int ret;
 
 	pr_debug("%s: fanotify_fd=%d flags=%x dfd=%d pathname=%p mask=%llx\n",
@@ -1162,6 +1163,12 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	else
 		mnt = path.mnt;
 
+	/* Mask out FAN_EVENT_ON_CHILD flag for sb/mount/non-dir marks */
+	if (mnt || !S_ISDIR(inode->i_mode)) {
+		mask &= ~FAN_EVENT_ON_CHILD;
+		umask = FAN_EVENT_ON_CHILD;
+	}
+
 	/* create/update an inode mark */
 	switch (flags & (FAN_MARK_ADD | FAN_MARK_REMOVE)) {
 	case FAN_MARK_ADD:
@@ -1178,13 +1185,13 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	case FAN_MARK_REMOVE:
 		if (mark_type == FAN_MARK_MOUNT)
 			ret = fanotify_remove_vfsmount_mark(group, mnt, mask,
-							    flags, 0);
+							    flags, umask);
 		else if (mark_type == FAN_MARK_FILESYSTEM)
 			ret = fanotify_remove_sb_mark(group, mnt->mnt_sb, mask,
-						      flags, 0);
+						      flags, umask);
 		else
 			ret = fanotify_remove_inode_mark(group, inode, mask,
-							 flags, 0);
+							 flags, umask);
 		break;
 	default:
 		ret = -EINVAL;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 17/20] fanotify: remove event FAN_DIR_MODIFY
  2020-06-12  9:33 [PATCH 00/20] Prep work for fanotify named events Amir Goldstein
                   ` (15 preceding siblings ...)
  2020-06-12  9:33 ` [PATCH 16/20] fanotify: use FAN_EVENT_ON_CHILD as implicit flag on sb/mount/non-dir marks Amir Goldstein
@ 2020-06-12  9:33 ` Amir Goldstein
  2020-06-12  9:33 ` [PATCH 18/20] fsnotify: add object type "child" to object type iterator Amir Goldstein
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Amir Goldstein @ 2020-06-12  9:33 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

It was never enabled in uapi and its functionality is about to be
superseded by events FAN_CREATE, FAN_DETELE, FAN_MOVE with group
flag FAN_REPORT_NAME.

Keep a place holder variable name_event instead of removing the
name recording code.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify.c    | 9 ++-------
 fs/notify/fsnotify.c             | 2 +-
 include/linux/fsnotify.h         | 6 ------
 include/linux/fsnotify_backend.h | 4 +---
 include/uapi/linux/fanotify.h    | 1 -
 5 files changed, 4 insertions(+), 18 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 3a82ddb63196..3885bf63976b 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -426,6 +426,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 	struct inode *id = fanotify_fid_inode(mask, data, data_type, dir);
 	const struct path *path = fsnotify_data_path(data, data_type);
 	unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
+	bool name_event = false;
 
 	/*
 	 * For queues with unlimited length lost events are not expected and
@@ -443,12 +444,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 
 	if (fanotify_is_perm_event(mask)) {
 		event = fanotify_alloc_perm_event(path, gfp);
-	} else if (mask & FAN_DIR_MODIFY && !(WARN_ON_ONCE(!file_name))) {
-		/*
-		 * For FAN_DIR_MODIFY event, we report the fid of the directory
-		 * and the name of the modified entry.
-		 * Allocate an fanotify_name_event struct and copy the name.
-		 */
+	} else if (name_event && file_name) {
 		event = fanotify_alloc_name_event(id, fsid, file_name, gfp);
 	} else if (fid_mode) {
 		event = fanotify_alloc_fid_event(id, fsid, gfp);
@@ -529,7 +525,6 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
 	BUILD_BUG_ON(FAN_MOVED_FROM != FS_MOVED_FROM);
 	BUILD_BUG_ON(FAN_CREATE != FS_CREATE);
 	BUILD_BUG_ON(FAN_DELETE != FS_DELETE);
-	BUILD_BUG_ON(FAN_DIR_MODIFY != FS_DIR_MODIFY);
 	BUILD_BUG_ON(FAN_DELETE_SELF != FS_DELETE_SELF);
 	BUILD_BUG_ON(FAN_MOVE_SELF != FS_MOVE_SELF);
 	BUILD_BUG_ON(FAN_EVENT_ON_CHILD != FS_EVENT_ON_CHILD);
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index e05f3b2cf664..51ada3cfd2ff 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -393,7 +393,7 @@ static __init int fsnotify_init(void)
 {
 	int ret;
 
-	BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 26);
+	BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 25);
 
 	ret = init_srcu_struct(&fsnotify_mark_srcu);
 	if (ret)
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index e73ae6117a61..dc68111ae856 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -30,12 +30,6 @@ static inline void fsnotify_name(struct inode *dir, __u32 mask,
 				 const struct qstr *name, u32 cookie)
 {
 	fsnotify(dir, mask, child, FSNOTIFY_EVENT_INODE, name, cookie);
-	/*
-	 * Send another flavor of the event without child inode data and
-	 * without the specific event type (e.g. FS_CREATE|FS_IS_DIR).
-	 * The name is relative to the dir inode the event is reported to.
-	 */
-	fsnotify(dir, FS_DIR_MODIFY, dir, FSNOTIFY_EVENT_INODE, name, 0);
 }
 
 static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry,
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index f5dd6a03f869..738d669f6d6d 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -47,7 +47,6 @@
 #define FS_OPEN_PERM		0x00010000	/* open event in an permission hook */
 #define FS_ACCESS_PERM		0x00020000	/* access event in a permissions hook */
 #define FS_OPEN_EXEC_PERM	0x00040000	/* open/exec event in a permission hook */
-#define FS_DIR_MODIFY		0x00080000	/* Directory entry was modified */
 
 #define FS_EXCL_UNLINK		0x04000000	/* do not send events if object is unlinked */
 /* This inode cares about things that happen to its children.  Always set for
@@ -67,8 +66,7 @@
  * The watching parent may get an FS_ATTRIB|FS_EVENT_ON_CHILD event
  * when a directory entry inside a child subdir changes.
  */
-#define ALL_FSNOTIFY_DIRENT_EVENTS	(FS_CREATE | FS_DELETE | FS_MOVE | \
-					 FS_DIR_MODIFY)
+#define ALL_FSNOTIFY_DIRENT_EVENTS	(FS_CREATE | FS_DELETE | FS_MOVE)
 
 #define ALL_FSNOTIFY_PERM_EVENTS (FS_OPEN_PERM | FS_ACCESS_PERM | \
 				  FS_OPEN_EXEC_PERM)
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index a88c7c6d0692..7f2f17eacbf9 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -24,7 +24,6 @@
 #define FAN_OPEN_PERM		0x00010000	/* File open in perm check */
 #define FAN_ACCESS_PERM		0x00020000	/* File accessed in perm check */
 #define FAN_OPEN_EXEC_PERM	0x00040000	/* File open/exec in perm check */
-#define FAN_DIR_MODIFY		0x00080000	/* Directory entry was modified */
 
 #define FAN_EVENT_ON_CHILD	0x08000000	/* Interested in child events */
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 18/20] fsnotify: add object type "child" to object type iterator
  2020-06-12  9:33 [PATCH 00/20] Prep work for fanotify named events Amir Goldstein
                   ` (16 preceding siblings ...)
  2020-06-12  9:33 ` [PATCH 17/20] fanotify: remove event FAN_DIR_MODIFY Amir Goldstein
@ 2020-06-12  9:33 ` Amir Goldstein
  2020-06-12  9:33 ` [PATCH 19/20] fanotify: move event name into fanotify_fh Amir Goldstein
  2020-06-12  9:33 ` [PATCH 20/20] fanotify: no external fh buffer in fanotify_name_event Amir Goldstein
  19 siblings, 0 replies; 35+ messages in thread
From: Amir Goldstein @ 2020-06-12  9:33 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

The object type iterator is used to collect all the marks of
a specific group that have interest in an event.

It is used by fanotify to get a single handle_event callback
when an event has a match to either of inode/sb/mount marks
of the group.

The nature of fsnotify events is that they are associated with
at most one sb at most one mount and at most one inode.

When a parent and child are both watching, two events are sent
to backend, one associated to parent inode and one associated
to the child inode.

This results in duplicate events in fanotify, which usually
get merged before user reads them, but this is sub-optimal.

It would be better if the same event is sent to backend with
an object type iterator that has both the child inode and its
parent, and let the backend decide if the event should be reported
once (fanotify) or twice (inotify).

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 include/linux/fsnotify_backend.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 738d669f6d6d..e4bc52dcb6e0 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -256,6 +256,7 @@ static inline const struct path *fsnotify_data_path(const void *data,
 
 enum fsnotify_obj_type {
 	FSNOTIFY_OBJ_TYPE_INODE,
+	FSNOTIFY_OBJ_TYPE_CHILD,
 	FSNOTIFY_OBJ_TYPE_VFSMOUNT,
 	FSNOTIFY_OBJ_TYPE_SB,
 	FSNOTIFY_OBJ_TYPE_COUNT,
@@ -263,6 +264,7 @@ enum fsnotify_obj_type {
 };
 
 #define FSNOTIFY_OBJ_TYPE_INODE_FL	(1U << FSNOTIFY_OBJ_TYPE_INODE)
+#define FSNOTIFY_OBJ_TYPE_CHILD_FL	(1U << FSNOTIFY_OBJ_TYPE_CHILD)
 #define FSNOTIFY_OBJ_TYPE_VFSMOUNT_FL	(1U << FSNOTIFY_OBJ_TYPE_VFSMOUNT)
 #define FSNOTIFY_OBJ_TYPE_SB_FL		(1U << FSNOTIFY_OBJ_TYPE_SB)
 #define FSNOTIFY_OBJ_ALL_TYPES_MASK	((1U << FSNOTIFY_OBJ_TYPE_COUNT) - 1)
@@ -307,6 +309,7 @@ static inline struct fsnotify_mark *fsnotify_iter_##name##_mark( \
 }
 
 FSNOTIFY_ITER_FUNCS(inode, INODE)
+FSNOTIFY_ITER_FUNCS(child, CHILD)
 FSNOTIFY_ITER_FUNCS(vfsmount, VFSMOUNT)
 FSNOTIFY_ITER_FUNCS(sb, SB)
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 19/20] fanotify: move event name into fanotify_fh
  2020-06-12  9:33 [PATCH 00/20] Prep work for fanotify named events Amir Goldstein
                   ` (17 preceding siblings ...)
  2020-06-12  9:33 ` [PATCH 18/20] fsnotify: add object type "child" to object type iterator Amir Goldstein
@ 2020-06-12  9:33 ` Amir Goldstein
  2020-07-03 16:02   ` Jan Kara
  2020-06-12  9:33 ` [PATCH 20/20] fanotify: no external fh buffer in fanotify_name_event Amir Goldstein
  19 siblings, 1 reply; 35+ messages in thread
From: Amir Goldstein @ 2020-06-12  9:33 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

An fanotify event name is always recorded relative to a dir fh.
Move the name_len members of fanotify_name_event into unused space
in struct fanotify_fh.

We add a name_offset member to allow packing a binary blob before
the name string in the variable size buffer. We are going to use
that space to store the child fid.

It also fixes a bug in fanotify_alloc_name_event() which used an
allocation size 7 bytes bigger than required size, because it used
sizeof(struct fanotify_name_event) without deducting that 7 bytes
alignment padding.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify.c      | 27 ++++++++-----
 fs/notify/fanotify/fanotify.h      | 62 +++++++++++++++++++++++-------
 fs/notify/fanotify/fanotify_user.c | 23 +++++------
 3 files changed, 75 insertions(+), 37 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 3885bf63976b..3a2d48edaddd 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -52,15 +52,20 @@ static bool fanotify_fid_event_equal(struct fanotify_fid_event *ffe1,
 static bool fanotify_name_event_equal(struct fanotify_name_event *fne1,
 				      struct fanotify_name_event *fne2)
 {
+	struct fanotify_fh *dfh1 = &fne1->dir_fh;
+	struct fanotify_fh *dfh2 = &fne2->dir_fh;
+
 	/* Do not merge name events without dir fh */
-	if (!fne1->dir_fh.len)
+	if (!dfh1->len)
 		return false;
 
-	if (fne1->name_len != fne2->name_len ||
-	    !fanotify_fh_equal(&fne1->dir_fh, &fne2->dir_fh))
+	if (dfh1->name_len != dfh2->name_len ||
+	    dfh1->name_offset != dfh2->name_offset ||
+	    !fanotify_fh_equal(dfh1, dfh2))
 		return false;
 
-	return !memcmp(fne1->name, fne2->name, fne1->name_len);
+	return !memcmp(fanotify_fh_name(dfh1), fanotify_fh_name(dfh2),
+		       dfh1->name_len);
 }
 
 static bool fanotify_should_merge(struct fsnotify_event *old_fsn,
@@ -284,8 +289,7 @@ static void fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
 	void *buf = fh->buf;
 	int err;
 
-	fh->type = FILEID_ROOT;
-	fh->len = 0;
+	fanotify_fh_init(fh);
 	if (!inode)
 		return;
 
@@ -314,6 +318,10 @@ static void fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
 
 	fh->type = type;
 	fh->len = bytes;
+	if (fh->len > FANOTIFY_INLINE_FH_LEN)
+		fh->name_offset = FANOTIFY_INLINE_FH_LEN;
+	else
+		fh->name_offset = fh->len;
 
 	return;
 
@@ -401,6 +409,7 @@ struct fanotify_event *fanotify_alloc_name_event(struct inode *id,
 						 gfp_t gfp)
 {
 	struct fanotify_name_event *fne;
+	struct fanotify_fh *dfh;
 
 	fne = kmalloc(sizeof(*fne) + file_name->len + 1, gfp);
 	if (!fne)
@@ -408,9 +417,9 @@ struct fanotify_event *fanotify_alloc_name_event(struct inode *id,
 
 	fne->fae.type = FANOTIFY_EVENT_TYPE_FID_NAME;
 	fne->fsid = *fsid;
-	fanotify_encode_fh(&fne->dir_fh, id, gfp);
-	fne->name_len = file_name->len;
-	strcpy(fne->name, file_name->name);
+	dfh = &fne->dir_fh;
+	fanotify_encode_fh(dfh, id, gfp);
+	fanotify_fh_copy_name(dfh, file_name);
 
 	return &fne->fae;
 }
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 1b2a3bbe6008..8cb062eefd3e 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -23,13 +23,24 @@ enum {
  * stored in either the first or last 2 dwords.
  */
 #define FANOTIFY_INLINE_FH_LEN	(3 << 2)
+#define FANOTIFY_FH_HDR_LEN	offsetof(struct fanotify_fh, buf)
 
 struct fanotify_fh {
-	unsigned char buf[FANOTIFY_INLINE_FH_LEN];
 	u8 type;
 	u8 len;
+	u8 name_offset;
+	u8 name_len;
+	unsigned char buf[FANOTIFY_INLINE_FH_LEN];
 } __aligned(4);
 
+static inline void fanotify_fh_init(struct fanotify_fh *fh)
+{
+	fh->type = FILEID_ROOT;
+	fh->len = 0;
+	fh->name_offset = 0;
+	fh->name_len = 0;
+}
+
 static inline bool fanotify_fh_has_ext_buf(struct fanotify_fh *fh)
 {
 	return fh->len > FANOTIFY_INLINE_FH_LEN;
@@ -37,6 +48,7 @@ static inline bool fanotify_fh_has_ext_buf(struct fanotify_fh *fh)
 
 static inline char **fanotify_fh_ext_buf_ptr(struct fanotify_fh *fh)
 {
+	BUILD_BUG_ON(FANOTIFY_FH_HDR_LEN % 4);
 	BUILD_BUG_ON(__alignof__(char *) - 4 + sizeof(char *) >
 		     FANOTIFY_INLINE_FH_LEN);
 	return (char **)ALIGN((unsigned long)(fh->buf), __alignof__(char *));
@@ -52,6 +64,35 @@ static inline void *fanotify_fh_buf(struct fanotify_fh *fh)
 	return fanotify_fh_has_ext_buf(fh) ? fanotify_fh_ext_buf(fh) : fh->buf;
 }
 
+static inline int fanotify_fh_blob_len(struct fanotify_fh *fh)
+{
+	if (fh->name_offset <= fh->len)
+		return 0;
+
+	/* Is there a space between end of fh_buf and start of name? */
+	return fh->name_offset - fh->len;
+}
+
+static inline void *fanotify_fh_blob(struct fanotify_fh *fh)
+{
+	if (fh->name_offset <= fh->len)
+		return NULL;
+
+	return fh->buf + fh->len;
+}
+
+static inline const char *fanotify_fh_name(struct fanotify_fh *fh)
+{
+	return fh->name_len ? fh->buf + fh->name_offset : NULL;
+}
+
+static inline void fanotify_fh_copy_name(struct fanotify_fh *fh,
+					 const struct qstr *name)
+{
+	fh->name_len = name->len;
+	strcpy(fh->buf + fh->name_offset, name->name);
+}
+
 /*
  * Common structure for fanotify events. Concrete structs are allocated in
  * fanotify_handle_event() and freed when the information is retrieved by
@@ -93,12 +134,16 @@ FANOTIFY_FE(struct fanotify_event *event)
 	return container_of(event, struct fanotify_fid_event, fae);
 }
 
+/*
+ * This is identical to struct fanotify_fid_event, but allocated with variable
+ * size kmalloc and should have positive value of dir_fh.name_len.
+ * Keeping the separate struct definition for semantics and type safety -
+ * an event should be cast to this type IFF it was allocated using kmalloc.
+ */
 struct fanotify_name_event {
 	struct fanotify_event fae;
 	__kernel_fsid_t fsid;
 	struct fanotify_fh dir_fh;
-	u8 name_len;
-	char name[];
 };
 
 static inline struct fanotify_name_event *
@@ -142,17 +187,6 @@ static inline int fanotify_event_object_fh_len(struct fanotify_event *event)
 	return fh ? fh->len : 0;
 }
 
-static inline bool fanotify_event_has_name(struct fanotify_event *event)
-{
-	return event->type == FANOTIFY_EVENT_TYPE_FID_NAME;
-}
-
-static inline int fanotify_event_name_len(struct fanotify_event *event)
-{
-	return fanotify_event_has_name(event) ?
-		FANOTIFY_NE(event)->name_len : 0;
-}
-
 struct fanotify_path_event {
 	struct fanotify_event fae;
 	struct path path;
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 42b8cc51cb3f..af8268b44c68 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -68,17 +68,14 @@ static int fanotify_event_info_len(struct fanotify_event *event)
 {
 	int info_len = 0;
 	int fh_len = fanotify_event_object_fh_len(event);
+	struct fanotify_fh *dfh = fanotify_event_dir_fh(event);
+
+	if (dfh)
+		info_len += fanotify_fid_info_len(dfh->len, dfh->name_len);
 
 	if (fh_len)
 		info_len += fanotify_fid_info_len(fh_len, 0);
 
-	if (fanotify_event_name_len(event)) {
-		struct fanotify_name_event *fne = FANOTIFY_NE(event);
-
-		info_len += fanotify_fid_info_len(fne->dir_fh.len,
-						  fne->name_len);
-	}
-
 	return info_len;
 }
 
@@ -305,6 +302,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
 {
 	struct fanotify_event_metadata metadata;
 	struct path *path = fanotify_event_path(event);
+	struct fanotify_fh *dfh = fanotify_event_dir_fh(event);
 	struct file *f = NULL;
 	int ret, fd = FAN_NOFD;
 
@@ -346,13 +344,10 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
 		fd_install(fd, f);
 
 	/* Event info records order is: dir fid + name, child fid */
-	if (fanotify_event_name_len(event)) {
-		struct fanotify_name_event *fne = FANOTIFY_NE(event);
-
-		ret = copy_info_to_user(fanotify_event_fsid(event),
-					fanotify_event_dir_fh(event),
-					fne->name, fne->name_len,
-					buf, count);
+	if (dfh) {
+		ret = copy_info_to_user(fanotify_event_fsid(event), dfh,
+					fanotify_fh_name(dfh),
+					dfh->name_len, buf, count);
 		if (ret < 0)
 			return ret;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 20/20] fanotify: no external fh buffer in fanotify_name_event
  2020-06-12  9:33 [PATCH 00/20] Prep work for fanotify named events Amir Goldstein
                   ` (18 preceding siblings ...)
  2020-06-12  9:33 ` [PATCH 19/20] fanotify: move event name into fanotify_fh Amir Goldstein
@ 2020-06-12  9:33 ` Amir Goldstein
  19 siblings, 0 replies; 35+ messages in thread
From: Amir Goldstein @ 2020-06-12  9:33 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

The fanotify_fh struct has an inline buffer of size 12 which is enough
to store the most common local filesystem file handles (e.g. ext4, xfs).
For file handles that do not fit in the inline buffer (e.g. btrfs), an
external buffer is allocated to store the file handle.

When allocating a variable size fanotify_name_event, there is no point
in allocating also an external fh buffer when file handle does not fit
in the inline buffer.

Check required size for encoding fh, preallocate an event buffer
sufficient to contain both file handle and name and store the name at
name_offset after the file handle.

At this time, when not reporting name in event, we still allocate
the fixed size fanotify_fid_event and an external buffer for large
file handles, but fanotify_alloc_name_event() has already been prepared
to accept a NULL file_name.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify.c | 72 ++++++++++++++++++++++++-----------
 fs/notify/fanotify/fanotify.h |  4 ++
 2 files changed, 53 insertions(+), 23 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 3a2d48edaddd..c3986fbb6801 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -281,10 +281,22 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
 	return test_mask & user_mask;
 }
 
+static unsigned int fanotify_encode_fh_len(struct inode *inode)
+{
+	int dwords = 0;
+
+	if (!inode)
+		return 0;
+
+	exportfs_encode_inode_fh(inode, NULL, &dwords, NULL);
+
+	return dwords << 2;
+}
+
 static void fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
-			       gfp_t gfp)
+			       unsigned int prealloc_fh_len, gfp_t gfp)
 {
-	int dwords, type, bytes = 0;
+	int dwords, bytes, type = 0;
 	char *ext_buf = NULL;
 	void *buf = fh->buf;
 	int err;
@@ -293,15 +305,21 @@ static void fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
 	if (!inode)
 		return;
 
-	dwords = 0;
+	/*
+	 * !gpf means preallocated variable size fh, but prealloc_fh_len could
+	 * be zero in that case if encoding fh len failed.
+	 */
 	err = -ENOENT;
-	type = exportfs_encode_inode_fh(inode, NULL, &dwords, NULL);
-	if (!dwords)
+	if (!gfp)
+		bytes = prealloc_fh_len;
+	else
+		bytes = fanotify_encode_fh_len(inode);
+	if (bytes < 4 || WARN_ON_ONCE(bytes % 4))
 		goto out_err;
 
-	bytes = dwords << 2;
-	if (bytes > FANOTIFY_INLINE_FH_LEN) {
-		/* Treat failure to allocate fh as failure to allocate event */
+	/* No external buffer in a variable size allocated fh */
+	if (gfp && bytes > FANOTIFY_INLINE_FH_LEN) {
+		/* Treat failure to allocate fh as failure to encode fh */
 		err = -ENOMEM;
 		ext_buf = kmalloc(bytes, gfp);
 		if (!ext_buf)
@@ -311,17 +329,19 @@ static void fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
 		buf = ext_buf;
 	}
 
+	dwords = bytes >> 2;
 	type = exportfs_encode_inode_fh(inode, buf, &dwords, NULL);
 	err = -EINVAL;
-	if (!type || type == FILEID_INVALID || bytes != dwords << 2)
+	if (!type || type == FILEID_INVALID || bytes < (dwords << 2))
 		goto out_err;
 
 	fh->type = type;
-	fh->len = bytes;
-	if (fh->len > FANOTIFY_INLINE_FH_LEN)
-		fh->name_offset = FANOTIFY_INLINE_FH_LEN;
-	else
-		fh->name_offset = fh->len;
+	fh->len = dwords << 2;
+	/*
+	 * name_offset is set to preallocated fh len, even if actual fh len
+	 * is shorter.
+	 */
+	fh->name_offset = prealloc_fh_len;
 
 	return;
 
@@ -398,7 +418,7 @@ struct fanotify_event *fanotify_alloc_fid_event(struct inode *id,
 
 	ffe->fae.type = FANOTIFY_EVENT_TYPE_FID;
 	ffe->fsid = *fsid;
-	fanotify_encode_fh(&ffe->object_fh, id, gfp);
+	fanotify_encode_fh(&ffe->object_fh, id, 0, gfp);
 
 	return &ffe->fae;
 }
@@ -410,16 +430,26 @@ struct fanotify_event *fanotify_alloc_name_event(struct inode *id,
 {
 	struct fanotify_name_event *fne;
 	struct fanotify_fh *dfh;
+	unsigned int prealloc_fh_len = fanotify_encode_fh_len(id);
+	unsigned int size;
 
-	fne = kmalloc(sizeof(*fne) + file_name->len + 1, gfp);
+	size = sizeof(*fne) - FANOTIFY_INLINE_FH_LEN + prealloc_fh_len;
+	if (file_name)
+		size += file_name->len + 1;
+	fne = kmalloc(size, gfp);
 	if (!fne)
 		return NULL;
 
 	fne->fae.type = FANOTIFY_EVENT_TYPE_FID_NAME;
 	fne->fsid = *fsid;
 	dfh = &fne->dir_fh;
-	fanotify_encode_fh(dfh, id, gfp);
-	fanotify_fh_copy_name(dfh, file_name);
+	fanotify_encode_fh(dfh, id, prealloc_fh_len, 0);
+	if (file_name)
+		fanotify_fh_copy_name(dfh, file_name);
+
+	pr_debug("%s: ino=%lu size=%u prealloc_fh_len=%u dir_fh_len=%u name_len=%u name='%.*s'\n",
+		 __func__, id->i_ino, size, prealloc_fh_len, dfh->len,
+		 dfh->name_len, dfh->name_len, fanotify_fh_name(dfh));
 
 	return &fne->fae;
 }
@@ -634,11 +664,7 @@ static void fanotify_free_fid_event(struct fanotify_event *event)
 
 static void fanotify_free_name_event(struct fanotify_event *event)
 {
-	struct fanotify_name_event *fne = FANOTIFY_NE(event);
-
-	if (fanotify_fh_has_ext_buf(&fne->dir_fh))
-		kfree(fanotify_fh_ext_buf(&fne->dir_fh));
-	kfree(fne);
+	kfree(FANOTIFY_NE(event));
 }
 
 static void fanotify_free_event(struct fsnotify_event *fsn_event)
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 8cb062eefd3e..7cbdac4be42f 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -43,6 +43,10 @@ static inline void fanotify_fh_init(struct fanotify_fh *fh)
 
 static inline bool fanotify_fh_has_ext_buf(struct fanotify_fh *fh)
 {
+	/* No external buffer in a variable size allocated fh */
+	if (fh->name_offset)
+		return false;
+
 	return fh->len > FANOTIFY_INLINE_FH_LEN;
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 04/20] nfsd: use fsnotify_data_inode() to get the unlinked inode
  2020-06-12  9:33 ` [PATCH 04/20] nfsd: use fsnotify_data_inode() to get the unlinked inode Amir Goldstein
@ 2020-06-12 10:25   ` Amir Goldstein
  0 siblings, 0 replies; 35+ messages in thread
From: Amir Goldstein @ 2020-06-12 10:25 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

On Fri, Jun 12, 2020 at 12:34 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> The inode argument to handle_event() is about to become obsolete.


> Return non const inode pointer from fsnotify_data_inode(), fsnotify
> hooks do not pass const inode pointer as data.

Sorry, this sentence is leftover from before I split this patch.

>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/nfsd/filecache.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 82198d747c4c..ace8e5c30952 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -599,11 +599,13 @@ static struct notifier_block nfsd_file_lease_notifier = {
>
>  static int
>  nfsd_file_fsnotify_handle_event(struct fsnotify_group *group,
> -                               struct inode *inode,
> +                               struct inode *to_tell,
>                                 u32 mask, const void *data, int data_type,
>                                 const struct qstr *file_name, u32 cookie,
>                                 struct fsnotify_iter_info *iter_info)
>  {
> +       struct inode *inode = fsnotify_data_inode(data, data_type);
> +
>         trace_nfsd_file_fsnotify_handle_event(inode, mask);
>
>         /* Should be no marks on non-regular files */
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 05/20] kernfs: do not call fsnotify() with name without a parent
  2020-06-12  9:33 ` [PATCH 05/20] kernfs: do not call fsnotify() with name without a parent Amir Goldstein
@ 2020-06-29 13:27   ` Tejun Heo
  2020-06-29 16:11   ` Greg Kroah-Hartman
  1 sibling, 0 replies; 35+ messages in thread
From: Tejun Heo @ 2020-06-29 13:27 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel, Greg Kroah-Hartman

On Fri, Jun 12, 2020 at 12:33:28PM +0300, Amir Goldstein wrote:
> When creating an FS_MODIFY event on inode itself (not on parent)
> the file_name argument should be NULL.
> 
> The change to send a non NULL name to inode itself was done on purpuse
> as part of another commit, as Tejun writes: "...While at it, supply the
> target file name to fsnotify() from kernfs_node->name.".
> 
> But this is wrong practice and inconsistent with inotify behavior when
> watching a single file.  When a child is being watched (as opposed to the
> parent directory) the inotify event should contain the watch descriptor,
> but not the file name.
> 
> Fixes: df6a58c5c5aa ("kernfs: don't depend on d_find_any_alias()...")
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 05/20] kernfs: do not call fsnotify() with name without a parent
  2020-06-12  9:33 ` [PATCH 05/20] kernfs: do not call fsnotify() with name without a parent Amir Goldstein
  2020-06-29 13:27   ` Tejun Heo
@ 2020-06-29 16:11   ` Greg Kroah-Hartman
  1 sibling, 0 replies; 35+ messages in thread
From: Greg Kroah-Hartman @ 2020-06-29 16:11 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel, Tejun Heo

On Fri, Jun 12, 2020 at 12:33:28PM +0300, Amir Goldstein wrote:
> When creating an FS_MODIFY event on inode itself (not on parent)
> the file_name argument should be NULL.
> 
> The change to send a non NULL name to inode itself was done on purpuse
> as part of another commit, as Tejun writes: "...While at it, supply the
> target file name to fsnotify() from kernfs_node->name.".
> 
> But this is wrong practice and inconsistent with inotify behavior when
> watching a single file.  When a child is being watched (as opposed to the
> parent directory) the inotify event should contain the watch descriptor,
> but not the file name.
> 
> Fixes: df6a58c5c5aa ("kernfs: don't depend on d_find_any_alias()...")
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/20] fsnotify: Rearrange fast path to minimise overhead when there is no watcher
  2020-06-12  9:33 ` [PATCH 01/20] fsnotify: Rearrange fast path to minimise overhead when there is no watcher Amir Goldstein
@ 2020-07-03 14:03   ` Jan Kara
  2020-07-04  9:30     ` Amir Goldstein
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kara @ 2020-07-03 14:03 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel, Mel Gorman

On Fri 12-06-20 12:33:24, Amir Goldstein wrote:
> From: Mel Gorman <mgorman@techsingularity.net>
> 
> The fsnotify paths are trivial to hit even when there are no watchers and
> they are surprisingly expensive. For example, every successful vfs_write()
> hits fsnotify_modify which calls both fsnotify_parent and fsnotify unless
> FMODE_NONOTIFY is set which is an internal flag invisible to userspace.
> As it stands, fsnotify_parent is a guaranteed functional call even if there
> are no watchers and fsnotify() does a substantial amount of unnecessary
> work before it checks if there are any watchers. A perf profile showed
> that applying mnt->mnt_fsnotify_mask in fnotify() was almost half of the
> total samples taken in that function during a test. This patch rearranges
> the fast paths to reduce the amount of work done when there are no
> watchers.
> 
> The test motivating this was "perf bench sched messaging --pipe". Despite
> the fact the pipes are anonymous, fsnotify is still called a lot and
> the overhead is noticeable even though it's completely pointless. It's
> likely the overhead is negligible for real IO so this is an extreme
> example. This is a comparison of hackbench using processes and pipes on
> a 1-socket machine with 8 CPU threads without fanotify watchers.
> 
>                               5.7.0                  5.7.0
>                             vanilla      fastfsnotify-v1r1
> Amean     1       0.4837 (   0.00%)      0.4630 *   4.27%*
> Amean     3       1.5447 (   0.00%)      1.4557 (   5.76%)
> Amean     5       2.6037 (   0.00%)      2.4363 (   6.43%)
> Amean     7       3.5987 (   0.00%)      3.4757 (   3.42%)
> Amean     12      5.8267 (   0.00%)      5.6983 (   2.20%)
> Amean     18      8.4400 (   0.00%)      8.1327 (   3.64%)
> Amean     24     11.0187 (   0.00%)     10.0290 *   8.98%*
> Amean     30     13.1013 (   0.00%)     12.8510 (   1.91%)
> Amean     32     13.9190 (   0.00%)     13.2410 (   4.87%)
> 
>                        5.7.0       5.7.0
>                      vanilla fastfsnotify-v1r1
> Duration User         157.05      152.79
> Duration System      1279.98     1219.32
> Duration Elapsed      182.81      174.52
> 
> This is showing that the latencies are improved by roughly 2-9%. The
> variability is not shown but some of these results are within the noise
> as this workload heavily overloads the machine. That said, the system CPU
> usage is reduced by quite a bit so it makes sense to avoid the overhead
> even if it is a bit tricky to detect at times. A perf profile of just 1
> group of tasks showed that 5.14% of samples taken were in either fsnotify()
> or fsnotify_parent(). With the patch, 2.8% of samples were in fsnotify,
> mostly function entry and the initial check for watchers.  The check for
> watchers is complicated enough that inlining it may be controversial.
> 
> [Amir] Slightly simplify with mnt_or_sb_mask => marks_mask
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/notify/fsnotify.c             | 27 +++++++++++++++------------
>  include/linux/fsnotify.h         | 10 ++++++++++
>  include/linux/fsnotify_backend.h |  4 ++--
>  3 files changed, 27 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> index 72d332ce8e12..d59a58d10b84 100644
> --- a/fs/notify/fsnotify.c
> +++ b/fs/notify/fsnotify.c
> @@ -143,7 +143,7 @@ void __fsnotify_update_child_dentry_flags(struct inode *inode)
>  }
>  
>  /* Notify this dentry's parent about a child's events. */
> -int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
> +int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
>  		    int data_type)
>  {
>  	struct dentry *parent;

Hum, should we actually remove the DCACHE_FSNOTIFY_PARENT_WATCHED check
from here when it's moved to fsnotify_parent() inline helper?

> @@ -315,17 +315,11 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
>  	struct fsnotify_iter_info iter_info = {};
>  	struct super_block *sb = to_tell->i_sb;
>  	struct mount *mnt = NULL;
> -	__u32 mnt_or_sb_mask = sb->s_fsnotify_mask;
>  	int ret = 0;
> -	__u32 test_mask = (mask & ALL_FSNOTIFY_EVENTS);
> +	__u32 test_mask, marks_mask;
>  
> -	if (path) {
> +	if (path)
>  		mnt = real_mount(path->mnt);
> -		mnt_or_sb_mask |= mnt->mnt_fsnotify_mask;
> -	}
> -	/* An event "on child" is not intended for a mount/sb mark */
> -	if (mask & FS_EVENT_ON_CHILD)
> -		mnt_or_sb_mask = 0;
>  
>  	/*
>  	 * Optimization: srcu_read_lock() has a memory barrier which can
> @@ -337,13 +331,22 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
>  	if (!to_tell->i_fsnotify_marks && !sb->s_fsnotify_marks &&
>  	    (!mnt || !mnt->mnt_fsnotify_marks))
>  		return 0;
> +
> +	/* An event "on child" is not intended for a mount/sb mark */
> +	marks_mask = to_tell->i_fsnotify_mask;
> +	if (!(mask & FS_EVENT_ON_CHILD)) {
> +		marks_mask |= sb->s_fsnotify_mask;
> +		if (mnt)
> +			marks_mask |= mnt->mnt_fsnotify_mask;
> +	}
> +
>  	/*
>  	 * if this is a modify event we may need to clear the ignored masks
>  	 * otherwise return if neither the inode nor the vfsmount/sb care about
>  	 * this type of event.
>  	 */
> -	if (!(mask & FS_MODIFY) &&
> -	    !(test_mask & (to_tell->i_fsnotify_mask | mnt_or_sb_mask)))
> +	test_mask = (mask & ALL_FSNOTIFY_EVENTS);
> +	if (!(mask & FS_MODIFY) && !(test_mask & marks_mask))
>  		return 0;

Otherwise the patch looks good. One observation though: The (mask &
FS_MODIFY) check means that all vfs_write() calls end up going through the
"slower" path iterating all mark types and checking whether there are marks
anyway. That could be relatively simply optimized using a hidden mask flag
like FS_ALWAYS_RECEIVE_MODIFY which would be set when there's some mark
needing special handling of FS_MODIFY... Not sure if we care enough at this
point...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 09/20] fsnotify: pass dir argument to handle_event() callback
  2020-06-12  9:33 ` [PATCH 09/20] fsnotify: pass dir argument to handle_event() callback Amir Goldstein
@ 2020-07-03 14:49   ` Jan Kara
  0 siblings, 0 replies; 35+ messages in thread
From: Jan Kara @ 2020-07-03 14:49 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel

On Fri 12-06-20 12:33:32, Amir Goldstein wrote:
> The 'inode' argument to handle_event(), sometimes referred to as
> 'to_tell' is somewhat obsolete.
> It is a remnant from the times when a group could only have an inode mark
> associated with an event.
> 
> We now pass an iter_info array to the callback, with all marks associated
> with an event.
> 
> Most backends ignore this argument, with two expections:
						^^ exceptions

> 1. dnotify uses it for sanity check that event is on directory
> 2. fanotify uses it to report fid of directory on directory entry
>    modification events
> 
> Remove the 'inode' argument and add a 'dir' argument.
> The callback function signature is deliberately changed, because
> the meaning of the argument has changed and the arguments have
> been documented.
> 
> The 'dir' argument is NULL when "sending" to a non-dir inode.
> When 'file_name' argument is non NULL, 'dir' is always referring to
> the directory that the 'file_name' entry belongs to.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

Otherwise the patch looks good to me.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 19/20] fanotify: move event name into fanotify_fh
  2020-06-12  9:33 ` [PATCH 19/20] fanotify: move event name into fanotify_fh Amir Goldstein
@ 2020-07-03 16:02   ` Jan Kara
  2020-07-06  8:21     ` Amir Goldstein
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kara @ 2020-07-03 16:02 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel

On Fri 12-06-20 12:33:42, Amir Goldstein wrote:
> An fanotify event name is always recorded relative to a dir fh.
> Move the name_len members of fanotify_name_event into unused space
> in struct fanotify_fh.
> 
> We add a name_offset member to allow packing a binary blob before
> the name string in the variable size buffer. We are going to use
> that space to store the child fid.

So how much is this packing going to save us? Currently it is 1 byte for
name events (modulo that fanotify_alloc_name_event_bug() you mention
below). With the additional fanotify_fh in the event, we'll save two more
bytes by the packing. So that doesn't really seem to be worth it to me.
Am I missing some other benefit?

Maybe your main motivation (which is not mentioned in the changelog at all
BTW) is that the whole game of inline vs out of line file handles is
pointless when we kmalloc() the event anyway because of the name? And it's
actively wasteful in case handles don't fit in the inline space. I agree
with that and it's good observation. But I'd rather leave fanotify_fh
struct alone for the cases where we want to bother with inline vs out of line
file handles and define new way of partitioning space at the end of the
event among one or two file handles and name. Something like:

struct fanotify_dynamic_info {
	u8 dirfh_len;
	u8 filefh_len;
	u8 name_len;
	unsigned char buf[];
};

And at appropriate offsets (0, dirfh_len, dirfh_len + filefh_len) there
would be additional info (e.g. type + fh for file handles). Maybe this
format will require some tweaking so that processing of both storage types
of file handles can be reasonably uniform but at this point it seems
cleaner than what you try to do fanotify_fh with combining lenghts and
offsets and some blobs in the middle...

> It also fixes a bug in fanotify_alloc_name_event() which used an
> allocation size 7 bytes bigger than required size, because it used
> sizeof(struct fanotify_name_event) without deducting that 7 bytes
> alignment padding.

								Honza
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/notify/fanotify/fanotify.c      | 27 ++++++++-----
>  fs/notify/fanotify/fanotify.h      | 62 +++++++++++++++++++++++-------
>  fs/notify/fanotify/fanotify_user.c | 23 +++++------
>  3 files changed, 75 insertions(+), 37 deletions(-)
> 
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 3885bf63976b..3a2d48edaddd 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -52,15 +52,20 @@ static bool fanotify_fid_event_equal(struct fanotify_fid_event *ffe1,
>  static bool fanotify_name_event_equal(struct fanotify_name_event *fne1,
>  				      struct fanotify_name_event *fne2)
>  {
> +	struct fanotify_fh *dfh1 = &fne1->dir_fh;
> +	struct fanotify_fh *dfh2 = &fne2->dir_fh;
> +
>  	/* Do not merge name events without dir fh */
> -	if (!fne1->dir_fh.len)
> +	if (!dfh1->len)
>  		return false;
>  
> -	if (fne1->name_len != fne2->name_len ||
> -	    !fanotify_fh_equal(&fne1->dir_fh, &fne2->dir_fh))
> +	if (dfh1->name_len != dfh2->name_len ||
> +	    dfh1->name_offset != dfh2->name_offset ||
> +	    !fanotify_fh_equal(dfh1, dfh2))
>  		return false;
>  
> -	return !memcmp(fne1->name, fne2->name, fne1->name_len);
> +	return !memcmp(fanotify_fh_name(dfh1), fanotify_fh_name(dfh2),
> +		       dfh1->name_len);
>  }
>  
>  static bool fanotify_should_merge(struct fsnotify_event *old_fsn,
> @@ -284,8 +289,7 @@ static void fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
>  	void *buf = fh->buf;
>  	int err;
>  
> -	fh->type = FILEID_ROOT;
> -	fh->len = 0;
> +	fanotify_fh_init(fh);
>  	if (!inode)
>  		return;
>  
> @@ -314,6 +318,10 @@ static void fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode,
>  
>  	fh->type = type;
>  	fh->len = bytes;
> +	if (fh->len > FANOTIFY_INLINE_FH_LEN)
> +		fh->name_offset = FANOTIFY_INLINE_FH_LEN;
> +	else
> +		fh->name_offset = fh->len;
>  
>  	return;
>  
> @@ -401,6 +409,7 @@ struct fanotify_event *fanotify_alloc_name_event(struct inode *id,
>  						 gfp_t gfp)
>  {
>  	struct fanotify_name_event *fne;
> +	struct fanotify_fh *dfh;
>  
>  	fne = kmalloc(sizeof(*fne) + file_name->len + 1, gfp);
>  	if (!fne)
> @@ -408,9 +417,9 @@ struct fanotify_event *fanotify_alloc_name_event(struct inode *id,
>  
>  	fne->fae.type = FANOTIFY_EVENT_TYPE_FID_NAME;
>  	fne->fsid = *fsid;
> -	fanotify_encode_fh(&fne->dir_fh, id, gfp);
> -	fne->name_len = file_name->len;
> -	strcpy(fne->name, file_name->name);
> +	dfh = &fne->dir_fh;
> +	fanotify_encode_fh(dfh, id, gfp);
> +	fanotify_fh_copy_name(dfh, file_name);
>  
>  	return &fne->fae;
>  }
> diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
> index 1b2a3bbe6008..8cb062eefd3e 100644
> --- a/fs/notify/fanotify/fanotify.h
> +++ b/fs/notify/fanotify/fanotify.h
> @@ -23,13 +23,24 @@ enum {
>   * stored in either the first or last 2 dwords.
>   */
>  #define FANOTIFY_INLINE_FH_LEN	(3 << 2)
> +#define FANOTIFY_FH_HDR_LEN	offsetof(struct fanotify_fh, buf)
>  
>  struct fanotify_fh {
> -	unsigned char buf[FANOTIFY_INLINE_FH_LEN];
>  	u8 type;
>  	u8 len;
> +	u8 name_offset;
> +	u8 name_len;
> +	unsigned char buf[FANOTIFY_INLINE_FH_LEN];
>  } __aligned(4);
>  
> +static inline void fanotify_fh_init(struct fanotify_fh *fh)
> +{
> +	fh->type = FILEID_ROOT;
> +	fh->len = 0;
> +	fh->name_offset = 0;
> +	fh->name_len = 0;
> +}
> +
>  static inline bool fanotify_fh_has_ext_buf(struct fanotify_fh *fh)
>  {
>  	return fh->len > FANOTIFY_INLINE_FH_LEN;
> @@ -37,6 +48,7 @@ static inline bool fanotify_fh_has_ext_buf(struct fanotify_fh *fh)
>  
>  static inline char **fanotify_fh_ext_buf_ptr(struct fanotify_fh *fh)
>  {
> +	BUILD_BUG_ON(FANOTIFY_FH_HDR_LEN % 4);
>  	BUILD_BUG_ON(__alignof__(char *) - 4 + sizeof(char *) >
>  		     FANOTIFY_INLINE_FH_LEN);
>  	return (char **)ALIGN((unsigned long)(fh->buf), __alignof__(char *));
> @@ -52,6 +64,35 @@ static inline void *fanotify_fh_buf(struct fanotify_fh *fh)
>  	return fanotify_fh_has_ext_buf(fh) ? fanotify_fh_ext_buf(fh) : fh->buf;
>  }
>  
> +static inline int fanotify_fh_blob_len(struct fanotify_fh *fh)
> +{
> +	if (fh->name_offset <= fh->len)
> +		return 0;
> +
> +	/* Is there a space between end of fh_buf and start of name? */
> +	return fh->name_offset - fh->len;
> +}
> +
> +static inline void *fanotify_fh_blob(struct fanotify_fh *fh)
> +{
> +	if (fh->name_offset <= fh->len)
> +		return NULL;
> +
> +	return fh->buf + fh->len;
> +}
> +
> +static inline const char *fanotify_fh_name(struct fanotify_fh *fh)
> +{
> +	return fh->name_len ? fh->buf + fh->name_offset : NULL;
> +}
> +
> +static inline void fanotify_fh_copy_name(struct fanotify_fh *fh,
> +					 const struct qstr *name)
> +{
> +	fh->name_len = name->len;
> +	strcpy(fh->buf + fh->name_offset, name->name);
> +}
> +
>  /*
>   * Common structure for fanotify events. Concrete structs are allocated in
>   * fanotify_handle_event() and freed when the information is retrieved by
> @@ -93,12 +134,16 @@ FANOTIFY_FE(struct fanotify_event *event)
>  	return container_of(event, struct fanotify_fid_event, fae);
>  }
>  
> +/*
> + * This is identical to struct fanotify_fid_event, but allocated with variable
> + * size kmalloc and should have positive value of dir_fh.name_len.
> + * Keeping the separate struct definition for semantics and type safety -
> + * an event should be cast to this type IFF it was allocated using kmalloc.
> + */
>  struct fanotify_name_event {
>  	struct fanotify_event fae;
>  	__kernel_fsid_t fsid;
>  	struct fanotify_fh dir_fh;
> -	u8 name_len;
> -	char name[];
>  };
>  
>  static inline struct fanotify_name_event *
> @@ -142,17 +187,6 @@ static inline int fanotify_event_object_fh_len(struct fanotify_event *event)
>  	return fh ? fh->len : 0;
>  }
>  
> -static inline bool fanotify_event_has_name(struct fanotify_event *event)
> -{
> -	return event->type == FANOTIFY_EVENT_TYPE_FID_NAME;
> -}
> -
> -static inline int fanotify_event_name_len(struct fanotify_event *event)
> -{
> -	return fanotify_event_has_name(event) ?
> -		FANOTIFY_NE(event)->name_len : 0;
> -}
> -
>  struct fanotify_path_event {
>  	struct fanotify_event fae;
>  	struct path path;
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 42b8cc51cb3f..af8268b44c68 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -68,17 +68,14 @@ static int fanotify_event_info_len(struct fanotify_event *event)
>  {
>  	int info_len = 0;
>  	int fh_len = fanotify_event_object_fh_len(event);
> +	struct fanotify_fh *dfh = fanotify_event_dir_fh(event);
> +
> +	if (dfh)
> +		info_len += fanotify_fid_info_len(dfh->len, dfh->name_len);
>  
>  	if (fh_len)
>  		info_len += fanotify_fid_info_len(fh_len, 0);
>  
> -	if (fanotify_event_name_len(event)) {
> -		struct fanotify_name_event *fne = FANOTIFY_NE(event);
> -
> -		info_len += fanotify_fid_info_len(fne->dir_fh.len,
> -						  fne->name_len);
> -	}
> -
>  	return info_len;
>  }
>  
> @@ -305,6 +302,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
>  {
>  	struct fanotify_event_metadata metadata;
>  	struct path *path = fanotify_event_path(event);
> +	struct fanotify_fh *dfh = fanotify_event_dir_fh(event);
>  	struct file *f = NULL;
>  	int ret, fd = FAN_NOFD;
>  
> @@ -346,13 +344,10 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
>  		fd_install(fd, f);
>  
>  	/* Event info records order is: dir fid + name, child fid */
> -	if (fanotify_event_name_len(event)) {
> -		struct fanotify_name_event *fne = FANOTIFY_NE(event);
> -
> -		ret = copy_info_to_user(fanotify_event_fsid(event),
> -					fanotify_event_dir_fh(event),
> -					fne->name, fne->name_len,
> -					buf, count);
> +	if (dfh) {
> +		ret = copy_info_to_user(fanotify_event_fsid(event), dfh,
> +					fanotify_fh_name(dfh),
> +					dfh->name_len, buf, count);
>  		if (ret < 0)
>  			return ret;
>  
> -- 
> 2.17.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/20] fsnotify: Rearrange fast path to minimise overhead when there is no watcher
  2020-07-03 14:03   ` Jan Kara
@ 2020-07-04  9:30     ` Amir Goldstein
  2020-07-06 11:05       ` Jan Kara
  0 siblings, 1 reply; 35+ messages in thread
From: Amir Goldstein @ 2020-07-04  9:30 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Mel Gorman

On Fri, Jul 3, 2020 at 5:03 PM Jan Kara <jack@suse.cz> wrote:
>
> On Fri 12-06-20 12:33:24, Amir Goldstein wrote:
> > From: Mel Gorman <mgorman@techsingularity.net>
> >
> > The fsnotify paths are trivial to hit even when there are no watchers and
> > they are surprisingly expensive. For example, every successful vfs_write()
> > hits fsnotify_modify which calls both fsnotify_parent and fsnotify unless
> > FMODE_NONOTIFY is set which is an internal flag invisible to userspace.
> > As it stands, fsnotify_parent is a guaranteed functional call even if there
> > are no watchers and fsnotify() does a substantial amount of unnecessary
> > work before it checks if there are any watchers. A perf profile showed
> > that applying mnt->mnt_fsnotify_mask in fnotify() was almost half of the
> > total samples taken in that function during a test. This patch rearranges
> > the fast paths to reduce the amount of work done when there are no
> > watchers.
> >
> > The test motivating this was "perf bench sched messaging --pipe". Despite
> > the fact the pipes are anonymous, fsnotify is still called a lot and
> > the overhead is noticeable even though it's completely pointless. It's
> > likely the overhead is negligible for real IO so this is an extreme
> > example. This is a comparison of hackbench using processes and pipes on
> > a 1-socket machine with 8 CPU threads without fanotify watchers.
> >
> >                               5.7.0                  5.7.0
> >                             vanilla      fastfsnotify-v1r1
> > Amean     1       0.4837 (   0.00%)      0.4630 *   4.27%*
> > Amean     3       1.5447 (   0.00%)      1.4557 (   5.76%)
> > Amean     5       2.6037 (   0.00%)      2.4363 (   6.43%)
> > Amean     7       3.5987 (   0.00%)      3.4757 (   3.42%)
> > Amean     12      5.8267 (   0.00%)      5.6983 (   2.20%)
> > Amean     18      8.4400 (   0.00%)      8.1327 (   3.64%)
> > Amean     24     11.0187 (   0.00%)     10.0290 *   8.98%*
> > Amean     30     13.1013 (   0.00%)     12.8510 (   1.91%)
> > Amean     32     13.9190 (   0.00%)     13.2410 (   4.87%)
> >
> >                        5.7.0       5.7.0
> >                      vanilla fastfsnotify-v1r1
> > Duration User         157.05      152.79
> > Duration System      1279.98     1219.32
> > Duration Elapsed      182.81      174.52
> >
> > This is showing that the latencies are improved by roughly 2-9%. The
> > variability is not shown but some of these results are within the noise
> > as this workload heavily overloads the machine. That said, the system CPU
> > usage is reduced by quite a bit so it makes sense to avoid the overhead
> > even if it is a bit tricky to detect at times. A perf profile of just 1
> > group of tasks showed that 5.14% of samples taken were in either fsnotify()
> > or fsnotify_parent(). With the patch, 2.8% of samples were in fsnotify,
> > mostly function entry and the initial check for watchers.  The check for
> > watchers is complicated enough that inlining it may be controversial.
> >
> > [Amir] Slightly simplify with mnt_or_sb_mask => marks_mask
> >
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >  fs/notify/fsnotify.c             | 27 +++++++++++++++------------
> >  include/linux/fsnotify.h         | 10 ++++++++++
> >  include/linux/fsnotify_backend.h |  4 ++--
> >  3 files changed, 27 insertions(+), 14 deletions(-)
> >
> > diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> > index 72d332ce8e12..d59a58d10b84 100644
> > --- a/fs/notify/fsnotify.c
> > +++ b/fs/notify/fsnotify.c
> > @@ -143,7 +143,7 @@ void __fsnotify_update_child_dentry_flags(struct inode *inode)
> >  }
> >
> >  /* Notify this dentry's parent about a child's events. */
> > -int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
> > +int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
> >                   int data_type)
> >  {
> >       struct dentry *parent;
>
> Hum, should we actually remove the DCACHE_FSNOTIFY_PARENT_WATCHED check
> from here when it's moved to fsnotify_parent() inline helper?

No point.
It is making a comeback on:
 fsnotify: send event with parent/name info to sb/mount/non-dir marks

>
> > @@ -315,17 +315,11 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
> >       struct fsnotify_iter_info iter_info = {};
> >       struct super_block *sb = to_tell->i_sb;
> >       struct mount *mnt = NULL;
> > -     __u32 mnt_or_sb_mask = sb->s_fsnotify_mask;
> >       int ret = 0;
> > -     __u32 test_mask = (mask & ALL_FSNOTIFY_EVENTS);
> > +     __u32 test_mask, marks_mask;
> >
> > -     if (path) {
> > +     if (path)
> >               mnt = real_mount(path->mnt);
> > -             mnt_or_sb_mask |= mnt->mnt_fsnotify_mask;
> > -     }
> > -     /* An event "on child" is not intended for a mount/sb mark */
> > -     if (mask & FS_EVENT_ON_CHILD)
> > -             mnt_or_sb_mask = 0;
> >
> >       /*
> >        * Optimization: srcu_read_lock() has a memory barrier which can
> > @@ -337,13 +331,22 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
> >       if (!to_tell->i_fsnotify_marks && !sb->s_fsnotify_marks &&
> >           (!mnt || !mnt->mnt_fsnotify_marks))
> >               return 0;
> > +
> > +     /* An event "on child" is not intended for a mount/sb mark */
> > +     marks_mask = to_tell->i_fsnotify_mask;
> > +     if (!(mask & FS_EVENT_ON_CHILD)) {
> > +             marks_mask |= sb->s_fsnotify_mask;
> > +             if (mnt)
> > +                     marks_mask |= mnt->mnt_fsnotify_mask;
> > +     }
> > +
> >       /*
> >        * if this is a modify event we may need to clear the ignored masks
> >        * otherwise return if neither the inode nor the vfsmount/sb care about
> >        * this type of event.
> >        */
> > -     if (!(mask & FS_MODIFY) &&
> > -         !(test_mask & (to_tell->i_fsnotify_mask | mnt_or_sb_mask)))
> > +     test_mask = (mask & ALL_FSNOTIFY_EVENTS);
> > +     if (!(mask & FS_MODIFY) && !(test_mask & marks_mask))
> >               return 0;
>
> Otherwise the patch looks good. One observation though: The (mask &
> FS_MODIFY) check means that all vfs_write() calls end up going through the
> "slower" path iterating all mark types and checking whether there are marks
> anyway. That could be relatively simply optimized using a hidden mask flag
> like FS_ALWAYS_RECEIVE_MODIFY which would be set when there's some mark
> needing special handling of FS_MODIFY... Not sure if we care enough at this
> point...

Yeh that sounds low hanging.
Actually, I Don't think we need to define a flag for that.
__fsnotify_recalc_mask() can add FS_MODIFY to the object's mask if needed.

I will take a look at that as part of FS_PRE_MODIFY work.
But in general, we should fight the urge to optimize theoretic
performance issues...

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 19/20] fanotify: move event name into fanotify_fh
  2020-07-03 16:02   ` Jan Kara
@ 2020-07-06  8:21     ` Amir Goldstein
  2020-07-06 15:24       ` Jan Kara
  0 siblings, 1 reply; 35+ messages in thread
From: Amir Goldstein @ 2020-07-06  8:21 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

On Fri, Jul 3, 2020 at 7:02 PM Jan Kara <jack@suse.cz> wrote:
>
> On Fri 12-06-20 12:33:42, Amir Goldstein wrote:
> > An fanotify event name is always recorded relative to a dir fh.
> > Move the name_len members of fanotify_name_event into unused space
> > in struct fanotify_fh.
> >
> > We add a name_offset member to allow packing a binary blob before
> > the name string in the variable size buffer. We are going to use
> > that space to store the child fid.
>
> So how much is this packing going to save us? Currently it is 1 byte for
> name events (modulo that fanotify_alloc_name_event_bug() you mention
> below). With the additional fanotify_fh in the event, we'll save two more
> bytes by the packing. So that doesn't really seem to be worth it to me.
> Am I missing some other benefit?
>
> Maybe your main motivation (which is not mentioned in the changelog at all
> BTW) is that the whole game of inline vs out of line file handles is
> pointless when we kmalloc() the event anyway because of the name?

The only motivation, which is written in the commit message is to make
space to store the child file handle. Saving space is just a by product.
In fact, the new parceling code looses this space back to alignment
and I am perfectly fine with that.

> And it's
> actively wasteful in case handles don't fit in the inline space. I agree
> with that and it's good observation. But I'd rather leave fanotify_fh
> struct alone for the cases where we want to bother with inline vs out of line
> file handles and define new way of partitioning space at the end of the
> event among one or two file handles and name. Something like:
>
> struct fanotify_dynamic_info {

I called this fanotify_info. There is no ambiguity that justifies _dynamic_.
The encapsulations are:

fanotify_fid_event { ..., fanotify_fid { ..,buf[INLINE_BUF] } }
fanotify_name_event { ..., fanotify_info { fanotify_fid { ..,buf[..]
}+, name[..] }


>         u8 dirfh_len;
>         u8 filefh_len;

I called these {dir,file}_fh_totlen to distinguish from fh->len, which does not
include the size of the fanotify_fh header fields.

>         u8 name_len;
>         unsigned char buf[];

This had to be 4 bytes aligned to contain fanotify_fh.

> };
>
> And at appropriate offsets (0, dirfh_len, dirfh_len + filefh_len) there
> would be additional info (e.g. type + fh for file handles). Maybe this
> format will require some tweaking so that processing of both storage types
> of file handles can be reasonably uniform but at this point it seems
> cleaner than what you try to do fanotify_fh with combining lenghts and
> offsets and some blobs in the middle...
>

I tried your suggestion (with the minor modifications above) and I
like the result.
Pushed prep series with 2 last patches changed to branch fanotify_prep.
Old prep series is at fanotify_prep-v2.
Pushed tested full series adapted to this change to fanotify_name_fid.
Old full series is at fanotify_name_fid-v4.

There was almost no changes to the fanotify_name_fid patches besides
adapting the accessors, e.g.:
-               fanotify_fh_blob(&FANOTIFY_NE(event)->dir_fh);
+              fanotify_info_file_fh(&FANOTIFY_NE(event)->info);

Please let me know if you want me to post fanotify_name_fid-v5 with these
changes.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/20] fsnotify: Rearrange fast path to minimise overhead when there is no watcher
  2020-07-04  9:30     ` Amir Goldstein
@ 2020-07-06 11:05       ` Jan Kara
  2020-07-09 17:56         ` fsnotify: minimise overhead when there are no marks with ignore mask Amir Goldstein
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kara @ 2020-07-06 11:05 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel, Mel Gorman

On Sat 04-07-20 12:30:10, Amir Goldstein wrote:
> On Fri, Jul 3, 2020 at 5:03 PM Jan Kara <jack@suse.cz> wrote:
> > >  /* Notify this dentry's parent about a child's events. */
> > > -int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
> > > +int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
> > >                   int data_type)
> > >  {
> > >       struct dentry *parent;
> >
> > Hum, should we actually remove the DCACHE_FSNOTIFY_PARENT_WATCHED check
> > from here when it's moved to fsnotify_parent() inline helper?
> 
> No point.
> It is making a comeback on:
>  fsnotify: send event with parent/name info to sb/mount/non-dir marks

Right, I've noticed that later as well.

> > > @@ -337,13 +331,22 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
> > >       if (!to_tell->i_fsnotify_marks && !sb->s_fsnotify_marks &&
> > >           (!mnt || !mnt->mnt_fsnotify_marks))
> > >               return 0;
> > > +
> > > +     /* An event "on child" is not intended for a mount/sb mark */
> > > +     marks_mask = to_tell->i_fsnotify_mask;
> > > +     if (!(mask & FS_EVENT_ON_CHILD)) {
> > > +             marks_mask |= sb->s_fsnotify_mask;
> > > +             if (mnt)
> > > +                     marks_mask |= mnt->mnt_fsnotify_mask;
> > > +     }
> > > +
> > >       /*
> > >        * if this is a modify event we may need to clear the ignored masks
> > >        * otherwise return if neither the inode nor the vfsmount/sb care about
> > >        * this type of event.
> > >        */
> > > -     if (!(mask & FS_MODIFY) &&
> > > -         !(test_mask & (to_tell->i_fsnotify_mask | mnt_or_sb_mask)))
> > > +     test_mask = (mask & ALL_FSNOTIFY_EVENTS);
> > > +     if (!(mask & FS_MODIFY) && !(test_mask & marks_mask))
> > >               return 0;
> >
> > Otherwise the patch looks good. One observation though: The (mask &
> > FS_MODIFY) check means that all vfs_write() calls end up going through the
> > "slower" path iterating all mark types and checking whether there are marks
> > anyway. That could be relatively simply optimized using a hidden mask flag
> > like FS_ALWAYS_RECEIVE_MODIFY which would be set when there's some mark
> > needing special handling of FS_MODIFY... Not sure if we care enough at this
> > point...
> 
> Yeh that sounds low hanging.
> Actually, I Don't think we need to define a flag for that.
> __fsnotify_recalc_mask() can add FS_MODIFY to the object's mask if needed.

Yes, that would be even more elegant.

> I will take a look at that as part of FS_PRE_MODIFY work.
> But in general, we should fight the urge to optimize theoretic
> performance issues...

Agreed. I just suspect this may bring measurable benefit for hackbench pipe
or tiny tmpfs writes after seeing Mel's results. But as I wrote this is a
separate idea and without some numbers confirming my suspicion I don't
think the complication is worth it so I don't want you to burn time on this
unless you're really interested :).

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 19/20] fanotify: move event name into fanotify_fh
  2020-07-06  8:21     ` Amir Goldstein
@ 2020-07-06 15:24       ` Jan Kara
  0 siblings, 0 replies; 35+ messages in thread
From: Jan Kara @ 2020-07-06 15:24 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel

On Mon 06-07-20 11:21:24, Amir Goldstein wrote:
> On Fri, Jul 3, 2020 at 7:02 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Fri 12-06-20 12:33:42, Amir Goldstein wrote:
> > > An fanotify event name is always recorded relative to a dir fh.
> > > Move the name_len members of fanotify_name_event into unused space
> > > in struct fanotify_fh.
> > >
> > > We add a name_offset member to allow packing a binary blob before
> > > the name string in the variable size buffer. We are going to use
> > > that space to store the child fid.
> >
> > So how much is this packing going to save us? Currently it is 1 byte for
> > name events (modulo that fanotify_alloc_name_event_bug() you mention
> > below). With the additional fanotify_fh in the event, we'll save two more
> > bytes by the packing. So that doesn't really seem to be worth it to me.
> > Am I missing some other benefit?
> >
> > Maybe your main motivation (which is not mentioned in the changelog at all
> > BTW) is that the whole game of inline vs out of line file handles is
> > pointless when we kmalloc() the event anyway because of the name?
> 
> The only motivation, which is written in the commit message is to make
> space to store the child file handle. Saving space is just a by product.
> In fact, the new parceling code looses this space back to alignment
> and I am perfectly fine with that.

Yeah, I think the loss is acceptable.

> I tried your suggestion (with the minor modifications above) and I
> like the result.
> Pushed prep series with 2 last patches changed to branch fanotify_prep.
> Old prep series is at fanotify_prep-v2.

Yeah, I like the result as well. I've left some minor comments on github.
Please repost the preparatory series once you address the comments so that
we have something for final review and picking up into my tree.

> Pushed tested full series adapted to this change to fanotify_name_fid.
> Old full series is at fanotify_name_fid-v4.
> 
> There was almost no changes to the fanotify_name_fid patches besides
> adapting the accessors, e.g.:
> -               fanotify_fh_blob(&FANOTIFY_NE(event)->dir_fh);
> +              fanotify_info_file_fh(&FANOTIFY_NE(event)->info);
> 
> Please let me know if you want me to post fanotify_name_fid-v5 with these
> changes.

No need to repost at this point I guess. I can do a high-level check with
what I have...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: fsnotify: minimise overhead when there are no marks with ignore mask
  2020-07-06 11:05       ` Jan Kara
@ 2020-07-09 17:56         ` Amir Goldstein
  2020-07-26 15:20           ` fsnotify: minimise overhead when there are no marks related to sb Amir Goldstein
  0 siblings, 1 reply; 35+ messages in thread
From: Amir Goldstein @ 2020-07-09 17:56 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Mel Gorman

> > > Otherwise the patch looks good. One observation though: The (mask &
> > > FS_MODIFY) check means that all vfs_write() calls end up going through the
> > > "slower" path iterating all mark types and checking whether there are marks
> > > anyway. That could be relatively simply optimized using a hidden mask flag
> > > like FS_ALWAYS_RECEIVE_MODIFY which would be set when there's some mark
> > > needing special handling of FS_MODIFY... Not sure if we care enough at this
> > > point...
> >
> > Yeh that sounds low hanging.
> > Actually, I Don't think we need to define a flag for that.
> > __fsnotify_recalc_mask() can add FS_MODIFY to the object's mask if needed.
>
> Yes, that would be even more elegant.
>
> > I will take a look at that as part of FS_PRE_MODIFY work.
> > But in general, we should fight the urge to optimize theoretic
> > performance issues...
>
> Agreed. I just suspect this may bring measurable benefit for hackbench pipe
> or tiny tmpfs writes after seeing Mel's results. But as I wrote this is a
> separate idea and without some numbers confirming my suspicion I don't
> think the complication is worth it so I don't want you to burn time on this
> unless you're really interested :).
>

You know me ;-)
FS_MODIFY optimization pushed to fsnotify_pre_modify branch.
Only tested that LTP tests pass.

Note that this is only expected to improve performance in case there *are*
marks, but not marks with ignore mask, because there is an earlier
optimization in fsnotify() for the no marks case.

Sorry for bombarding you with more patches (I should let you finish with
fanotify_prep and fanotify_name_fid), but if you get a chance and can
take a quick look at these 2 patches on fsnotify_pre_modify branch:
1. fsnotify: replace igrab() with ihold() on attach connector
2. fsnotify: allow adding an inode mark without pinning inode

They are very small and simple, but I am afraid I may be missing something.

Why did we use igrab() there in the first place? Is there a reason or is it
relic from old code?

As for the second patch, I won't get into why I need the evictable inode
marks right now, but I was wondering if there was some inherent reason
that I am missing why that cannot be done and inodes *have* to be pinned
if you attach a mark to them (besides functionality of course)?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: fsnotify: minimise overhead when there are no marks related to sb
  2020-07-09 17:56         ` fsnotify: minimise overhead when there are no marks with ignore mask Amir Goldstein
@ 2020-07-26 15:20           ` Amir Goldstein
  2020-07-27  7:44             ` Jan Kara
  0 siblings, 1 reply; 35+ messages in thread
From: Amir Goldstein @ 2020-07-26 15:20 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Jan Kara, linux-fsdevel

On Thu, Jul 9, 2020 at 8:56 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> > > > Otherwise the patch looks good. One observation though: The (mask &
> > > > FS_MODIFY) check means that all vfs_write() calls end up going through the
> > > > "slower" path iterating all mark types and checking whether there are marks
> > > > anyway. That could be relatively simply optimized using a hidden mask flag
> > > > like FS_ALWAYS_RECEIVE_MODIFY which would be set when there's some mark
> > > > needing special handling of FS_MODIFY... Not sure if we care enough at this
> > > > point...
> > >
> > > Yeh that sounds low hanging.
> > > Actually, I Don't think we need to define a flag for that.
> > > __fsnotify_recalc_mask() can add FS_MODIFY to the object's mask if needed.
> >
> > Yes, that would be even more elegant.
> >
> > > I will take a look at that as part of FS_PRE_MODIFY work.
> > > But in general, we should fight the urge to optimize theoretic
> > > performance issues...
> >
> > Agreed. I just suspect this may bring measurable benefit for hackbench pipe
> > or tiny tmpfs writes after seeing Mel's results. But as I wrote this is a
> > separate idea and without some numbers confirming my suspicion I don't
> > think the complication is worth it so I don't want you to burn time on this
> > unless you're really interested :).
> >
>
> You know me ;-)
> FS_MODIFY optimization pushed to fsnotify_pre_modify branch.
> Only tested that LTP tests pass.
>
> Note that this is only expected to improve performance in case there *are*
> marks, but not marks with ignore mask, because there is an earlier
> optimization in fsnotify() for the no marks case.
>

Hi Mel,

After following up on Jan's suggestion above, I realized there is another
low hanging optimization we can make.

As you may remember, one of the solutions we considered was to exclude
special or internal sb's from notifications based on some SB flag, but making
assumptions about which sb are expected to provide notifications turned out
to be a risky game.

We can however, keep a counter on sb to *know* there are no watches
on any object in this sb, so the test:

        if (!sb->s_fsnotify_marks &&
            (!mnt || !mnt->mnt_fsnotify_marks) &&
            (!inode || !inode->i_fsnotify_marks))
                return 0;

Which is not so nice for inlining, can be summarized as:

        if (atomic_long_read(&inode->i_sb->s_fsnotify_obj_refs) == 0)
                return 0;

Which is nicer for inlining.

I am not sure if you had a concrete reason for:
"fs: Do not check if there is a fsnotify watcher on pseudo inodes"
or if you did it for the sport.

I have made the change above for the sport and for now I do not
intend to post it for review unless a good reason comes up.

If you are interested or curious to queue this code to Suse perf testing,
I pushed it to branch fsnotify-perf [1]. It may be interesting to see if it
won back the cpu cycles lost in the revert of your patch.

This branch is based on some changes already in Jan's tree and some
changes in my development tree (fsnotify_pre_modify), but you already
fed this development branch to perf test machine once and reported back
that there was no significant degradation.

I can also provide the optimization patches based on Linus' tree if needed.

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/fsnotify-perf

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: fsnotify: minimise overhead when there are no marks related to sb
  2020-07-26 15:20           ` fsnotify: minimise overhead when there are no marks related to sb Amir Goldstein
@ 2020-07-27  7:44             ` Jan Kara
  2020-07-27 10:02               ` Amir Goldstein
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kara @ 2020-07-27  7:44 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Mel Gorman, Jan Kara, linux-fsdevel

On Sun 26-07-20 18:20:26, Amir Goldstein wrote:
> On Thu, Jul 9, 2020 at 8:56 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > > > > Otherwise the patch looks good. One observation though: The (mask &
> > > > > FS_MODIFY) check means that all vfs_write() calls end up going through the
> > > > > "slower" path iterating all mark types and checking whether there are marks
> > > > > anyway. That could be relatively simply optimized using a hidden mask flag
> > > > > like FS_ALWAYS_RECEIVE_MODIFY which would be set when there's some mark
> > > > > needing special handling of FS_MODIFY... Not sure if we care enough at this
> > > > > point...
> > > >
> > > > Yeh that sounds low hanging.
> > > > Actually, I Don't think we need to define a flag for that.
> > > > __fsnotify_recalc_mask() can add FS_MODIFY to the object's mask if needed.
> > >
> > > Yes, that would be even more elegant.
> > >
> > > > I will take a look at that as part of FS_PRE_MODIFY work.
> > > > But in general, we should fight the urge to optimize theoretic
> > > > performance issues...
> > >
> > > Agreed. I just suspect this may bring measurable benefit for hackbench pipe
> > > or tiny tmpfs writes after seeing Mel's results. But as I wrote this is a
> > > separate idea and without some numbers confirming my suspicion I don't
> > > think the complication is worth it so I don't want you to burn time on this
> > > unless you're really interested :).
> > >
> >
> > You know me ;-)
> > FS_MODIFY optimization pushed to fsnotify_pre_modify branch.
> > Only tested that LTP tests pass.
> >
> > Note that this is only expected to improve performance in case there *are*
> > marks, but not marks with ignore mask, because there is an earlier
> > optimization in fsnotify() for the no marks case.
> >
> 
> Hi Mel,
> 
> After following up on Jan's suggestion above, I realized there is another
> low hanging optimization we can make.
> 
> As you may remember, one of the solutions we considered was to exclude
> special or internal sb's from notifications based on some SB flag, but making
> assumptions about which sb are expected to provide notifications turned out
> to be a risky game.
> 
> We can however, keep a counter on sb to *know* there are no watches
> on any object in this sb, so the test:
> 
>         if (!sb->s_fsnotify_marks &&
>             (!mnt || !mnt->mnt_fsnotify_marks) &&
>             (!inode || !inode->i_fsnotify_marks))
>                 return 0;
> 
> Which is not so nice for inlining, can be summarized as:
> 
>         if (atomic_long_read(&inode->i_sb->s_fsnotify_obj_refs) == 0)
>                 return 0;
> 
> Which is nicer for inlining.

That's a nice idea. I was just wondering why do you account only inode
references in the superblock. Because if there's only say mount watch,
s_fsnotify_obj_refs will be 0 and you will wrongly skip reporting. Or am I
misunderstanding something? I'd rather have counter like
sb->s_fsnotify_connectors, that will account all connectors related to the
superblock - i.e., connectors attached to the superblock, mounts referring
to the superblock, or inodes referring to the superblock...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: fsnotify: minimise overhead when there are no marks related to sb
  2020-07-27  7:44             ` Jan Kara
@ 2020-07-27 10:02               ` Amir Goldstein
  0 siblings, 0 replies; 35+ messages in thread
From: Amir Goldstein @ 2020-07-27 10:02 UTC (permalink / raw)
  To: Jan Kara; +Cc: Mel Gorman, linux-fsdevel

On Mon, Jul 27, 2020 at 10:44 AM Jan Kara <jack@suse.cz> wrote:
>
> On Sun 26-07-20 18:20:26, Amir Goldstein wrote:
> > On Thu, Jul 9, 2020 at 8:56 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > > > > Otherwise the patch looks good. One observation though: The (mask &
> > > > > > FS_MODIFY) check means that all vfs_write() calls end up going through the
> > > > > > "slower" path iterating all mark types and checking whether there are marks
> > > > > > anyway. That could be relatively simply optimized using a hidden mask flag
> > > > > > like FS_ALWAYS_RECEIVE_MODIFY which would be set when there's some mark
> > > > > > needing special handling of FS_MODIFY... Not sure if we care enough at this
> > > > > > point...
> > > > >
> > > > > Yeh that sounds low hanging.
> > > > > Actually, I Don't think we need to define a flag for that.
> > > > > __fsnotify_recalc_mask() can add FS_MODIFY to the object's mask if needed.
> > > >
> > > > Yes, that would be even more elegant.
> > > >
> > > > > I will take a look at that as part of FS_PRE_MODIFY work.
> > > > > But in general, we should fight the urge to optimize theoretic
> > > > > performance issues...
> > > >
> > > > Agreed. I just suspect this may bring measurable benefit for hackbench pipe
> > > > or tiny tmpfs writes after seeing Mel's results. But as I wrote this is a
> > > > separate idea and without some numbers confirming my suspicion I don't
> > > > think the complication is worth it so I don't want you to burn time on this
> > > > unless you're really interested :).
> > > >
> > >
> > > You know me ;-)
> > > FS_MODIFY optimization pushed to fsnotify_pre_modify branch.
> > > Only tested that LTP tests pass.
> > >
> > > Note that this is only expected to improve performance in case there *are*
> > > marks, but not marks with ignore mask, because there is an earlier
> > > optimization in fsnotify() for the no marks case.
> > >
> >
> > Hi Mel,
> >
> > After following up on Jan's suggestion above, I realized there is another
> > low hanging optimization we can make.
> >
> > As you may remember, one of the solutions we considered was to exclude
> > special or internal sb's from notifications based on some SB flag, but making
> > assumptions about which sb are expected to provide notifications turned out
> > to be a risky game.
> >
> > We can however, keep a counter on sb to *know* there are no watches
> > on any object in this sb, so the test:
> >
> >         if (!sb->s_fsnotify_marks &&
> >             (!mnt || !mnt->mnt_fsnotify_marks) &&
> >             (!inode || !inode->i_fsnotify_marks))
> >                 return 0;
> >
> > Which is not so nice for inlining, can be summarized as:
> >
> >         if (atomic_long_read(&inode->i_sb->s_fsnotify_obj_refs) == 0)
> >                 return 0;
> >
> > Which is nicer for inlining.
>
> That's a nice idea. I was just wondering why do you account only inode
> references in the superblock. Because if there's only say mount watch,
> s_fsnotify_obj_refs will be 0 and you will wrongly skip reporting. Or am I
> misunderstanding something? I'd rather have counter like
> sb->s_fsnotify_connectors, that will account all connectors related to the
> superblock - i.e., connectors attached to the superblock, mounts referring
> to the superblock, or inodes referring to the superblock...
>

Yeh, that is what I did.
Those two commits change the former s_fsnotify_inode_refs
to s_fsnotify_obj_refs which counts objects (inodes/mounts/sb) pointed to
be connectors.
I agree that s_fsnotify_connectors may be a better choice of name ;-)

de1255f8a64c fsnotify: count all objects with attached connectors
5e6c3af6e2df fsnotify: count s_fsnotify_inode_refs for attached connectors

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2020-07-27 10:02 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-12  9:33 [PATCH 00/20] Prep work for fanotify named events Amir Goldstein
2020-06-12  9:33 ` [PATCH 01/20] fsnotify: Rearrange fast path to minimise overhead when there is no watcher Amir Goldstein
2020-07-03 14:03   ` Jan Kara
2020-07-04  9:30     ` Amir Goldstein
2020-07-06 11:05       ` Jan Kara
2020-07-09 17:56         ` fsnotify: minimise overhead when there are no marks with ignore mask Amir Goldstein
2020-07-26 15:20           ` fsnotify: minimise overhead when there are no marks related to sb Amir Goldstein
2020-07-27  7:44             ` Jan Kara
2020-07-27 10:02               ` Amir Goldstein
2020-06-12  9:33 ` [PATCH 02/20] fsnotify: fold fsnotify() call into fsnotify_parent() Amir Goldstein
2020-06-12  9:33 ` [PATCH 03/20] fsnotify: return non const from fsnotify_data_inode() Amir Goldstein
2020-06-12  9:33 ` [PATCH 04/20] nfsd: use fsnotify_data_inode() to get the unlinked inode Amir Goldstein
2020-06-12 10:25   ` Amir Goldstein
2020-06-12  9:33 ` [PATCH 05/20] kernfs: do not call fsnotify() with name without a parent Amir Goldstein
2020-06-29 13:27   ` Tejun Heo
2020-06-29 16:11   ` Greg Kroah-Hartman
2020-06-12  9:33 ` [PATCH 06/20] inotify: do not use objectid when comparing events Amir Goldstein
2020-06-12  9:33 ` [PATCH 07/20] fanotify: create overflow event type Amir Goldstein
2020-06-12  9:33 ` [PATCH 08/20] fanotify: break up fanotify_alloc_event() Amir Goldstein
2020-06-12  9:33 ` [PATCH 09/20] fsnotify: pass dir argument to handle_event() callback Amir Goldstein
2020-07-03 14:49   ` Jan Kara
2020-06-12  9:33 ` [PATCH 10/20] fanotify: generalize the handling of extra event flags Amir Goldstein
2020-06-12  9:33 ` [PATCH 11/20] fanotify: generalize merge logic of events on dir Amir Goldstein
2020-06-12  9:33 ` [PATCH 12/20] fanotify: distinguish between fid encode error and null fid Amir Goldstein
2020-06-12  9:33 ` [PATCH 13/20] fanotify: generalize test for FAN_REPORT_FID Amir Goldstein
2020-06-12  9:33 ` [PATCH 14/20] fanotify: mask out special event flags from ignored mask Amir Goldstein
2020-06-12  9:33 ` [PATCH 15/20] fanotify: prepare for implicit event flags in mark mask Amir Goldstein
2020-06-12  9:33 ` [PATCH 16/20] fanotify: use FAN_EVENT_ON_CHILD as implicit flag on sb/mount/non-dir marks Amir Goldstein
2020-06-12  9:33 ` [PATCH 17/20] fanotify: remove event FAN_DIR_MODIFY Amir Goldstein
2020-06-12  9:33 ` [PATCH 18/20] fsnotify: add object type "child" to object type iterator Amir Goldstein
2020-06-12  9:33 ` [PATCH 19/20] fanotify: move event name into fanotify_fh Amir Goldstein
2020-07-03 16:02   ` Jan Kara
2020-07-06  8:21     ` Amir Goldstein
2020-07-06 15:24       ` Jan Kara
2020-06-12  9:33 ` [PATCH 20/20] fanotify: no external fh buffer in fanotify_name_event Amir Goldstein

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).