All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH] fsnotify: optimize the case of no access event watchers
@ 2024-01-09 19:48 Amir Goldstein
  2024-01-09 20:12 ` Jens Axboe
  2024-01-10 13:56 ` Jan Kara
  0 siblings, 2 replies; 12+ messages in thread
From: Amir Goldstein @ 2024-01-09 19:48 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Jan Kara, Christian Brauner, linux-fsdevel

Commit e43de7f0862b ("fsnotify: optimize the case of no marks of any type")
optimized the case where there are no fsnotify watchers on any of the
filesystem's objects.

It is quite common for a system to have a single local filesystem and
it is quite common for the system to have some inotify watches on some
config files or directories, so the optimization of no marks at all is
often not in effect.

Access event watchers are far less common, so optimizing the case of
no marks with access events could improve performance for more systems,
especially for the performance sensitive hot io path.

Maintain a per-sb counter of objects that have marks with access
events in their mask and use that counter to optimize out the call to
fsnotify() in fsnotify access hooks.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---

Jens,

You may want to try if this patch improves performance for your workload
with SECURITY=Y and FANOTIFY_ACCESS_PERMISSIONS=Y.

I am not at all sure that access events are the right thing to optimize.
I am pretty sure that access event watchers are rare in the wild, unlike
FS_MODIFY watchers that are common (e.g. tail -f).

Let me know how this patch works for you and we can continue the
discussion with Jan from there.

Thanks,
Amir.

 fs/notify/mark.c                 | 27 ++++++++++++++++++++++++++-
 include/linux/fs.h               |  1 +
 include/linux/fsnotify.h         | 14 +++++++++++---
 include/linux/fsnotify_backend.h |  2 ++
 4 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index d6944ff86ffa..14dc581df414 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -153,9 +153,29 @@ static struct inode *fsnotify_update_iref(struct fsnotify_mark_connector *conn,
 	return inode;
 }
 
+/*
+ * To avoid the performance penalty of rare access event watchers in the hot
+ * io path, keep track of whether any such watchers exists on the filesystem.
+ */
+static void fsnotify_update_sb_watchers(struct fsnotify_mark_connector *conn,
+					u32 old_mask, u32 new_mask)
+{
+	struct super_block *sb = fsnotify_connector_sb(conn);
+	bool old_watchers = old_mask & ALL_FSNOTIFY_ACCESS_EVENTS;
+	bool new_watchers = new_mask & ALL_FSNOTIFY_ACCESS_EVENTS;
+
+	if (!sb || old_watchers == new_watchers)
+		return;
+
+	if (new_watchers)
+		atomic_long_inc(&sb->s_fsnotify_access_watchers);
+	else
+		atomic_long_dec(&sb->s_fsnotify_access_watchers);
+}
+
 static void *__fsnotify_recalc_mask(struct fsnotify_mark_connector *conn)
 {
-	u32 new_mask = 0;
+	u32 old_mask, new_mask = 0;
 	bool want_iref = false;
 	struct fsnotify_mark *mark;
 
@@ -163,6 +183,7 @@ static void *__fsnotify_recalc_mask(struct fsnotify_mark_connector *conn)
 	/* We can get detached connector here when inode is getting unlinked. */
 	if (!fsnotify_valid_obj_type(conn->type))
 		return NULL;
+	old_mask = fsnotify_conn_mask(conn);
 	hlist_for_each_entry(mark, &conn->list, obj_list) {
 		if (!(mark->flags & FSNOTIFY_MARK_FLAG_ATTACHED))
 			continue;
@@ -172,6 +193,7 @@ static void *__fsnotify_recalc_mask(struct fsnotify_mark_connector *conn)
 			want_iref = true;
 	}
 	*fsnotify_conn_mask_p(conn) = new_mask;
+	fsnotify_update_sb_watchers(conn, old_mask, new_mask);
 
 	return fsnotify_update_iref(conn, want_iref);
 }
@@ -243,11 +265,13 @@ static void *fsnotify_detach_connector_from_object(
 					unsigned int *type)
 {
 	struct inode *inode = NULL;
+	u32 old_mask;
 
 	*type = conn->type;
 	if (conn->type == FSNOTIFY_OBJ_TYPE_DETACHED)
 		return NULL;
 
+	old_mask = fsnotify_conn_mask(conn);
 	if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) {
 		inode = fsnotify_conn_inode(conn);
 		inode->i_fsnotify_mask = 0;
@@ -261,6 +285,7 @@ static void *fsnotify_detach_connector_from_object(
 		fsnotify_conn_sb(conn)->s_fsnotify_mask = 0;
 	}
 
+	fsnotify_update_sb_watchers(conn, old_mask, 0);
 	fsnotify_put_sb_connectors(conn);
 	rcu_assign_pointer(*(conn->obj), NULL);
 	conn->obj = NULL;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 42298095b7a5..643d2aeb037d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1282,6 +1282,7 @@ struct super_block {
 	 * inodes objects are currently double-accounted.
 	 */
 	atomic_long_t s_fsnotify_connectors;
+	atomic_long_t s_fsnotify_access_watchers;
 
 	/* Read-only state of the superblock is being changed */
 	int s_readonly_remount;
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 8300a5286988..9dba2e038017 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -17,6 +17,14 @@
 #include <linux/slab.h>
 #include <linux/bug.h>
 
+/* Are there any inode/mount/sb objects that are being watched? */
+static inline int fsnotify_sb_has_watchers(struct super_block *sb, __u32 mask)
+{
+	if (mask & ALL_FSNOTIFY_ACCESS_EVENTS)
+		return atomic_long_read(&sb->s_fsnotify_access_watchers);
+	return atomic_long_read(&sb->s_fsnotify_connectors);
+}
+
 /*
  * Notify this @dir inode about a change in a child directory entry.
  * The directory entry may have turned positive or negative or its inode may
@@ -30,7 +38,7 @@ static inline int fsnotify_name(__u32 mask, const void *data, int data_type,
 				struct inode *dir, const struct qstr *name,
 				u32 cookie)
 {
-	if (atomic_long_read(&dir->i_sb->s_fsnotify_connectors) == 0)
+	if (!fsnotify_sb_has_watchers(dir->i_sb, mask))
 		return 0;
 
 	return fsnotify(mask, data, data_type, dir, name, NULL, cookie);
@@ -44,7 +52,7 @@ static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry,
 
 static inline void fsnotify_inode(struct inode *inode, __u32 mask)
 {
-	if (atomic_long_read(&inode->i_sb->s_fsnotify_connectors) == 0)
+	if (!fsnotify_sb_has_watchers(inode->i_sb, mask))
 		return;
 
 	if (S_ISDIR(inode->i_mode))
@@ -59,7 +67,7 @@ static inline int fsnotify_parent(struct dentry *dentry, __u32 mask,
 {
 	struct inode *inode = d_inode(dentry);
 
-	if (atomic_long_read(&inode->i_sb->s_fsnotify_connectors) == 0)
+	if (!fsnotify_sb_has_watchers(inode->i_sb, mask))
 		return 0;
 
 	if (S_ISDIR(inode->i_mode)) {
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 7f63be5ca0f1..5e0e76cbd7ee 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -77,6 +77,8 @@
  */
 #define ALL_FSNOTIFY_DIRENT_EVENTS (FS_CREATE | FS_DELETE | FS_MOVE | FS_RENAME)
 
+#define ALL_FSNOTIFY_ACCESS_EVENTS (FS_ACCESS | FS_ACCESS_PERM)
+
 #define ALL_FSNOTIFY_PERM_EVENTS (FS_OPEN_PERM | FS_ACCESS_PERM | \
 				  FS_OPEN_EXEC_PERM)
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] fsnotify: optimize the case of no access event watchers
  2024-01-09 19:48 [RFC][PATCH] fsnotify: optimize the case of no access event watchers Amir Goldstein
@ 2024-01-09 20:12 ` Jens Axboe
  2024-01-09 20:24   ` Jens Axboe
  2024-01-10 13:56 ` Jan Kara
  1 sibling, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2024-01-09 20:12 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Christian Brauner, linux-fsdevel

On 1/9/24 12:48 PM, Amir Goldstein wrote:
> Commit e43de7f0862b ("fsnotify: optimize the case of no marks of any type")
> optimized the case where there are no fsnotify watchers on any of the
> filesystem's objects.
> 
> It is quite common for a system to have a single local filesystem and
> it is quite common for the system to have some inotify watches on some
> config files or directories, so the optimization of no marks at all is
> often not in effect.
> 
> Access event watchers are far less common, so optimizing the case of
> no marks with access events could improve performance for more systems,
> especially for the performance sensitive hot io path.
> 
> Maintain a per-sb counter of objects that have marks with access
> events in their mask and use that counter to optimize out the call to
> fsnotify() in fsnotify access hooks.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
> 
> Jens,
> 
> You may want to try if this patch improves performance for your workload
> with SECURITY=Y and FANOTIFY_ACCESS_PERMISSIONS=Y.

Ran the usual test, and this effectively removes fsnotify from the
profiles, which (as per other email) is between 5-6% of CPU time. So I'd
say it looks mighty appealing!

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] fsnotify: optimize the case of no access event watchers
  2024-01-09 20:12 ` Jens Axboe
@ 2024-01-09 20:24   ` Jens Axboe
  2024-01-10  9:08     ` Amir Goldstein
  0 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2024-01-09 20:24 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Christian Brauner, linux-fsdevel

On 1/9/24 1:12 PM, Jens Axboe wrote:
> On 1/9/24 12:48 PM, Amir Goldstein wrote:
>> Commit e43de7f0862b ("fsnotify: optimize the case of no marks of any type")
>> optimized the case where there are no fsnotify watchers on any of the
>> filesystem's objects.
>>
>> It is quite common for a system to have a single local filesystem and
>> it is quite common for the system to have some inotify watches on some
>> config files or directories, so the optimization of no marks at all is
>> often not in effect.
>>
>> Access event watchers are far less common, so optimizing the case of
>> no marks with access events could improve performance for more systems,
>> especially for the performance sensitive hot io path.
>>
>> Maintain a per-sb counter of objects that have marks with access
>> events in their mask and use that counter to optimize out the call to
>> fsnotify() in fsnotify access hooks.
>>
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> ---
>>
>> Jens,
>>
>> You may want to try if this patch improves performance for your workload
>> with SECURITY=Y and FANOTIFY_ACCESS_PERMISSIONS=Y.
> 
> Ran the usual test, and this effectively removes fsnotify from the
> profiles, which (as per other email) is between 5-6% of CPU time. So I'd
> say it looks mighty appealing!

Tried with an IRQ based workload as well, as those are always impacted
more by the fsnotify slowness. This patch removes ~8% of useless
overhead in that case, so even bigger win there.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] fsnotify: optimize the case of no access event watchers
  2024-01-09 20:24   ` Jens Axboe
@ 2024-01-10  9:08     ` Amir Goldstein
  2024-01-10 12:46       ` Matthew Wilcox
  2024-01-11 10:05       ` Christian Brauner
  0 siblings, 2 replies; 12+ messages in thread
From: Amir Goldstein @ 2024-01-10  9:08 UTC (permalink / raw)
  To: Jens Axboe, Jan Kara; +Cc: Christian Brauner, linux-fsdevel

On Tue, Jan 9, 2024 at 10:24 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 1/9/24 1:12 PM, Jens Axboe wrote:
> > On 1/9/24 12:48 PM, Amir Goldstein wrote:
> >> Commit e43de7f0862b ("fsnotify: optimize the case of no marks of any type")
> >> optimized the case where there are no fsnotify watchers on any of the
> >> filesystem's objects.
> >>
> >> It is quite common for a system to have a single local filesystem and
> >> it is quite common for the system to have some inotify watches on some
> >> config files or directories, so the optimization of no marks at all is
> >> often not in effect.
> >>
> >> Access event watchers are far less common, so optimizing the case of
> >> no marks with access events could improve performance for more systems,
> >> especially for the performance sensitive hot io path.
> >>
> >> Maintain a per-sb counter of objects that have marks with access
> >> events in their mask and use that counter to optimize out the call to
> >> fsnotify() in fsnotify access hooks.
> >>
> >> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> >> ---
> >>
> >> Jens,
> >>
> >> You may want to try if this patch improves performance for your workload
> >> with SECURITY=Y and FANOTIFY_ACCESS_PERMISSIONS=Y.
> >
> > Ran the usual test, and this effectively removes fsnotify from the
> > profiles, which (as per other email) is between 5-6% of CPU time. So I'd
> > say it looks mighty appealing!
>
> Tried with an IRQ based workload as well, as those are always impacted
> more by the fsnotify slowness. This patch removes ~8% of useless
> overhead in that case, so even bigger win there.
>

Do the IRQ based workloads always go through io_req_io_end()?
Meaning that unlike the polled io workloads, they also incur the
overhead of the fsnotify_{modify,access}() hooks?

I remember I asked you once (cannot find where) whether
io_complete_rw_iopoll() needs the fsnotify hooks and you said that
it is a highly specialized code path for fast io, whose users will not
want those access/modify hooks.

Considering the fact that fsnotify_{modify,access}() could just as well
be bypassed by mmap() read/write, I fully support this reasoning.

Anyway, that explains (to me) why compiling-out the fsnotify_perm()
hooks took away all the regression that you observed in upstream,
because I was wondering where the overhead of fsnotify_access() was.

Jan,

What are your thoughts about this optimization patch?

My thoughts are that the optimization is clearly a win, but do we
really want to waste a full long in super_block for counting access
event watchers that may never exist?

Should we perhaps instead use a flag to say that "access watchers
existed"?

We could put s_fsnotify_access_watchers inside a struct
fsnotify_sb_mark_connector and special case alloc/free of
FSNOTIFY_OBJ_TYPE_SB connector.

Using a specialized fsnotify_sb_mark_connector will make room for
similar optimizations in the future and could be a placeholder for
the "default permission mask" that we discussed.

Thoughts?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] fsnotify: optimize the case of no access event watchers
  2024-01-10  9:08     ` Amir Goldstein
@ 2024-01-10 12:46       ` Matthew Wilcox
  2024-01-10 12:55         ` Amir Goldstein
  2024-01-11 10:05       ` Christian Brauner
  1 sibling, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2024-01-10 12:46 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jens Axboe, Jan Kara, Christian Brauner, linux-fsdevel

On Wed, Jan 10, 2024 at 11:08:17AM +0200, Amir Goldstein wrote:
> My thoughts are that the optimization is clearly a win, but do we
> really want to waste a full long in super_block for counting access
> event watchers that may never exist?

Would it make more sense for it to be global, and perhaps even use the
static key infrastructure to enable/disable fsnotify?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] fsnotify: optimize the case of no access event watchers
  2024-01-10 12:46       ` Matthew Wilcox
@ 2024-01-10 12:55         ` Amir Goldstein
  0 siblings, 0 replies; 12+ messages in thread
From: Amir Goldstein @ 2024-01-10 12:55 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Jens Axboe, Jan Kara, Christian Brauner, linux-fsdevel

On Wed, Jan 10, 2024 at 2:46 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Jan 10, 2024 at 11:08:17AM +0200, Amir Goldstein wrote:
> > My thoughts are that the optimization is clearly a win, but do we
> > really want to waste a full long in super_block for counting access
> > event watchers that may never exist?
>
> Would it make more sense for it to be global,

Ironically, we once tried to disabled fsnotify hooks on pipefs and
found out (the hard way) that some people are using IN_ACCESS
event to get notified on pipe reads or something like that, so the
global option may result in less predictable performance.

> and perhaps even use the
> static key infrastructure to enable/disable fsnotify?

We are talking about disabling specific fsnotify hooks fsnotify_access()
and fsnotify_perm().

I doubt that static key infrastructure is required, because with this patch,
Jens did not observe any regression compared to the code being compiled out.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] fsnotify: optimize the case of no access event watchers
  2024-01-09 19:48 [RFC][PATCH] fsnotify: optimize the case of no access event watchers Amir Goldstein
  2024-01-09 20:12 ` Jens Axboe
@ 2024-01-10 13:56 ` Jan Kara
  2024-01-10 14:41   ` Amir Goldstein
  1 sibling, 1 reply; 12+ messages in thread
From: Jan Kara @ 2024-01-10 13:56 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jens Axboe, Jan Kara, Christian Brauner, linux-fsdevel

On Tue 09-01-24 21:48:18, Amir Goldstein wrote:
> Commit e43de7f0862b ("fsnotify: optimize the case of no marks of any type")
> optimized the case where there are no fsnotify watchers on any of the
> filesystem's objects.
> 
> It is quite common for a system to have a single local filesystem and
> it is quite common for the system to have some inotify watches on some
> config files or directories, so the optimization of no marks at all is
> often not in effect.

I agree.

> Access event watchers are far less common, so optimizing the case of
> no marks with access events could improve performance for more systems,
> especially for the performance sensitive hot io path.
> 
> Maintain a per-sb counter of objects that have marks with access
> events in their mask and use that counter to optimize out the call to
> fsnotify() in fsnotify access hooks.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

I'm not saying no to this but let's discuss first before hacking in some
partial solution :). AFAIU what Jens sees is something similar as was
reported for example here [1]. In these cases we go through
fsnotify_parent() down to fsnotify() until the check:

        if (!(test_mask & marks_mask))
                return 0;

there. And this is all relatively cheap (pure fetches like
sb->s_fsnotify_marks, mnt->mnt_fsnotify_marks, inode->i_fsnotify_marks,
sb->s_fsnotify_mask, mnt->mnt_fsnotify_mask, inode->i_fsnotify_mask) except
for parent handling in __fsnotify_parent(). That requires multiple atomic
operations - take_dentry_name_snapshot() is lock & unlock of d_lock, dget()
is cmpxchg on d_lockref, dput() is another cmpxchg on d_lockref - and this
gets really expensive, even more so if multiple threads race for the same
parent dentry.

So I think what we ideally need is a way to avoid this expensive "stabilize
parent & its name" game unless we are pretty sure we are going to generate
the event. There's no way to avoid fetching the parent early if
dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED (we can still postpone the
take_dentry_name_snapshot() cost though). However for the case where
dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED == 0 (and this is the case
here AFAICT) we need the parent only after the check of 'test_mask &
marks_mask'. So in that case we could completely avoid the extra cost of
parent dentry handling.

So in principle I believe we could make fsnotify noticeably lighter for the
case where no event is generated. Just the question is how to refactor the
current set of functions to achieve this without creating an unmaintainable
mess. I suspect if we lifted creation & some prefilling of iter_info into
__fsnotify_parent() and then fill in the parent in case we need it for
reporting only in fsnotify(), the code could be reasonably readable. We'd
need to always propagate the dentry down to fsnotify() though, currently we
often propagate only the inode because the dentry may be (still in case of
create or already in case of unlink) negative. Similarly we'd somehow need
to communicate down into fsnotify() whether the passed name needs
snapshotting or not...

What do you think?

								Honza

[1] https://lore.kernel.org/all/SJ0PR08MB6494F5A32B7C60A5AD8B33C2AB09A@SJ0PR08MB6494.namprd08.prod.outlook.com

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] fsnotify: optimize the case of no access event watchers
  2024-01-10 13:56 ` Jan Kara
@ 2024-01-10 14:41   ` Amir Goldstein
  2024-01-10 16:58     ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: Amir Goldstein @ 2024-01-10 14:41 UTC (permalink / raw)
  To: Jan Kara; +Cc: Jens Axboe, Christian Brauner, linux-fsdevel, Mel Gorman

On Wed, Jan 10, 2024 at 3:56 PM Jan Kara <jack@suse.cz> wrote:
>
> On Tue 09-01-24 21:48:18, Amir Goldstein wrote:
> > Commit e43de7f0862b ("fsnotify: optimize the case of no marks of any type")
> > optimized the case where there are no fsnotify watchers on any of the
> > filesystem's objects.
> >
> > It is quite common for a system to have a single local filesystem and
> > it is quite common for the system to have some inotify watches on some
> > config files or directories, so the optimization of no marks at all is
> > often not in effect.
>
> I agree.
>
> > Access event watchers are far less common, so optimizing the case of
> > no marks with access events could improve performance for more systems,
> > especially for the performance sensitive hot io path.
> >
> > Maintain a per-sb counter of objects that have marks with access
> > events in their mask and use that counter to optimize out the call to
> > fsnotify() in fsnotify access hooks.
> >
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>
> I'm not saying no to this but let's discuss first before hacking in some
> partial solution :). AFAIU what Jens sees is something similar as was
> reported for example here [1]. In these cases we go through
> fsnotify_parent() down to fsnotify() until the check:
>
>         if (!(test_mask & marks_mask))
>                 return 0;
>
> there. And this is all relatively cheap (pure fetches like
> sb->s_fsnotify_marks, mnt->mnt_fsnotify_marks, inode->i_fsnotify_marks,
> sb->s_fsnotify_mask, mnt->mnt_fsnotify_mask, inode->i_fsnotify_mask) except
> for parent handling in __fsnotify_parent(). That requires multiple atomic
> operations - take_dentry_name_snapshot() is lock & unlock of d_lock, dget()
> is cmpxchg on d_lockref, dput() is another cmpxchg on d_lockref - and this
> gets really expensive, even more so if multiple threads race for the same
> parent dentry.

Sorry, I forgot to link the Jens regression report [1] which included this
partial perf report:

     3.36%             [kernel.vmlinux]  [k] fsnotify
     2.32%             [kernel.vmlinux]  [k] __fsnotify_paren

Your analysis about __fsnotify_parent() may be correct, but what would be
the explanation to time spent in fsnotify() in this report?

Jens,

Can you provide an expanded via of the perf function called by
fsnotify() and __fsnotify_parent()?

In general, I think that previous optimization work as this commit by Mel:
71d734103edf ("fsnotify: Rearrange fast path to minimise overhead when
there is no watcher")
showed improvements when checks were inlined.

[1] https://lore.kernel.org/linux-fsdevel/53682ece-f0e7-48de-9a1c-879ee34b0449@kernel.dk/

>
> So I think what we ideally need is a way to avoid this expensive "stabilize
> parent & its name" game unless we are pretty sure we are going to generate
> the event. There's no way to avoid fetching the parent early if
> dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED (we can still postpone the
> take_dentry_name_snapshot() cost though). However for the case where
> dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED == 0 (and this is the case
> here AFAICT) we need the parent only after the check of 'test_mask &
> marks_mask'. So in that case we could completely avoid the extra cost of
> parent dentry handling.
>
> So in principle I believe we could make fsnotify noticeably lighter for the
> case where no event is generated. Just the question is how to refactor the
> current set of functions to achieve this without creating an unmaintainable
> mess. I suspect if we lifted creation & some prefilling of iter_info into
> __fsnotify_parent() and then fill in the parent in case we need it for
> reporting only in fsnotify(), the code could be reasonably readable. We'd
> need to always propagate the dentry down to fsnotify() though, currently we
> often propagate only the inode because the dentry may be (still in case of
> create or already in case of unlink) negative. Similarly we'd somehow need
> to communicate down into fsnotify() whether the passed name needs
> snapshotting or not...
>
> What do you think?

I am not saying no ;)
but it sound a bit complicated so if the goal is to reduce the overhead
of fsnotify_access() and fsnotify_perm(), which I don't think any application
cares about, then I'd rather go with a much simpler solution even if it
does not cover all the corner cases.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] fsnotify: optimize the case of no access event watchers
  2024-01-10 14:41   ` Amir Goldstein
@ 2024-01-10 16:58     ` Jan Kara
  2024-01-11  6:27       ` Amir Goldstein
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2024-01-10 16:58 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Jens Axboe, Christian Brauner, linux-fsdevel, Mel Gorman

On Wed 10-01-24 16:41:46, Amir Goldstein wrote:
> On Wed, Jan 10, 2024 at 3:56 PM Jan Kara <jack@suse.cz> wrote:
> > On Tue 09-01-24 21:48:18, Amir Goldstein wrote:
> > > Commit e43de7f0862b ("fsnotify: optimize the case of no marks of any type")
> > > optimized the case where there are no fsnotify watchers on any of the
> > > filesystem's objects.
> > >
> > > It is quite common for a system to have a single local filesystem and
> > > it is quite common for the system to have some inotify watches on some
> > > config files or directories, so the optimization of no marks at all is
> > > often not in effect.
> >
> > I agree.
> >
> > > Access event watchers are far less common, so optimizing the case of
> > > no marks with access events could improve performance for more systems,
> > > especially for the performance sensitive hot io path.
> > >
> > > Maintain a per-sb counter of objects that have marks with access
> > > events in their mask and use that counter to optimize out the call to
> > > fsnotify() in fsnotify access hooks.
> > >
> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> >
> > I'm not saying no to this but let's discuss first before hacking in some
> > partial solution :). AFAIU what Jens sees is something similar as was
> > reported for example here [1]. In these cases we go through
> > fsnotify_parent() down to fsnotify() until the check:
> >
> >         if (!(test_mask & marks_mask))
> >                 return 0;
> >
> > there. And this is all relatively cheap (pure fetches like
> > sb->s_fsnotify_marks, mnt->mnt_fsnotify_marks, inode->i_fsnotify_marks,
> > sb->s_fsnotify_mask, mnt->mnt_fsnotify_mask, inode->i_fsnotify_mask) except
> > for parent handling in __fsnotify_parent(). That requires multiple atomic
> > operations - take_dentry_name_snapshot() is lock & unlock of d_lock, dget()
> > is cmpxchg on d_lockref, dput() is another cmpxchg on d_lockref - and this
> > gets really expensive, even more so if multiple threads race for the same
> > parent dentry.
> 
> Sorry, I forgot to link the Jens regression report [1] which included this
> partial perf report:
> 
>      3.36%             [kernel.vmlinux]  [k] fsnotify
>      2.32%             [kernel.vmlinux]  [k] __fsnotify_paren
> 
> Your analysis about __fsnotify_parent() may be correct, but what would be
> the explanation to time spent in fsnotify() in this report?

OK, I don't have a good explanation for why fsnotify() itself is so high in
Jens' profile. Maybe cacheline fetches caused by inode->i_fsnotify_mask and
inode->i_fsnotify_marks checks are causing the overhead but it would be
good to have it confirmed.

> In general, I think that previous optimization work as this commit by Mel:
> 71d734103edf ("fsnotify: Rearrange fast path to minimise overhead when
> there is no watcher") showed improvements when checks were inlined.

Well, I believe that commit helped exactly because the check for
DCACHE_FSNOTIFY_PARENT_WATCHED was moved ahead of all the work now in
__fsnotify_parent().

> [1] https://lore.kernel.org/linux-fsdevel/53682ece-f0e7-48de-9a1c-879ee34b0449@kernel.dk/
> 
> > So I think what we ideally need is a way to avoid this expensive "stabilize
> > parent & its name" game unless we are pretty sure we are going to generate
> > the event. There's no way to avoid fetching the parent early if
> > dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED (we can still postpone the
> > take_dentry_name_snapshot() cost though). However for the case where
> > dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED == 0 (and this is the case
> > here AFAICT) we need the parent only after the check of 'test_mask &
> > marks_mask'. So in that case we could completely avoid the extra cost of
> > parent dentry handling.
> >
> > So in principle I believe we could make fsnotify noticeably lighter for the
> > case where no event is generated. Just the question is how to refactor the
> > current set of functions to achieve this without creating an unmaintainable
> > mess. I suspect if we lifted creation & some prefilling of iter_info into
> > __fsnotify_parent() and then fill in the parent in case we need it for
> > reporting only in fsnotify(), the code could be reasonably readable. We'd
> > need to always propagate the dentry down to fsnotify() though, currently we
> > often propagate only the inode because the dentry may be (still in case of
> > create or already in case of unlink) negative. Similarly we'd somehow need
> > to communicate down into fsnotify() whether the passed name needs
> > snapshotting or not...
> >
> > What do you think?
> 
> I am not saying no ;)
> but it sound a bit complicated so if the goal is to reduce the overhead
> of fsnotify_access() and fsnotify_perm(), which I don't think any application
> cares about, then I'd rather go with a much simpler solution even if it
> does not cover all the corner cases.

OK, let's figure out what exactly causes slowdown in Jens' case first. I
agree your solution helps mitigate the cost of fsnotify_access() for reads
but I forsee people complaining about fsnotify_modify() cost for writes in
short order :) and there it is not so simple to solve as there's likely
some watch for FS_MODIFY event somewhere.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] fsnotify: optimize the case of no access event watchers
  2024-01-10 16:58     ` Jan Kara
@ 2024-01-11  6:27       ` Amir Goldstein
  0 siblings, 0 replies; 12+ messages in thread
From: Amir Goldstein @ 2024-01-11  6:27 UTC (permalink / raw)
  To: Jan Kara; +Cc: Jens Axboe, Christian Brauner, linux-fsdevel, Mel Gorman

> > I am not saying no ;)
> > but it sound a bit complicated so if the goal is to reduce the overhead
> > of fsnotify_access() and fsnotify_perm(), which I don't think any application
> > cares about, then I'd rather go with a much simpler solution even if it
> > does not cover all the corner cases.
>
> OK, let's figure out what exactly causes slowdown in Jens' case first. I
> agree your solution helps mitigate the cost of fsnotify_access() for reads
> but I forsee people complaining about fsnotify_modify() cost for writes in
> short order :) and there it is not so simple to solve as there's likely
> some watch for FS_MODIFY event somewhere.
>

Actually, I think we may be able to eat the cake and leave it whole.
As I've written to you once in a different context, I think fsnotify
has two mostly non-overlapping use cases:
1. Watch FS_ACCESS/FS_MODIFY on selective inodes (a.k.a tail)
2. Watch sb/mount/recursive subtree

Anyone that will try watching FS_ACCESS/FS_MODIFY on a large
data set, would find that to be way too noisy to be useful.

For example, we have a filesystem monitor application which needs to
know about changes to any file in the filesystem, but we cannot use
FS_MODIFY for that monitor because it is too noisy, so we use
FS_CLOSE_WRITE as second best.

I guess we cannot rule out that the use case of watching direct
children for FS_ACCESS/FS_MODIFY may exist in the wild.

IOW, I think we should be able to optimize most of the access/modify
hooks by checking those events specifically in the inline helpers for:
1. inode mask
2. parent watching children mask
3. sb->s_iflags & SB_I_FSNOTIFY_ACCESS_MONITOR

The use of such an access monitor is currently unlikely, so
I think tainting the sb is good enough for now.

Note that the upcoming pre-content events (a.k.a HSM) are
certainly going to fall under this "unlikely" category -
If you'd want to use HSM (on-demand file content filling) on
a filesystem, you will pay the performance penalty for some intensive
io workloads unless you'd use an fd with FMODE_NONOTIFY.
I think that is to be expected.

HSM is anyway expected to incur extra performance penalty for
sb_write_barrier() (for the pre-modify events) and in my wip
sb_write_barrier branch [1], there is a similar optimization that does
activate_sb_write_barrier() on the first pre-modify event watch
and then leaves it activated on that sb forever.

I will try to write this patch.

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/sb_write_barrier

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] fsnotify: optimize the case of no access event watchers
  2024-01-10  9:08     ` Amir Goldstein
  2024-01-10 12:46       ` Matthew Wilcox
@ 2024-01-11 10:05       ` Christian Brauner
  2024-01-11 10:33         ` Amir Goldstein
  1 sibling, 1 reply; 12+ messages in thread
From: Christian Brauner @ 2024-01-11 10:05 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jens Axboe, Jan Kara, linux-fsdevel

On Wed, Jan 10, 2024 at 11:08:17AM +0200, Amir Goldstein wrote:
> On Tue, Jan 9, 2024 at 10:24 PM Jens Axboe <axboe@kernel.dk> wrote:
> >
> > On 1/9/24 1:12 PM, Jens Axboe wrote:
> > > On 1/9/24 12:48 PM, Amir Goldstein wrote:
> > >> Commit e43de7f0862b ("fsnotify: optimize the case of no marks of any type")
> > >> optimized the case where there are no fsnotify watchers on any of the
> > >> filesystem's objects.
> > >>
> > >> It is quite common for a system to have a single local filesystem and
> > >> it is quite common for the system to have some inotify watches on some
> > >> config files or directories, so the optimization of no marks at all is
> > >> often not in effect.
> > >>
> > >> Access event watchers are far less common, so optimizing the case of
> > >> no marks with access events could improve performance for more systems,
> > >> especially for the performance sensitive hot io path.
> > >>
> > >> Maintain a per-sb counter of objects that have marks with access
> > >> events in their mask and use that counter to optimize out the call to
> > >> fsnotify() in fsnotify access hooks.
> > >>
> > >> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > >> ---
> > >>
> > >> Jens,
> > >>
> > >> You may want to try if this patch improves performance for your workload
> > >> with SECURITY=Y and FANOTIFY_ACCESS_PERMISSIONS=Y.
> > >
> > > Ran the usual test, and this effectively removes fsnotify from the
> > > profiles, which (as per other email) is between 5-6% of CPU time. So I'd
> > > say it looks mighty appealing!
> >
> > Tried with an IRQ based workload as well, as those are always impacted
> > more by the fsnotify slowness. This patch removes ~8% of useless
> > overhead in that case, so even bigger win there.
> >
> 
> Do the IRQ based workloads always go through io_req_io_end()?
> Meaning that unlike the polled io workloads, they also incur the
> overhead of the fsnotify_{modify,access}() hooks?
> 
> I remember I asked you once (cannot find where) whether
> io_complete_rw_iopoll() needs the fsnotify hooks and you said that
> it is a highly specialized code path for fast io, whose users will not
> want those access/modify hooks.
> 
> Considering the fact that fsnotify_{modify,access}() could just as well
> be bypassed by mmap() read/write, I fully support this reasoning.
> 
> Anyway, that explains (to me) why compiling-out the fsnotify_perm()
> hooks took away all the regression that you observed in upstream,
> because I was wondering where the overhead of fsnotify_access() was.
> 
> Jan,
> 
> What are your thoughts about this optimization patch?
> 
> My thoughts are that the optimization is clearly a win, but do we
> really want to waste a full long in super_block for counting access
> event watchers that may never exist?

Meh, not too excited about it. Couldn't you use a flag in s_fsnotify_mask?
Maybe that's what you mean below.

> 
> Should we perhaps instead use a flag to say that "access watchers
> existed"?
> 
> We could put s_fsnotify_access_watchers inside a struct
> fsnotify_sb_mark_connector and special case alloc/free of
> FSNOTIFY_OBJ_TYPE_SB connector.

Would it make sense without incurring performance impact to move
atomic_long_t s_fsnotify_connectors and your new atomic_long_t into a
struct fsnotify_data that gets allocated when an sb is created? Then we
don't waste space in struct super_block. This is basically a copy-pasta
of the LSM sb->s_security approach.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] fsnotify: optimize the case of no access event watchers
  2024-01-11 10:05       ` Christian Brauner
@ 2024-01-11 10:33         ` Amir Goldstein
  0 siblings, 0 replies; 12+ messages in thread
From: Amir Goldstein @ 2024-01-11 10:33 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Jens Axboe, Jan Kara, linux-fsdevel

> > Jan,
> >
> > What are your thoughts about this optimization patch?
> >
> > My thoughts are that the optimization is clearly a win, but do we
> > really want to waste a full long in super_block for counting access
> > event watchers that may never exist?
>
> Meh, not too excited about it. Couldn't you use a flag in s_fsnotify_mask?
> Maybe that's what you mean below.
>

Yes, for my v2 I will use a "taint" flag in s_fsnotify_mask instead
of keeping count of content monitors.

> >
> > Should we perhaps instead use a flag to say that "access watchers
> > existed"?
> >
> > We could put s_fsnotify_access_watchers inside a struct
> > fsnotify_sb_mark_connector and special case alloc/free of
> > FSNOTIFY_OBJ_TYPE_SB connector.
>
> Would it make sense without incurring performance impact to move
> atomic_long_t s_fsnotify_connectors and your new atomic_long_t into a
> struct fsnotify_data that gets allocated when an sb is created? Then we
> don't waste space in struct super_block. This is basically a copy-paste
> of the LSM sb->s_security approach.

Not as trivial to move s_fsnotify_connectors, but I think it should be possible,
because basically, I don't think we need to ever detach an sb connector
once it is attached.

I did already look at this when writing the above suggestion, but it was
not straightforward so left it for later.

This change would be relevant for another feature that I need for HSM
called "default event mask" (sort of like the default iptables rule), so
I will look into making those changes when I get to this feature.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-01-11 10:33 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-09 19:48 [RFC][PATCH] fsnotify: optimize the case of no access event watchers Amir Goldstein
2024-01-09 20:12 ` Jens Axboe
2024-01-09 20:24   ` Jens Axboe
2024-01-10  9:08     ` Amir Goldstein
2024-01-10 12:46       ` Matthew Wilcox
2024-01-10 12:55         ` Amir Goldstein
2024-01-11 10:05       ` Christian Brauner
2024-01-11 10:33         ` Amir Goldstein
2024-01-10 13:56 ` Jan Kara
2024-01-10 14:41   ` Amir Goldstein
2024-01-10 16:58     ` Jan Kara
2024-01-11  6:27       ` Amir Goldstein

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.