linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] fsnotify: fix unlink performance regression
@ 2019-05-05  9:15 Amir Goldstein
  2019-05-05 13:05 ` Al Viro
  2019-05-05 16:33 ` kbuild test robot
  0 siblings, 2 replies; 21+ messages in thread
From: Amir Goldstein @ 2019-05-05  9:15 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, Linus Torvalds, linux-fsdevel, LKP

__fsnotify_parent() has an optimization in place to avoid unneeded
take_dentry_name_snapshot().  When fsnotify_nameremove() was changed
not to call __fsnotify_parent(), we left out the optimization.
Kernel test robot reported a 5% performance regression in concurrent
unlink() workload.

Reported-by: kernel test robot <rong.a.chen@intel.com>
Link: https://lore.kernel.org/lkml/20190505062153.GG29809@shao2-debian/
Link: https://lore.kernel.org/linux-fsdevel/20190104090357.GD22409@quack2.suse.cz/
Fixes: 5f02a8776384 ("fsnotify: annotate directory entry modification events")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---

Jan,

The linked 5.1-rc1 performance regression report came with bad timing.
Not sure if Linus is planning an rc8. If not, you will probably not
see this before the 5.1 release and we shall have to queue it for 5.2
and backport to stable 5.1.

I crafted the patch so it applies cleanly both to master and Al's
for-next branch (there are some fsnotify changes in work.dcache).

Thanks,
Amir.

 include/linux/fsnotify.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 09587e2860b5..2272c8c2023c 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -175,12 +175,19 @@ static inline void fsnotify_nameremove(struct dentry *dentry, int isdir)
 		mask |= FS_ISDIR;
 
 	parent = dget_parent(dentry);
+	/* Avoid unneeded take_dentry_name_snapshot() */
+	if (!(d_inode(parent)->i_fsnotify_mask & FS_DELETE) &&
+	    !(dentry->d_sb->s_fsnotify_mask & FS_DELETE))
+		goto out_dput;
+
 	take_dentry_name_snapshot(&name, dentry);
 
 	fsnotify(d_inode(parent), mask, d_inode(dentry), FSNOTIFY_EVENT_INODE,
 		 name.name, 0);
 
 	release_dentry_name_snapshot(&name);
+
+out_dput:
 	dput(parent);
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] fsnotify: fix unlink performance regression
  2019-05-05  9:15 [PATCH] fsnotify: fix unlink performance regression Amir Goldstein
@ 2019-05-05 13:05 ` Al Viro
  2019-05-05 13:19   ` Amir Goldstein
  2019-05-05 16:33 ` kbuild test robot
  1 sibling, 1 reply; 21+ messages in thread
From: Al Viro @ 2019-05-05 13:05 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Linus Torvalds, linux-fsdevel, LKP

On Sun, May 05, 2019 at 12:15:49PM +0300, Amir Goldstein wrote:
> __fsnotify_parent() has an optimization in place to avoid unneeded
> take_dentry_name_snapshot().  When fsnotify_nameremove() was changed
> not to call __fsnotify_parent(), we left out the optimization.
> Kernel test robot reported a 5% performance regression in concurrent
> unlink() workload.
> 
> Reported-by: kernel test robot <rong.a.chen@intel.com>
> Link: https://lore.kernel.org/lkml/20190505062153.GG29809@shao2-debian/
> Link: https://lore.kernel.org/linux-fsdevel/20190104090357.GD22409@quack2.suse.cz/
> Fixes: 5f02a8776384 ("fsnotify: annotate directory entry modification events")
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
> 
> Jan,
> 
> The linked 5.1-rc1 performance regression report came with bad timing.
> Not sure if Linus is planning an rc8. If not, you will probably not
> see this before the 5.1 release and we shall have to queue it for 5.2
> and backport to stable 5.1.
> 
> I crafted the patch so it applies cleanly both to master and Al's
> for-next branch (there are some fsnotify changes in work.dcache).

Charming...  What about rename() and matching regressions there?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] fsnotify: fix unlink performance regression
  2019-05-05 13:05 ` Al Viro
@ 2019-05-05 13:19   ` Amir Goldstein
  2019-05-05 13:47     ` Al Viro
  0 siblings, 1 reply; 21+ messages in thread
From: Amir Goldstein @ 2019-05-05 13:19 UTC (permalink / raw)
  To: Al Viro; +Cc: Jan Kara, Linus Torvalds, linux-fsdevel, LKP

On Sun, May 5, 2019 at 4:05 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sun, May 05, 2019 at 12:15:49PM +0300, Amir Goldstein wrote:
> > __fsnotify_parent() has an optimization in place to avoid unneeded
> > take_dentry_name_snapshot().  When fsnotify_nameremove() was changed
> > not to call __fsnotify_parent(), we left out the optimization.
> > Kernel test robot reported a 5% performance regression in concurrent
> > unlink() workload.
> >
> > Reported-by: kernel test robot <rong.a.chen@intel.com>
> > Link: https://lore.kernel.org/lkml/20190505062153.GG29809@shao2-debian/
> > Link: https://lore.kernel.org/linux-fsdevel/20190104090357.GD22409@quack2.suse.cz/
> > Fixes: 5f02a8776384 ("fsnotify: annotate directory entry modification events")
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >
> > Jan,
> >
> > The linked 5.1-rc1 performance regression report came with bad timing.
> > Not sure if Linus is planning an rc8. If not, you will probably not
> > see this before the 5.1 release and we shall have to queue it for 5.2
> > and backport to stable 5.1.
> >
> > I crafted the patch so it applies cleanly both to master and Al's
> > for-next branch (there are some fsnotify changes in work.dcache).
>
> Charming...  What about rename() and matching regressions there?

rename() and create() do not take_dentry_name_snapshot(), because
they are called with parent inode lock held.
I have made an analysis of callers to d_delete() and found that all callers
either hold parent inode lock or name is stable for another reason:
https://lore.kernel.org/linux-fsdevel/20190104090357.GD22409@quack2.suse.cz/

But Jan preferred to keep take_dentry_name_snapshot() to be safe.
I think the right thing to do is assert that parent inode is locked or
no rename op in d_delete() and take the lock in ceph/ocfs2 to conform
to the standard.

If that sounds good to you, I will follow up with a patch for next and then
remove take_dentry_name_snapshot() from fsnotify_nameremove() hook.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] fsnotify: fix unlink performance regression
  2019-05-05 13:19   ` Amir Goldstein
@ 2019-05-05 13:47     ` Al Viro
  2019-05-05 14:18       ` Amir Goldstein
  2019-05-06 12:43       ` Amir Goldstein
  0 siblings, 2 replies; 21+ messages in thread
From: Al Viro @ 2019-05-05 13:47 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Linus Torvalds, linux-fsdevel, LKP

On Sun, May 05, 2019 at 04:19:02PM +0300, Amir Goldstein wrote:

> I have made an analysis of callers to d_delete() and found that all callers
> either hold parent inode lock or name is stable for another reason:
> https://lore.kernel.org/linux-fsdevel/20190104090357.GD22409@quack2.suse.cz/
> 
> But Jan preferred to keep take_dentry_name_snapshot() to be safe.
> I think the right thing to do is assert that parent inode is locked or
> no rename op in d_delete() and take the lock in ceph/ocfs2 to conform
> to the standard.

Any messing with the locking in ceph_fill_trace() would have to come
with very detailed proof of correctness, convincingly stable wrt
future changes in ceph...

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] fsnotify: fix unlink performance regression
  2019-05-05 13:47     ` Al Viro
@ 2019-05-05 14:18       ` Amir Goldstein
  2019-05-06 12:43       ` Amir Goldstein
  1 sibling, 0 replies; 21+ messages in thread
From: Amir Goldstein @ 2019-05-05 14:18 UTC (permalink / raw)
  To: Al Viro; +Cc: Jan Kara, Linus Torvalds, linux-fsdevel, LKP

On Sun, May 5, 2019 at 4:47 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sun, May 05, 2019 at 04:19:02PM +0300, Amir Goldstein wrote:
>
> > I have made an analysis of callers to d_delete() and found that all callers
> > either hold parent inode lock or name is stable for another reason:
> > https://lore.kernel.org/linux-fsdevel/20190104090357.GD22409@quack2.suse.cz/
> >
> > But Jan preferred to keep take_dentry_name_snapshot() to be safe.
> > I think the right thing to do is assert that parent inode is locked or
> > no rename op in d_delete() and take the lock in ceph/ocfs2 to conform
> > to the standard.
>
> Any messing with the locking in ceph_fill_trace() would have to come
> with very detailed proof of correctness, convincingly stable wrt
> future changes in ceph...

Yeh... Is there any other way to assert that d_name is stable in d_delete()?
My original rule was:
WARN_ON_ONCE(!inode_is_locked(dir) && dir->i_op->rename &&
      !(dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE));

In the hope that declaring FS_RENAME_DOES_D_MOVE means fs knows
what it is doing...

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] fsnotify: fix unlink performance regression
  2019-05-05  9:15 [PATCH] fsnotify: fix unlink performance regression Amir Goldstein
  2019-05-05 13:05 ` Al Viro
@ 2019-05-05 16:33 ` kbuild test robot
  2019-05-05 18:26   ` Amir Goldstein
  1 sibling, 1 reply; 21+ messages in thread
From: kbuild test robot @ 2019-05-05 16:33 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: kbuild-all, Jan Kara, Al Viro, Linus Torvalds, linux-fsdevel, LKP

[-- Attachment #1: Type: text/plain, Size: 2996 bytes --]

Hi Amir,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.1-rc7 next-20190503]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Amir-Goldstein/fsnotify-fix-unlink-performance-regression/20190505-233115
config: riscv-tinyconfig (attached as .config)
compiler: riscv64-linux-gcc (GCC) 8.1.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=8.1.0 make.cross ARCH=riscv 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from fs///attr.c:15:
   include/linux/fsnotify.h: In function 'fsnotify_nameremove':
>> include/linux/fsnotify.h:179:23: error: 'struct inode' has no member named 'i_fsnotify_mask'
     if (!(d_inode(parent)->i_fsnotify_mask & FS_DELETE) &&
                          ^~
>> include/linux/fsnotify.h:180:20: error: 'struct super_block' has no member named 's_fsnotify_mask'
         !(dentry->d_sb->s_fsnotify_mask & FS_DELETE))
                       ^~

vim +179 include/linux/fsnotify.h

   153	
   154	/*
   155	 * fsnotify_nameremove - a filename was removed from a directory
   156	 *
   157	 * This is mostly called under parent vfs inode lock so name and
   158	 * dentry->d_parent should be stable. However there are some corner cases where
   159	 * inode lock is not held. So to be on the safe side and be reselient to future
   160	 * callers and out of tree users of d_delete(), we do not assume that d_parent
   161	 * and d_name are stable and we use dget_parent() and
   162	 * take_dentry_name_snapshot() to grab stable references.
   163	 */
   164	static inline void fsnotify_nameremove(struct dentry *dentry, int isdir)
   165	{
   166		struct dentry *parent;
   167		struct name_snapshot name;
   168		__u32 mask = FS_DELETE;
   169	
   170		/* d_delete() of pseudo inode? (e.g. __ns_get_path() playing tricks) */
   171		if (IS_ROOT(dentry))
   172			return;
   173	
   174		if (isdir)
   175			mask |= FS_ISDIR;
   176	
   177		parent = dget_parent(dentry);
   178		/* Avoid unneeded take_dentry_name_snapshot() */
 > 179		if (!(d_inode(parent)->i_fsnotify_mask & FS_DELETE) &&
 > 180		    !(dentry->d_sb->s_fsnotify_mask & FS_DELETE))
   181			goto out_dput;
   182	
   183		take_dentry_name_snapshot(&name, dentry);
   184	
   185		fsnotify(d_inode(parent), mask, d_inode(dentry), FSNOTIFY_EVENT_INODE,
   186			 name.name, 0);
   187	
   188		release_dentry_name_snapshot(&name);
   189	
   190	out_dput:
   191		dput(parent);
   192	}
   193	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 4624 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] fsnotify: fix unlink performance regression
  2019-05-05 16:33 ` kbuild test robot
@ 2019-05-05 18:26   ` Amir Goldstein
  2019-05-05 20:07     ` [PATCH v2] " Amir Goldstein
  0 siblings, 1 reply; 21+ messages in thread
From: Amir Goldstein @ 2019-05-05 18:26 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, Jan Kara, Al Viro, Linus Torvalds, linux-fsdevel, LKP

On Sun, May 5, 2019 at 7:34 PM kbuild test robot <lkp@intel.com> wrote:
>
> Hi Amir,
>
> I love your patch! Yet something to improve:
>
> [auto build test ERROR on linus/master]
> [also build test ERROR on v5.1-rc7 next-20190503]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url:    https://github.com/0day-ci/linux/commits/Amir-Goldstein/fsnotify-fix-unlink-performance-regression/20190505-233115
> config: riscv-tinyconfig (attached as .config)
> compiler: riscv64-linux-gcc (GCC) 8.1.0
> reproduce:
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         GCC_VERSION=8.1.0 make.cross ARCH=riscv
>
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot <lkp@intel.com>
>
> All errors (new ones prefixed by >>):
>
>    In file included from fs///attr.c:15:
>    include/linux/fsnotify.h: In function 'fsnotify_nameremove':
> >> include/linux/fsnotify.h:179:23: error: 'struct inode' has no member named 'i_fsnotify_mask'
>      if (!(d_inode(parent)->i_fsnotify_mask & FS_DELETE) &&
>                           ^~
> >> include/linux/fsnotify.h:180:20: error: 'struct super_block' has no member named 's_fsnotify_mask'
>          !(dentry->d_sb->s_fsnotify_mask & FS_DELETE))
>                        ^~
>

Crap! forgot these wrappers are not NOOPed without CONFIG_FSNOTIFY.
It is so annoying to fix bugs in code that should not exist.

In d_delete() at this point, dentry is either negative or inode->i_nlink
which accounts for this name should be decremented.
If d_move() was possible on this dentry, bad things would happen.

I really wish I could just drop this take_dentry_name_snapshot()
and leave the WARN_ON() I suggested instead...
For now will just send an unbroken patch.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2] fsnotify: fix unlink performance regression
  2019-05-05 18:26   ` Amir Goldstein
@ 2019-05-05 20:07     ` Amir Goldstein
  2019-05-07 16:19       ` Jan Kara
  0 siblings, 1 reply; 21+ messages in thread
From: Amir Goldstein @ 2019-05-05 20:07 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, Linus Torvalds, linux-fsdevel, LKP

__fsnotify_parent() has an optimization in place to avoid unneeded
take_dentry_name_snapshot().  When fsnotify_nameremove() was changed
not to call __fsnotify_parent(), we left out the optimization.
Kernel test robot reported a 5% performance regression in concurrent
unlink() workload.

Modify __fsnotify_parent() so that it can be called also to report
directory modififcation events and use it from fsnotify_nameremove().

Reported-by: kernel test robot <rong.a.chen@intel.com>
Link: https://lore.kernel.org/lkml/20190505062153.GG29809@shao2-debian/
Link: https://lore.kernel.org/linux-fsdevel/20190104090357.GD22409@quack2.suse.cz/
Fixes: 5f02a8776384 ("fsnotify: annotate directory entry modification events")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---

Jan,

A nicer approach reusing __fsnotify_parent() instead of copying code
from it.

This version does not apply cleanly to Al's for-next branch (there are
some fsnotify changes in work.dcache). The conflict is trivial and
resolved on my fsnotify branch [1].

Thanks,
Amir.

Changes since v1:
- Fix build without CONFIG_FSNOTIFY
- Use __fsnotify_parent() for reporting FS_DELETE

[1] https://github.com/amir73il/linux/commits/fsnotify

 fs/notify/fsnotify.c     | 22 +++++++++++-----------
 include/linux/fsnotify.h | 15 +++------------
 2 files changed, 14 insertions(+), 23 deletions(-)

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index df06f3da166c..265b726d6e8d 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -151,31 +151,31 @@ void __fsnotify_update_child_dentry_flags(struct inode *inode)
 	spin_unlock(&inode->i_lock);
 }
 
-/* Notify this dentry's parent about a child's events. */
+/*
+ * Notify this dentry's parent about an event and make sure that name is stable.
+ * Events "on child" are only reported if parent is watching.
+ * Directory modification events are also reported if super block is watching.
+ */
 int __fsnotify_parent(const struct path *path, struct dentry *dentry, __u32 mask)
 {
 	struct dentry *parent;
 	struct inode *p_inode;
 	int ret = 0;
+	bool on_child = (mask & FS_EVENT_ON_CHILD);
+	__u32 test_mask = (mask & ALL_FSNOTIFY_EVENTS);
 
-	if (!dentry)
-		dentry = path->dentry;
-
-	if (!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED))
+	if (on_child && !(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED))
 		return 0;
 
 	parent = dget_parent(dentry);
 	p_inode = parent->d_inode;
 
-	if (unlikely(!fsnotify_inode_watches_children(p_inode))) {
+	if (on_child && unlikely(!fsnotify_inode_watches_children(p_inode))) {
 		__fsnotify_update_child_dentry_flags(p_inode);
-	} else if (p_inode->i_fsnotify_mask & mask & ALL_FSNOTIFY_EVENTS) {
+	} else if ((p_inode->i_fsnotify_mask & test_mask) ||
+		   (!on_child && (dentry->d_sb->s_fsnotify_mask & test_mask))) {
 		struct name_snapshot name;
 
-		/* we are notifying a parent so come up with the new mask which
-		 * specifies these are events which came from a child. */
-		mask |= FS_EVENT_ON_CHILD;
-
 		take_dentry_name_snapshot(&name, dentry);
 		if (path)
 			ret = fsnotify(p_inode, mask, path, FSNOTIFY_EVENT_PATH,
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 09587e2860b5..8641bf9a1eda 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -37,7 +37,7 @@ static inline int fsnotify_parent(const struct path *path,
 	if (!dentry)
 		dentry = path->dentry;
 
-	return __fsnotify_parent(path, dentry, mask);
+	return __fsnotify_parent(path, dentry, mask | FS_EVENT_ON_CHILD);
 }
 
 /*
@@ -158,13 +158,11 @@ static inline void fsnotify_vfsmount_delete(struct vfsmount *mnt)
  * dentry->d_parent should be stable. However there are some corner cases where
  * inode lock is not held. So to be on the safe side and be reselient to future
  * callers and out of tree users of d_delete(), we do not assume that d_parent
- * and d_name are stable and we use dget_parent() and
+ * and d_name are stable and we use __fsnotify_parent() to
  * take_dentry_name_snapshot() to grab stable references.
  */
 static inline void fsnotify_nameremove(struct dentry *dentry, int isdir)
 {
-	struct dentry *parent;
-	struct name_snapshot name;
 	__u32 mask = FS_DELETE;
 
 	/* d_delete() of pseudo inode? (e.g. __ns_get_path() playing tricks) */
@@ -174,14 +172,7 @@ static inline void fsnotify_nameremove(struct dentry *dentry, int isdir)
 	if (isdir)
 		mask |= FS_ISDIR;
 
-	parent = dget_parent(dentry);
-	take_dentry_name_snapshot(&name, dentry);
-
-	fsnotify(d_inode(parent), mask, d_inode(dentry), FSNOTIFY_EVENT_INODE,
-		 name.name, 0);
-
-	release_dentry_name_snapshot(&name);
-	dput(parent);
+	__fsnotify_parent(NULL, dentry, mask);
 }
 
 /*
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] fsnotify: fix unlink performance regression
  2019-05-05 13:47     ` Al Viro
  2019-05-05 14:18       ` Amir Goldstein
@ 2019-05-06 12:43       ` Amir Goldstein
  2019-05-06 14:26         ` Al Viro
  1 sibling, 1 reply; 21+ messages in thread
From: Amir Goldstein @ 2019-05-06 12:43 UTC (permalink / raw)
  To: Al Viro; +Cc: Jan Kara, Linus Torvalds, linux-fsdevel, LKP

On Sun, May 5, 2019 at 4:47 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sun, May 05, 2019 at 04:19:02PM +0300, Amir Goldstein wrote:
>
> > I have made an analysis of callers to d_delete() and found that all callers
> > either hold parent inode lock or name is stable for another reason:
> > https://lore.kernel.org/linux-fsdevel/20190104090357.GD22409@quack2.suse.cz/
> >
> > But Jan preferred to keep take_dentry_name_snapshot() to be safe.
> > I think the right thing to do is assert that parent inode is locked or
> > no rename op in d_delete() and take the lock in ceph/ocfs2 to conform
> > to the standard.
>
> Any messing with the locking in ceph_fill_trace() would have to come
> with very detailed proof of correctness, convincingly stable wrt
> future changes in ceph...

OK. What do you have to say about this statement?

    Because fsnotify_nameremove() is called from d_delete() with negative
    or unhashed dentry, d_move() is not expected on this dentry, so it is
    safe to use d_parent/d_name without take_dentry_name_snapshot().

I assume it is not correct, but cannot figure out why.
Under what circumstances is d_move() expected to move an unhashed
dentry and hash it?

If this is not expected then wouldn't we be better off with:
@@ -2774,9 +2774,9 @@ static void __d_move(struct dentry *dentry,
struct dentry *target,

        /* unhash both */
-       if (!d_unhashed(dentry))
+       if (!WARN_ON(d_unhashed(dentry)))
                ___d_drop(dentry);
-       if (!d_unhashed(target))
+       if (!WARN_ON(d_unhashed(target)))
                ___d_drop(target);

And then we can get rid of take_dentry_name_snapshot() in fsnotify_nameremove()
and leave an assertion instead:

+       /* Proof of stability of d_parent and d_name */
+       WARN_ON_ONCE(d_inode(dentry) && !d_unhashed(dentry));
+

My other thought is why is fsnotify_nameremove() in d_delete() and
not in vfs_unlink()/vfs_rmdir() under parent inode lock like the rest
of the fsnotify_create/fsnotify_move hooks?

In what case would we need the fsnotify event that is not coming
from vfs_unlink()/vfs_rmdir()?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] fsnotify: fix unlink performance regression
  2019-05-06 12:43       ` Amir Goldstein
@ 2019-05-06 14:26         ` Al Viro
  2019-05-06 16:22           ` Amir Goldstein
  0 siblings, 1 reply; 21+ messages in thread
From: Al Viro @ 2019-05-06 14:26 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Linus Torvalds, linux-fsdevel, LKP

On Mon, May 06, 2019 at 03:43:24PM +0300, Amir Goldstein wrote:
> OK. What do you have to say about this statement?
> 
>     Because fsnotify_nameremove() is called from d_delete() with negative
>     or unhashed dentry, d_move() is not expected on this dentry, so it is
>     safe to use d_parent/d_name without take_dentry_name_snapshot().
> 
> I assume it is not correct, but cannot figure out why.
> Under what circumstances is d_move() expected to move an unhashed
> dentry and hash it?

For starters, d_splice_alias() picking an exising alias for given directory
inode.

> My other thought is why is fsnotify_nameremove() in d_delete() and
> not in vfs_unlink()/vfs_rmdir() under parent inode lock like the rest
> of the fsnotify_create/fsnotify_move hooks?
> 
> In what case would we need the fsnotify event that is not coming
> from vfs_unlink()/vfs_rmdir()?

*snort*

You can thank those who whine about notifications on sysfs/devpts/whatnot.
Go talk to them if you wish, but don't ask me to translate what you'll get
into something coherent - I'd never been able to.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] fsnotify: fix unlink performance regression
  2019-05-06 14:26         ` Al Viro
@ 2019-05-06 16:22           ` Amir Goldstein
  0 siblings, 0 replies; 21+ messages in thread
From: Amir Goldstein @ 2019-05-06 16:22 UTC (permalink / raw)
  To: Al Viro; +Cc: Jan Kara, Linus Torvalds, linux-fsdevel

On Mon, May 6, 2019 at 5:26 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Mon, May 06, 2019 at 03:43:24PM +0300, Amir Goldstein wrote:
> > OK. What do you have to say about this statement?
> >
> >     Because fsnotify_nameremove() is called from d_delete() with negative
> >     or unhashed dentry, d_move() is not expected on this dentry, so it is
> >     safe to use d_parent/d_name without take_dentry_name_snapshot().
> >
> > I assume it is not correct, but cannot figure out why.
> > Under what circumstances is d_move() expected to move an unhashed
> > dentry and hash it?
>
> For starters, d_splice_alias() picking an exising alias for given directory
> inode.

Ok. But seeing that we are already in d_delete() said directory is already
IS_DEADDIR, so that can be added to the assertion that proves the
stability of d_name.
Are there any other cases? weird filesystems?

>
> > My other thought is why is fsnotify_nameremove() in d_delete() and
> > not in vfs_unlink()/vfs_rmdir() under parent inode lock like the rest
> > of the fsnotify_create/fsnotify_move hooks?
> >
> > In what case would we need the fsnotify event that is not coming
> > from vfs_unlink()/vfs_rmdir()?
>
> *snort*
>
> You can thank those who whine about notifications on sysfs/devpts/whatnot.
> Go talk to them if you wish, but don't ask me to translate what you'll get
> into something coherent - I'd never been able to.

I see. So all of those fs that are interested in notifications already have
fsnotify_create()/fsnotify_move() calls in them.
There are only 5 of them: binderfs, debugfs, devpts, tracefs, sunrpc.
It would be easier and less convoluted to also add the fsnotify_nameremove()
explicit calls in those fs.
With those fs either d_name is inherently stable or (debugfs) locks on parent
are taken properly.

But d_delete() effectively provides something else. If provides "remote server
change notifications" for clustered/networking filesystems when server
invalidates the local dentry. This is not a feature that is guarantied
by fsnotify.
notifications will be delivered randomly based on existence of local entry
in dcache. Only networking fs that provide proper remote notifications
(cifs is interested in doing that) should really be calling the fsnofity hooks.

Of course, we remove the random remote notifications from clustered/network
fs, there is bound to be someone unhappy. Urgh! out of those fs, only ocfs2
calls fsnotify_create(). If it is deemed important, can add
fsnotify_nameremove()
to ocfs2 as well.

Dare we make this change and see if people shout?
It's easy to add the random remote fsnotify_nameremove() calls per request
for a filesystems that want it.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] fsnotify: fix unlink performance regression
  2019-05-05 20:07     ` [PATCH v2] " Amir Goldstein
@ 2019-05-07 16:19       ` Jan Kara
  2019-05-07 19:12         ` Amir Goldstein
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kara @ 2019-05-07 16:19 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Al Viro, Linus Torvalds, linux-fsdevel, LKP

On Sun 05-05-19 23:07:28, Amir Goldstein wrote:
> __fsnotify_parent() has an optimization in place to avoid unneeded
> take_dentry_name_snapshot().  When fsnotify_nameremove() was changed
> not to call __fsnotify_parent(), we left out the optimization.
> Kernel test robot reported a 5% performance regression in concurrent
> unlink() workload.
> 
> Modify __fsnotify_parent() so that it can be called also to report
> directory modififcation events and use it from fsnotify_nameremove().
> 
> Reported-by: kernel test robot <rong.a.chen@intel.com>
> Link: https://lore.kernel.org/lkml/20190505062153.GG29809@shao2-debian/
> Link: https://lore.kernel.org/linux-fsdevel/20190104090357.GD22409@quack2.suse.cz/
> Fixes: 5f02a8776384 ("fsnotify: annotate directory entry modification events")
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
> 
> Jan,
> 
> A nicer approach reusing __fsnotify_parent() instead of copying code
> from it.
> 
> This version does not apply cleanly to Al's for-next branch (there are
> some fsnotify changes in work.dcache). The conflict is trivial and
> resolved on my fsnotify branch [1].

Hum, let me check if I understand the situation right. We've changed
fsnotify_nameremove() to not use fsnotify_parent() as we don't want to
report FS_EVENT_ON_CHILD with it anymore. We should use fsnotify_dirent()
as for all other directory event notification handlers but that's
problematic due to different locking context and possible instability of
parent.

Honestly I don't like the patch below much. How we are notifying parent
without sending FS_EVENT_ON_CHILD and modify behavior based on that flag
just looks subtle. So I'd rather move the fsnotify call to vfs_unlink(),
vfs_rmdir(), simple_unlink(), simple_rmdir(), and then those few callers of
d_delete() that remain as you suggest elsewhere in this thread. And then we
get more consistent context for fsnotify_nameremove() and could just use
fsnotify_dirent(). 

								Honza

> diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> index df06f3da166c..265b726d6e8d 100644
> --- a/fs/notify/fsnotify.c
> +++ b/fs/notify/fsnotify.c
> @@ -151,31 +151,31 @@ void __fsnotify_update_child_dentry_flags(struct inode *inode)
>  	spin_unlock(&inode->i_lock);
>  }
>  
> -/* Notify this dentry's parent about a child's events. */
> +/*
> + * Notify this dentry's parent about an event and make sure that name is stable.
> + * Events "on child" are only reported if parent is watching.
> + * Directory modification events are also reported if super block is watching.
> + */
>  int __fsnotify_parent(const struct path *path, struct dentry *dentry, __u32 mask)
>  {
>  	struct dentry *parent;
>  	struct inode *p_inode;
>  	int ret = 0;
> +	bool on_child = (mask & FS_EVENT_ON_CHILD);
> +	__u32 test_mask = (mask & ALL_FSNOTIFY_EVENTS);
>  
> -	if (!dentry)
> -		dentry = path->dentry;
> -
> -	if (!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED))
> +	if (on_child && !(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED))
>  		return 0;
>  
>  	parent = dget_parent(dentry);
>  	p_inode = parent->d_inode;
>  
> -	if (unlikely(!fsnotify_inode_watches_children(p_inode))) {
> +	if (on_child && unlikely(!fsnotify_inode_watches_children(p_inode))) {
>  		__fsnotify_update_child_dentry_flags(p_inode);
> -	} else if (p_inode->i_fsnotify_mask & mask & ALL_FSNOTIFY_EVENTS) {
> +	} else if ((p_inode->i_fsnotify_mask & test_mask) ||
> +		   (!on_child && (dentry->d_sb->s_fsnotify_mask & test_mask))) {
>  		struct name_snapshot name;
>  
> -		/* we are notifying a parent so come up with the new mask which
> -		 * specifies these are events which came from a child. */
> -		mask |= FS_EVENT_ON_CHILD;
> -
>  		take_dentry_name_snapshot(&name, dentry);
>  		if (path)
>  			ret = fsnotify(p_inode, mask, path, FSNOTIFY_EVENT_PATH,
> diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
> index 09587e2860b5..8641bf9a1eda 100644
> --- a/include/linux/fsnotify.h
> +++ b/include/linux/fsnotify.h
> @@ -37,7 +37,7 @@ static inline int fsnotify_parent(const struct path *path,
>  	if (!dentry)
>  		dentry = path->dentry;
>  
> -	return __fsnotify_parent(path, dentry, mask);
> +	return __fsnotify_parent(path, dentry, mask | FS_EVENT_ON_CHILD);
>  }
>  
>  /*
> @@ -158,13 +158,11 @@ static inline void fsnotify_vfsmount_delete(struct vfsmount *mnt)
>   * dentry->d_parent should be stable. However there are some corner cases where
>   * inode lock is not held. So to be on the safe side and be reselient to future
>   * callers and out of tree users of d_delete(), we do not assume that d_parent
> - * and d_name are stable and we use dget_parent() and
> + * and d_name are stable and we use __fsnotify_parent() to
>   * take_dentry_name_snapshot() to grab stable references.
>   */
>  static inline void fsnotify_nameremove(struct dentry *dentry, int isdir)
>  {
> -	struct dentry *parent;
> -	struct name_snapshot name;
>  	__u32 mask = FS_DELETE;
>  
>  	/* d_delete() of pseudo inode? (e.g. __ns_get_path() playing tricks) */
> @@ -174,14 +172,7 @@ static inline void fsnotify_nameremove(struct dentry *dentry, int isdir)
>  	if (isdir)
>  		mask |= FS_ISDIR;
>  
> -	parent = dget_parent(dentry);
> -	take_dentry_name_snapshot(&name, dentry);
> -
> -	fsnotify(d_inode(parent), mask, d_inode(dentry), FSNOTIFY_EVENT_INODE,
> -		 name.name, 0);
> -
> -	release_dentry_name_snapshot(&name);
> -	dput(parent);
> +	__fsnotify_parent(NULL, dentry, mask);
>  }
>  
>  /*
> -- 
> 2.17.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] fsnotify: fix unlink performance regression
  2019-05-07 16:19       ` Jan Kara
@ 2019-05-07 19:12         ` Amir Goldstein
  2019-05-08 16:09           ` Amir Goldstein
  2019-05-09  9:46           ` Jan Kara
  0 siblings, 2 replies; 21+ messages in thread
From: Amir Goldstein @ 2019-05-07 19:12 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, Linus Torvalds, linux-fsdevel, LKP

On Tue, May 7, 2019 at 7:19 PM Jan Kara <jack@suse.cz> wrote:
>
> On Sun 05-05-19 23:07:28, Amir Goldstein wrote:
> > __fsnotify_parent() has an optimization in place to avoid unneeded
> > take_dentry_name_snapshot().  When fsnotify_nameremove() was changed
> > not to call __fsnotify_parent(), we left out the optimization.
> > Kernel test robot reported a 5% performance regression in concurrent
> > unlink() workload.
> >
> > Modify __fsnotify_parent() so that it can be called also to report
> > directory modififcation events and use it from fsnotify_nameremove().
> >
> > Reported-by: kernel test robot <rong.a.chen@intel.com>
> > Link: https://lore.kernel.org/lkml/20190505062153.GG29809@shao2-debian/
> > Link: https://lore.kernel.org/linux-fsdevel/20190104090357.GD22409@quack2.suse.cz/
> > Fixes: 5f02a8776384 ("fsnotify: annotate directory entry modification events")
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >
> > Jan,
> >
> > A nicer approach reusing __fsnotify_parent() instead of copying code
> > from it.
> >
> > This version does not apply cleanly to Al's for-next branch (there are
> > some fsnotify changes in work.dcache). The conflict is trivial and
> > resolved on my fsnotify branch [1].
>
> Hum, let me check if I understand the situation right. We've changed
> fsnotify_nameremove() to not use fsnotify_parent() as we don't want to
> report FS_EVENT_ON_CHILD with it anymore. We should use fsnotify_dirent()
> as for all other directory event notification handlers but that's
> problematic due to different locking context and possible instability of
> parent.
>

Yes. Not only do we not want to report FS_EVENT_ON_CHILD with
FS_DELETE, but we also need to report it to dir inode and to fs sb
regardless of DCACHE_FSNOTIFY_PARENT_WATCHED.

> Honestly I don't like the patch below much. How we are notifying parent
> without sending FS_EVENT_ON_CHILD and modify behavior based on that flag
> just looks subtle.

I see, although please note that reporting FS_EVENT_ON_CHILD
is strongly related to the "modified behavior", because unless this an
a report of event on_child, DCACHE_FSNOTIFY_PARENT_WATCHED
is not relevant.

> So I'd rather move the fsnotify call to vfs_unlink(),
> vfs_rmdir(), simple_unlink(), simple_rmdir(), and then those few callers of
> d_delete() that remain as you suggest elsewhere in this thread. And then we
> get more consistent context for fsnotify_nameremove() and could just use
> fsnotify_dirent().
>

Yes, I much prefer this solution myself and I will follow up with it,
but it would not be honest to suggest said solution as a stable fix
to the performance regression that was introduced in v5.1.
I think is it better if you choose between lesser evil:
v1 with ifdef CONFIG_FSNOTIFY to fix build issue
v2 as subtle as it is
OR another obviously safe stable fix that you can think of

The change of cleansing d_delete() from fsnotify_nameremove()
requires more research and is anyway not stable tree material -
if not for the level of complexity, then because all the users of
FS_DELETE from pseudo and clustered filesystems need more time
to find regressions (we do not have test coverage for those in LTP).

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] fsnotify: fix unlink performance regression
  2019-05-07 19:12         ` Amir Goldstein
@ 2019-05-08 16:09           ` Amir Goldstein
  2019-05-09 10:31             ` Jan Kara
  2019-05-09  9:46           ` Jan Kara
  1 sibling, 1 reply; 21+ messages in thread
From: Amir Goldstein @ 2019-05-08 16:09 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, Linus Torvalds, linux-fsdevel, LKP

On Tue, May 7, 2019 at 10:12 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Tue, May 7, 2019 at 7:19 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Sun 05-05-19 23:07:28, Amir Goldstein wrote:
> > > __fsnotify_parent() has an optimization in place to avoid unneeded
> > > take_dentry_name_snapshot().  When fsnotify_nameremove() was changed
> > > not to call __fsnotify_parent(), we left out the optimization.
> > > Kernel test robot reported a 5% performance regression in concurrent
> > > unlink() workload.
> > >
> > > Modify __fsnotify_parent() so that it can be called also to report
> > > directory modififcation events and use it from fsnotify_nameremove().
> > >
> > > Reported-by: kernel test robot <rong.a.chen@intel.com>
> > > Link: https://lore.kernel.org/lkml/20190505062153.GG29809@shao2-debian/
> > > Link: https://lore.kernel.org/linux-fsdevel/20190104090357.GD22409@quack2.suse.cz/
> > > Fixes: 5f02a8776384 ("fsnotify: annotate directory entry modification events")
> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > ---
> > >
> > > Jan,
> > >
> > > A nicer approach reusing __fsnotify_parent() instead of copying code
> > > from it.
> > >
> > > This version does not apply cleanly to Al's for-next branch (there are
> > > some fsnotify changes in work.dcache). The conflict is trivial and
> > > resolved on my fsnotify branch [1].
> >
> > Hum, let me check if I understand the situation right. We've changed
> > fsnotify_nameremove() to not use fsnotify_parent() as we don't want to
> > report FS_EVENT_ON_CHILD with it anymore. We should use fsnotify_dirent()
> > as for all other directory event notification handlers but that's
> > problematic due to different locking context and possible instability of
> > parent.
> >
>
> Yes. Not only do we not want to report FS_EVENT_ON_CHILD with
> FS_DELETE, but we also need to report it to dir inode and to fs sb
> regardless of DCACHE_FSNOTIFY_PARENT_WATCHED.
>
> > Honestly I don't like the patch below much. How we are notifying parent
> > without sending FS_EVENT_ON_CHILD and modify behavior based on that flag
> > just looks subtle.
>
> I see, although please note that reporting FS_EVENT_ON_CHILD
> is strongly related to the "modified behavior", because unless this an
> a report of event on_child, DCACHE_FSNOTIFY_PARENT_WATCHED
> is not relevant.
>
> > So I'd rather move the fsnotify call to vfs_unlink(),
> > vfs_rmdir(), simple_unlink(), simple_rmdir(), and then those few callers of
> > d_delete() that remain as you suggest elsewhere in this thread. And then we
> > get more consistent context for fsnotify_nameremove() and could just use
> > fsnotify_dirent().
> >
>
> Yes, I much prefer this solution myself and I will follow up with it,
> but it would not be honest to suggest said solution as a stable fix
> to the performance regression that was introduced in v5.1.
> I think is it better if you choose between lesser evil:
> v1 with ifdef CONFIG_FSNOTIFY to fix build issue
> v2 as subtle as it is
> OR another obviously safe stable fix that you can think of
>
> The change of cleansing d_delete() from fsnotify_nameremove()
> requires more research and is anyway not stable tree material -
> if not for the level of complexity, then because all the users of
> FS_DELETE from pseudo and clustered filesystems need more time
> to find regressions (we do not have test coverage for those in LTP).
>

Something like this:
https://github.com/amir73il/linux/commits/fsnotify_nameremove

Only partially tested. Obviously haven't tested all callers.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] fsnotify: fix unlink performance regression
  2019-05-07 19:12         ` Amir Goldstein
  2019-05-08 16:09           ` Amir Goldstein
@ 2019-05-09  9:46           ` Jan Kara
  2019-05-10 14:57             ` Amir Goldstein
  1 sibling, 1 reply; 21+ messages in thread
From: Jan Kara @ 2019-05-09  9:46 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Al Viro, Linus Torvalds, linux-fsdevel, LKP

On Tue 07-05-19 22:12:57, Amir Goldstein wrote:
> On Tue, May 7, 2019 at 7:19 PM Jan Kara <jack@suse.cz> wrote:
> > So I'd rather move the fsnotify call to vfs_unlink(),
> > vfs_rmdir(), simple_unlink(), simple_rmdir(), and then those few callers of
> > d_delete() that remain as you suggest elsewhere in this thread. And then we
> > get more consistent context for fsnotify_nameremove() and could just use
> > fsnotify_dirent().
> >
> 
> Yes, I much prefer this solution myself and I will follow up with it,
> but it would not be honest to suggest said solution as a stable fix
> to the performance regression that was introduced in v5.1.
> I think is it better if you choose between lesser evil:
> v1 with ifdef CONFIG_FSNOTIFY to fix build issue
> v2 as subtle as it is
> OR another obviously safe stable fix that you can think of

OK, fair enough. I'll go with v1 + build fix for current merge window +
stable as it's local to fsnotify_nameremove().

> The change of cleansing d_delete() from fsnotify_nameremove()
> requires more research and is anyway not stable tree material -
> if not for the level of complexity, then because all the users of
> FS_DELETE from pseudo and clustered filesystems need more time
> to find regressions (we do not have test coverage for those in LTP).

Agreed.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] fsnotify: fix unlink performance regression
  2019-05-08 16:09           ` Amir Goldstein
@ 2019-05-09 10:31             ` Jan Kara
  2019-05-10 15:24               ` Amir Goldstein
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kara @ 2019-05-09 10:31 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Al Viro, Linus Torvalds, linux-fsdevel, LKP

On Wed 08-05-19 19:09:56, Amir Goldstein wrote:
> On Tue, May 7, 2019 at 10:12 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > Yes, I much prefer this solution myself and I will follow up with it,
> > but it would not be honest to suggest said solution as a stable fix
> > to the performance regression that was introduced in v5.1.
> > I think is it better if you choose between lesser evil:
> > v1 with ifdef CONFIG_FSNOTIFY to fix build issue
> > v2 as subtle as it is
> > OR another obviously safe stable fix that you can think of
> >
> > The change of cleansing d_delete() from fsnotify_nameremove()
> > requires more research and is anyway not stable tree material -
> > if not for the level of complexity, then because all the users of
> > FS_DELETE from pseudo and clustered filesystems need more time
> > to find regressions (we do not have test coverage for those in LTP).
> >
> 
> Something like this:
> https://github.com/amir73il/linux/commits/fsnotify_nameremove
> 
> Only partially tested. Obviously haven't tested all callers.

Not quite. I'd add the fsnotify_nameremove() call also to simple_rmdir()
and simple_unlink(). That takes care of:
arch/s390/hypfs/inode.c, drivers/infiniband/hw/qib/qib_fs.c,
fs/configfs/dir.c, fs/debugfs/inode.c, fs/tracefs/inode.c,
net/sunrpc/rpc_pipe.c

So you're left only with:
drivers/usb/gadget/function/f_fs.c, fs/btrfs/ioctl.c, fs/devpts/inode.c,
fs/reiserfs/xattr.c

Out of these drivers/usb/gadget/function/f_fs.c and fs/reiserfs/xattr.c
actually also don't want the notifications to be generated. They don't
generate events on file creation AFAICS and at least in case of reiserfs I
know that xattrs are handled in "hidden" system files so notification does
not make any sense. So we are left with exactly *two* places that need
explicit fsnotify_nameremove() call. Since both these callers already take
care of calling fsnotify_create() I think that having explicit
fsnotify_nameremove() calls there is clearer than the current state.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] fsnotify: fix unlink performance regression
  2019-05-09  9:46           ` Jan Kara
@ 2019-05-10 14:57             ` Amir Goldstein
  2019-05-13 16:23               ` Jan Kara
  0 siblings, 1 reply; 21+ messages in thread
From: Amir Goldstein @ 2019-05-10 14:57 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, Linus Torvalds, linux-fsdevel, LKP

On Thu, May 9, 2019 at 12:46 PM Jan Kara <jack@suse.cz> wrote:
>
> On Tue 07-05-19 22:12:57, Amir Goldstein wrote:
> > On Tue, May 7, 2019 at 7:19 PM Jan Kara <jack@suse.cz> wrote:
> > > So I'd rather move the fsnotify call to vfs_unlink(),
> > > vfs_rmdir(), simple_unlink(), simple_rmdir(), and then those few callers of
> > > d_delete() that remain as you suggest elsewhere in this thread. And then we
> > > get more consistent context for fsnotify_nameremove() and could just use
> > > fsnotify_dirent().
> > >
> >
> > Yes, I much prefer this solution myself and I will follow up with it,
> > but it would not be honest to suggest said solution as a stable fix
> > to the performance regression that was introduced in v5.1.
> > I think is it better if you choose between lesser evil:
> > v1 with ifdef CONFIG_FSNOTIFY to fix build issue
> > v2 as subtle as it is
> > OR another obviously safe stable fix that you can think of
>
> OK, fair enough. I'll go with v1 + build fix for current merge window +
> stable as it's local to fsnotify_nameremove().

Please note that the patch on your fsnotify branch conflicts with
fsnotify_nameremove() changes in master:
230c6402b1b3 ovl_lookup_real_one(): don't bother with strlen()
25b229dff4ff fsnotify(): switch to passing const struct qstr * for file_name

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] fsnotify: fix unlink performance regression
  2019-05-09 10:31             ` Jan Kara
@ 2019-05-10 15:24               ` Amir Goldstein
  2019-05-13 16:33                 ` Jan Kara
  0 siblings, 1 reply; 21+ messages in thread
From: Amir Goldstein @ 2019-05-10 15:24 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, Linus Torvalds, linux-fsdevel, LKP

On Thu, May 9, 2019 at 1:31 PM Jan Kara <jack@suse.cz> wrote:
>
> On Wed 08-05-19 19:09:56, Amir Goldstein wrote:
> > On Tue, May 7, 2019 at 10:12 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > Yes, I much prefer this solution myself and I will follow up with it,
> > > but it would not be honest to suggest said solution as a stable fix
> > > to the performance regression that was introduced in v5.1.
> > > I think is it better if you choose between lesser evil:
> > > v1 with ifdef CONFIG_FSNOTIFY to fix build issue
> > > v2 as subtle as it is
> > > OR another obviously safe stable fix that you can think of
> > >
> > > The change of cleansing d_delete() from fsnotify_nameremove()
> > > requires more research and is anyway not stable tree material -
> > > if not for the level of complexity, then because all the users of
> > > FS_DELETE from pseudo and clustered filesystems need more time
> > > to find regressions (we do not have test coverage for those in LTP).
> > >
> >
> > Something like this:
> > https://github.com/amir73il/linux/commits/fsnotify_nameremove
> >
> > Only partially tested. Obviously haven't tested all callers.
>
> Not quite. I'd add the fsnotify_nameremove() call also to simple_rmdir()
> and simple_unlink(). That takes care of:
> arch/s390/hypfs/inode.c, drivers/infiniband/hw/qib/qib_fs.c,
> fs/configfs/dir.c, fs/debugfs/inode.c, fs/tracefs/inode.c,
> net/sunrpc/rpc_pipe.c
>

simple_unlink() is used as i_op->unlink() implementation of simple
filesystems, such as: fs/pstore/inode.c fs/ramfs/inode.c
fs/ocfs2/dlmfs/dlmfs.c fs/hugetlbfs/inode.c kernel/bpf/inode.c

If we place fsnotify hook in the filesystem implementation and not
in vfs_unlink(), what will cover normal fs? If we do place fsnotify hook
in vfs_unlink(), then we have a double call to hook.

The places we add explicit fsnofity hooks should only be call sites that
do not originate from vfs_unlink/vfs_rmdir.

> So you're left only with:
> drivers/usb/gadget/function/f_fs.c, fs/btrfs/ioctl.c, fs/devpts/inode.c,
> fs/reiserfs/xattr.c
>
> Out of these drivers/usb/gadget/function/f_fs.c and fs/reiserfs/xattr.c
> actually also don't want the notifications to be generated. They don't
> generate events on file creation AFAICS and at least in case of reiserfs I
> know that xattrs are handled in "hidden" system files so notification does
> not make any sense. So we are left with exactly *two* places that need

OK. good to know.

> explicit fsnotify_nameremove() call. Since both these callers already take
> care of calling fsnotify_create() I think that having explicit
> fsnotify_nameremove() calls there is clearer than the current state.
>

I though so too, but then I did not feel comfortable with "regressing"
delete notifications on many filesystems that did seem plausible for
having notification users like configfs, even though they do not have
create notification, so I decided to do the safer option of converting all
plausible callers of d_delete() that abide to the parent/name stable
constrains to use the d_delete_and_notify() wrapper.

For readers that did not follow my link, those are the call site that were
converted to opt-in for d_delete_and_notify() in the linked branch:
arch/s390/hypfs/inode.c
drivers/infiniband/hw/qib/qib_fs.c
drivers/usb/gadget/function/f_fs.c
fs/btrfs/ioctl.c
fs/configfs/dir.c
fs/debugfs/inode.c
fs/devpts/inode.c
fs/reiserfs/xattr.c
fs/tracefs/inode.c
net/sunrpc/rpc_pipe.c

In addition, nfs and afs no longer need to call fsnotify hook explcitly on
completion of silly rename (delayed unlink).

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] fsnotify: fix unlink performance regression
  2019-05-10 14:57             ` Amir Goldstein
@ 2019-05-13 16:23               ` Jan Kara
  0 siblings, 0 replies; 21+ messages in thread
From: Jan Kara @ 2019-05-13 16:23 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Al Viro, Linus Torvalds, linux-fsdevel, LKP

On Fri 10-05-19 17:57:49, Amir Goldstein wrote:
> On Thu, May 9, 2019 at 12:46 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Tue 07-05-19 22:12:57, Amir Goldstein wrote:
> > > On Tue, May 7, 2019 at 7:19 PM Jan Kara <jack@suse.cz> wrote:
> > > > So I'd rather move the fsnotify call to vfs_unlink(),
> > > > vfs_rmdir(), simple_unlink(), simple_rmdir(), and then those few callers of
> > > > d_delete() that remain as you suggest elsewhere in this thread. And then we
> > > > get more consistent context for fsnotify_nameremove() and could just use
> > > > fsnotify_dirent().
> > > >
> > >
> > > Yes, I much prefer this solution myself and I will follow up with it,
> > > but it would not be honest to suggest said solution as a stable fix
> > > to the performance regression that was introduced in v5.1.
> > > I think is it better if you choose between lesser evil:
> > > v1 with ifdef CONFIG_FSNOTIFY to fix build issue
> > > v2 as subtle as it is
> > > OR another obviously safe stable fix that you can think of
> >
> > OK, fair enough. I'll go with v1 + build fix for current merge window +
> > stable as it's local to fsnotify_nameremove().
> 
> Please note that the patch on your fsnotify branch conflicts with
> fsnotify_nameremove() changes in master:
> 230c6402b1b3 ovl_lookup_real_one(): don't bother with strlen()
> 25b229dff4ff fsnotify(): switch to passing const struct qstr * for file_name

Thanks for the heads up! The conflict is easy enough to resolve but I've
notified Linus about it in my pull request. Hum, which reminds me that I've
forgotten to pull the latest fix in fsnotify branch into my for_next branch
and so I didn't get a notification about the conflict myself. Too late to
fix that now I guess...

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] fsnotify: fix unlink performance regression
  2019-05-10 15:24               ` Amir Goldstein
@ 2019-05-13 16:33                 ` Jan Kara
  2019-05-13 16:47                   ` Amir Goldstein
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kara @ 2019-05-13 16:33 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Al Viro, Linus Torvalds, linux-fsdevel, LKP

On Fri 10-05-19 18:24:42, Amir Goldstein wrote:
> On Thu, May 9, 2019 at 1:31 PM Jan Kara <jack@suse.cz> wrote:
> > On Wed 08-05-19 19:09:56, Amir Goldstein wrote:
> > > On Tue, May 7, 2019 at 10:12 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > Yes, I much prefer this solution myself and I will follow up with it,
> > > > but it would not be honest to suggest said solution as a stable fix
> > > > to the performance regression that was introduced in v5.1.
> > > > I think is it better if you choose between lesser evil:
> > > > v1 with ifdef CONFIG_FSNOTIFY to fix build issue
> > > > v2 as subtle as it is
> > > > OR another obviously safe stable fix that you can think of
> > > >
> > > > The change of cleansing d_delete() from fsnotify_nameremove()
> > > > requires more research and is anyway not stable tree material -
> > > > if not for the level of complexity, then because all the users of
> > > > FS_DELETE from pseudo and clustered filesystems need more time
> > > > to find regressions (we do not have test coverage for those in LTP).
> > > >
> > >
> > > Something like this:
> > > https://github.com/amir73il/linux/commits/fsnotify_nameremove
> > >
> > > Only partially tested. Obviously haven't tested all callers.
> >
> > Not quite. I'd add the fsnotify_nameremove() call also to simple_rmdir()
> > and simple_unlink(). That takes care of:
> > arch/s390/hypfs/inode.c, drivers/infiniband/hw/qib/qib_fs.c,
> > fs/configfs/dir.c, fs/debugfs/inode.c, fs/tracefs/inode.c,
> > net/sunrpc/rpc_pipe.c
> >
> 
> simple_unlink() is used as i_op->unlink() implementation of simple
> filesystems, such as: fs/pstore/inode.c fs/ramfs/inode.c
> fs/ocfs2/dlmfs/dlmfs.c fs/hugetlbfs/inode.c kernel/bpf/inode.c
> 
> If we place fsnotify hook in the filesystem implementation and not
> in vfs_unlink(), what will cover normal fs? If we do place fsnotify hook
> in vfs_unlink(), then we have a double call to hook.
> 
> The places we add explicit fsnofity hooks should only be call sites that
> do not originate from vfs_unlink/vfs_rmdir.

Hum, right. I didn't realize simple_unlink() gets also called through
vfs_unlink() for some filesystems. But then I'd rather create variants
simple_unlink_notify() and simple_rmdir_notify() than messing with
d_delete(). As I just think that fsnotify call in d_delete() happens at a
wrong layer. d_delete() is about dcache management, not really about
filesystem name removal which is what we want to notify about.

> > So you're left only with:
> > drivers/usb/gadget/function/f_fs.c, fs/btrfs/ioctl.c, fs/devpts/inode.c,
> > fs/reiserfs/xattr.c
> >
> > Out of these drivers/usb/gadget/function/f_fs.c and fs/reiserfs/xattr.c
> > actually also don't want the notifications to be generated. They don't
> > generate events on file creation AFAICS and at least in case of reiserfs I
> > know that xattrs are handled in "hidden" system files so notification does
> > not make any sense. So we are left with exactly *two* places that need
> 
> OK. good to know.
> 
> > explicit fsnotify_nameremove() call. Since both these callers already take
> > care of calling fsnotify_create() I think that having explicit
> > fsnotify_nameremove() calls there is clearer than the current state.
> >
> 
> I though so too, but then I did not feel comfortable with "regressing"
> delete notifications on many filesystems that did seem plausible for
> having notification users like configfs, even though they do not have
> create notification, so I decided to do the safer option of converting all
> plausible callers of d_delete() that abide to the parent/name stable
> constrains to use the d_delete_and_notify() wrapper.

As I said above, I don't want to regress anyone either. But I just think
that d_delete() is a wrong function to call fsnotify hook from (as much as
it is convenient) and that's what's causing all these problems.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] fsnotify: fix unlink performance regression
  2019-05-13 16:33                 ` Jan Kara
@ 2019-05-13 16:47                   ` Amir Goldstein
  0 siblings, 0 replies; 21+ messages in thread
From: Amir Goldstein @ 2019-05-13 16:47 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, Linus Torvalds, linux-fsdevel, LKP

On Mon, May 13, 2019 at 7:33 PM Jan Kara <jack@suse.cz> wrote:
>
> On Fri 10-05-19 18:24:42, Amir Goldstein wrote:
> > On Thu, May 9, 2019 at 1:31 PM Jan Kara <jack@suse.cz> wrote:
> > > On Wed 08-05-19 19:09:56, Amir Goldstein wrote:
> > > > On Tue, May 7, 2019 at 10:12 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > > Yes, I much prefer this solution myself and I will follow up with it,
> > > > > but it would not be honest to suggest said solution as a stable fix
> > > > > to the performance regression that was introduced in v5.1.
> > > > > I think is it better if you choose between lesser evil:
> > > > > v1 with ifdef CONFIG_FSNOTIFY to fix build issue
> > > > > v2 as subtle as it is
> > > > > OR another obviously safe stable fix that you can think of
> > > > >
> > > > > The change of cleansing d_delete() from fsnotify_nameremove()
> > > > > requires more research and is anyway not stable tree material -
> > > > > if not for the level of complexity, then because all the users of
> > > > > FS_DELETE from pseudo and clustered filesystems need more time
> > > > > to find regressions (we do not have test coverage for those in LTP).
> > > > >
> > > >
> > > > Something like this:
> > > > https://github.com/amir73il/linux/commits/fsnotify_nameremove
> > > >
> > > > Only partially tested. Obviously haven't tested all callers.
> > >
> > > Not quite. I'd add the fsnotify_nameremove() call also to simple_rmdir()
> > > and simple_unlink(). That takes care of:
> > > arch/s390/hypfs/inode.c, drivers/infiniband/hw/qib/qib_fs.c,
> > > fs/configfs/dir.c, fs/debugfs/inode.c, fs/tracefs/inode.c,
> > > net/sunrpc/rpc_pipe.c
> > >
> >
> > simple_unlink() is used as i_op->unlink() implementation of simple
> > filesystems, such as: fs/pstore/inode.c fs/ramfs/inode.c
> > fs/ocfs2/dlmfs/dlmfs.c fs/hugetlbfs/inode.c kernel/bpf/inode.c
> >
> > If we place fsnotify hook in the filesystem implementation and not
> > in vfs_unlink(), what will cover normal fs? If we do place fsnotify hook
> > in vfs_unlink(), then we have a double call to hook.
> >
> > The places we add explicit fsnofity hooks should only be call sites that
> > do not originate from vfs_unlink/vfs_rmdir.
>
> Hum, right. I didn't realize simple_unlink() gets also called through
> vfs_unlink() for some filesystems. But then I'd rather create variants
> simple_unlink_notify() and simple_rmdir_notify() than messing with
> d_delete(). As I just think that fsnotify call in d_delete() happens at a
> wrong layer. d_delete() is about dcache management, not really about
> filesystem name removal which is what we want to notify about.
>

Agreed.
I'll follow up with this solution, hopefully after the stable regression
fix is already merged.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2019-05-13 16:47 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-05  9:15 [PATCH] fsnotify: fix unlink performance regression Amir Goldstein
2019-05-05 13:05 ` Al Viro
2019-05-05 13:19   ` Amir Goldstein
2019-05-05 13:47     ` Al Viro
2019-05-05 14:18       ` Amir Goldstein
2019-05-06 12:43       ` Amir Goldstein
2019-05-06 14:26         ` Al Viro
2019-05-06 16:22           ` Amir Goldstein
2019-05-05 16:33 ` kbuild test robot
2019-05-05 18:26   ` Amir Goldstein
2019-05-05 20:07     ` [PATCH v2] " Amir Goldstein
2019-05-07 16:19       ` Jan Kara
2019-05-07 19:12         ` Amir Goldstein
2019-05-08 16:09           ` Amir Goldstein
2019-05-09 10:31             ` Jan Kara
2019-05-10 15:24               ` Amir Goldstein
2019-05-13 16:33                 ` Jan Kara
2019-05-13 16:47                   ` Amir Goldstein
2019-05-09  9:46           ` Jan Kara
2019-05-10 14:57             ` Amir Goldstein
2019-05-13 16:23               ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).