linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] kernfs: remove i_lock usage that isn't needed
@ 2022-10-18  2:32 Ian Kent
  2022-10-18  2:32 ` [PATCH 1/2] kernfs: dont take i_lock on inode attr read Ian Kent
  2022-10-18  2:32 ` [PATCH 2/2] kernfs: dont take i_lock on revalidate Ian Kent
  0 siblings, 2 replies; 25+ messages in thread
From: Ian Kent @ 2022-10-18  2:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Tejun Heo
  Cc: Minchan Kim, Eric Sandeen, Al Viro, Rick Lindsley, David Howells,
	Miklos Szeredi, Carlos Maiolino, linux-fsdevel,
	Kernel Mailing List

In the patch series to change the global kernfs mutex to a read-write
semaphore the inode i_lock is used in several functions.

But after thinking about it the i_lock is in fact not needed.

Signed-off-by: Ian Kent <raven@themaw.net>
---

Ian Kent (2):
      kernfs: dont take i_lock on inode attr read
      kernfs: dont take i_lock on revalidate


 fs/kernfs/dir.c   | 24 +++++++++++++++++-------
 fs/kernfs/inode.c |  4 ----
 2 files changed, 17 insertions(+), 11 deletions(-)

--
Ian


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/2] kernfs: dont take i_lock on inode attr read
  2022-10-18  2:32 [PATCH 0/2] kernfs: remove i_lock usage that isn't needed Ian Kent
@ 2022-10-18  2:32 ` Ian Kent
  2022-10-24  8:50   ` Miklos Szeredi
  2022-10-31 22:30   ` Tejun Heo
  2022-10-18  2:32 ` [PATCH 2/2] kernfs: dont take i_lock on revalidate Ian Kent
  1 sibling, 2 replies; 25+ messages in thread
From: Ian Kent @ 2022-10-18  2:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Tejun Heo
  Cc: Minchan Kim, Eric Sandeen, Al Viro, Rick Lindsley, David Howells,
	Miklos Szeredi, Carlos Maiolino, linux-fsdevel,
	Kernel Mailing List

The kernfs write lock is held when the kernfs node inode attributes
are updated. Therefore, when either kernfs_iop_getattr() or
kernfs_iop_permission() are called the kernfs node inode attributes
won't change.

Consequently concurrent kernfs_refresh_inode() calls always copy the
same values from the kernfs node.

So there's no need to take the inode i_lock to get consistent values
for generic_fillattr() and generic_permission(), the kernfs read lock
is sufficient.

Signed-off-by: Ian Kent <raven@themaw.net>
---
 fs/kernfs/inode.c |    4 ----
 1 file changed, 4 deletions(-)

diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
index 3d783d80f5da..74f3453f4639 100644
--- a/fs/kernfs/inode.c
+++ b/fs/kernfs/inode.c
@@ -190,10 +190,8 @@ int kernfs_iop_getattr(struct user_namespace *mnt_userns,
 	struct kernfs_root *root = kernfs_root(kn);
 
 	down_read(&root->kernfs_rwsem);
-	spin_lock(&inode->i_lock);
 	kernfs_refresh_inode(kn, inode);
 	generic_fillattr(&init_user_ns, inode, stat);
-	spin_unlock(&inode->i_lock);
 	up_read(&root->kernfs_rwsem);
 
 	return 0;
@@ -288,10 +286,8 @@ int kernfs_iop_permission(struct user_namespace *mnt_userns,
 	root = kernfs_root(kn);
 
 	down_read(&root->kernfs_rwsem);
-	spin_lock(&inode->i_lock);
 	kernfs_refresh_inode(kn, inode);
 	ret = generic_permission(&init_user_ns, inode, mask);
-	spin_unlock(&inode->i_lock);
 	up_read(&root->kernfs_rwsem);
 
 	return ret;



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/2] kernfs: dont take i_lock on revalidate
  2022-10-18  2:32 [PATCH 0/2] kernfs: remove i_lock usage that isn't needed Ian Kent
  2022-10-18  2:32 ` [PATCH 1/2] kernfs: dont take i_lock on inode attr read Ian Kent
@ 2022-10-18  2:32 ` Ian Kent
  2022-10-24  8:38   ` Miklos Szeredi
                     ` (2 more replies)
  1 sibling, 3 replies; 25+ messages in thread
From: Ian Kent @ 2022-10-18  2:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Tejun Heo
  Cc: Minchan Kim, Eric Sandeen, Al Viro, Rick Lindsley, David Howells,
	Miklos Szeredi, Carlos Maiolino, linux-fsdevel,
	Kernel Mailing List

In kernfs_dop_revalidate() when the passed in dentry is negative the
dentry directory is checked to see if it has changed and if so the
negative dentry is discarded so it can refreshed. During this check
the dentry inode i_lock is taken to mitigate against a possible
concurrent rename.

But if it's racing with a rename, becuase the dentry is negative, it
can't be the source it must be the target and it must be going to do
a d_move() otherwise the rename will return an error.

In this case the parent dentry of the target will not change, it will
be the same over the d_move(), only the source dentry parent may change
so the inode i_lock isn't needed.

Signed-off-by: Ian Kent <raven@themaw.net>
---
 fs/kernfs/dir.c |   24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 3990f3e270cb..6acd9c3d4cff 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1073,20 +1073,30 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
 
 		/* If the kernfs parent node has changed discard and
 		 * proceed to ->lookup.
+		 *
+		 * There's nothing special needed here when getting the
+		 * dentry parent, even if a concurrent rename is in
+		 * progress. That's because the dentry is negative so
+		 * it can only be the target of the rename and it will
+		 * be doing a d_move() not a replace. Consequently the
+		 * dentry d_parent won't change over the d_move().
+		 *
+		 * Also kernfs negative dentries transitioning from
+		 * negative to positive during revalidate won't happen
+		 * because they are invalidated on containing directory
+		 * changes and the lookup re-done so that a new positive
+		 * dentry can be properly created.
 		 */
-		spin_lock(&dentry->d_lock);
+		root = kernfs_root_from_sb(dentry->d_sb);
+		down_read(&root->kernfs_rwsem);
 		parent = kernfs_dentry_node(dentry->d_parent);
 		if (parent) {
-			spin_unlock(&dentry->d_lock);
-			root = kernfs_root(parent);
-			down_read(&root->kernfs_rwsem);
 			if (kernfs_dir_changed(parent, dentry)) {
 				up_read(&root->kernfs_rwsem);
 				return 0;
 			}
-			up_read(&root->kernfs_rwsem);
-		} else
-			spin_unlock(&dentry->d_lock);
+		}
+		up_read(&root->kernfs_rwsem);
 
 		/* The kernfs parent node hasn't changed, leave the
 		 * dentry negative and return success.



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] kernfs: dont take i_lock on revalidate
  2022-10-18  2:32 ` [PATCH 2/2] kernfs: dont take i_lock on revalidate Ian Kent
@ 2022-10-24  8:38   ` Miklos Szeredi
  2022-10-31 22:31   ` Tejun Heo
  2022-11-01  7:46   ` Amir Goldstein
  2 siblings, 0 replies; 25+ messages in thread
From: Miklos Szeredi @ 2022-10-24  8:38 UTC (permalink / raw)
  To: Ian Kent
  Cc: Greg Kroah-Hartman, Tejun Heo, Minchan Kim, Eric Sandeen,
	Al Viro, Rick Lindsley, David Howells, Carlos Maiolino,
	linux-fsdevel, Kernel Mailing List

On Tue, 18 Oct 2022 at 04:32, Ian Kent <raven@themaw.net> wrote:
>
> In kernfs_dop_revalidate() when the passed in dentry is negative the
> dentry directory is checked to see if it has changed and if so the
> negative dentry is discarded so it can refreshed. During this check
> the dentry inode i_lock is taken to mitigate against a possible
> concurrent rename.
>
> But if it's racing with a rename, becuase the dentry is negative, it
> can't be the source it must be the target and it must be going to do
> a d_move() otherwise the rename will return an error.
>
> In this case the parent dentry of the target will not change, it will
> be the same over the d_move(), only the source dentry parent may change
> so the inode i_lock isn't needed.
>
> Signed-off-by: Ian Kent <raven@themaw.net>

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] kernfs: dont take i_lock on inode attr read
  2022-10-18  2:32 ` [PATCH 1/2] kernfs: dont take i_lock on inode attr read Ian Kent
@ 2022-10-24  8:50   ` Miklos Szeredi
  2022-10-31 22:30   ` Tejun Heo
  1 sibling, 0 replies; 25+ messages in thread
From: Miklos Szeredi @ 2022-10-24  8:50 UTC (permalink / raw)
  To: Ian Kent
  Cc: Greg Kroah-Hartman, Tejun Heo, Minchan Kim, Eric Sandeen,
	Al Viro, Rick Lindsley, David Howells, Carlos Maiolino,
	linux-fsdevel, Kernel Mailing List

On Tue, 18 Oct 2022 at 04:32, Ian Kent <raven@themaw.net> wrote:
>
> The kernfs write lock is held when the kernfs node inode attributes
> are updated. Therefore, when either kernfs_iop_getattr() or
> kernfs_iop_permission() are called the kernfs node inode attributes
> won't change.
>
> Consequently concurrent kernfs_refresh_inode() calls always copy the
> same values from the kernfs node.
>
> So there's no need to take the inode i_lock to get consistent values
> for generic_fillattr() and generic_permission(), the kernfs read lock
> is sufficient.
>
> Signed-off-by: Ian Kent <raven@themaw.net>

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] kernfs: dont take i_lock on inode attr read
  2022-10-18  2:32 ` [PATCH 1/2] kernfs: dont take i_lock on inode attr read Ian Kent
  2022-10-24  8:50   ` Miklos Szeredi
@ 2022-10-31 22:30   ` Tejun Heo
  2022-12-21 13:34     ` Anders Roxell
  1 sibling, 1 reply; 25+ messages in thread
From: Tejun Heo @ 2022-10-31 22:30 UTC (permalink / raw)
  To: Ian Kent
  Cc: Greg Kroah-Hartman, Minchan Kim, Eric Sandeen, Al Viro,
	Rick Lindsley, David Howells, Miklos Szeredi, Carlos Maiolino,
	linux-fsdevel, Kernel Mailing List

On Tue, Oct 18, 2022 at 10:32:42AM +0800, Ian Kent wrote:
> The kernfs write lock is held when the kernfs node inode attributes
> are updated. Therefore, when either kernfs_iop_getattr() or
> kernfs_iop_permission() are called the kernfs node inode attributes
> won't change.
> 
> Consequently concurrent kernfs_refresh_inode() calls always copy the
> same values from the kernfs node.
> 
> So there's no need to take the inode i_lock to get consistent values
> for generic_fillattr() and generic_permission(), the kernfs read lock
> is sufficient.
> 
> Signed-off-by: Ian Kent <raven@themaw.net>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] kernfs: dont take i_lock on revalidate
  2022-10-18  2:32 ` [PATCH 2/2] kernfs: dont take i_lock on revalidate Ian Kent
  2022-10-24  8:38   ` Miklos Szeredi
@ 2022-10-31 22:31   ` Tejun Heo
  2022-11-01  7:46   ` Amir Goldstein
  2 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2022-10-31 22:31 UTC (permalink / raw)
  To: Ian Kent
  Cc: Greg Kroah-Hartman, Minchan Kim, Eric Sandeen, Al Viro,
	Rick Lindsley, David Howells, Miklos Szeredi, Carlos Maiolino,
	linux-fsdevel, Kernel Mailing List

On Tue, Oct 18, 2022 at 10:32:49AM +0800, Ian Kent wrote:
> In kernfs_dop_revalidate() when the passed in dentry is negative the
> dentry directory is checked to see if it has changed and if so the
> negative dentry is discarded so it can refreshed. During this check
> the dentry inode i_lock is taken to mitigate against a possible
> concurrent rename.
> 
> But if it's racing with a rename, becuase the dentry is negative, it
> can't be the source it must be the target and it must be going to do
> a d_move() otherwise the rename will return an error.
> 
> In this case the parent dentry of the target will not change, it will
> be the same over the d_move(), only the source dentry parent may change
> so the inode i_lock isn't needed.
> 
> Signed-off-by: Ian Kent <raven@themaw.net>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] kernfs: dont take i_lock on revalidate
  2022-10-18  2:32 ` [PATCH 2/2] kernfs: dont take i_lock on revalidate Ian Kent
  2022-10-24  8:38   ` Miklos Szeredi
  2022-10-31 22:31   ` Tejun Heo
@ 2022-11-01  7:46   ` Amir Goldstein
  2022-11-01  8:09     ` Ian Kent
  2 siblings, 1 reply; 25+ messages in thread
From: Amir Goldstein @ 2022-11-01  7:46 UTC (permalink / raw)
  To: Ian Kent
  Cc: Greg Kroah-Hartman, Tejun Heo, Minchan Kim, Eric Sandeen,
	Al Viro, Rick Lindsley, David Howells, Miklos Szeredi,
	Carlos Maiolino, linux-fsdevel, Kernel Mailing List

On Tue, Oct 18, 2022 at 5:58 AM Ian Kent <raven@themaw.net> wrote:
>
> In kernfs_dop_revalidate() when the passed in dentry is negative the
> dentry directory is checked to see if it has changed and if so the
> negative dentry is discarded so it can refreshed. During this check
> the dentry inode i_lock is taken to mitigate against a possible
> concurrent rename.
>
> But if it's racing with a rename, becuase the dentry is negative, it
> can't be the source it must be the target and it must be going to do
> a d_move() otherwise the rename will return an error.
>
> In this case the parent dentry of the target will not change, it will
> be the same over the d_move(), only the source dentry parent may change
> so the inode i_lock isn't needed.

You meant d_lock.
Same for the commit title.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] kernfs: dont take i_lock on revalidate
  2022-11-01  7:46   ` Amir Goldstein
@ 2022-11-01  8:09     ` Ian Kent
  0 siblings, 0 replies; 25+ messages in thread
From: Ian Kent @ 2022-11-01  8:09 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Greg Kroah-Hartman, Tejun Heo, Minchan Kim, Eric Sandeen,
	Al Viro, Rick Lindsley, David Howells, Miklos Szeredi,
	Carlos Maiolino, linux-fsdevel, Kernel Mailing List


On 1/11/22 15:46, Amir Goldstein wrote:
> On Tue, Oct 18, 2022 at 5:58 AM Ian Kent <raven@themaw.net> wrote:
>> In kernfs_dop_revalidate() when the passed in dentry is negative the
>> dentry directory is checked to see if it has changed and if so the
>> negative dentry is discarded so it can refreshed. During this check
>> the dentry inode i_lock is taken to mitigate against a possible
>> concurrent rename.
>>
>> But if it's racing with a rename, becuase the dentry is negative, it
>> can't be the source it must be the target and it must be going to do
>> a d_move() otherwise the rename will return an error.
>>
>> In this case the parent dentry of the target will not change, it will
>> be the same over the d_move(), only the source dentry parent may change
>> so the inode i_lock isn't needed.
> You meant d_lock.
> Same for the commit title.

Ha, well how do you like that, such an obvious mistake, how

did I not see it?


Not sure what to do about it now though ...

Any suggestions anyone?


Ian



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] kernfs: dont take i_lock on inode attr read
  2022-10-31 22:30   ` Tejun Heo
@ 2022-12-21 13:34     ` Anders Roxell
  2022-12-22 23:11       ` Ian Kent
  0 siblings, 1 reply; 25+ messages in thread
From: Anders Roxell @ 2022-12-21 13:34 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ian Kent, Greg Kroah-Hartman, Minchan Kim, Eric Sandeen, Al Viro,
	Rick Lindsley, David Howells, Miklos Szeredi, Carlos Maiolino,
	linux-fsdevel, Kernel Mailing List, elver, arnd

On 2022-10-31 12:30, Tejun Heo wrote:
> On Tue, Oct 18, 2022 at 10:32:42AM +0800, Ian Kent wrote:
> > The kernfs write lock is held when the kernfs node inode attributes
> > are updated. Therefore, when either kernfs_iop_getattr() or
> > kernfs_iop_permission() are called the kernfs node inode attributes
> > won't change.
> > 
> > Consequently concurrent kernfs_refresh_inode() calls always copy the
> > same values from the kernfs node.
> > 
> > So there's no need to take the inode i_lock to get consistent values
> > for generic_fillattr() and generic_permission(), the kernfs read lock
> > is sufficient.
> > 
> > Signed-off-by: Ian Kent <raven@themaw.net>
> 
> Acked-by: Tejun Heo <tj@kernel.org>

Hi,

Building an allmodconfig arm64 kernel on yesterdays next-20221220 and
booting that in qemu I see the following "BUG: KCSAN: data-race in
set_nlink / set_nlink".


==================================================================
[ 1540.388669][  T123] BUG: KCSAN: data-race in set_nlink / set_nlink
[ 1540.392779][  T123] 
[ 1540.394302][  T123] write to 0xffff00000adcc3e4 of 4 bytes by task 126 on cpu 0:
[ 1540.398828][ T123] set_nlink (/home/anders/src/kernel/next/fs/inode.c:371) 
[ 1540.401609][ T123] kernfs_refresh_inode (/home/anders/src/kernel/next/fs/kernfs/inode.c:181) 
[ 1540.404925][ T123] kernfs_iop_getattr (/home/anders/src/kernel/next/fs/kernfs/inode.c:194) 
[ 1540.408088][ T123] vfs_getattr_nosec (/home/anders/src/kernel/next/fs/stat.c:133) 
[ 1540.411334][ T123] vfs_statx (/home/anders/src/kernel/next/fs/stat.c:170) 
[ 1540.414224][ T123] vfs_fstatat (/home/anders/src/kernel/next/fs/stat.c:276) 
[ 1540.417166][ T123] __do_sys_newfstatat (/home/anders/src/kernel/next/fs/stat.c:446) 
[ 1540.420539][ T123] __arm64_sys_newfstatat (/home/anders/src/kernel/next/fs/stat.c:440 /home/anders/src/kernel/next/fs/stat.c:440) 
[ 1540.424003][ T123] el0_svc_common.constprop.0 (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:142) 
[ 1540.427648][ T123] do_el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:197) 
[ 1540.430378][ T123] el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:142 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:638) 
[ 1540.433053][ T123] el0t_64_sync_handler (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:656) 
[ 1540.436421][ T123] el0t_64_sync (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:584) 
[ 1540.439303][  T123] 
[ 1540.440828][  T123] 1 lock held by systemd-udevd/126:
[ 1540.444055][ T123] #0: ffff00000609b960 (&root->kernfs_rwsem){++++}-{3:3}, at: kernfs_iop_getattr (/home/anders/src/kernel/next/fs/kernfs/inode.c:193) 
[ 1540.450699][  T123] irq event stamp: 185034
[ 1540.453373][ T123] hardirqs last enabled at (185034): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/mm/page_alloc.c:5302) 
[ 1540.460087][ T123] hardirqs last disabled at (185033): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:103 (discriminator 4)) 
[ 1540.466686][ T123] softirqs last enabled at (185001): fpsimd_restore_current_state (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1780) 
[ 1540.473310][ T123] softirqs last disabled at (184999): fpsimd_restore_current_state (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1773) 
[ 1540.479872][  T123] 
[ 1540.481406][  T123] read to 0xffff00000adcc3e4 of 4 bytes by task 123 on cpu 0:
[ 1540.485893][ T123] set_nlink (/home/anders/src/kernel/next/fs/inode.c:368) 
[ 1540.488663][ T123] kernfs_refresh_inode (/home/anders/src/kernel/next/fs/kernfs/inode.c:181) 
[ 1540.491961][ T123] kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:290) 
[ 1540.495260][ T123] inode_permission (/home/anders/src/kernel/next/fs/namei.c:458 /home/anders/src/kernel/next/fs/namei.c:525) 
[ 1540.498450][ T123] link_path_walk (/home/anders/src/kernel/next/fs/namei.c:1715 /home/anders/src/kernel/next/fs/namei.c:2262) 
[ 1540.501552][ T123] path_lookupat (/home/anders/src/kernel/next/fs/namei.c:2473 (discriminator 2)) 
[ 1540.504592][ T123] filename_lookup (/home/anders/src/kernel/next/fs/namei.c:2503) 
[ 1540.507740][ T123] user_path_at_empty (/home/anders/src/kernel/next/fs/namei.c:2876) 
[ 1540.511010][ T123] do_readlinkat (/home/anders/src/kernel/next/fs/stat.c:477) 
[ 1540.514097][ T123] __arm64_sys_readlinkat (/home/anders/src/kernel/next/fs/stat.c:504 /home/anders/src/kernel/next/fs/stat.c:501 /home/anders/src/kernel/next/fs/stat.c:501) 
[ 1540.517598][ T123] el0_svc_common.constprop.0 (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:142) 
[ 1540.521319][ T123] do_el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:197) 
[ 1540.524125][ T123] el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:142 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:638) 
[ 1540.526795][ T123] el0t_64_sync_handler (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:656) 
[ 1540.530224][ T123] el0t_64_sync (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:584) 
[ 1540.533176][  T123] 
[ 1540.534710][  T123] 1 lock held by systemd-udevd/123:
[ 1540.537977][ T123] #0: ffff00000609b960 (&root->kernfs_rwsem){++++}-{3:3}, at: kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:289) 
[ 1540.544892][  T123] irq event stamp: 216564
[ 1540.547603][ T123] hardirqs last enabled at (216564): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/mm/page_alloc.c:5302) 
[ 1540.554368][ T123] hardirqs last disabled at (216563): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:103 (discriminator 4)) 
[ 1540.561107][ T123] softirqs last enabled at (216533): fpsimd_restore_current_state (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1780) 
[ 1540.567833][ T123] softirqs last disabled at (216531): fpsimd_restore_current_state (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1773) 
[ 1540.574496][  T123] 
[ 1540.576050][  T123] Reported by Kernel Concurrency Sanitizer on:
[ 1540.587925][  T123] Hardware name: linux,dummy-virt (DT)
[ 1540.591282][  T123] ==================================================================


Reverting this patch I don't see the data race anymore.

Cheers,
Anders

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] kernfs: dont take i_lock on inode attr read
  2022-12-21 13:34     ` Anders Roxell
@ 2022-12-22 23:11       ` Ian Kent
  2022-12-29  9:20         ` Arnd Bergmann
  0 siblings, 1 reply; 25+ messages in thread
From: Ian Kent @ 2022-12-22 23:11 UTC (permalink / raw)
  To: Anders Roxell, Tejun Heo
  Cc: Greg Kroah-Hartman, Minchan Kim, Eric Sandeen, Al Viro,
	Rick Lindsley, David Howells, Miklos Szeredi, Carlos Maiolino,
	linux-fsdevel, Kernel Mailing List, elver, arnd


On 21/12/22 21:34, Anders Roxell wrote:
> On 2022-10-31 12:30, Tejun Heo wrote:
>> On Tue, Oct 18, 2022 at 10:32:42AM +0800, Ian Kent wrote:
>>> The kernfs write lock is held when the kernfs node inode attributes
>>> are updated. Therefore, when either kernfs_iop_getattr() or
>>> kernfs_iop_permission() are called the kernfs node inode attributes
>>> won't change.
>>>
>>> Consequently concurrent kernfs_refresh_inode() calls always copy the
>>> same values from the kernfs node.
>>>
>>> So there's no need to take the inode i_lock to get consistent values
>>> for generic_fillattr() and generic_permission(), the kernfs read lock
>>> is sufficient.
>>>
>>> Signed-off-by: Ian Kent <raven@themaw.net>
>> Acked-by: Tejun Heo <tj@kernel.org>
> Hi,
>
> Building an allmodconfig arm64 kernel on yesterdays next-20221220 and
> booting that in qemu I see the following "BUG: KCSAN: data-race in
> set_nlink / set_nlink".


I'll check if I missed any places where set_link() could be

called where the link count could be different.


If there aren't any the question will then be can writing the

same value to this location in multiple concurrent threads

corrupt it?


Ian

>
>
> ==================================================================
> [ 1540.388669][  T123] BUG: KCSAN: data-race in set_nlink / set_nlink
> [ 1540.392779][  T123]
> [ 1540.394302][  T123] write to 0xffff00000adcc3e4 of 4 bytes by task 126 on cpu 0:
> [ 1540.398828][ T123] set_nlink (/home/anders/src/kernel/next/fs/inode.c:371)
> [ 1540.401609][ T123] kernfs_refresh_inode (/home/anders/src/kernel/next/fs/kernfs/inode.c:181)
> [ 1540.404925][ T123] kernfs_iop_getattr (/home/anders/src/kernel/next/fs/kernfs/inode.c:194)
> [ 1540.408088][ T123] vfs_getattr_nosec (/home/anders/src/kernel/next/fs/stat.c:133)
> [ 1540.411334][ T123] vfs_statx (/home/anders/src/kernel/next/fs/stat.c:170)
> [ 1540.414224][ T123] vfs_fstatat (/home/anders/src/kernel/next/fs/stat.c:276)
> [ 1540.417166][ T123] __do_sys_newfstatat (/home/anders/src/kernel/next/fs/stat.c:446)
> [ 1540.420539][ T123] __arm64_sys_newfstatat (/home/anders/src/kernel/next/fs/stat.c:440 /home/anders/src/kernel/next/fs/stat.c:440)
> [ 1540.424003][ T123] el0_svc_common.constprop.0 (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:142)
> [ 1540.427648][ T123] do_el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:197)
> [ 1540.430378][ T123] el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:142 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:638)
> [ 1540.433053][ T123] el0t_64_sync_handler (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:656)
> [ 1540.436421][ T123] el0t_64_sync (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:584)
> [ 1540.439303][  T123]
> [ 1540.440828][  T123] 1 lock held by systemd-udevd/126:
> [ 1540.444055][ T123] #0: ffff00000609b960 (&root->kernfs_rwsem){++++}-{3:3}, at: kernfs_iop_getattr (/home/anders/src/kernel/next/fs/kernfs/inode.c:193)
> [ 1540.450699][  T123] irq event stamp: 185034
> [ 1540.453373][ T123] hardirqs last enabled at (185034): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/mm/page_alloc.c:5302)
> [ 1540.460087][ T123] hardirqs last disabled at (185033): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:103 (discriminator 4))
> [ 1540.466686][ T123] softirqs last enabled at (185001): fpsimd_restore_current_state (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1780)
> [ 1540.473310][ T123] softirqs last disabled at (184999): fpsimd_restore_current_state (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1773)
> [ 1540.479872][  T123]
> [ 1540.481406][  T123] read to 0xffff00000adcc3e4 of 4 bytes by task 123 on cpu 0:
> [ 1540.485893][ T123] set_nlink (/home/anders/src/kernel/next/fs/inode.c:368)
> [ 1540.488663][ T123] kernfs_refresh_inode (/home/anders/src/kernel/next/fs/kernfs/inode.c:181)
> [ 1540.491961][ T123] kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:290)
> [ 1540.495260][ T123] inode_permission (/home/anders/src/kernel/next/fs/namei.c:458 /home/anders/src/kernel/next/fs/namei.c:525)
> [ 1540.498450][ T123] link_path_walk (/home/anders/src/kernel/next/fs/namei.c:1715 /home/anders/src/kernel/next/fs/namei.c:2262)
> [ 1540.501552][ T123] path_lookupat (/home/anders/src/kernel/next/fs/namei.c:2473 (discriminator 2))
> [ 1540.504592][ T123] filename_lookup (/home/anders/src/kernel/next/fs/namei.c:2503)
> [ 1540.507740][ T123] user_path_at_empty (/home/anders/src/kernel/next/fs/namei.c:2876)
> [ 1540.511010][ T123] do_readlinkat (/home/anders/src/kernel/next/fs/stat.c:477)
> [ 1540.514097][ T123] __arm64_sys_readlinkat (/home/anders/src/kernel/next/fs/stat.c:504 /home/anders/src/kernel/next/fs/stat.c:501 /home/anders/src/kernel/next/fs/stat.c:501)
> [ 1540.517598][ T123] el0_svc_common.constprop.0 (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:142)
> [ 1540.521319][ T123] do_el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:197)
> [ 1540.524125][ T123] el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:142 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:638)
> [ 1540.526795][ T123] el0t_64_sync_handler (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:656)
> [ 1540.530224][ T123] el0t_64_sync (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:584)
> [ 1540.533176][  T123]
> [ 1540.534710][  T123] 1 lock held by systemd-udevd/123:
> [ 1540.537977][ T123] #0: ffff00000609b960 (&root->kernfs_rwsem){++++}-{3:3}, at: kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:289)
> [ 1540.544892][  T123] irq event stamp: 216564
> [ 1540.547603][ T123] hardirqs last enabled at (216564): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/mm/page_alloc.c:5302)
> [ 1540.554368][ T123] hardirqs last disabled at (216563): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:103 (discriminator 4))
> [ 1540.561107][ T123] softirqs last enabled at (216533): fpsimd_restore_current_state (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1780)
> [ 1540.567833][ T123] softirqs last disabled at (216531): fpsimd_restore_current_state (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1773)
> [ 1540.574496][  T123]
> [ 1540.576050][  T123] Reported by Kernel Concurrency Sanitizer on:
> [ 1540.587925][  T123] Hardware name: linux,dummy-virt (DT)
> [ 1540.591282][  T123] ==================================================================
>
>
> Reverting this patch I don't see the data race anymore.
>
> Cheers,
> Anders

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] kernfs: dont take i_lock on inode attr read
  2022-12-22 23:11       ` Ian Kent
@ 2022-12-29  9:20         ` Arnd Bergmann
  2022-12-29 13:07           ` Ian Kent
  0 siblings, 1 reply; 25+ messages in thread
From: Arnd Bergmann @ 2022-12-29  9:20 UTC (permalink / raw)
  To: Ian Kent, Anders Roxell, Tejun Heo
  Cc: Greg Kroah-Hartman, Minchan Kim, Eric Sandeen, Alexander Viro,
	Rick Lindsley, David Howells, Miklos Szeredi, Carlos Maiolino,
	linux-fsdevel, Kernel Mailing List, elver

On Fri, Dec 23, 2022, at 00:11, Ian Kent wrote:
> On 21/12/22 21:34, Anders Roxell wrote:
>> On 2022-10-31 12:30, Tejun Heo wrote:
>>> On Tue, Oct 18, 2022 at 10:32:42AM +0800, Ian Kent wrote:
>>>> The kernfs write lock is held when the kernfs node inode attributes
>>>> are updated. Therefore, when either kernfs_iop_getattr() or
>>>> kernfs_iop_permission() are called the kernfs node inode attributes
>>>> won't change.
>>>>
>>>> Consequently concurrent kernfs_refresh_inode() calls always copy the
>>>> same values from the kernfs node.
>>>>
>>>> So there's no need to take the inode i_lock to get consistent values
>>>> for generic_fillattr() and generic_permission(), the kernfs read lock
>>>> is sufficient.
>>>>
>>>> Signed-off-by: Ian Kent <raven@themaw.net>
>>> Acked-by: Tejun Heo <tj@kernel.org>
>> Hi,
>>
>> Building an allmodconfig arm64 kernel on yesterdays next-20221220 and
>> booting that in qemu I see the following "BUG: KCSAN: data-race in
>> set_nlink / set_nlink".
>
>
> I'll check if I missed any places where set_link() could be
> called where the link count could be different.
>
>
> If there aren't any the question will then be can writing the
> same value to this location in multiple concurrent threads
> corrupt it?

I think the race that is getting reported for set_nlink()
is about this bit getting called simulatenously on multiple
CPUs with only the read lock held for the inode:

     /* Yes, some filesystems do change nlink from zero to one */
     if (inode->i_nlink == 0)
               atomic_long_dec(&inode->i_sb->s_remove_count);
     inode->__i_nlink = nlink;

Since i_nlink and __i_nlink refer to the same memory location,
the 'inode->i_nlink == 0' check can be true for all of them
before the nonzero nlink value gets set, and this results in
s_remove_count being decremented more than once.

      Arnd

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] kernfs: dont take i_lock on inode attr read
  2022-12-29  9:20         ` Arnd Bergmann
@ 2022-12-29 13:07           ` Ian Kent
  2023-01-23  3:11             ` Ian Kent
  0 siblings, 1 reply; 25+ messages in thread
From: Ian Kent @ 2022-12-29 13:07 UTC (permalink / raw)
  To: Arnd Bergmann, Anders Roxell, Tejun Heo
  Cc: Greg Kroah-Hartman, Minchan Kim, Eric Sandeen, Alexander Viro,
	Rick Lindsley, David Howells, Miklos Szeredi, Carlos Maiolino,
	linux-fsdevel, Kernel Mailing List, elver


On 29/12/22 17:20, Arnd Bergmann wrote:
> On Fri, Dec 23, 2022, at 00:11, Ian Kent wrote:
>> On 21/12/22 21:34, Anders Roxell wrote:
>>> On 2022-10-31 12:30, Tejun Heo wrote:
>>>> On Tue, Oct 18, 2022 at 10:32:42AM +0800, Ian Kent wrote:
>>>>> The kernfs write lock is held when the kernfs node inode attributes
>>>>> are updated. Therefore, when either kernfs_iop_getattr() or
>>>>> kernfs_iop_permission() are called the kernfs node inode attributes
>>>>> won't change.
>>>>>
>>>>> Consequently concurrent kernfs_refresh_inode() calls always copy the
>>>>> same values from the kernfs node.
>>>>>
>>>>> So there's no need to take the inode i_lock to get consistent values
>>>>> for generic_fillattr() and generic_permission(), the kernfs read lock
>>>>> is sufficient.
>>>>>
>>>>> Signed-off-by: Ian Kent <raven@themaw.net>
>>>> Acked-by: Tejun Heo <tj@kernel.org>
>>> Hi,
>>>
>>> Building an allmodconfig arm64 kernel on yesterdays next-20221220 and
>>> booting that in qemu I see the following "BUG: KCSAN: data-race in
>>> set_nlink / set_nlink".
>>
>> I'll check if I missed any places where set_link() could be
>> called where the link count could be different.
>>
>>
>> If there aren't any the question will then be can writing the
>> same value to this location in multiple concurrent threads
>> corrupt it?
> I think the race that is getting reported for set_nlink()
> is about this bit getting called simulatenously on multiple
> CPUs with only the read lock held for the inode:
>
>       /* Yes, some filesystems do change nlink from zero to one */
>       if (inode->i_nlink == 0)
>                 atomic_long_dec(&inode->i_sb->s_remove_count);
>       inode->__i_nlink = nlink;
>
> Since i_nlink and __i_nlink refer to the same memory location,
> the 'inode->i_nlink == 0' check can be true for all of them
> before the nonzero nlink value gets set, and this results in
> s_remove_count being decremented more than once.


Thanks for the comment Arnd.


I'll certainly have a close look at that.

It will be a little while though, given the season, ;).


Ian


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] kernfs: dont take i_lock on inode attr read
  2022-12-29 13:07           ` Ian Kent
@ 2023-01-23  3:11             ` Ian Kent
  2023-07-18 19:00               ` Anders Roxell
  0 siblings, 1 reply; 25+ messages in thread
From: Ian Kent @ 2023-01-23  3:11 UTC (permalink / raw)
  To: Arnd Bergmann, Anders Roxell, Tejun Heo
  Cc: Greg Kroah-Hartman, Minchan Kim, Eric Sandeen, Alexander Viro,
	Rick Lindsley, David Howells, Miklos Szeredi, Carlos Maiolino,
	linux-fsdevel, Kernel Mailing List, elver


On 29/12/22 21:07, Ian Kent wrote:
>
> On 29/12/22 17:20, Arnd Bergmann wrote:
>> On Fri, Dec 23, 2022, at 00:11, Ian Kent wrote:
>>> On 21/12/22 21:34, Anders Roxell wrote:
>>>> On 2022-10-31 12:30, Tejun Heo wrote:
>>>>> On Tue, Oct 18, 2022 at 10:32:42AM +0800, Ian Kent wrote:
>>>>>> The kernfs write lock is held when the kernfs node inode attributes
>>>>>> are updated. Therefore, when either kernfs_iop_getattr() or
>>>>>> kernfs_iop_permission() are called the kernfs node inode attributes
>>>>>> won't change.
>>>>>>
>>>>>> Consequently concurrent kernfs_refresh_inode() calls always copy the
>>>>>> same values from the kernfs node.
>>>>>>
>>>>>> So there's no need to take the inode i_lock to get consistent values
>>>>>> for generic_fillattr() and generic_permission(), the kernfs read 
>>>>>> lock
>>>>>> is sufficient.
>>>>>>
>>>>>> Signed-off-by: Ian Kent <raven@themaw.net>
>>>>> Acked-by: Tejun Heo <tj@kernel.org>
>>>> Hi,
>>>>
>>>> Building an allmodconfig arm64 kernel on yesterdays next-20221220 and
>>>> booting that in qemu I see the following "BUG: KCSAN: data-race in
>>>> set_nlink / set_nlink".
>>>
>>> I'll check if I missed any places where set_link() could be
>>> called where the link count could be different.
>>>
>>>
>>> If there aren't any the question will then be can writing the
>>> same value to this location in multiple concurrent threads
>>> corrupt it?
>> I think the race that is getting reported for set_nlink()
>> is about this bit getting called simulatenously on multiple
>> CPUs with only the read lock held for the inode:
>>
>>       /* Yes, some filesystems do change nlink from zero to one */
>>       if (inode->i_nlink == 0)
>> atomic_long_dec(&inode->i_sb->s_remove_count);
>>       inode->__i_nlink = nlink;
>>
>> Since i_nlink and __i_nlink refer to the same memory location,
>> the 'inode->i_nlink == 0' check can be true for all of them
>> before the nonzero nlink value gets set, and this results in
>> s_remove_count being decremented more than once.
>
>
> Thanks for the comment Arnd.


Hello all,


I've been looking at this and after consulting Miklos and his pointing

out that it looks like a false positive the urgency dropped off a bit. So

apologies for taking so long to report back.


Anyway it needs some description of conclusions reached so far.


I'm still looking around but in short, kernfs will set directories to <# of

directory entries> + 2 unconditionally for directories. I can't yet find

any other places where i_nlink is set or changed and if there are none

then i_nlink will never be set to zero so the race should not occur.


Consequently my claim is this is a real false positive.


There are the file system operations that may be passed at mount time

but given the way kernfs sets i_nlink it pretty much dictates those 
operations

(if there were any that modify it and there don't appear to be any) leave it

alone.


So it just doesn't make sense for users of kernfs to fiddle with i_nlink ...


Ian






^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] kernfs: dont take i_lock on inode attr read
  2023-01-23  3:11             ` Ian Kent
@ 2023-07-18 19:00               ` Anders Roxell
  2023-07-19  4:23                 ` Ian Kent
  0 siblings, 1 reply; 25+ messages in thread
From: Anders Roxell @ 2023-07-18 19:00 UTC (permalink / raw)
  To: Ian Kent
  Cc: Arnd Bergmann, Tejun Heo, Greg Kroah-Hartman, Minchan Kim,
	Eric Sandeen, Alexander Viro, Rick Lindsley, David Howells,
	Miklos Szeredi, Carlos Maiolino, linux-fsdevel,
	Kernel Mailing List, elver, imran.f.khan

On 2023-01-23 11:11, Ian Kent wrote:
> 
> On 29/12/22 21:07, Ian Kent wrote:
> > 
> > On 29/12/22 17:20, Arnd Bergmann wrote:
> > > On Fri, Dec 23, 2022, at 00:11, Ian Kent wrote:
> > > > On 21/12/22 21:34, Anders Roxell wrote:
> > > > > On 2022-10-31 12:30, Tejun Heo wrote:
> > > > > > On Tue, Oct 18, 2022 at 10:32:42AM +0800, Ian Kent wrote:
> > > > > > > The kernfs write lock is held when the kernfs node inode attributes
> > > > > > > are updated. Therefore, when either kernfs_iop_getattr() or
> > > > > > > kernfs_iop_permission() are called the kernfs node inode attributes
> > > > > > > won't change.
> > > > > > > 
> > > > > > > Consequently concurrent kernfs_refresh_inode() calls always copy the
> > > > > > > same values from the kernfs node.
> > > > > > > 
> > > > > > > So there's no need to take the inode i_lock to get consistent values
> > > > > > > for generic_fillattr() and generic_permission(), the
> > > > > > > kernfs read lock
> > > > > > > is sufficient.
> > > > > > > 
> > > > > > > Signed-off-by: Ian Kent <raven@themaw.net>
> > > > > > Acked-by: Tejun Heo <tj@kernel.org>
> > > > > Hi,
> > > > > 
> > > > > Building an allmodconfig arm64 kernel on yesterdays next-20221220 and
> > > > > booting that in qemu I see the following "BUG: KCSAN: data-race in
> > > > > set_nlink / set_nlink".
> > > > 
> > > > I'll check if I missed any places where set_link() could be
> > > > called where the link count could be different.
> > > > 
> > > > 
> > > > If there aren't any the question will then be can writing the
> > > > same value to this location in multiple concurrent threads
> > > > corrupt it?
> > > I think the race that is getting reported for set_nlink()
> > > is about this bit getting called simulatenously on multiple
> > > CPUs with only the read lock held for the inode:
> > > 
> > >       /* Yes, some filesystems do change nlink from zero to one */
> > >       if (inode->i_nlink == 0)
> > > atomic_long_dec(&inode->i_sb->s_remove_count);
> > >       inode->__i_nlink = nlink;
> > > 
> > > Since i_nlink and __i_nlink refer to the same memory location,
> > > the 'inode->i_nlink == 0' check can be true for all of them
> > > before the nonzero nlink value gets set, and this results in
> > > s_remove_count being decremented more than once.
> > 
> > 
> > Thanks for the comment Arnd.
> 
> 
> Hello all,
> 
> 
> I've been looking at this and after consulting Miklos and his pointing
> 
> out that it looks like a false positive the urgency dropped off a bit. So
> 
> apologies for taking so long to report back.
> 
> 
> Anyway it needs some description of conclusions reached so far.
> 
> 
> I'm still looking around but in short, kernfs will set directories to <# of
> 
> directory entries> + 2 unconditionally for directories. I can't yet find
> 
> any other places where i_nlink is set or changed and if there are none
> 
> then i_nlink will never be set to zero so the race should not occur.
> 
> 
> Consequently my claim is this is a real false positive.
> 
> 
> There are the file system operations that may be passed at mount time
> 
> but given the way kernfs sets i_nlink it pretty much dictates those
> operations
> 
> (if there were any that modify it and there don't appear to be any) leave it
> 
> alone.
> 
> 
> So it just doesn't make sense for users of kernfs to fiddle with i_nlink ...

On todays next tag, next-20230718 this KCSAN BUG poped up again. When I
built an allmodconfig arm64 kernel and booted it in QEMU. Full log can
be found http://ix.io/4AUd

[ 1694.987789][  T137] BUG: KCSAN: data-race in inode_permission / kernfs_refresh_inode
[ 1694.992912][  T137] 
[ 1694.994532][  T137] write to 0xffff00000bab6070 of 2 bytes by task 104 on cpu 0:
[ 1694.999269][ T137] kernfs_refresh_inode (/home/anders/src/kernel/next/fs/kernfs/inode.c:171) 
[ 1695.002707][ T137] kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:289) 
[ 1695.006148][ T137] inode_permission (/home/anders/src/kernel/next/fs/namei.c:461 /home/anders/src/kernel/next/fs/namei.c:528) 
[ 1695.009420][ T137] link_path_walk (/home/anders/src/kernel/next/fs/namei.c:1720 /home/anders/src/kernel/next/fs/namei.c:2267) 
[ 1695.012643][ T137] path_lookupat (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2)) 
[ 1695.015781][ T137] filename_lookup (/home/anders/src/kernel/next/fs/namei.c:2508) 
[ 1695.019059][ T137] vfs_statx (/home/anders/src/kernel/next/fs/stat.c:238) 
[ 1695.022024][ T137] vfs_fstatat (/home/anders/src/kernel/next/fs/stat.c:276) 
[ 1695.025067][ T137] __do_sys_newfstatat (/home/anders/src/kernel/next/fs/stat.c:446) 
[ 1695.028497][ T137] __arm64_sys_newfstatat (/home/anders/src/kernel/next/fs/stat.c:440 /home/anders/src/kernel/next/fs/stat.c:440) 
[ 1695.032080][ T137] el0_svc_common.constprop.0 (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139) 
[ 1695.035916][ T137] do_el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188) 
[ 1695.038796][ T137] el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648) 
[ 1695.041468][ T137] el0t_64_sync_handler (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666) 
[ 1695.044889][ T137] el0t_64_sync (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591) 
[ 1695.047904][  T137] 
[ 1695.049511][  T137] 1 lock held by systemd-udevd/104:
[ 1695.052837][ T137] #0: ffff000006681e08 (&root->kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:288) 
[ 1695.060241][  T137] irq event stamp: 82902
[ 1695.063006][ T137] hardirqs last enabled at (82901): _raw_spin_unlock_irqrestore (/home/anders/src/kernel/next/arch/arm64/include/asm/alternative-macros.h:250 /home/anders/src/kernel/next/arch/arm64/include/asm/irqflags.h:27 /home/anders/src/kernel/next/arch/arm64/include/asm/irqflags.h:140 /home/anders/src/kernel/next/include/linux/spinlock_api_smp.h:151 /home/anders/src/kernel/next/kernel/locking/spinlock.c:194) 
[ 1695.069673][ T137] hardirqs last disabled at (82902): el1_interrupt (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:472 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:488) 
[ 1695.075474][ T137] softirqs last enabled at (82792): fpsimd_restore_current_state (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791) 
[ 1695.082319][ T137] softirqs last disabled at (82790): fpsimd_restore_current_state (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784) 
[ 1695.089049][  T137] 
[ 1695.090659][  T137] read to 0xffff00000bab6070 of 2 bytes by task 137 on cpu 0:
[ 1695.095374][ T137] inode_permission (/home/anders/src/kernel/next/fs/namei.c:532) 
[ 1695.098655][ T137] link_path_walk (/home/anders/src/kernel/next/fs/namei.c:1720 /home/anders/src/kernel/next/fs/namei.c:2267) 
[ 1695.101857][ T137] path_openat (/home/anders/src/kernel/next/fs/namei.c:3789 (discriminator 2)) 
[ 1695.104885][ T137] do_filp_open (/home/anders/src/kernel/next/fs/namei.c:3820) 
[ 1695.108006][ T137] do_sys_openat2 (/home/anders/src/kernel/next/fs/open.c:1418) 
[ 1695.111290][ T137] __arm64_sys_openat (/home/anders/src/kernel/next/fs/open.c:1433) 
[ 1695.114825][ T137] el0_svc_common.constprop.0 (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139) 
[ 1695.118662][ T137] do_el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188) 
[ 1695.121555][ T137] el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648) 
[ 1695.124207][ T137] el0t_64_sync_handler (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666) 
[ 1695.127590][ T137] el0t_64_sync (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591) 
[ 1695.130641][  T137] 
[ 1695.132241][  T137] no locks held by systemd-udevd/137.
[ 1695.135618][  T137] irq event stamp: 3246
[ 1695.138519][ T137] hardirqs last enabled at (3245): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:105) 
[ 1695.145825][ T137] hardirqs last disabled at (3246): el1_interrupt (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:472 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:488) 
[ 1695.151942][ T137] softirqs last enabled at (3208): fpsimd_restore_current_state (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791) 
[ 1695.158950][ T137] softirqs last disabled at (3206): fpsimd_restore_current_state (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784) 
[ 1695.166036][  T137] 
[ 1695.167621][  T137] Reported by Kernel Concurrency Sanitizer on:
[ 1695.179990][  T137] Hardware name: linux,dummy-virt (DT)
[ 1695.183687][  T137] ==================================================================

[...]

[ 1738.053819][  T104] BUG: KCSAN: data-race in set_nlink / set_nlink
[ 1738.058223][  T104] 
[ 1738.059865][  T104] read to 0xffff00000bab6918 of 4 bytes by task 108 on cpu 0:
[ 1738.064916][ T104] set_nlink (/home/anders/src/kernel/next/fs/inode.c:369) 
[ 1738.067845][ T104] kernfs_refresh_inode (/home/anders/src/kernel/next/fs/kernfs/inode.c:180) 
[ 1738.071607][ T104] kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:289) 
[ 1738.075467][ T104] inode_permission (/home/anders/src/kernel/next/fs/namei.c:461 /home/anders/src/kernel/next/fs/namei.c:528) 
[ 1738.078868][ T104] link_path_walk (/home/anders/src/kernel/next/fs/namei.c:1720 /home/anders/src/kernel/next/fs/namei.c:2267) 
[ 1738.082270][ T104] path_lookupat (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2)) 
[ 1738.085488][ T104] filename_lookup (/home/anders/src/kernel/next/fs/namei.c:2508) 
[ 1738.089101][ T104] user_path_at_empty (/home/anders/src/kernel/next/fs/namei.c:2907) 
[ 1738.092469][ T104] do_readlinkat (/home/anders/src/kernel/next/fs/stat.c:477) 
[ 1738.095970][ T104] __arm64_sys_readlinkat (/home/anders/src/kernel/next/fs/stat.c:504 /home/anders/src/kernel/next/fs/stat.c:501 /home/anders/src/kernel/next/fs/stat.c:501) 
[ 1738.099529][ T104] el0_svc_common.constprop.0 (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139) 
[ 1738.103696][ T104] do_el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188) 
[ 1738.106560][ T104] el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648) 
[ 1738.109613][ T104] el0t_64_sync_handler (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666) 
[ 1738.113035][ T104] el0t_64_sync (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591) 
[ 1738.116346][  T104] 
[ 1738.117924][  T104] 1 lock held by systemd-udevd/108:
[ 1738.121580][ T104] #0: ffff000006681e08 (&root->kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:288) 
[ 1738.129355][  T104] irq event stamp: 31000
[ 1738.132088][ T104] hardirqs last enabled at (31000): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:105) 
[ 1738.139417][ T104] hardirqs last disabled at (30999): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:104) 
[ 1738.146781][ T104] softirqs last enabled at (30973): fpsimd_restore_current_state (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791) 
[ 1738.153891][ T104] softirqs last disabled at (30971): fpsimd_restore_current_state (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784) 
[ 1738.161012][  T104] 
[ 1738.162663][  T104] write to 0xffff00000bab6918 of 4 bytes by task 104 on cpu 0:
[ 1738.167730][ T104] set_nlink (/home/anders/src/kernel/next/fs/inode.c:372) 
[ 1738.170559][ T104] kernfs_refresh_inode (/home/anders/src/kernel/next/fs/kernfs/inode.c:180) 
[ 1738.174355][ T104] kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:289) 
[ 1738.177829][ T104] inode_permission (/home/anders/src/kernel/next/fs/namei.c:461 /home/anders/src/kernel/next/fs/namei.c:528) 
[ 1738.181403][ T104] link_path_walk (/home/anders/src/kernel/next/fs/namei.c:1720 /home/anders/src/kernel/next/fs/namei.c:2267) 
[ 1738.184738][ T104] path_lookupat (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2)) 
[ 1738.188268][ T104] filename_lookup (/home/anders/src/kernel/next/fs/namei.c:2508) 
[ 1738.191865][ T104] vfs_statx (/home/anders/src/kernel/next/fs/stat.c:238) 
[ 1738.196236][ T104] vfs_fstatat (/home/anders/src/kernel/next/fs/stat.c:276) 
[ 1738.200120][ T104] __do_sys_newfstatat (/home/anders/src/kernel/next/fs/stat.c:446) 
[ 1738.204095][ T104] __arm64_sys_newfstatat (/home/anders/src/kernel/next/fs/stat.c:440 /home/anders/src/kernel/next/fs/stat.c:440) 
[ 1738.207676][ T104] el0_svc_common.constprop.0 (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139) 
[ 1738.211820][ T104] do_el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188) 
[ 1738.214815][ T104] el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648) 
[ 1738.217709][ T104] el0t_64_sync_handler (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666) 
[ 1738.221239][ T104] el0t_64_sync (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591) 
[ 1738.224502][  T104] 
[ 1738.226090][  T104] 1 lock held by systemd-udevd/104:
[ 1738.229747][ T104] #0: ffff000006681e08 (&root->kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:288) 
[ 1738.237504][  T104] irq event stamp: 108353
[ 1738.240262][ T104] hardirqs last enabled at (108353): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:105) 
[ 1738.247443][ T104] hardirqs last disabled at (108352): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:104) 
[ 1738.254510][ T104] softirqs last enabled at (108326): fpsimd_restore_current_state (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791) 
[ 1738.262187][ T104] softirqs last disabled at (108324): fpsimd_restore_current_state (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784) 
[ 1738.270239][  T104] 
[ 1738.272140][  T104] Reported by Kernel Concurrency Sanitizer on:
[ 1738.285185][  T104] Hardware name: linux,dummy-virt (DT)
[ 1738.288703][  T104] ==================================================================


Cheers,
Anders

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] kernfs: dont take i_lock on inode attr read
  2023-07-18 19:00               ` Anders Roxell
@ 2023-07-19  4:23                 ` Ian Kent
  2023-07-20  2:03                   ` Ian Kent
  0 siblings, 1 reply; 25+ messages in thread
From: Ian Kent @ 2023-07-19  4:23 UTC (permalink / raw)
  To: Anders Roxell
  Cc: Arnd Bergmann, Tejun Heo, Greg Kroah-Hartman, Minchan Kim,
	Eric Sandeen, Alexander Viro, Rick Lindsley, David Howells,
	Miklos Szeredi, Carlos Maiolino, linux-fsdevel,
	Kernel Mailing List, elver, imran.f.khan

On 19/7/23 03:00, Anders Roxell wrote:
> On 2023-01-23 11:11, Ian Kent wrote:
>> On 29/12/22 21:07, Ian Kent wrote:
>>> On 29/12/22 17:20, Arnd Bergmann wrote:
>>>> On Fri, Dec 23, 2022, at 00:11, Ian Kent wrote:
>>>>> On 21/12/22 21:34, Anders Roxell wrote:
>>>>>> On 2022-10-31 12:30, Tejun Heo wrote:
>>>>>>> On Tue, Oct 18, 2022 at 10:32:42AM +0800, Ian Kent wrote:
>>>>>>>> The kernfs write lock is held when the kernfs node inode attributes
>>>>>>>> are updated. Therefore, when either kernfs_iop_getattr() or
>>>>>>>> kernfs_iop_permission() are called the kernfs node inode attributes
>>>>>>>> won't change.
>>>>>>>>
>>>>>>>> Consequently concurrent kernfs_refresh_inode() calls always copy the
>>>>>>>> same values from the kernfs node.
>>>>>>>>
>>>>>>>> So there's no need to take the inode i_lock to get consistent values
>>>>>>>> for generic_fillattr() and generic_permission(), the
>>>>>>>> kernfs read lock
>>>>>>>> is sufficient.
>>>>>>>>
>>>>>>>> Signed-off-by: Ian Kent <raven@themaw.net>
>>>>>>> Acked-by: Tejun Heo <tj@kernel.org>
>>>>>> Hi,
>>>>>>
>>>>>> Building an allmodconfig arm64 kernel on yesterdays next-20221220 and
>>>>>> booting that in qemu I see the following "BUG: KCSAN: data-race in
>>>>>> set_nlink / set_nlink".
>>>>> I'll check if I missed any places where set_link() could be
>>>>> called where the link count could be different.
>>>>>
>>>>>
>>>>> If there aren't any the question will then be can writing the
>>>>> same value to this location in multiple concurrent threads
>>>>> corrupt it?
>>>> I think the race that is getting reported for set_nlink()
>>>> is about this bit getting called simulatenously on multiple
>>>> CPUs with only the read lock held for the inode:
>>>>
>>>>        /* Yes, some filesystems do change nlink from zero to one */
>>>>        if (inode->i_nlink == 0)
>>>> atomic_long_dec(&inode->i_sb->s_remove_count);
>>>>        inode->__i_nlink = nlink;
>>>>
>>>> Since i_nlink and __i_nlink refer to the same memory location,
>>>> the 'inode->i_nlink == 0' check can be true for all of them
>>>> before the nonzero nlink value gets set, and this results in
>>>> s_remove_count being decremented more than once.
>>>
>>> Thanks for the comment Arnd.
>>
>> Hello all,
>>
>>
>> I've been looking at this and after consulting Miklos and his pointing
>>
>> out that it looks like a false positive the urgency dropped off a bit. So
>>
>> apologies for taking so long to report back.
>>
>>
>> Anyway it needs some description of conclusions reached so far.
>>
>>
>> I'm still looking around but in short, kernfs will set directories to <# of
>>
>> directory entries> + 2 unconditionally for directories. I can't yet find
>>
>> any other places where i_nlink is set or changed and if there are none
>>
>> then i_nlink will never be set to zero so the race should not occur.
>>
>>
>> Consequently my claim is this is a real false positive.
>>
>>
>> There are the file system operations that may be passed at mount time
>>
>> but given the way kernfs sets i_nlink it pretty much dictates those
>> operations
>>
>> (if there were any that modify it and there don't appear to be any) leave it
>>
>> alone.
>>
>>
>> So it just doesn't make sense for users of kernfs to fiddle with i_nlink ...
> On todays next tag, next-20230718 this KCSAN BUG poped up again. When I
> built an allmodconfig arm64 kernel and booted it in QEMU. Full log can
> be found http://ix.io/4AUd
>
> [ 1694.987789][  T137] BUG: KCSAN: data-race in inode_permission / kernfs_refresh_inode
> [ 1694.992912][  T137]
> [ 1694.994532][  T137] write to 0xffff00000bab6070 of 2 bytes by task 104 on cpu 0:
> [ 1694.999269][ T137] kernfs_refresh_inode (/home/anders/src/kernel/next/fs/kernfs/inode.c:171)
> [ 1695.002707][ T137] kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:289)
> [ 1695.006148][ T137] inode_permission (/home/anders/src/kernel/next/fs/namei.c:461 /home/anders/src/kernel/next/fs/namei.c:528)
> [ 1695.009420][ T137] link_path_walk (/home/anders/src/kernel/next/fs/namei.c:1720 /home/anders/src/kernel/next/fs/namei.c:2267)
> [ 1695.012643][ T137] path_lookupat (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2))
> [ 1695.015781][ T137] filename_lookup (/home/anders/src/kernel/next/fs/namei.c:2508)
> [ 1695.019059][ T137] vfs_statx (/home/anders/src/kernel/next/fs/stat.c:238)
> [ 1695.022024][ T137] vfs_fstatat (/home/anders/src/kernel/next/fs/stat.c:276)
> [ 1695.025067][ T137] __do_sys_newfstatat (/home/anders/src/kernel/next/fs/stat.c:446)
> [ 1695.028497][ T137] __arm64_sys_newfstatat (/home/anders/src/kernel/next/fs/stat.c:440 /home/anders/src/kernel/next/fs/stat.c:440)
> [ 1695.032080][ T137] el0_svc_common.constprop.0 (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139)
> [ 1695.035916][ T137] do_el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188)
> [ 1695.038796][ T137] el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648)
> [ 1695.041468][ T137] el0t_64_sync_handler (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666)
> [ 1695.044889][ T137] el0t_64_sync (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591)
> [ 1695.047904][  T137]
> [ 1695.049511][  T137] 1 lock held by systemd-udevd/104:
> [ 1695.052837][ T137] #0: ffff000006681e08 (&root->kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:288)
> [ 1695.060241][  T137] irq event stamp: 82902
> [ 1695.063006][ T137] hardirqs last enabled at (82901): _raw_spin_unlock_irqrestore (/home/anders/src/kernel/next/arch/arm64/include/asm/alternative-macros.h:250 /home/anders/src/kernel/next/arch/arm64/include/asm/irqflags.h:27 /home/anders/src/kernel/next/arch/arm64/include/asm/irqflags.h:140 /home/anders/src/kernel/next/include/linux/spinlock_api_smp.h:151 /home/anders/src/kernel/next/kernel/locking/spinlock.c:194)
> [ 1695.069673][ T137] hardirqs last disabled at (82902): el1_interrupt (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:472 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:488)
> [ 1695.075474][ T137] softirqs last enabled at (82792): fpsimd_restore_current_state (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791)
> [ 1695.082319][ T137] softirqs last disabled at (82790): fpsimd_restore_current_state (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784)
> [ 1695.089049][  T137]
> [ 1695.090659][  T137] read to 0xffff00000bab6070 of 2 bytes by task 137 on cpu 0:
> [ 1695.095374][ T137] inode_permission (/home/anders/src/kernel/next/fs/namei.c:532)
> [ 1695.098655][ T137] link_path_walk (/home/anders/src/kernel/next/fs/namei.c:1720 /home/anders/src/kernel/next/fs/namei.c:2267)
> [ 1695.101857][ T137] path_openat (/home/anders/src/kernel/next/fs/namei.c:3789 (discriminator 2))
> [ 1695.104885][ T137] do_filp_open (/home/anders/src/kernel/next/fs/namei.c:3820)
> [ 1695.108006][ T137] do_sys_openat2 (/home/anders/src/kernel/next/fs/open.c:1418)
> [ 1695.111290][ T137] __arm64_sys_openat (/home/anders/src/kernel/next/fs/open.c:1433)
> [ 1695.114825][ T137] el0_svc_common.constprop.0 (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139)
> [ 1695.118662][ T137] do_el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188)
> [ 1695.121555][ T137] el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648)
> [ 1695.124207][ T137] el0t_64_sync_handler (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666)
> [ 1695.127590][ T137] el0t_64_sync (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591)
> [ 1695.130641][  T137]
> [ 1695.132241][  T137] no locks held by systemd-udevd/137.
> [ 1695.135618][  T137] irq event stamp: 3246
> [ 1695.138519][ T137] hardirqs last enabled at (3245): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:105)
> [ 1695.145825][ T137] hardirqs last disabled at (3246): el1_interrupt (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:472 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:488)
> [ 1695.151942][ T137] softirqs last enabled at (3208): fpsimd_restore_current_state (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791)
> [ 1695.158950][ T137] softirqs last disabled at (3206): fpsimd_restore_current_state (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784)
> [ 1695.166036][  T137]
> [ 1695.167621][  T137] Reported by Kernel Concurrency Sanitizer on:
> [ 1695.179990][  T137] Hardware name: linux,dummy-virt (DT)
> [ 1695.183687][  T137] ==================================================================


This one is different to the original.


I can't see where the problem is here, can someone help me out

please.


I don't see any shared data values used by the call

devcgroup_inode_permission(inode, mask) in fs/namei.c:inode_permission()

that might be a problem during the read except possibly inode->i_mode.


I'll check on that ... maybe something's been missed when kernfs_rwsem

was changed to a separate lock (kernfs_iattr_rwsem).


>
> [...]
>
> [ 1738.053819][  T104] BUG: KCSAN: data-race in set_nlink / set_nlink
> [ 1738.058223][  T104]
> [ 1738.059865][  T104] read to 0xffff00000bab6918 of 4 bytes by task 108 on cpu 0:
> [ 1738.064916][ T104] set_nlink (/home/anders/src/kernel/next/fs/inode.c:369)
> [ 1738.067845][ T104] kernfs_refresh_inode (/home/anders/src/kernel/next/fs/kernfs/inode.c:180)
> [ 1738.071607][ T104] kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:289)
> [ 1738.075467][ T104] inode_permission (/home/anders/src/kernel/next/fs/namei.c:461 /home/anders/src/kernel/next/fs/namei.c:528)
> [ 1738.078868][ T104] link_path_walk (/home/anders/src/kernel/next/fs/namei.c:1720 /home/anders/src/kernel/next/fs/namei.c:2267)
> [ 1738.082270][ T104] path_lookupat (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2))
> [ 1738.085488][ T104] filename_lookup (/home/anders/src/kernel/next/fs/namei.c:2508)
> [ 1738.089101][ T104] user_path_at_empty (/home/anders/src/kernel/next/fs/namei.c:2907)
> [ 1738.092469][ T104] do_readlinkat (/home/anders/src/kernel/next/fs/stat.c:477)
> [ 1738.095970][ T104] __arm64_sys_readlinkat (/home/anders/src/kernel/next/fs/stat.c:504 /home/anders/src/kernel/next/fs/stat.c:501 /home/anders/src/kernel/next/fs/stat.c:501)
> [ 1738.099529][ T104] el0_svc_common.constprop.0 (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139)
> [ 1738.103696][ T104] do_el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188)
> [ 1738.106560][ T104] el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648)
> [ 1738.109613][ T104] el0t_64_sync_handler (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666)
> [ 1738.113035][ T104] el0t_64_sync (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591)
> [ 1738.116346][  T104]
> [ 1738.117924][  T104] 1 lock held by systemd-udevd/108:
> [ 1738.121580][ T104] #0: ffff000006681e08 (&root->kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:288)
> [ 1738.129355][  T104] irq event stamp: 31000
> [ 1738.132088][ T104] hardirqs last enabled at (31000): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:105)
> [ 1738.139417][ T104] hardirqs last disabled at (30999): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:104)
> [ 1738.146781][ T104] softirqs last enabled at (30973): fpsimd_restore_current_state (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791)
> [ 1738.153891][ T104] softirqs last disabled at (30971): fpsimd_restore_current_state (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784)
> [ 1738.161012][  T104]
> [ 1738.162663][  T104] write to 0xffff00000bab6918 of 4 bytes by task 104 on cpu 0:
> [ 1738.167730][ T104] set_nlink (/home/anders/src/kernel/next/fs/inode.c:372)
> [ 1738.170559][ T104] kernfs_refresh_inode (/home/anders/src/kernel/next/fs/kernfs/inode.c:180)
> [ 1738.174355][ T104] kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:289)
> [ 1738.177829][ T104] inode_permission (/home/anders/src/kernel/next/fs/namei.c:461 /home/anders/src/kernel/next/fs/namei.c:528)
> [ 1738.181403][ T104] link_path_walk (/home/anders/src/kernel/next/fs/namei.c:1720 /home/anders/src/kernel/next/fs/namei.c:2267)
> [ 1738.184738][ T104] path_lookupat (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2))
> [ 1738.188268][ T104] filename_lookup (/home/anders/src/kernel/next/fs/namei.c:2508)
> [ 1738.191865][ T104] vfs_statx (/home/anders/src/kernel/next/fs/stat.c:238)
> [ 1738.196236][ T104] vfs_fstatat (/home/anders/src/kernel/next/fs/stat.c:276)
> [ 1738.200120][ T104] __do_sys_newfstatat (/home/anders/src/kernel/next/fs/stat.c:446)
> [ 1738.204095][ T104] __arm64_sys_newfstatat (/home/anders/src/kernel/next/fs/stat.c:440 /home/anders/src/kernel/next/fs/stat.c:440)
> [ 1738.207676][ T104] el0_svc_common.constprop.0 (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139)
> [ 1738.211820][ T104] do_el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188)
> [ 1738.214815][ T104] el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648)
> [ 1738.217709][ T104] el0t_64_sync_handler (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666)
> [ 1738.221239][ T104] el0t_64_sync (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591)
> [ 1738.224502][  T104]
> [ 1738.226090][  T104] 1 lock held by systemd-udevd/104:
> [ 1738.229747][ T104] #0: ffff000006681e08 (&root->kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:288)
> [ 1738.237504][  T104] irq event stamp: 108353
> [ 1738.240262][ T104] hardirqs last enabled at (108353): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:105)
> [ 1738.247443][ T104] hardirqs last disabled at (108352): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:104)
> [ 1738.254510][ T104] softirqs last enabled at (108326): fpsimd_restore_current_state (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791)
> [ 1738.262187][ T104] softirqs last disabled at (108324): fpsimd_restore_current_state (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784)
> [ 1738.270239][  T104]
> [ 1738.272140][  T104] Reported by Kernel Concurrency Sanitizer on:
> [ 1738.285185][  T104] Hardware name: linux,dummy-virt (DT)
> [ 1738.288703][  T104] ==================================================================

This looks just like the original except a different lock is being

used now.


The link count can't be less than two if set_nlink() is called.


Maybe I am missing something but the directory count is changed only

while holding the root->kernfs_iattr_rwsem so the value used by

set_nlink() will not change during concurrent calls to refresh_inode().

Still looks like a false positive, I'll check the write operations
again just to be sure.

Ian


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] kernfs: dont take i_lock on inode attr read
  2023-07-19  4:23                 ` Ian Kent
@ 2023-07-20  2:03                   ` Ian Kent
  2023-07-26 13:49                     ` Miklos Szeredi
  2023-07-27  0:38                     ` Ian Kent
  0 siblings, 2 replies; 25+ messages in thread
From: Ian Kent @ 2023-07-20  2:03 UTC (permalink / raw)
  To: Anders Roxell
  Cc: Arnd Bergmann, Tejun Heo, Greg Kroah-Hartman, Minchan Kim,
	Eric Sandeen, Alexander Viro, Rick Lindsley, David Howells,
	Miklos Szeredi, Carlos Maiolino, linux-fsdevel,
	Kernel Mailing List, elver, imran.f.khan

On Wed, 2023-07-19 at 12:23 +0800, Ian Kent wrote:
> On 19/7/23 03:00, Anders Roxell wrote:
> > On 2023-01-23 11:11, Ian Kent wrote:
> > > On 29/12/22 21:07, Ian Kent wrote:
> > > > On 29/12/22 17:20, Arnd Bergmann wrote:
> > > > > On Fri, Dec 23, 2022, at 00:11, Ian Kent wrote:
> > > > > > On 21/12/22 21:34, Anders Roxell wrote:
> > > > > > > On 2022-10-31 12:30, Tejun Heo wrote:
> > > > > > > > On Tue, Oct 18, 2022 at 10:32:42AM +0800, Ian Kent
> > > > > > > > wrote:
> > > > > > > > > The kernfs write lock is held when the kernfs node
> > > > > > > > > inode attributes
> > > > > > > > > are updated. Therefore, when either
> > > > > > > > > kernfs_iop_getattr() or
> > > > > > > > > kernfs_iop_permission() are called the kernfs node
> > > > > > > > > inode attributes
> > > > > > > > > won't change.
> > > > > > > > > 
> > > > > > > > > Consequently concurrent kernfs_refresh_inode() calls
> > > > > > > > > always copy the
> > > > > > > > > same values from the kernfs node.
> > > > > > > > > 
> > > > > > > > > So there's no need to take the inode i_lock to get
> > > > > > > > > consistent values
> > > > > > > > > for generic_fillattr() and generic_permission(), the
> > > > > > > > > kernfs read lock
> > > > > > > > > is sufficient.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Ian Kent <raven@themaw.net>
> > > > > > > > Acked-by: Tejun Heo <tj@kernel.org>
> > > > > > > Hi,
> > > > > > > 
> > > > > > > Building an allmodconfig arm64 kernel on yesterdays next-
> > > > > > > 20221220 and
> > > > > > > booting that in qemu I see the following "BUG: KCSAN:
> > > > > > > data-race in
> > > > > > > set_nlink / set_nlink".
> > > > > > I'll check if I missed any places where set_link() could be
> > > > > > called where the link count could be different.
> > > > > > 
> > > > > > 
> > > > > > If there aren't any the question will then be can writing
> > > > > > the
> > > > > > same value to this location in multiple concurrent threads
> > > > > > corrupt it?
> > > > > I think the race that is getting reported for set_nlink()
> > > > > is about this bit getting called simulatenously on multiple
> > > > > CPUs with only the read lock held for the inode:
> > > > > 
> > > > >        /* Yes, some filesystems do change nlink from zero to
> > > > > one */
> > > > >        if (inode->i_nlink == 0)
> > > > > atomic_long_dec(&inode->i_sb->s_remove_count);
> > > > >        inode->__i_nlink = nlink;
> > > > > 
> > > > > Since i_nlink and __i_nlink refer to the same memory
> > > > > location,
> > > > > the 'inode->i_nlink == 0' check can be true for all of them
> > > > > before the nonzero nlink value gets set, and this results in
> > > > > s_remove_count being decremented more than once.
> > > > 
> > > > Thanks for the comment Arnd.
> > > 
> > > Hello all,
> > > 
> > > 
> > > I've been looking at this and after consulting Miklos and his
> > > pointing
> > > 
> > > out that it looks like a false positive the urgency dropped off a
> > > bit. So
> > > 
> > > apologies for taking so long to report back.
> > > 
> > > 
> > > Anyway it needs some description of conclusions reached so far.
> > > 
> > > 
> > > I'm still looking around but in short, kernfs will set
> > > directories to <# of
> > > 
> > > directory entries> + 2 unconditionally for directories. I can't
> > > yet find
> > > 
> > > any other places where i_nlink is set or changed and if there are
> > > none
> > > 
> > > then i_nlink will never be set to zero so the race should not
> > > occur.
> > > 
> > > 
> > > Consequently my claim is this is a real false positive.
> > > 
> > > 
> > > There are the file system operations that may be passed at mount
> > > time
> > > 
> > > but given the way kernfs sets i_nlink it pretty much dictates
> > > those
> > > operations
> > > 
> > > (if there were any that modify it and there don't appear to be
> > > any) leave it
> > > 
> > > alone.
> > > 
> > > 
> > > So it just doesn't make sense for users of kernfs to fiddle with
> > > i_nlink ...
> > On todays next tag, next-20230718 this KCSAN BUG poped up again.
> > When I
> > built an allmodconfig arm64 kernel and booted it in QEMU. Full log
> > can
> > be found http://ix.io/4AUd
> > 
> > [ 1694.987789][  T137] BUG: KCSAN: data-race in inode_permission /
> > kernfs_refresh_inode
> > [ 1694.992912][  T137]
> > [ 1694.994532][  T137] write to 0xffff00000bab6070 of 2 bytes by
> > task 104 on cpu 0:
> > [ 1694.999269][ T137] kernfs_refresh_inode
> > (/home/anders/src/kernel/next/fs/kernfs/inode.c:171)
> > [ 1695.002707][ T137] kernfs_iop_permission
> > (/home/anders/src/kernel/next/fs/kernfs/inode.c:289)
> > [ 1695.006148][ T137] inode_permission
> > (/home/anders/src/kernel/next/fs/namei.c:461
> > /home/anders/src/kernel/next/fs/namei.c:528)
> > [ 1695.009420][ T137] link_path_walk
> > (/home/anders/src/kernel/next/fs/namei.c:1720
> > /home/anders/src/kernel/next/fs/namei.c:2267)
> > [ 1695.012643][ T137] path_lookupat
> > (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2))
> > [ 1695.015781][ T137] filename_lookup
> > (/home/anders/src/kernel/next/fs/namei.c:2508)
> > [ 1695.019059][ T137] vfs_statx
> > (/home/anders/src/kernel/next/fs/stat.c:238)
> > [ 1695.022024][ T137] vfs_fstatat
> > (/home/anders/src/kernel/next/fs/stat.c:276)
> > [ 1695.025067][ T137] __do_sys_newfstatat
> > (/home/anders/src/kernel/next/fs/stat.c:446)
> > [ 1695.028497][ T137] __arm64_sys_newfstatat
> > (/home/anders/src/kernel/next/fs/stat.c:440
> > /home/anders/src/kernel/next/fs/stat.c:440)
> > [ 1695.032080][ T137] el0_svc_common.constprop.0
> > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38
> > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52
> > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139)
> > [ 1695.035916][ T137] do_el0_svc
> > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188)
> > [ 1695.038796][ T137] el0_svc
> > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133
> > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144
> > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648)
> > [ 1695.041468][ T137] el0t_64_sync_handler
> > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666)
> > [ 1695.044889][ T137] el0t_64_sync
> > (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591)
> > [ 1695.047904][  T137]
> > [ 1695.049511][  T137] 1 lock held by systemd-udevd/104:
> > [ 1695.052837][ T137] #0: ffff000006681e08 (&root-
> > >kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission
> > (/home/anders/src/kernel/next/fs/kernfs/inode.c:288)
> > [ 1695.060241][  T137] irq event stamp: 82902
> > [ 1695.063006][ T137] hardirqs last enabled at (82901):
> > _raw_spin_unlock_irqrestore
> > (/home/anders/src/kernel/next/arch/arm64/include/asm/alternative-
> > macros.h:250
> > /home/anders/src/kernel/next/arch/arm64/include/asm/irqflags.h:27
> > /home/anders/src/kernel/next/arch/arm64/include/asm/irqflags.h:140
> > /home/anders/src/kernel/next/include/linux/spinlock_api_smp.h:151
> > /home/anders/src/kernel/next/kernel/locking/spinlock.c:194)
> > [ 1695.069673][ T137] hardirqs last disabled at (82902):
> > el1_interrupt
> > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:472
> > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:488)
> > [ 1695.075474][ T137] softirqs last enabled at (82792):
> > fpsimd_restore_current_state
> > (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264
> > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791)
> > [ 1695.082319][ T137] softirqs last disabled at (82790):
> > fpsimd_restore_current_state
> > (/home/anders/src/kernel/next/include/linux/bottom_half.h:20
> > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242
> > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784)
> > [ 1695.089049][  T137]
> > [ 1695.090659][  T137] read to 0xffff00000bab6070 of 2 bytes by
> > task 137 on cpu 0:
> > [ 1695.095374][ T137] inode_permission
> > (/home/anders/src/kernel/next/fs/namei.c:532)
> > [ 1695.098655][ T137] link_path_walk
> > (/home/anders/src/kernel/next/fs/namei.c:1720
> > /home/anders/src/kernel/next/fs/namei.c:2267)
> > [ 1695.101857][ T137] path_openat
> > (/home/anders/src/kernel/next/fs/namei.c:3789 (discriminator 2))
> > [ 1695.104885][ T137] do_filp_open
> > (/home/anders/src/kernel/next/fs/namei.c:3820)
> > [ 1695.108006][ T137] do_sys_openat2
> > (/home/anders/src/kernel/next/fs/open.c:1418)
> > [ 1695.111290][ T137] __arm64_sys_openat
> > (/home/anders/src/kernel/next/fs/open.c:1433)
> > [ 1695.114825][ T137] el0_svc_common.constprop.0
> > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38
> > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52
> > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139)
> > [ 1695.118662][ T137] do_el0_svc
> > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188)
> > [ 1695.121555][ T137] el0_svc
> > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133
> > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144
> > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648)
> > [ 1695.124207][ T137] el0t_64_sync_handler
> > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666)
> > [ 1695.127590][ T137] el0t_64_sync
> > (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591)
> > [ 1695.130641][  T137]
> > [ 1695.132241][  T137] no locks held by systemd-udevd/137.
> > [ 1695.135618][  T137] irq event stamp: 3246
> > [ 1695.138519][ T137] hardirqs last enabled at (3245):
> > seqcount_lockdep_reader_access
> > (/home/anders/src/kernel/next/include/linux/seqlock.h:105)
> > [ 1695.145825][ T137] hardirqs last disabled at (3246):
> > el1_interrupt
> > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:472
> > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:488)
> > [ 1695.151942][ T137] softirqs last enabled at (3208):
> > fpsimd_restore_current_state
> > (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264
> > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791)
> > [ 1695.158950][ T137] softirqs last disabled at (3206):
> > fpsimd_restore_current_state
> > (/home/anders/src/kernel/next/include/linux/bottom_half.h:20
> > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242
> > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784)
> > [ 1695.166036][  T137]
> > [ 1695.167621][  T137] Reported by Kernel Concurrency Sanitizer on:
> > [ 1695.179990][  T137] Hardware name: linux,dummy-virt (DT)
> > [ 1695.183687][  T137]
> > ==================================================================
> 
> 
> This one is different to the original.
> 
> 
> I can't see where the problem is here, can someone help me out
> 
> please.
> 
> 
> I don't see any shared data values used by the call
> 
> devcgroup_inode_permission(inode, mask) in
> fs/namei.c:inode_permission()
> 
> that might be a problem during the read except possibly inode-
> >i_mode.
> 
> 
> I'll check on that ... maybe something's been missed when
> kernfs_rwsem
> 
> was changed to a separate lock (kernfs_iattr_rwsem).
> 
> 
> > 
> > [...]
> > 
> > [ 1738.053819][  T104] BUG: KCSAN: data-race in set_nlink /
> > set_nlink
> > [ 1738.058223][  T104]
> > [ 1738.059865][  T104] read to 0xffff00000bab6918 of 4 bytes by
> > task 108 on cpu 0:
> > [ 1738.064916][ T104] set_nlink
> > (/home/anders/src/kernel/next/fs/inode.c:369)
> > [ 1738.067845][ T104] kernfs_refresh_inode
> > (/home/anders/src/kernel/next/fs/kernfs/inode.c:180)
> > [ 1738.071607][ T104] kernfs_iop_permission
> > (/home/anders/src/kernel/next/fs/kernfs/inode.c:289)
> > [ 1738.075467][ T104] inode_permission
> > (/home/anders/src/kernel/next/fs/namei.c:461
> > /home/anders/src/kernel/next/fs/namei.c:528)
> > [ 1738.078868][ T104] link_path_walk
> > (/home/anders/src/kernel/next/fs/namei.c:1720
> > /home/anders/src/kernel/next/fs/namei.c:2267)
> > [ 1738.082270][ T104] path_lookupat
> > (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2))
> > [ 1738.085488][ T104] filename_lookup
> > (/home/anders/src/kernel/next/fs/namei.c:2508)
> > [ 1738.089101][ T104] user_path_at_empty
> > (/home/anders/src/kernel/next/fs/namei.c:2907)
> > [ 1738.092469][ T104] do_readlinkat
> > (/home/anders/src/kernel/next/fs/stat.c:477)
> > [ 1738.095970][ T104] __arm64_sys_readlinkat
> > (/home/anders/src/kernel/next/fs/stat.c:504
> > /home/anders/src/kernel/next/fs/stat.c:501
> > /home/anders/src/kernel/next/fs/stat.c:501)
> > [ 1738.099529][ T104] el0_svc_common.constprop.0
> > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38
> > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52
> > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139)
> > [ 1738.103696][ T104] do_el0_svc
> > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188)
> > [ 1738.106560][ T104] el0_svc
> > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133
> > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144
> > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648)
> > [ 1738.109613][ T104] el0t_64_sync_handler
> > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666)
> > [ 1738.113035][ T104] el0t_64_sync
> > (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591)
> > [ 1738.116346][  T104]
> > [ 1738.117924][  T104] 1 lock held by systemd-udevd/108:
> > [ 1738.121580][ T104] #0: ffff000006681e08 (&root-
> > >kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission
> > (/home/anders/src/kernel/next/fs/kernfs/inode.c:288)
> > [ 1738.129355][  T104] irq event stamp: 31000
> > [ 1738.132088][ T104] hardirqs last enabled at (31000):
> > seqcount_lockdep_reader_access
> > (/home/anders/src/kernel/next/include/linux/seqlock.h:105)
> > [ 1738.139417][ T104] hardirqs last disabled at (30999):
> > seqcount_lockdep_reader_access
> > (/home/anders/src/kernel/next/include/linux/seqlock.h:104)
> > [ 1738.146781][ T104] softirqs last enabled at (30973):
> > fpsimd_restore_current_state
> > (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264
> > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791)
> > [ 1738.153891][ T104] softirqs last disabled at (30971):
> > fpsimd_restore_current_state
> > (/home/anders/src/kernel/next/include/linux/bottom_half.h:20
> > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242
> > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784)
> > [ 1738.161012][  T104]
> > [ 1738.162663][  T104] write to 0xffff00000bab6918 of 4 bytes by
> > task 104 on cpu 0:
> > [ 1738.167730][ T104] set_nlink
> > (/home/anders/src/kernel/next/fs/inode.c:372)
> > [ 1738.170559][ T104] kernfs_refresh_inode
> > (/home/anders/src/kernel/next/fs/kernfs/inode.c:180)
> > [ 1738.174355][ T104] kernfs_iop_permission
> > (/home/anders/src/kernel/next/fs/kernfs/inode.c:289)
> > [ 1738.177829][ T104] inode_permission
> > (/home/anders/src/kernel/next/fs/namei.c:461
> > /home/anders/src/kernel/next/fs/namei.c:528)
> > [ 1738.181403][ T104] link_path_walk
> > (/home/anders/src/kernel/next/fs/namei.c:1720
> > /home/anders/src/kernel/next/fs/namei.c:2267)
> > [ 1738.184738][ T104] path_lookupat
> > (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2))
> > [ 1738.188268][ T104] filename_lookup
> > (/home/anders/src/kernel/next/fs/namei.c:2508)
> > [ 1738.191865][ T104] vfs_statx
> > (/home/anders/src/kernel/next/fs/stat.c:238)
> > [ 1738.196236][ T104] vfs_fstatat
> > (/home/anders/src/kernel/next/fs/stat.c:276)
> > [ 1738.200120][ T104] __do_sys_newfstatat
> > (/home/anders/src/kernel/next/fs/stat.c:446)
> > [ 1738.204095][ T104] __arm64_sys_newfstatat
> > (/home/anders/src/kernel/next/fs/stat.c:440
> > /home/anders/src/kernel/next/fs/stat.c:440)
> > [ 1738.207676][ T104] el0_svc_common.constprop.0
> > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38
> > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52
> > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139)
> > [ 1738.211820][ T104] do_el0_svc
> > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188)
> > [ 1738.214815][ T104] el0_svc
> > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133
> > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144
> > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648)
> > [ 1738.217709][ T104] el0t_64_sync_handler
> > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666)
> > [ 1738.221239][ T104] el0t_64_sync
> > (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591)
> > [ 1738.224502][  T104]
> > [ 1738.226090][  T104] 1 lock held by systemd-udevd/104:
> > [ 1738.229747][ T104] #0: ffff000006681e08 (&root-
> > >kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission
> > (/home/anders/src/kernel/next/fs/kernfs/inode.c:288)
> > [ 1738.237504][  T104] irq event stamp: 108353
> > [ 1738.240262][ T104] hardirqs last enabled at (108353):
> > seqcount_lockdep_reader_access
> > (/home/anders/src/kernel/next/include/linux/seqlock.h:105)
> > [ 1738.247443][ T104] hardirqs last disabled at (108352):
> > seqcount_lockdep_reader_access
> > (/home/anders/src/kernel/next/include/linux/seqlock.h:104)
> > [ 1738.254510][ T104] softirqs last enabled at (108326):
> > fpsimd_restore_current_state
> > (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264
> > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791)
> > [ 1738.262187][ T104] softirqs last disabled at (108324):
> > fpsimd_restore_current_state
> > (/home/anders/src/kernel/next/include/linux/bottom_half.h:20
> > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242
> > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784)
> > [ 1738.270239][  T104]
> > [ 1738.272140][  T104] Reported by Kernel Concurrency Sanitizer on:
> > [ 1738.285185][  T104] Hardware name: linux,dummy-virt (DT)
> > [ 1738.288703][  T104]
> > ==================================================================
> 
> This looks just like the original except a different lock is being
> 
> used now.
> 
> 
> The link count can't be less than two if set_nlink() is called.
> 
> 
> Maybe I am missing something but the directory count is changed only
> 
> while holding the root->kernfs_iattr_rwsem so the value used by
> 
> set_nlink() will not change during concurrent calls to
> refresh_inode().
> 
> Still looks like a false positive, I'll check the write operations
> again just to be sure.

I do see a problem with recent changes.

I'll send this off to Greg after I've done some testing (primarily just
compile and function).

Here's a patch which describes what I found.

Comments are, of course, welcome, ;)

kernfs: fix missing kernfs_iattr_rwsem locking

From: Ian Kent <raven@themaw.net>

When the kernfs_iattr_rwsem was introduced a case was missed.

The update of the kernfs directory node child count was also protected
by the kernfs_rwsem and needs to be included in the change so that the
child count (and so the inode n_link attribute) does not change while
holding the rwsem for read.

Fixes: 9caf696142 (kernfs: Introduce separate rwsem to protect inode
attributes)

Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: Imran Khan <imran.f.khan@oracle.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Eric Sandeen <sandeen@sandeen.net>
---
 fs/kernfs/dir.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 45b6919903e6..6e84bb69602e 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -383,9 +383,11 @@ static int kernfs_link_sibling(struct kernfs_node
*kn)
 	rb_insert_color(&kn->rb, &kn->parent->dir.children);
 
 	/* successfully added, account subdir number */
+	down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
 	if (kernfs_type(kn) == KERNFS_DIR)
 		kn->parent->dir.subdirs++;
 	kernfs_inc_rev(kn->parent);
+	up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
 
 	return 0;
 }
@@ -408,9 +410,11 @@ static bool kernfs_unlink_sibling(struct
kernfs_node *kn)
 	if (RB_EMPTY_NODE(&kn->rb))
 		return false;
 
+	down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
 	if (kernfs_type(kn) == KERNFS_DIR)
 		kn->parent->dir.subdirs--;
 	kernfs_inc_rev(kn->parent);
+	up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
 
 	rb_erase(&kn->rb, &kn->parent->dir.children);
 	RB_CLEAR_NODE(&kn->rb);


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] kernfs: dont take i_lock on inode attr read
  2023-07-20  2:03                   ` Ian Kent
@ 2023-07-26 13:49                     ` Miklos Szeredi
  2023-07-27  0:38                     ` Ian Kent
  1 sibling, 0 replies; 25+ messages in thread
From: Miklos Szeredi @ 2023-07-26 13:49 UTC (permalink / raw)
  To: Ian Kent
  Cc: Anders Roxell, Arnd Bergmann, Tejun Heo, Greg Kroah-Hartman,
	Minchan Kim, Eric Sandeen, Alexander Viro, Rick Lindsley,
	David Howells, Carlos Maiolino, linux-fsdevel,
	Kernel Mailing List, elver, imran.f.khan

On Thu, 20 Jul 2023 at 04:03, Ian Kent <raven@themaw.net> wrote:
>
> On Wed, 2023-07-19 at 12:23 +0800, Ian Kent wrote:
> > On 19/7/23 03:00, Anders Roxell wrote:
> > > On 2023-01-23 11:11, Ian Kent wrote:
> > > > On 29/12/22 21:07, Ian Kent wrote:
> > > > > On 29/12/22 17:20, Arnd Bergmann wrote:
> > > > > > On Fri, Dec 23, 2022, at 00:11, Ian Kent wrote:
> > > > > > > On 21/12/22 21:34, Anders Roxell wrote:
> > > > > > > > On 2022-10-31 12:30, Tejun Heo wrote:
> > > > > > > > > On Tue, Oct 18, 2022 at 10:32:42AM +0800, Ian Kent
> > > > > > > > > wrote:
> > > > > > > > > > The kernfs write lock is held when the kernfs node
> > > > > > > > > > inode attributes
> > > > > > > > > > are updated. Therefore, when either
> > > > > > > > > > kernfs_iop_getattr() or
> > > > > > > > > > kernfs_iop_permission() are called the kernfs node
> > > > > > > > > > inode attributes
> > > > > > > > > > won't change.
> > > > > > > > > >
> > > > > > > > > > Consequently concurrent kernfs_refresh_inode() calls
> > > > > > > > > > always copy the
> > > > > > > > > > same values from the kernfs node.
> > > > > > > > > >
> > > > > > > > > > So there's no need to take the inode i_lock to get
> > > > > > > > > > consistent values
> > > > > > > > > > for generic_fillattr() and generic_permission(), the
> > > > > > > > > > kernfs read lock
> > > > > > > > > > is sufficient.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Ian Kent <raven@themaw.net>
> > > > > > > > > Acked-by: Tejun Heo <tj@kernel.org>
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Building an allmodconfig arm64 kernel on yesterdays next-
> > > > > > > > 20221220 and
> > > > > > > > booting that in qemu I see the following "BUG: KCSAN:
> > > > > > > > data-race in
> > > > > > > > set_nlink / set_nlink".
> > > > > > > I'll check if I missed any places where set_link() could be
> > > > > > > called where the link count could be different.
> > > > > > >
> > > > > > >
> > > > > > > If there aren't any the question will then be can writing
> > > > > > > the
> > > > > > > same value to this location in multiple concurrent threads
> > > > > > > corrupt it?
> > > > > > I think the race that is getting reported for set_nlink()
> > > > > > is about this bit getting called simulatenously on multiple
> > > > > > CPUs with only the read lock held for the inode:
> > > > > >
> > > > > >        /* Yes, some filesystems do change nlink from zero to
> > > > > > one */
> > > > > >        if (inode->i_nlink == 0)
> > > > > > atomic_long_dec(&inode->i_sb->s_remove_count);
> > > > > >        inode->__i_nlink = nlink;
> > > > > >
> > > > > > Since i_nlink and __i_nlink refer to the same memory
> > > > > > location,
> > > > > > the 'inode->i_nlink == 0' check can be true for all of them
> > > > > > before the nonzero nlink value gets set, and this results in
> > > > > > s_remove_count being decremented more than once.
> > > > >
> > > > > Thanks for the comment Arnd.
> > > >
> > > > Hello all,
> > > >
> > > >
> > > > I've been looking at this and after consulting Miklos and his
> > > > pointing
> > > >
> > > > out that it looks like a false positive the urgency dropped off a
> > > > bit. So
> > > >
> > > > apologies for taking so long to report back.
> > > >
> > > >
> > > > Anyway it needs some description of conclusions reached so far.
> > > >
> > > >
> > > > I'm still looking around but in short, kernfs will set
> > > > directories to <# of
> > > >
> > > > directory entries> + 2 unconditionally for directories. I can't
> > > > yet find
> > > >
> > > > any other places where i_nlink is set or changed and if there are
> > > > none
> > > >
> > > > then i_nlink will never be set to zero so the race should not
> > > > occur.
> > > >
> > > >
> > > > Consequently my claim is this is a real false positive.
> > > >
> > > >
> > > > There are the file system operations that may be passed at mount
> > > > time
> > > >
> > > > but given the way kernfs sets i_nlink it pretty much dictates
> > > > those
> > > > operations
> > > >
> > > > (if there were any that modify it and there don't appear to be
> > > > any) leave it
> > > >
> > > > alone.
> > > >
> > > >
> > > > So it just doesn't make sense for users of kernfs to fiddle with
> > > > i_nlink ...
> > > On todays next tag, next-20230718 this KCSAN BUG poped up again.
> > > When I
> > > built an allmodconfig arm64 kernel and booted it in QEMU. Full log
> > > can
> > > be found http://ix.io/4AUd
> > >
> > > [ 1694.987789][  T137] BUG: KCSAN: data-race in inode_permission /
> > > kernfs_refresh_inode
> > > [ 1694.992912][  T137]
> > > [ 1694.994532][  T137] write to 0xffff00000bab6070 of 2 bytes by
> > > task 104 on cpu 0:
> > > [ 1694.999269][ T137] kernfs_refresh_inode
> > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:171)
> > > [ 1695.002707][ T137] kernfs_iop_permission
> > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:289)
> > > [ 1695.006148][ T137] inode_permission
> > > (/home/anders/src/kernel/next/fs/namei.c:461
> > > /home/anders/src/kernel/next/fs/namei.c:528)
> > > [ 1695.009420][ T137] link_path_walk
> > > (/home/anders/src/kernel/next/fs/namei.c:1720
> > > /home/anders/src/kernel/next/fs/namei.c:2267)
> > > [ 1695.012643][ T137] path_lookupat
> > > (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2))
> > > [ 1695.015781][ T137] filename_lookup
> > > (/home/anders/src/kernel/next/fs/namei.c:2508)
> > > [ 1695.019059][ T137] vfs_statx
> > > (/home/anders/src/kernel/next/fs/stat.c:238)
> > > [ 1695.022024][ T137] vfs_fstatat
> > > (/home/anders/src/kernel/next/fs/stat.c:276)
> > > [ 1695.025067][ T137] __do_sys_newfstatat
> > > (/home/anders/src/kernel/next/fs/stat.c:446)
> > > [ 1695.028497][ T137] __arm64_sys_newfstatat
> > > (/home/anders/src/kernel/next/fs/stat.c:440
> > > /home/anders/src/kernel/next/fs/stat.c:440)
> > > [ 1695.032080][ T137] el0_svc_common.constprop.0
> > > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38
> > > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52
> > > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139)
> > > [ 1695.035916][ T137] do_el0_svc
> > > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188)
> > > [ 1695.038796][ T137] el0_svc
> > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133
> > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144
> > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648)
> > > [ 1695.041468][ T137] el0t_64_sync_handler
> > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666)
> > > [ 1695.044889][ T137] el0t_64_sync
> > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591)
> > > [ 1695.047904][  T137]
> > > [ 1695.049511][  T137] 1 lock held by systemd-udevd/104:
> > > [ 1695.052837][ T137] #0: ffff000006681e08 (&root-
> > > >kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission
> > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:288)
> > > [ 1695.060241][  T137] irq event stamp: 82902
> > > [ 1695.063006][ T137] hardirqs last enabled at (82901):
> > > _raw_spin_unlock_irqrestore
> > > (/home/anders/src/kernel/next/arch/arm64/include/asm/alternative-
> > > macros.h:250
> > > /home/anders/src/kernel/next/arch/arm64/include/asm/irqflags.h:27
> > > /home/anders/src/kernel/next/arch/arm64/include/asm/irqflags.h:140
> > > /home/anders/src/kernel/next/include/linux/spinlock_api_smp.h:151
> > > /home/anders/src/kernel/next/kernel/locking/spinlock.c:194)
> > > [ 1695.069673][ T137] hardirqs last disabled at (82902):
> > > el1_interrupt
> > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:472
> > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:488)
> > > [ 1695.075474][ T137] softirqs last enabled at (82792):
> > > fpsimd_restore_current_state
> > > (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264
> > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791)
> > > [ 1695.082319][ T137] softirqs last disabled at (82790):
> > > fpsimd_restore_current_state
> > > (/home/anders/src/kernel/next/include/linux/bottom_half.h:20
> > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242
> > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784)
> > > [ 1695.089049][  T137]
> > > [ 1695.090659][  T137] read to 0xffff00000bab6070 of 2 bytes by
> > > task 137 on cpu 0:
> > > [ 1695.095374][ T137] inode_permission
> > > (/home/anders/src/kernel/next/fs/namei.c:532)
> > > [ 1695.098655][ T137] link_path_walk
> > > (/home/anders/src/kernel/next/fs/namei.c:1720
> > > /home/anders/src/kernel/next/fs/namei.c:2267)
> > > [ 1695.101857][ T137] path_openat
> > > (/home/anders/src/kernel/next/fs/namei.c:3789 (discriminator 2))
> > > [ 1695.104885][ T137] do_filp_open
> > > (/home/anders/src/kernel/next/fs/namei.c:3820)
> > > [ 1695.108006][ T137] do_sys_openat2
> > > (/home/anders/src/kernel/next/fs/open.c:1418)
> > > [ 1695.111290][ T137] __arm64_sys_openat
> > > (/home/anders/src/kernel/next/fs/open.c:1433)
> > > [ 1695.114825][ T137] el0_svc_common.constprop.0
> > > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38
> > > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52
> > > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139)
> > > [ 1695.118662][ T137] do_el0_svc
> > > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188)
> > > [ 1695.121555][ T137] el0_svc
> > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133
> > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144
> > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648)
> > > [ 1695.124207][ T137] el0t_64_sync_handler
> > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666)
> > > [ 1695.127590][ T137] el0t_64_sync
> > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591)
> > > [ 1695.130641][  T137]
> > > [ 1695.132241][  T137] no locks held by systemd-udevd/137.
> > > [ 1695.135618][  T137] irq event stamp: 3246
> > > [ 1695.138519][ T137] hardirqs last enabled at (3245):
> > > seqcount_lockdep_reader_access
> > > (/home/anders/src/kernel/next/include/linux/seqlock.h:105)
> > > [ 1695.145825][ T137] hardirqs last disabled at (3246):
> > > el1_interrupt
> > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:472
> > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:488)
> > > [ 1695.151942][ T137] softirqs last enabled at (3208):
> > > fpsimd_restore_current_state
> > > (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264
> > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791)
> > > [ 1695.158950][ T137] softirqs last disabled at (3206):
> > > fpsimd_restore_current_state
> > > (/home/anders/src/kernel/next/include/linux/bottom_half.h:20
> > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242
> > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784)
> > > [ 1695.166036][  T137]
> > > [ 1695.167621][  T137] Reported by Kernel Concurrency Sanitizer on:
> > > [ 1695.179990][  T137] Hardware name: linux,dummy-virt (DT)
> > > [ 1695.183687][  T137]
> > > ==================================================================
> >
> >
> > This one is different to the original.
> >
> >
> > I can't see where the problem is here, can someone help me out
> >
> > please.
> >
> >
> > I don't see any shared data values used by the call
> >
> > devcgroup_inode_permission(inode, mask) in
> > fs/namei.c:inode_permission()
> >
> > that might be a problem during the read except possibly inode-
> > >i_mode.
> >
> >
> > I'll check on that ... maybe something's been missed when
> > kernfs_rwsem
> >
> > was changed to a separate lock (kernfs_iattr_rwsem).
> >
> >
> > >
> > > [...]
> > >
> > > [ 1738.053819][  T104] BUG: KCSAN: data-race in set_nlink /
> > > set_nlink
> > > [ 1738.058223][  T104]
> > > [ 1738.059865][  T104] read to 0xffff00000bab6918 of 4 bytes by
> > > task 108 on cpu 0:
> > > [ 1738.064916][ T104] set_nlink
> > > (/home/anders/src/kernel/next/fs/inode.c:369)
> > > [ 1738.067845][ T104] kernfs_refresh_inode
> > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:180)
> > > [ 1738.071607][ T104] kernfs_iop_permission
> > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:289)
> > > [ 1738.075467][ T104] inode_permission
> > > (/home/anders/src/kernel/next/fs/namei.c:461
> > > /home/anders/src/kernel/next/fs/namei.c:528)
> > > [ 1738.078868][ T104] link_path_walk
> > > (/home/anders/src/kernel/next/fs/namei.c:1720
> > > /home/anders/src/kernel/next/fs/namei.c:2267)
> > > [ 1738.082270][ T104] path_lookupat
> > > (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2))
> > > [ 1738.085488][ T104] filename_lookup
> > > (/home/anders/src/kernel/next/fs/namei.c:2508)
> > > [ 1738.089101][ T104] user_path_at_empty
> > > (/home/anders/src/kernel/next/fs/namei.c:2907)
> > > [ 1738.092469][ T104] do_readlinkat
> > > (/home/anders/src/kernel/next/fs/stat.c:477)
> > > [ 1738.095970][ T104] __arm64_sys_readlinkat
> > > (/home/anders/src/kernel/next/fs/stat.c:504
> > > /home/anders/src/kernel/next/fs/stat.c:501
> > > /home/anders/src/kernel/next/fs/stat.c:501)
> > > [ 1738.099529][ T104] el0_svc_common.constprop.0
> > > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38
> > > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52
> > > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139)
> > > [ 1738.103696][ T104] do_el0_svc
> > > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188)
> > > [ 1738.106560][ T104] el0_svc
> > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133
> > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144
> > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648)
> > > [ 1738.109613][ T104] el0t_64_sync_handler
> > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666)
> > > [ 1738.113035][ T104] el0t_64_sync
> > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591)
> > > [ 1738.116346][  T104]
> > > [ 1738.117924][  T104] 1 lock held by systemd-udevd/108:
> > > [ 1738.121580][ T104] #0: ffff000006681e08 (&root-
> > > >kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission
> > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:288)
> > > [ 1738.129355][  T104] irq event stamp: 31000
> > > [ 1738.132088][ T104] hardirqs last enabled at (31000):
> > > seqcount_lockdep_reader_access
> > > (/home/anders/src/kernel/next/include/linux/seqlock.h:105)
> > > [ 1738.139417][ T104] hardirqs last disabled at (30999):
> > > seqcount_lockdep_reader_access
> > > (/home/anders/src/kernel/next/include/linux/seqlock.h:104)
> > > [ 1738.146781][ T104] softirqs last enabled at (30973):
> > > fpsimd_restore_current_state
> > > (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264
> > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791)
> > > [ 1738.153891][ T104] softirqs last disabled at (30971):
> > > fpsimd_restore_current_state
> > > (/home/anders/src/kernel/next/include/linux/bottom_half.h:20
> > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242
> > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784)
> > > [ 1738.161012][  T104]
> > > [ 1738.162663][  T104] write to 0xffff00000bab6918 of 4 bytes by
> > > task 104 on cpu 0:
> > > [ 1738.167730][ T104] set_nlink
> > > (/home/anders/src/kernel/next/fs/inode.c:372)
> > > [ 1738.170559][ T104] kernfs_refresh_inode
> > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:180)
> > > [ 1738.174355][ T104] kernfs_iop_permission
> > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:289)
> > > [ 1738.177829][ T104] inode_permission
> > > (/home/anders/src/kernel/next/fs/namei.c:461
> > > /home/anders/src/kernel/next/fs/namei.c:528)
> > > [ 1738.181403][ T104] link_path_walk
> > > (/home/anders/src/kernel/next/fs/namei.c:1720
> > > /home/anders/src/kernel/next/fs/namei.c:2267)
> > > [ 1738.184738][ T104] path_lookupat
> > > (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2))
> > > [ 1738.188268][ T104] filename_lookup
> > > (/home/anders/src/kernel/next/fs/namei.c:2508)
> > > [ 1738.191865][ T104] vfs_statx
> > > (/home/anders/src/kernel/next/fs/stat.c:238)
> > > [ 1738.196236][ T104] vfs_fstatat
> > > (/home/anders/src/kernel/next/fs/stat.c:276)
> > > [ 1738.200120][ T104] __do_sys_newfstatat
> > > (/home/anders/src/kernel/next/fs/stat.c:446)
> > > [ 1738.204095][ T104] __arm64_sys_newfstatat
> > > (/home/anders/src/kernel/next/fs/stat.c:440
> > > /home/anders/src/kernel/next/fs/stat.c:440)
> > > [ 1738.207676][ T104] el0_svc_common.constprop.0
> > > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38
> > > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52
> > > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139)
> > > [ 1738.211820][ T104] do_el0_svc
> > > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188)
> > > [ 1738.214815][ T104] el0_svc
> > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133
> > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144
> > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648)
> > > [ 1738.217709][ T104] el0t_64_sync_handler
> > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666)
> > > [ 1738.221239][ T104] el0t_64_sync
> > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591)
> > > [ 1738.224502][  T104]
> > > [ 1738.226090][  T104] 1 lock held by systemd-udevd/104:
> > > [ 1738.229747][ T104] #0: ffff000006681e08 (&root-
> > > >kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission
> > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:288)
> > > [ 1738.237504][  T104] irq event stamp: 108353
> > > [ 1738.240262][ T104] hardirqs last enabled at (108353):
> > > seqcount_lockdep_reader_access
> > > (/home/anders/src/kernel/next/include/linux/seqlock.h:105)
> > > [ 1738.247443][ T104] hardirqs last disabled at (108352):
> > > seqcount_lockdep_reader_access
> > > (/home/anders/src/kernel/next/include/linux/seqlock.h:104)
> > > [ 1738.254510][ T104] softirqs last enabled at (108326):
> > > fpsimd_restore_current_state
> > > (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264
> > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791)
> > > [ 1738.262187][ T104] softirqs last disabled at (108324):
> > > fpsimd_restore_current_state
> > > (/home/anders/src/kernel/next/include/linux/bottom_half.h:20
> > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242
> > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784)
> > > [ 1738.270239][  T104]
> > > [ 1738.272140][  T104] Reported by Kernel Concurrency Sanitizer on:
> > > [ 1738.285185][  T104] Hardware name: linux,dummy-virt (DT)
> > > [ 1738.288703][  T104]
> > > ==================================================================
> >
> > This looks just like the original except a different lock is being
> >
> > used now.
> >
> >
> > The link count can't be less than two if set_nlink() is called.
> >
> >
> > Maybe I am missing something but the directory count is changed only
> >
> > while holding the root->kernfs_iattr_rwsem so the value used by
> >
> > set_nlink() will not change during concurrent calls to
> > refresh_inode().
> >
> > Still looks like a false positive, I'll check the write operations
> > again just to be sure.
>
> I do see a problem with recent changes.
>
> I'll send this off to Greg after I've done some testing (primarily just
> compile and function).
>
> Here's a patch which describes what I found.
>
> Comments are, of course, welcome, ;)
>
> kernfs: fix missing kernfs_iattr_rwsem locking
>
> From: Ian Kent <raven@themaw.net>
>
> When the kernfs_iattr_rwsem was introduced a case was missed.
>
> The update of the kernfs directory node child count was also protected
> by the kernfs_rwsem and needs to be included in the change so that the
> child count (and so the inode n_link attribute) does not change while
> holding the rwsem for read.
>
> Fixes: 9caf696142 (kernfs: Introduce separate rwsem to protect inode
> attributes)
>
> Signed-off-by: Ian Kent <raven@themaw.net>
> Cc: Anders Roxell <anders.roxell@linaro.org>
> Cc: Imran Khan <imran.f.khan@oracle.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Eric Sandeen <sandeen@sandeen.net>

Looks good.

Acked-by: Miklos Szeredi <mszeredi@redhat.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] kernfs: dont take i_lock on inode attr read
  2023-07-20  2:03                   ` Ian Kent
  2023-07-26 13:49                     ` Miklos Szeredi
@ 2023-07-27  0:38                     ` Ian Kent
  2023-07-27  4:30                       ` Imran Khan
  1 sibling, 1 reply; 25+ messages in thread
From: Ian Kent @ 2023-07-27  0:38 UTC (permalink / raw)
  To: Anders Roxell
  Cc: Arnd Bergmann, Tejun Heo, Greg Kroah-Hartman, Minchan Kim,
	Eric Sandeen, Alexander Viro, Rick Lindsley, David Howells,
	Miklos Szeredi, Carlos Maiolino, linux-fsdevel,
	Kernel Mailing List, elver, imran.f.khan

On 20/7/23 10:03, Ian Kent wrote:
> On Wed, 2023-07-19 at 12:23 +0800, Ian Kent wrote:
>> On 19/7/23 03:00, Anders Roxell wrote:
>>> On 2023-01-23 11:11, Ian Kent wrote:
>>>> On 29/12/22 21:07, Ian Kent wrote:
>>>>> On 29/12/22 17:20, Arnd Bergmann wrote:
>>>>>> On Fri, Dec 23, 2022, at 00:11, Ian Kent wrote:
>>>>>>> On 21/12/22 21:34, Anders Roxell wrote:
>>>>>>>> On 2022-10-31 12:30, Tejun Heo wrote:
>>>>>>>>> On Tue, Oct 18, 2022 at 10:32:42AM +0800, Ian Kent
>>>>>>>>> wrote:
>>>>>>>>>> The kernfs write lock is held when the kernfs node
>>>>>>>>>> inode attributes
>>>>>>>>>> are updated. Therefore, when either
>>>>>>>>>> kernfs_iop_getattr() or
>>>>>>>>>> kernfs_iop_permission() are called the kernfs node
>>>>>>>>>> inode attributes
>>>>>>>>>> won't change.
>>>>>>>>>>
>>>>>>>>>> Consequently concurrent kernfs_refresh_inode() calls
>>>>>>>>>> always copy the
>>>>>>>>>> same values from the kernfs node.
>>>>>>>>>>
>>>>>>>>>> So there's no need to take the inode i_lock to get
>>>>>>>>>> consistent values
>>>>>>>>>> for generic_fillattr() and generic_permission(), the
>>>>>>>>>> kernfs read lock
>>>>>>>>>> is sufficient.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Ian Kent <raven@themaw.net>
>>>>>>>>> Acked-by: Tejun Heo <tj@kernel.org>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Building an allmodconfig arm64 kernel on yesterdays next-
>>>>>>>> 20221220 and
>>>>>>>> booting that in qemu I see the following "BUG: KCSAN:
>>>>>>>> data-race in
>>>>>>>> set_nlink / set_nlink".
>>>>>>> I'll check if I missed any places where set_link() could be
>>>>>>> called where the link count could be different.
>>>>>>>
>>>>>>>
>>>>>>> If there aren't any the question will then be can writing
>>>>>>> the
>>>>>>> same value to this location in multiple concurrent threads
>>>>>>> corrupt it?
>>>>>> I think the race that is getting reported for set_nlink()
>>>>>> is about this bit getting called simulatenously on multiple
>>>>>> CPUs with only the read lock held for the inode:
>>>>>>
>>>>>>         /* Yes, some filesystems do change nlink from zero to
>>>>>> one */
>>>>>>         if (inode->i_nlink == 0)
>>>>>> atomic_long_dec(&inode->i_sb->s_remove_count);
>>>>>>         inode->__i_nlink = nlink;
>>>>>>
>>>>>> Since i_nlink and __i_nlink refer to the same memory
>>>>>> location,
>>>>>> the 'inode->i_nlink == 0' check can be true for all of them
>>>>>> before the nonzero nlink value gets set, and this results in
>>>>>> s_remove_count being decremented more than once.
>>>>> Thanks for the comment Arnd.
>>>> Hello all,
>>>>
>>>>
>>>> I've been looking at this and after consulting Miklos and his
>>>> pointing
>>>>
>>>> out that it looks like a false positive the urgency dropped off a
>>>> bit. So
>>>>
>>>> apologies for taking so long to report back.
>>>>
>>>>
>>>> Anyway it needs some description of conclusions reached so far.
>>>>
>>>>
>>>> I'm still looking around but in short, kernfs will set
>>>> directories to <# of
>>>>
>>>> directory entries> + 2 unconditionally for directories. I can't
>>>> yet find
>>>>
>>>> any other places where i_nlink is set or changed and if there are
>>>> none
>>>>
>>>> then i_nlink will never be set to zero so the race should not
>>>> occur.
>>>>
>>>>
>>>> Consequently my claim is this is a real false positive.
>>>>
>>>>
>>>> There are the file system operations that may be passed at mount
>>>> time
>>>>
>>>> but given the way kernfs sets i_nlink it pretty much dictates
>>>> those
>>>> operations
>>>>
>>>> (if there were any that modify it and there don't appear to be
>>>> any) leave it
>>>>
>>>> alone.
>>>>
>>>>
>>>> So it just doesn't make sense for users of kernfs to fiddle with
>>>> i_nlink ...
>>> On todays next tag, next-20230718 this KCSAN BUG poped up again.
>>> When I
>>> built an allmodconfig arm64 kernel and booted it in QEMU. Full log
>>> can
>>> be found http://ix.io/4AUd
>>>
>>> [ 1694.987789][  T137] BUG: KCSAN: data-race in inode_permission /
>>> kernfs_refresh_inode
>>> [ 1694.992912][  T137]
>>> [ 1694.994532][  T137] write to 0xffff00000bab6070 of 2 bytes by
>>> task 104 on cpu 0:
>>> [ 1694.999269][ T137] kernfs_refresh_inode
>>> (/home/anders/src/kernel/next/fs/kernfs/inode.c:171)
>>> [ 1695.002707][ T137] kernfs_iop_permission
>>> (/home/anders/src/kernel/next/fs/kernfs/inode.c:289)
>>> [ 1695.006148][ T137] inode_permission
>>> (/home/anders/src/kernel/next/fs/namei.c:461
>>> /home/anders/src/kernel/next/fs/namei.c:528)
>>> [ 1695.009420][ T137] link_path_walk
>>> (/home/anders/src/kernel/next/fs/namei.c:1720
>>> /home/anders/src/kernel/next/fs/namei.c:2267)
>>> [ 1695.012643][ T137] path_lookupat
>>> (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2))
>>> [ 1695.015781][ T137] filename_lookup
>>> (/home/anders/src/kernel/next/fs/namei.c:2508)
>>> [ 1695.019059][ T137] vfs_statx
>>> (/home/anders/src/kernel/next/fs/stat.c:238)
>>> [ 1695.022024][ T137] vfs_fstatat
>>> (/home/anders/src/kernel/next/fs/stat.c:276)
>>> [ 1695.025067][ T137] __do_sys_newfstatat
>>> (/home/anders/src/kernel/next/fs/stat.c:446)
>>> [ 1695.028497][ T137] __arm64_sys_newfstatat
>>> (/home/anders/src/kernel/next/fs/stat.c:440
>>> /home/anders/src/kernel/next/fs/stat.c:440)
>>> [ 1695.032080][ T137] el0_svc_common.constprop.0
>>> (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38
>>> /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52
>>> /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139)
>>> [ 1695.035916][ T137] do_el0_svc
>>> (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188)
>>> [ 1695.038796][ T137] el0_svc
>>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133
>>> /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144
>>> /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648)
>>> [ 1695.041468][ T137] el0t_64_sync_handler
>>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666)
>>> [ 1695.044889][ T137] el0t_64_sync
>>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591)
>>> [ 1695.047904][  T137]
>>> [ 1695.049511][  T137] 1 lock held by systemd-udevd/104:
>>> [ 1695.052837][ T137] #0: ffff000006681e08 (&root-
>>>> kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission
>>> (/home/anders/src/kernel/next/fs/kernfs/inode.c:288)
>>> [ 1695.060241][  T137] irq event stamp: 82902
>>> [ 1695.063006][ T137] hardirqs last enabled at (82901):
>>> _raw_spin_unlock_irqrestore
>>> (/home/anders/src/kernel/next/arch/arm64/include/asm/alternative-
>>> macros.h:250
>>> /home/anders/src/kernel/next/arch/arm64/include/asm/irqflags.h:27
>>> /home/anders/src/kernel/next/arch/arm64/include/asm/irqflags.h:140
>>> /home/anders/src/kernel/next/include/linux/spinlock_api_smp.h:151
>>> /home/anders/src/kernel/next/kernel/locking/spinlock.c:194)
>>> [ 1695.069673][ T137] hardirqs last disabled at (82902):
>>> el1_interrupt
>>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:472
>>> /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:488)
>>> [ 1695.075474][ T137] softirqs last enabled at (82792):
>>> fpsimd_restore_current_state
>>> (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264
>>> /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791)
>>> [ 1695.082319][ T137] softirqs last disabled at (82790):
>>> fpsimd_restore_current_state
>>> (/home/anders/src/kernel/next/include/linux/bottom_half.h:20
>>> /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242
>>> /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784)
>>> [ 1695.089049][  T137]
>>> [ 1695.090659][  T137] read to 0xffff00000bab6070 of 2 bytes by
>>> task 137 on cpu 0:
>>> [ 1695.095374][ T137] inode_permission
>>> (/home/anders/src/kernel/next/fs/namei.c:532)
>>> [ 1695.098655][ T137] link_path_walk
>>> (/home/anders/src/kernel/next/fs/namei.c:1720
>>> /home/anders/src/kernel/next/fs/namei.c:2267)
>>> [ 1695.101857][ T137] path_openat
>>> (/home/anders/src/kernel/next/fs/namei.c:3789 (discriminator 2))
>>> [ 1695.104885][ T137] do_filp_open
>>> (/home/anders/src/kernel/next/fs/namei.c:3820)
>>> [ 1695.108006][ T137] do_sys_openat2
>>> (/home/anders/src/kernel/next/fs/open.c:1418)
>>> [ 1695.111290][ T137] __arm64_sys_openat
>>> (/home/anders/src/kernel/next/fs/open.c:1433)
>>> [ 1695.114825][ T137] el0_svc_common.constprop.0
>>> (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38
>>> /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52
>>> /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139)
>>> [ 1695.118662][ T137] do_el0_svc
>>> (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188)
>>> [ 1695.121555][ T137] el0_svc
>>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133
>>> /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144
>>> /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648)
>>> [ 1695.124207][ T137] el0t_64_sync_handler
>>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666)
>>> [ 1695.127590][ T137] el0t_64_sync
>>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591)
>>> [ 1695.130641][  T137]
>>> [ 1695.132241][  T137] no locks held by systemd-udevd/137.
>>> [ 1695.135618][  T137] irq event stamp: 3246
>>> [ 1695.138519][ T137] hardirqs last enabled at (3245):
>>> seqcount_lockdep_reader_access
>>> (/home/anders/src/kernel/next/include/linux/seqlock.h:105)
>>> [ 1695.145825][ T137] hardirqs last disabled at (3246):
>>> el1_interrupt
>>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:472
>>> /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:488)
>>> [ 1695.151942][ T137] softirqs last enabled at (3208):
>>> fpsimd_restore_current_state
>>> (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264
>>> /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791)
>>> [ 1695.158950][ T137] softirqs last disabled at (3206):
>>> fpsimd_restore_current_state
>>> (/home/anders/src/kernel/next/include/linux/bottom_half.h:20
>>> /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242
>>> /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784)
>>> [ 1695.166036][  T137]
>>> [ 1695.167621][  T137] Reported by Kernel Concurrency Sanitizer on:
>>> [ 1695.179990][  T137] Hardware name: linux,dummy-virt (DT)
>>> [ 1695.183687][  T137]
>>> ==================================================================
>>
>> This one is different to the original.
>>
>>
>> I can't see where the problem is here, can someone help me out
>>
>> please.
>>
>>
>> I don't see any shared data values used by the call
>>
>> devcgroup_inode_permission(inode, mask) in
>> fs/namei.c:inode_permission()
>>
>> that might be a problem during the read except possibly inode-
>>> i_mode.
>>
>> I'll check on that ... maybe something's been missed when
>> kernfs_rwsem
>>
>> was changed to a separate lock (kernfs_iattr_rwsem).
>>
>>
>>> [...]
>>>
>>> [ 1738.053819][  T104] BUG: KCSAN: data-race in set_nlink /
>>> set_nlink
>>> [ 1738.058223][  T104]
>>> [ 1738.059865][  T104] read to 0xffff00000bab6918 of 4 bytes by
>>> task 108 on cpu 0:
>>> [ 1738.064916][ T104] set_nlink
>>> (/home/anders/src/kernel/next/fs/inode.c:369)
>>> [ 1738.067845][ T104] kernfs_refresh_inode
>>> (/home/anders/src/kernel/next/fs/kernfs/inode.c:180)
>>> [ 1738.071607][ T104] kernfs_iop_permission
>>> (/home/anders/src/kernel/next/fs/kernfs/inode.c:289)
>>> [ 1738.075467][ T104] inode_permission
>>> (/home/anders/src/kernel/next/fs/namei.c:461
>>> /home/anders/src/kernel/next/fs/namei.c:528)
>>> [ 1738.078868][ T104] link_path_walk
>>> (/home/anders/src/kernel/next/fs/namei.c:1720
>>> /home/anders/src/kernel/next/fs/namei.c:2267)
>>> [ 1738.082270][ T104] path_lookupat
>>> (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2))
>>> [ 1738.085488][ T104] filename_lookup
>>> (/home/anders/src/kernel/next/fs/namei.c:2508)
>>> [ 1738.089101][ T104] user_path_at_empty
>>> (/home/anders/src/kernel/next/fs/namei.c:2907)
>>> [ 1738.092469][ T104] do_readlinkat
>>> (/home/anders/src/kernel/next/fs/stat.c:477)
>>> [ 1738.095970][ T104] __arm64_sys_readlinkat
>>> (/home/anders/src/kernel/next/fs/stat.c:504
>>> /home/anders/src/kernel/next/fs/stat.c:501
>>> /home/anders/src/kernel/next/fs/stat.c:501)
>>> [ 1738.099529][ T104] el0_svc_common.constprop.0
>>> (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38
>>> /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52
>>> /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139)
>>> [ 1738.103696][ T104] do_el0_svc
>>> (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188)
>>> [ 1738.106560][ T104] el0_svc
>>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133
>>> /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144
>>> /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648)
>>> [ 1738.109613][ T104] el0t_64_sync_handler
>>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666)
>>> [ 1738.113035][ T104] el0t_64_sync
>>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591)
>>> [ 1738.116346][  T104]
>>> [ 1738.117924][  T104] 1 lock held by systemd-udevd/108:
>>> [ 1738.121580][ T104] #0: ffff000006681e08 (&root-
>>>> kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission
>>> (/home/anders/src/kernel/next/fs/kernfs/inode.c:288)
>>> [ 1738.129355][  T104] irq event stamp: 31000
>>> [ 1738.132088][ T104] hardirqs last enabled at (31000):
>>> seqcount_lockdep_reader_access
>>> (/home/anders/src/kernel/next/include/linux/seqlock.h:105)
>>> [ 1738.139417][ T104] hardirqs last disabled at (30999):
>>> seqcount_lockdep_reader_access
>>> (/home/anders/src/kernel/next/include/linux/seqlock.h:104)
>>> [ 1738.146781][ T104] softirqs last enabled at (30973):
>>> fpsimd_restore_current_state
>>> (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264
>>> /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791)
>>> [ 1738.153891][ T104] softirqs last disabled at (30971):
>>> fpsimd_restore_current_state
>>> (/home/anders/src/kernel/next/include/linux/bottom_half.h:20
>>> /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242
>>> /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784)
>>> [ 1738.161012][  T104]
>>> [ 1738.162663][  T104] write to 0xffff00000bab6918 of 4 bytes by
>>> task 104 on cpu 0:
>>> [ 1738.167730][ T104] set_nlink
>>> (/home/anders/src/kernel/next/fs/inode.c:372)
>>> [ 1738.170559][ T104] kernfs_refresh_inode
>>> (/home/anders/src/kernel/next/fs/kernfs/inode.c:180)
>>> [ 1738.174355][ T104] kernfs_iop_permission
>>> (/home/anders/src/kernel/next/fs/kernfs/inode.c:289)
>>> [ 1738.177829][ T104] inode_permission
>>> (/home/anders/src/kernel/next/fs/namei.c:461
>>> /home/anders/src/kernel/next/fs/namei.c:528)
>>> [ 1738.181403][ T104] link_path_walk
>>> (/home/anders/src/kernel/next/fs/namei.c:1720
>>> /home/anders/src/kernel/next/fs/namei.c:2267)
>>> [ 1738.184738][ T104] path_lookupat
>>> (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2))
>>> [ 1738.188268][ T104] filename_lookup
>>> (/home/anders/src/kernel/next/fs/namei.c:2508)
>>> [ 1738.191865][ T104] vfs_statx
>>> (/home/anders/src/kernel/next/fs/stat.c:238)
>>> [ 1738.196236][ T104] vfs_fstatat
>>> (/home/anders/src/kernel/next/fs/stat.c:276)
>>> [ 1738.200120][ T104] __do_sys_newfstatat
>>> (/home/anders/src/kernel/next/fs/stat.c:446)
>>> [ 1738.204095][ T104] __arm64_sys_newfstatat
>>> (/home/anders/src/kernel/next/fs/stat.c:440
>>> /home/anders/src/kernel/next/fs/stat.c:440)
>>> [ 1738.207676][ T104] el0_svc_common.constprop.0
>>> (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38
>>> /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52
>>> /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139)
>>> [ 1738.211820][ T104] do_el0_svc
>>> (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188)
>>> [ 1738.214815][ T104] el0_svc
>>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133
>>> /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144
>>> /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648)
>>> [ 1738.217709][ T104] el0t_64_sync_handler
>>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666)
>>> [ 1738.221239][ T104] el0t_64_sync
>>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591)
>>> [ 1738.224502][  T104]
>>> [ 1738.226090][  T104] 1 lock held by systemd-udevd/104:
>>> [ 1738.229747][ T104] #0: ffff000006681e08 (&root-
>>>> kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission
>>> (/home/anders/src/kernel/next/fs/kernfs/inode.c:288)
>>> [ 1738.237504][  T104] irq event stamp: 108353
>>> [ 1738.240262][ T104] hardirqs last enabled at (108353):
>>> seqcount_lockdep_reader_access
>>> (/home/anders/src/kernel/next/include/linux/seqlock.h:105)
>>> [ 1738.247443][ T104] hardirqs last disabled at (108352):
>>> seqcount_lockdep_reader_access
>>> (/home/anders/src/kernel/next/include/linux/seqlock.h:104)
>>> [ 1738.254510][ T104] softirqs last enabled at (108326):
>>> fpsimd_restore_current_state
>>> (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264
>>> /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791)
>>> [ 1738.262187][ T104] softirqs last disabled at (108324):
>>> fpsimd_restore_current_state
>>> (/home/anders/src/kernel/next/include/linux/bottom_half.h:20
>>> /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242
>>> /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784)
>>> [ 1738.270239][  T104]
>>> [ 1738.272140][  T104] Reported by Kernel Concurrency Sanitizer on:
>>> [ 1738.285185][  T104] Hardware name: linux,dummy-virt (DT)
>>> [ 1738.288703][  T104]
>>> ==================================================================
>> This looks just like the original except a different lock is being
>>
>> used now.
>>
>>
>> The link count can't be less than two if set_nlink() is called.
>>
>>
>> Maybe I am missing something but the directory count is changed only
>>
>> while holding the root->kernfs_iattr_rwsem so the value used by
>>
>> set_nlink() will not change during concurrent calls to
>> refresh_inode().
>>
>> Still looks like a false positive, I'll check the write operations
>> again just to be sure.
> I do see a problem with recent changes.
>
> I'll send this off to Greg after I've done some testing (primarily just
> compile and function).
>
> Here's a patch which describes what I found.
>
> Comments are, of course, welcome, ;)

Anders I was hoping you would check if/what lockdep trace

you get with this patch.


Imran, I was hoping you would comment on my change as it

relates to the kernfs_iattr_rwsem changes.


Ian

>
> kernfs: fix missing kernfs_iattr_rwsem locking
>
> From: Ian Kent <raven@themaw.net>
>
> When the kernfs_iattr_rwsem was introduced a case was missed.
>
> The update of the kernfs directory node child count was also protected
> by the kernfs_rwsem and needs to be included in the change so that the
> child count (and so the inode n_link attribute) does not change while
> holding the rwsem for read.
>
> Fixes: 9caf696142 (kernfs: Introduce separate rwsem to protect inode
> attributes)
>
> Signed-off-by: Ian Kent <raven@themaw.net>
> Cc: Anders Roxell <anders.roxell@linaro.org>
> Cc: Imran Khan <imran.f.khan@oracle.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Eric Sandeen <sandeen@sandeen.net>
> ---
>   fs/kernfs/dir.c |    4 ++++
>   1 file changed, 4 insertions(+)
>
> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> index 45b6919903e6..6e84bb69602e 100644
> --- a/fs/kernfs/dir.c
> +++ b/fs/kernfs/dir.c
> @@ -383,9 +383,11 @@ static int kernfs_link_sibling(struct kernfs_node
> *kn)
>   	rb_insert_color(&kn->rb, &kn->parent->dir.children);
>   
>   	/* successfully added, account subdir number */
> +	down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>   	if (kernfs_type(kn) == KERNFS_DIR)
>   		kn->parent->dir.subdirs++;
>   	kernfs_inc_rev(kn->parent);
> +	up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>   
>   	return 0;
>   }
> @@ -408,9 +410,11 @@ static bool kernfs_unlink_sibling(struct
> kernfs_node *kn)
>   	if (RB_EMPTY_NODE(&kn->rb))
>   		return false;
>   
> +	down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>   	if (kernfs_type(kn) == KERNFS_DIR)
>   		kn->parent->dir.subdirs--;
>   	kernfs_inc_rev(kn->parent);
> +	up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>   
>   	rb_erase(&kn->rb, &kn->parent->dir.children);
>   	RB_CLEAR_NODE(&kn->rb);
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] kernfs: dont take i_lock on inode attr read
  2023-07-27  0:38                     ` Ian Kent
@ 2023-07-27  4:30                       ` Imran Khan
  2023-07-27  5:35                         ` Imran Khan
  2023-07-28  0:00                         ` Ian Kent
  0 siblings, 2 replies; 25+ messages in thread
From: Imran Khan @ 2023-07-27  4:30 UTC (permalink / raw)
  To: Ian Kent, Anders Roxell
  Cc: Arnd Bergmann, Tejun Heo, Greg Kroah-Hartman, Minchan Kim,
	Eric Sandeen, Alexander Viro, Rick Lindsley, David Howells,
	Miklos Szeredi, Carlos Maiolino, linux-fsdevel,
	Kernel Mailing List, elver

Hello Ian,
Sorry for late reply. I was about to reply this week.

On 27/7/2023 10:38 am, Ian Kent wrote:
> On 20/7/23 10:03, Ian Kent wrote:
>> On Wed, 2023-07-19 at 12:23 +0800, Ian Kent wrote:

[...]
>> I do see a problem with recent changes.
>>
>> I'll send this off to Greg after I've done some testing (primarily just
>> compile and function).
>>
>> Here's a patch which describes what I found.
>>
>> Comments are, of course, welcome, ;)
> 
> Anders I was hoping you would check if/what lockdep trace
> 
> you get with this patch.
> 
> 
> Imran, I was hoping you would comment on my change as it
> 
> relates to the kernfs_iattr_rwsem changes.
> 
> 
> Ian
> 
>>
>> kernfs: fix missing kernfs_iattr_rwsem locking
>>
>> From: Ian Kent <raven@themaw.net>
>>
>> When the kernfs_iattr_rwsem was introduced a case was missed.
>>
>> The update of the kernfs directory node child count was also protected
>> by the kernfs_rwsem and needs to be included in the change so that the
>> child count (and so the inode n_link attribute) does not change while
>> holding the rwsem for read.
>>

kernfs direcytory node's child count changes in kernfs_(un)link_sibling and
these are getting invoked while adding (kernfs_add_one),
removing(__kernfs_remove) or moving (kernfs_rename_ns)a node. Each of these
operations proceed under kernfs_rwsem and I see each invocation of
kernfs_link/unlink_sibling during the above mentioned operations is happening
under kernfs_rwsem.
So the child count should still be protected by kernfs_rwsem and we should not
need to acquire kernfs_iattr_rwsem in kernfs_link/unlink_sibling.

Kindly let me know your thoughts. I would still like to see new lockdep traces
with this change.

Thanks,
Imran

>> Fixes: 9caf696142 (kernfs: Introduce separate rwsem to protect inode
>> attributes)
>>
>> Signed-off-by: Ian Kent <raven@themaw.net>
>> Cc: Anders Roxell <anders.roxell@linaro.org>
>> Cc: Imran Khan <imran.f.khan@oracle.com>
>> Cc: Arnd Bergmann <arnd@arndb.de>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Cc: Eric Sandeen <sandeen@sandeen.net>
>> ---
>>   fs/kernfs/dir.c |    4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
>> index 45b6919903e6..6e84bb69602e 100644
>> --- a/fs/kernfs/dir.c
>> +++ b/fs/kernfs/dir.c
>> @@ -383,9 +383,11 @@ static int kernfs_link_sibling(struct kernfs_node
>> *kn)
>>       rb_insert_color(&kn->rb, &kn->parent->dir.children);
>>         /* successfully added, account subdir number */
>> +    down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>       if (kernfs_type(kn) == KERNFS_DIR)
>>           kn->parent->dir.subdirs++;
>>       kernfs_inc_rev(kn->parent);
>> +    up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>         return 0;
>>   }
>> @@ -408,9 +410,11 @@ static bool kernfs_unlink_sibling(struct
>> kernfs_node *kn)
>>       if (RB_EMPTY_NODE(&kn->rb))
>>           return false;
>>   +    down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>       if (kernfs_type(kn) == KERNFS_DIR)
>>           kn->parent->dir.subdirs--;
>>       kernfs_inc_rev(kn->parent);
>> +    up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>         rb_erase(&kn->rb, &kn->parent->dir.children);
>>       RB_CLEAR_NODE(&kn->rb);
>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] kernfs: dont take i_lock on inode attr read
  2023-07-27  4:30                       ` Imran Khan
@ 2023-07-27  5:35                         ` Imran Khan
  2023-07-28  0:00                         ` Ian Kent
  1 sibling, 0 replies; 25+ messages in thread
From: Imran Khan @ 2023-07-27  5:35 UTC (permalink / raw)
  To: Ian Kent, Anders Roxell
  Cc: Arnd Bergmann, Tejun Heo, Greg Kroah-Hartman, Minchan Kim,
	Eric Sandeen, Alexander Viro, Rick Lindsley, David Howells,
	Miklos Szeredi, Carlos Maiolino, linux-fsdevel,
	Kernel Mailing List, elver

Hello again Ian,
I take back my previous comment :).

On 27/7/2023 2:30 pm, Imran Khan wrote:
> Hello Ian,
> Sorry for late reply. I was about to reply this week.
> 
> On 27/7/2023 10:38 am, Ian Kent wrote:
>> On 20/7/23 10:03, Ian Kent wrote:
>>> On Wed, 2023-07-19 at 12:23 +0800, Ian Kent wrote:
> 
> [...]
>>> I do see a problem with recent changes.
>>>
>>> I'll send this off to Greg after I've done some testing (primarily just
>>> compile and function).
>>>
>>> Here's a patch which describes what I found.
>>>
>>> Comments are, of course, welcome, ;)
>>
>> Anders I was hoping you would check if/what lockdep trace
>>
>> you get with this patch.
>>
>>
>> Imran, I was hoping you would comment on my change as it
>>
>> relates to the kernfs_iattr_rwsem changes.
>>
>>
>> Ian
>>
>>>
>>> kernfs: fix missing kernfs_iattr_rwsem locking
>>>
>>> From: Ian Kent <raven@themaw.net>
>>>
>>> When the kernfs_iattr_rwsem was introduced a case was missed.
>>>
>>> The update of the kernfs directory node child count was also protected
>>> by the kernfs_rwsem and needs to be included in the change so that the
>>> child count (and so the inode n_link attribute) does not change while
>>> holding the rwsem for read.
>>>
> 
> kernfs direcytory node's child count changes in kernfs_(un)link_sibling and
> these are getting invoked while adding (kernfs_add_one),
> removing(__kernfs_remove) or moving (kernfs_rename_ns)a node. Each of these
> operations proceed under kernfs_rwsem and I see each invocation of
> kernfs_link/unlink_sibling during the above mentioned operations is happening
> under kernfs_rwsem.
> So the child count should still be protected by kernfs_rwsem and we should not
> need to acquire kernfs_iattr_rwsem in kernfs_link/unlink_sibling.
> 

kernfs_refresh_inode can still race against kernfs_link/unlink_siblings. So your
change looks good to me.
My tests are not showing any issues either. ( I tested on 4.14 and 5.4 kernels
as well).

Fee free to add my RB.

Reviewed-by: Imran Khan <imran.f.khan@oracle.com>

> Kindly let me know your thoughts. I would still like to see new lockdep traces
> with this change.
> 
> Thanks,
> Imran
> 
>>> Fixes: 9caf696142 (kernfs: Introduce separate rwsem to protect inode
>>> attributes)
>>>
>>> Signed-off-by: Ian Kent <raven@themaw.net>
>>> Cc: Anders Roxell <anders.roxell@linaro.org>
>>> Cc: Imran Khan <imran.f.khan@oracle.com>
>>> Cc: Arnd Bergmann <arnd@arndb.de>
>>> Cc: Minchan Kim <minchan@kernel.org>
>>> Cc: Eric Sandeen <sandeen@sandeen.net>
>>> ---
>>>   fs/kernfs/dir.c |    4 ++++
>>>   1 file changed, 4 insertions(+)
>>>
>>> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
>>> index 45b6919903e6..6e84bb69602e 100644
>>> --- a/fs/kernfs/dir.c
>>> +++ b/fs/kernfs/dir.c
>>> @@ -383,9 +383,11 @@ static int kernfs_link_sibling(struct kernfs_node
>>> *kn)
>>>       rb_insert_color(&kn->rb, &kn->parent->dir.children);
>>>         /* successfully added, account subdir number */
>>> +    down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>       if (kernfs_type(kn) == KERNFS_DIR)
>>>           kn->parent->dir.subdirs++;
>>>       kernfs_inc_rev(kn->parent);
>>> +    up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>         return 0;
>>>   }
>>> @@ -408,9 +410,11 @@ static bool kernfs_unlink_sibling(struct
>>> kernfs_node *kn)
>>>       if (RB_EMPTY_NODE(&kn->rb))
>>>           return false;
>>>   +    down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>       if (kernfs_type(kn) == KERNFS_DIR)
>>>           kn->parent->dir.subdirs--;
>>>       kernfs_inc_rev(kn->parent);
>>> +    up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>         rb_erase(&kn->rb, &kn->parent->dir.children);
>>>       RB_CLEAR_NODE(&kn->rb);
>>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] kernfs: dont take i_lock on inode attr read
  2023-07-27  4:30                       ` Imran Khan
  2023-07-27  5:35                         ` Imran Khan
@ 2023-07-28  0:00                         ` Ian Kent
  2023-07-28  0:16                           ` Ian Kent
  1 sibling, 1 reply; 25+ messages in thread
From: Ian Kent @ 2023-07-28  0:00 UTC (permalink / raw)
  To: Imran Khan, Anders Roxell
  Cc: Arnd Bergmann, Tejun Heo, Greg Kroah-Hartman, Minchan Kim,
	Eric Sandeen, Alexander Viro, Rick Lindsley, David Howells,
	Miklos Szeredi, Carlos Maiolino, linux-fsdevel,
	Kernel Mailing List, elver

On 27/7/23 12:30, Imran Khan wrote:
> Hello Ian,
> Sorry for late reply. I was about to reply this week.
>
> On 27/7/2023 10:38 am, Ian Kent wrote:
>> On 20/7/23 10:03, Ian Kent wrote:
>>> On Wed, 2023-07-19 at 12:23 +0800, Ian Kent wrote:
> [...]
>>> I do see a problem with recent changes.
>>>
>>> I'll send this off to Greg after I've done some testing (primarily just
>>> compile and function).
>>>
>>> Here's a patch which describes what I found.
>>>
>>> Comments are, of course, welcome, ;)
>> Anders I was hoping you would check if/what lockdep trace
>>
>> you get with this patch.
>>
>>
>> Imran, I was hoping you would comment on my change as it
>>
>> relates to the kernfs_iattr_rwsem changes.
>>
>>
>> Ian
>>
>>> kernfs: fix missing kernfs_iattr_rwsem locking
>>>
>>> From: Ian Kent <raven@themaw.net>
>>>
>>> When the kernfs_iattr_rwsem was introduced a case was missed.
>>>
>>> The update of the kernfs directory node child count was also protected
>>> by the kernfs_rwsem and needs to be included in the change so that the
>>> child count (and so the inode n_link attribute) does not change while
>>> holding the rwsem for read.
>>>
> kernfs direcytory node's child count changes in kernfs_(un)link_sibling and
> these are getting invoked while adding (kernfs_add_one),
> removing(__kernfs_remove) or moving (kernfs_rename_ns)a node. Each of these
> operations proceed under kernfs_rwsem and I see each invocation of
> kernfs_link/unlink_sibling during the above mentioned operations is happening
> under kernfs_rwsem.
> So the child count should still be protected by kernfs_rwsem and we should not
> need to acquire kernfs_iattr_rwsem in kernfs_link/unlink_sibling.

Yes, that's exactly what I intended (assuming you mean write lock in 
those cases)

when I did it so now I wonder what I saw that lead to my patch, I'll 
need to look

again ...


>
> Kindly let me know your thoughts. I would still like to see new lockdep traces
> with this change.

Indeed, I hope Anders can find time to get the trace.


Ian

>
> Thanks,
> Imran
>
>>> Fixes: 9caf696142 (kernfs: Introduce separate rwsem to protect inode
>>> attributes)
>>>
>>> Signed-off-by: Ian Kent <raven@themaw.net>
>>> Cc: Anders Roxell <anders.roxell@linaro.org>
>>> Cc: Imran Khan <imran.f.khan@oracle.com>
>>> Cc: Arnd Bergmann <arnd@arndb.de>
>>> Cc: Minchan Kim <minchan@kernel.org>
>>> Cc: Eric Sandeen <sandeen@sandeen.net>
>>> ---
>>>    fs/kernfs/dir.c |    4 ++++
>>>    1 file changed, 4 insertions(+)
>>>
>>> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
>>> index 45b6919903e6..6e84bb69602e 100644
>>> --- a/fs/kernfs/dir.c
>>> +++ b/fs/kernfs/dir.c
>>> @@ -383,9 +383,11 @@ static int kernfs_link_sibling(struct kernfs_node
>>> *kn)
>>>        rb_insert_color(&kn->rb, &kn->parent->dir.children);
>>>          /* successfully added, account subdir number */
>>> +    down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>        if (kernfs_type(kn) == KERNFS_DIR)
>>>            kn->parent->dir.subdirs++;
>>>        kernfs_inc_rev(kn->parent);
>>> +    up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>          return 0;
>>>    }
>>> @@ -408,9 +410,11 @@ static bool kernfs_unlink_sibling(struct
>>> kernfs_node *kn)
>>>        if (RB_EMPTY_NODE(&kn->rb))
>>>            return false;
>>>    +    down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>        if (kernfs_type(kn) == KERNFS_DIR)
>>>            kn->parent->dir.subdirs--;
>>>        kernfs_inc_rev(kn->parent);
>>> +    up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>          rb_erase(&kn->rb, &kn->parent->dir.children);
>>>        RB_CLEAR_NODE(&kn->rb);
>>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] kernfs: dont take i_lock on inode attr read
  2023-07-28  0:00                         ` Ian Kent
@ 2023-07-28  0:16                           ` Ian Kent
  2023-07-28  1:06                             ` Imran Khan
  0 siblings, 1 reply; 25+ messages in thread
From: Ian Kent @ 2023-07-28  0:16 UTC (permalink / raw)
  To: Imran Khan, Anders Roxell
  Cc: Arnd Bergmann, Tejun Heo, Greg Kroah-Hartman, Minchan Kim,
	Eric Sandeen, Alexander Viro, Rick Lindsley, David Howells,
	Miklos Szeredi, Carlos Maiolino, linux-fsdevel,
	Kernel Mailing List, elver

On 28/7/23 08:00, Ian Kent wrote:
> On 27/7/23 12:30, Imran Khan wrote:
>> Hello Ian,
>> Sorry for late reply. I was about to reply this week.
>>
>> On 27/7/2023 10:38 am, Ian Kent wrote:
>>> On 20/7/23 10:03, Ian Kent wrote:
>>>> On Wed, 2023-07-19 at 12:23 +0800, Ian Kent wrote:
>> [...]
>>>> I do see a problem with recent changes.
>>>>
>>>> I'll send this off to Greg after I've done some testing (primarily 
>>>> just
>>>> compile and function).
>>>>
>>>> Here's a patch which describes what I found.
>>>>
>>>> Comments are, of course, welcome, ;)
>>> Anders I was hoping you would check if/what lockdep trace
>>>
>>> you get with this patch.
>>>
>>>
>>> Imran, I was hoping you would comment on my change as it
>>>
>>> relates to the kernfs_iattr_rwsem changes.
>>>
>>>
>>> Ian
>>>
>>>> kernfs: fix missing kernfs_iattr_rwsem locking
>>>>
>>>> From: Ian Kent <raven@themaw.net>
>>>>
>>>> When the kernfs_iattr_rwsem was introduced a case was missed.
>>>>
>>>> The update of the kernfs directory node child count was also protected
>>>> by the kernfs_rwsem and needs to be included in the change so that the
>>>> child count (and so the inode n_link attribute) does not change while
>>>> holding the rwsem for read.
>>>>
>> kernfs direcytory node's child count changes in 
>> kernfs_(un)link_sibling and
>> these are getting invoked while adding (kernfs_add_one),
>> removing(__kernfs_remove) or moving (kernfs_rename_ns)a node. Each of 
>> these
>> operations proceed under kernfs_rwsem and I see each invocation of
>> kernfs_link/unlink_sibling during the above mentioned operations is 
>> happening
>> under kernfs_rwsem.
>> So the child count should still be protected by kernfs_rwsem and we 
>> should not
>> need to acquire kernfs_iattr_rwsem in kernfs_link/unlink_sibling.
>
> Yes, that's exactly what I intended (assuming you mean write lock in 
> those cases)
>
> when I did it so now I wonder what I saw that lead to my patch, I'll 
> need to look
>
> again ...

Ahh, I see why I thought this ...

It's the hunk:

@@ -285,10 +285,10 @@ int kernfs_iop_permission(struct mnt_idmap *idmap,
         kn = inode->i_private;
         root = kernfs_root(kn);

-       down_read(&root->kernfs_rwsem);
+       down_read(&root->kernfs_iattr_rwsem);
         kernfs_refresh_inode(kn, inode);
         ret = generic_permission(&nop_mnt_idmap, inode, mask);
-       up_read(&root->kernfs_rwsem);
+       up_read(&root->kernfs_iattr_rwsem);

         return ret;
  }

which takes away the kernfs_rwsem and introduces the possibility of

the change. It may be more instructive to add back taking the read

lock of kernfs_rwsem in .permission() than altering the sibling link

and unlink functions, I mean I even caught myself on it.


Ian

>
>
>>
>> Kindly let me know your thoughts. I would still like to see new 
>> lockdep traces
>> with this change.
>
> Indeed, I hope Anders can find time to get the trace.
>
>
> Ian
>
>>
>> Thanks,
>> Imran
>>
>>>> Fixes: 9caf696142 (kernfs: Introduce separate rwsem to protect inode
>>>> attributes)
>>>>
>>>> Signed-off-by: Ian Kent <raven@themaw.net>
>>>> Cc: Anders Roxell <anders.roxell@linaro.org>
>>>> Cc: Imran Khan <imran.f.khan@oracle.com>
>>>> Cc: Arnd Bergmann <arnd@arndb.de>
>>>> Cc: Minchan Kim <minchan@kernel.org>
>>>> Cc: Eric Sandeen <sandeen@sandeen.net>
>>>> ---
>>>>    fs/kernfs/dir.c |    4 ++++
>>>>    1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
>>>> index 45b6919903e6..6e84bb69602e 100644
>>>> --- a/fs/kernfs/dir.c
>>>> +++ b/fs/kernfs/dir.c
>>>> @@ -383,9 +383,11 @@ static int kernfs_link_sibling(struct kernfs_node
>>>> *kn)
>>>>        rb_insert_color(&kn->rb, &kn->parent->dir.children);
>>>>          /* successfully added, account subdir number */
>>>> + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>>        if (kernfs_type(kn) == KERNFS_DIR)
>>>>            kn->parent->dir.subdirs++;
>>>>        kernfs_inc_rev(kn->parent);
>>>> +    up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>>          return 0;
>>>>    }
>>>> @@ -408,9 +410,11 @@ static bool kernfs_unlink_sibling(struct
>>>> kernfs_node *kn)
>>>>        if (RB_EMPTY_NODE(&kn->rb))
>>>>            return false;
>>>>    + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>>        if (kernfs_type(kn) == KERNFS_DIR)
>>>>            kn->parent->dir.subdirs--;
>>>>        kernfs_inc_rev(kn->parent);
>>>> +    up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>>          rb_erase(&kn->rb, &kn->parent->dir.children);
>>>>        RB_CLEAR_NODE(&kn->rb);
>>>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] kernfs: dont take i_lock on inode attr read
  2023-07-28  0:16                           ` Ian Kent
@ 2023-07-28  1:06                             ` Imran Khan
  2023-07-28  1:29                               ` Ian Kent
  0 siblings, 1 reply; 25+ messages in thread
From: Imran Khan @ 2023-07-28  1:06 UTC (permalink / raw)
  To: Ian Kent, Anders Roxell
  Cc: Arnd Bergmann, Tejun Heo, Greg Kroah-Hartman, Minchan Kim,
	Eric Sandeen, Alexander Viro, Rick Lindsley, David Howells,
	Miklos Szeredi, Carlos Maiolino, linux-fsdevel,
	Kernel Mailing List, elver

Hello Ian,

On 28/7/2023 10:16 am, Ian Kent wrote:
> On 28/7/23 08:00, Ian Kent wrote:
>> On 27/7/23 12:30, Imran Khan wrote:
>>> Hello Ian,
>>> Sorry for late reply. I was about to reply this week.
>>>
>>> On 27/7/2023 10:38 am, Ian Kent wrote:
>>>> On 20/7/23 10:03, Ian Kent wrote:
>>>>> On Wed, 2023-07-19 at 12:23 +0800, Ian Kent wrote:
>>> [...]
>>>>> I do see a problem with recent changes.
>>>>>
>>>>> I'll send this off to Greg after I've done some testing (primarily just
>>>>> compile and function).
>>>>>
>>>>> Here's a patch which describes what I found.
>>>>>
>>>>> Comments are, of course, welcome, ;)
>>>> Anders I was hoping you would check if/what lockdep trace
>>>>
>>>> you get with this patch.
>>>>
>>>>
>>>> Imran, I was hoping you would comment on my change as it
>>>>
>>>> relates to the kernfs_iattr_rwsem changes.
>>>>
>>>>
>>>> Ian
>>>>
>>>>> kernfs: fix missing kernfs_iattr_rwsem locking
>>>>>
>>>>> From: Ian Kent <raven@themaw.net>
>>>>>
>>>>> When the kernfs_iattr_rwsem was introduced a case was missed.
>>>>>
>>>>> The update of the kernfs directory node child count was also protected
>>>>> by the kernfs_rwsem and needs to be included in the change so that the
>>>>> child count (and so the inode n_link attribute) does not change while
>>>>> holding the rwsem for read.
>>>>>
>>> kernfs direcytory node's child count changes in kernfs_(un)link_sibling and
>>> these are getting invoked while adding (kernfs_add_one),
>>> removing(__kernfs_remove) or moving (kernfs_rename_ns)a node. Each of these
>>> operations proceed under kernfs_rwsem and I see each invocation of
>>> kernfs_link/unlink_sibling during the above mentioned operations is happening
>>> under kernfs_rwsem.
>>> So the child count should still be protected by kernfs_rwsem and we should not
>>> need to acquire kernfs_iattr_rwsem in kernfs_link/unlink_sibling.
>>
>> Yes, that's exactly what I intended (assuming you mean write lock in those cases)
>>
>> when I did it so now I wonder what I saw that lead to my patch, I'll need to look
>>
>> again ...
> 
> Ahh, I see why I thought this ...
> 
> It's the hunk:
> 
> @@ -285,10 +285,10 @@ int kernfs_iop_permission(struct mnt_idmap *idmap,
>         kn = inode->i_private;
>         root = kernfs_root(kn);
> 
> -       down_read(&root->kernfs_rwsem);
> +       down_read(&root->kernfs_iattr_rwsem);
>         kernfs_refresh_inode(kn, inode);
>         ret = generic_permission(&nop_mnt_idmap, inode, mask);
> -       up_read(&root->kernfs_rwsem);
> +       up_read(&root->kernfs_iattr_rwsem);
> 
>         return ret;
>  }
> 
> which takes away the kernfs_rwsem and introduces the possibility of
> 
> the change. It may be more instructive to add back taking the read
> 
> lock of kernfs_rwsem in .permission() than altering the sibling link
> 
> and unlink functions, I mean I even caught myself on it.
> 

Yes this was the block I referred to in my second comment [1]. However adding
back read lock of kernfs_rwsem in .permission() will again introduce the
bottleneck that I mentioned at [2]. So I think taking taking the locks in
link/unlink is the better option.
I understand having one lock to synchronize everything makes it easier
debug/development wise but sometimes (such as the case mentioned in [2]), it is
not optimum performance wise.
Thoughts ?

Thanks,
Imran

[1]: https://lore.kernel.org/all/8b0a1619-1e39-fc3a-1226-f3b167e64646@oracle.com/
[2]: https://lore.kernel.org/all/20230302043203.1695051-2-imran.f.khan@oracle.com/
> 
> Ian
> 
>>
>>
>>>
>>> Kindly let me know your thoughts. I would still like to see new lockdep traces
>>> with this change.
>>
>> Indeed, I hope Anders can find time to get the trace.
>>
>>
>> Ian
>>
>>>
>>> Thanks,
>>> Imran
>>>
>>>>> Fixes: 9caf696142 (kernfs: Introduce separate rwsem to protect inode
>>>>> attributes)
>>>>>
>>>>> Signed-off-by: Ian Kent <raven@themaw.net>
>>>>> Cc: Anders Roxell <anders.roxell@linaro.org>
>>>>> Cc: Imran Khan <imran.f.khan@oracle.com>
>>>>> Cc: Arnd Bergmann <arnd@arndb.de>
>>>>> Cc: Minchan Kim <minchan@kernel.org>
>>>>> Cc: Eric Sandeen <sandeen@sandeen.net>
>>>>> ---
>>>>>    fs/kernfs/dir.c |    4 ++++
>>>>>    1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
>>>>> index 45b6919903e6..6e84bb69602e 100644
>>>>> --- a/fs/kernfs/dir.c
>>>>> +++ b/fs/kernfs/dir.c
>>>>> @@ -383,9 +383,11 @@ static int kernfs_link_sibling(struct kernfs_node
>>>>> *kn)
>>>>>        rb_insert_color(&kn->rb, &kn->parent->dir.children);
>>>>>          /* successfully added, account subdir number */
>>>>> + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>>>        if (kernfs_type(kn) == KERNFS_DIR)
>>>>>            kn->parent->dir.subdirs++;
>>>>>        kernfs_inc_rev(kn->parent);
>>>>> +    up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>>>          return 0;
>>>>>    }
>>>>> @@ -408,9 +410,11 @@ static bool kernfs_unlink_sibling(struct
>>>>> kernfs_node *kn)
>>>>>        if (RB_EMPTY_NODE(&kn->rb))
>>>>>            return false;
>>>>>    + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>>>        if (kernfs_type(kn) == KERNFS_DIR)
>>>>>            kn->parent->dir.subdirs--;
>>>>>        kernfs_inc_rev(kn->parent);
>>>>> +    up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>>>          rb_erase(&kn->rb, &kn->parent->dir.children);
>>>>>        RB_CLEAR_NODE(&kn->rb);
>>>>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/2] kernfs: dont take i_lock on inode attr read
  2023-07-28  1:06                             ` Imran Khan
@ 2023-07-28  1:29                               ` Ian Kent
  0 siblings, 0 replies; 25+ messages in thread
From: Ian Kent @ 2023-07-28  1:29 UTC (permalink / raw)
  To: Imran Khan, Anders Roxell
  Cc: Arnd Bergmann, Tejun Heo, Greg Kroah-Hartman, Minchan Kim,
	Eric Sandeen, Alexander Viro, Rick Lindsley, David Howells,
	Miklos Szeredi, Carlos Maiolino, linux-fsdevel,
	Kernel Mailing List, elver

On 28/7/23 09:06, Imran Khan wrote:
> Hello Ian,
>
> On 28/7/2023 10:16 am, Ian Kent wrote:
>> On 28/7/23 08:00, Ian Kent wrote:
>>> On 27/7/23 12:30, Imran Khan wrote:
>>>> Hello Ian,
>>>> Sorry for late reply. I was about to reply this week.
>>>>
>>>> On 27/7/2023 10:38 am, Ian Kent wrote:
>>>>> On 20/7/23 10:03, Ian Kent wrote:
>>>>>> On Wed, 2023-07-19 at 12:23 +0800, Ian Kent wrote:
>>>> [...]
>>>>>> I do see a problem with recent changes.
>>>>>>
>>>>>> I'll send this off to Greg after I've done some testing (primarily just
>>>>>> compile and function).
>>>>>>
>>>>>> Here's a patch which describes what I found.
>>>>>>
>>>>>> Comments are, of course, welcome, ;)
>>>>> Anders I was hoping you would check if/what lockdep trace
>>>>>
>>>>> you get with this patch.
>>>>>
>>>>>
>>>>> Imran, I was hoping you would comment on my change as it
>>>>>
>>>>> relates to the kernfs_iattr_rwsem changes.
>>>>>
>>>>>
>>>>> Ian
>>>>>
>>>>>> kernfs: fix missing kernfs_iattr_rwsem locking
>>>>>>
>>>>>> From: Ian Kent <raven@themaw.net>
>>>>>>
>>>>>> When the kernfs_iattr_rwsem was introduced a case was missed.
>>>>>>
>>>>>> The update of the kernfs directory node child count was also protected
>>>>>> by the kernfs_rwsem and needs to be included in the change so that the
>>>>>> child count (and so the inode n_link attribute) does not change while
>>>>>> holding the rwsem for read.
>>>>>>
>>>> kernfs direcytory node's child count changes in kernfs_(un)link_sibling and
>>>> these are getting invoked while adding (kernfs_add_one),
>>>> removing(__kernfs_remove) or moving (kernfs_rename_ns)a node. Each of these
>>>> operations proceed under kernfs_rwsem and I see each invocation of
>>>> kernfs_link/unlink_sibling during the above mentioned operations is happening
>>>> under kernfs_rwsem.
>>>> So the child count should still be protected by kernfs_rwsem and we should not
>>>> need to acquire kernfs_iattr_rwsem in kernfs_link/unlink_sibling.
>>> Yes, that's exactly what I intended (assuming you mean write lock in those cases)
>>>
>>> when I did it so now I wonder what I saw that lead to my patch, I'll need to look
>>>
>>> again ...
>> Ahh, I see why I thought this ...
>>
>> It's the hunk:
>>
>> @@ -285,10 +285,10 @@ int kernfs_iop_permission(struct mnt_idmap *idmap,
>>          kn = inode->i_private;
>>          root = kernfs_root(kn);
>>
>> -       down_read(&root->kernfs_rwsem);
>> +       down_read(&root->kernfs_iattr_rwsem);
>>          kernfs_refresh_inode(kn, inode);
>>          ret = generic_permission(&nop_mnt_idmap, inode, mask);
>> -       up_read(&root->kernfs_rwsem);
>> +       up_read(&root->kernfs_iattr_rwsem);
>>
>>          return ret;
>>   }
>>
>> which takes away the kernfs_rwsem and introduces the possibility of
>>
>> the change. It may be more instructive to add back taking the read
>>
>> lock of kernfs_rwsem in .permission() than altering the sibling link
>>
>> and unlink functions, I mean I even caught myself on it.
>>
> Yes this was the block I referred to in my second comment [1]. However adding
> back read lock of kernfs_rwsem in .permission() will again introduce the
> bottleneck that I mentioned at [2]. So I think taking taking the locks in
> link/unlink is the better option.

Yes, sorry, I always fall into the trap of not reading through all my

mail before commenting, oops!


The fact that .permission() is called so much more than the create/remove

functions also occurred to be me too just after I posted my comment (and

is probably why I originally did it that way).


I'll forward the patch for merge but would really like to see the lockdep

trace so I'll wait a little while ...

> I understand having one lock to synchronize everything makes it easier
> debug/development wise but sometimes (such as the case mentioned in [2]), it is
> not optimum performance wise.

Indeed, the performance improvement work that has been happening over

quite some time now is very good.


I had seen some opportunities around the file open path long ago but

hadn't got to doing anything there as you have, the work looks good

to me, thanks for doing it.


Ian

> Thoughts ?
>
> Thanks,
> Imran
>
> [1]: https://lore.kernel.org/all/8b0a1619-1e39-fc3a-1226-f3b167e64646@oracle.com/
> [2]: https://lore.kernel.org/all/20230302043203.1695051-2-imran.f.khan@oracle.com/
>> Ian
>>
>>>
>>>> Kindly let me know your thoughts. I would still like to see new lockdep traces
>>>> with this change.
>>> Indeed, I hope Anders can find time to get the trace.
>>>
>>>
>>> Ian
>>>
>>>> Thanks,
>>>> Imran
>>>>
>>>>>> Fixes: 9caf696142 (kernfs: Introduce separate rwsem to protect inode
>>>>>> attributes)
>>>>>>
>>>>>> Signed-off-by: Ian Kent <raven@themaw.net>
>>>>>> Cc: Anders Roxell <anders.roxell@linaro.org>
>>>>>> Cc: Imran Khan <imran.f.khan@oracle.com>
>>>>>> Cc: Arnd Bergmann <arnd@arndb.de>
>>>>>> Cc: Minchan Kim <minchan@kernel.org>
>>>>>> Cc: Eric Sandeen <sandeen@sandeen.net>
>>>>>> ---
>>>>>>     fs/kernfs/dir.c |    4 ++++
>>>>>>     1 file changed, 4 insertions(+)
>>>>>>
>>>>>> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
>>>>>> index 45b6919903e6..6e84bb69602e 100644
>>>>>> --- a/fs/kernfs/dir.c
>>>>>> +++ b/fs/kernfs/dir.c
>>>>>> @@ -383,9 +383,11 @@ static int kernfs_link_sibling(struct kernfs_node
>>>>>> *kn)
>>>>>>         rb_insert_color(&kn->rb, &kn->parent->dir.children);
>>>>>>           /* successfully added, account subdir number */
>>>>>> + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>>>>         if (kernfs_type(kn) == KERNFS_DIR)
>>>>>>             kn->parent->dir.subdirs++;
>>>>>>         kernfs_inc_rev(kn->parent);
>>>>>> +    up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>>>>           return 0;
>>>>>>     }
>>>>>> @@ -408,9 +410,11 @@ static bool kernfs_unlink_sibling(struct
>>>>>> kernfs_node *kn)
>>>>>>         if (RB_EMPTY_NODE(&kn->rb))
>>>>>>             return false;
>>>>>>     + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>>>>         if (kernfs_type(kn) == KERNFS_DIR)
>>>>>>             kn->parent->dir.subdirs--;
>>>>>>         kernfs_inc_rev(kn->parent);
>>>>>> +    up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>>>>           rb_erase(&kn->rb, &kn->parent->dir.children);
>>>>>>         RB_CLEAR_NODE(&kn->rb);
>>>>>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2023-07-28  1:29 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-18  2:32 [PATCH 0/2] kernfs: remove i_lock usage that isn't needed Ian Kent
2022-10-18  2:32 ` [PATCH 1/2] kernfs: dont take i_lock on inode attr read Ian Kent
2022-10-24  8:50   ` Miklos Szeredi
2022-10-31 22:30   ` Tejun Heo
2022-12-21 13:34     ` Anders Roxell
2022-12-22 23:11       ` Ian Kent
2022-12-29  9:20         ` Arnd Bergmann
2022-12-29 13:07           ` Ian Kent
2023-01-23  3:11             ` Ian Kent
2023-07-18 19:00               ` Anders Roxell
2023-07-19  4:23                 ` Ian Kent
2023-07-20  2:03                   ` Ian Kent
2023-07-26 13:49                     ` Miklos Szeredi
2023-07-27  0:38                     ` Ian Kent
2023-07-27  4:30                       ` Imran Khan
2023-07-27  5:35                         ` Imran Khan
2023-07-28  0:00                         ` Ian Kent
2023-07-28  0:16                           ` Ian Kent
2023-07-28  1:06                             ` Imran Khan
2023-07-28  1:29                               ` Ian Kent
2022-10-18  2:32 ` [PATCH 2/2] kernfs: dont take i_lock on revalidate Ian Kent
2022-10-24  8:38   ` Miklos Szeredi
2022-10-31 22:31   ` Tejun Heo
2022-11-01  7:46   ` Amir Goldstein
2022-11-01  8:09     ` Ian Kent

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).