All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHES] RCU pathwalk race fixes
@ 2024-02-04  2:14 Al Viro
  2024-02-04  2:17 ` [PATCH 01/13] fs/super.c: don't drop ->s_user_ns until we free struct super_block itself Al Viro
                   ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Al Viro @ 2024-02-04  2:14 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

	We still have some races in filesystem methods when exposed
to RCU pathwalk.  The series below is a result of code audit (the
second round of it) and it should deal with most of that stuff.
Exceptions: ntfs3 ->d_hash()/->d_compare() and ceph_d_revalidate().
Up to maintainers (a note for NTFS folks - when documentation says
that a method may not block, it *does* imply that blocking allocations
are to be avoided.  Really).

	Branch is 6.8-rc1-based; it lives in
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git fixes.pathwalk-rcu

Individual patches are in followups; ditto for code audit notes.  Beginning
of the latter should probably be converted into docs; if anyone is willing
to help with such conversion, please say so - I'll be glad to answer any
questions, etc.

If somebody wants to grab bits and pieces of that series into individual
filesystem git trees, please say so.  Same for any problems spotted in
the patches, obviously.  If nothing shows up, that goes into #fixes and
into mainline.

Shortlog:
      fs/super.c: don't drop ->s_user_ns until we free struct super_block itself
      rcu pathwalk: prevent bogus hard errors from may_lookup()
      affs: free affs_sb_info with kfree_rcu()
      exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper
      hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info
      afs: fix __afs_break_callback() / afs_drop_open_mmap() race
      nfs: make nfs_set_verifier() safe for use in RCU pathwalk
      nfs: fix UAF on pathwalk running into umount
      procfs: move dropping pde and pid from ->evict_inode() to ->free_inode()
      procfs: make freeing proc_fs_info rcu-delayed
      fuse: fix UAF in rcu pathwalks
      cifs_get_link(): bail out in unsafe case
      ext4_get_link(): fix breakage in RCU mode

Diffstat:
 fs/affs/affs.h            |  1 +
 fs/affs/super.c           |  2 +-
 fs/afs/file.c             |  8 ++++++--
 fs/exfat/exfat_fs.h       |  1 +
 fs/exfat/nls.c            | 14 ++++----------
 fs/exfat/super.c          | 20 +++++++++++---------
 fs/ext4/symlink.c         |  8 +++++---
 fs/fuse/cuse.c            |  3 +--
 fs/fuse/fuse_i.h          |  1 +
 fs/fuse/inode.c           | 15 +++++++++++----
 fs/hfsplus/hfsplus_fs.h   |  1 +
 fs/hfsplus/super.c        | 12 +++++++++---
 fs/namei.c                |  6 +++++-
 fs/nfs/client.c           | 13 ++++++++++---
 fs/nfs/dir.c              |  4 ++--
 fs/proc/base.c            |  2 --
 fs/proc/inode.c           | 19 ++++++++-----------
 fs/proc/root.c            |  2 +-
 fs/smb/client/cifsfs.c    |  3 +++
 fs/super.c                | 13 ++++---------
 include/linux/nfs_fs_sb.h |  2 ++
 include/linux/proc_fs.h   |  1 +
 22 files changed, 88 insertions(+), 63 deletions(-)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 01/13] fs/super.c: don't drop ->s_user_ns until we free struct super_block itself
  2024-02-04  2:14 [PATCHES] RCU pathwalk race fixes Al Viro
@ 2024-02-04  2:17 ` Al Viro
  2024-02-04  2:17   ` [PATCH 02/13] rcu pathwalk: prevent bogus hard errors from may_lookup() Al Viro
                     ` (13 more replies)
  2024-02-04  2:27 ` RCU pathwalk audit notes Al Viro
  2024-02-05 12:48 ` [PATCHES] RCU pathwalk race fixes Jeff Layton
  2 siblings, 14 replies; 36+ messages in thread
From: Al Viro @ 2024-02-04  2:17 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

Avoids fun races in RCU pathwalk...  Same goes for freeing LSM shite
hanging off super_block's arse.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/super.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index d35e85295489..d6efeba0d0ce 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -274,9 +274,10 @@ static void destroy_super_work(struct work_struct *work)
 {
 	struct super_block *s = container_of(work, struct super_block,
 							destroy_work);
-	int i;
-
-	for (i = 0; i < SB_FREEZE_LEVELS; i++)
+	security_sb_free(s);
+	put_user_ns(s->s_user_ns);
+	kfree(s->s_subtype);
+	for (int i = 0; i < SB_FREEZE_LEVELS; i++)
 		percpu_free_rwsem(&s->s_writers.rw_sem[i]);
 	kfree(s);
 }
@@ -296,9 +297,6 @@ static void destroy_unused_super(struct super_block *s)
 	super_unlock_excl(s);
 	list_lru_destroy(&s->s_dentry_lru);
 	list_lru_destroy(&s->s_inode_lru);
-	security_sb_free(s);
-	put_user_ns(s->s_user_ns);
-	kfree(s->s_subtype);
 	shrinker_free(s->s_shrink);
 	/* no delays needed */
 	destroy_super_work(&s->destroy_work);
@@ -409,9 +407,6 @@ static void __put_super(struct super_block *s)
 		WARN_ON(s->s_dentry_lru.node);
 		WARN_ON(s->s_inode_lru.node);
 		WARN_ON(!list_empty(&s->s_mounts));
-		security_sb_free(s);
-		put_user_ns(s->s_user_ns);
-		kfree(s->s_subtype);
 		call_rcu(&s->rcu, destroy_super_rcu);
 	}
 }
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 02/13] rcu pathwalk: prevent bogus hard errors from may_lookup()
  2024-02-04  2:17 ` [PATCH 01/13] fs/super.c: don't drop ->s_user_ns until we free struct super_block itself Al Viro
@ 2024-02-04  2:17   ` Al Viro
  2024-02-05 12:26     ` Christian Brauner
  2024-02-04  2:17   ` [PATCH 03/13] affs: free affs_sb_info with kfree_rcu() Al Viro
                     ` (12 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Al Viro @ 2024-02-04  2:17 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

If lazy call of ->permission() returns a hard error, check that
try_to_unlazy() succeeds before returning it.  That both makes
life easier for ->permission() instances and closes the race
in ENOTDIR handling - it is possible that positive d_can_lookup()
seen in link_path_walk() applies to the state *after* unlink() +
mkdir(), while nd->inode matches the state prior to that.

Normally seeing e.g. EACCES from permission check in rcu pathwalk
means that with some timings non-rcu pathwalk would've run into
the same; however, running into a non-executable regular file
in the middle of a pathname would not get to permission check -
it would fail with ENOTDIR instead.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index 4e0de939fea1..9342fa6a38c2 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1717,7 +1717,11 @@ static inline int may_lookup(struct mnt_idmap *idmap,
 {
 	if (nd->flags & LOOKUP_RCU) {
 		int err = inode_permission(idmap, nd->inode, MAY_EXEC|MAY_NOT_BLOCK);
-		if (err != -ECHILD || !try_to_unlazy(nd))
+		if (!err)		// success, keep going
+			return 0;
+		if (!try_to_unlazy(nd))
+			return -ECHILD;	// redo it all non-lazy
+		if (err != -ECHILD)	// hard error
 			return err;
 	}
 	return inode_permission(idmap, nd->inode, MAY_EXEC);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 03/13] affs: free affs_sb_info with kfree_rcu()
  2024-02-04  2:17 ` [PATCH 01/13] fs/super.c: don't drop ->s_user_ns until we free struct super_block itself Al Viro
  2024-02-04  2:17   ` [PATCH 02/13] rcu pathwalk: prevent bogus hard errors from may_lookup() Al Viro
@ 2024-02-04  2:17   ` Al Viro
  2024-02-05 12:26     ` Christian Brauner
  2024-02-04  2:17   ` [PATCH 04/13] exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper Al Viro
                     ` (11 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Al Viro @ 2024-02-04  2:17 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

one of the flags in it is used by ->d_hash()/->d_compare()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/affs/affs.h  | 1 +
 fs/affs/super.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/affs/affs.h b/fs/affs/affs.h
index 60685ec76d98..2e612834329a 100644
--- a/fs/affs/affs.h
+++ b/fs/affs/affs.h
@@ -105,6 +105,7 @@ struct affs_sb_info {
 	int work_queued;		/* non-zero delayed work is queued */
 	struct delayed_work sb_work;	/* superblock flush delayed work */
 	spinlock_t work_lock;		/* protects sb_work and work_queued */
+	struct rcu_head rcu;
 };
 
 #define AFFS_MOUNT_SF_INTL		0x0001 /* International filesystem. */
diff --git a/fs/affs/super.c b/fs/affs/super.c
index 58b391446ae1..b56a95cf414a 100644
--- a/fs/affs/super.c
+++ b/fs/affs/super.c
@@ -640,7 +640,7 @@ static void affs_kill_sb(struct super_block *sb)
 		affs_brelse(sbi->s_root_bh);
 		kfree(sbi->s_prefix);
 		mutex_destroy(&sbi->s_bmlock);
-		kfree(sbi);
+		kfree_rcu(sbi, rcu);
 	}
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 04/13] exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper
  2024-02-04  2:17 ` [PATCH 01/13] fs/super.c: don't drop ->s_user_ns until we free struct super_block itself Al Viro
  2024-02-04  2:17   ` [PATCH 02/13] rcu pathwalk: prevent bogus hard errors from may_lookup() Al Viro
  2024-02-04  2:17   ` [PATCH 03/13] affs: free affs_sb_info with kfree_rcu() Al Viro
@ 2024-02-04  2:17   ` Al Viro
  2024-02-05 12:27     ` Christian Brauner
  2024-02-04  2:17   ` [PATCH 05/13] hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info Al Viro
                     ` (10 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Al Viro @ 2024-02-04  2:17 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

That stuff can be accessed by ->d_hash()/->d_compare(); as it is, we have
a hard-to-hit UAF if rcu pathwalk manages to get into ->d_hash() on a filesystem
that is in process of getting shut down.

Besides, having nls and upcase table cleanup moved from ->put_super() towards
the place where sbi is freed makes for simpler failure exits.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/exfat/exfat_fs.h |  1 +
 fs/exfat/nls.c      | 14 ++++----------
 fs/exfat/super.c    | 20 +++++++++++---------
 3 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/fs/exfat/exfat_fs.h b/fs/exfat/exfat_fs.h
index 9474cd50da6d..361595433480 100644
--- a/fs/exfat/exfat_fs.h
+++ b/fs/exfat/exfat_fs.h
@@ -275,6 +275,7 @@ struct exfat_sb_info {
 
 	spinlock_t inode_hash_lock;
 	struct hlist_head inode_hashtable[EXFAT_HASH_SIZE];
+	struct rcu_head rcu;
 };
 
 #define EXFAT_CACHE_VALID	0
diff --git a/fs/exfat/nls.c b/fs/exfat/nls.c
index 705710f93e2d..afdf13c34ff5 100644
--- a/fs/exfat/nls.c
+++ b/fs/exfat/nls.c
@@ -655,7 +655,6 @@ static int exfat_load_upcase_table(struct super_block *sb,
 	unsigned int sect_size = sb->s_blocksize;
 	unsigned int i, index = 0;
 	u32 chksum = 0;
-	int ret;
 	unsigned char skip = false;
 	unsigned short *upcase_table;
 
@@ -673,8 +672,7 @@ static int exfat_load_upcase_table(struct super_block *sb,
 		if (!bh) {
 			exfat_err(sb, "failed to read sector(0x%llx)",
 				  (unsigned long long)sector);
-			ret = -EIO;
-			goto free_table;
+			return -EIO;
 		}
 		sector++;
 		for (i = 0; i < sect_size && index <= 0xFFFF; i += 2) {
@@ -701,15 +699,12 @@ static int exfat_load_upcase_table(struct super_block *sb,
 
 	exfat_err(sb, "failed to load upcase table (idx : 0x%08x, chksum : 0x%08x, utbl_chksum : 0x%08x)",
 		  index, chksum, utbl_checksum);
-	ret = -EINVAL;
-free_table:
-	exfat_free_upcase_table(sbi);
-	return ret;
+	return -EINVAL;
 }
 
 static int exfat_load_default_upcase_table(struct super_block *sb)
 {
-	int i, ret = -EIO;
+	int i;
 	struct exfat_sb_info *sbi = EXFAT_SB(sb);
 	unsigned char skip = false;
 	unsigned short uni = 0, *upcase_table;
@@ -740,8 +735,7 @@ static int exfat_load_default_upcase_table(struct super_block *sb)
 		return 0;
 
 	/* FATAL error: default upcase table has error */
-	exfat_free_upcase_table(sbi);
-	return ret;
+	return -EIO;
 }
 
 int exfat_create_upcase_table(struct super_block *sb)
diff --git a/fs/exfat/super.c b/fs/exfat/super.c
index d9d4fa91010b..fcb658267765 100644
--- a/fs/exfat/super.c
+++ b/fs/exfat/super.c
@@ -39,9 +39,6 @@ static void exfat_put_super(struct super_block *sb)
 	exfat_free_bitmap(sbi);
 	brelse(sbi->boot_bh);
 	mutex_unlock(&sbi->s_lock);
-
-	unload_nls(sbi->nls_io);
-	exfat_free_upcase_table(sbi);
 }
 
 static int exfat_sync_fs(struct super_block *sb, int wait)
@@ -600,7 +597,7 @@ static int __exfat_fill_super(struct super_block *sb)
 	ret = exfat_load_bitmap(sb);
 	if (ret) {
 		exfat_err(sb, "failed to load alloc-bitmap");
-		goto free_upcase_table;
+		goto free_bh;
 	}
 
 	ret = exfat_count_used_clusters(sb, &sbi->used_clusters);
@@ -613,8 +610,6 @@ static int __exfat_fill_super(struct super_block *sb)
 
 free_alloc_bitmap:
 	exfat_free_bitmap(sbi);
-free_upcase_table:
-	exfat_free_upcase_table(sbi);
 free_bh:
 	brelse(sbi->boot_bh);
 	return ret;
@@ -701,12 +696,10 @@ static int exfat_fill_super(struct super_block *sb, struct fs_context *fc)
 	sb->s_root = NULL;
 
 free_table:
-	exfat_free_upcase_table(sbi);
 	exfat_free_bitmap(sbi);
 	brelse(sbi->boot_bh);
 
 check_nls_io:
-	unload_nls(sbi->nls_io);
 	return err;
 }
 
@@ -771,13 +764,22 @@ static int exfat_init_fs_context(struct fs_context *fc)
 	return 0;
 }
 
+static void delayed_free(struct rcu_head *p)
+{
+	struct exfat_sb_info *sbi = container_of(p, struct exfat_sb_info, rcu);
+
+	unload_nls(sbi->nls_io);
+	exfat_free_upcase_table(sbi);
+	exfat_free_sbi(sbi);
+}
+
 static void exfat_kill_sb(struct super_block *sb)
 {
 	struct exfat_sb_info *sbi = sb->s_fs_info;
 
 	kill_block_super(sb);
 	if (sbi)
-		exfat_free_sbi(sbi);
+		call_rcu(&sbi->rcu, delayed_free);
 }
 
 static struct file_system_type exfat_fs_type = {
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 05/13] hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info
  2024-02-04  2:17 ` [PATCH 01/13] fs/super.c: don't drop ->s_user_ns until we free struct super_block itself Al Viro
                     ` (2 preceding siblings ...)
  2024-02-04  2:17   ` [PATCH 04/13] exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper Al Viro
@ 2024-02-04  2:17   ` Al Viro
  2024-02-05 12:27     ` Christian Brauner
  2024-02-04  2:17   ` [PATCH 06/13] afs: fix __afs_break_callback() / afs_drop_open_mmap() race Al Viro
                     ` (9 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Al Viro @ 2024-02-04  2:17 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

->d_hash() and ->d_compare() use those, so we need to delay freeing
them.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/hfsplus/hfsplus_fs.h |  1 +
 fs/hfsplus/super.c      | 12 +++++++++---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/hfsplus/hfsplus_fs.h b/fs/hfsplus/hfsplus_fs.h
index 7ededcb720c1..012a3d003fbe 100644
--- a/fs/hfsplus/hfsplus_fs.h
+++ b/fs/hfsplus/hfsplus_fs.h
@@ -190,6 +190,7 @@ struct hfsplus_sb_info {
 	int work_queued;               /* non-zero delayed work is queued */
 	struct delayed_work sync_work; /* FS sync delayed work */
 	spinlock_t work_lock;          /* protects sync_work and work_queued */
+	struct rcu_head rcu;
 };
 
 #define HFSPLUS_SB_WRITEBACKUP	0
diff --git a/fs/hfsplus/super.c b/fs/hfsplus/super.c
index 1986b4f18a90..97920202790f 100644
--- a/fs/hfsplus/super.c
+++ b/fs/hfsplus/super.c
@@ -277,6 +277,14 @@ void hfsplus_mark_mdb_dirty(struct super_block *sb)
 	spin_unlock(&sbi->work_lock);
 }
 
+static void delayed_free(struct rcu_head *p)
+{
+	struct hfsplus_sb_info *sbi = container_of(p, struct hfsplus_sb_info, rcu);
+
+	unload_nls(sbi->nls);
+	kfree(sbi);
+}
+
 static void hfsplus_put_super(struct super_block *sb)
 {
 	struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
@@ -302,9 +310,7 @@ static void hfsplus_put_super(struct super_block *sb)
 	hfs_btree_close(sbi->ext_tree);
 	kfree(sbi->s_vhdr_buf);
 	kfree(sbi->s_backup_vhdr_buf);
-	unload_nls(sbi->nls);
-	kfree(sb->s_fs_info);
-	sb->s_fs_info = NULL;
+	call_rcu(&sbi->rcu, delayed_free);
 }
 
 static int hfsplus_statfs(struct dentry *dentry, struct kstatfs *buf)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 06/13] afs: fix __afs_break_callback() / afs_drop_open_mmap() race
  2024-02-04  2:17 ` [PATCH 01/13] fs/super.c: don't drop ->s_user_ns until we free struct super_block itself Al Viro
                     ` (3 preceding siblings ...)
  2024-02-04  2:17   ` [PATCH 05/13] hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info Al Viro
@ 2024-02-04  2:17   ` Al Viro
  2024-02-05 12:28     ` Christian Brauner
  2024-02-04  2:17   ` [PATCH 07/13] nfs: make nfs_set_verifier() safe for use in RCU pathwalk Al Viro
                     ` (8 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Al Viro @ 2024-02-04  2:17 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

In __afs_break_callback() we might check ->cb_nr_mmap and if it's non-zero
do queue_work(&vnode->cb_work).  In afs_drop_open_mmap() we decrement
->cb_nr_mmap and do flush_work(&vnode->cb_work) if it reaches zero.

The trouble is, there's nothing to prevent __afs_break_callback() from
seeing ->cb_nr_mmap before the decrement and do queue_work() after both
the decrement and flush_work().  If that happens, we might be in trouble -
vnode might get freed before the queued work runs.

__afs_break_callback() is always done under ->cb_lock, so let's make
sure that ->cb_nr_mmap can change from non-zero to zero while holding
->cb_lock (the spinlock component of it - it's a seqlock and we don't
need to mess with the counter).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/afs/file.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 3d33b221d9ca..ef2cc8f565d2 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -417,13 +417,17 @@ static void afs_add_open_mmap(struct afs_vnode *vnode)
 
 static void afs_drop_open_mmap(struct afs_vnode *vnode)
 {
-	if (!atomic_dec_and_test(&vnode->cb_nr_mmap))
+	if (atomic_add_unless(&vnode->cb_nr_mmap, -1, 1))
 		return;
 
 	down_write(&vnode->volume->open_mmaps_lock);
 
-	if (atomic_read(&vnode->cb_nr_mmap) == 0)
+	read_seqlock_excl(&vnode->cb_lock);
+	// the only place where ->cb_nr_mmap may hit 0
+	// see __afs_break_callback() for the other side...
+	if (atomic_dec_and_test(&vnode->cb_nr_mmap))
 		list_del_init(&vnode->cb_mmap_link);
+	read_sequnlock_excl(&vnode->cb_lock);
 
 	up_write(&vnode->volume->open_mmaps_lock);
 	flush_work(&vnode->cb_work);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 07/13] nfs: make nfs_set_verifier() safe for use in RCU pathwalk
  2024-02-04  2:17 ` [PATCH 01/13] fs/super.c: don't drop ->s_user_ns until we free struct super_block itself Al Viro
                     ` (4 preceding siblings ...)
  2024-02-04  2:17   ` [PATCH 06/13] afs: fix __afs_break_callback() / afs_drop_open_mmap() race Al Viro
@ 2024-02-04  2:17   ` Al Viro
  2024-02-05 12:29     ` Christian Brauner
  2024-02-04  2:17   ` [PATCH 08/13] nfs: fix UAF on pathwalk running into umount Al Viro
                     ` (7 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Al Viro @ 2024-02-04  2:17 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

nfs_set_verifier() relies upon dentry being pinned; if that's
the case, grabbing ->d_lock stabilizes ->d_parent and guarantees
that ->d_parent points to a positive dentry.  For something
we'd run into in RCU mode that is *not* true - dentry might've
been through dentry_kill() just as we grabbed ->d_lock, with
its parent going through the same just as we get to into
nfs_set_verifier_locked().  It might get to detaching inode
(and zeroing ->d_inode) before nfs_set_verifier_locked() gets
to fetching that; we get an oops as the result.

That can happen in nfs{,4} ->d_revalidate(); the call chain in
question is nfs_set_verifier_locked() <- nfs_set_verifier() <-
nfs_lookup_revalidate_delegated() <- nfs{,4}_do_lookup_revalidate().
We have checked that the parent had been positive, but that's
done before we get to nfs_set_verifier() and it's possible for
memory pressure to pick our dentry as eviction candidate by that
time.  If that happens, back-to-back attempts to kill dentry and
its parent are quite normal.  Sure, in case of eviction we'll
fail the ->d_seq check in the caller, but we need to survive
until we return there...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/nfs/dir.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index c8ecbe999059..ac505671efbd 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1431,9 +1431,9 @@ static bool nfs_verifier_is_delegated(struct dentry *dentry)
 static void nfs_set_verifier_locked(struct dentry *dentry, unsigned long verf)
 {
 	struct inode *inode = d_inode(dentry);
-	struct inode *dir = d_inode(dentry->d_parent);
+	struct inode *dir = d_inode_rcu(dentry->d_parent);
 
-	if (!nfs_verify_change_attribute(dir, verf))
+	if (!dir || !nfs_verify_change_attribute(dir, verf))
 		return;
 	if (inode && NFS_PROTO(inode)->have_delegation(inode, FMODE_READ))
 		nfs_set_verifier_delegated(&verf);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 08/13] nfs: fix UAF on pathwalk running into umount
  2024-02-04  2:17 ` [PATCH 01/13] fs/super.c: don't drop ->s_user_ns until we free struct super_block itself Al Viro
                     ` (5 preceding siblings ...)
  2024-02-04  2:17   ` [PATCH 07/13] nfs: make nfs_set_verifier() safe for use in RCU pathwalk Al Viro
@ 2024-02-04  2:17   ` Al Viro
  2024-02-05 12:29     ` Christian Brauner
  2024-02-04  2:17   ` [PATCH 09/13] procfs: move dropping pde and pid from ->evict_inode() to ->free_inode() Al Viro
                     ` (6 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Al Viro @ 2024-02-04  2:17 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

NFS ->d_revalidate(), ->permission() and ->get_link() need to access
some parts of nfs_server when called in RCU mode:
	server->flags
	server->caps
	*(server->io_stats)
and, worst of all, call
	server->nfs_client->rpc_ops->have_delegation
(the last one - as NFS_PROTO(inode)->have_delegation()).  We really
don't want to RCU-delay the entire nfs_free_server() (it would have
to be done with schedule_work() from RCU callback, since it can't
be made to run from interrupt context), but actual freeing of
nfs_server and ->io_stats can be done via call_rcu() just fine.
nfs_client part is handled simply by making nfs_free_client() use
kfree_rcu().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/nfs/client.c           | 13 ++++++++++---
 include/linux/nfs_fs_sb.h |  2 ++
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 44eca51b2808..fbdc9ca80f71 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -246,7 +246,7 @@ void nfs_free_client(struct nfs_client *clp)
 	put_nfs_version(clp->cl_nfs_mod);
 	kfree(clp->cl_hostname);
 	kfree(clp->cl_acceptor);
-	kfree(clp);
+	kfree_rcu(clp, rcu);
 }
 EXPORT_SYMBOL_GPL(nfs_free_client);
 
@@ -1006,6 +1006,14 @@ struct nfs_server *nfs_alloc_server(void)
 }
 EXPORT_SYMBOL_GPL(nfs_alloc_server);
 
+static void delayed_free(struct rcu_head *p)
+{
+	struct nfs_server *server = container_of(p, struct nfs_server, rcu);
+
+	nfs_free_iostats(server->io_stats);
+	kfree(server);
+}
+
 /*
  * Free up a server record
  */
@@ -1031,10 +1039,9 @@ void nfs_free_server(struct nfs_server *server)
 
 	ida_destroy(&server->lockowner_id);
 	ida_destroy(&server->openowner_id);
-	nfs_free_iostats(server->io_stats);
 	put_cred(server->cred);
-	kfree(server);
 	nfs_release_automount_timer();
+	call_rcu(&server->rcu, delayed_free);
 }
 EXPORT_SYMBOL_GPL(nfs_free_server);
 
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index cd797e00fe35..92de074e63b9 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -124,6 +124,7 @@ struct nfs_client {
 	char			cl_ipaddr[48];
 	struct net		*cl_net;
 	struct list_head	pending_cb_stateids;
+	struct rcu_head		rcu;
 };
 
 /*
@@ -265,6 +266,7 @@ struct nfs_server {
 	const struct cred	*cred;
 	bool			has_sec_mnt_opts;
 	struct kobject		kobj;
+	struct rcu_head		rcu;
 };
 
 /* Server capabilities */
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 09/13] procfs: move dropping pde and pid from ->evict_inode() to ->free_inode()
  2024-02-04  2:17 ` [PATCH 01/13] fs/super.c: don't drop ->s_user_ns until we free struct super_block itself Al Viro
                     ` (6 preceding siblings ...)
  2024-02-04  2:17   ` [PATCH 08/13] nfs: fix UAF on pathwalk running into umount Al Viro
@ 2024-02-04  2:17   ` Al Viro
  2024-02-05 12:30     ` Christian Brauner
  2024-02-04  2:17   ` [PATCH 10/13] procfs: make freeing proc_fs_info rcu-delayed Al Viro
                     ` (5 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Al Viro @ 2024-02-04  2:17 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

that keeps both around until struct inode is freed, making access
to them safe from rcu-pathwalk

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/proc/base.c  |  2 --
 fs/proc/inode.c | 19 ++++++++-----------
 2 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 98a031ac2648..18550c071d71 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1878,8 +1878,6 @@ void proc_pid_evict_inode(struct proc_inode *ei)
 		hlist_del_init_rcu(&ei->sibling_inodes);
 		spin_unlock(&pid->lock);
 	}
-
-	put_pid(pid);
 }
 
 struct inode *proc_pid_make_inode(struct super_block *sb,
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index b33e490e3fd9..05350f3c2812 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -30,7 +30,6 @@
 
 static void proc_evict_inode(struct inode *inode)
 {
-	struct proc_dir_entry *de;
 	struct ctl_table_header *head;
 	struct proc_inode *ei = PROC_I(inode);
 
@@ -38,17 +37,8 @@ static void proc_evict_inode(struct inode *inode)
 	clear_inode(inode);
 
 	/* Stop tracking associated processes */
-	if (ei->pid) {
+	if (ei->pid)
 		proc_pid_evict_inode(ei);
-		ei->pid = NULL;
-	}
-
-	/* Let go of any associated proc directory entry */
-	de = ei->pde;
-	if (de) {
-		pde_put(de);
-		ei->pde = NULL;
-	}
 
 	head = ei->sysctl;
 	if (head) {
@@ -80,6 +70,13 @@ static struct inode *proc_alloc_inode(struct super_block *sb)
 
 static void proc_free_inode(struct inode *inode)
 {
+	struct proc_inode *ei = PROC_I(inode);
+
+	if (ei->pid)
+		put_pid(ei->pid);
+	/* Let go of any associated proc directory entry */
+	if (ei->pde)
+		pde_put(ei->pde);
 	kmem_cache_free(proc_inode_cachep, PROC_I(inode));
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 10/13] procfs: make freeing proc_fs_info rcu-delayed
  2024-02-04  2:17 ` [PATCH 01/13] fs/super.c: don't drop ->s_user_ns until we free struct super_block itself Al Viro
                     ` (7 preceding siblings ...)
  2024-02-04  2:17   ` [PATCH 09/13] procfs: move dropping pde and pid from ->evict_inode() to ->free_inode() Al Viro
@ 2024-02-04  2:17   ` Al Viro
  2024-02-05 12:31     ` Christian Brauner
  2024-02-04  2:17   ` [PATCH 11/13] fuse: fix UAF in rcu pathwalks Al Viro
                     ` (4 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Al Viro @ 2024-02-04  2:17 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

makes proc_pid_ns() safe from rcu pathwalk (put_pid_ns()
is still synchronous, but that's not a problem - it does
rcu-delay everything that needs to be)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/proc/root.c          | 2 +-
 include/linux/proc_fs.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/proc/root.c b/fs/proc/root.c
index b55dbc70287b..06a297a27ba3 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -271,7 +271,7 @@ static void proc_kill_sb(struct super_block *sb)
 
 	kill_anon_super(sb);
 	put_pid_ns(fs_info->pid_ns);
-	kfree(fs_info);
+	kfree_rcu(fs_info, rcu);
 }
 
 static struct file_system_type proc_fs_type = {
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index de407e7c3b55..0b2a89854440 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -65,6 +65,7 @@ struct proc_fs_info {
 	kgid_t pid_gid;
 	enum proc_hidepid hide_pid;
 	enum proc_pidonly pidonly;
+	struct rcu_head rcu;
 };
 
 static inline struct proc_fs_info *proc_sb_info(struct super_block *sb)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 11/13] fuse: fix UAF in rcu pathwalks
  2024-02-04  2:17 ` [PATCH 01/13] fs/super.c: don't drop ->s_user_ns until we free struct super_block itself Al Viro
                     ` (8 preceding siblings ...)
  2024-02-04  2:17   ` [PATCH 10/13] procfs: make freeing proc_fs_info rcu-delayed Al Viro
@ 2024-02-04  2:17   ` Al Viro
  2024-02-05 12:31     ` Christian Brauner
  2024-02-04  2:17   ` [PATCH 12/13] cifs_get_link(): bail out in unsafe case Al Viro
                     ` (3 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Al Viro @ 2024-02-04  2:17 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

->permission(), ->get_link() and ->inode_get_acl() might dereference
->s_fs_info (and, in case of ->permission(), ->s_fs_info->fc->user_ns
as well) when called from rcu pathwalk.

Freeing ->s_fs_info->fc is rcu-delayed; we need to make freeing ->s_fs_info
and dropping ->user_ns rcu-delayed too.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/fuse/cuse.c   |  3 +--
 fs/fuse/fuse_i.h |  1 +
 fs/fuse/inode.c  | 15 +++++++++++----
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index 91e89e68177e..b6cad106c37e 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -474,8 +474,7 @@ static int cuse_send_init(struct cuse_conn *cc)
 
 static void cuse_fc_release(struct fuse_conn *fc)
 {
-	struct cuse_conn *cc = fc_to_cc(fc);
-	kfree_rcu(cc, fc.rcu);
+	kfree(fc_to_cc(fc));
 }
 
 /**
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 1df83eebda92..bcbe34488862 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -888,6 +888,7 @@ struct fuse_mount {
 
 	/* Entry on fc->mounts */
 	struct list_head fc_entry;
+	struct rcu_head rcu;
 };
 
 static inline struct fuse_mount *get_fuse_mount_super(struct super_block *sb)
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 2a6d44f91729..516ea2979a90 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -930,6 +930,14 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm,
 }
 EXPORT_SYMBOL_GPL(fuse_conn_init);
 
+static void delayed_release(struct rcu_head *p)
+{
+	struct fuse_conn *fc = container_of(p, struct fuse_conn, rcu);
+
+	put_user_ns(fc->user_ns);
+	fc->release(fc);
+}
+
 void fuse_conn_put(struct fuse_conn *fc)
 {
 	if (refcount_dec_and_test(&fc->count)) {
@@ -941,13 +949,12 @@ void fuse_conn_put(struct fuse_conn *fc)
 		if (fiq->ops->release)
 			fiq->ops->release(fiq);
 		put_pid_ns(fc->pid_ns);
-		put_user_ns(fc->user_ns);
 		bucket = rcu_dereference_protected(fc->curr_bucket, 1);
 		if (bucket) {
 			WARN_ON(atomic_read(&bucket->count) != 1);
 			kfree(bucket);
 		}
-		fc->release(fc);
+		call_rcu(&fc->rcu, delayed_release);
 	}
 }
 EXPORT_SYMBOL_GPL(fuse_conn_put);
@@ -1366,7 +1373,7 @@ EXPORT_SYMBOL_GPL(fuse_send_init);
 void fuse_free_conn(struct fuse_conn *fc)
 {
 	WARN_ON(!list_empty(&fc->devices));
-	kfree_rcu(fc, rcu);
+	kfree(fc);
 }
 EXPORT_SYMBOL_GPL(fuse_free_conn);
 
@@ -1902,7 +1909,7 @@ static void fuse_sb_destroy(struct super_block *sb)
 void fuse_mount_destroy(struct fuse_mount *fm)
 {
 	fuse_conn_put(fm->fc);
-	kfree(fm);
+	kfree_rcu(fm, rcu);
 }
 EXPORT_SYMBOL(fuse_mount_destroy);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 12/13] cifs_get_link(): bail out in unsafe case
  2024-02-04  2:17 ` [PATCH 01/13] fs/super.c: don't drop ->s_user_ns until we free struct super_block itself Al Viro
                     ` (9 preceding siblings ...)
  2024-02-04  2:17   ` [PATCH 11/13] fuse: fix UAF in rcu pathwalks Al Viro
@ 2024-02-04  2:17   ` Al Viro
  2024-02-04 15:45     ` Steve French
  2024-02-04  2:17   ` [PATCH 13/13] ext4_get_link(): fix breakage in RCU mode Al Viro
                     ` (2 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Al Viro @ 2024-02-04  2:17 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

->d_revalidate() bails out there, anyway.  It's not enough
to prevent getting into ->get_link() in RCU mode, but that
could happen only in a very contrieved setup.  Not worth
trying to do anything fancy here unless ->d_revalidate()
stops kicking out of RCU mode at least in some cases.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/smb/client/cifsfs.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
index e902de4e475a..630e74628dfe 100644
--- a/fs/smb/client/cifsfs.c
+++ b/fs/smb/client/cifsfs.c
@@ -1172,6 +1172,9 @@ const char *cifs_get_link(struct dentry *dentry, struct inode *inode,
 {
 	char *target_path;
 
+	if (!dentry)
+		return ERR_PTR(-ECHILD);
+
 	target_path = kmalloc(PATH_MAX, GFP_KERNEL);
 	if (!target_path)
 		return ERR_PTR(-ENOMEM);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 13/13] ext4_get_link(): fix breakage in RCU mode
  2024-02-04  2:17 ` [PATCH 01/13] fs/super.c: don't drop ->s_user_ns until we free struct super_block itself Al Viro
                     ` (10 preceding siblings ...)
  2024-02-04  2:17   ` [PATCH 12/13] cifs_get_link(): bail out in unsafe case Al Viro
@ 2024-02-04  2:17   ` Al Viro
  2024-02-05 12:24   ` [PATCH 01/13] fs/super.c: don't drop ->s_user_ns until we free struct super_block itself Christian Brauner
  2024-02-05 12:36   ` Jeff Layton
  13 siblings, 0 replies; 36+ messages in thread
From: Al Viro @ 2024-02-04  2:17 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

1) errors from ext4_getblk() should not be propagated to caller
unless we are really sure that we would've gotten the same error
in non-RCU pathwalk.
2) we leak buffer_heads if ext4_getblk() is successful, but bh is
not uptodate.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/ext4/symlink.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index 75bf1f88843c..645240cc0229 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -92,10 +92,12 @@ static const char *ext4_get_link(struct dentry *dentry, struct inode *inode,
 
 	if (!dentry) {
 		bh = ext4_getblk(NULL, inode, 0, EXT4_GET_BLOCKS_CACHED_NOWAIT);
-		if (IS_ERR(bh))
-			return ERR_CAST(bh);
-		if (!bh || !ext4_buffer_uptodate(bh))
+		if (IS_ERR(bh) || !bh)
 			return ERR_PTR(-ECHILD);
+		if (!ext4_buffer_uptodate(bh)) {
+			brelse(bh);
+			return ERR_PTR(-ECHILD);
+		}
 	} else {
 		bh = ext4_bread(NULL, inode, 0, 0);
 		if (IS_ERR(bh))
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* RCU pathwalk audit notes
  2024-02-04  2:14 [PATCHES] RCU pathwalk race fixes Al Viro
  2024-02-04  2:17 ` [PATCH 01/13] fs/super.c: don't drop ->s_user_ns until we free struct super_block itself Al Viro
@ 2024-02-04  2:27 ` Al Viro
  2024-02-05 12:48 ` [PATCHES] RCU pathwalk race fixes Jeff Layton
  2 siblings, 0 replies; 36+ messages in thread
From: Al Viro @ 2024-02-04  2:27 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

Below are my notes from that code audit; a part of that would probably
be useful for filesystem maintainers - right now it's between the
braindump and usable documentation.  If anyone finds it useful - great;
if somebody helps with turning that into proper documentation - even
better.
--------------------------------------------------------------------------
PRELIMINARY NOTE: it is possible for a lazy pathwalk to run into a dentry
on a filesystem that is getting shut down.  Here's how:
	* have two threads sharing fs_struct chdir'ed on that filesystem.
	* lazy-umount it, so that the only thing holding it alive is
cwd of these threads.
	* one of the threads does relative pathname resolution
and gets to e.g. ->d_hash().  It's holding rcu_read_lock().
	* at the same time another thread does fchdir(), moving to
different directory.  set_fs_pwd() flips to the new place and does
mntput() on the old one.  No RCU delays here, we calculate the
refcount of that mount and see that we are dropping the last reference.
We make sure that the first thread will fail when it comes to
legitimize_mnt() and do this:
			init_task_work(&mnt->mnt_rcu, __cleanup_mnt);
			if (!task_work_add(task, &mnt->mnt_rcu, TWA_RESUME))
				return;
As we leave the syscall, we have __cleanup_mnt() run; it calls
cleanup_mnt() on our mount, which hits deactivate_super().  That was
the last reference to superblock.

	Voila - we have a filesystem shutdown right under the nose of
a thread running in ->d_hash() of something on that filesystem.
	Mutatis mutandis, one can arrange the same for other methods
called by rcu pathwalk.

	It's not easy to hit (especially if you want to get through the
entire ->kill_sb() before the first thread gets through ->d_hash()),
and it's probably impossible on the real hardware; on KVM it might be
borderline doable.  However, it is possible and I would not swear that
other ways of arranging the same thing are equally hard to hit.

	The bottom line: methods that can be called in RCU mode need to
be careful about the per-superblock objects destruction.
[end of note]

	Exposure of filesystem code to lockless environment, or
the ways in which RCU pathwalk may fuck filesystem maintainer over.


	Filesystem methods can usually count upon VFS-provided warranties
regarding the stability of objects they are called to act upon; at the
very least, they can expect the dentries/inodes/superblocks involved to
remain live throughout the operation.

	Life would be much more painful without that; however, such
warranties do not come for free.  The problem is that access patterns are
heavily biased; every system call getting an absolute pathname will have
to start at root directory, etc.  Having each of them in effect write
"I'd been here" on the same memory objects would cost quite a bit.
As the result, we try to keep the fast path stores-free, bumping no
refcounts and taking no locks.	Details are described elsewhere, but the
bottom line for filesystems is that some methods may be called with much
looser warranties than usual.  Of course, from the filesystem POV each
of those is a potential source of headache - you are asked to operate
on an object that might start to be torn down right under you, possibly
along with the filesystem instance it lives on.

	The list of the methods that could run into that fun:

	method		| indication that the call is unsafe	| unstable objects
->d_hash(d, ...) 	|	none - any call might be	|	d
->d_compare(d, ...)	|	none - any call might be	|	d
->d_revalidate(d, f)	|	f & LOOKUP_RCU			|	d
->d_manage(d, f)	|	f				|	d
->permission(i, m)	|	m & MAY_NOT_BLOCK		|	i
->get_link(d, i, ...)	|	d == NULL			|	i
->get_inode_acl(i, t, f)|	f == LOOKUP_RCU			|	i


	Additionally, callback set by set_delayed_call() from unsafe call of
->get_link() will be run in the same environment; that one is usually not
a problem, though.

	For the sake of completeness, three of LSM methods
(->inode_permission(), ->inode_follow_link() and ->task_to_inode())
might be called in similar environment, but that's a problem for LSM
crowd, not for filesystem folks.


	Any method call is, of course, required not to crash - no stepping on
freed memory, etc.  All of the unsafe calls listed above are done under
rcu_read_lock(), so they are not allowed to block.  Further requirements
vary between the methods.


	Before going through the list of affected methods, several notes on
the things that _are_ guaranteed:
	* if a reference to struct dentry is passed to such call, it will
not be freed until the method returns.	The same goes for a reference to
struct inode and to struct super_block pointed to by ->d_sb or ->i_sb
members of dentry and inode resp.  Any of those might be in process of
being torn down or enter such state right under us; the entire point of
those unsafe calls is that we make them without telling anyone they'd
need to wait for us.
	* following ->d_parent and ->d_inode of such dentries is fine,
provided that it's done by READ_ONCE() (for ->d_inode the preferred form
is d_inode_rcu(dentry)).  The value of ->d_parent is never going to be
NULL and it will again point to a struct dentry that will not be freed
until the method call finishes.  The value of ->d_inode might be NULL;
if non-NULL, it'll be pointing to a struct inode that will not be freed
until the method call finishes.
	* none of the inodes passed to an unsafe call could have reached
fs/inode.c:evict() before the caller grabbed rcu_read_lock().
	* for inodes 'not freed' means 'not entered ->free_inode()', so
anything that won't be destroyed until ->free_inode() is safe to access.
Anything synchronously destroyed in ->evict_inode() or ->destroy_inode()
is not safe; however, one can count upon the call_rcu() callbacks issued
in those yet to be entered.  Note that unlike dentries and superblocks,
inodes are embedded into filesystem-private objects; anything stored
directly in the containing object is safe to access.
	* for dentries anything destroyed by ->d_prune() (synchronously or
not) is not safe; the same goes for the things synchronously destroyed by
->d_release().  However, call_rcu() callbacks issued in ->d_release() are
yet to be entered.
	* for superblocks we can count upon call_rcu() callbacks issued
from inside the ->kill_sb() (including the ones issued from ->put_super())
yet to be entered.
	* NOTE: we can not count upon the things like ->d_parent
being positive (or a directory); a race with rename()+rmdir()+mknod()
and you might find a FIFO as parent's inode.  NULL is even easier -
just have the dentry *and* its ex-parent already past dentry_kill()
(which is a normal situation for eviction on memory pressure) and there
you go.  Normally such pathologies are prevented by the locking (and
dentry refcounting), but... the entire point of that stuff is to avoid
informing anyone that we are there, so those mechanisms are bypassed.
What's more, if dentry is not pinned by refcount, grabbing its ->d_lock
will *not* suffice to prevent that kind of mess - the scenario with
eviction by memory pressure won't be prevented by that; you might have
grabbed ->d_lock only after the dentry_kill() had released it, and at
that point ->d_parent still points to what used to be the parent, but
there's nothing to prevent its eviction.
	* [with fix in this series] for superblocks we can count
upon ->s_user_ns still being pinned and ->s_security have not
been freed yet.


	1. ->d_compare().

	For ->d_compare() we just need to make sure it won't crash
when called for dying dentry - an incorrect return value won't harm the
caller in such case.  False positives and false negatives alike - the
callers take care of that.  To be pedantic, make that "false positives
do not cause problems unless they have ->d_manage()", but ->d_manage()
is present only on autofs and there's no autofs ->d_compare() instances.

[[ footnote:
		Some callers prevent being called for dying dentry (holding
	->d_lock and having verified !d_unhashed() or finding it in the list
	of inode's aliases under ->i_lock).  For those the scenario in question
	simply cannot arise.
		Some follow the match with lockref_get_not_dead() and treat
	the failure as mismatch.  That takes care of false positives, and false
	negatives on dying dentry are still correct - we simply pretend to have
	lost the race.
		The only caller that does not fit into the classes above is
	__d_lookup_rcu_op_compare().  There we sample ->d_seq and verify !d_unhashed()
	before calling ->d_compare().  That is not enough to prevent dentry
	from starting to die right under us; however, the sampled value of ->d_seq
	will be rechecked when the caller gets to step_into(), so for a false
	positive we will end up with a mismatch.  The corner case around ->d_manage()
	is due to the handle_mounts() done before step_into() gets to ->d_seq
	validation...
]]

	There is no indication that ->d_compare() is called in RCU mode;
the majority of callers are such, anyway, so we need to cope with that.
VFS guarantees that dentry won't be freed under us; the same goes for
the superblock pointed to by its ->d_sb.  Name points to memory object
that won't get freed under us and length does not exceed the size of
that object.  The contents of that object is *NOT* guaranteed to be
stable; d_move() might race with us, modifying the name.  However, in
that case we are free to return an arbitrary result - the callers will
take care of both false positives and false negatives in such case.
The name we are comparing dentry with (passed in qstr) is stable,
thankfully...

	If we need to access any other data, it's up to the filesystem
to protect it.  In practice it means that destruction of fs-private part
of superblock (and possibly unicode tables hanging off it, etc.) might
need to be RCU-delayed.

	*IF* you want the behaviour that varies depending upon the parent
directory, you get to be very careful with READ_ONCE() and watch out
for the object lifetimes.

	Basically, if the things get that tricky, ask for help.
Currently there are two such instances in the tree - proc_sys_compare()
and generic_ci_d_compare().  Both are... special.


	2. ->d_hash().

	For ->d_hash() on a dying dentry we are free to report any hash
value; the only extra requirement is that we should not return stray
hard errors.  In other words, if we return anything other than 0 or
-ECHILD, we'd better make sure that this error would've been correct
before the parent started dying.  Since ->d_hash() error-reporting is
usually done to reject unacceptable names (too long, contain unsuitable
characters for this filesystem, etc.), that's really not a problem -
hard errors depend only upon the name, not the parent.

	Again, VFS guarantees that freeing of dentry and of the superblock
pointed to by dentry->d_sb won't happen under us.  The name passed to
us (in qstr) is stable.  If you need anything beyond that, you are
in the same situation as with ->d_compare().  Might want to RCU-delay
freeing private part of superblock (if that's what we need to access),
might want the same for some objects hanging off that (unicode tables,
etc.).  If you need something beyond that - ask for help.


	3. ->d_revalidate().

	For this one we do have an indication of call being unsafe -
flags & LOOKUP_RCU.  With ->d_revalidate we are always allowed to bail
out and return -ECHILD; that will have the caller drop out of RCU mode.
We definitely need to do that if revalidate would require any kind of IO,
mutex-taking, etc.; we can't block in RCU mode.

	Quite a few instances of ->d_revalidate() simply treat LOOKUP_RCU
in flags as "return -ECHILD and be done with that"; it's guaranteed to
do the right thing, but you lose the benefits of RCU pathwalks whenever
you run into such dentry.

	Same as with the previous methods, we are guaranteed that
dentry and dentry->d_sb won't be freed under us.  We are also guaranteed
that ->d_parent (which is *not* stable, so use READ_ONCE) points to a
struct dentry that won't get freed under us.  As always with ->d_parent,
it's not NULL - for a detached dentry it will point to dentry itself.
d_inode_rcu() of dentry and its parent will be either NULL or will
point to a struct inode that won't get freed under us.	Anything beyond
than that is not guaranteed.  We may find parent to be negative - it can
happen if we race with d_move() and removal of old parent.  In that case
just return -ECHILD and be done with that.

	On non-RCU side you could use dget_parent() instead - that
would give a positive dentry and its ->d_inode would remain stable.
dget_parent() has to be paired with dput(), though, so it's not usable
in RCU mode.

	If you need fs-private objects associated with dentry, its parent
inode(s) or superblock - see the general notes above on how to access
those.


	4. ->d_manage()

	Can be called in RCU mode; gets an argument telling it if it has
been called so.  Pretty much autofs-only; for everyone's sanity sake,
don't inflict more of those on the kernel.  Definitely don't do that
without asking first...


	5.  ->permission()

	Can be called in RCU mode; that is indicated by MAY_NOT_BLOCK
in mask, and it can only happen for MAY_EXEC checks on directories.
In RCU mode it is not allowed to block, and it is allowed to bail out
by returning -ECHILD.  It might be called for an inode that is getting
torn down, possibly along with its filesystem.	Errors other than -ECHILD
should only be returned if they would've been returned in non-RCU mode;
several instances in procfs currently (6.5) run afoul of that one.  That's
an instructive example, BTW - what happens is that proc_pid_permission()
uses proc_get_task() to find the relevant process.  proc_get_task()
uses PID reference stored in struct proc_inode our inode is embedded
into; inode can't have been freed yet, so fetching ->pid member in that
is safe.  However, using the value you've fetched is a different story
- proc_evict_inode() would have passed it to put_pid() and replaced
it with NULL.  Unsafe caller has no way to tell if that is happening
right under it.  Solution: stop zeroing ->pid in proc_evict_inode()
and move put_pid() from proc_pid_evict_inode() to proc_free_inode().
That's not all that is needed (there's access to procfs-private part of
superblock as well), but it does make a good example of how such stuff
can be dealt with.

	Note that idmap argument is safe on all calls - its destruction
is rcu-delayed.

	The amount of headache is seriously reduced (for now) by the fact
that a lot of instances boil down to generic_permission() (which will
do the right thing in RCU mode) when mask is MAY_EXEC | MAY_NOT_BLOCK.
If we ever extend RCU mode to other ->permission() callers, the thing will
get interesting; that's not likely to happen, though, unless access(2)
goes there [this is NOT a suggestion, folks].


	6. ->get_link()

	Again, this can be called in RCU mode.	Even if your
->d_revalidate() always returns -ECHILD in RCU mode and kicks the
pathwalk out of it, you can't assume that ->get_link() won't be reached;
binding a symlink on a regular file on another filesystem is possible
and that's all it takes for RCU pathwalk to get there.	NULL dentry
argument is an indicator of unsafe call; if you can't handle it, just
return ERR_PTR(-ECHILD).  Any allocations you need to do (and with this
method you really might need that) should be done with GFP_ATOMIC in
the unsafe case.

	Whatever you pass to set_delayed_call() is going to be called
in the same mode as ->get_link() itself; not a problem for most of the
instances.  The string you return needs to stay there until the
callback gets called or, if no callback is set, until at least the
freeing of inode.  As usual, for an unsafe call the inode might be
in process of teardown, possibly along with the hosting filesystem.
The usual considerations apply.  The same, BTW, applies to whatever
you set in ->i_link - it must stay around at least until ->free_inode().


	7. ->get_inode_acl()

	Very limited exposure for that one - unsafe call is possible
only if you explicitly set ACL_DONT_CACHE as cached ACL value.
Only two filesystems (fuse and overlayfs) even bother.  Unsafe call
is indicated by explicit flag (the third argument of the method),
bailout is done by returning ERR_PTR(-CHILD) and the usual considerations
apply for any access to data structures you might need to do.


	List of potentially relevant instances:

d_hash:
	adfs_hash()
		safe; dereferences ->s_fs_info, which is freed by
		kfree_rcu() from ->put_super().  All errors are
		directory-independent.
	affs_hash_dentry()
		UAF; checks a bit in object referenced to by ->s_fs_info
		->s_fs_info is freed (synchronously) by ->kill_sb().
		Fixed by switch to kfree_rcu() ("affs: free affs_sb_info with kfree_rcu()")
		All errors are directory-independent.
	affs_intl_hash_dentry()
		same as affs_hash_dentry() above.
	cifs_ci_hash()
		safe; uses ->s_fs_info->local_nls.  
		->s_fs_info is freed from delayed_free(), via call_rcu() from
		cifs_umount() from ->kill_sb().  ->local_nls is dropped in
		the same place.
		All errors are directory-independent.
	efivarfs_d_hash()
		safe - no access to fs objects.
		All errors are directory-independent.
	exfat_d_hash()
		UAF; uses ->s_fs_info->nls_io and ->s_fs_info->vol_utbl.
		->s_fs_info and ->vol_utbl are freed (synchronously) from
		->kill_sb().  ->nls_io is dropped (also synchronously)
		from ->put_super().  All errors are directory-independent.
		Fixed by moving freeing all that stuff into a rcu-delayed helper
		called from ->kill_sb() ("exfat: move freeing sbi, upcase table and
		dropping nls into rcu-delayed helper").
	exfat_utf8_d_hash()
		UAF; similar to exfat_d_hash(), except that ->nls_io is not used.
		Fix of exfat_d_hash() covers that one.
	generic_ci_d_hash()
		safe; probably would be more idiomatic to use d_inode_rcu() in
		there...
	gfs2_dhash()
		safe - no access to fs objects.
		No errors at all.
	hfs_hash_dentry()
		safe - no access to fs objects.
		No errors at all.
	hfsplus_hash_dentry()
		UAF (and oops); access to ->s_fs_info->nls.  Dropped and freed
		(and ->s_fs_info zeroed) in ->put_super().  Fixed by making their
		destruction rcu-delayed ("hfsplus: switch to rcu-delayed unloading
		of nls and freeing ->s_fs_info").  Might be worth moving that from
		->put_super() to ->kill_sb(), but that's a separate story...
		All errors are directory-independent.
	hpfs_hash_dentry()
		safe - no access to fs objects.
		No errors at all.
	isofs_hashi()
		safe - no access to fs objects.
		No errors at all.
	isofs_hashi_ms()
		safe - no access to fs objects.
		No errors at all.
	isofs_hash_ms()
		safe - no access to fs objects.
		No errors at all.
	jfs_ci_hash()
		safe - no access to fs objects.
		No errors at all.
	msdos_hash()
		safe - dereferences ->s_fs_info, freeing is rcu-delayed
		(in delayed_free(), from ->put_super()).
		All errors are directory-independent.
	vfat_hash()
		safe - no access to fs objects.
		No errors at all.
	vfat_hashi()
		safe - uses ->s_fs_info->nls_io, freeing and unloading
		is rcu-delayed	(in delayed_free(), from ->put_super()).
		No errors at all.
	ntfs_d_hash()
		fucked in head - blocking allocations, UAF, etc.
		Not touching that one, sorry.

d_compare:
	adfs_compare()
		safe - no fs objects accessed
	affs_compare_dentry()
		same as affs_hash_dentry(), fixed by the same patch
	affs_intl_compare_dentry()
		same as affs_hash_dentry(), fixed by the same patch
	cifs_ci_compare()
		safe; uses ->s_fs_info->local_nls.  
		->s_fs_info is freed from delayed_free(), via call_rcu() from
		cifs_umount() from ->kill_sb().  ->local_nls is dropped in
		the same place.
	efivarfs_d_compare()
		safe - no access to fs objects.
	exfat_d_cmp()
		UAF; similar to exfat_d_hash(), fixed by the same patch
	exfat_utf8_d_cmp()
		UAF; similar to exfat_utf8_d_hash(), fixed by the same patch
	generic_ci_d_compare()
		safe, even though it's a really scary one.
	hfs_compare_dentry()
		safe - no access to fs objects.
	hfsplus_compare_dentry()
		UAF; similar to hfsplus_hash_dentry(), fixed by the same patch
	hpfs_compare_dentry()
		safe - uses ->s_fs_info->sb_cp_table; freeing is rcu-delayed
		(in lazy_free_sbi(), from hpfs_put_super().
	isofs_dentry_cmpi()
		safe - no access to fs objects.
	isofs_dentry_cmpi_ms()
		safe - no access to fs objects.
	isofs_dentry_cmp_ms()
		safe - no access to fs objects.
	jfs_ci_compare()
		safe - no access to fs objects.
	msdos_cmp()
		safe - dereferences ->s_fs_info, freeing is rcu-delayed
		(in delayed_free(), from ->put_super()).
	proc_sys_compare()
		safe - access of ->sysctl of containing proc_inode,
		with barriers provided by rcu_dereference() and
		protection against the case it had already been zeroed.
		Freeing of the object it points to is rcu-delayed
		(kfree_rcu() from ->evict_inode()).
	vfat_cmp()
		safe - no access to fs objects.
	vfat_cmpi()
		safe - uses ->s_fs_info->nls_io, freeing and unloading
		is rcu-delayed (in delayed_free(), from ->put_super()).
	ntfs_d_compare()
		fucked in head - blocking allocations, UAF, etc.
		Not touching that one, sorry.

d_revalidate:
	afs_d_revalidate()
		race - we might end up doing __afs_break_callback() there,
		and it could race with afs_drop_open_mmap(), leading to
		stray queued work on object that might be about to be
		freed, with nothing to flush or cancel the sucker.
		Fixed by making sure that afs_drop_open_mmap()
		will only do the final decrement while under ->cb_lock;
		since the entire __afs_break_callback() is done under
		the same, it will either see zero mmap count and do
		nothing, or it will finish queue_work() before afs_drop_open_mmap()
		gets to its flush_work() ("afs: fix __afs_break_callback() /
		afs_drop_open_mmap() race").
	afs_dynroot_d_revalidate()
		safe - no access to fs objects, no errors.
		Probably no point keeping it...
	cifs_d_revalidate()
		safe - bails out in unsafe case
	coda_dentry_revalidate()
		safe - bails out in unsafe case
	ecryptfs_d_revalidate()
		safe - bails out in unsafe case
	exfat_d_revalidate()
		safe - bails out in unsafe case
	fscrypt_d_revalidate()
		safe - accesses dentry, then either accesses no fs objects,
		or bails out in unsafe case
	fuse_dentry_revalidate()
		safe - fetches ->d_inode (safely), then accesses fields
		of dentry and of the structure inode is embedded into.
		In 32bit case we also dereference ->d_fsdata, but there
		its freeing is done by kfree_rcu() from ->d_release(),
		which means that it won't be freed until we drop
		rcu_read_lock().  Probably needs cleaning its control
		flow, but that's a separate story...
	gfs2_drevalidate()
		buggered.  Used to bail out in unsafe case, now it will
		go and do blocking IO.  Revert; proposed fixes still
		don't deal with everything - they treat "can't do it
		in RCU mode" as "invalid", not "unlazy and repeat".
		[Reverted in mainline now]
	hfs_revalidate_dentry()
		safe - bails out in unsafe case
	jfs_ci_revalidate()
		safe - accesses dentry, then no access to fs objects.
	kernfs_dop_revalidate()
		safe - bails out in unsafe case
	map_files_d_revalidate()
		safe - bails out in unsafe case
	nfs4_lookup_revalidate()
		oops - nfs_set_verifier() makes an assumption that
		having grabbed ->d_lock is enough to keep dentry's
		parent positive.  That's only true if dentry is pinned
		down...  Fixed by rechecking parent's ->d_inode in
		there ("nfs: make nfs_set_verifier() safe for use in RCU
		pathwalk").  Also a UAF - we dereference ->s_fs_info, and
		that's freed synchronously in ->kill_sb().  The same goes
		for ->s_fs_info->io_stats (ditto) and, worse yet, for
		->s_fs_info->nfs_client->rpc_ops.  nfs_client might get
		freed synchronously, if our superblock is holding the
		last reference, and this is not just a use-after-free -
		it's chasing pointers through the freed structure, so we
		might end up dereferencing an address that had never been
		mapped/is in iomem/whatnot.  What's more, after we get to
		rpc_ops, we fetch a function pointer from there and call
		it...  Fixed by RCU-delaying the actual freeing of
		objects in question (" nfs: fix UAF on pathwalk running
		into umount").
	nfs_lookup_revalidate()
		same story as with nfs4_lookup_revalidate().
	ocfs2_dentry_revalidate()
		safe - bails out in unsafe case
	orangefs_d_revalidate()
		safe - accesses dentry, then either accesses no fs objects,
		or bails out in unsafe case
	ovl_dentry_revalidate()
		Safe.  Access to dentry, verifying that it's positive,
		then walking through the ovl_entry for associated inode
		(freeing of which is RCU-delayed) and calling ->d_revalidate()
		on all components (dput() is _not_ RCU-delayed, but it isn't
		initiated until the inode refcount reaches zero, so we are
		essentially in the same conditions as with RCU calls of 
		>d_revalidate() - refcounts of dentries in question are not
		reached until after our rcu_read_lock() done by path_init().
		_WAY_ too subtle, IMO - it relies upon DCACHE_OP_REVALIDATE
		being turned off for negative dentries, so it can run into
		a negative only in RCU mode (in non-RCU we are holding a reference,
		so positive would've stayed positive).  In that case we must've
		had successful unlink or rmdir and both would've unhashed
		the sucker, so legitimization would fail.  _Ouch_.
	pid_revalidate()
		UAF; uses proc_pid(); not safe, since it's dropped in
		->evict_inode().  Fixed by dropping it in ->free_inode()
		instead ("procfs: move dropping pde and pid from ->evict_inode()
		to ->free_inode()".  It also might access LSM shite associated
		with inode; safe, AFAICS, in both LSM flavours that care
		about ->task_to_inode() hook.
	proc_misc_d_revalidate()
		safe - bails out in unsafe case.
		Incidentally, it wouldn't take much to make it work unsafe
		case - by the end of the series we would have everything
		we need for that...
	proc_net_d_revalidate()
		safe - no access to fs objects
	proc_sys_revalidate()
		safe - bails out in unsafe case
	tid_fd_revalidate()
		safe - bails out in unsafe case
	v9fs_lookup_revalidate()
		safe - bails out in unsafe case
	vboxsf_dentry_revalidate()
		safe - bails out in unsafe case.
	vfat_revalidate()
		safe - bails out in unsafe case.
	vfat_revalidate_ci()
		safe - bails out in unsafe case.
	xattr_hide_revalidate
		safe - no access to fs objects
	ceph_d_revalidate()
		broken - races on fs shutdown, AFAICS, and with that
		one I'm not familiar enough with the codebase, so
		I'd rather leave that to ceph maintainers...
		Lifetime of ceph_mds_client, etc.

d_manage:
	autofs_d_manage()
		safe; accesses fs-private objects associated
		with dentry and with superblock (freeing either is
		rcu-delayed).  The only non-obvious part is the call of
		autofs_oz_mode(), which looks like it might try to compare
		task_pgrp(current) with potentially dangling pointer;
		however, it only becomes dangling after ->kill_sb()
		had called autofs_catatonic_mode(), at which point we
		want autofs_oz_mode() to return true, same PGRP or not.
		IOW, potential false positives on struct pid reuse
		are not false at all.

permission: [reordered with the default instance put in front]
	generic_permission()
		safe; accesses fields of struct inode, relies upon the rcu-delayed
		freeing of cached acls.  Bails out if an unsafe call would have
		to extract acls from filesystem (with narrow exception used by
		fuse and overlayfs).
	afs_permission()
		same race as in afs_d_revalidate(), same fix
	autofs_dir_permission()
		safe - boils down to generic_permission() if no MAY_WRITE is given.
		Even with MAY_WRITE it would be safe - it accesses ->s_fs_data
		in that case, but that gets freed with kfree_rcu()...
	bad_inode_permission()
		safe - no access to filesystem objects
	btrfs_permission()
		safe - boils down to generic_permission() if no MAY_WRITE is given.
		With MAY_WRITE it would boil down to access to btrfs_root; no
		idea where does that get freed.
	ceph_permission()
		safe - bails out in unsafe case
	cifs_permission()
		safe - derefences ->s_fs_info, then possibly does generic_permission().
		see above (cifs_ci_hash() entry) for the reasons that's safe.
	coda_ioctl_permission()
		safe - no access to filesystem objects
	coda_permission()
		safe - bails out in unsafe case
	ecryptfs_permission()
		safe - it fetches ->wii_inode of containing
		ecryptfs_inode_info, then passes it to inode_permission().
		Note that this call of inode_permission() is itself
		unsafe; the inode passed to it might be getting torn down.
		However, the reference held in ecryptfs_inode_info does
		contribute to inode refcount and it is not dropped until
		ecryptfs_evict_inode().   IOW, that inode must have had
		a positive refcount at some point after the caller had
		grabbed rcu_read_lock().
	fuse_permission()
		UAF; accesses ->s_fs_info->fc, and ->s_fs_info
		points to struct fuse_mount which gets freed
		synchronously by fuse_mount_destroy(), from
		the end of ->kill_sb().  It proceeds to accessing
		->fc, but that part is safe - ->fc freeing is
		done via kfree_rcu() (called via ->fc->release())
		after the refcount of ->fc drops to zero.
		That can't happen until the call of fuse_conn_put()
		(from fuse_mount_destroy() from ->kill_sb()), so anything
		rcu-delayed from there won't get freed until the end of
		rcu pathwalk.
		
		Unfortunately, we also dereference fc->user_ns (pass it
		to current_in_userns).  That gets dropped via put_user_ns()
		(non-rcu-delayed) from the final fuse_conn_put() and it
		needs to be delayed.

		Solution: make freeing in ->release() synchronous
		and do call_rcu(delayed_release, &fc->rcu), with
		delayed_release() doing put_user_ns() and calling
		->release() ("fuse: fix UAF in rcu pathwalks").

		In case of MAY_ACCESS | MAY_CHDIR it would blow up in
		rcu mode; thankfully, we are not calling it that way.
	gfs2_permission()
		currently safe, with breakage reverted.
		carefully dereferences ->ip_gl of containing gfs2_inode,
		bails out if NULL.  Freeing of the object being accessed
		is rcu-delayed, and past that point it either bails
		out or does generic_permission().
	hostfs_permission()
		safe - bails out in unsafe case
	kernfs_iop_permission()
		safe - bails out in unsafe case
	nfs_permission()
		same UAF as for nfs4_lookup_revalidate(), same fix
		NB: might make sense to tell nfs_revalidate_inode() that
		we are in non-blocking mode and have it bail out with -ECHILD
		if it's about to talk to server; then bailout at
		out_notsup: would go away.
	nilfs_permission()
		safe - boils down to generic_permission() if no MAY_WRITE is given.
		IF we want to support MAY_WRITE|MAY_NOT_BLOCK, it becomes more
		interesting.  We dereference a pointer (->i_root) in
		nilfs object inode is embedded into.  Probably would be enough
		to rcu-delay freeing nilfs_root - just the kfree() -> kfree_rcu()
		in nilfs_put_root().
	ocfs2_permission()
		safe - bails out in unsafe case
	orangefs_permission()
		safe - bails out in unsafe case
	ovl_permission()
		safe.  Accesses the object inode is embedded into,
		dereferences a pointer to ovl_entry (->oe) in that
		object, accesses inodes and dentries found in ovl_entry.
		Freeing of ->oe is done from ->free_inode(), i.e. past
		an RCU delay; dropping dentry references in it happens
		from ->destroy_inode(), which couldn't have happened
		before we'd grabbed rcu_read_lock(), so wrt the dentries
		we are in the same situation as for anything found
		during the pathwalk - they might be going down, but they
		couldn't have started to do so until after our rcu_read_lock().
		We dereference ->layer pointer picked from oe and pick
		struct mount reference from there; safe, since the object
		it points to is taken apart in ovl_free_fs(), after
		ovl_free_fs() has called kern_unmount_array(), with its RCU delay.
		Probably worth a comment - it's fairly subtle.
		Call of generic_permission() is safe, as usual.
		We also access ->creator_cred in fs-private part of
		superblock; that gets dropped from ovl_free_fs(),
		again, after the call of kern_unmount_array().
	proc_fd_permission()
		UAF; same story as with pid_revalidate(), same fix
	proc_pid_permission()
		UAF; same as with pid_revalidate() + use of
		->s_fs_info, which is freed synchronously from ->kill_sb();
		to fix that part, make freeing ->s_fs_info
		rcu-delayed.  Dropping pidns doesn't need
		to be rcu-delayed - it's already careful enough.
		Fixed in "procfs: make freeing proc_fs_info rcu-delayed"
	proc_tid_comm_permission()
		safe, because it's a non-directory and will see no
		unsafe calls.  IF we allow MAY_NOT_BLOCK in other
		callers, same UAF as in pid_revalidate() (with the same
		fix).
	proc_sys_permission()
		race; we start with fetching ->sysctl of the proc_inode our
		inode is embedded into.  That gets cleared from ->evict_inode(),
		without an RCU delay.  Freeing of that thing *is* RCU-delayed,
		so dereference is safe.  However, running into a directory
		that is getting evicted is indistinguishable from running
		into /proc/sys itself, since there ->sysctl is NULL all
		along; that's an artefact of the way we represent the root
		sysctl inode.  Race is harmless, not quite by an accident -
		/proc/sys permissions are the least restrictive in the entire
		subtree, so we do not get bogus hard errors and if rcu pathwalk
		observes such inode getting evicted, we are going to get ->d_seq
		mismatch anyway.  Fixing the race would still be nice (and
		getting rid of that artefact would simplify things there),
		but that's out of scope for this series.
	reiserfs_permission()
		safe - accesses fields of inode, then proceeds to generic_permission()
	reject_all()
		safe - no access to filesystem objects

get_link: [reordered with two common instances put in front]
	simple_get_link()
		safe; returns ->i_link, which should be pointing to something
		that will live at least until inode freeing.  Separate code
		audit, but AFAICS everything's currently fine with that one.
	page_get_link()
		safe; that's "grab page in inode->i_mapping, bail out if
		not there or not uptodate" and inode->i_mapping points to
		&inode->i_data for all symlinks.
		NOTE: hitting busy symlink inodes in generic_shutdown_super()
		will poison ->i_mapping; that's really a data corruption
		scenario, though - the possibility of race hitting UAF
		here is the least of concerns.  Might insert synchronize_rcu()
		before the poisoning loop for shits and giggles, but...
		no point, really.
	autofs_get_link()
		safe - bails out in unsafe case
	bad_inode_get_link()
		safe - no fs objects accessed
	ceph_encrypted_get_link()
		safe - bails out in unsafe case
	cifs_get_link()
		buggered - unconditional GFP_KERNEL allocation
		looks like they'd assumed that ->d_revalidate() bailing
		out would suffice; it isn't.  Obvious fix: bail out in
		unsafe case and be done with that ("cifs_get_link(): bail
		out in unsafe case").  It might be tempting
		to try and return ->symlink_target and to hell with any
		allocations, but...  the damn thing can get freed and
		replaced at any point.	Might be possible to work around
		(add a refcount, store new value with rcu_assign_pointer,
		add rcu delay between refcount hitting zero and freeing
		the sucker), but that's way out of scope for this.
	ecryptfs_get_link()
		safe - bails out in unsafe case
	ext4_encrypted_get_link()
		safe - bails out in unsafe case
	ext4_get_link()
		leak - checks a bit in containing ext4_inode_info, then
		either bails out in unsafe case, or does buffer_head
		analogue of what page_get_link() would do for pages.
		Unfortunately, it has two problems - potentially bogus
		hard errors from ext4_getblk() and leak in case when
		we get a bh that is not uptodate.  Fixed in
		"ext4_get_link(): fix breakage in RCU mode".
	f2fs_encrypted_get_link()
		safe - bails out in unsafe case
	f2fs_get_link()
		safe - page_get_link(), followed by no access of fs objects
	fuse_get_link()
		UAF, similar to one in fuse_permission(),
		but without the ->user_ns part - we only check
		->s_fs_info->fc->cached_symlinks flag.	After that it
		either uses page_get_link() or buggers off in unsafe case.
		So fix for fuse_permission() is sufficient here.
	gfs2_get_link()
		safe - bails out in unsafe case
	hostfs_get_link()
		safe - bails out in unsafe case
	kernfs_iop_get_link()
		safe - bails out in unsafe case
	nfs_get_link()
		same UAF as for nfs4_lookup_revalidate(), same fix
	ntfs_get_link()
		safe - bails out in unsafe case
	ovl_get_link()
		safe - bails out in unsafe case
	policy_get_link()
		safe - bails out in unsafe case
	proc_get_link()
		UAF (and oops) - dereferences ->pde of containing
		proc_inode, which is dropped (and field zeroed) from
		->evict_inode().  Fixed in "procfs: move dropping pde
		and pid from ->evict_inode() to ->free_inode()"
	proc_map_files_get_link()
		safe - possible hard error, without any fs objects being
		accessed, then bails out in unsafe case
	proc_ns_get_link()
		safe - fetches a field of procfs container of inode, then
		bails out in unsafe case
	proc_pid_get_link()
		safe - bails out in unsafe case
	proc_self_get_link()
		UAF - accesses the result of proc_pid_ns(), i.e.
		->s_fs_info->pid_ns.  ->s_fs_info is freed synchronously
		from ->kill_sb(), immediately after (also synchronous)
		put_pid_ns() in there.  Fixed by having that taken to
		rcu-delayed helper in "procfs: make freeing proc_fs_info
		rcu-delayed".
	proc_thread_self_get_link()
		same UAF as in proc_self_get_link(), same fix
	rawdata_get_link_abi()
		safe - bails out in unsafe case
	rawdata_get_link_data()
		safe - bails out in unsafe case
	rawdata_get_link_sha1()
		safe - bails out in unsafe case
	shmem_get_link()
		safe; essentially the same thing page_get_link() does.
	ubifs_get_link()
		safe; checks inode field, then either returns a pointer into
		containing ubifs_inode or bails out in unsafe case
	v9fs_vfs_get_link()
		safe - bails out in unsafe case
	v9fs_vfs_get_link_dotl()
		safe - bails out in unsafe case
	vboxsf_get_link()
		safe - bails out in unsafe case
	xfs_vn_get_link()
		safe - bails out in unsafe case
		NOTE: really ancient comment about recursion(!) next to
		that one.  Comment ought to go...  No idea why bother
		recalculating the symlink every time, BTW - might as
		well cache the sucker in ->i_link on the first access
		and leave freeing to the time we free the inode...
		Oh, wait - XFS doesn't free them at all, they are playing
		games with reuse...  Alternatively, they could shove
		the symlink contents in page cache, a-la e.g. NFS does.
		Anyway, well out of scope for this...

get_inode_acl:
	fuse_get_inode_acl()
		same UAF as in fuse_get_link(), except that we are
		fetching a different flag from ->s_fs_info->fc.  Same fix...
	ovl_get_inode_acl()
		safe; pretty much parallel to ovl_permission(), except
		that it calls get_cached_acl_rcu() rather than inode_permission()

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 12/13] cifs_get_link(): bail out in unsafe case
  2024-02-04  2:17   ` [PATCH 12/13] cifs_get_link(): bail out in unsafe case Al Viro
@ 2024-02-04 15:45     ` Steve French
  2024-02-04 16:25       ` Al Viro
  0 siblings, 1 reply; 36+ messages in thread
From: Steve French @ 2024-02-04 15:45 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, Christian Brauner, Miklos Szeredi,
	linux-cifs

I may be missing some additional change or proposed future change -
but it looks like the patch to add check for null dentry in
cifs_get_link causes
an extra call to cifs_get_link in pick_link() (in namei.c - see
below), so would be slightly slower than leaving code as is in
cifs_get_link

                if (nd->flags & LOOKUP_RCU) {
                        res = get(NULL, inode, &last->done);
                        if (res == ERR_PTR(-ECHILD) && try_to_unlazy(nd))
                                res = get(link->dentry, inode, &last->done);

cifs.ko doesn't use or check the dentry in cifs_get_link since the
symlink target is stored in the cifs inode, not  accessed via the
dentry, so wasn't clear to me
from the patch description why we would care if dentry is null in
cifs_get_link()

On Sat, Feb 3, 2024 at 8:18 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> ->d_revalidate() bails out there, anyway.  It's not enough
> to prevent getting into ->get_link() in RCU mode, but that
> could happen only in a very contrieved setup.  Not worth
> trying to do anything fancy here unless ->d_revalidate()
> stops kicking out of RCU mode at least in some cases.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/smb/client/cifsfs.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
> index e902de4e475a..630e74628dfe 100644
> --- a/fs/smb/client/cifsfs.c
> +++ b/fs/smb/client/cifsfs.c
> @@ -1172,6 +1172,9 @@ const char *cifs_get_link(struct dentry *dentry, struct inode *inode,
>  {
>         char *target_path;
>
> +       if (!dentry)
> +               return ERR_PTR(-ECHILD);
> +
>         target_path = kmalloc(PATH_MAX, GFP_KERNEL);
>         if (!target_path)
>                 return ERR_PTR(-ENOMEM);
> --
> 2.39.2
>
>


-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 12/13] cifs_get_link(): bail out in unsafe case
  2024-02-04 15:45     ` Steve French
@ 2024-02-04 16:25       ` Al Viro
  2024-02-04 16:41         ` Al Viro
  0 siblings, 1 reply; 36+ messages in thread
From: Al Viro @ 2024-02-04 16:25 UTC (permalink / raw)
  To: Steve French
  Cc: linux-fsdevel, Linus Torvalds, Christian Brauner, Miklos Szeredi,
	linux-cifs

On Sun, Feb 04, 2024 at 09:45:42AM -0600, Steve French wrote:
> I may be missing some additional change or proposed future change -
> but it looks like the patch to add check for null dentry in
> cifs_get_link causes
> an extra call to cifs_get_link in pick_link() (in namei.c - see
> below), so would be slightly slower than leaving code as is in
> cifs_get_link
> 
>                 if (nd->flags & LOOKUP_RCU) {
>                         res = get(NULL, inode, &last->done);
>                         if (res == ERR_PTR(-ECHILD) && try_to_unlazy(nd))
>                                 res = get(link->dentry, inode, &last->done);
> 
> cifs.ko doesn't use or check the dentry in cifs_get_link since the
> symlink target is stored in the cifs inode, not  accessed via the
> dentry, so wasn't clear to me
> from the patch description why we would care if dentry is null in
> cifs_get_link()

The very first thing you do in there is a GFP_KERNEL allocation.
You can't do that under rcu_read_lock(), for obvious reasons.

So if you ever get there (and it takes a somewhat convoluted setup -
you need to bind a cifs symlink over a file on a local filesystem),
you need to
	* carefully grab references to all dentries involved,
verify that they are still valid, etc.
	* drop rcu_read_lock()
before you can get on with fetching the symlink target.

That's precisely what try_to_unlazy() in the fragment you've
quoted is doing.

NULL dentry argument passed to ->get_link() is the way it is told
that we are in RCU pathwalk mode; anyone who can't handle that
should just return ERR_PTR(-ECHILD) and be done with that.  The
caller will switch to the non-RCU mode (with references pinned,
etc.) and call again.

*IF* you can tell the symlink body without blocking (e.g. you
have some cached information from the last time you've asked
the server and have reasons to trust it to be still valid),
sure, you can return it without dropping out of RCU mode.

It would be fairly useless for CIFS, since ->d_revalidate() of
CIFS dentries would reject RCU mode anyway.  That's what normally
saves you from having ->get_link() called that way, but it's not
guaranteed - there are convoluted setups that avoid having
->d_revalidate() called first.

See the description of RCU mode filesystem exposure in the
last posting in this thread for more details.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 12/13] cifs_get_link(): bail out in unsafe case
  2024-02-04 16:25       ` Al Viro
@ 2024-02-04 16:41         ` Al Viro
  0 siblings, 0 replies; 36+ messages in thread
From: Al Viro @ 2024-02-04 16:41 UTC (permalink / raw)
  To: Steve French
  Cc: linux-fsdevel, Linus Torvalds, Christian Brauner, Miklos Szeredi,
	linux-cifs

On Sun, Feb 04, 2024 at 04:25:58PM +0000, Al Viro wrote:

> *IF* you can tell the symlink body without blocking (e.g. you
> have some cached information from the last time you've asked
> the server and have reasons to trust it to be still valid),
> sure, you can return it without dropping out of RCU mode.
> 
> It would be fairly useless for CIFS, since ->d_revalidate() of
> CIFS dentries would reject RCU mode anyway.  That's what normally
> saves you from having ->get_link() called that way, but it's not
> guaranteed - there are convoluted setups that avoid having
> ->d_revalidate() called first.
> 
> See the description of RCU mode filesystem exposure in the
> last posting in this thread for more details.

PS: if you decide to go for handling RCU pathwalk mode in CIFS,
you definitely want to read the first half of
https://lore.kernel.org/linux-fsdevel/20240204022743.GI2087318@ZenIV/
or whatever it eventually turns into.  It obviously needs quite
a bit of massage before it starts to resemble proper docs -
currently it's just the summary I'd put together while going through
the audit: which methods are exposed, how can they tell,
what is and what is not guaranteed for them, etc., with a bit of
"why does VFS bother with something that unpleasant" thrown into the mix
as an explanation.

Any assistance with turning that into a coherent text would be
very welcome - I think that description of RCU pathwalk regarding
its impact on the filesystems would be useful.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 01/13] fs/super.c: don't drop ->s_user_ns until we free struct super_block itself
  2024-02-04  2:17 ` [PATCH 01/13] fs/super.c: don't drop ->s_user_ns until we free struct super_block itself Al Viro
                     ` (11 preceding siblings ...)
  2024-02-04  2:17   ` [PATCH 13/13] ext4_get_link(): fix breakage in RCU mode Al Viro
@ 2024-02-05 12:24   ` Christian Brauner
  2024-02-05 12:36   ` Jeff Layton
  13 siblings, 0 replies; 36+ messages in thread
From: Christian Brauner @ 2024-02-05 12:24 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

On Sun, Feb 04, 2024 at 02:17:27AM +0000, Al Viro wrote:
> Avoids fun races in RCU pathwalk...  Same goes for freeing LSM shite
> hanging off super_block's arse.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Hah, I once had the same patch for the userns bit because I was
wondering about that,

Reviewed-by: Christian Brauner <brauner@kernel.org>

(Independent of whether or not this is pretty the s_user_ns should
probably be a separate type so it can't be confused with other
namespaces when checking permissions. Maybe I should respin my series
for that if I find the time.)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 02/13] rcu pathwalk: prevent bogus hard errors from may_lookup()
  2024-02-04  2:17   ` [PATCH 02/13] rcu pathwalk: prevent bogus hard errors from may_lookup() Al Viro
@ 2024-02-05 12:26     ` Christian Brauner
  0 siblings, 0 replies; 36+ messages in thread
From: Christian Brauner @ 2024-02-05 12:26 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

On Sun, Feb 04, 2024 at 02:17:28AM +0000, Al Viro wrote:
> If lazy call of ->permission() returns a hard error, check that
> try_to_unlazy() succeeds before returning it.  That both makes
> life easier for ->permission() instances and closes the race
> in ENOTDIR handling - it is possible that positive d_can_lookup()
> seen in link_path_walk() applies to the state *after* unlink() +
> mkdir(), while nd->inode matches the state prior to that.
> 
> Normally seeing e.g. EACCES from permission check in rcu pathwalk
> means that with some timings non-rcu pathwalk would've run into
> the same; however, running into a non-executable regular file
> in the middle of a pathname would not get to permission check -
> it would fail with ENOTDIR instead.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/13] affs: free affs_sb_info with kfree_rcu()
  2024-02-04  2:17   ` [PATCH 03/13] affs: free affs_sb_info with kfree_rcu() Al Viro
@ 2024-02-05 12:26     ` Christian Brauner
  0 siblings, 0 replies; 36+ messages in thread
From: Christian Brauner @ 2024-02-05 12:26 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

On Sun, Feb 04, 2024 at 02:17:29AM +0000, Al Viro wrote:
> one of the flags in it is used by ->d_hash()/->d_compare()
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 04/13] exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper
  2024-02-04  2:17   ` [PATCH 04/13] exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper Al Viro
@ 2024-02-05 12:27     ` Christian Brauner
  0 siblings, 0 replies; 36+ messages in thread
From: Christian Brauner @ 2024-02-05 12:27 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

On Sun, Feb 04, 2024 at 02:17:30AM +0000, Al Viro wrote:
> That stuff can be accessed by ->d_hash()/->d_compare(); as it is, we have
> a hard-to-hit UAF if rcu pathwalk manages to get into ->d_hash() on a filesystem
> that is in process of getting shut down.
> 
> Besides, having nls and upcase table cleanup moved from ->put_super() towards
> the place where sbi is freed makes for simpler failure exits.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Acked-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 05/13] hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info
  2024-02-04  2:17   ` [PATCH 05/13] hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info Al Viro
@ 2024-02-05 12:27     ` Christian Brauner
  0 siblings, 0 replies; 36+ messages in thread
From: Christian Brauner @ 2024-02-05 12:27 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

On Sun, Feb 04, 2024 at 02:17:31AM +0000, Al Viro wrote:
> ->d_hash() and ->d_compare() use those, so we need to delay freeing
> them.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 06/13] afs: fix __afs_break_callback() / afs_drop_open_mmap() race
  2024-02-04  2:17   ` [PATCH 06/13] afs: fix __afs_break_callback() / afs_drop_open_mmap() race Al Viro
@ 2024-02-05 12:28     ` Christian Brauner
  0 siblings, 0 replies; 36+ messages in thread
From: Christian Brauner @ 2024-02-05 12:28 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

On Sun, Feb 04, 2024 at 02:17:32AM +0000, Al Viro wrote:
> In __afs_break_callback() we might check ->cb_nr_mmap and if it's non-zero
> do queue_work(&vnode->cb_work).  In afs_drop_open_mmap() we decrement
> ->cb_nr_mmap and do flush_work(&vnode->cb_work) if it reaches zero.
> 
> The trouble is, there's nothing to prevent __afs_break_callback() from
> seeing ->cb_nr_mmap before the decrement and do queue_work() after both
> the decrement and flush_work().  If that happens, we might be in trouble -
> vnode might get freed before the queued work runs.
> 
> __afs_break_callback() is always done under ->cb_lock, so let's make
> sure that ->cb_nr_mmap can change from non-zero to zero while holding
> ->cb_lock (the spinlock component of it - it's a seqlock and we don't
> need to mess with the counter).
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Acked-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 07/13] nfs: make nfs_set_verifier() safe for use in RCU pathwalk
  2024-02-04  2:17   ` [PATCH 07/13] nfs: make nfs_set_verifier() safe for use in RCU pathwalk Al Viro
@ 2024-02-05 12:29     ` Christian Brauner
  0 siblings, 0 replies; 36+ messages in thread
From: Christian Brauner @ 2024-02-05 12:29 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

On Sun, Feb 04, 2024 at 02:17:33AM +0000, Al Viro wrote:
> nfs_set_verifier() relies upon dentry being pinned; if that's
> the case, grabbing ->d_lock stabilizes ->d_parent and guarantees
> that ->d_parent points to a positive dentry.  For something
> we'd run into in RCU mode that is *not* true - dentry might've
> been through dentry_kill() just as we grabbed ->d_lock, with
> its parent going through the same just as we get to into
> nfs_set_verifier_locked().  It might get to detaching inode
> (and zeroing ->d_inode) before nfs_set_verifier_locked() gets
> to fetching that; we get an oops as the result.
> 
> That can happen in nfs{,4} ->d_revalidate(); the call chain in
> question is nfs_set_verifier_locked() <- nfs_set_verifier() <-
> nfs_lookup_revalidate_delegated() <- nfs{,4}_do_lookup_revalidate().
> We have checked that the parent had been positive, but that's
> done before we get to nfs_set_verifier() and it's possible for
> memory pressure to pick our dentry as eviction candidate by that
> time.  If that happens, back-to-back attempts to kill dentry and
> its parent are quite normal.  Sure, in case of eviction we'll
> fail the ->d_seq check in the caller, but we need to survive
> until we return there...
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Acked-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 08/13] nfs: fix UAF on pathwalk running into umount
  2024-02-04  2:17   ` [PATCH 08/13] nfs: fix UAF on pathwalk running into umount Al Viro
@ 2024-02-05 12:29     ` Christian Brauner
  0 siblings, 0 replies; 36+ messages in thread
From: Christian Brauner @ 2024-02-05 12:29 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

On Sun, Feb 04, 2024 at 02:17:34AM +0000, Al Viro wrote:
> NFS ->d_revalidate(), ->permission() and ->get_link() need to access
> some parts of nfs_server when called in RCU mode:
> 	server->flags
> 	server->caps
> 	*(server->io_stats)
> and, worst of all, call
> 	server->nfs_client->rpc_ops->have_delegation
> (the last one - as NFS_PROTO(inode)->have_delegation()).  We really
> don't want to RCU-delay the entire nfs_free_server() (it would have
> to be done with schedule_work() from RCU callback, since it can't
> be made to run from interrupt context), but actual freeing of
> nfs_server and ->io_stats can be done via call_rcu() just fine.
> nfs_client part is handled simply by making nfs_free_client() use
> kfree_rcu().
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Acked-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 09/13] procfs: move dropping pde and pid from ->evict_inode() to ->free_inode()
  2024-02-04  2:17   ` [PATCH 09/13] procfs: move dropping pde and pid from ->evict_inode() to ->free_inode() Al Viro
@ 2024-02-05 12:30     ` Christian Brauner
  0 siblings, 0 replies; 36+ messages in thread
From: Christian Brauner @ 2024-02-05 12:30 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

On Sun, Feb 04, 2024 at 02:17:35AM +0000, Al Viro wrote:
> that keeps both around until struct inode is freed, making access
> to them safe from rcu-pathwalk
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Acked-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 10/13] procfs: make freeing proc_fs_info rcu-delayed
  2024-02-04  2:17   ` [PATCH 10/13] procfs: make freeing proc_fs_info rcu-delayed Al Viro
@ 2024-02-05 12:31     ` Christian Brauner
  0 siblings, 0 replies; 36+ messages in thread
From: Christian Brauner @ 2024-02-05 12:31 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

On Sun, Feb 04, 2024 at 02:17:36AM +0000, Al Viro wrote:
> makes proc_pid_ns() safe from rcu pathwalk (put_pid_ns()
> is still synchronous, but that's not a problem - it does
> rcu-delay everything that needs to be)
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 11/13] fuse: fix UAF in rcu pathwalks
  2024-02-04  2:17   ` [PATCH 11/13] fuse: fix UAF in rcu pathwalks Al Viro
@ 2024-02-05 12:31     ` Christian Brauner
  2024-02-05 13:51       ` Miklos Szeredi
  0 siblings, 1 reply; 36+ messages in thread
From: Christian Brauner @ 2024-02-05 12:31 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

On Sun, Feb 04, 2024 at 02:17:37AM +0000, Al Viro wrote:
> ->permission(), ->get_link() and ->inode_get_acl() might dereference
> ->s_fs_info (and, in case of ->permission(), ->s_fs_info->fc->user_ns
> as well) when called from rcu pathwalk.
> 
> Freeing ->s_fs_info->fc is rcu-delayed; we need to make freeing ->s_fs_info
> and dropping ->user_ns rcu-delayed too.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 01/13] fs/super.c: don't drop ->s_user_ns until we free struct super_block itself
  2024-02-04  2:17 ` [PATCH 01/13] fs/super.c: don't drop ->s_user_ns until we free struct super_block itself Al Viro
                     ` (12 preceding siblings ...)
  2024-02-05 12:24   ` [PATCH 01/13] fs/super.c: don't drop ->s_user_ns until we free struct super_block itself Christian Brauner
@ 2024-02-05 12:36   ` Jeff Layton
  13 siblings, 0 replies; 36+ messages in thread
From: Jeff Layton @ 2024-02-05 12:36 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

On Sun, 2024-02-04 at 02:17 +0000, Al Viro wrote:
> Avoids fun races in RCU pathwalk...  Same goes for freeing LSM shite
> hanging off super_block's arse.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/super.c | 13 ++++---------
>  1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/super.c b/fs/super.c
> index d35e85295489..d6efeba0d0ce 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -274,9 +274,10 @@ static void destroy_super_work(struct work_struct *work)
>  {
>  	struct super_block *s = container_of(work, struct super_block,
>  							destroy_work);
> -	int i;
> -
> -	for (i = 0; i < SB_FREEZE_LEVELS; i++)
> +	security_sb_free(s);
> +	put_user_ns(s->s_user_ns);
> +	kfree(s->s_subtype);
> +	for (int i = 0; i < SB_FREEZE_LEVELS; i++)
>  		percpu_free_rwsem(&s->s_writers.rw_sem[i]);

nit: put_user_ns can call __put_user_ns which ends up queueing yet
another workqueue job. It might be nice in the future to come up with a
way to do the work that __put_user_ns does directly here instead of
queueing it.

OTOH, maybe it's not worth the effort...

>  	kfree(s);
>  }
> @@ -296,9 +297,6 @@ static void destroy_unused_super(struct super_block *s)
>  	super_unlock_excl(s);
>  	list_lru_destroy(&s->s_dentry_lru);
>  	list_lru_destroy(&s->s_inode_lru);
> -	security_sb_free(s);
> -	put_user_ns(s->s_user_ns);
> -	kfree(s->s_subtype);
>  	shrinker_free(s->s_shrink);
>  	/* no delays needed */
>  	destroy_super_work(&s->destroy_work);
> @@ -409,9 +407,6 @@ static void __put_super(struct super_block *s)
>  		WARN_ON(s->s_dentry_lru.node);
>  		WARN_ON(s->s_inode_lru.node);
>  		WARN_ON(!list_empty(&s->s_mounts));
> -		security_sb_free(s);
> -		put_user_ns(s->s_user_ns);
> -		kfree(s->s_subtype);
>  		call_rcu(&s->rcu, destroy_super_rcu);
>  	}
>  }

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHES] RCU pathwalk race fixes
  2024-02-04  2:14 [PATCHES] RCU pathwalk race fixes Al Viro
  2024-02-04  2:17 ` [PATCH 01/13] fs/super.c: don't drop ->s_user_ns until we free struct super_block itself Al Viro
  2024-02-04  2:27 ` RCU pathwalk audit notes Al Viro
@ 2024-02-05 12:48 ` Jeff Layton
  2 siblings, 0 replies; 36+ messages in thread
From: Jeff Layton @ 2024-02-05 12:48 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, linux-ext4, linux-nfs,
	Miklos Szeredi, linux-cifs

On Sun, 2024-02-04 at 02:14 +0000, Al Viro wrote:
> 	We still have some races in filesystem methods when exposed
> to RCU pathwalk.  The series below is a result of code audit (the
> second round of it) and it should deal with most of that stuff.
> Exceptions: ntfs3 ->d_hash()/->d_compare() and ceph_d_revalidate().
> Up to maintainers (a note for NTFS folks - when documentation says
> that a method may not block, it *does* imply that blocking allocations
> are to be avoided.  Really).
> 
> 	Branch is 6.8-rc1-based; it lives in
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git fixes.pathwalk-rcu
> 
> Individual patches are in followups; ditto for code audit notes.  Beginning
> of the latter should probably be converted into docs; if anyone is willing
> to help with such conversion, please say so - I'll be glad to answer any
> questions, etc.
> 
> If somebody wants to grab bits and pieces of that series into individual
> filesystem git trees, please say so.  Same for any problems spotted in
> the patches, obviously.  If nothing shows up, that goes into #fixes and
> into mainline.
> 
> Shortlog:
>       fs/super.c: don't drop ->s_user_ns until we free struct super_block itself
>       rcu pathwalk: prevent bogus hard errors from may_lookup()
>       affs: free affs_sb_info with kfree_rcu()
>       exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper
>       hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info
>       afs: fix __afs_break_callback() / afs_drop_open_mmap() race
>       nfs: make nfs_set_verifier() safe for use in RCU pathwalk
>       nfs: fix UAF on pathwalk running into umount
>       procfs: move dropping pde and pid from ->evict_inode() to ->free_inode()
>       procfs: make freeing proc_fs_info rcu-delayed
>       fuse: fix UAF in rcu pathwalks
>       cifs_get_link(): bail out in unsafe case
>       ext4_get_link(): fix breakage in RCU mode
> 
> Diffstat:
>  fs/affs/affs.h            |  1 +
>  fs/affs/super.c           |  2 +-
>  fs/afs/file.c             |  8 ++++++--
>  fs/exfat/exfat_fs.h       |  1 +
>  fs/exfat/nls.c            | 14 ++++----------
>  fs/exfat/super.c          | 20 +++++++++++---------
>  fs/ext4/symlink.c         |  8 +++++---
>  fs/fuse/cuse.c            |  3 +--
>  fs/fuse/fuse_i.h          |  1 +
>  fs/fuse/inode.c           | 15 +++++++++++----
>  fs/hfsplus/hfsplus_fs.h   |  1 +
>  fs/hfsplus/super.c        | 12 +++++++++---
>  fs/namei.c                |  6 +++++-
>  fs/nfs/client.c           | 13 ++++++++++---
>  fs/nfs/dir.c              |  4 ++--
>  fs/proc/base.c            |  2 --
>  fs/proc/inode.c           | 19 ++++++++-----------
>  fs/proc/root.c            |  2 +-
>  fs/smb/client/cifsfs.c    |  3 +++
>  fs/super.c                | 13 ++++---------
>  include/linux/nfs_fs_sb.h |  2 ++
>  include/linux/proc_fs.h   |  1 +
>  22 files changed, 88 insertions(+), 63 deletions(-)
> 

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 11/13] fuse: fix UAF in rcu pathwalks
  2024-02-05 12:31     ` Christian Brauner
@ 2024-02-05 13:51       ` Miklos Szeredi
  2024-03-04 14:36         ` Amir Goldstein
  0 siblings, 1 reply; 36+ messages in thread
From: Miklos Szeredi @ 2024-02-05 13:51 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Al Viro, linux-fsdevel, Linus Torvalds, linux-ext4, linux-nfs,
	linux-cifs

On Mon, 5 Feb 2024 at 13:31, Christian Brauner <brauner@kernel.org> wrote:
>
> On Sun, Feb 04, 2024 at 02:17:37AM +0000, Al Viro wrote:
> > ->permission(), ->get_link() and ->inode_get_acl() might dereference
> > ->s_fs_info (and, in case of ->permission(), ->s_fs_info->fc->user_ns
> > as well) when called from rcu pathwalk.
> >
> > Freeing ->s_fs_info->fc is rcu-delayed; we need to make freeing ->s_fs_info
> > and dropping ->user_ns rcu-delayed too.
> >
> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > ---
>
> Reviewed-by: Christian Brauner <brauner@kernel.org>

Acked-by: Miklos Szeredi <mszeredi@redhat.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 11/13] fuse: fix UAF in rcu pathwalks
  2024-02-05 13:51       ` Miklos Szeredi
@ 2024-03-04 14:36         ` Amir Goldstein
  2024-03-05 12:43           ` Miklos Szeredi
  0 siblings, 1 reply; 36+ messages in thread
From: Amir Goldstein @ 2024-03-04 14:36 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Christian Brauner, Al Viro, linux-fsdevel, Linus Torvalds,
	linux-ext4, linux-nfs, linux-cifs

On Mon, Feb 5, 2024 at 3:52 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Mon, 5 Feb 2024 at 13:31, Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Sun, Feb 04, 2024 at 02:17:37AM +0000, Al Viro wrote:
> > > ->permission(), ->get_link() and ->inode_get_acl() might dereference
> > > ->s_fs_info (and, in case of ->permission(), ->s_fs_info->fc->user_ns
> > > as well) when called from rcu pathwalk.
> > >
> > > Freeing ->s_fs_info->fc is rcu-delayed; we need to make freeing ->s_fs_info
> > > and dropping ->user_ns rcu-delayed too.
> > >
> > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > > ---
> >
> > Reviewed-by: Christian Brauner <brauner@kernel.org>
>
> Acked-by: Miklos Szeredi <mszeredi@redhat.com>
>

Miklos,

FYI, this is now merged and conflicts with:

dc076c73b9f9 ("fuse: implement ioctls to manage backing files")

from fuse/for-next:

--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@@ -1373,7 -1398,9 +1405,13 @@@ EXPORT_SYMBOL_GPL(fuse_send_init)
  void fuse_free_conn(struct fuse_conn *fc)
  {
        WARN_ON(!list_empty(&fc->devices));
++<<<<<<< HEAD
 +      kfree(fc);
++=======
+       if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
+               fuse_backing_files_free(fc);
+       kfree_rcu(fc, rcu);
++>>>>>>> fuse/for-next
  }
  EXPORT_SYMBOL_GPL(fuse_free_conn);

Note that fuse_backing_files_free() calls
fuse_backing_id_free() => fuse_backing_free() => kfree_rcu()

Should we move fuse_backing_files_free() into
fuse_conn_put() above fuse_dax_conn_free()?

That will avoid the merge conflict and still be correct. no?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 11/13] fuse: fix UAF in rcu pathwalks
  2024-03-04 14:36         ` Amir Goldstein
@ 2024-03-05 12:43           ` Miklos Szeredi
  2024-03-06 10:18             ` Amir Goldstein
  0 siblings, 1 reply; 36+ messages in thread
From: Miklos Szeredi @ 2024-03-05 12:43 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Christian Brauner, Al Viro, linux-fsdevel, Linus Torvalds,
	linux-ext4, linux-nfs, linux-cifs

On Mon, 4 Mar 2024 at 15:36, Amir Goldstein <amir73il@gmail.com> wrote:

> Note that fuse_backing_files_free() calls
> fuse_backing_id_free() => fuse_backing_free() => kfree_rcu()
>
> Should we move fuse_backing_files_free() into
> fuse_conn_put() above fuse_dax_conn_free()?
>
> That will avoid the merge conflict and still be correct. no?

Looks like a good cleanup.

Force-pushed to fuse.git#for-next.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 11/13] fuse: fix UAF in rcu pathwalks
  2024-03-05 12:43           ` Miklos Szeredi
@ 2024-03-06 10:18             ` Amir Goldstein
  2024-03-06 10:21               ` Miklos Szeredi
  0 siblings, 1 reply; 36+ messages in thread
From: Amir Goldstein @ 2024-03-06 10:18 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Christian Brauner, Al Viro, linux-fsdevel, Linus Torvalds,
	linux-ext4, linux-nfs, linux-cifs

On Tue, Mar 5, 2024 at 2:43 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Mon, 4 Mar 2024 at 15:36, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > Note that fuse_backing_files_free() calls
> > fuse_backing_id_free() => fuse_backing_free() => kfree_rcu()
> >
> > Should we move fuse_backing_files_free() into
> > fuse_conn_put() above fuse_dax_conn_free()?
> >
> > That will avoid the merge conflict and still be correct. no?
>
> Looks like a good cleanup.
>
> Force-pushed to fuse.git#for-next.
>

FYI, the version that you pushed will generate a minor conflict with

                }
-               fc->release(fc);
+               call_rcu(&fc->rcu, delayed_release);
        }
 }
 EXPORT_SYMBOL_GPL(fuse_conn_put);

If you move fuse_backing_files_free() to the start of the function,
I think merge conflict will be avoided:

 void fuse_conn_put(struct fuse_conn *fc)
 {
        if (refcount_dec_and_test(&fc->count)) {
                struct fuse_iqueue *fiq = &fc->iq;
                struct fuse_sync_bucket *bucket;

+               if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
+                       fuse_backing_files_free(fc);
                if (IS_ENABLED(CONFIG_FUSE_DAX))
                        fuse_dax_conn_free(fc);


Thanks,
Amir.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 11/13] fuse: fix UAF in rcu pathwalks
  2024-03-06 10:18             ` Amir Goldstein
@ 2024-03-06 10:21               ` Miklos Szeredi
  0 siblings, 0 replies; 36+ messages in thread
From: Miklos Szeredi @ 2024-03-06 10:21 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Christian Brauner, Al Viro, linux-fsdevel, Linus Torvalds,
	linux-ext4, linux-nfs, linux-cifs

On Wed, 6 Mar 2024 at 11:18, Amir Goldstein <amir73il@gmail.com> wrote:

> If you move fuse_backing_files_free() to the start of the function,
> I think merge conflict will be avoided:

Yeah, but I don't think it's worth messing with this just to avoid a conflict.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2024-03-06 10:21 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-04  2:14 [PATCHES] RCU pathwalk race fixes Al Viro
2024-02-04  2:17 ` [PATCH 01/13] fs/super.c: don't drop ->s_user_ns until we free struct super_block itself Al Viro
2024-02-04  2:17   ` [PATCH 02/13] rcu pathwalk: prevent bogus hard errors from may_lookup() Al Viro
2024-02-05 12:26     ` Christian Brauner
2024-02-04  2:17   ` [PATCH 03/13] affs: free affs_sb_info with kfree_rcu() Al Viro
2024-02-05 12:26     ` Christian Brauner
2024-02-04  2:17   ` [PATCH 04/13] exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper Al Viro
2024-02-05 12:27     ` Christian Brauner
2024-02-04  2:17   ` [PATCH 05/13] hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info Al Viro
2024-02-05 12:27     ` Christian Brauner
2024-02-04  2:17   ` [PATCH 06/13] afs: fix __afs_break_callback() / afs_drop_open_mmap() race Al Viro
2024-02-05 12:28     ` Christian Brauner
2024-02-04  2:17   ` [PATCH 07/13] nfs: make nfs_set_verifier() safe for use in RCU pathwalk Al Viro
2024-02-05 12:29     ` Christian Brauner
2024-02-04  2:17   ` [PATCH 08/13] nfs: fix UAF on pathwalk running into umount Al Viro
2024-02-05 12:29     ` Christian Brauner
2024-02-04  2:17   ` [PATCH 09/13] procfs: move dropping pde and pid from ->evict_inode() to ->free_inode() Al Viro
2024-02-05 12:30     ` Christian Brauner
2024-02-04  2:17   ` [PATCH 10/13] procfs: make freeing proc_fs_info rcu-delayed Al Viro
2024-02-05 12:31     ` Christian Brauner
2024-02-04  2:17   ` [PATCH 11/13] fuse: fix UAF in rcu pathwalks Al Viro
2024-02-05 12:31     ` Christian Brauner
2024-02-05 13:51       ` Miklos Szeredi
2024-03-04 14:36         ` Amir Goldstein
2024-03-05 12:43           ` Miklos Szeredi
2024-03-06 10:18             ` Amir Goldstein
2024-03-06 10:21               ` Miklos Szeredi
2024-02-04  2:17   ` [PATCH 12/13] cifs_get_link(): bail out in unsafe case Al Viro
2024-02-04 15:45     ` Steve French
2024-02-04 16:25       ` Al Viro
2024-02-04 16:41         ` Al Viro
2024-02-04  2:17   ` [PATCH 13/13] ext4_get_link(): fix breakage in RCU mode Al Viro
2024-02-05 12:24   ` [PATCH 01/13] fs/super.c: don't drop ->s_user_ns until we free struct super_block itself Christian Brauner
2024-02-05 12:36   ` Jeff Layton
2024-02-04  2:27 ` RCU pathwalk audit notes Al Viro
2024-02-05 12:48 ` [PATCHES] RCU pathwalk race fixes Jeff Layton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.