linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] fs: fix NULL dereference due to data race in prepend_path()
@ 2020-10-14 20:45 Andrii Nakryiko
  2020-10-14 21:12 ` Josef Bacik
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Andrii Nakryiko @ 2020-10-14 20:45 UTC (permalink / raw)
  To: viro, linux-fsdevel; +Cc: torvalds, kernel-team, Andrii Nakryiko, linux-kernel

Fix data race in prepend_path() with re-reading mnt->mnt_ns twice without
holding the lock. is_mounted() does check for NULL, but is_anon_ns(mnt->mnt_ns)
might re-read the pointer again which could be NULL already, if in between
reads one of kern_unmount()/kern_unmount_array()/umount_tree() sets mnt->mnt_ns
to NULL.

This is seen in production with the following stack trace:

[22942.418012] BUG: kernel NULL pointer dereference, address: 0000000000000048
...
[22942.976884] RIP: 0010:prepend_path.isra.4+0x1ce/0x2e0
[22943.037706] Code: 89 c6 e9 0d ff ff ff 49 8b 85 c0 00 00 00 48 85 c0 0f 84 9d 00 00 00 48 3d 00 f0 ff ff 0f 87 91 00 00 00 49 8b 86 e0 00 00 00 <48> 83 78 48 00 0f 94 c0 0f b6 c0 83 c0 01 e9 3b ff ff ff 39 0d 29
[22943.264141] RSP: 0018:ffffc90020d6fd98 EFLAGS: 00010283
[22943.327058] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000007e5ee
[22943.413041] RDX: ffff889fb56ac0c0 RSI: ffffffd05dc8147e RDI: ffff88b1f845ab7b
[22943.499015] RBP: ffff889fbf8100c0 R08: ffffc90020d6fe30 R09: ffffc90020d6fe2c
[22943.584992] R10: ffffc90020d6fe2c R11: ffffea00095836c0 R12: ffffc90020d6fe30
[22943.670968] R13: ffff88b7d336bea0 R14: ffff88b7d336be80 R15: ffff88aeb78db980
[22943.756944] FS:  00007f228447e980(0000) GS:ffff889fc00c0000(0000) knlGS:0000000000000000
[22943.854448] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[22943.923653] CR2: 0000000000000048 CR3: 0000001ed235e001 CR4: 00000000007606e0
[22944.009630] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[22944.095604] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[22944.181581] PKRU: 55555554
[22944.214100] Call Trace:
[22944.243485]  d_path+0xe6/0x150
[22944.280202]  proc_pid_readlink+0x8f/0x100
[22944.328449]  vfs_readlink+0xf8/0x110
[22944.371456]  ? touch_atime+0x33/0xd0
[22944.414466]  do_readlinkat+0xfd/0x120
[22944.458522]  __x64_sys_readlinkat+0x1a/0x20
[22944.508868]  do_syscall_64+0x42/0x110
[22944.552928]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Fixes: f2683bd8d5bd ("[PATCH] fix d_absolute_path() interplay with fsmount()")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 fs/d_path.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/d_path.c b/fs/d_path.c
index 0f1fc1743302..a69e2cd36e6e 100644
--- a/fs/d_path.c
+++ b/fs/d_path.c
@@ -102,6 +102,8 @@ static int prepend_path(const struct path *path,
 
 		if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
 			struct mount *parent = READ_ONCE(mnt->mnt_parent);
+			struct mnt_namespace *mnt_ns;
+
 			/* Escaped? */
 			if (dentry != vfsmnt->mnt_root) {
 				bptr = *buffer;
@@ -116,7 +118,9 @@ static int prepend_path(const struct path *path,
 				vfsmnt = &mnt->mnt;
 				continue;
 			}
-			if (is_mounted(vfsmnt) && !is_anon_ns(mnt->mnt_ns))
+			mnt_ns = READ_ONCE(mnt->mnt_ns);
+			/* open-coded is_mounted() to use local mnt_ns */
+			if (!IS_ERR_OR_NULL(mnt_ns) && !is_anon_ns(mnt_ns))
 				error = 1;	// absolute root
 			else
 				error = 2;	// detached or not attached yet
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] fs: fix NULL dereference due to data race in prepend_path()
  2020-10-14 20:45 [PATCH] fs: fix NULL dereference due to data race in prepend_path() Andrii Nakryiko
@ 2020-10-14 21:12 ` Josef Bacik
  2020-10-14 21:49 ` Linus Torvalds
  2020-10-14 23:08 ` Al Viro
  2 siblings, 0 replies; 7+ messages in thread
From: Josef Bacik @ 2020-10-14 21:12 UTC (permalink / raw)
  To: Andrii Nakryiko, viro, linux-fsdevel; +Cc: torvalds, kernel-team, linux-kernel

On 10/14/20 4:45 PM, Andrii Nakryiko wrote:
> Fix data race in prepend_path() with re-reading mnt->mnt_ns twice without
> holding the lock. is_mounted() does check for NULL, but is_anon_ns(mnt->mnt_ns)
> might re-read the pointer again which could be NULL already, if in between
> reads one of kern_unmount()/kern_unmount_array()/umount_tree() sets mnt->mnt_ns
> to NULL.
> 
> This is seen in production with the following stack trace:
> 
> [22942.418012] BUG: kernel NULL pointer dereference, address: 0000000000000048
> ...
> [22942.976884] RIP: 0010:prepend_path.isra.4+0x1ce/0x2e0
> [22943.037706] Code: 89 c6 e9 0d ff ff ff 49 8b 85 c0 00 00 00 48 85 c0 0f 84 9d 00 00 00 48 3d 00 f0 ff ff 0f 87 91 00 00 00 49 8b 86 e0 00 00 00 <48> 83 78 48 00 0f 94 c0 0f b6 c0 83 c0 01 e9 3b ff ff ff 39 0d 29
> [22943.264141] RSP: 0018:ffffc90020d6fd98 EFLAGS: 00010283
> [22943.327058] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000007e5ee
> [22943.413041] RDX: ffff889fb56ac0c0 RSI: ffffffd05dc8147e RDI: ffff88b1f845ab7b
> [22943.499015] RBP: ffff889fbf8100c0 R08: ffffc90020d6fe30 R09: ffffc90020d6fe2c
> [22943.584992] R10: ffffc90020d6fe2c R11: ffffea00095836c0 R12: ffffc90020d6fe30
> [22943.670968] R13: ffff88b7d336bea0 R14: ffff88b7d336be80 R15: ffff88aeb78db980
> [22943.756944] FS:  00007f228447e980(0000) GS:ffff889fc00c0000(0000) knlGS:0000000000000000
> [22943.854448] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [22943.923653] CR2: 0000000000000048 CR3: 0000001ed235e001 CR4: 00000000007606e0
> [22944.009630] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [22944.095604] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [22944.181581] PKRU: 55555554
> [22944.214100] Call Trace:
> [22944.243485]  d_path+0xe6/0x150
> [22944.280202]  proc_pid_readlink+0x8f/0x100
> [22944.328449]  vfs_readlink+0xf8/0x110
> [22944.371456]  ? touch_atime+0x33/0xd0
> [22944.414466]  do_readlinkat+0xfd/0x120
> [22944.458522]  __x64_sys_readlinkat+0x1a/0x20
> [22944.508868]  do_syscall_64+0x42/0x110
> [22944.552928]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Fixes: f2683bd8d5bd ("[PATCH] fix d_absolute_path() interplay with fsmount()")
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>   fs/d_path.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/d_path.c b/fs/d_path.c
> index 0f1fc1743302..a69e2cd36e6e 100644
> --- a/fs/d_path.c
> +++ b/fs/d_path.c
> @@ -102,6 +102,8 @@ static int prepend_path(const struct path *path,
>   
>   		if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
>   			struct mount *parent = READ_ONCE(mnt->mnt_parent);
> +			struct mnt_namespace *mnt_ns;
> +
>   			/* Escaped? */
>   			if (dentry != vfsmnt->mnt_root) {
>   				bptr = *buffer;
> @@ -116,7 +118,9 @@ static int prepend_path(const struct path *path,
>   				vfsmnt = &mnt->mnt;
>   				continue;
>   			}
> -			if (is_mounted(vfsmnt) && !is_anon_ns(mnt->mnt_ns))
> +			mnt_ns = READ_ONCE(mnt->mnt_ns);
> +			/* open-coded is_mounted() to use local mnt_ns */
> +			if (!IS_ERR_OR_NULL(mnt_ns) && !is_anon_ns(mnt_ns))
>   				error = 1;	// absolute root
>   			else

I had to go look at this code carefully to make sure that mnt == 
real_mount(vfsmnt), which it does.  I was also afraid that if we could have 
mnt->mnt_ns change in between checks that we were just trading a possible NULL 
deref with a UAF, but we're under RCU here so we're good there as well.  You can add

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] fs: fix NULL dereference due to data race in prepend_path()
  2020-10-14 20:45 [PATCH] fs: fix NULL dereference due to data race in prepend_path() Andrii Nakryiko
  2020-10-14 21:12 ` Josef Bacik
@ 2020-10-14 21:49 ` Linus Torvalds
  2020-10-14 23:08   ` Al Viro
  2020-10-14 23:08 ` Al Viro
  2 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2020-10-14 21:49 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Al Viro, linux-fsdevel, kernel-team, Linux Kernel Mailing List

On Wed, Oct 14, 2020 at 2:40 PM Andrii Nakryiko <andrii@kernel.org> wrote:
>
> Fix data race in prepend_path() with re-reading mnt->mnt_ns twice without
> holding the lock. is_mounted() does check for NULL, but is_anon_ns(mnt->mnt_ns)
> might re-read the pointer again which could be NULL already, if in between
> reads one of kern_unmount()/kern_unmount_array()/umount_tree() sets mnt->mnt_ns
> to NULL.

This seems like the obviously correct fix, so I think I'll just apply
it directly.

Al? Holler if you have any issues with this..

             Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] fs: fix NULL dereference due to data race in prepend_path()
  2020-10-14 20:45 [PATCH] fs: fix NULL dereference due to data race in prepend_path() Andrii Nakryiko
  2020-10-14 21:12 ` Josef Bacik
  2020-10-14 21:49 ` Linus Torvalds
@ 2020-10-14 23:08 ` Al Viro
  2020-10-15  1:07   ` Andrii Nakryiko
  2 siblings, 1 reply; 7+ messages in thread
From: Al Viro @ 2020-10-14 23:08 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: linux-fsdevel, torvalds, kernel-team, linux-kernel

On Wed, Oct 14, 2020 at 01:45:28PM -0700, Andrii Nakryiko wrote:
> Fix data race in prepend_path() with re-reading mnt->mnt_ns twice without
> holding the lock. is_mounted() does check for NULL, but is_anon_ns(mnt->mnt_ns)
> might re-read the pointer again which could be NULL already, if in between
> reads one of kern_unmount()/kern_unmount_array()/umount_tree() sets mnt->mnt_ns
> to NULL.

Cute...  What config/compiler has resulted in that?  I agree with the analysis, but
I really hate the open-coded (and completely unexplained) use of IS_ERR_OR_NULL()
there.

> -			if (is_mounted(vfsmnt) && !is_anon_ns(mnt->mnt_ns))
> +			mnt_ns = READ_ONCE(mnt->mnt_ns);
> +			/* open-coded is_mounted() to use local mnt_ns */
> +			if (!IS_ERR_OR_NULL(mnt_ns) && !is_anon_ns(mnt_ns))
>  				error = 1;	// absolute root
>  			else
>  				error = 2;	// detached or not attached yet

Better turn that into an inlined helper in fs/mount.h, next to is_mounted(), IMO,
and kill that IS_ERR_OR_NULL garbage there.  What that thing does is
	if (ns == NULL || ns == MNT_NS_INTERNAL)
and it's *not* on any kind of hot path to warrant that kind of microoptimizations.

So let's make that

static inline bool is_real_ns(struct mnt_namespace *mnt_ns)
{
	return mnt_ns && mnt_ns != MNT_NS_INTERNAL;
}

turn is_mounted(m) into is_real_ns(m->mnt_ns) and replace that line in your fix
with
			if (is_real_ns(mnt_ns) && !is_anon_ns(mnt_ns))

Objections?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] fs: fix NULL dereference due to data race in prepend_path()
  2020-10-14 21:49 ` Linus Torvalds
@ 2020-10-14 23:08   ` Al Viro
  2020-10-14 23:12     ` Linus Torvalds
  0 siblings, 1 reply; 7+ messages in thread
From: Al Viro @ 2020-10-14 23:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrii Nakryiko, linux-fsdevel, kernel-team, Linux Kernel Mailing List

On Wed, Oct 14, 2020 at 02:49:18PM -0700, Linus Torvalds wrote:
> On Wed, Oct 14, 2020 at 2:40 PM Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > Fix data race in prepend_path() with re-reading mnt->mnt_ns twice without
> > holding the lock. is_mounted() does check for NULL, but is_anon_ns(mnt->mnt_ns)
> > might re-read the pointer again which could be NULL already, if in between
> > reads one of kern_unmount()/kern_unmount_array()/umount_tree() sets mnt->mnt_ns
> > to NULL.
> 
> This seems like the obviously correct fix, so I think I'll just apply
> it directly.
> 
> Al? Holler if you have any issues with this..

See upthread.  If you've already grabbed it, I'll just push a followup cleanup.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] fs: fix NULL dereference due to data race in prepend_path()
  2020-10-14 23:08   ` Al Viro
@ 2020-10-14 23:12     ` Linus Torvalds
  0 siblings, 0 replies; 7+ messages in thread
From: Linus Torvalds @ 2020-10-14 23:12 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrii Nakryiko, linux-fsdevel, kernel-team, Linux Kernel Mailing List

On Wed, Oct 14, 2020 at 4:09 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
>  If you've already grabbed it, I'll just push a followup cleanup.

Already grabbed (along with the ppc32 csum fix). Your suggested helper
function cleanup sounds good.

          Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] fs: fix NULL dereference due to data race in prepend_path()
  2020-10-14 23:08 ` Al Viro
@ 2020-10-15  1:07   ` Andrii Nakryiko
  0 siblings, 0 replies; 7+ messages in thread
From: Andrii Nakryiko @ 2020-10-15  1:07 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrii Nakryiko, linux-fsdevel, Linus Torvalds, Kernel Team, open list

On Wed, Oct 14, 2020 at 4:08 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Wed, Oct 14, 2020 at 01:45:28PM -0700, Andrii Nakryiko wrote:
> > Fix data race in prepend_path() with re-reading mnt->mnt_ns twice without
> > holding the lock. is_mounted() does check for NULL, but is_anon_ns(mnt->mnt_ns)
> > might re-read the pointer again which could be NULL already, if in between
> > reads one of kern_unmount()/kern_unmount_array()/umount_tree() sets mnt->mnt_ns
> > to NULL.
>
> Cute...  What config/compiler has resulted in that?  I agree with the analysis, but

Don't know for sure, but nothing special or exotic, a typical Facebook
production kernel config and some version of GCC (I didn't check
exactly which one).

Just given enough servers in the fleet, with time and intensive
workloads races like this, however unlikely, do surface up pretty
regularly.

> I really hate the open-coded (and completely unexplained) use of IS_ERR_OR_NULL()
> there.
>
> > -                     if (is_mounted(vfsmnt) && !is_anon_ns(mnt->mnt_ns))
> > +                     mnt_ns = READ_ONCE(mnt->mnt_ns);
> > +                     /* open-coded is_mounted() to use local mnt_ns */
> > +                     if (!IS_ERR_OR_NULL(mnt_ns) && !is_anon_ns(mnt_ns))
> >                               error = 1;      // absolute root
> >                       else
> >                               error = 2;      // detached or not attached yet
>
> Better turn that into an inlined helper in fs/mount.h, next to is_mounted(), IMO,
> and kill that IS_ERR_OR_NULL garbage there.  What that thing does is
>         if (ns == NULL || ns == MNT_NS_INTERNAL)
> and it's *not* on any kind of hot path to warrant that kind of microoptimizations.

Sounds good. I didn't know code well enough to infer NULL ||
MNT_NS_INTERNAL instead of IS_ERR_OR_NULL from is_mounted(), so just
open-coded the latter.

>
> So let's make that
>
> static inline bool is_real_ns(struct mnt_namespace *mnt_ns)
> {
>         return mnt_ns && mnt_ns != MNT_NS_INTERNAL;
> }
>
> turn is_mounted(m) into is_real_ns(m->mnt_ns) and replace that line in your fix
> with
>                         if (is_real_ns(mnt_ns) && !is_anon_ns(mnt_ns))
>
> Objections?

Nope, I'll send a follow-up patch, thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-10-15  1:59 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-14 20:45 [PATCH] fs: fix NULL dereference due to data race in prepend_path() Andrii Nakryiko
2020-10-14 21:12 ` Josef Bacik
2020-10-14 21:49 ` Linus Torvalds
2020-10-14 23:08   ` Al Viro
2020-10-14 23:12     ` Linus Torvalds
2020-10-14 23:08 ` Al Viro
2020-10-15  1:07   ` Andrii Nakryiko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).