All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] cgroup1: fix leaked context root causing sporadic NULL deref in LTP
@ 2021-06-16 12:51 Paul Gortmaker
  2021-06-16 15:23   ` Tejun Heo
  2021-07-12 17:34   ` Tejun Heo
  0 siblings, 2 replies; 10+ messages in thread
From: Paul Gortmaker @ 2021-06-16 12:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: cgroups, Paul Gortmaker, Al Viro, Tejun Heo, Zefan Li,
	Johannes Weiner, stable, Richard Purdie

Richard reported sporadic (roughly one in 10 or so) null dereferences and
other strange behaviour for a set of automated LTP tests.  Things like:

   BUG: kernel NULL pointer dereference, address: 0000000000000008
   #PF: supervisor read access in kernel mode
   #PF: error_code(0x0000) - not-present page
   PGD 0 P4D 0
   Oops: 0000 [#1] PREEMPT SMP PTI
   CPU: 0 PID: 1516 Comm: umount Not tainted 5.10.0-yocto-standard #1
   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014
   RIP: 0010:kernfs_sop_show_path+0x1b/0x60

...or these others:

   RIP: 0010:do_mkdirat+0x6a/0xf0
   RIP: 0010:d_alloc_parallel+0x98/0x510
   RIP: 0010:do_readlinkat+0x86/0x120

There were other less common instances of some kind of a general scribble
but the common theme was mount and cgroup and a dubious dentry triggering
the NULL dereference.  I was only able to reproduce it under qemu by
replicating Richard's setup as closely as possible - I never did get it
to happen on bare metal, even while keeping everything else the same.

In commit 71d883c37e8d ("cgroup_do_mount(): massage calling conventions")
we see this as a part of the overall change:

   --------------
           struct cgroup_subsys *ss;
   -       struct dentry *dentry;

   [...]

   -       dentry = cgroup_do_mount(&cgroup_fs_type, fc->sb_flags, root,
   -                                CGROUP_SUPER_MAGIC, ns);

   [...]

   -       if (percpu_ref_is_dying(&root->cgrp.self.refcnt)) {
   -               struct super_block *sb = dentry->d_sb;
   -               dput(dentry);
   +       ret = cgroup_do_mount(fc, CGROUP_SUPER_MAGIC, ns);
   +       if (!ret && percpu_ref_is_dying(&root->cgrp.self.refcnt)) {
   +               struct super_block *sb = fc->root->d_sb;
   +               dput(fc->root);
                   deactivate_locked_super(sb);
                   msleep(10);
                   return restart_syscall();
           }
   --------------

In changing from the local "*dentry" variable to using fc->root, we now
export/leave that dentry pointer in the file context after doing the dput()
in the unlikely "is_dying" case.   With LTP doing a crazy amount of back to
back mount/unmount [testcases/bin/cgroup_regression_5_1.sh] the unlikely
becomes slightly likely and then bad things happen.

A fix would be to not leave the stale reference in fc->root as follows:

   --------------
                  dput(fc->root);
  +               fc->root = NULL;
                  deactivate_locked_super(sb);
   --------------

...but then we are just open-coding a duplicate of fc_drop_locked() so we
simply use that instead.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan.x@bytedance.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: stable@vger.kernel.org      # v5.1+
Reported-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Fixes: 71d883c37e8d ("cgroup_do_mount(): massage calling conventions")
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

diff --git a/fs/internal.h b/fs/internal.h
index 6aeae7ef3380..728f8d70d7f1 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -61,7 +61,6 @@ extern void __init chrdev_init(void);
  */
 extern const struct fs_context_operations legacy_fs_context_ops;
 extern int parse_monolithic_mount_data(struct fs_context *, void *);
-extern void fc_drop_locked(struct fs_context *);
 extern void vfs_clean_context(struct fs_context *fc);
 extern int finish_clean_context(struct fs_context *fc);
 
diff --git a/include/linux/fs_context.h b/include/linux/fs_context.h
index 37e1e8f7f08d..5b44b0195a28 100644
--- a/include/linux/fs_context.h
+++ b/include/linux/fs_context.h
@@ -139,6 +139,7 @@ extern int vfs_parse_fs_string(struct fs_context *fc, const char *key,
 extern int generic_parse_monolithic(struct fs_context *fc, void *data);
 extern int vfs_get_tree(struct fs_context *fc);
 extern void put_fs_context(struct fs_context *fc);
+extern void fc_drop_locked(struct fs_context *fc);
 
 /*
  * sget() wrappers to be called from the ->get_tree() op.
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 1f274d7fc934..6e554743cf2b 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -1223,9 +1223,7 @@ int cgroup1_get_tree(struct fs_context *fc)
 		ret = cgroup_do_get_tree(fc);
 
 	if (!ret && percpu_ref_is_dying(&ctx->root->cgrp.self.refcnt)) {
-		struct super_block *sb = fc->root->d_sb;
-		dput(fc->root);
-		deactivate_locked_super(sb);
+		fc_drop_locked(fc);
 		ret = 1;
 	}
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] cgroup1: fix leaked context root causing sporadic NULL deref in LTP
@ 2021-06-16 15:23   ` Tejun Heo
  0 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2021-06-16 15:23 UTC (permalink / raw)
  To: Paul Gortmaker
  Cc: linux-kernel, cgroups, Al Viro, Zefan Li, Johannes Weiner,
	stable, Richard Purdie

Hello,

On Wed, Jun 16, 2021 at 08:51:57AM -0400, Paul Gortmaker wrote:
> A fix would be to not leave the stale reference in fc->root as follows:
> 
>    --------------
>                   dput(fc->root);
>   +               fc->root = NULL;
>                   deactivate_locked_super(sb);
>    --------------
> 
> ...but then we are just open-coding a duplicate of fc_drop_locked() so we
> simply use that instead.

As this is unlikely to be a real-world problem both in probability and
circumstances, I'm applying this to cgroup/for-5.14 instead of
cgroup/for-5.13-fixes.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cgroup1: fix leaked context root causing sporadic NULL deref in LTP
@ 2021-06-16 15:23   ` Tejun Heo
  0 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2021-06-16 15:23 UTC (permalink / raw)
  To: Paul Gortmaker
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Al Viro, Zefan Li,
	Johannes Weiner, stable-u79uwXL29TY76Z2rM5mHXA, Richard Purdie

Hello,

On Wed, Jun 16, 2021 at 08:51:57AM -0400, Paul Gortmaker wrote:
> A fix would be to not leave the stale reference in fc->root as follows:
> 
>    --------------
>                   dput(fc->root);
>   +               fc->root = NULL;
>                   deactivate_locked_super(sb);
>    --------------
> 
> ...but then we are just open-coding a duplicate of fc_drop_locked() so we
> simply use that instead.

As this is unlikely to be a real-world problem both in probability and
circumstances, I'm applying this to cgroup/for-5.14 instead of
cgroup/for-5.13-fixes.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cgroup1: fix leaked context root causing sporadic NULL deref in LTP
  2021-06-16 15:23   ` Tejun Heo
  (?)
@ 2021-06-30 16:10   ` Mark Brown
  2021-06-30 22:31       ` Richard Purdie
  -1 siblings, 1 reply; 10+ messages in thread
From: Mark Brown @ 2021-06-30 16:10 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Paul Gortmaker, linux-kernel, cgroups, Al Viro, Zefan Li,
	Johannes Weiner, stable, Richard Purdie

[-- Attachment #1: Type: text/plain, Size: 1157 bytes --]

On Wed, Jun 16, 2021 at 11:23:34AM -0400, Tejun Heo wrote:
> On Wed, Jun 16, 2021 at 08:51:57AM -0400, Paul Gortmaker wrote:

> > A fix would be to not leave the stale reference in fc->root as follows:

> >    --------------
> >                   dput(fc->root);
> >   +               fc->root = NULL;
> >                   deactivate_locked_super(sb);
> >    --------------

> > ...but then we are just open-coding a duplicate of fc_drop_locked() so we
> > simply use that instead.

> As this is unlikely to be a real-world problem both in probability and
> circumstances, I'm applying this to cgroup/for-5.14 instead of
> cgroup/for-5.13-fixes.

FWIW at Arm we've started seeing what appears to be this issue blow up
very frequently in some of our internal LTP CI runs against -next, seems
to be mostly on lower end platforms.  We seem to have started finding it
at roughly the same time that the Yocto people did, I guess some other
change made it more likely to trigger.  Not exactly real world usage
obviously but it's creating quite a bit of noise in testing which is
disruptive so it'd be good to get it into -next as a fix.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cgroup1: fix leaked context root causing sporadic NULL deref in LTP
@ 2021-06-30 22:31       ` Richard Purdie
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Purdie @ 2021-06-30 22:31 UTC (permalink / raw)
  To: Mark Brown, Tejun Heo
  Cc: Paul Gortmaker, linux-kernel, cgroups, Al Viro, Zefan Li,
	Johannes Weiner, stable

On Wed, 2021-06-30 at 17:10 +0100, Mark Brown wrote:
> On Wed, Jun 16, 2021 at 11:23:34AM -0400, Tejun Heo wrote:
> > On Wed, Jun 16, 2021 at 08:51:57AM -0400, Paul Gortmaker wrote:
> 
> > > A fix would be to not leave the stale reference in fc->root as follows:
> 
> > >    --------------
> > >                   dput(fc->root);
> > >   +               fc->root = NULL;
> > >                   deactivate_locked_super(sb);
> > >    --------------
> 
> > > ...but then we are just open-coding a duplicate of fc_drop_locked() so we
> > > simply use that instead.
> 
> > As this is unlikely to be a real-world problem both in probability and
> > circumstances, I'm applying this to cgroup/for-5.14 instead of
> > cgroup/for-5.13-fixes.
> 
> FWIW at Arm we've started seeing what appears to be this issue blow up
> very frequently in some of our internal LTP CI runs against -next, seems
> to be mostly on lower end platforms.  We seem to have started finding it
> at roughly the same time that the Yocto people did, I guess some other
> change made it more likely to trigger.  Not exactly real world usage
> obviously but it's creating quite a bit of noise in testing which is
> disruptive so it'd be good to get it into -next as a fix.

It is a horrible bug to debug as you end up with "random" failures on the 
systems which are hard to pin down. Along with the RCU stall hangs it
was all a bit of a nightmare.

Out of interest are you also seeing the proc01 test hang on a non-blocking
read of /proc/kmsg periodically?

https://bugzilla.yoctoproject.org/show_bug.cgi?id=14460

I've not figured out a way to reproduce it at will yet and it seems strace
was enough to unblock it. It seems arm specific.

Cheers,

Richard




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cgroup1: fix leaked context root causing sporadic NULL deref in LTP
@ 2021-06-30 22:31       ` Richard Purdie
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Purdie @ 2021-06-30 22:31 UTC (permalink / raw)
  To: Mark Brown, Tejun Heo
  Cc: Paul Gortmaker, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Al Viro, Zefan Li,
	Johannes Weiner, stable-u79uwXL29TY76Z2rM5mHXA

On Wed, 2021-06-30 at 17:10 +0100, Mark Brown wrote:
> On Wed, Jun 16, 2021 at 11:23:34AM -0400, Tejun Heo wrote:
> > On Wed, Jun 16, 2021 at 08:51:57AM -0400, Paul Gortmaker wrote:
> 
> > > A fix would be to not leave the stale reference in fc->root as follows:
> 
> > >    --------------
> > >                   dput(fc->root);
> > >   +               fc->root = NULL;
> > >                   deactivate_locked_super(sb);
> > >    --------------
> 
> > > ...but then we are just open-coding a duplicate of fc_drop_locked() so we
> > > simply use that instead.
> 
> > As this is unlikely to be a real-world problem both in probability and
> > circumstances, I'm applying this to cgroup/for-5.14 instead of
> > cgroup/for-5.13-fixes.
> 
> FWIW at Arm we've started seeing what appears to be this issue blow up
> very frequently in some of our internal LTP CI runs against -next, seems
> to be mostly on lower end platforms.  We seem to have started finding it
> at roughly the same time that the Yocto people did, I guess some other
> change made it more likely to trigger.  Not exactly real world usage
> obviously but it's creating quite a bit of noise in testing which is
> disruptive so it'd be good to get it into -next as a fix.

It is a horrible bug to debug as you end up with "random" failures on the 
systems which are hard to pin down. Along with the RCU stall hangs it
was all a bit of a nightmare.

Out of interest are you also seeing the proc01 test hang on a non-blocking
read of /proc/kmsg periodically?

https://bugzilla.yoctoproject.org/show_bug.cgi?id=14460

I've not figured out a way to reproduce it at will yet and it seems strace
was enough to unblock it. It seems arm specific.

Cheers,

Richard




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cgroup1: fix leaked context root causing sporadic NULL deref in LTP
  2021-06-30 22:31       ` Richard Purdie
  (?)
@ 2021-07-01 12:11       ` Mark Brown
  2021-07-01 12:43         ` Richard Purdie
  -1 siblings, 1 reply; 10+ messages in thread
From: Mark Brown @ 2021-07-01 12:11 UTC (permalink / raw)
  To: Richard Purdie
  Cc: Tejun Heo, Paul Gortmaker, linux-kernel, cgroups, Al Viro,
	Zefan Li, Johannes Weiner, stable

[-- Attachment #1: Type: text/plain, Size: 498 bytes --]

On Wed, Jun 30, 2021 at 11:31:06PM +0100, Richard Purdie wrote:

> Out of interest are you also seeing the proc01 test hang on a non-blocking
> read of /proc/kmsg periodically?

> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14460

> I've not figured out a way to reproduce it at will yet and it seems strace

I've been talking to Ross, he mentioned it but no, rings no bells.

> was enough to unblock it. It seems arm specific.

If it's 32 bit I'm not really doing anything that looks at it.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cgroup1: fix leaked context root causing sporadic NULL deref in LTP
  2021-07-01 12:11       ` Mark Brown
@ 2021-07-01 12:43         ` Richard Purdie
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Purdie @ 2021-07-01 12:43 UTC (permalink / raw)
  To: Mark Brown
  Cc: Tejun Heo, Paul Gortmaker, linux-kernel, cgroups, Al Viro,
	Zefan Li, Johannes Weiner, stable

On Thu, 2021-07-01 at 13:11 +0100, Mark Brown wrote:
> On Wed, Jun 30, 2021 at 11:31:06PM +0100, Richard Purdie wrote:
> 
> > Out of interest are you also seeing the proc01 test hang on a non-blocking
> > read of /proc/kmsg periodically?
> 
> > https://bugzilla.yoctoproject.org/show_bug.cgi?id=14460
> 
> > I've not figured out a way to reproduce it at will yet and it seems strace
> 
> I've been talking to Ross, he mentioned it but no, rings no bells.
> 
> > was enough to unblock it. It seems arm specific.
> 
> If it's 32 bit I'm not really doing anything that looks at it.

Its aarch64 in qemu running on aarch64 hardware with KVM.

If you do happen to see anything let us know!

Cheers,

Richard



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cgroup1: fix leaked context root causing sporadic NULL deref in LTP
@ 2021-07-12 17:34   ` Tejun Heo
  0 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2021-07-12 17:34 UTC (permalink / raw)
  To: Paul Gortmaker
  Cc: linux-kernel, cgroups, Al Viro, Zefan Li, Johannes Weiner,
	stable, Richard Purdie

On Wed, Jun 16, 2021 at 08:51:57AM -0400, Paul Gortmaker wrote:
> Richard reported sporadic (roughly one in 10 or so) null dereferences and
> other strange behaviour for a set of automated LTP tests.  Things like:
> 
>    BUG: kernel NULL pointer dereference, address: 0000000000000008
>    #PF: supervisor read access in kernel mode
>    #PF: error_code(0x0000) - not-present page
>    PGD 0 P4D 0
>    Oops: 0000 [#1] PREEMPT SMP PTI
>    CPU: 0 PID: 1516 Comm: umount Not tainted 5.10.0-yocto-standard #1
>    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014
>    RIP: 0010:kernfs_sop_show_path+0x1b/0x60
> 
> ...or these others:
> 
>    RIP: 0010:do_mkdirat+0x6a/0xf0
>    RIP: 0010:d_alloc_parallel+0x98/0x510
>    RIP: 0010:do_readlinkat+0x86/0x120
> 
> There were other less common instances of some kind of a general scribble
> but the common theme was mount and cgroup and a dubious dentry triggering
> the NULL dereference.  I was only able to reproduce it under qemu by
> replicating Richard's setup as closely as possible - I never did get it
> to happen on bare metal, even while keeping everything else the same.
> 
> In commit 71d883c37e8d ("cgroup_do_mount(): massage calling conventions")
> we see this as a part of the overall change:
> 
>    --------------
>            struct cgroup_subsys *ss;
>    -       struct dentry *dentry;
> 
>    [...]
> 
>    -       dentry = cgroup_do_mount(&cgroup_fs_type, fc->sb_flags, root,
>    -                                CGROUP_SUPER_MAGIC, ns);
> 
>    [...]
> 
>    -       if (percpu_ref_is_dying(&root->cgrp.self.refcnt)) {
>    -               struct super_block *sb = dentry->d_sb;
>    -               dput(dentry);
>    +       ret = cgroup_do_mount(fc, CGROUP_SUPER_MAGIC, ns);
>    +       if (!ret && percpu_ref_is_dying(&root->cgrp.self.refcnt)) {
>    +               struct super_block *sb = fc->root->d_sb;
>    +               dput(fc->root);
>                    deactivate_locked_super(sb);
>                    msleep(10);
>                    return restart_syscall();
>            }
>    --------------
> 
> In changing from the local "*dentry" variable to using fc->root, we now
> export/leave that dentry pointer in the file context after doing the dput()
> in the unlikely "is_dying" case.   With LTP doing a crazy amount of back to
> back mount/unmount [testcases/bin/cgroup_regression_5_1.sh] the unlikely
> becomes slightly likely and then bad things happen.
> 
> A fix would be to not leave the stale reference in fc->root as follows:
> 
>    --------------
>                   dput(fc->root);
>   +               fc->root = NULL;
>                   deactivate_locked_super(sb);
>    --------------
> 
> ...but then we are just open-coding a duplicate of fc_drop_locked() so we
> simply use that instead.
> 
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Zefan Li <lizefan.x@bytedance.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: stable@vger.kernel.org      # v5.1+
> Reported-by: Richard Purdie <richard.purdie@linuxfoundation.org>
> Fixes: 71d883c37e8d ("cgroup_do_mount(): massage calling conventions")
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

I dropped the ball on this and this didn't get pushed. Re-applied to
for-5.14-fixes. Will send out in a few days.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cgroup1: fix leaked context root causing sporadic NULL deref in LTP
@ 2021-07-12 17:34   ` Tejun Heo
  0 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2021-07-12 17:34 UTC (permalink / raw)
  To: Paul Gortmaker
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Al Viro, Zefan Li,
	Johannes Weiner, stable-u79uwXL29TY76Z2rM5mHXA, Richard Purdie

On Wed, Jun 16, 2021 at 08:51:57AM -0400, Paul Gortmaker wrote:
> Richard reported sporadic (roughly one in 10 or so) null dereferences and
> other strange behaviour for a set of automated LTP tests.  Things like:
> 
>    BUG: kernel NULL pointer dereference, address: 0000000000000008
>    #PF: supervisor read access in kernel mode
>    #PF: error_code(0x0000) - not-present page
>    PGD 0 P4D 0
>    Oops: 0000 [#1] PREEMPT SMP PTI
>    CPU: 0 PID: 1516 Comm: umount Not tainted 5.10.0-yocto-standard #1
>    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014
>    RIP: 0010:kernfs_sop_show_path+0x1b/0x60
> 
> ...or these others:
> 
>    RIP: 0010:do_mkdirat+0x6a/0xf0
>    RIP: 0010:d_alloc_parallel+0x98/0x510
>    RIP: 0010:do_readlinkat+0x86/0x120
> 
> There were other less common instances of some kind of a general scribble
> but the common theme was mount and cgroup and a dubious dentry triggering
> the NULL dereference.  I was only able to reproduce it under qemu by
> replicating Richard's setup as closely as possible - I never did get it
> to happen on bare metal, even while keeping everything else the same.
> 
> In commit 71d883c37e8d ("cgroup_do_mount(): massage calling conventions")
> we see this as a part of the overall change:
> 
>    --------------
>            struct cgroup_subsys *ss;
>    -       struct dentry *dentry;
> 
>    [...]
> 
>    -       dentry = cgroup_do_mount(&cgroup_fs_type, fc->sb_flags, root,
>    -                                CGROUP_SUPER_MAGIC, ns);
> 
>    [...]
> 
>    -       if (percpu_ref_is_dying(&root->cgrp.self.refcnt)) {
>    -               struct super_block *sb = dentry->d_sb;
>    -               dput(dentry);
>    +       ret = cgroup_do_mount(fc, CGROUP_SUPER_MAGIC, ns);
>    +       if (!ret && percpu_ref_is_dying(&root->cgrp.self.refcnt)) {
>    +               struct super_block *sb = fc->root->d_sb;
>    +               dput(fc->root);
>                    deactivate_locked_super(sb);
>                    msleep(10);
>                    return restart_syscall();
>            }
>    --------------
> 
> In changing from the local "*dentry" variable to using fc->root, we now
> export/leave that dentry pointer in the file context after doing the dput()
> in the unlikely "is_dying" case.   With LTP doing a crazy amount of back to
> back mount/unmount [testcases/bin/cgroup_regression_5_1.sh] the unlikely
> becomes slightly likely and then bad things happen.
> 
> A fix would be to not leave the stale reference in fc->root as follows:
> 
>    --------------
>                   dput(fc->root);
>   +               fc->root = NULL;
>                   deactivate_locked_super(sb);
>    --------------
> 
> ...but then we are just open-coding a duplicate of fc_drop_locked() so we
> simply use that instead.
> 
> Cc: Al Viro <viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
> Cc: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Cc: Zefan Li <lizefan.x-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org>
> Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org      # v5.1+
> Reported-by: Richard Purdie <richard.purdie-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
> Fixes: 71d883c37e8d ("cgroup_do_mount(): massage calling conventions")
> Signed-off-by: Paul Gortmaker <paul.gortmaker-CWA4WttNNZF54TAoqtyWWQ@public.gmane.org>

I dropped the ball on this and this didn't get pushed. Re-applied to
for-5.14-fixes. Will send out in a few days.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-07-12 17:34 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-16 12:51 [PATCH] cgroup1: fix leaked context root causing sporadic NULL deref in LTP Paul Gortmaker
2021-06-16 15:23 ` Tejun Heo
2021-06-16 15:23   ` Tejun Heo
2021-06-30 16:10   ` Mark Brown
2021-06-30 22:31     ` Richard Purdie
2021-06-30 22:31       ` Richard Purdie
2021-07-01 12:11       ` Mark Brown
2021-07-01 12:43         ` Richard Purdie
2021-07-12 17:34 ` Tejun Heo
2021-07-12 17:34   ` Tejun Heo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.