[PATCH] cgroup: avoid attaching a cgroup root to two different superblocks

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] cgroup: avoid attaching a cgroup root to two different superblocks
@ 2017-04-07  8:51 Zefan Li
  2017-04-11  0:01 ` Tejun Heo
  2017-04-14 23:27 ` Andrei Vagin
  0 siblings, 2 replies; 8+ messages in thread
From: Zefan Li @ 2017-04-07  8:51 UTC (permalink / raw)
  To: Tejun Heo; +Cc: dvyukov, LKML, Cgroups

Run this:

    touch file0
    for ((; ;))
    {
        mount -t cpuset xxx file0
    }

And this concurrently:

    touch file1
    for ((; ;))
    {
        mount -t cpuset xxx file1
    }

We'll trigger a warning like this:

 ------------[ cut here ]------------
 WARNING: CPU: 1 PID: 4675 at lib/percpu-refcount.c:317 percpu_ref_kill_and_confirm+0x92/0xb0
 percpu_ref_kill_and_confirm called more than once on css_release!
 CPU: 1 PID: 4675 Comm: mount Not tainted 4.11.0-rc5+ #5
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
 Call Trace:
  dump_stack+0x63/0x84
  __warn+0xd1/0xf0
  warn_slowpath_fmt+0x5f/0x80
  percpu_ref_kill_and_confirm+0x92/0xb0
  cgroup_kill_sb+0x95/0xb0
  deactivate_locked_super+0x43/0x70
  deactivate_super+0x46/0x60
 ...
 ---[ end trace a79f61c2a2633700 ]---

Here's a race:

  Thread A				Thread B

  cgroup1_mount()
    # alloc a new cgroup root
    cgroup_setup_root()
					cgroup1_mount()
					  # no sb yet, returns NULL
					  kernfs_pin_sb()

					  # but succeeds in getting the refcnt,
					  # so re-use cgroup root
					  percpu_ref_tryget_live()
    # alloc sb with cgroup root
    cgroup_do_mount()

  cgroup_kill_sb()
					  # alloc another sb with same root
					  cgroup_do_mount()

					cgroup_kill_sb()

We end up using the same cgroup root for two different superblocks,
so percpu_ref_kill() will be called twice on the same root when the
two superblocks are destroyed.

We should fix to make sure the superblock pinning is really successful.

Cc: stable@vger.kernel.org # 3.16+
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
---
 kernel/cgroup/cgroup-v1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 1dc22f6..12e19f0 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -1146,7 +1146,7 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
 		 * path is super cold.  Let's just sleep a bit and retry.
 		 */
 		pinned_sb = kernfs_pin_sb(root->kf_root, NULL);
-		if (IS_ERR(pinned_sb) ||
+		if (IS_ERR_OR_NULL(pinned_sb) ||
 		    !percpu_ref_tryget_live(&root->cgrp.self.refcnt)) {
 			mutex_unlock(&cgroup_mutex);
 			if (!IS_ERR_OR_NULL(pinned_sb))
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] cgroup: avoid attaching a cgroup root to two different superblocks
  2017-04-07  8:51 [PATCH] cgroup: avoid attaching a cgroup root to two different superblocks Zefan Li
@ 2017-04-11  0:01 ` Tejun Heo
  2017-04-14 23:27 ` Andrei Vagin
  1 sibling, 0 replies; 8+ messages in thread
From: Tejun Heo @ 2017-04-11  0:01 UTC (permalink / raw)
  To: Zefan Li; +Cc: dvyukov, LKML, Cgroups

On Fri, Apr 07, 2017 at 04:51:55PM +0800, Zefan Li wrote:
> We end up using the same cgroup root for two different superblocks,
> so percpu_ref_kill() will be called twice on the same root when the
> two superblocks are destroyed.
> 
> We should fix to make sure the superblock pinning is really successful.
> 
> Cc: stable@vger.kernel.org # 3.16+
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Signed-off-by: Zefan Li <lizefan@huawei.com>

Applied to cgroup/for-4.11-fixes.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: cgroup: avoid attaching a cgroup root to two different superblocks
  2017-04-07  8:51 [PATCH] cgroup: avoid attaching a cgroup root to two different superblocks Zefan Li
  2017-04-11  0:01 ` Tejun Heo
@ 2017-04-14 23:27 ` Andrei Vagin
  2017-04-14 23:32   ` Andrei Vagin
  2017-04-16 15:24   ` Tejun Heo
  1 sibling, 2 replies; 8+ messages in thread
From: Andrei Vagin @ 2017-04-14 23:27 UTC (permalink / raw)
  To: Li Zefan; +Cc: Tejun Heo, dvyukov, LKML, Cgroups

Hello,

One of our CRIU tests hangs with this patch.

Steps to reproduce:
curl -o cgroupns.c https://gist.githubusercontent.com/avagin/f87c8a8bd2a0de9afcc74976327786bc/raw/5843701ef3679f50dd2427cf57a80871082eb28c/gistfile1.txt
gcc cgroupns.c -o cgroupns
./cgroupns
./cgroupns

[root@fc24 ~]# strace -s 256 -fe clone,unshare,setns,mount ./cgroupns 
mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = 0
unshare(CLONE_NEWCGROUP)                = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fe5da0b89d0) = 529
strace: Process 529 attached
[pid   529] setns(3, CLONE_NEWCGROUP)   = 0
[pid   529] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=529, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
+++ exited with 0 +++
[root@fc24 ~]# strace -s 256 -fe clone,unshare,setns,mount ./cgroupns 
mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted)
mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted)
mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted)
mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted)
mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted)
mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted)
....

Thanks,
Andrei

On Fri, Apr 07, 2017 at 04:51:55PM +0800, Li Zefan wrote:
> Run this:
> 
>     touch file0
>     for ((; ;))
>     {
>         mount -t cpuset xxx file0
>     }
> 
> And this concurrently:
> 
>     touch file1
>     for ((; ;))
>     {
>         mount -t cpuset xxx file1
>     }
> 
> We'll trigger a warning like this:
> 
>  ------------[ cut here ]------------
>  WARNING: CPU: 1 PID: 4675 at lib/percpu-refcount.c:317 percpu_ref_kill_and_confirm+0x92/0xb0
>  percpu_ref_kill_and_confirm called more than once on css_release!
>  CPU: 1 PID: 4675 Comm: mount Not tainted 4.11.0-rc5+ #5
>  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
>  Call Trace:
>   dump_stack+0x63/0x84
>   __warn+0xd1/0xf0
>   warn_slowpath_fmt+0x5f/0x80
>   percpu_ref_kill_and_confirm+0x92/0xb0
>   cgroup_kill_sb+0x95/0xb0
>   deactivate_locked_super+0x43/0x70
>   deactivate_super+0x46/0x60
>  ...
>  ---[ end trace a79f61c2a2633700 ]---
> 
> Here's a race:
> 
>   Thread A				Thread B
> 
>   cgroup1_mount()
>     # alloc a new cgroup root
>     cgroup_setup_root()
> 					cgroup1_mount()
> 					  # no sb yet, returns NULL
> 					  kernfs_pin_sb()
> 
> 					  # but succeeds in getting the refcnt,
> 					  # so re-use cgroup root
> 					  percpu_ref_tryget_live()
>     # alloc sb with cgroup root
>     cgroup_do_mount()
> 
>   cgroup_kill_sb()
> 					  # alloc another sb with same root
> 					  cgroup_do_mount()
> 
> 					cgroup_kill_sb()
> 
> We end up using the same cgroup root for two different superblocks,
> so percpu_ref_kill() will be called twice on the same root when the
> two superblocks are destroyed.
> 
> We should fix to make sure the superblock pinning is really successful.
> 
> Cc: stable@vger.kernel.org # 3.16+
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Signed-off-by: Zefan Li <lizefan@huawei.com>
> ---
>  kernel/cgroup/cgroup-v1.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
> index 1dc22f6..12e19f0 100644
> --- a/kernel/cgroup/cgroup-v1.c
> +++ b/kernel/cgroup/cgroup-v1.c
> @@ -1146,7 +1146,7 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
>  		 * path is super cold.  Let's just sleep a bit and retry.
>  		 */
>  		pinned_sb = kernfs_pin_sb(root->kf_root, NULL);
> -		if (IS_ERR(pinned_sb) ||
> +		if (IS_ERR_OR_NULL(pinned_sb) ||
>  		    !percpu_ref_tryget_live(&root->cgrp.self.refcnt)) {
>  			mutex_unlock(&cgroup_mutex);
>  			if (!IS_ERR_OR_NULL(pinned_sb))

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: cgroup: avoid attaching a cgroup root to two different superblocks
  2017-04-14 23:27 ` Andrei Vagin
@ 2017-04-14 23:32   ` Andrei Vagin
  2017-04-17 10:41     ` Zefan Li
  2017-04-16 15:24   ` Tejun Heo
  1 sibling, 1 reply; 8+ messages in thread
From: Andrei Vagin @ 2017-04-14 23:32 UTC (permalink / raw)
  To: Li Zefan; +Cc: Tejun Heo, dvyukov, LKML, Cgroups

On Fri, Apr 14, 2017 at 04:27:37PM -0700, Andrei Vagin wrote:
> Hello,
> 
> One of our CRIU tests hangs with this patch.
> 
> Steps to reproduce:
> curl -o cgroupns.c https://gist.githubusercontent.com/avagin/f87c8a8bd2a0de9afcc74976327786bc/raw/5843701ef3679f50dd2427cf57a80871082eb28c/gistfile1.txt
> gcc cgroupns.c -o cgroupns
> ./cgroupns
> ./cgroupns

I've found a trivial reproducer:
mkdir /tmp/xxx
mount -t cgroup -o none,name=zdtmtst xxx /tmp/xxx
mkdir /tmp/xxx/xxx
umount /tmp/xxx
mount -t cgroup -o none,name=zdtmtst xxx /tmp/xxx

> 
> [root@fc24 ~]# strace -s 256 -fe clone,unshare,setns,mount ./cgroupns 
> mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = 0
> unshare(CLONE_NEWCGROUP)                = 0
> clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fe5da0b89d0) = 529
> strace: Process 529 attached
> [pid   529] setns(3, CLONE_NEWCGROUP)   = 0
> [pid   529] +++ exited with 0 +++
> --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=529, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
> +++ exited with 0 +++
> [root@fc24 ~]# strace -s 256 -fe clone,unshare,setns,mount ./cgroupns 
> mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted)
> mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted)
> mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted)
> mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted)
> mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted)
> mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted)
> ....
> 
> Thanks,
> Andrei
> 
> On Fri, Apr 07, 2017 at 04:51:55PM +0800, Li Zefan wrote:
> > Run this:
> > 
> >     touch file0
> >     for ((; ;))
> >     {
> >         mount -t cpuset xxx file0
> >     }
> > 
> > And this concurrently:
> > 
> >     touch file1
> >     for ((; ;))
> >     {
> >         mount -t cpuset xxx file1
> >     }
> > 
> > We'll trigger a warning like this:
> > 
> >  ------------[ cut here ]------------
> >  WARNING: CPU: 1 PID: 4675 at lib/percpu-refcount.c:317 percpu_ref_kill_and_confirm+0x92/0xb0
> >  percpu_ref_kill_and_confirm called more than once on css_release!
> >  CPU: 1 PID: 4675 Comm: mount Not tainted 4.11.0-rc5+ #5
> >  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> >  Call Trace:
> >   dump_stack+0x63/0x84
> >   __warn+0xd1/0xf0
> >   warn_slowpath_fmt+0x5f/0x80
> >   percpu_ref_kill_and_confirm+0x92/0xb0
> >   cgroup_kill_sb+0x95/0xb0
> >   deactivate_locked_super+0x43/0x70
> >   deactivate_super+0x46/0x60
> >  ...
> >  ---[ end trace a79f61c2a2633700 ]---
> > 
> > Here's a race:
> > 
> >   Thread A				Thread B
> > 
> >   cgroup1_mount()
> >     # alloc a new cgroup root
> >     cgroup_setup_root()
> > 					cgroup1_mount()
> > 					  # no sb yet, returns NULL
> > 					  kernfs_pin_sb()
> > 
> > 					  # but succeeds in getting the refcnt,
> > 					  # so re-use cgroup root
> > 					  percpu_ref_tryget_live()
> >     # alloc sb with cgroup root
> >     cgroup_do_mount()
> > 
> >   cgroup_kill_sb()
> > 					  # alloc another sb with same root
> > 					  cgroup_do_mount()
> > 
> > 					cgroup_kill_sb()
> > 
> > We end up using the same cgroup root for two different superblocks,
> > so percpu_ref_kill() will be called twice on the same root when the
> > two superblocks are destroyed.
> > 
> > We should fix to make sure the superblock pinning is really successful.
> > 
> > Cc: stable@vger.kernel.org # 3.16+
> > Reported-by: Dmitry Vyukov <dvyukov@google.com>
> > Signed-off-by: Zefan Li <lizefan@huawei.com>
> > ---
> >  kernel/cgroup/cgroup-v1.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
> > index 1dc22f6..12e19f0 100644
> > --- a/kernel/cgroup/cgroup-v1.c
> > +++ b/kernel/cgroup/cgroup-v1.c
> > @@ -1146,7 +1146,7 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
> >  		 * path is super cold.  Let's just sleep a bit and retry.
> >  		 */
> >  		pinned_sb = kernfs_pin_sb(root->kf_root, NULL);
> > -		if (IS_ERR(pinned_sb) ||
> > +		if (IS_ERR_OR_NULL(pinned_sb) ||
> >  		    !percpu_ref_tryget_live(&root->cgrp.self.refcnt)) {
> >  			mutex_unlock(&cgroup_mutex);
> >  			if (!IS_ERR_OR_NULL(pinned_sb))

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: cgroup: avoid attaching a cgroup root to two different superblocks
  2017-04-14 23:32   ` Andrei Vagin
@ 2017-04-17 10:41     ` Zefan Li
  2017-04-18  4:09       ` Andrei Vagin
  2017-04-18  6:39       ` Tejun Heo
  0 siblings, 2 replies; 8+ messages in thread
From: Zefan Li @ 2017-04-17 10:41 UTC (permalink / raw)
  To: Andrei Vagin; +Cc: Tejun Heo, dvyukov, LKML, Cgroups

On 2017/4/15 7:32, Andrei Vagin wrote:
> On Fri, Apr 14, 2017 at 04:27:37PM -0700, Andrei Vagin wrote:
>> Hello,
>>
>> One of our CRIU tests hangs with this patch.
>>
>> Steps to reproduce:
>> curl -o cgroupns.c https://gist.githubusercontent.com/avagin/f87c8a8bd2a0de9afcc74976327786bc/raw/5843701ef3679f50dd2427cf57a80871082eb28c/gistfile1.txt
>> gcc cgroupns.c -o cgroupns
>> ./cgroupns
>> ./cgroupns
> 
> I've found a trivial reproducer:
> mkdir /tmp/xxx
> mount -t cgroup -o none,name=zdtmtst xxx /tmp/xxx
> mkdir /tmp/xxx/xxx
> umount /tmp/xxx
> mount -t cgroup -o none,name=zdtmtst xxx /tmp/xxx
> 

Now I remember why it didn't check NULL pointer... Could you try the following fix?
It also reverts my previous patch. I would appreciate if you run the full test suit,
to make sure it won't break anything.

PS: Tejun, I found recently I can no longer receive your emails. Don't know why...

=======

[PATCH] cgruop: avoid attaching a cgroup root to two different superblocks, take 2

Commit bfb0b80db5f9 is broken. Now we try to fix the race by delaying
the initialization of cgroup root refcnt until a superblock has been
allocated.

Cc: stable@vger.kernel.org # 3.16+
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
---
 kernel/cgroup/cgroup-internal.h |  2 +-
 kernel/cgroup/cgroup-v1.c       | 18 ++++++++++++++++--
 kernel/cgroup/cgroup.c          |  8 ++++----
 3 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index 9203bfb..e470268 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -163,7 +163,7 @@ int cgroup_path_ns_locked(struct cgroup *cgrp, char *buf, size_t buflen,
 
 void cgroup_free_root(struct cgroup_root *root);
 void init_cgroup_root(struct cgroup_root *root, struct cgroup_sb_opts *opts);
-int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask);
+int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags);
 int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask);
 struct dentry *cgroup_do_mount(struct file_system_type *fs_type, int flags,
 			       struct cgroup_root *root, unsigned long magic,
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 12e19f0..6ca9b12 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -1072,6 +1072,7 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
 	struct cgroup_subsys *ss;
 	struct dentry *dentry;
 	int i, ret;
+	bool new_root = false;
 
 	cgroup_lock_and_drain_offline(&cgrp_dfl_root.cgrp);
 
@@ -1146,7 +1147,7 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
 		 * path is super cold.  Let's just sleep a bit and retry.
 		 */
 		pinned_sb = kernfs_pin_sb(root->kf_root, NULL);
-		if (IS_ERR_OR_NULL(pinned_sb) ||
+		if (IS_ERR(pinned_sb) ||
 		    !percpu_ref_tryget_live(&root->cgrp.self.refcnt)) {
 			mutex_unlock(&cgroup_mutex);
 			if (!IS_ERR_OR_NULL(pinned_sb))
@@ -1181,10 +1182,11 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
 		ret = -ENOMEM;
 		goto out_unlock;
 	}
+	new_root = true;
 
 	init_cgroup_root(root, &opts);
 
-	ret = cgroup_setup_root(root, opts.subsys_mask);
+	ret = cgroup_setup_root(root, opts.subsys_mask, PERCPU_REF_INIT_DEAD);
 	if (ret)
 		cgroup_free_root(root);
 
@@ -1201,6 +1203,18 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
 				 CGROUP_SUPER_MAGIC, ns);
 
 	/*
+	 * There's a race window after we release cgroup_mutex and before
+	 * allocating a superblock. Make sure a concurrent process won't
+	 * be able to re-use the root during this window by delaying the
+	 * initialization of root refcnt.
+	 */
+	if (new_root) {
+		mutex_lock(&cgroup_mutex);
+		percpu_ref_reinit(&root->cgrp.self.refcnt);
+		mutex_unlock(&cgroup_mutex);
+	}
+
+	/*
 	 * If @pinned_sb, we're reusing an existing root and holding an
 	 * extra ref on its sb.  Mount is complete.  Put the extra ref.
 	 */
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 4885132..0f98010 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1640,7 +1640,7 @@ void init_cgroup_root(struct cgroup_root *root, struct cgroup_sb_opts *opts)
 		set_bit(CGRP_CPUSET_CLONE_CHILDREN, &root->cgrp.flags);
 }
 
-int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
+int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags)
 {
 	LIST_HEAD(tmp_links);
 	struct cgroup *root_cgrp = &root->cgrp;
@@ -1656,8 +1656,8 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
 	root_cgrp->id = ret;
 	root_cgrp->ancestor_ids[0] = ret;
 
-	ret = percpu_ref_init(&root_cgrp->self.refcnt, css_release, 0,
-			      GFP_KERNEL);
+	ret = percpu_ref_init(&root_cgrp->self.refcnt, css_release,
+			      ref_flags, GFP_KERNEL);
 	if (ret)
 		goto out;
 
@@ -4512,7 +4512,7 @@ int __init cgroup_init(void)
 	hash_add(css_set_table, &init_css_set.hlist,
 		 css_set_hash(init_css_set.subsys));
 
-	BUG_ON(cgroup_setup_root(&cgrp_dfl_root, 0));
+	BUG_ON(cgroup_setup_root(&cgrp_dfl_root, 0, 0));
 
 	mutex_unlock(&cgroup_mutex);
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: cgroup: avoid attaching a cgroup root to two different superblocks
  2017-04-17 10:41     ` Zefan Li
@ 2017-04-18  4:09       ` Andrei Vagin
  2017-04-18  6:39       ` Tejun Heo
  1 sibling, 0 replies; 8+ messages in thread
From: Andrei Vagin @ 2017-04-18  4:09 UTC (permalink / raw)
  To: Zefan Li; +Cc: Tejun Heo, dvyukov, LKML, Cgroups

On Mon, Apr 17, 2017 at 06:41:38PM +0800, Zefan Li wrote:
> On 2017/4/15 7:32, Andrei Vagin wrote:
> > On Fri, Apr 14, 2017 at 04:27:37PM -0700, Andrei Vagin wrote:
> >> Hello,
> >>
> >> One of our CRIU tests hangs with this patch.
> >>
> >> Steps to reproduce:
> >> curl -o cgroupns.c https://gist.githubusercontent.com/avagin/f87c8a8bd2a0de9afcc74976327786bc/raw/5843701ef3679f50dd2427cf57a80871082eb28c/gistfile1.txt
> >> gcc cgroupns.c -o cgroupns
> >> ./cgroupns
> >> ./cgroupns
> > 
> > I've found a trivial reproducer:
> > mkdir /tmp/xxx
> > mount -t cgroup -o none,name=zdtmtst xxx /tmp/xxx
> > mkdir /tmp/xxx/xxx
> > umount /tmp/xxx
> > mount -t cgroup -o none,name=zdtmtst xxx /tmp/xxx
> > 
> 
> Now I remember why it didn't check NULL pointer... Could you try the following fix?
> It also reverts my previous patch. I would appreciate if you run the full test suit,
> to make sure it won't break anything.

It works for me. Thanks.

Tested-by: Andrei Vagin <avagin@virtuozzo.com>

> 
> PS: Tejun, I found recently I can no longer receive your emails. Don't know why...
> 
> =======
> 
> [PATCH] cgruop: avoid attaching a cgroup root to two different superblocks, take 2
> 
> Commit bfb0b80db5f9 is broken. Now we try to fix the race by delaying
> the initialization of cgroup root refcnt until a superblock has been
> allocated.
> 
> Cc: stable@vger.kernel.org # 3.16+
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Reported-by: Andrei Vagin <avagin@virtuozzo.com>
> Signed-off-by: Zefan Li <lizefan@huawei.com>
> ---
>  kernel/cgroup/cgroup-internal.h |  2 +-
>  kernel/cgroup/cgroup-v1.c       | 18 ++++++++++++++++--
>  kernel/cgroup/cgroup.c          |  8 ++++----
>  3 files changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
> index 9203bfb..e470268 100644
> --- a/kernel/cgroup/cgroup-internal.h
> +++ b/kernel/cgroup/cgroup-internal.h
> @@ -163,7 +163,7 @@ int cgroup_path_ns_locked(struct cgroup *cgrp, char *buf, size_t buflen,
>  
>  void cgroup_free_root(struct cgroup_root *root);
>  void init_cgroup_root(struct cgroup_root *root, struct cgroup_sb_opts *opts);
> -int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask);
> +int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags);
>  int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask);
>  struct dentry *cgroup_do_mount(struct file_system_type *fs_type, int flags,
>  			       struct cgroup_root *root, unsigned long magic,
> diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
> index 12e19f0..6ca9b12 100644
> --- a/kernel/cgroup/cgroup-v1.c
> +++ b/kernel/cgroup/cgroup-v1.c
> @@ -1072,6 +1072,7 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
>  	struct cgroup_subsys *ss;
>  	struct dentry *dentry;
>  	int i, ret;
> +	bool new_root = false;
>  
>  	cgroup_lock_and_drain_offline(&cgrp_dfl_root.cgrp);
>  
> @@ -1146,7 +1147,7 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
>  		 * path is super cold.  Let's just sleep a bit and retry.
>  		 */
>  		pinned_sb = kernfs_pin_sb(root->kf_root, NULL);
> -		if (IS_ERR_OR_NULL(pinned_sb) ||
> +		if (IS_ERR(pinned_sb) ||
>  		    !percpu_ref_tryget_live(&root->cgrp.self.refcnt)) {
>  			mutex_unlock(&cgroup_mutex);
>  			if (!IS_ERR_OR_NULL(pinned_sb))
> @@ -1181,10 +1182,11 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
>  		ret = -ENOMEM;
>  		goto out_unlock;
>  	}
> +	new_root = true;
>  
>  	init_cgroup_root(root, &opts);
>  
> -	ret = cgroup_setup_root(root, opts.subsys_mask);
> +	ret = cgroup_setup_root(root, opts.subsys_mask, PERCPU_REF_INIT_DEAD);
>  	if (ret)
>  		cgroup_free_root(root);
>  
> @@ -1201,6 +1203,18 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
>  				 CGROUP_SUPER_MAGIC, ns);
>  
>  	/*
> +	 * There's a race window after we release cgroup_mutex and before
> +	 * allocating a superblock. Make sure a concurrent process won't
> +	 * be able to re-use the root during this window by delaying the
> +	 * initialization of root refcnt.
> +	 */
> +	if (new_root) {
> +		mutex_lock(&cgroup_mutex);
> +		percpu_ref_reinit(&root->cgrp.self.refcnt);
> +		mutex_unlock(&cgroup_mutex);
> +	}
> +
> +	/*
>  	 * If @pinned_sb, we're reusing an existing root and holding an
>  	 * extra ref on its sb.  Mount is complete.  Put the extra ref.
>  	 */
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index 4885132..0f98010 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -1640,7 +1640,7 @@ void init_cgroup_root(struct cgroup_root *root, struct cgroup_sb_opts *opts)
>  		set_bit(CGRP_CPUSET_CLONE_CHILDREN, &root->cgrp.flags);
>  }
>  
> -int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
> +int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags)
>  {
>  	LIST_HEAD(tmp_links);
>  	struct cgroup *root_cgrp = &root->cgrp;
> @@ -1656,8 +1656,8 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
>  	root_cgrp->id = ret;
>  	root_cgrp->ancestor_ids[0] = ret;
>  
> -	ret = percpu_ref_init(&root_cgrp->self.refcnt, css_release, 0,
> -			      GFP_KERNEL);
> +	ret = percpu_ref_init(&root_cgrp->self.refcnt, css_release,
> +			      ref_flags, GFP_KERNEL);
>  	if (ret)
>  		goto out;
>  
> @@ -4512,7 +4512,7 @@ int __init cgroup_init(void)
>  	hash_add(css_set_table, &init_css_set.hlist,
>  		 css_set_hash(init_css_set.subsys));
>  
> -	BUG_ON(cgroup_setup_root(&cgrp_dfl_root, 0));
> +	BUG_ON(cgroup_setup_root(&cgrp_dfl_root, 0, 0));
>  
>  	mutex_unlock(&cgroup_mutex);
>  
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: cgroup: avoid attaching a cgroup root to two different superblocks
  2017-04-17 10:41     ` Zefan Li
  2017-04-18  4:09       ` Andrei Vagin
@ 2017-04-18  6:39       ` Tejun Heo
  1 sibling, 0 replies; 8+ messages in thread
From: Tejun Heo @ 2017-04-18  6:39 UTC (permalink / raw)
  To: Zefan Li; +Cc: Andrei Vagin, dvyukov, LKML, Cgroups

Hello, Li.

On Mon, Apr 17, 2017 at 06:41:38PM +0800, Zefan Li wrote:
> Now I remember why it didn't check NULL pointer... Could you try the following fix?
> It also reverts my previous patch. I would appreciate if you run the full test suit,
> to make sure it won't break anything.

Can you please refresh the patch on top of cgroup/for-next which
already has the original patch reverted?  Let's apply this one for
for-4.12 w/o -stable.  The bug isn't too likely to trigger in the wild
and the fix is kinda subtle.  If necessary, we can last ask Greg to
pick it up once we know that the patch is safe.

> PS: Tejun, I found recently I can no longer receive your emails. Don't know why...

Hah, no idea.  I didn't change anything on my side - all emails are
sent through gmail.com.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: cgroup: avoid attaching a cgroup root to two different superblocks
  2017-04-14 23:27 ` Andrei Vagin
  2017-04-14 23:32   ` Andrei Vagin
@ 2017-04-16 15:24   ` Tejun Heo
  1 sibling, 0 replies; 8+ messages in thread
From: Tejun Heo @ 2017-04-16 15:24 UTC (permalink / raw)
  To: Andrei Vagin; +Cc: Li Zefan, dvyukov, LKML, Cgroups

On Fri, Apr 14, 2017 at 04:27:38PM -0700, Andrei Vagin wrote:
> Hello,
> 
> One of our CRIU tests hangs with this patch.
> 
> Steps to reproduce:
> curl -o cgroupns.c https://gist.githubusercontent.com/avagin/f87c8a8bd2a0de9afcc74976327786bc/raw/5843701ef3679f50dd2427cf57a80871082eb28c/gistfile1.txt
> gcc cgroupns.c -o cgroupns
> ./cgroupns
> ./cgroupns

Reverted the patch for now.  Let's try again later.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-04-18  6:39 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-07  8:51 [PATCH] cgroup: avoid attaching a cgroup root to two different superblocks Zefan Li
2017-04-11  0:01 ` Tejun Heo
2017-04-14 23:27 ` Andrei Vagin
2017-04-14 23:32   ` Andrei Vagin
2017-04-17 10:41     ` Zefan Li
2017-04-18  4:09       ` Andrei Vagin
2017-04-18  6:39       ` Tejun Heo
2017-04-16 15:24   ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).