* [PATCH] cgroup: avoid attaching a cgroup root to two different superblocks @ 2017-04-07 8:51 Zefan Li 2017-04-11 0:01 ` Tejun Heo 2017-04-14 23:27 ` Andrei Vagin 0 siblings, 2 replies; 8+ messages in thread From: Zefan Li @ 2017-04-07 8:51 UTC (permalink / raw) To: Tejun Heo; +Cc: dvyukov, LKML, Cgroups Run this: touch file0 for ((; ;)) { mount -t cpuset xxx file0 } And this concurrently: touch file1 for ((; ;)) { mount -t cpuset xxx file1 } We'll trigger a warning like this: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 4675 at lib/percpu-refcount.c:317 percpu_ref_kill_and_confirm+0x92/0xb0 percpu_ref_kill_and_confirm called more than once on css_release! CPU: 1 PID: 4675 Comm: mount Not tainted 4.11.0-rc5+ #5 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 Call Trace: dump_stack+0x63/0x84 __warn+0xd1/0xf0 warn_slowpath_fmt+0x5f/0x80 percpu_ref_kill_and_confirm+0x92/0xb0 cgroup_kill_sb+0x95/0xb0 deactivate_locked_super+0x43/0x70 deactivate_super+0x46/0x60 ... ---[ end trace a79f61c2a2633700 ]--- Here's a race: Thread A Thread B cgroup1_mount() # alloc a new cgroup root cgroup_setup_root() cgroup1_mount() # no sb yet, returns NULL kernfs_pin_sb() # but succeeds in getting the refcnt, # so re-use cgroup root percpu_ref_tryget_live() # alloc sb with cgroup root cgroup_do_mount() cgroup_kill_sb() # alloc another sb with same root cgroup_do_mount() cgroup_kill_sb() We end up using the same cgroup root for two different superblocks, so percpu_ref_kill() will be called twice on the same root when the two superblocks are destroyed. We should fix to make sure the superblock pinning is really successful. Cc: stable@vger.kernel.org # 3.16+ Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Zefan Li <lizefan@huawei.com> --- kernel/cgroup/cgroup-v1.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c index 1dc22f6..12e19f0 100644 --- a/kernel/cgroup/cgroup-v1.c +++ b/kernel/cgroup/cgroup-v1.c @@ -1146,7 +1146,7 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags, * path is super cold. Let's just sleep a bit and retry. */ pinned_sb = kernfs_pin_sb(root->kf_root, NULL); - if (IS_ERR(pinned_sb) || + if (IS_ERR_OR_NULL(pinned_sb) || !percpu_ref_tryget_live(&root->cgrp.self.refcnt)) { mutex_unlock(&cgroup_mutex); if (!IS_ERR_OR_NULL(pinned_sb)) -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] cgroup: avoid attaching a cgroup root to two different superblocks 2017-04-07 8:51 [PATCH] cgroup: avoid attaching a cgroup root to two different superblocks Zefan Li @ 2017-04-11 0:01 ` Tejun Heo 2017-04-14 23:27 ` Andrei Vagin 1 sibling, 0 replies; 8+ messages in thread From: Tejun Heo @ 2017-04-11 0:01 UTC (permalink / raw) To: Zefan Li; +Cc: dvyukov, LKML, Cgroups On Fri, Apr 07, 2017 at 04:51:55PM +0800, Zefan Li wrote: > We end up using the same cgroup root for two different superblocks, > so percpu_ref_kill() will be called twice on the same root when the > two superblocks are destroyed. > > We should fix to make sure the superblock pinning is really successful. > > Cc: stable@vger.kernel.org # 3.16+ > Reported-by: Dmitry Vyukov <dvyukov@google.com> > Signed-off-by: Zefan Li <lizefan@huawei.com> Applied to cgroup/for-4.11-fixes. Thanks. -- tejun ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: cgroup: avoid attaching a cgroup root to two different superblocks 2017-04-07 8:51 [PATCH] cgroup: avoid attaching a cgroup root to two different superblocks Zefan Li 2017-04-11 0:01 ` Tejun Heo @ 2017-04-14 23:27 ` Andrei Vagin 2017-04-14 23:32 ` Andrei Vagin 2017-04-16 15:24 ` Tejun Heo 1 sibling, 2 replies; 8+ messages in thread From: Andrei Vagin @ 2017-04-14 23:27 UTC (permalink / raw) To: Li Zefan; +Cc: Tejun Heo, dvyukov, LKML, Cgroups Hello, One of our CRIU tests hangs with this patch. Steps to reproduce: curl -o cgroupns.c https://gist.githubusercontent.com/avagin/f87c8a8bd2a0de9afcc74976327786bc/raw/5843701ef3679f50dd2427cf57a80871082eb28c/gistfile1.txt gcc cgroupns.c -o cgroupns ./cgroupns ./cgroupns [root@fc24 ~]# strace -s 256 -fe clone,unshare,setns,mount ./cgroupns mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = 0 unshare(CLONE_NEWCGROUP) = 0 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fe5da0b89d0) = 529 strace: Process 529 attached [pid 529] setns(3, CLONE_NEWCGROUP) = 0 [pid 529] +++ exited with 0 +++ --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=529, si_uid=0, si_status=0, si_utime=0, si_stime=0} --- +++ exited with 0 +++ [root@fc24 ~]# strace -s 256 -fe clone,unshare,setns,mount ./cgroupns mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted) mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted) mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted) mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted) mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted) mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted) .... Thanks, Andrei On Fri, Apr 07, 2017 at 04:51:55PM +0800, Li Zefan wrote: > Run this: > > touch file0 > for ((; ;)) > { > mount -t cpuset xxx file0 > } > > And this concurrently: > > touch file1 > for ((; ;)) > { > mount -t cpuset xxx file1 > } > > We'll trigger a warning like this: > > ------------[ cut here ]------------ > WARNING: CPU: 1 PID: 4675 at lib/percpu-refcount.c:317 percpu_ref_kill_and_confirm+0x92/0xb0 > percpu_ref_kill_and_confirm called more than once on css_release! > CPU: 1 PID: 4675 Comm: mount Not tainted 4.11.0-rc5+ #5 > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 > Call Trace: > dump_stack+0x63/0x84 > __warn+0xd1/0xf0 > warn_slowpath_fmt+0x5f/0x80 > percpu_ref_kill_and_confirm+0x92/0xb0 > cgroup_kill_sb+0x95/0xb0 > deactivate_locked_super+0x43/0x70 > deactivate_super+0x46/0x60 > ... > ---[ end trace a79f61c2a2633700 ]--- > > Here's a race: > > Thread A Thread B > > cgroup1_mount() > # alloc a new cgroup root > cgroup_setup_root() > cgroup1_mount() > # no sb yet, returns NULL > kernfs_pin_sb() > > # but succeeds in getting the refcnt, > # so re-use cgroup root > percpu_ref_tryget_live() > # alloc sb with cgroup root > cgroup_do_mount() > > cgroup_kill_sb() > # alloc another sb with same root > cgroup_do_mount() > > cgroup_kill_sb() > > We end up using the same cgroup root for two different superblocks, > so percpu_ref_kill() will be called twice on the same root when the > two superblocks are destroyed. > > We should fix to make sure the superblock pinning is really successful. > > Cc: stable@vger.kernel.org # 3.16+ > Reported-by: Dmitry Vyukov <dvyukov@google.com> > Signed-off-by: Zefan Li <lizefan@huawei.com> > --- > kernel/cgroup/cgroup-v1.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c > index 1dc22f6..12e19f0 100644 > --- a/kernel/cgroup/cgroup-v1.c > +++ b/kernel/cgroup/cgroup-v1.c > @@ -1146,7 +1146,7 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags, > * path is super cold. Let's just sleep a bit and retry. > */ > pinned_sb = kernfs_pin_sb(root->kf_root, NULL); > - if (IS_ERR(pinned_sb) || > + if (IS_ERR_OR_NULL(pinned_sb) || > !percpu_ref_tryget_live(&root->cgrp.self.refcnt)) { > mutex_unlock(&cgroup_mutex); > if (!IS_ERR_OR_NULL(pinned_sb)) ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: cgroup: avoid attaching a cgroup root to two different superblocks 2017-04-14 23:27 ` Andrei Vagin @ 2017-04-14 23:32 ` Andrei Vagin 2017-04-17 10:41 ` Zefan Li 2017-04-16 15:24 ` Tejun Heo 1 sibling, 1 reply; 8+ messages in thread From: Andrei Vagin @ 2017-04-14 23:32 UTC (permalink / raw) To: Li Zefan; +Cc: Tejun Heo, dvyukov, LKML, Cgroups On Fri, Apr 14, 2017 at 04:27:37PM -0700, Andrei Vagin wrote: > Hello, > > One of our CRIU tests hangs with this patch. > > Steps to reproduce: > curl -o cgroupns.c https://gist.githubusercontent.com/avagin/f87c8a8bd2a0de9afcc74976327786bc/raw/5843701ef3679f50dd2427cf57a80871082eb28c/gistfile1.txt > gcc cgroupns.c -o cgroupns > ./cgroupns > ./cgroupns I've found a trivial reproducer: mkdir /tmp/xxx mount -t cgroup -o none,name=zdtmtst xxx /tmp/xxx mkdir /tmp/xxx/xxx umount /tmp/xxx mount -t cgroup -o none,name=zdtmtst xxx /tmp/xxx > > [root@fc24 ~]# strace -s 256 -fe clone,unshare,setns,mount ./cgroupns > mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = 0 > unshare(CLONE_NEWCGROUP) = 0 > clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fe5da0b89d0) = 529 > strace: Process 529 attached > [pid 529] setns(3, CLONE_NEWCGROUP) = 0 > [pid 529] +++ exited with 0 +++ > --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=529, si_uid=0, si_status=0, si_utime=0, si_stime=0} --- > +++ exited with 0 +++ > [root@fc24 ~]# strace -s 256 -fe clone,unshare,setns,mount ./cgroupns > mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted) > mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted) > mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted) > mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted) > mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted) > mount("none", "/tmp/cgroupns.test/zdtmtst", "cgroup", 0, "none,name=zdtmtst") = ? ERESTARTNOINTR (To be restarted) > .... > > Thanks, > Andrei > > On Fri, Apr 07, 2017 at 04:51:55PM +0800, Li Zefan wrote: > > Run this: > > > > touch file0 > > for ((; ;)) > > { > > mount -t cpuset xxx file0 > > } > > > > And this concurrently: > > > > touch file1 > > for ((; ;)) > > { > > mount -t cpuset xxx file1 > > } > > > > We'll trigger a warning like this: > > > > ------------[ cut here ]------------ > > WARNING: CPU: 1 PID: 4675 at lib/percpu-refcount.c:317 percpu_ref_kill_and_confirm+0x92/0xb0 > > percpu_ref_kill_and_confirm called more than once on css_release! > > CPU: 1 PID: 4675 Comm: mount Not tainted 4.11.0-rc5+ #5 > > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 > > Call Trace: > > dump_stack+0x63/0x84 > > __warn+0xd1/0xf0 > > warn_slowpath_fmt+0x5f/0x80 > > percpu_ref_kill_and_confirm+0x92/0xb0 > > cgroup_kill_sb+0x95/0xb0 > > deactivate_locked_super+0x43/0x70 > > deactivate_super+0x46/0x60 > > ... > > ---[ end trace a79f61c2a2633700 ]--- > > > > Here's a race: > > > > Thread A Thread B > > > > cgroup1_mount() > > # alloc a new cgroup root > > cgroup_setup_root() > > cgroup1_mount() > > # no sb yet, returns NULL > > kernfs_pin_sb() > > > > # but succeeds in getting the refcnt, > > # so re-use cgroup root > > percpu_ref_tryget_live() > > # alloc sb with cgroup root > > cgroup_do_mount() > > > > cgroup_kill_sb() > > # alloc another sb with same root > > cgroup_do_mount() > > > > cgroup_kill_sb() > > > > We end up using the same cgroup root for two different superblocks, > > so percpu_ref_kill() will be called twice on the same root when the > > two superblocks are destroyed. > > > > We should fix to make sure the superblock pinning is really successful. > > > > Cc: stable@vger.kernel.org # 3.16+ > > Reported-by: Dmitry Vyukov <dvyukov@google.com> > > Signed-off-by: Zefan Li <lizefan@huawei.com> > > --- > > kernel/cgroup/cgroup-v1.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c > > index 1dc22f6..12e19f0 100644 > > --- a/kernel/cgroup/cgroup-v1.c > > +++ b/kernel/cgroup/cgroup-v1.c > > @@ -1146,7 +1146,7 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags, > > * path is super cold. Let's just sleep a bit and retry. > > */ > > pinned_sb = kernfs_pin_sb(root->kf_root, NULL); > > - if (IS_ERR(pinned_sb) || > > + if (IS_ERR_OR_NULL(pinned_sb) || > > !percpu_ref_tryget_live(&root->cgrp.self.refcnt)) { > > mutex_unlock(&cgroup_mutex); > > if (!IS_ERR_OR_NULL(pinned_sb)) ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: cgroup: avoid attaching a cgroup root to two different superblocks 2017-04-14 23:32 ` Andrei Vagin @ 2017-04-17 10:41 ` Zefan Li 2017-04-18 4:09 ` Andrei Vagin 2017-04-18 6:39 ` Tejun Heo 0 siblings, 2 replies; 8+ messages in thread From: Zefan Li @ 2017-04-17 10:41 UTC (permalink / raw) To: Andrei Vagin; +Cc: Tejun Heo, dvyukov, LKML, Cgroups On 2017/4/15 7:32, Andrei Vagin wrote: > On Fri, Apr 14, 2017 at 04:27:37PM -0700, Andrei Vagin wrote: >> Hello, >> >> One of our CRIU tests hangs with this patch. >> >> Steps to reproduce: >> curl -o cgroupns.c https://gist.githubusercontent.com/avagin/f87c8a8bd2a0de9afcc74976327786bc/raw/5843701ef3679f50dd2427cf57a80871082eb28c/gistfile1.txt >> gcc cgroupns.c -o cgroupns >> ./cgroupns >> ./cgroupns > > I've found a trivial reproducer: > mkdir /tmp/xxx > mount -t cgroup -o none,name=zdtmtst xxx /tmp/xxx > mkdir /tmp/xxx/xxx > umount /tmp/xxx > mount -t cgroup -o none,name=zdtmtst xxx /tmp/xxx > Now I remember why it didn't check NULL pointer... Could you try the following fix? It also reverts my previous patch. I would appreciate if you run the full test suit, to make sure it won't break anything. PS: Tejun, I found recently I can no longer receive your emails. Don't know why... ======= [PATCH] cgruop: avoid attaching a cgroup root to two different superblocks, take 2 Commit bfb0b80db5f9 is broken. Now we try to fix the race by delaying the initialization of cgroup root refcnt until a superblock has been allocated. Cc: stable@vger.kernel.org # 3.16+ Reported-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: Zefan Li <lizefan@huawei.com> --- kernel/cgroup/cgroup-internal.h | 2 +- kernel/cgroup/cgroup-v1.c | 18 ++++++++++++++++-- kernel/cgroup/cgroup.c | 8 ++++---- 3 files changed, 21 insertions(+), 7 deletions(-) diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h index 9203bfb..e470268 100644 --- a/kernel/cgroup/cgroup-internal.h +++ b/kernel/cgroup/cgroup-internal.h @@ -163,7 +163,7 @@ int cgroup_path_ns_locked(struct cgroup *cgrp, char *buf, size_t buflen, void cgroup_free_root(struct cgroup_root *root); void init_cgroup_root(struct cgroup_root *root, struct cgroup_sb_opts *opts); -int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask); +int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags); int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask); struct dentry *cgroup_do_mount(struct file_system_type *fs_type, int flags, struct cgroup_root *root, unsigned long magic, diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c index 12e19f0..6ca9b12 100644 --- a/kernel/cgroup/cgroup-v1.c +++ b/kernel/cgroup/cgroup-v1.c @@ -1072,6 +1072,7 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags, struct cgroup_subsys *ss; struct dentry *dentry; int i, ret; + bool new_root = false; cgroup_lock_and_drain_offline(&cgrp_dfl_root.cgrp); @@ -1146,7 +1147,7 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags, * path is super cold. Let's just sleep a bit and retry. */ pinned_sb = kernfs_pin_sb(root->kf_root, NULL); - if (IS_ERR_OR_NULL(pinned_sb) || + if (IS_ERR(pinned_sb) || !percpu_ref_tryget_live(&root->cgrp.self.refcnt)) { mutex_unlock(&cgroup_mutex); if (!IS_ERR_OR_NULL(pinned_sb)) @@ -1181,10 +1182,11 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags, ret = -ENOMEM; goto out_unlock; } + new_root = true; init_cgroup_root(root, &opts); - ret = cgroup_setup_root(root, opts.subsys_mask); + ret = cgroup_setup_root(root, opts.subsys_mask, PERCPU_REF_INIT_DEAD); if (ret) cgroup_free_root(root); @@ -1201,6 +1203,18 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags, CGROUP_SUPER_MAGIC, ns); /* + * There's a race window after we release cgroup_mutex and before + * allocating a superblock. Make sure a concurrent process won't + * be able to re-use the root during this window by delaying the + * initialization of root refcnt. + */ + if (new_root) { + mutex_lock(&cgroup_mutex); + percpu_ref_reinit(&root->cgrp.self.refcnt); + mutex_unlock(&cgroup_mutex); + } + + /* * If @pinned_sb, we're reusing an existing root and holding an * extra ref on its sb. Mount is complete. Put the extra ref. */ diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 4885132..0f98010 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -1640,7 +1640,7 @@ void init_cgroup_root(struct cgroup_root *root, struct cgroup_sb_opts *opts) set_bit(CGRP_CPUSET_CLONE_CHILDREN, &root->cgrp.flags); } -int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask) +int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags) { LIST_HEAD(tmp_links); struct cgroup *root_cgrp = &root->cgrp; @@ -1656,8 +1656,8 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask) root_cgrp->id = ret; root_cgrp->ancestor_ids[0] = ret; - ret = percpu_ref_init(&root_cgrp->self.refcnt, css_release, 0, - GFP_KERNEL); + ret = percpu_ref_init(&root_cgrp->self.refcnt, css_release, + ref_flags, GFP_KERNEL); if (ret) goto out; @@ -4512,7 +4512,7 @@ int __init cgroup_init(void) hash_add(css_set_table, &init_css_set.hlist, css_set_hash(init_css_set.subsys)); - BUG_ON(cgroup_setup_root(&cgrp_dfl_root, 0)); + BUG_ON(cgroup_setup_root(&cgrp_dfl_root, 0, 0)); mutex_unlock(&cgroup_mutex); -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: cgroup: avoid attaching a cgroup root to two different superblocks 2017-04-17 10:41 ` Zefan Li @ 2017-04-18 4:09 ` Andrei Vagin 2017-04-18 6:39 ` Tejun Heo 1 sibling, 0 replies; 8+ messages in thread From: Andrei Vagin @ 2017-04-18 4:09 UTC (permalink / raw) To: Zefan Li; +Cc: Tejun Heo, dvyukov, LKML, Cgroups On Mon, Apr 17, 2017 at 06:41:38PM +0800, Zefan Li wrote: > On 2017/4/15 7:32, Andrei Vagin wrote: > > On Fri, Apr 14, 2017 at 04:27:37PM -0700, Andrei Vagin wrote: > >> Hello, > >> > >> One of our CRIU tests hangs with this patch. > >> > >> Steps to reproduce: > >> curl -o cgroupns.c https://gist.githubusercontent.com/avagin/f87c8a8bd2a0de9afcc74976327786bc/raw/5843701ef3679f50dd2427cf57a80871082eb28c/gistfile1.txt > >> gcc cgroupns.c -o cgroupns > >> ./cgroupns > >> ./cgroupns > > > > I've found a trivial reproducer: > > mkdir /tmp/xxx > > mount -t cgroup -o none,name=zdtmtst xxx /tmp/xxx > > mkdir /tmp/xxx/xxx > > umount /tmp/xxx > > mount -t cgroup -o none,name=zdtmtst xxx /tmp/xxx > > > > Now I remember why it didn't check NULL pointer... Could you try the following fix? > It also reverts my previous patch. I would appreciate if you run the full test suit, > to make sure it won't break anything. It works for me. Thanks. Tested-by: Andrei Vagin <avagin@virtuozzo.com> > > PS: Tejun, I found recently I can no longer receive your emails. Don't know why... > > ======= > > [PATCH] cgruop: avoid attaching a cgroup root to two different superblocks, take 2 > > Commit bfb0b80db5f9 is broken. Now we try to fix the race by delaying > the initialization of cgroup root refcnt until a superblock has been > allocated. > > Cc: stable@vger.kernel.org # 3.16+ > Reported-by: Dmitry Vyukov <dvyukov@google.com> > Reported-by: Andrei Vagin <avagin@virtuozzo.com> > Signed-off-by: Zefan Li <lizefan@huawei.com> > --- > kernel/cgroup/cgroup-internal.h | 2 +- > kernel/cgroup/cgroup-v1.c | 18 ++++++++++++++++-- > kernel/cgroup/cgroup.c | 8 ++++---- > 3 files changed, 21 insertions(+), 7 deletions(-) > > diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h > index 9203bfb..e470268 100644 > --- a/kernel/cgroup/cgroup-internal.h > +++ b/kernel/cgroup/cgroup-internal.h > @@ -163,7 +163,7 @@ int cgroup_path_ns_locked(struct cgroup *cgrp, char *buf, size_t buflen, > > void cgroup_free_root(struct cgroup_root *root); > void init_cgroup_root(struct cgroup_root *root, struct cgroup_sb_opts *opts); > -int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask); > +int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags); > int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask); > struct dentry *cgroup_do_mount(struct file_system_type *fs_type, int flags, > struct cgroup_root *root, unsigned long magic, > diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c > index 12e19f0..6ca9b12 100644 > --- a/kernel/cgroup/cgroup-v1.c > +++ b/kernel/cgroup/cgroup-v1.c > @@ -1072,6 +1072,7 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags, > struct cgroup_subsys *ss; > struct dentry *dentry; > int i, ret; > + bool new_root = false; > > cgroup_lock_and_drain_offline(&cgrp_dfl_root.cgrp); > > @@ -1146,7 +1147,7 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags, > * path is super cold. Let's just sleep a bit and retry. > */ > pinned_sb = kernfs_pin_sb(root->kf_root, NULL); > - if (IS_ERR_OR_NULL(pinned_sb) || > + if (IS_ERR(pinned_sb) || > !percpu_ref_tryget_live(&root->cgrp.self.refcnt)) { > mutex_unlock(&cgroup_mutex); > if (!IS_ERR_OR_NULL(pinned_sb)) > @@ -1181,10 +1182,11 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags, > ret = -ENOMEM; > goto out_unlock; > } > + new_root = true; > > init_cgroup_root(root, &opts); > > - ret = cgroup_setup_root(root, opts.subsys_mask); > + ret = cgroup_setup_root(root, opts.subsys_mask, PERCPU_REF_INIT_DEAD); > if (ret) > cgroup_free_root(root); > > @@ -1201,6 +1203,18 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags, > CGROUP_SUPER_MAGIC, ns); > > /* > + * There's a race window after we release cgroup_mutex and before > + * allocating a superblock. Make sure a concurrent process won't > + * be able to re-use the root during this window by delaying the > + * initialization of root refcnt. > + */ > + if (new_root) { > + mutex_lock(&cgroup_mutex); > + percpu_ref_reinit(&root->cgrp.self.refcnt); > + mutex_unlock(&cgroup_mutex); > + } > + > + /* > * If @pinned_sb, we're reusing an existing root and holding an > * extra ref on its sb. Mount is complete. Put the extra ref. > */ > diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c > index 4885132..0f98010 100644 > --- a/kernel/cgroup/cgroup.c > +++ b/kernel/cgroup/cgroup.c > @@ -1640,7 +1640,7 @@ void init_cgroup_root(struct cgroup_root *root, struct cgroup_sb_opts *opts) > set_bit(CGRP_CPUSET_CLONE_CHILDREN, &root->cgrp.flags); > } > > -int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask) > +int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags) > { > LIST_HEAD(tmp_links); > struct cgroup *root_cgrp = &root->cgrp; > @@ -1656,8 +1656,8 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask) > root_cgrp->id = ret; > root_cgrp->ancestor_ids[0] = ret; > > - ret = percpu_ref_init(&root_cgrp->self.refcnt, css_release, 0, > - GFP_KERNEL); > + ret = percpu_ref_init(&root_cgrp->self.refcnt, css_release, > + ref_flags, GFP_KERNEL); > if (ret) > goto out; > > @@ -4512,7 +4512,7 @@ int __init cgroup_init(void) > hash_add(css_set_table, &init_css_set.hlist, > css_set_hash(init_css_set.subsys)); > > - BUG_ON(cgroup_setup_root(&cgrp_dfl_root, 0)); > + BUG_ON(cgroup_setup_root(&cgrp_dfl_root, 0, 0)); > > mutex_unlock(&cgroup_mutex); > > -- > 1.8.3.1 > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: cgroup: avoid attaching a cgroup root to two different superblocks 2017-04-17 10:41 ` Zefan Li 2017-04-18 4:09 ` Andrei Vagin @ 2017-04-18 6:39 ` Tejun Heo 1 sibling, 0 replies; 8+ messages in thread From: Tejun Heo @ 2017-04-18 6:39 UTC (permalink / raw) To: Zefan Li; +Cc: Andrei Vagin, dvyukov, LKML, Cgroups Hello, Li. On Mon, Apr 17, 2017 at 06:41:38PM +0800, Zefan Li wrote: > Now I remember why it didn't check NULL pointer... Could you try the following fix? > It also reverts my previous patch. I would appreciate if you run the full test suit, > to make sure it won't break anything. Can you please refresh the patch on top of cgroup/for-next which already has the original patch reverted? Let's apply this one for for-4.12 w/o -stable. The bug isn't too likely to trigger in the wild and the fix is kinda subtle. If necessary, we can last ask Greg to pick it up once we know that the patch is safe. > PS: Tejun, I found recently I can no longer receive your emails. Don't know why... Hah, no idea. I didn't change anything on my side - all emails are sent through gmail.com. Thanks. -- tejun ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: cgroup: avoid attaching a cgroup root to two different superblocks 2017-04-14 23:27 ` Andrei Vagin 2017-04-14 23:32 ` Andrei Vagin @ 2017-04-16 15:24 ` Tejun Heo 1 sibling, 0 replies; 8+ messages in thread From: Tejun Heo @ 2017-04-16 15:24 UTC (permalink / raw) To: Andrei Vagin; +Cc: Li Zefan, dvyukov, LKML, Cgroups On Fri, Apr 14, 2017 at 04:27:38PM -0700, Andrei Vagin wrote: > Hello, > > One of our CRIU tests hangs with this patch. > > Steps to reproduce: > curl -o cgroupns.c https://gist.githubusercontent.com/avagin/f87c8a8bd2a0de9afcc74976327786bc/raw/5843701ef3679f50dd2427cf57a80871082eb28c/gistfile1.txt > gcc cgroupns.c -o cgroupns > ./cgroupns > ./cgroupns Reverted the patch for now. Let's try again later. Thanks. -- tejun ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2017-04-18 6:39 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-04-07 8:51 [PATCH] cgroup: avoid attaching a cgroup root to two different superblocks Zefan Li 2017-04-11 0:01 ` Tejun Heo 2017-04-14 23:27 ` Andrei Vagin 2017-04-14 23:32 ` Andrei Vagin 2017-04-17 10:41 ` Zefan Li 2017-04-18 4:09 ` Andrei Vagin 2017-04-18 6:39 ` Tejun Heo 2017-04-16 15:24 ` Tejun Heo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).