[PATCH v4 0/6] clone3 & cgroups: allow spawning processes into cgroups

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v4 0/6] clone3 & cgroups: allow spawning processes into cgroups
@ 2020-01-17 18:12 ` Christian Brauner
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Brauner @ 2020-01-17 18:12 UTC (permalink / raw)
  To: linux-api, linux-kernel, Tejun Heo; +Cc: Oleg Nesterov, Christian Brauner

Hey Tejun,

This is v4 of the promised series to enable spawning processes into a
target cgroup different from the parent's cgroup.

/* v1 */
Link: https://lore.kernel.org/r/20191218173516.7875-1-christian.brauner@ubuntu.com

/* v2 */
Link: https://lore.kernel.org/r/20191223061504.28716-1-christian.brauner@ubuntu.com
Rework locking and remove unneeded helper functions. Please see
individual patch changelogs for details.
With this I've been able to run the cgroup selftests and stress tests in
loops for a long time without any regressions or deadlocks; lockdep and
kasan did not complain either.

/* v3 */
Link: https://lore.kernel.org/r/20200117002143.15559-1-christian.brauner@ubuntu.com
Split preliminary work into separate patches.
See changelog of individual commits.

/* v4 */
Verify that we have write access to the target cgroup. This is usually
done by the vfs but since we aren't going through the vfs with
CLONE_INTO_CGROUP we need to do it ourselves.

With this cgroup migration will be a lot easier, and accounting will be
more exact. It also allows for nice features such as creating a frozen
process by spawning it into a frozen cgroup.
The code simplifies container creation and exec logic quite a bit as
well.

I've tried to contain all core changes for this features in
kernel/cgroup/* to avoid exposing cgroup internals. This has mostly
worked.
When a new process is supposed to be spawned in a cgroup different from
the parent's then we briefly acquire the cgroup mutex right before
fork()'s point of no return and drop it once the child process has been
attached to the tasklist and to its css_set. This is done to ensure that
the cgroup isn't removed behind our back. The cgroup mutex is _only_
held in this case; the usual case, where the child is created in the
same cgroup as the parent does not acquire it since the cgroup can't be
removed.

The series already comes with proper testing. Once we've decided that
this approach is good I'll expand the test-suite even more.

The branch can be found in the following locations:
[1]: kernel.org: https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/log/?h=clone_into_cgroup
[2]: github.com: https://github.com/brauner/linux/tree/clone_into_cgroup
[3]: gitlab.com: https://gitlab.com/brauner/linux/commits/clone_into_cgroup

Thanks!
Christian

Christian Brauner (6):
  cgroup: unify attach permission checking
  cgroup: add cgroup_get_from_file() helper
  cgroup: refactor fork helpers
  cgroup: add cgroup_may_write() helper
  clone3: allow spawning processes into cgroups
  selftests/cgroup: add tests for cloning into cgroups

 include/linux/cgroup-defs.h                   |   6 +-
 include/linux/cgroup.h                        |  26 +-
 include/linux/sched/task.h                    |   4 +
 include/uapi/linux/sched.h                    |   5 +
 kernel/cgroup/cgroup.c                        | 310 ++++++++++++++----
 kernel/cgroup/pids.c                          |  16 +-
 kernel/fork.c                                 |  19 +-
 tools/testing/selftests/cgroup/Makefile       |   6 +-
 tools/testing/selftests/cgroup/cgroup_util.c  | 126 +++++++
 tools/testing/selftests/cgroup/cgroup_util.h  |   4 +
 tools/testing/selftests/cgroup/test_core.c    |  64 ++++
 .../selftests/clone3/clone3_selftests.h       |  19 +-
 12 files changed, 521 insertions(+), 84 deletions(-)

base-commit: b3a987b0264d3ddbb24293ebff10eddfc472f653
-- 
2.25.0

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v4 0/6] clone3 & cgroups: allow spawning processes into cgroups
@ 2020-01-17 18:12 ` Christian Brauner
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Brauner @ 2020-01-17 18:12 UTC (permalink / raw)
  To: linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo
  Cc: Oleg Nesterov, Christian Brauner

Hey Tejun,

This is v4 of the promised series to enable spawning processes into a
target cgroup different from the parent's cgroup.

/* v1 */
Link: https://lore.kernel.org/r/20191218173516.7875-1-christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org

/* v2 */
Link: https://lore.kernel.org/r/20191223061504.28716-1-christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org
Rework locking and remove unneeded helper functions. Please see
individual patch changelogs for details.
With this I've been able to run the cgroup selftests and stress tests in
loops for a long time without any regressions or deadlocks; lockdep and
kasan did not complain either.

/* v3 */
Link: https://lore.kernel.org/r/20200117002143.15559-1-christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org
Split preliminary work into separate patches.
See changelog of individual commits.

/* v4 */
Verify that we have write access to the target cgroup. This is usually
done by the vfs but since we aren't going through the vfs with
CLONE_INTO_CGROUP we need to do it ourselves.

With this cgroup migration will be a lot easier, and accounting will be
more exact. It also allows for nice features such as creating a frozen
process by spawning it into a frozen cgroup.
The code simplifies container creation and exec logic quite a bit as
well.

I've tried to contain all core changes for this features in
kernel/cgroup/* to avoid exposing cgroup internals. This has mostly
worked.
When a new process is supposed to be spawned in a cgroup different from
the parent's then we briefly acquire the cgroup mutex right before
fork()'s point of no return and drop it once the child process has been
attached to the tasklist and to its css_set. This is done to ensure that
the cgroup isn't removed behind our back. The cgroup mutex is _only_
held in this case; the usual case, where the child is created in the
same cgroup as the parent does not acquire it since the cgroup can't be
removed.

The series already comes with proper testing. Once we've decided that
this approach is good I'll expand the test-suite even more.

The branch can be found in the following locations:
[1]: kernel.org: https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/log/?h=clone_into_cgroup
[2]: github.com: https://github.com/brauner/linux/tree/clone_into_cgroup
[3]: gitlab.com: https://gitlab.com/brauner/linux/commits/clone_into_cgroup

Thanks!
Christian

Christian Brauner (6):
  cgroup: unify attach permission checking
  cgroup: add cgroup_get_from_file() helper
  cgroup: refactor fork helpers
  cgroup: add cgroup_may_write() helper
  clone3: allow spawning processes into cgroups
  selftests/cgroup: add tests for cloning into cgroups

 include/linux/cgroup-defs.h                   |   6 +-
 include/linux/cgroup.h                        |  26 +-
 include/linux/sched/task.h                    |   4 +
 include/uapi/linux/sched.h                    |   5 +
 kernel/cgroup/cgroup.c                        | 310 ++++++++++++++----
 kernel/cgroup/pids.c                          |  16 +-
 kernel/fork.c                                 |  19 +-
 tools/testing/selftests/cgroup/Makefile       |   6 +-
 tools/testing/selftests/cgroup/cgroup_util.c  | 126 +++++++
 tools/testing/selftests/cgroup/cgroup_util.h  |   4 +
 tools/testing/selftests/cgroup/test_core.c    |  64 ++++
 .../selftests/clone3/clone3_selftests.h       |  19 +-
 12 files changed, 521 insertions(+), 84 deletions(-)

base-commit: b3a987b0264d3ddbb24293ebff10eddfc472f653
-- 
2.25.0

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v4 1/6] cgroup: unify attach permission checking
@ 2020-01-17 18:12   ` Christian Brauner
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Brauner @ 2020-01-17 18:12 UTC (permalink / raw)
  To: linux-api, linux-kernel, Tejun Heo
  Cc: Oleg Nesterov, Christian Brauner, Li Zefan, Johannes Weiner, cgroups

The core codepaths to check whether a process can be attached to a
cgroup are the same for threads and thread-group leaders. Only a small
piece of code verifying that source and destination cgroup are in the
same domain differentiates the thread permission checking from
thread-group leader permission checking.
Since cgroup_migrate_vet_dst() only matters cgroup2 - it is a noop on
cgroup1 - we can move it out of cgroup_attach_task().
All checks can now be consolidated into a new helper
cgroup_attach_permissions() callable from both cgroup_procs_write() and
cgroup_threads_write().

Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: cgroups@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v1 */
Link: https://lore.kernel.org/r/20191218173516.7875-2-christian.brauner@ubuntu.com

/* v2 */
Link: https://lore.kernel.org/r/20191223061504.28716-2-christian.brauner@ubuntu.com
- Christian Brauner <christian.brauner@ubuntu.com>:
  - Fix return value of cgroup_attach_permissions. It used to return 0
    when it should've returned -EOPNOTSUPP.
  - Fix call to cgroup_attach_permissions() in cgroup_procs_write(). It
    accidently specified that a thread was moved causing an additional
    check for domain-group equality to be executed that is not needed.

/* v3 */
Link: https://lore.kernel.org/r/20200117002143.15559-2-christian.brauner@ubuntu.com
unchanged

/* v4 */
unchanged
---
 kernel/cgroup/cgroup.c | 46 +++++++++++++++++++++++++++++-------------
 1 file changed, 32 insertions(+), 14 deletions(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 735af8f15f95..ad1f9fea5c14 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -2719,11 +2719,7 @@ int cgroup_attach_task(struct cgroup *dst_cgrp, struct task_struct *leader,
 {
 	DEFINE_CGROUP_MGCTX(mgctx);
 	struct task_struct *task;
-	int ret;
-
-	ret = cgroup_migrate_vet_dst(dst_cgrp);
-	if (ret)
-		return ret;
+	int ret = 0;
 
 	/* look up all src csets */
 	spin_lock_irq(&css_set_lock);
@@ -4690,6 +4686,33 @@ static int cgroup_procs_write_permission(struct cgroup *src_cgrp,
 	return 0;
 }
 
+static inline bool cgroup_same_domain(const struct cgroup *src_cgrp,
+				      const struct cgroup *dst_cgrp)
+{
+	return src_cgrp->dom_cgrp == dst_cgrp->dom_cgrp;
+}
+
+static int cgroup_attach_permissions(struct cgroup *src_cgrp,
+				     struct cgroup *dst_cgrp,
+				     struct super_block *sb, bool thread)
+{
+	int ret = 0;
+
+	ret = cgroup_procs_write_permission(src_cgrp, dst_cgrp, sb);
+	if (ret)
+		return ret;
+
+	ret = cgroup_migrate_vet_dst(dst_cgrp);
+	if (ret)
+		return ret;
+
+	if (thread &&
+	    !cgroup_same_domain(src_cgrp->dom_cgrp, dst_cgrp->dom_cgrp))
+		ret = -EOPNOTSUPP;
+
+	return ret;
+}
+
 static ssize_t cgroup_procs_write(struct kernfs_open_file *of,
 				  char *buf, size_t nbytes, loff_t off)
 {
@@ -4712,8 +4735,8 @@ static ssize_t cgroup_procs_write(struct kernfs_open_file *of,
 	src_cgrp = task_cgroup_from_root(task, &cgrp_dfl_root);
 	spin_unlock_irq(&css_set_lock);
 
-	ret = cgroup_procs_write_permission(src_cgrp, dst_cgrp,
-					    of->file->f_path.dentry->d_sb);
+	ret = cgroup_attach_permissions(src_cgrp, dst_cgrp,
+					of->file->f_path.dentry->d_sb, false);
 	if (ret)
 		goto out_finish;
 
@@ -4757,16 +4780,11 @@ static ssize_t cgroup_threads_write(struct kernfs_open_file *of,
 	spin_unlock_irq(&css_set_lock);
 
 	/* thread migrations follow the cgroup.procs delegation rule */
-	ret = cgroup_procs_write_permission(src_cgrp, dst_cgrp,
-					    of->file->f_path.dentry->d_sb);
+	ret = cgroup_attach_permissions(src_cgrp, dst_cgrp,
+					of->file->f_path.dentry->d_sb, true);
 	if (ret)
 		goto out_finish;
 
-	/* and must be contained in the same domain */
-	ret = -EOPNOTSUPP;
-	if (src_cgrp->dom_cgrp != dst_cgrp->dom_cgrp)
-		goto out_finish;
-
 	ret = cgroup_attach_task(dst_cgrp, task, false);
 
 out_finish:
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 1/6] cgroup: unify attach permission checking
@ 2020-01-17 18:12   ` Christian Brauner
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Brauner @ 2020-01-17 18:12 UTC (permalink / raw)
  To: linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo
  Cc: Oleg Nesterov, Christian Brauner, Li Zefan, Johannes Weiner,
	cgroups-u79uwXL29TY76Z2rM5mHXA

The core codepaths to check whether a process can be attached to a
cgroup are the same for threads and thread-group leaders. Only a small
piece of code verifying that source and destination cgroup are in the
same domain differentiates the thread permission checking from
thread-group leader permission checking.
Since cgroup_migrate_vet_dst() only matters cgroup2 - it is a noop on
cgroup1 - we can move it out of cgroup_attach_task().
All checks can now be consolidated into a new helper
cgroup_attach_permissions() callable from both cgroup_procs_write() and
cgroup_threads_write().

Cc: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Signed-off-by: Christian Brauner <christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
---
/* v1 */
Link: https://lore.kernel.org/r/20191218173516.7875-2-christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org

/* v2 */
Link: https://lore.kernel.org/r/20191223061504.28716-2-christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org
- Christian Brauner <christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>:
  - Fix return value of cgroup_attach_permissions. It used to return 0
    when it should've returned -EOPNOTSUPP.
  - Fix call to cgroup_attach_permissions() in cgroup_procs_write(). It
    accidently specified that a thread was moved causing an additional
    check for domain-group equality to be executed that is not needed.

/* v3 */
Link: https://lore.kernel.org/r/20200117002143.15559-2-christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org
unchanged

/* v4 */
unchanged
---
 kernel/cgroup/cgroup.c | 46 +++++++++++++++++++++++++++++-------------
 1 file changed, 32 insertions(+), 14 deletions(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 735af8f15f95..ad1f9fea5c14 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -2719,11 +2719,7 @@ int cgroup_attach_task(struct cgroup *dst_cgrp, struct task_struct *leader,
 {
 	DEFINE_CGROUP_MGCTX(mgctx);
 	struct task_struct *task;
-	int ret;
-
-	ret = cgroup_migrate_vet_dst(dst_cgrp);
-	if (ret)
-		return ret;
+	int ret = 0;
 
 	/* look up all src csets */
 	spin_lock_irq(&css_set_lock);
@@ -4690,6 +4686,33 @@ static int cgroup_procs_write_permission(struct cgroup *src_cgrp,
 	return 0;
 }
 
+static inline bool cgroup_same_domain(const struct cgroup *src_cgrp,
+				      const struct cgroup *dst_cgrp)
+{
+	return src_cgrp->dom_cgrp == dst_cgrp->dom_cgrp;
+}
+
+static int cgroup_attach_permissions(struct cgroup *src_cgrp,
+				     struct cgroup *dst_cgrp,
+				     struct super_block *sb, bool thread)
+{
+	int ret = 0;
+
+	ret = cgroup_procs_write_permission(src_cgrp, dst_cgrp, sb);
+	if (ret)
+		return ret;
+
+	ret = cgroup_migrate_vet_dst(dst_cgrp);
+	if (ret)
+		return ret;
+
+	if (thread &&
+	    !cgroup_same_domain(src_cgrp->dom_cgrp, dst_cgrp->dom_cgrp))
+		ret = -EOPNOTSUPP;
+
+	return ret;
+}
+
 static ssize_t cgroup_procs_write(struct kernfs_open_file *of,
 				  char *buf, size_t nbytes, loff_t off)
 {
@@ -4712,8 +4735,8 @@ static ssize_t cgroup_procs_write(struct kernfs_open_file *of,
 	src_cgrp = task_cgroup_from_root(task, &cgrp_dfl_root);
 	spin_unlock_irq(&css_set_lock);
 
-	ret = cgroup_procs_write_permission(src_cgrp, dst_cgrp,
-					    of->file->f_path.dentry->d_sb);
+	ret = cgroup_attach_permissions(src_cgrp, dst_cgrp,
+					of->file->f_path.dentry->d_sb, false);
 	if (ret)
 		goto out_finish;
 
@@ -4757,16 +4780,11 @@ static ssize_t cgroup_threads_write(struct kernfs_open_file *of,
 	spin_unlock_irq(&css_set_lock);
 
 	/* thread migrations follow the cgroup.procs delegation rule */
-	ret = cgroup_procs_write_permission(src_cgrp, dst_cgrp,
-					    of->file->f_path.dentry->d_sb);
+	ret = cgroup_attach_permissions(src_cgrp, dst_cgrp,
+					of->file->f_path.dentry->d_sb, true);
 	if (ret)
 		goto out_finish;
 
-	/* and must be contained in the same domain */
-	ret = -EOPNOTSUPP;
-	if (src_cgrp->dom_cgrp != dst_cgrp->dom_cgrp)
-		goto out_finish;
-
 	ret = cgroup_attach_task(dst_cgrp, task, false);
 
 out_finish:
-- 
2.25.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 2/6] cgroup: add cgroup_get_from_file() helper
@ 2020-01-17 18:12   ` Christian Brauner
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Brauner @ 2020-01-17 18:12 UTC (permalink / raw)
  To: linux-api, linux-kernel, Tejun Heo
  Cc: Oleg Nesterov, Christian Brauner, Johannes Weiner, Li Zefan, cgroups

Add a helper cgroup_get_from_file(). The helper will be used in
subsequent patches to retrieve a cgroup while holding a reference to the
struct file it was taken from.

Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: cgroups@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v1 */
patch not present

/* v2 */
patch not present

/* v3 */
Link: https://lore.kernel.org/r/20200117002143.15559-3-christian.brauner@ubuntu.com
patch introduced
- Tejun Heo <tj@kernel.org>:
  - split cgroup_get_from_file() changes into separate commmit

/* v4 */
unchanged
---
 kernel/cgroup/cgroup.c | 30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index ad1f9fea5c14..f05efc2677c8 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5882,6 +5882,24 @@ void cgroup_fork(struct task_struct *child)
 	INIT_LIST_HEAD(&child->cg_list);
 }
 
+static struct cgroup *cgroup_get_from_file(struct file *f)
+{
+	struct cgroup_subsys_state *css;
+	struct cgroup *cgrp;
+
+	css = css_tryget_online_from_dir(f->f_path.dentry, NULL);
+	if (IS_ERR(css))
+		return ERR_CAST(css);
+
+	cgrp = css->cgroup;
+	if (!cgroup_on_dfl(cgrp)) {
+		cgroup_put(cgrp);
+		return ERR_PTR(-EBADF);
+	}
+
+	return cgrp;
+}
+
 /**
  * cgroup_can_fork - called on a new task before the process is exposed
  * @child: the task in question.
@@ -6170,7 +6188,6 @@ EXPORT_SYMBOL_GPL(cgroup_get_from_path);
  */
 struct cgroup *cgroup_get_from_fd(int fd)
 {
-	struct cgroup_subsys_state *css;
 	struct cgroup *cgrp;
 	struct file *f;
 
@@ -6178,17 +6195,8 @@ struct cgroup *cgroup_get_from_fd(int fd)
 	if (!f)
 		return ERR_PTR(-EBADF);
 
-	css = css_tryget_online_from_dir(f->f_path.dentry, NULL);
+	cgrp = cgroup_get_from_file(f);
 	fput(f);
-	if (IS_ERR(css))
-		return ERR_CAST(css);
-
-	cgrp = css->cgroup;
-	if (!cgroup_on_dfl(cgrp)) {
-		cgroup_put(cgrp);
-		return ERR_PTR(-EBADF);
-	}
-
 	return cgrp;
 }
 EXPORT_SYMBOL_GPL(cgroup_get_from_fd);
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 2/6] cgroup: add cgroup_get_from_file() helper
@ 2020-01-17 18:12   ` Christian Brauner
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Brauner @ 2020-01-17 18:12 UTC (permalink / raw)
  To: linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo
  Cc: Oleg Nesterov, Christian Brauner, Johannes Weiner, Li Zefan,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Add a helper cgroup_get_from_file(). The helper will be used in
subsequent patches to retrieve a cgroup while holding a reference to the
struct file it was taken from.

Cc: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Cc: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Signed-off-by: Christian Brauner <christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
---
/* v1 */
patch not present

/* v2 */
patch not present

/* v3 */
Link: https://lore.kernel.org/r/20200117002143.15559-3-christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org
patch introduced
- Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>:
  - split cgroup_get_from_file() changes into separate commmit

/* v4 */
unchanged
---
 kernel/cgroup/cgroup.c | 30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index ad1f9fea5c14..f05efc2677c8 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5882,6 +5882,24 @@ void cgroup_fork(struct task_struct *child)
 	INIT_LIST_HEAD(&child->cg_list);
 }
 
+static struct cgroup *cgroup_get_from_file(struct file *f)
+{
+	struct cgroup_subsys_state *css;
+	struct cgroup *cgrp;
+
+	css = css_tryget_online_from_dir(f->f_path.dentry, NULL);
+	if (IS_ERR(css))
+		return ERR_CAST(css);
+
+	cgrp = css->cgroup;
+	if (!cgroup_on_dfl(cgrp)) {
+		cgroup_put(cgrp);
+		return ERR_PTR(-EBADF);
+	}
+
+	return cgrp;
+}
+
 /**
  * cgroup_can_fork - called on a new task before the process is exposed
  * @child: the task in question.
@@ -6170,7 +6188,6 @@ EXPORT_SYMBOL_GPL(cgroup_get_from_path);
  */
 struct cgroup *cgroup_get_from_fd(int fd)
 {
-	struct cgroup_subsys_state *css;
 	struct cgroup *cgrp;
 	struct file *f;
 
@@ -6178,17 +6195,8 @@ struct cgroup *cgroup_get_from_fd(int fd)
 	if (!f)
 		return ERR_PTR(-EBADF);
 
-	css = css_tryget_online_from_dir(f->f_path.dentry, NULL);
+	cgrp = cgroup_get_from_file(f);
 	fput(f);
-	if (IS_ERR(css))
-		return ERR_CAST(css);
-
-	cgrp = css->cgroup;
-	if (!cgroup_on_dfl(cgrp)) {
-		cgroup_put(cgrp);
-		return ERR_PTR(-EBADF);
-	}
-
 	return cgrp;
 }
 EXPORT_SYMBOL_GPL(cgroup_get_from_fd);
-- 
2.25.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 3/6] cgroup: refactor fork helpers
@ 2020-01-17 18:12   ` Christian Brauner
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Brauner @ 2020-01-17 18:12 UTC (permalink / raw)
  To: linux-api, linux-kernel, Tejun Heo
  Cc: Oleg Nesterov, Christian Brauner, Johannes Weiner, Li Zefan, cgroups

This refactors the fork helpers so they can be easily modified in the
next patches. The patch just passes in the parent task_struct and moves
the cgroup threadgroup rwsem grab and release into the helpers. The
don't need to be directly exposed in fork.c.

Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: cgroups@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v1 */
patch not present

/* v2 */
patch not present

/* v3 */
Link: https://lore.kernel.org/r/20200117002143.15559-4-christian.brauner@ubuntu.com
patch introduced
- Tejun Heo <tj@kernel.org>:
  - split into separate commmit

/* v4 */
unchanged
---
 include/linux/cgroup.h | 18 +++++++++-----
 kernel/cgroup/cgroup.c | 56 ++++++++++++++++++++++++++----------------
 kernel/fork.c          | 12 +++------
 3 files changed, 51 insertions(+), 35 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index d7ddebd0cdec..5349465bfac1 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -121,9 +121,12 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
 		     struct pid *pid, struct task_struct *tsk);
 
 void cgroup_fork(struct task_struct *p);
-extern int cgroup_can_fork(struct task_struct *p);
-extern void cgroup_cancel_fork(struct task_struct *p);
-extern void cgroup_post_fork(struct task_struct *p);
+extern int cgroup_can_fork(struct task_struct *parent,
+			   struct task_struct *child);
+extern void cgroup_cancel_fork(struct task_struct *parent,
+			       struct task_struct *child);
+extern void cgroup_post_fork(struct task_struct *parent,
+			     struct task_struct *child);
 void cgroup_exit(struct task_struct *p);
 void cgroup_release(struct task_struct *p);
 void cgroup_free(struct task_struct *p);
@@ -707,9 +710,12 @@ static inline int cgroupstats_build(struct cgroupstats *stats,
 				    struct dentry *dentry) { return -EINVAL; }
 
 static inline void cgroup_fork(struct task_struct *p) {}
-static inline int cgroup_can_fork(struct task_struct *p) { return 0; }
-static inline void cgroup_cancel_fork(struct task_struct *p) {}
-static inline void cgroup_post_fork(struct task_struct *p) {}
+static inline int cgroup_can_fork(struct task_struct *parent,
+				  struct task_struct *child) { return 0; }
+static inline void cgroup_cancel_fork(struct task_struct *parent,
+			       struct task_struct *child) {};
+static inline void cgroup_post_fork(struct task_struct *parent,
+				    struct task_struct *child) {};
 static inline void cgroup_exit(struct task_struct *p) {}
 static inline void cgroup_release(struct task_struct *p) {}
 static inline void cgroup_free(struct task_struct *p) {}
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index f05efc2677c8..49d8cf087e10 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5902,17 +5902,22 @@ static struct cgroup *cgroup_get_from_file(struct file *f)
 
 /**
  * cgroup_can_fork - called on a new task before the process is exposed
- * @child: the task in question.
+ * @parent: the parent process of @child
+ * @child: the child process of @parent
+ * @kargs: the arguments passed to create the child process
  *
- * This calls the subsystem can_fork() callbacks. If the can_fork() callback
- * returns an error, the fork aborts with that error code. This allows for
- * a cgroup subsystem to conditionally allow or deny new forks.
+ * This calls the subsystem can_fork() callbacks. If the cgroup_can_fork()
+ * callback returns an error, the fork aborts with that error code. This
+ * allows for a cgroup subsystem to conditionally allow or deny new forks.
  */
-int cgroup_can_fork(struct task_struct *child)
+int cgroup_can_fork(struct task_struct *parent, struct task_struct *child)
+	__acquires(&cgroup_threadgroup_rwsem) __releases(&cgroup_threadgroup_rwsem)
 {
 	struct cgroup_subsys *ss;
 	int i, j, ret;
 
+	cgroup_threadgroup_change_begin(parent);
+
 	do_each_subsys_mask(ss, i, have_canfork_callback) {
 		ret = ss->can_fork(child);
 		if (ret)
@@ -5929,17 +5934,22 @@ int cgroup_can_fork(struct task_struct *child)
 			ss->cancel_fork(child);
 	}
 
+	cgroup_threadgroup_change_end(parent);
+
 	return ret;
 }
 
 /**
- * cgroup_cancel_fork - called if a fork failed after cgroup_can_fork()
- * @child: the task in question
- *
- * This calls the cancel_fork() callbacks if a fork failed *after*
- * cgroup_can_fork() succeded.
- */
-void cgroup_cancel_fork(struct task_struct *child)
+  * cgroup_cancel_fork - called if a fork failed after cgroup_can_fork()
+  * @parent: the parent process of @child
+  * @child: the child process of @parent
+  * @kargs: the arguments passed to create the child process
+  *
+  * This calls the cancel_fork() callbacks if a fork failed *after*
+  * cgroup_can_fork() succeded.
+  */
+void cgroup_cancel_fork(struct task_struct *parent, struct task_struct *child)
+	__releases(&cgroup_threadgroup_rwsem)
 {
 	struct cgroup_subsys *ss;
 	int i;
@@ -5947,19 +5957,21 @@ void cgroup_cancel_fork(struct task_struct *child)
 	for_each_subsys(ss, i)
 		if (ss->cancel_fork)
 			ss->cancel_fork(child);
+
+	cgroup_threadgroup_change_end(parent);
 }
 
 /**
- * cgroup_post_fork - called on a new task after adding it to the task list
- * @child: the task in question
- *
- * Adds the task to the list running through its css_set if necessary and
- * call the subsystem fork() callbacks.  Has to be after the task is
- * visible on the task list in case we race with the first call to
- * cgroup_task_iter_start() - to guarantee that the new task ends up on its
- * list.
+ * cgroup_post_fork - finalize cgroup setup for the child process
+ * @parent: the parent process of @child
+ * @child: the child process of @parent
+ * @kargs: the arguments passed to create the child process
+ *
+ * Attach the child process to its css_set calling the subsystem fork()
+ * callbacks.
  */
-void cgroup_post_fork(struct task_struct *child)
+void cgroup_post_fork(struct task_struct *parent, struct task_struct *child)
+	__releases(&cgroup_threadgroup_rwsem)
 {
 	struct cgroup_subsys *ss;
 	struct css_set *cset;
@@ -6002,6 +6014,8 @@ void cgroup_post_fork(struct task_struct *child)
 	do_each_subsys_mask(ss, i, have_fork_callback) {
 		ss->fork(child);
 	} while_each_subsys_mask();
+
+	cgroup_threadgroup_change_end(parent);
 }
 
 /**
diff --git a/kernel/fork.c b/kernel/fork.c
index 080809560072..c76758dbd594 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2165,16 +2165,15 @@ static __latent_entropy struct task_struct *copy_process(
 	INIT_LIST_HEAD(&p->thread_group);
 	p->task_works = NULL;
 
-	cgroup_threadgroup_change_begin(current);
 	/*
 	 * Ensure that the cgroup subsystem policies allow the new process to be
 	 * forked. It should be noted the the new process's css_set can be changed
 	 * between here and cgroup_post_fork() if an organisation operation is in
 	 * progress.
 	 */
-	retval = cgroup_can_fork(p);
+	retval = cgroup_can_fork(current, p);
 	if (retval)
-		goto bad_fork_cgroup_threadgroup_change_end;
+		goto bad_fork_put_pidfd;
 
 	/*
 	 * From this point on we must avoid any synchronous user-space
@@ -2279,8 +2278,7 @@ static __latent_entropy struct task_struct *copy_process(
 	write_unlock_irq(&tasklist_lock);
 
 	proc_fork_connector(p);
-	cgroup_post_fork(p);
-	cgroup_threadgroup_change_end(current);
+	cgroup_post_fork(current, p);
 	perf_event_fork(p);
 
 	trace_task_newtask(p, clone_flags);
@@ -2291,9 +2289,7 @@ static __latent_entropy struct task_struct *copy_process(
 bad_fork_cancel_cgroup:
 	spin_unlock(&current->sighand->siglock);
 	write_unlock_irq(&tasklist_lock);
-	cgroup_cancel_fork(p);
-bad_fork_cgroup_threadgroup_change_end:
-	cgroup_threadgroup_change_end(current);
+	cgroup_cancel_fork(current, p);
 bad_fork_put_pidfd:
 	if (clone_flags & CLONE_PIDFD) {
 		fput(pidfile);
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 3/6] cgroup: refactor fork helpers
@ 2020-01-17 18:12   ` Christian Brauner
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Brauner @ 2020-01-17 18:12 UTC (permalink / raw)
  To: linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo
  Cc: Oleg Nesterov, Christian Brauner, Johannes Weiner, Li Zefan,
	cgroups-u79uwXL29TY76Z2rM5mHXA

This refactors the fork helpers so they can be easily modified in the
next patches. The patch just passes in the parent task_struct and moves
the cgroup threadgroup rwsem grab and release into the helpers. The
don't need to be directly exposed in fork.c.

Cc: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Cc: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Signed-off-by: Christian Brauner <christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
---
/* v1 */
patch not present

/* v2 */
patch not present

/* v3 */
Link: https://lore.kernel.org/r/20200117002143.15559-4-christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org
patch introduced
- Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>:
  - split into separate commmit

/* v4 */
unchanged
---
 include/linux/cgroup.h | 18 +++++++++-----
 kernel/cgroup/cgroup.c | 56 ++++++++++++++++++++++++++----------------
 kernel/fork.c          | 12 +++------
 3 files changed, 51 insertions(+), 35 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index d7ddebd0cdec..5349465bfac1 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -121,9 +121,12 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
 		     struct pid *pid, struct task_struct *tsk);
 
 void cgroup_fork(struct task_struct *p);
-extern int cgroup_can_fork(struct task_struct *p);
-extern void cgroup_cancel_fork(struct task_struct *p);
-extern void cgroup_post_fork(struct task_struct *p);
+extern int cgroup_can_fork(struct task_struct *parent,
+			   struct task_struct *child);
+extern void cgroup_cancel_fork(struct task_struct *parent,
+			       struct task_struct *child);
+extern void cgroup_post_fork(struct task_struct *parent,
+			     struct task_struct *child);
 void cgroup_exit(struct task_struct *p);
 void cgroup_release(struct task_struct *p);
 void cgroup_free(struct task_struct *p);
@@ -707,9 +710,12 @@ static inline int cgroupstats_build(struct cgroupstats *stats,
 				    struct dentry *dentry) { return -EINVAL; }
 
 static inline void cgroup_fork(struct task_struct *p) {}
-static inline int cgroup_can_fork(struct task_struct *p) { return 0; }
-static inline void cgroup_cancel_fork(struct task_struct *p) {}
-static inline void cgroup_post_fork(struct task_struct *p) {}
+static inline int cgroup_can_fork(struct task_struct *parent,
+				  struct task_struct *child) { return 0; }
+static inline void cgroup_cancel_fork(struct task_struct *parent,
+			       struct task_struct *child) {};
+static inline void cgroup_post_fork(struct task_struct *parent,
+				    struct task_struct *child) {};
 static inline void cgroup_exit(struct task_struct *p) {}
 static inline void cgroup_release(struct task_struct *p) {}
 static inline void cgroup_free(struct task_struct *p) {}
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index f05efc2677c8..49d8cf087e10 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5902,17 +5902,22 @@ static struct cgroup *cgroup_get_from_file(struct file *f)
 
 /**
  * cgroup_can_fork - called on a new task before the process is exposed
- * @child: the task in question.
+ * @parent: the parent process of @child
+ * @child: the child process of @parent
+ * @kargs: the arguments passed to create the child process
  *
- * This calls the subsystem can_fork() callbacks. If the can_fork() callback
- * returns an error, the fork aborts with that error code. This allows for
- * a cgroup subsystem to conditionally allow or deny new forks.
+ * This calls the subsystem can_fork() callbacks. If the cgroup_can_fork()
+ * callback returns an error, the fork aborts with that error code. This
+ * allows for a cgroup subsystem to conditionally allow or deny new forks.
  */
-int cgroup_can_fork(struct task_struct *child)
+int cgroup_can_fork(struct task_struct *parent, struct task_struct *child)
+	__acquires(&cgroup_threadgroup_rwsem) __releases(&cgroup_threadgroup_rwsem)
 {
 	struct cgroup_subsys *ss;
 	int i, j, ret;
 
+	cgroup_threadgroup_change_begin(parent);
+
 	do_each_subsys_mask(ss, i, have_canfork_callback) {
 		ret = ss->can_fork(child);
 		if (ret)
@@ -5929,17 +5934,22 @@ int cgroup_can_fork(struct task_struct *child)
 			ss->cancel_fork(child);
 	}
 
+	cgroup_threadgroup_change_end(parent);
+
 	return ret;
 }
 
 /**
- * cgroup_cancel_fork - called if a fork failed after cgroup_can_fork()
- * @child: the task in question
- *
- * This calls the cancel_fork() callbacks if a fork failed *after*
- * cgroup_can_fork() succeded.
- */
-void cgroup_cancel_fork(struct task_struct *child)
+  * cgroup_cancel_fork - called if a fork failed after cgroup_can_fork()
+  * @parent: the parent process of @child
+  * @child: the child process of @parent
+  * @kargs: the arguments passed to create the child process
+  *
+  * This calls the cancel_fork() callbacks if a fork failed *after*
+  * cgroup_can_fork() succeded.
+  */
+void cgroup_cancel_fork(struct task_struct *parent, struct task_struct *child)
+	__releases(&cgroup_threadgroup_rwsem)
 {
 	struct cgroup_subsys *ss;
 	int i;
@@ -5947,19 +5957,21 @@ void cgroup_cancel_fork(struct task_struct *child)
 	for_each_subsys(ss, i)
 		if (ss->cancel_fork)
 			ss->cancel_fork(child);
+
+	cgroup_threadgroup_change_end(parent);
 }
 
 /**
- * cgroup_post_fork - called on a new task after adding it to the task list
- * @child: the task in question
- *
- * Adds the task to the list running through its css_set if necessary and
- * call the subsystem fork() callbacks.  Has to be after the task is
- * visible on the task list in case we race with the first call to
- * cgroup_task_iter_start() - to guarantee that the new task ends up on its
- * list.
+ * cgroup_post_fork - finalize cgroup setup for the child process
+ * @parent: the parent process of @child
+ * @child: the child process of @parent
+ * @kargs: the arguments passed to create the child process
+ *
+ * Attach the child process to its css_set calling the subsystem fork()
+ * callbacks.
  */
-void cgroup_post_fork(struct task_struct *child)
+void cgroup_post_fork(struct task_struct *parent, struct task_struct *child)
+	__releases(&cgroup_threadgroup_rwsem)
 {
 	struct cgroup_subsys *ss;
 	struct css_set *cset;
@@ -6002,6 +6014,8 @@ void cgroup_post_fork(struct task_struct *child)
 	do_each_subsys_mask(ss, i, have_fork_callback) {
 		ss->fork(child);
 	} while_each_subsys_mask();
+
+	cgroup_threadgroup_change_end(parent);
 }
 
 /**
diff --git a/kernel/fork.c b/kernel/fork.c
index 080809560072..c76758dbd594 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2165,16 +2165,15 @@ static __latent_entropy struct task_struct *copy_process(
 	INIT_LIST_HEAD(&p->thread_group);
 	p->task_works = NULL;
 
-	cgroup_threadgroup_change_begin(current);
 	/*
 	 * Ensure that the cgroup subsystem policies allow the new process to be
 	 * forked. It should be noted the the new process's css_set can be changed
 	 * between here and cgroup_post_fork() if an organisation operation is in
 	 * progress.
 	 */
-	retval = cgroup_can_fork(p);
+	retval = cgroup_can_fork(current, p);
 	if (retval)
-		goto bad_fork_cgroup_threadgroup_change_end;
+		goto bad_fork_put_pidfd;
 
 	/*
 	 * From this point on we must avoid any synchronous user-space
@@ -2279,8 +2278,7 @@ static __latent_entropy struct task_struct *copy_process(
 	write_unlock_irq(&tasklist_lock);
 
 	proc_fork_connector(p);
-	cgroup_post_fork(p);
-	cgroup_threadgroup_change_end(current);
+	cgroup_post_fork(current, p);
 	perf_event_fork(p);
 
 	trace_task_newtask(p, clone_flags);
@@ -2291,9 +2289,7 @@ static __latent_entropy struct task_struct *copy_process(
 bad_fork_cancel_cgroup:
 	spin_unlock(&current->sighand->siglock);
 	write_unlock_irq(&tasklist_lock);
-	cgroup_cancel_fork(p);
-bad_fork_cgroup_threadgroup_change_end:
-	cgroup_threadgroup_change_end(current);
+	cgroup_cancel_fork(current, p);
 bad_fork_put_pidfd:
 	if (clone_flags & CLONE_PIDFD) {
 		fput(pidfile);
-- 
2.25.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 4/6] cgroup: add cgroup_may_write() helper
@ 2020-01-17 18:12   ` Christian Brauner
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Brauner @ 2020-01-17 18:12 UTC (permalink / raw)
  To: linux-api, linux-kernel, Tejun Heo
  Cc: Oleg Nesterov, Christian Brauner, Johannes Weiner, Li Zefan, cgroups

Add a cgroup_may_write() helper which we can use in the
CLONE_INTO_CGROUP patch series to verify that we can write to the
destination cgroup.

Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: cgroups@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v1 */
patch not present

/* v2 */
patch not present

/* v3 */
patch not present

/* v4 */
patch introduced
---
 kernel/cgroup/cgroup.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 49d8cf087e10..b110b435ae49 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4649,13 +4649,28 @@ static int cgroup_procs_show(struct seq_file *s, void *v)
 	return 0;
 }
 
+static int cgroup_may_write(const struct cgroup *cgrp, struct super_block *sb)
+{
+	int ret;
+	struct inode *inode;
+
+	lockdep_assert_held(&cgroup_mutex);
+
+	inode = kernfs_get_inode(sb, cgrp->procs_file.kn);
+	if (!inode)
+		return -ENOMEM;
+
+	ret = inode_permission(inode, MAY_WRITE);
+	iput(inode);
+	return ret;
+}
+
 static int cgroup_procs_write_permission(struct cgroup *src_cgrp,
 					 struct cgroup *dst_cgrp,
 					 struct super_block *sb)
 {
 	struct cgroup_namespace *ns = current->nsproxy->cgroup_ns;
 	struct cgroup *com_cgrp = src_cgrp;
-	struct inode *inode;
 	int ret;
 
 	lockdep_assert_held(&cgroup_mutex);
@@ -4665,12 +4680,7 @@ static int cgroup_procs_write_permission(struct cgroup *src_cgrp,
 		com_cgrp = cgroup_parent(com_cgrp);
 
 	/* %current should be authorized to migrate to the common ancestor */
-	inode = kernfs_get_inode(sb, com_cgrp->procs_file.kn);
-	if (!inode)
-		return -ENOMEM;
-
-	ret = inode_permission(inode, MAY_WRITE);
-	iput(inode);
+	ret = cgroup_may_write(com_cgrp, sb);
 	if (ret)
 		return ret;
 
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 4/6] cgroup: add cgroup_may_write() helper
@ 2020-01-17 18:12   ` Christian Brauner
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Brauner @ 2020-01-17 18:12 UTC (permalink / raw)
  To: linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo
  Cc: Oleg Nesterov, Christian Brauner, Johannes Weiner, Li Zefan,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Add a cgroup_may_write() helper which we can use in the
CLONE_INTO_CGROUP patch series to verify that we can write to the
destination cgroup.

Cc: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Cc: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Signed-off-by: Christian Brauner <christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
---
/* v1 */
patch not present

/* v2 */
patch not present

/* v3 */
patch not present

/* v4 */
patch introduced
---
 kernel/cgroup/cgroup.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 49d8cf087e10..b110b435ae49 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4649,13 +4649,28 @@ static int cgroup_procs_show(struct seq_file *s, void *v)
 	return 0;
 }
 
+static int cgroup_may_write(const struct cgroup *cgrp, struct super_block *sb)
+{
+	int ret;
+	struct inode *inode;
+
+	lockdep_assert_held(&cgroup_mutex);
+
+	inode = kernfs_get_inode(sb, cgrp->procs_file.kn);
+	if (!inode)
+		return -ENOMEM;
+
+	ret = inode_permission(inode, MAY_WRITE);
+	iput(inode);
+	return ret;
+}
+
 static int cgroup_procs_write_permission(struct cgroup *src_cgrp,
 					 struct cgroup *dst_cgrp,
 					 struct super_block *sb)
 {
 	struct cgroup_namespace *ns = current->nsproxy->cgroup_ns;
 	struct cgroup *com_cgrp = src_cgrp;
-	struct inode *inode;
 	int ret;
 
 	lockdep_assert_held(&cgroup_mutex);
@@ -4665,12 +4680,7 @@ static int cgroup_procs_write_permission(struct cgroup *src_cgrp,
 		com_cgrp = cgroup_parent(com_cgrp);
 
 	/* %current should be authorized to migrate to the common ancestor */
-	inode = kernfs_get_inode(sb, com_cgrp->procs_file.kn);
-	if (!inode)
-		return -ENOMEM;
-
-	ret = inode_permission(inode, MAY_WRITE);
-	iput(inode);
+	ret = cgroup_may_write(com_cgrp, sb);
 	if (ret)
 		return ret;
 
-- 
2.25.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 5/6] clone3: allow spawning processes into cgroups
@ 2020-01-17 18:12   ` Christian Brauner
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Brauner @ 2020-01-17 18:12 UTC (permalink / raw)
  To: linux-api, linux-kernel, Tejun Heo
  Cc: Oleg Nesterov, Christian Brauner, Ingo Molnar, Johannes Weiner,
	Li Zefan, Peter Zijlstra, cgroups

This adds support for creating a process in a different cgroup than its
parent. Callers can limit and account processes and threads right from
the moment they are spawned:
- A service manager can directly spawn new services into dedicated
  cgroups.
- A process can be directly created in a frozen cgroup and will be
  frozen as well.
- The initial accounting jitter experienced by process supervisors and
  daemons is eliminated with this.
- Threaded applications or even thread implementations can choose to
  create a specific cgroup layout where each thread is spawned
  directly into a dedicated cgroup.

This feature is limited to the unified hierarchy. Callers need to pass
an directory file descriptor for the target cgroup. The caller can
choose to pass an O_PATH file descriptor. All usual migration
restrictions apply, i.e. there can be no processes in inner nodes. In
general, creating a process directly in a target cgroup adheres to all
migration restrictions.

Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: cgroups@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v1 */
Link: https://lore.kernel.org/r/20191218173516.7875-3-christian.brauner@ubuntu.com

/* v2 */
Link: https://lore.kernel.org/r/20191223061504.28716-3-christian.brauner@ubuntu.com
- Oleg Nesterov <oleg@redhat.com>:
  - prevent deadlock from wrong locking order
- Christian Brauner <christian.brauner@ubuntu.com>:
  - Rework locking. In the previous patch version we would have already
    acquired the cgroup_threadgroup_rwsem before we grabbed cgroup mutex
    we need to hold when CLONE_INTO_CGROUP is specified. This meant we
    could deadlock with other codepaths that all require it to be done
    the other way around. Fix this by first grabbing cgroup mutex when
    CLONE_INTO_CGROUP is specified and then grabbing
    cgroup_threadgroup_rwsem unconditionally after. This way we don't
    require the cgroup mutex be held in codepaths that don't need it.
  - Switch from mutex_lock() to mutex_lock_killable().

/* v3 */
Link: https://lore.kernel.org/r/20200117002143.15559-5-christian.brauner@ubuntu.com
- Tejun Heo <tj@kernel.org>:
  - s/mutex_lock_killable()/mutex_lock()/ because it should only ever
    be held for a short time:
    diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
    index a9fedcfeae4b..d68d3fb6af1d 100644
    --- a/kernel/cgroup/cgroup.c
    +++ b/kernel/cgroup/cgroup.c
    @@ -5927,11 +5927,8 @@ static int cgroup_css_set_fork(struct task_struct *parent,
            struct super_block *sb;
            struct file *f;

    -       if (kargs->flags & CLONE_INTO_CGROUP) {
    -               ret = mutex_lock_killable(&cgroup_mutex);
    -               if (ret)
    -                       return ret;
    -       }
    +       if (kargs->flags & CLONE_INTO_CGROUP)
    +               mutex_lock(&cgroup_mutex);

            cgroup_threadgroup_change_begin(parent);
  - s/task_cgroup_from_root/cset->dfl_cgrp/:
    diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
    index d68d3fb6af1d..3ceef006d144 100644
    --- a/kernel/cgroup/cgroup.c
    +++ b/kernel/cgroup/cgroup.c
    @@ -5922,7 +5922,7 @@ static int cgroup_css_set_fork(struct task_struct *parent,
            __acquires(&cgroup_mutex) __acquires(&cgroup_threadgroup_rwsem)
     {
            int ret;
    -       struct cgroup *dst_cgrp = NULL, *src_cgrp;
    +       struct cgroup *dst_cgrp = NULL;
            struct css_set *cset;
            struct super_block *sb;
            struct file *f;
    @@ -5956,11 +5956,7 @@ static int cgroup_css_set_fork(struct task_struct *parent,
                    goto err;
            }

    -       spin_lock_irq(&css_set_lock);
    -       src_cgrp = task_cgroup_from_root(parent, &cgrp_dfl_cgrp);
    -       spin_unlock_irq(&css_set_lock);
    -
    -       ret = cgroup_attach_permissions(src_cgrp, dst_cgrp, sb,
    +       ret = cgroup_attach_permissions(cset->dfl_cgrp, dst_cgrp, sb,
                                            !!(kargs->flags & CLONE_THREAD));
            if (ret)
                    goto err;
  - pass struct css_set instead of struct kernel_clone_args into cgroup
    fork subsystem callbacks:
    diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
    index cd848c6bac4a..058bb16d073f 100644
    --- a/include/linux/cgroup-defs.h
    +++ b/include/linux/cgroup-defs.h
    @@ -630,9 +630,8 @@ struct cgroup_subsys {
     	void (*attach)(struct cgroup_taskset *tset);
     	void (*post_attach)(void);
     	int (*can_fork)(struct task_struct *parent, struct task_struct *child,
    -			struct kernel_clone_args *kargs);
    -	void (*cancel_fork)(struct task_struct *child,
    -			    struct kernel_clone_args *kargs);
    +			struct css_set *cset);
    +	void (*cancel_fork)(struct task_struct *child, struct css_set *cset);
     	void (*fork)(struct task_struct *task);
     	void (*exit)(struct task_struct *task);
     	void (*release)(struct task_struct *task);
    diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
    index 3ceef006d144..2ac1c37a3fcb 100644
    --- a/kernel/cgroup/cgroup.c
    +++ b/kernel/cgroup/cgroup.c
    @@ -6044,7 +6044,7 @@ int cgroup_can_fork(struct task_struct *parent, struct task_struct *child,
     		return ret;

     	do_each_subsys_mask(ss, i, have_canfork_callback) {
    -		ret = ss->can_fork(parent, child, kargs);
    +		ret = ss->can_fork(parent, child, kargs->cset);
     		if (ret)
     			goto out_revert;
     	} while_each_subsys_mask();
    @@ -6056,7 +6056,7 @@ int cgroup_can_fork(struct task_struct *parent, struct task_struct *child,
     		if (j >= i)
     			break;
     		if (ss->cancel_fork)
    -			ss->cancel_fork(child, kargs);
    +			ss->cancel_fork(child, kargs->cset);
     	}

     	cgroup_css_set_put_fork(parent, kargs);
    @@ -6082,7 +6082,7 @@ void cgroup_cancel_fork(struct task_struct *parent, struct task_struct *child,

     	for_each_subsys(ss, i)
     		if (ss->cancel_fork)
    -			ss->cancel_fork(child, kargs);
    +			ss->cancel_fork(child, kargs->cset);

     	cgroup_css_set_put_fork(parent, kargs);
     }
    diff --git a/kernel/cgroup/pids.c b/kernel/cgroup/pids.c
    index e5955bc1fb00..4e7c8819c8df 100644
    --- a/kernel/cgroup/pids.c
    +++ b/kernel/cgroup/pids.c
    @@ -216,20 +216,16 @@ static void pids_cancel_attach(struct cgroup_taskset *tset)
      * on cgroup_threadgroup_change_begin() held by the copy_process().
      */
     static int pids_can_fork(struct task_struct *parent, struct task_struct *child,
    -			 struct kernel_clone_args *args)
    +			 struct css_set *cset)
     {
    -	struct css_set *new_cset = NULL;
     	struct cgroup_subsys_state *css;
     	struct pids_cgroup *pids;
     	int err;

    -	if (args)
    -		new_cset = args->cset;
    -
    -	if (!new_cset)
    -		css = task_css_check(current, pids_cgrp_id, true);
    +	if (cset)
    +		css = cset->subsys[pids_cgrp_id];
     	else
    -		css = new_cset->subsys[pids_cgrp_id];
    +		css = task_css_check(current, pids_cgrp_id, true);
     	pids = css_pids(css);
     	err = pids_try_charge(pids, 1);
     	if (err) {
    @@ -244,20 +240,15 @@ static int pids_can_fork(struct task_struct *parent, struct task_struct *child,
     	return err;
     }

    -static void pids_cancel_fork(struct task_struct *task,
    -			     struct kernel_clone_args *args)
    +static void pids_cancel_fork(struct task_struct *task, struct css_set *cset)
     {
    -	struct css_set *new_cset = NULL;
     	struct cgroup_subsys_state *css;
     	struct pids_cgroup *pids;

    -	if (args)
    -		new_cset = args->cset;
    -
    -	if (!new_cset)
    -		css = task_css_check(current, pids_cgrp_id, true);
    +	if (cset)
    +		css = cset->subsys[pids_cgrp_id];
     	else
    -		css = new_cset->subsys[pids_cgrp_id];
    +		css = task_css_check(current, pids_cgrp_id, true);
     	pids = css_pids(css);
     	pids_uncharge(pids, 1);
     }
- Michal Koutný <mkoutny@suse.com>:
  - update comment for cgroup_fork()
  - if CLONE_NEWCGROUP and CLONE_INTO_CGROUP is requested, set the
    root_cset of the new cgroup namespace to the child's cset

/* v4 */
- Tejun Heo <tj@kernel.org>:
  - verify that we can write to the target cgroup since we're not going through
    the vfs layer which would do it for us
    diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
    index 61d1a6cd0059..6b38b2545667 100644
    --- a/kernel/cgroup/cgroup.c
    +++ b/kernel/cgroup/cgroup.c
    @@ -5966,6 +5966,15 @@ static int cgroup_css_set_fork(struct task_struct *parent,
                    goto err;
            }

    +       /*
    +        * Verify that we can the target cgroup is writable for us. This is
    +        * usally done by the vfs layer but since we're not going through the
    +        * vfs layer here we need to do it.
    +        */
    +       ret = cgroup_may_write(dst_cgrp, sb);
    +       if (ret)
    +               goto err;
    +
            ret = cgroup_attach_permissions(cset->dfl_cgrp, dst_cgrp, sb,
                                            !!(kargs->flags & CLONE_THREAD));
            if (ret)
---
 include/linux/cgroup-defs.h |   6 +-
 include/linux/cgroup.h      |  20 ++--
 include/linux/sched/task.h  |   4 +
 include/uapi/linux/sched.h  |   5 +
 kernel/cgroup/cgroup.c      | 194 +++++++++++++++++++++++++++++++-----
 kernel/cgroup/pids.c        |  16 ++-
 kernel/fork.c               |  13 ++-
 7 files changed, 217 insertions(+), 41 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 63097cb243cb..058bb16d073f 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -33,6 +33,7 @@ struct kernfs_ops;
 struct kernfs_open_file;
 struct seq_file;
 struct poll_table_struct;
+struct kernel_clone_args;
 
 #define MAX_CGROUP_TYPE_NAMELEN 32
 #define MAX_CGROUP_ROOT_NAMELEN 64
@@ -628,8 +629,9 @@ struct cgroup_subsys {
 	void (*cancel_attach)(struct cgroup_taskset *tset);
 	void (*attach)(struct cgroup_taskset *tset);
 	void (*post_attach)(void);
-	int (*can_fork)(struct task_struct *task);
-	void (*cancel_fork)(struct task_struct *task);
+	int (*can_fork)(struct task_struct *parent, struct task_struct *child,
+			struct css_set *cset);
+	void (*cancel_fork)(struct task_struct *child, struct css_set *cset);
 	void (*fork)(struct task_struct *task);
 	void (*exit)(struct task_struct *task);
 	void (*release)(struct task_struct *task);
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 5349465bfac1..dfd5e095f4ee 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -27,6 +27,8 @@
 
 #include <linux/cgroup-defs.h>
 
+struct kernel_clone_args;
+
 #ifdef CONFIG_CGROUPS
 
 /*
@@ -122,11 +124,14 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
 
 void cgroup_fork(struct task_struct *p);
 extern int cgroup_can_fork(struct task_struct *parent,
-			   struct task_struct *child);
+			   struct task_struct *child,
+			   struct kernel_clone_args *kargs);
 extern void cgroup_cancel_fork(struct task_struct *parent,
-			       struct task_struct *child);
+			       struct task_struct *child,
+			       struct kernel_clone_args *kargs);
 extern void cgroup_post_fork(struct task_struct *parent,
-			     struct task_struct *child);
+			     struct task_struct *child,
+			     struct kernel_clone_args *kargs);
 void cgroup_exit(struct task_struct *p);
 void cgroup_release(struct task_struct *p);
 void cgroup_free(struct task_struct *p);
@@ -711,11 +716,14 @@ static inline int cgroupstats_build(struct cgroupstats *stats,
 
 static inline void cgroup_fork(struct task_struct *p) {}
 static inline int cgroup_can_fork(struct task_struct *parent,
-				  struct task_struct *child) { return 0; }
+				  struct task_struct *child,
+				  struct kernel_clone_args *kargs) { return 0; }
 static inline void cgroup_cancel_fork(struct task_struct *parent,
-			       struct task_struct *child) {};
+				      struct task_struct *child,
+				      struct kernel_clone_args *kargs) {};
 static inline void cgroup_post_fork(struct task_struct *parent,
-				    struct task_struct *child) {};
+				    struct task_struct *child,
+				    struct kernel_clone_args *kargs) {};
 static inline void cgroup_exit(struct task_struct *p) {}
 static inline void cgroup_release(struct task_struct *p) {}
 static inline void cgroup_free(struct task_struct *p) {}
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index f1879884238e..38359071236a 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -13,6 +13,7 @@
 struct task_struct;
 struct rusage;
 union thread_union;
+struct css_set;
 
 /* All the bits taken by the old clone syscall. */
 #define CLONE_LEGACY_FLAGS 0xffffffffULL
@@ -29,6 +30,9 @@ struct kernel_clone_args {
 	pid_t *set_tid;
 	/* Number of elements in *set_tid */
 	size_t set_tid_size;
+	int cgroup;
+	struct cgroup *cgrp;
+	struct css_set *cset;
 };
 
 /*
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 4a0217832464..08620c220f30 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -35,6 +35,7 @@
 
 /* Flags for the clone3() syscall. */
 #define CLONE_CLEAR_SIGHAND 0x100000000ULL /* Clear any signal handler and reset to SIG_DFL. */
+#define CLONE_INTO_CGROUP 0x200000000ULL /* Clone into a specific cgroup given the right permissions. */
 
 #ifndef __ASSEMBLY__
 /**
@@ -75,6 +76,8 @@
  * @set_tid_size: This defines the size of the array referenced
  *                in @set_tid. This cannot be larger than the
  *                kernel's limit of nested PID namespaces.
+ * @cgroup:       If CLONE_INTO_CGROUP is specified set this to
+ *                a file descriptor for the cgroup.
  *
  * The structure is versioned by size and thus extensible.
  * New struct members must go at the end of the struct and
@@ -91,11 +94,13 @@ struct clone_args {
 	__aligned_u64 tls;
 	__aligned_u64 set_tid;
 	__aligned_u64 set_tid_size;
+	__aligned_u64 cgroup;
 };
 #endif
 
 #define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */
 #define CLONE_ARGS_SIZE_VER1 80 /* sizeof second published struct */
+#define CLONE_ARGS_SIZE_VER2 88 /* sizeof third published struct */
 
 /*
  * Scheduling policies
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index b110b435ae49..276cb6053987 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5883,8 +5883,7 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
  * @child: pointer to task_struct of forking parent process.
  *
  * A task is associated with the init_css_set until cgroup_post_fork()
- * attaches it to the parent's css_set.  Empty cg_list indicates that
- * @child isn't holding reference to its css_set.
+ * attaches it to the target css_set.
  */
 void cgroup_fork(struct task_struct *child)
 {
@@ -5910,26 +5909,160 @@ static struct cgroup *cgroup_get_from_file(struct file *f)
 	return cgrp;
 }
 
+/**
+ * cgroup_css_set_fork - find or create a css_set for a child process
+ * @parent: the parent of the child process
+ * @kargs: the arguments passed to create the child process
+ *
+ * This functions finds or creates a new css_set which the child
+ * process will be attached to in cgroup_post_fork(). By default,
+ * the child process will be given the same css_set as its parent.
+ *
+ * If CLONE_INTO_CGROUP is specified this function will try to find an
+ * existing css_set which includes the request cgroup and if not create
+ * a new css_set that the child will be attached to. If this function
+ * succeeds it will hold cgroup_threadgroup_rwsem on return. If
+ * CLONE_INTO_CGROUP is requested this function will grab cgroup mutex
+ * before grabbing cgroup_threadgroup_rwsem and will hold a reference
+ * to the target cgroup.
+ */
+static int cgroup_css_set_fork(struct task_struct *parent,
+			       struct kernel_clone_args *kargs)
+	__acquires(&cgroup_mutex) __acquires(&cgroup_threadgroup_rwsem)
+{
+	int ret;
+	struct cgroup *dst_cgrp = NULL;
+	struct css_set *cset;
+	struct super_block *sb;
+	struct file *f;
+
+	if (kargs->flags & CLONE_INTO_CGROUP)
+		mutex_lock(&cgroup_mutex);
+
+	cgroup_threadgroup_change_begin(parent);
+
+	spin_lock_irq(&css_set_lock);
+	cset = task_css_set(parent);
+	get_css_set(cset);
+	spin_unlock_irq(&css_set_lock);
+
+	if (!(kargs->flags & CLONE_INTO_CGROUP)) {
+		kargs->cset = cset;
+		return 0;
+	}
+
+	f = fget_raw(kargs->cgroup);
+	if (!f) {
+		ret = -EBADF;
+		goto err;
+	}
+	sb = f->f_path.dentry->d_sb;
+
+	dst_cgrp = cgroup_get_from_file(f);
+	if (IS_ERR(dst_cgrp)) {
+		ret = PTR_ERR(dst_cgrp);
+		dst_cgrp = NULL;
+		goto err;
+	}
+
+	/*
+	 * Verify that we can the target cgroup is writable for us. This is
+	 * usally done by the vfs layer but since we're not going through the
+	 * vfs layer here we need to do it.
+	 */
+	ret = cgroup_may_write(dst_cgrp, sb);
+	if (ret)
+		goto err;
+
+	ret = cgroup_attach_permissions(cset->dfl_cgrp, dst_cgrp, sb,
+					!!(kargs->flags & CLONE_THREAD));
+	if (ret)
+		goto err;
+
+	kargs->cset = find_css_set(cset, dst_cgrp);
+	if (!kargs->cset) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	if (cgroup_is_dead(dst_cgrp)) {
+		ret = -ENODEV;
+		goto err;
+	}
+
+	put_css_set(cset);
+	fput(f);
+	kargs->cgrp = dst_cgrp;
+	return ret;
+
+err:
+	cgroup_threadgroup_change_end(parent);
+	mutex_unlock(&cgroup_mutex);
+	if (f)
+		fput(f);
+	if (dst_cgrp)
+		cgroup_put(dst_cgrp);
+	put_css_set(cset);
+	return ret;
+}
+
+/**
+ * cgroup_css_set_put_fork - drop references we took during fork
+ * @parent: the parent of the child process
+ * @kargs: the arguments passed to create the child process
+ *
+ * Drop references to the prepared css_set and target cgroup if
+ * CLONE_INTO_CGROUP was requested. This function can only be
+ * called before fork()'s point of no return.
+ */
+static void cgroup_css_set_put_fork(struct task_struct *parent,
+				    struct kernel_clone_args *kargs)
+	__releases(&cgroup_threadgroup_rwsem) __releases(&cgroup_mutex)
+{
+	cgroup_threadgroup_change_end(parent);
+
+	if (kargs->flags & CLONE_INTO_CGROUP) {
+		struct cgroup *cgrp = kargs->cgrp;
+		struct css_set *cset = kargs->cset;
+
+		mutex_unlock(&cgroup_mutex);
+
+		if (cset) {
+			put_css_set(cset);
+			kargs->cset = NULL;
+		}
+
+		if (cgrp) {
+			cgroup_put(cgrp);
+			kargs->cgrp = NULL;
+		}
+	}
+}
+
 /**
  * cgroup_can_fork - called on a new task before the process is exposed
  * @parent: the parent process of @child
  * @child: the child process of @parent
  * @kargs: the arguments passed to create the child process
  *
+ * This prepares a new css_set for the child process which the child will
+ * be attached to in cgroup_post_fork().
  * This calls the subsystem can_fork() callbacks. If the cgroup_can_fork()
  * callback returns an error, the fork aborts with that error code. This
  * allows for a cgroup subsystem to conditionally allow or deny new forks.
  */
-int cgroup_can_fork(struct task_struct *parent, struct task_struct *child)
-	__acquires(&cgroup_threadgroup_rwsem) __releases(&cgroup_threadgroup_rwsem)
+int cgroup_can_fork(struct task_struct *parent, struct task_struct *child,
+			struct kernel_clone_args *kargs)
 {
 	struct cgroup_subsys *ss;
 	int i, j, ret;
 
-	cgroup_threadgroup_change_begin(parent);
+	ret = cgroup_css_set_fork(parent, kargs);
+	if (ret)
+		return ret;
 
 	do_each_subsys_mask(ss, i, have_canfork_callback) {
-		ret = ss->can_fork(child);
+		ret = ss->can_fork(parent, child, kargs->cset);
 		if (ret)
 			goto out_revert;
 	} while_each_subsys_mask();
@@ -5941,34 +6074,35 @@ int cgroup_can_fork(struct task_struct *parent, struct task_struct *child)
 		if (j >= i)
 			break;
 		if (ss->cancel_fork)
-			ss->cancel_fork(child);
+			ss->cancel_fork(child, kargs->cset);
 	}
 
-	cgroup_threadgroup_change_end(parent);
+	cgroup_css_set_put_fork(parent, kargs);
 
 	return ret;
 }
 
 /**
-  * cgroup_cancel_fork - called if a fork failed after cgroup_can_fork()
-  * @parent: the parent process of @child
-  * @child: the child process of @parent
-  * @kargs: the arguments passed to create the child process
-  *
-  * This calls the cancel_fork() callbacks if a fork failed *after*
-  * cgroup_can_fork() succeded.
-  */
-void cgroup_cancel_fork(struct task_struct *parent, struct task_struct *child)
-	__releases(&cgroup_threadgroup_rwsem)
+ * cgroup_cancel_fork - called if a fork failed after cgroup_can_fork()
+ * @parent: the parent process of @child
+ * @child: the child process of @parent
+ * @kargs: the arguments passed to create the child process
+ *
+ * This calls the cancel_fork() callbacks if a fork failed *after*
+ * cgroup_can_fork() succeded and cleans up references we took to
+ * prepare a new css_set for the child process in cgroup_can_fork().
+ */
+void cgroup_cancel_fork(struct task_struct *parent, struct task_struct *child,
+			struct kernel_clone_args *kargs)
 {
 	struct cgroup_subsys *ss;
 	int i;
 
 	for_each_subsys(ss, i)
 		if (ss->cancel_fork)
-			ss->cancel_fork(child);
+			ss->cancel_fork(child, kargs->cset);
 
-	cgroup_threadgroup_change_end(parent);
+	cgroup_css_set_put_fork(parent, kargs);
 }
 
 /**
@@ -5980,18 +6114,17 @@ void cgroup_cancel_fork(struct task_struct *parent, struct task_struct *child)
  * Attach the child process to its css_set calling the subsystem fork()
  * callbacks.
  */
-void cgroup_post_fork(struct task_struct *parent, struct task_struct *child)
-	__releases(&cgroup_threadgroup_rwsem)
+void cgroup_post_fork(struct task_struct *parent, struct task_struct *child,
+		      struct kernel_clone_args *kargs)
+	__releases(&cgroup_threadgroup_rwsem) __releases(&cgroup_mutex)
 {
 	struct cgroup_subsys *ss;
-	struct css_set *cset;
+	struct css_set *cset = kargs->cset;
 	int i;
 
 	spin_lock_irq(&css_set_lock);
 
 	WARN_ON_ONCE(!list_empty(&child->cg_list));
-	cset = task_css_set(current); /* current is @child's parent */
-	get_css_set(cset);
 	cset->nr_tasks++;
 	css_set_move_task(child, NULL, cset, false);
 
@@ -6026,6 +6159,17 @@ void cgroup_post_fork(struct task_struct *parent, struct task_struct *child)
 	} while_each_subsys_mask();
 
 	cgroup_threadgroup_change_end(parent);
+
+	if (kargs->flags & CLONE_INTO_CGROUP) {
+		mutex_unlock(&cgroup_mutex);
+
+		cgroup_put(kargs->cgrp);
+		kargs->cgrp = NULL;
+	}
+
+	/* Make the new cset the root_cset of the new cgroup namespace. */
+	if (kargs->flags & CLONE_NEWCGROUP)
+		child->nsproxy->cgroup_ns->root_cset = cset;
 }
 
 /**
diff --git a/kernel/cgroup/pids.c b/kernel/cgroup/pids.c
index 138059eb730d..4e7c8819c8df 100644
--- a/kernel/cgroup/pids.c
+++ b/kernel/cgroup/pids.c
@@ -33,6 +33,7 @@
 #include <linux/atomic.h>
 #include <linux/cgroup.h>
 #include <linux/slab.h>
+#include <linux/sched/task.h>
 
 #define PIDS_MAX (PID_MAX_LIMIT + 1ULL)
 #define PIDS_MAX_STR "max"
@@ -214,13 +215,17 @@ static void pids_cancel_attach(struct cgroup_taskset *tset)
  * task_css_check(true) in pids_can_fork() and pids_cancel_fork() relies
  * on cgroup_threadgroup_change_begin() held by the copy_process().
  */
-static int pids_can_fork(struct task_struct *task)
+static int pids_can_fork(struct task_struct *parent, struct task_struct *child,
+			 struct css_set *cset)
 {
 	struct cgroup_subsys_state *css;
 	struct pids_cgroup *pids;
 	int err;
 
-	css = task_css_check(current, pids_cgrp_id, true);
+	if (cset)
+		css = cset->subsys[pids_cgrp_id];
+	else
+		css = task_css_check(current, pids_cgrp_id, true);
 	pids = css_pids(css);
 	err = pids_try_charge(pids, 1);
 	if (err) {
@@ -235,12 +240,15 @@ static int pids_can_fork(struct task_struct *task)
 	return err;
 }
 
-static void pids_cancel_fork(struct task_struct *task)
+static void pids_cancel_fork(struct task_struct *task, struct css_set *cset)
 {
 	struct cgroup_subsys_state *css;
 	struct pids_cgroup *pids;
 
-	css = task_css_check(current, pids_cgrp_id, true);
+	if (cset)
+		css = cset->subsys[pids_cgrp_id];
+	else
+		css = task_css_check(current, pids_cgrp_id, true);
 	pids = css_pids(css);
 	pids_uncharge(pids, 1);
 }
diff --git a/kernel/fork.c b/kernel/fork.c
index c76758dbd594..15d6576ad8c0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2171,7 +2171,7 @@ static __latent_entropy struct task_struct *copy_process(
 	 * between here and cgroup_post_fork() if an organisation operation is in
 	 * progress.
 	 */
-	retval = cgroup_can_fork(current, p);
+	retval = cgroup_can_fork(current, p, args);
 	if (retval)
 		goto bad_fork_put_pidfd;
 
@@ -2278,7 +2278,7 @@ static __latent_entropy struct task_struct *copy_process(
 	write_unlock_irq(&tasklist_lock);
 
 	proc_fork_connector(p);
-	cgroup_post_fork(current, p);
+	cgroup_post_fork(current, p, args);
 	perf_event_fork(p);
 
 	trace_task_newtask(p, clone_flags);
@@ -2289,7 +2289,7 @@ static __latent_entropy struct task_struct *copy_process(
 bad_fork_cancel_cgroup:
 	spin_unlock(&current->sighand->siglock);
 	write_unlock_irq(&tasklist_lock);
-	cgroup_cancel_fork(current, p);
+	cgroup_cancel_fork(current, p, args);
 bad_fork_put_pidfd:
 	if (clone_flags & CLONE_PIDFD) {
 		fput(pidfile);
@@ -2618,6 +2618,9 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
 		     !valid_signal(args.exit_signal)))
 		return -EINVAL;
 
+	if ((args.flags & CLONE_INTO_CGROUP) && args.cgroup < 0)
+		return -EINVAL;
+
 	*kargs = (struct kernel_clone_args){
 		.flags		= args.flags,
 		.pidfd		= u64_to_user_ptr(args.pidfd),
@@ -2628,6 +2631,7 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
 		.stack_size	= args.stack_size,
 		.tls		= args.tls,
 		.set_tid_size	= args.set_tid_size,
+		.cgroup		= args.cgroup,
 	};
 
 	if (args.set_tid &&
@@ -2671,7 +2675,8 @@ static inline bool clone3_stack_valid(struct kernel_clone_args *kargs)
 static bool clone3_args_valid(struct kernel_clone_args *kargs)
 {
 	/* Verify that no unknown flags are passed along. */
-	if (kargs->flags & ~(CLONE_LEGACY_FLAGS | CLONE_CLEAR_SIGHAND))
+	if (kargs->flags &
+	    ~(CLONE_LEGACY_FLAGS | CLONE_CLEAR_SIGHAND | CLONE_INTO_CGROUP))
 		return false;
 
 	/*
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 5/6] clone3: allow spawning processes into cgroups
@ 2020-01-17 18:12   ` Christian Brauner
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Brauner @ 2020-01-17 18:12 UTC (permalink / raw)
  To: linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo
  Cc: Oleg Nesterov, Christian Brauner, Ingo Molnar, Johannes Weiner,
	Li Zefan, Peter Zijlstra, cgroups-u79uwXL29TY76Z2rM5mHXA

This adds support for creating a process in a different cgroup than its
parent. Callers can limit and account processes and threads right from
the moment they are spawned:
- A service manager can directly spawn new services into dedicated
  cgroups.
- A process can be directly created in a frozen cgroup and will be
  frozen as well.
- The initial accounting jitter experienced by process supervisors and
  daemons is eliminated with this.
- Threaded applications or even thread implementations can choose to
  create a specific cgroup layout where each thread is spawned
  directly into a dedicated cgroup.

This feature is limited to the unified hierarchy. Callers need to pass
an directory file descriptor for the target cgroup. The caller can
choose to pass an O_PATH file descriptor. All usual migration
restrictions apply, i.e. there can be no processes in inner nodes. In
general, creating a process directly in a target cgroup adheres to all
migration restrictions.

Cc: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Cc: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Cc: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Signed-off-by: Christian Brauner <christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
---
/* v1 */
Link: https://lore.kernel.org/r/20191218173516.7875-3-christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org

/* v2 */
Link: https://lore.kernel.org/r/20191223061504.28716-3-christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org
- Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>:
  - prevent deadlock from wrong locking order
- Christian Brauner <christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>:
  - Rework locking. In the previous patch version we would have already
    acquired the cgroup_threadgroup_rwsem before we grabbed cgroup mutex
    we need to hold when CLONE_INTO_CGROUP is specified. This meant we
    could deadlock with other codepaths that all require it to be done
    the other way around. Fix this by first grabbing cgroup mutex when
    CLONE_INTO_CGROUP is specified and then grabbing
    cgroup_threadgroup_rwsem unconditionally after. This way we don't
    require the cgroup mutex be held in codepaths that don't need it.
  - Switch from mutex_lock() to mutex_lock_killable().

/* v3 */
Link: https://lore.kernel.org/r/20200117002143.15559-5-christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org
- Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>:
  - s/mutex_lock_killable()/mutex_lock()/ because it should only ever
    be held for a short time:
    diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
    index a9fedcfeae4b..d68d3fb6af1d 100644
    --- a/kernel/cgroup/cgroup.c
    +++ b/kernel/cgroup/cgroup.c
    @@ -5927,11 +5927,8 @@ static int cgroup_css_set_fork(struct task_struct *parent,
            struct super_block *sb;
            struct file *f;

    -       if (kargs->flags & CLONE_INTO_CGROUP) {
    -               ret = mutex_lock_killable(&cgroup_mutex);
    -               if (ret)
    -                       return ret;
    -       }
    +       if (kargs->flags & CLONE_INTO_CGROUP)
    +               mutex_lock(&cgroup_mutex);

            cgroup_threadgroup_change_begin(parent);
  - s/task_cgroup_from_root/cset->dfl_cgrp/:
    diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
    index d68d3fb6af1d..3ceef006d144 100644
    --- a/kernel/cgroup/cgroup.c
    +++ b/kernel/cgroup/cgroup.c
    @@ -5922,7 +5922,7 @@ static int cgroup_css_set_fork(struct task_struct *parent,
            __acquires(&cgroup_mutex) __acquires(&cgroup_threadgroup_rwsem)
     {
            int ret;
    -       struct cgroup *dst_cgrp = NULL, *src_cgrp;
    +       struct cgroup *dst_cgrp = NULL;
            struct css_set *cset;
            struct super_block *sb;
            struct file *f;
    @@ -5956,11 +5956,7 @@ static int cgroup_css_set_fork(struct task_struct *parent,
                    goto err;
            }

    -       spin_lock_irq(&css_set_lock);
    -       src_cgrp = task_cgroup_from_root(parent, &cgrp_dfl_cgrp);
    -       spin_unlock_irq(&css_set_lock);
    -
    -       ret = cgroup_attach_permissions(src_cgrp, dst_cgrp, sb,
    +       ret = cgroup_attach_permissions(cset->dfl_cgrp, dst_cgrp, sb,
                                            !!(kargs->flags & CLONE_THREAD));
            if (ret)
                    goto err;
  - pass struct css_set instead of struct kernel_clone_args into cgroup
    fork subsystem callbacks:
    diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
    index cd848c6bac4a..058bb16d073f 100644
    --- a/include/linux/cgroup-defs.h
    +++ b/include/linux/cgroup-defs.h
    @@ -630,9 +630,8 @@ struct cgroup_subsys {
     	void (*attach)(struct cgroup_taskset *tset);
     	void (*post_attach)(void);
     	int (*can_fork)(struct task_struct *parent, struct task_struct *child,
    -			struct kernel_clone_args *kargs);
    -	void (*cancel_fork)(struct task_struct *child,
    -			    struct kernel_clone_args *kargs);
    +			struct css_set *cset);
    +	void (*cancel_fork)(struct task_struct *child, struct css_set *cset);
     	void (*fork)(struct task_struct *task);
     	void (*exit)(struct task_struct *task);
     	void (*release)(struct task_struct *task);
    diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
    index 3ceef006d144..2ac1c37a3fcb 100644
    --- a/kernel/cgroup/cgroup.c
    +++ b/kernel/cgroup/cgroup.c
    @@ -6044,7 +6044,7 @@ int cgroup_can_fork(struct task_struct *parent, struct task_struct *child,
     		return ret;

     	do_each_subsys_mask(ss, i, have_canfork_callback) {
    -		ret = ss->can_fork(parent, child, kargs);
    +		ret = ss->can_fork(parent, child, kargs->cset);
     		if (ret)
     			goto out_revert;
     	} while_each_subsys_mask();
    @@ -6056,7 +6056,7 @@ int cgroup_can_fork(struct task_struct *parent, struct task_struct *child,
     		if (j >= i)
     			break;
     		if (ss->cancel_fork)
    -			ss->cancel_fork(child, kargs);
    +			ss->cancel_fork(child, kargs->cset);
     	}

     	cgroup_css_set_put_fork(parent, kargs);
    @@ -6082,7 +6082,7 @@ void cgroup_cancel_fork(struct task_struct *parent, struct task_struct *child,

     	for_each_subsys(ss, i)
     		if (ss->cancel_fork)
    -			ss->cancel_fork(child, kargs);
    +			ss->cancel_fork(child, kargs->cset);

     	cgroup_css_set_put_fork(parent, kargs);
     }
    diff --git a/kernel/cgroup/pids.c b/kernel/cgroup/pids.c
    index e5955bc1fb00..4e7c8819c8df 100644
    --- a/kernel/cgroup/pids.c
    +++ b/kernel/cgroup/pids.c
    @@ -216,20 +216,16 @@ static void pids_cancel_attach(struct cgroup_taskset *tset)
      * on cgroup_threadgroup_change_begin() held by the copy_process().
      */
     static int pids_can_fork(struct task_struct *parent, struct task_struct *child,
    -			 struct kernel_clone_args *args)
    +			 struct css_set *cset)
     {
    -	struct css_set *new_cset = NULL;
     	struct cgroup_subsys_state *css;
     	struct pids_cgroup *pids;
     	int err;

    -	if (args)
    -		new_cset = args->cset;
    -
    -	if (!new_cset)
    -		css = task_css_check(current, pids_cgrp_id, true);
    +	if (cset)
    +		css = cset->subsys[pids_cgrp_id];
     	else
    -		css = new_cset->subsys[pids_cgrp_id];
    +		css = task_css_check(current, pids_cgrp_id, true);
     	pids = css_pids(css);
     	err = pids_try_charge(pids, 1);
     	if (err) {
    @@ -244,20 +240,15 @@ static int pids_can_fork(struct task_struct *parent, struct task_struct *child,
     	return err;
     }

    -static void pids_cancel_fork(struct task_struct *task,
    -			     struct kernel_clone_args *args)
    +static void pids_cancel_fork(struct task_struct *task, struct css_set *cset)
     {
    -	struct css_set *new_cset = NULL;
     	struct cgroup_subsys_state *css;
     	struct pids_cgroup *pids;

    -	if (args)
    -		new_cset = args->cset;
    -
    -	if (!new_cset)
    -		css = task_css_check(current, pids_cgrp_id, true);
    +	if (cset)
    +		css = cset->subsys[pids_cgrp_id];
     	else
    -		css = new_cset->subsys[pids_cgrp_id];
    +		css = task_css_check(current, pids_cgrp_id, true);
     	pids = css_pids(css);
     	pids_uncharge(pids, 1);
     }
- Michal Koutný <mkoutny-IBi9RG/b67k@public.gmane.org>:
  - update comment for cgroup_fork()
  - if CLONE_NEWCGROUP and CLONE_INTO_CGROUP is requested, set the
    root_cset of the new cgroup namespace to the child's cset

/* v4 */
- Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>:
  - verify that we can write to the target cgroup since we're not going through
    the vfs layer which would do it for us
    diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
    index 61d1a6cd0059..6b38b2545667 100644
    --- a/kernel/cgroup/cgroup.c
    +++ b/kernel/cgroup/cgroup.c
    @@ -5966,6 +5966,15 @@ static int cgroup_css_set_fork(struct task_struct *parent,
                    goto err;
            }

    +       /*
    +        * Verify that we can the target cgroup is writable for us. This is
    +        * usally done by the vfs layer but since we're not going through the
    +        * vfs layer here we need to do it.
    +        */
    +       ret = cgroup_may_write(dst_cgrp, sb);
    +       if (ret)
    +               goto err;
    +
            ret = cgroup_attach_permissions(cset->dfl_cgrp, dst_cgrp, sb,
                                            !!(kargs->flags & CLONE_THREAD));
            if (ret)
---
 include/linux/cgroup-defs.h |   6 +-
 include/linux/cgroup.h      |  20 ++--
 include/linux/sched/task.h  |   4 +
 include/uapi/linux/sched.h  |   5 +
 kernel/cgroup/cgroup.c      | 194 +++++++++++++++++++++++++++++++-----
 kernel/cgroup/pids.c        |  16 ++-
 kernel/fork.c               |  13 ++-
 7 files changed, 217 insertions(+), 41 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 63097cb243cb..058bb16d073f 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -33,6 +33,7 @@ struct kernfs_ops;
 struct kernfs_open_file;
 struct seq_file;
 struct poll_table_struct;
+struct kernel_clone_args;
 
 #define MAX_CGROUP_TYPE_NAMELEN 32
 #define MAX_CGROUP_ROOT_NAMELEN 64
@@ -628,8 +629,9 @@ struct cgroup_subsys {
 	void (*cancel_attach)(struct cgroup_taskset *tset);
 	void (*attach)(struct cgroup_taskset *tset);
 	void (*post_attach)(void);
-	int (*can_fork)(struct task_struct *task);
-	void (*cancel_fork)(struct task_struct *task);
+	int (*can_fork)(struct task_struct *parent, struct task_struct *child,
+			struct css_set *cset);
+	void (*cancel_fork)(struct task_struct *child, struct css_set *cset);
 	void (*fork)(struct task_struct *task);
 	void (*exit)(struct task_struct *task);
 	void (*release)(struct task_struct *task);
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 5349465bfac1..dfd5e095f4ee 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -27,6 +27,8 @@
 
 #include <linux/cgroup-defs.h>
 
+struct kernel_clone_args;
+
 #ifdef CONFIG_CGROUPS
 
 /*
@@ -122,11 +124,14 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
 
 void cgroup_fork(struct task_struct *p);
 extern int cgroup_can_fork(struct task_struct *parent,
-			   struct task_struct *child);
+			   struct task_struct *child,
+			   struct kernel_clone_args *kargs);
 extern void cgroup_cancel_fork(struct task_struct *parent,
-			       struct task_struct *child);
+			       struct task_struct *child,
+			       struct kernel_clone_args *kargs);
 extern void cgroup_post_fork(struct task_struct *parent,
-			     struct task_struct *child);
+			     struct task_struct *child,
+			     struct kernel_clone_args *kargs);
 void cgroup_exit(struct task_struct *p);
 void cgroup_release(struct task_struct *p);
 void cgroup_free(struct task_struct *p);
@@ -711,11 +716,14 @@ static inline int cgroupstats_build(struct cgroupstats *stats,
 
 static inline void cgroup_fork(struct task_struct *p) {}
 static inline int cgroup_can_fork(struct task_struct *parent,
-				  struct task_struct *child) { return 0; }
+				  struct task_struct *child,
+				  struct kernel_clone_args *kargs) { return 0; }
 static inline void cgroup_cancel_fork(struct task_struct *parent,
-			       struct task_struct *child) {};
+				      struct task_struct *child,
+				      struct kernel_clone_args *kargs) {};
 static inline void cgroup_post_fork(struct task_struct *parent,
-				    struct task_struct *child) {};
+				    struct task_struct *child,
+				    struct kernel_clone_args *kargs) {};
 static inline void cgroup_exit(struct task_struct *p) {}
 static inline void cgroup_release(struct task_struct *p) {}
 static inline void cgroup_free(struct task_struct *p) {}
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index f1879884238e..38359071236a 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -13,6 +13,7 @@
 struct task_struct;
 struct rusage;
 union thread_union;
+struct css_set;
 
 /* All the bits taken by the old clone syscall. */
 #define CLONE_LEGACY_FLAGS 0xffffffffULL
@@ -29,6 +30,9 @@ struct kernel_clone_args {
 	pid_t *set_tid;
 	/* Number of elements in *set_tid */
 	size_t set_tid_size;
+	int cgroup;
+	struct cgroup *cgrp;
+	struct css_set *cset;
 };
 
 /*
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 4a0217832464..08620c220f30 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -35,6 +35,7 @@
 
 /* Flags for the clone3() syscall. */
 #define CLONE_CLEAR_SIGHAND 0x100000000ULL /* Clear any signal handler and reset to SIG_DFL. */
+#define CLONE_INTO_CGROUP 0x200000000ULL /* Clone into a specific cgroup given the right permissions. */
 
 #ifndef __ASSEMBLY__
 /**
@@ -75,6 +76,8 @@
  * @set_tid_size: This defines the size of the array referenced
  *                in @set_tid. This cannot be larger than the
  *                kernel's limit of nested PID namespaces.
+ * @cgroup:       If CLONE_INTO_CGROUP is specified set this to
+ *                a file descriptor for the cgroup.
  *
  * The structure is versioned by size and thus extensible.
  * New struct members must go at the end of the struct and
@@ -91,11 +94,13 @@ struct clone_args {
 	__aligned_u64 tls;
 	__aligned_u64 set_tid;
 	__aligned_u64 set_tid_size;
+	__aligned_u64 cgroup;
 };
 #endif
 
 #define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */
 #define CLONE_ARGS_SIZE_VER1 80 /* sizeof second published struct */
+#define CLONE_ARGS_SIZE_VER2 88 /* sizeof third published struct */
 
 /*
  * Scheduling policies
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index b110b435ae49..276cb6053987 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5883,8 +5883,7 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
  * @child: pointer to task_struct of forking parent process.
  *
  * A task is associated with the init_css_set until cgroup_post_fork()
- * attaches it to the parent's css_set.  Empty cg_list indicates that
- * @child isn't holding reference to its css_set.
+ * attaches it to the target css_set.
  */
 void cgroup_fork(struct task_struct *child)
 {
@@ -5910,26 +5909,160 @@ static struct cgroup *cgroup_get_from_file(struct file *f)
 	return cgrp;
 }
 
+/**
+ * cgroup_css_set_fork - find or create a css_set for a child process
+ * @parent: the parent of the child process
+ * @kargs: the arguments passed to create the child process
+ *
+ * This functions finds or creates a new css_set which the child
+ * process will be attached to in cgroup_post_fork(). By default,
+ * the child process will be given the same css_set as its parent.
+ *
+ * If CLONE_INTO_CGROUP is specified this function will try to find an
+ * existing css_set which includes the request cgroup and if not create
+ * a new css_set that the child will be attached to. If this function
+ * succeeds it will hold cgroup_threadgroup_rwsem on return. If
+ * CLONE_INTO_CGROUP is requested this function will grab cgroup mutex
+ * before grabbing cgroup_threadgroup_rwsem and will hold a reference
+ * to the target cgroup.
+ */
+static int cgroup_css_set_fork(struct task_struct *parent,
+			       struct kernel_clone_args *kargs)
+	__acquires(&cgroup_mutex) __acquires(&cgroup_threadgroup_rwsem)
+{
+	int ret;
+	struct cgroup *dst_cgrp = NULL;
+	struct css_set *cset;
+	struct super_block *sb;
+	struct file *f;
+
+	if (kargs->flags & CLONE_INTO_CGROUP)
+		mutex_lock(&cgroup_mutex);
+
+	cgroup_threadgroup_change_begin(parent);
+
+	spin_lock_irq(&css_set_lock);
+	cset = task_css_set(parent);
+	get_css_set(cset);
+	spin_unlock_irq(&css_set_lock);
+
+	if (!(kargs->flags & CLONE_INTO_CGROUP)) {
+		kargs->cset = cset;
+		return 0;
+	}
+
+	f = fget_raw(kargs->cgroup);
+	if (!f) {
+		ret = -EBADF;
+		goto err;
+	}
+	sb = f->f_path.dentry->d_sb;
+
+	dst_cgrp = cgroup_get_from_file(f);
+	if (IS_ERR(dst_cgrp)) {
+		ret = PTR_ERR(dst_cgrp);
+		dst_cgrp = NULL;
+		goto err;
+	}
+
+	/*
+	 * Verify that we can the target cgroup is writable for us. This is
+	 * usally done by the vfs layer but since we're not going through the
+	 * vfs layer here we need to do it.
+	 */
+	ret = cgroup_may_write(dst_cgrp, sb);
+	if (ret)
+		goto err;
+
+	ret = cgroup_attach_permissions(cset->dfl_cgrp, dst_cgrp, sb,
+					!!(kargs->flags & CLONE_THREAD));
+	if (ret)
+		goto err;
+
+	kargs->cset = find_css_set(cset, dst_cgrp);
+	if (!kargs->cset) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	if (cgroup_is_dead(dst_cgrp)) {
+		ret = -ENODEV;
+		goto err;
+	}
+
+	put_css_set(cset);
+	fput(f);
+	kargs->cgrp = dst_cgrp;
+	return ret;
+
+err:
+	cgroup_threadgroup_change_end(parent);
+	mutex_unlock(&cgroup_mutex);
+	if (f)
+		fput(f);
+	if (dst_cgrp)
+		cgroup_put(dst_cgrp);
+	put_css_set(cset);
+	return ret;
+}
+
+/**
+ * cgroup_css_set_put_fork - drop references we took during fork
+ * @parent: the parent of the child process
+ * @kargs: the arguments passed to create the child process
+ *
+ * Drop references to the prepared css_set and target cgroup if
+ * CLONE_INTO_CGROUP was requested. This function can only be
+ * called before fork()'s point of no return.
+ */
+static void cgroup_css_set_put_fork(struct task_struct *parent,
+				    struct kernel_clone_args *kargs)
+	__releases(&cgroup_threadgroup_rwsem) __releases(&cgroup_mutex)
+{
+	cgroup_threadgroup_change_end(parent);
+
+	if (kargs->flags & CLONE_INTO_CGROUP) {
+		struct cgroup *cgrp = kargs->cgrp;
+		struct css_set *cset = kargs->cset;
+
+		mutex_unlock(&cgroup_mutex);
+
+		if (cset) {
+			put_css_set(cset);
+			kargs->cset = NULL;
+		}
+
+		if (cgrp) {
+			cgroup_put(cgrp);
+			kargs->cgrp = NULL;
+		}
+	}
+}
+
 /**
  * cgroup_can_fork - called on a new task before the process is exposed
  * @parent: the parent process of @child
  * @child: the child process of @parent
  * @kargs: the arguments passed to create the child process
  *
+ * This prepares a new css_set for the child process which the child will
+ * be attached to in cgroup_post_fork().
  * This calls the subsystem can_fork() callbacks. If the cgroup_can_fork()
  * callback returns an error, the fork aborts with that error code. This
  * allows for a cgroup subsystem to conditionally allow or deny new forks.
  */
-int cgroup_can_fork(struct task_struct *parent, struct task_struct *child)
-	__acquires(&cgroup_threadgroup_rwsem) __releases(&cgroup_threadgroup_rwsem)
+int cgroup_can_fork(struct task_struct *parent, struct task_struct *child,
+			struct kernel_clone_args *kargs)
 {
 	struct cgroup_subsys *ss;
 	int i, j, ret;
 
-	cgroup_threadgroup_change_begin(parent);
+	ret = cgroup_css_set_fork(parent, kargs);
+	if (ret)
+		return ret;
 
 	do_each_subsys_mask(ss, i, have_canfork_callback) {
-		ret = ss->can_fork(child);
+		ret = ss->can_fork(parent, child, kargs->cset);
 		if (ret)
 			goto out_revert;
 	} while_each_subsys_mask();
@@ -5941,34 +6074,35 @@ int cgroup_can_fork(struct task_struct *parent, struct task_struct *child)
 		if (j >= i)
 			break;
 		if (ss->cancel_fork)
-			ss->cancel_fork(child);
+			ss->cancel_fork(child, kargs->cset);
 	}
 
-	cgroup_threadgroup_change_end(parent);
+	cgroup_css_set_put_fork(parent, kargs);
 
 	return ret;
 }
 
 /**
-  * cgroup_cancel_fork - called if a fork failed after cgroup_can_fork()
-  * @parent: the parent process of @child
-  * @child: the child process of @parent
-  * @kargs: the arguments passed to create the child process
-  *
-  * This calls the cancel_fork() callbacks if a fork failed *after*
-  * cgroup_can_fork() succeded.
-  */
-void cgroup_cancel_fork(struct task_struct *parent, struct task_struct *child)
-	__releases(&cgroup_threadgroup_rwsem)
+ * cgroup_cancel_fork - called if a fork failed after cgroup_can_fork()
+ * @parent: the parent process of @child
+ * @child: the child process of @parent
+ * @kargs: the arguments passed to create the child process
+ *
+ * This calls the cancel_fork() callbacks if a fork failed *after*
+ * cgroup_can_fork() succeded and cleans up references we took to
+ * prepare a new css_set for the child process in cgroup_can_fork().
+ */
+void cgroup_cancel_fork(struct task_struct *parent, struct task_struct *child,
+			struct kernel_clone_args *kargs)
 {
 	struct cgroup_subsys *ss;
 	int i;
 
 	for_each_subsys(ss, i)
 		if (ss->cancel_fork)
-			ss->cancel_fork(child);
+			ss->cancel_fork(child, kargs->cset);
 
-	cgroup_threadgroup_change_end(parent);
+	cgroup_css_set_put_fork(parent, kargs);
 }
 
 /**
@@ -5980,18 +6114,17 @@ void cgroup_cancel_fork(struct task_struct *parent, struct task_struct *child)
  * Attach the child process to its css_set calling the subsystem fork()
  * callbacks.
  */
-void cgroup_post_fork(struct task_struct *parent, struct task_struct *child)
-	__releases(&cgroup_threadgroup_rwsem)
+void cgroup_post_fork(struct task_struct *parent, struct task_struct *child,
+		      struct kernel_clone_args *kargs)
+	__releases(&cgroup_threadgroup_rwsem) __releases(&cgroup_mutex)
 {
 	struct cgroup_subsys *ss;
-	struct css_set *cset;
+	struct css_set *cset = kargs->cset;
 	int i;
 
 	spin_lock_irq(&css_set_lock);
 
 	WARN_ON_ONCE(!list_empty(&child->cg_list));
-	cset = task_css_set(current); /* current is @child's parent */
-	get_css_set(cset);
 	cset->nr_tasks++;
 	css_set_move_task(child, NULL, cset, false);
 
@@ -6026,6 +6159,17 @@ void cgroup_post_fork(struct task_struct *parent, struct task_struct *child)
 	} while_each_subsys_mask();
 
 	cgroup_threadgroup_change_end(parent);
+
+	if (kargs->flags & CLONE_INTO_CGROUP) {
+		mutex_unlock(&cgroup_mutex);
+
+		cgroup_put(kargs->cgrp);
+		kargs->cgrp = NULL;
+	}
+
+	/* Make the new cset the root_cset of the new cgroup namespace. */
+	if (kargs->flags & CLONE_NEWCGROUP)
+		child->nsproxy->cgroup_ns->root_cset = cset;
 }
 
 /**
diff --git a/kernel/cgroup/pids.c b/kernel/cgroup/pids.c
index 138059eb730d..4e7c8819c8df 100644
--- a/kernel/cgroup/pids.c
+++ b/kernel/cgroup/pids.c
@@ -33,6 +33,7 @@
 #include <linux/atomic.h>
 #include <linux/cgroup.h>
 #include <linux/slab.h>
+#include <linux/sched/task.h>
 
 #define PIDS_MAX (PID_MAX_LIMIT + 1ULL)
 #define PIDS_MAX_STR "max"
@@ -214,13 +215,17 @@ static void pids_cancel_attach(struct cgroup_taskset *tset)
  * task_css_check(true) in pids_can_fork() and pids_cancel_fork() relies
  * on cgroup_threadgroup_change_begin() held by the copy_process().
  */
-static int pids_can_fork(struct task_struct *task)
+static int pids_can_fork(struct task_struct *parent, struct task_struct *child,
+			 struct css_set *cset)
 {
 	struct cgroup_subsys_state *css;
 	struct pids_cgroup *pids;
 	int err;
 
-	css = task_css_check(current, pids_cgrp_id, true);
+	if (cset)
+		css = cset->subsys[pids_cgrp_id];
+	else
+		css = task_css_check(current, pids_cgrp_id, true);
 	pids = css_pids(css);
 	err = pids_try_charge(pids, 1);
 	if (err) {
@@ -235,12 +240,15 @@ static int pids_can_fork(struct task_struct *task)
 	return err;
 }
 
-static void pids_cancel_fork(struct task_struct *task)
+static void pids_cancel_fork(struct task_struct *task, struct css_set *cset)
 {
 	struct cgroup_subsys_state *css;
 	struct pids_cgroup *pids;
 
-	css = task_css_check(current, pids_cgrp_id, true);
+	if (cset)
+		css = cset->subsys[pids_cgrp_id];
+	else
+		css = task_css_check(current, pids_cgrp_id, true);
 	pids = css_pids(css);
 	pids_uncharge(pids, 1);
 }
diff --git a/kernel/fork.c b/kernel/fork.c
index c76758dbd594..15d6576ad8c0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2171,7 +2171,7 @@ static __latent_entropy struct task_struct *copy_process(
 	 * between here and cgroup_post_fork() if an organisation operation is in
 	 * progress.
 	 */
-	retval = cgroup_can_fork(current, p);
+	retval = cgroup_can_fork(current, p, args);
 	if (retval)
 		goto bad_fork_put_pidfd;
 
@@ -2278,7 +2278,7 @@ static __latent_entropy struct task_struct *copy_process(
 	write_unlock_irq(&tasklist_lock);
 
 	proc_fork_connector(p);
-	cgroup_post_fork(current, p);
+	cgroup_post_fork(current, p, args);
 	perf_event_fork(p);
 
 	trace_task_newtask(p, clone_flags);
@@ -2289,7 +2289,7 @@ static __latent_entropy struct task_struct *copy_process(
 bad_fork_cancel_cgroup:
 	spin_unlock(&current->sighand->siglock);
 	write_unlock_irq(&tasklist_lock);
-	cgroup_cancel_fork(current, p);
+	cgroup_cancel_fork(current, p, args);
 bad_fork_put_pidfd:
 	if (clone_flags & CLONE_PIDFD) {
 		fput(pidfile);
@@ -2618,6 +2618,9 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
 		     !valid_signal(args.exit_signal)))
 		return -EINVAL;
 
+	if ((args.flags & CLONE_INTO_CGROUP) && args.cgroup < 0)
+		return -EINVAL;
+
 	*kargs = (struct kernel_clone_args){
 		.flags		= args.flags,
 		.pidfd		= u64_to_user_ptr(args.pidfd),
@@ -2628,6 +2631,7 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
 		.stack_size	= args.stack_size,
 		.tls		= args.tls,
 		.set_tid_size	= args.set_tid_size,
+		.cgroup		= args.cgroup,
 	};
 
 	if (args.set_tid &&
@@ -2671,7 +2675,8 @@ static inline bool clone3_stack_valid(struct kernel_clone_args *kargs)
 static bool clone3_args_valid(struct kernel_clone_args *kargs)
 {
 	/* Verify that no unknown flags are passed along. */
-	if (kargs->flags & ~(CLONE_LEGACY_FLAGS | CLONE_CLEAR_SIGHAND))
+	if (kargs->flags &
+	    ~(CLONE_LEGACY_FLAGS | CLONE_CLEAR_SIGHAND | CLONE_INTO_CGROUP))
 		return false;
 
 	/*
-- 
2.25.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 5/6] clone3: allow spawning processes into cgroups
@ 2020-01-17 18:12   ` Christian Brauner
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Brauner @ 2020-01-17 18:12 UTC (permalink / raw)
  To: linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo
  Cc: Oleg Nesterov, Christian Brauner, Ingo Molnar, Johannes Weiner,
	Li Zefan, Peter Zijlstra, cgroups-u79uwXL29TY76Z2rM5mHXA

This adds support for creating a process in a different cgroup than its
parent. Callers can limit and account processes and threads right from
the moment they are spawned:
- A service manager can directly spawn new services into dedicated
  cgroups.
- A process can be directly created in a frozen cgroup and will be
  frozen as well.
- The initial accounting jitter experienced by process supervisors and
  daemons is eliminated with this.
- Threaded applications or even thread implementations can choose to
  create a specific cgroup layout where each thread is spawned
  directly into a dedicated cgroup.

This feature is limited to the unified hierarchy. Callers need to pass
an directory file descriptor for the target cgroup. The caller can
choose to pass an O_PATH file descriptor. All usual migration
restrictions apply, i.e. there can be no processes in inner nodes. In
general, creating a process directly in a target cgroup adheres to all
migration restrictions.

Cc: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Cc: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Cc: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Signed-off-by: Christian Brauner <christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
---
/* v1 */
Link: https://lore.kernel.org/r/20191218173516.7875-3-christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org

/* v2 */
Link: https://lore.kernel.org/r/20191223061504.28716-3-christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org
- Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>:
  - prevent deadlock from wrong locking order
- Christian Brauner <christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>:
  - Rework locking. In the previous patch version we would have already
    acquired the cgroup_threadgroup_rwsem before we grabbed cgroup mutex
    we need to hold when CLONE_INTO_CGROUP is specified. This meant we
    could deadlock with other codepaths that all require it to be done
    the other way around. Fix this by first grabbing cgroup mutex when
    CLONE_INTO_CGROUP is specified and then grabbing
    cgroup_threadgroup_rwsem unconditionally after. This way we don't
    require the cgroup mutex be held in codepaths that don't need it.
  - Switch from mutex_lock() to mutex_lock_killable().

/* v3 */
Link: https://lore.kernel.org/r/20200117002143.15559-5-christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org
- Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>:
  - s/mutex_lock_killable()/mutex_lock()/ because it should only ever
    be held for a short time:
    diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
    index a9fedcfeae4b..d68d3fb6af1d 100644
    --- a/kernel/cgroup/cgroup.c
    +++ b/kernel/cgroup/cgroup.c
    @@ -5927,11 +5927,8 @@ static int cgroup_css_set_fork(struct task_struct *parent,
            struct super_block *sb;
            struct file *f;

    -       if (kargs->flags & CLONE_INTO_CGROUP) {
    -               ret = mutex_lock_killable(&cgroup_mutex);
    -               if (ret)
    -                       return ret;
    -       }
    +       if (kargs->flags & CLONE_INTO_CGROUP)
    +               mutex_lock(&cgroup_mutex);

            cgroup_threadgroup_change_begin(parent);
  - s/task_cgroup_from_root/cset->dfl_cgrp/:
    diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
    index d68d3fb6af1d..3ceef006d144 100644
    --- a/kernel/cgroup/cgroup.c
    +++ b/kernel/cgroup/cgroup.c
    @@ -5922,7 +5922,7 @@ static int cgroup_css_set_fork(struct task_struct *parent,
            __acquires(&cgroup_mutex) __acquires(&cgroup_threadgroup_rwsem)
     {
            int ret;
    -       struct cgroup *dst_cgrp = NULL, *src_cgrp;
    +       struct cgroup *dst_cgrp = NULL;
            struct css_set *cset;
            struct super_block *sb;
            struct file *f;
    @@ -5956,11 +5956,7 @@ static int cgroup_css_set_fork(struct task_struct *parent,
                    goto err;
            }

    -       spin_lock_irq(&css_set_lock);
    -       src_cgrp = task_cgroup_from_root(parent, &cgrp_dfl_cgrp);
    -       spin_unlock_irq(&css_set_lock);
    -
    -       ret = cgroup_attach_permissions(src_cgrp, dst_cgrp, sb,
    +       ret = cgroup_attach_permissions(cset->dfl_cgrp, dst_cgrp, sb,
                                            !!(kargs->flags & CLONE_THREAD));
            if (ret)
                    goto err;
  - pass struct css_set instead of struct kernel_clone_args into cgroup
    fork subsystem callbacks:
    diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
    index cd848c6bac4a..058bb16d073f 100644
    --- a/include/linux/cgroup-defs.h
    +++ b/include/linux/cgroup-defs.h
    @@ -630,9 +630,8 @@ struct cgroup_subsys {
     	void (*attach)(struct cgroup_taskset *tset);
     	void (*post_attach)(void);
     	int (*can_fork)(struct task_struct *parent, struct task_struct *child,
    -			struct kernel_clone_args *kargs);
    -	void (*cancel_fork)(struct task_struct *child,
    -			    struct kernel_clone_args *kargs);
    +			struct css_set *cset);
    +	void (*cancel_fork)(struct task_struct *child, struct css_set *cset);
     	void (*fork)(struct task_struct *task);
     	void (*exit)(struct task_struct *task);
     	void (*release)(struct task_struct *task);
    diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
    index 3ceef006d144..2ac1c37a3fcb 100644
    --- a/kernel/cgroup/cgroup.c
    +++ b/kernel/cgroup/cgroup.c
    @@ -6044,7 +6044,7 @@ int cgroup_can_fork(struct task_struct *parent, struct task_struct *child,
     		return ret;

     	do_each_subsys_mask(ss, i, have_canfork_callback) {
    -		ret = ss->can_fork(parent, child, kargs);
    +		ret = ss->can_fork(parent, child, kargs->cset);
     		if (ret)
     			goto out_revert;
     	} while_each_subsys_mask();
    @@ -6056,7 +6056,7 @@ int cgroup_can_fork(struct task_struct *parent, struct task_struct *child,
     		if (j >= i)
     			break;
     		if (ss->cancel_fork)
    -			ss->cancel_fork(child, kargs);
    +			ss->cancel_fork(child, kargs->cset);
     	}

     	cgroup_css_set_put_fork(parent, kargs);
    @@ -6082,7 +6082,7 @@ void cgroup_cancel_fork(struct task_struct *parent, struct task_struct *child,

     	for_each_subsys(ss, i)
     		if (ss->cancel_fork)
    -			ss->cancel_fork(child, kargs);
    +			ss->cancel_fork(child, kargs->cset);

     	cgroup_css_set_put_fork(parent, kargs);
     }
    diff --git a/kernel/cgroup/pids.c b/kernel/cgroup/pids.c
    index e5955bc1fb00..4e7c8819c8df 100644
    --- a/kernel/cgroup/pids.c
    +++ b/kernel/cgroup/pids.c
    @@ -216,20 +216,16 @@ static void pids_cancel_attach(struct cgroup_taskset *tset)
      * on cgroup_threadgroup_change_begin() held by the copy_process().
      */
     static int pids_can_fork(struct task_struct *parent, struct task_struct *child,
    -			 struct kernel_clone_args *args)
    +			 struct css_set *cset)
     {
    -	struct css_set *new_cset = NULL;
     	struct cgroup_subsys_state *css;
     	struct pids_cgroup *pids;
     	int err;

    -	if (args)
    -		new_cset = args->cset;
    -
    -	if (!new_cset)
    -		css = task_css_check(current, pids_cgrp_id, true);
    +	if (cset)
    +		css = cset->subsys[pids_cgrp_id];
     	else
    -		css = new_cset->subsys[pids_cgrp_id];
    +		css = task_css_check(current, pids_cgrp_id, true);
     	pids = css_pids(css);
     	err = pids_try_charge(pids, 1);
     	if (err) {
    @@ -244,20 +240,15 @@ static int pids_can_fork(struct task_struct *parent, struct task_struct *child,
     	return err;
     }

    -static void pids_cancel_fork(struct task_struct *task,
    -			     struct kernel_clone_args *args)
    +static void pids_cancel_fork(struct task_struct *task, struct css_set *cset)
     {
    -	struct css_set *new_cset = NULL;
     	struct cgroup_subsys_state *css;
     	struct pids_cgroup *pids;

    -	if (args)
    -		new_cset = args->cset;
    -
    -	if (!new_cset)
    -		css = task_css_check(current, pids_cgrp_id, true);
    +	if (cset)
    +		css = cset->subsys[pids_cgrp_id];
     	else
    -		css = new_cset->subsys[pids_cgrp_id];
    +		css = task_css_check(current, pids_cgrp_id, true);
     	pids = css_pids(css);
     	pids_uncharge(pids, 1);
     }
- Michal Koutn√Ω <mkoutny-IBi9RG/b67k@public.gmane.org>:
  - update comment for cgroup_fork()
  - if CLONE_NEWCGROUP and CLONE_INTO_CGROUP is requested, set the
    root_cset of the new cgroup namespace to the child's cset

/* v4 */
- Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>:
  - verify that we can write to the target cgroup since we're not going through
    the vfs layer which would do it for us
    diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
    index 61d1a6cd0059..6b38b2545667 100644
    --- a/kernel/cgroup/cgroup.c
    +++ b/kernel/cgroup/cgroup.c
    @@ -5966,6 +5966,15 @@ static int cgroup_css_set_fork(struct task_struct *parent,
                    goto err;
            }

    +       /*
    +        * Verify that we can the target cgroup is writable for us. This is
    +        * usally done by the vfs layer but since we're not going through the
    +        * vfs layer here we need to do it.
    +        */
    +       ret = cgroup_may_write(dst_cgrp, sb);
    +       if (ret)
    +               goto err;
    +
            ret = cgroup_attach_permissions(cset->dfl_cgrp, dst_cgrp, sb,
                                            !!(kargs->flags & CLONE_THREAD));
            if (ret)
---
 include/linux/cgroup-defs.h |   6 +-
 include/linux/cgroup.h      |  20 ++--
 include/linux/sched/task.h  |   4 +
 include/uapi/linux/sched.h  |   5 +
 kernel/cgroup/cgroup.c      | 194 +++++++++++++++++++++++++++++++-----
 kernel/cgroup/pids.c        |  16 ++-
 kernel/fork.c               |  13 ++-
 7 files changed, 217 insertions(+), 41 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 63097cb243cb..058bb16d073f 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -33,6 +33,7 @@ struct kernfs_ops;
 struct kernfs_open_file;
 struct seq_file;
 struct poll_table_struct;
+struct kernel_clone_args;
 
 #define MAX_CGROUP_TYPE_NAMELEN 32
 #define MAX_CGROUP_ROOT_NAMELEN 64
@@ -628,8 +629,9 @@ struct cgroup_subsys {
 	void (*cancel_attach)(struct cgroup_taskset *tset);
 	void (*attach)(struct cgroup_taskset *tset);
 	void (*post_attach)(void);
-	int (*can_fork)(struct task_struct *task);
-	void (*cancel_fork)(struct task_struct *task);
+	int (*can_fork)(struct task_struct *parent, struct task_struct *child,
+			struct css_set *cset);
+	void (*cancel_fork)(struct task_struct *child, struct css_set *cset);
 	void (*fork)(struct task_struct *task);
 	void (*exit)(struct task_struct *task);
 	void (*release)(struct task_struct *task);
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 5349465bfac1..dfd5e095f4ee 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -27,6 +27,8 @@
 
 #include <linux/cgroup-defs.h>
 
+struct kernel_clone_args;
+
 #ifdef CONFIG_CGROUPS
 
 /*
@@ -122,11 +124,14 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
 
 void cgroup_fork(struct task_struct *p);
 extern int cgroup_can_fork(struct task_struct *parent,
-			   struct task_struct *child);
+			   struct task_struct *child,
+			   struct kernel_clone_args *kargs);
 extern void cgroup_cancel_fork(struct task_struct *parent,
-			       struct task_struct *child);
+			       struct task_struct *child,
+			       struct kernel_clone_args *kargs);
 extern void cgroup_post_fork(struct task_struct *parent,
-			     struct task_struct *child);
+			     struct task_struct *child,
+			     struct kernel_clone_args *kargs);
 void cgroup_exit(struct task_struct *p);
 void cgroup_release(struct task_struct *p);
 void cgroup_free(struct task_struct *p);
@@ -711,11 +716,14 @@ static inline int cgroupstats_build(struct cgroupstats *stats,
 
 static inline void cgroup_fork(struct task_struct *p) {}
 static inline int cgroup_can_fork(struct task_struct *parent,
-				  struct task_struct *child) { return 0; }
+				  struct task_struct *child,
+				  struct kernel_clone_args *kargs) { return 0; }
 static inline void cgroup_cancel_fork(struct task_struct *parent,
-			       struct task_struct *child) {};
+				      struct task_struct *child,
+				      struct kernel_clone_args *kargs) {};
 static inline void cgroup_post_fork(struct task_struct *parent,
-				    struct task_struct *child) {};
+				    struct task_struct *child,
+				    struct kernel_clone_args *kargs) {};
 static inline void cgroup_exit(struct task_struct *p) {}
 static inline void cgroup_release(struct task_struct *p) {}
 static inline void cgroup_free(struct task_struct *p) {}
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index f1879884238e..38359071236a 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -13,6 +13,7 @@
 struct task_struct;
 struct rusage;
 union thread_union;
+struct css_set;
 
 /* All the bits taken by the old clone syscall. */
 #define CLONE_LEGACY_FLAGS 0xffffffffULL
@@ -29,6 +30,9 @@ struct kernel_clone_args {
 	pid_t *set_tid;
 	/* Number of elements in *set_tid */
 	size_t set_tid_size;
+	int cgroup;
+	struct cgroup *cgrp;
+	struct css_set *cset;
 };
 
 /*
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 4a0217832464..08620c220f30 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -35,6 +35,7 @@
 
 /* Flags for the clone3() syscall. */
 #define CLONE_CLEAR_SIGHAND 0x100000000ULL /* Clear any signal handler and reset to SIG_DFL. */
+#define CLONE_INTO_CGROUP 0x200000000ULL /* Clone into a specific cgroup given the right permissions. */
 
 #ifndef __ASSEMBLY__
 /**
@@ -75,6 +76,8 @@
  * @set_tid_size: This defines the size of the array referenced
  *                in @set_tid. This cannot be larger than the
  *                kernel's limit of nested PID namespaces.
+ * @cgroup:       If CLONE_INTO_CGROUP is specified set this to
+ *                a file descriptor for the cgroup.
  *
  * The structure is versioned by size and thus extensible.
  * New struct members must go at the end of the struct and
@@ -91,11 +94,13 @@ struct clone_args {
 	__aligned_u64 tls;
 	__aligned_u64 set_tid;
 	__aligned_u64 set_tid_size;
+	__aligned_u64 cgroup;
 };
 #endif
 
 #define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */
 #define CLONE_ARGS_SIZE_VER1 80 /* sizeof second published struct */
+#define CLONE_ARGS_SIZE_VER2 88 /* sizeof third published struct */
 
 /*
  * Scheduling policies
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index b110b435ae49..276cb6053987 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5883,8 +5883,7 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
  * @child: pointer to task_struct of forking parent process.
  *
  * A task is associated with the init_css_set until cgroup_post_fork()
- * attaches it to the parent's css_set.  Empty cg_list indicates that
- * @child isn't holding reference to its css_set.
+ * attaches it to the target css_set.
  */
 void cgroup_fork(struct task_struct *child)
 {
@@ -5910,26 +5909,160 @@ static struct cgroup *cgroup_get_from_file(struct file *f)
 	return cgrp;
 }
 
+/**
+ * cgroup_css_set_fork - find or create a css_set for a child process
+ * @parent: the parent of the child process
+ * @kargs: the arguments passed to create the child process
+ *
+ * This functions finds or creates a new css_set which the child
+ * process will be attached to in cgroup_post_fork(). By default,
+ * the child process will be given the same css_set as its parent.
+ *
+ * If CLONE_INTO_CGROUP is specified this function will try to find an
+ * existing css_set which includes the request cgroup and if not create
+ * a new css_set that the child will be attached to. If this function
+ * succeeds it will hold cgroup_threadgroup_rwsem on return. If
+ * CLONE_INTO_CGROUP is requested this function will grab cgroup mutex
+ * before grabbing cgroup_threadgroup_rwsem and will hold a reference
+ * to the target cgroup.
+ */
+static int cgroup_css_set_fork(struct task_struct *parent,
+			       struct kernel_clone_args *kargs)
+	__acquires(&cgroup_mutex) __acquires(&cgroup_threadgroup_rwsem)
+{
+	int ret;
+	struct cgroup *dst_cgrp = NULL;
+	struct css_set *cset;
+	struct super_block *sb;
+	struct file *f;
+
+	if (kargs->flags & CLONE_INTO_CGROUP)
+		mutex_lock(&cgroup_mutex);
+
+	cgroup_threadgroup_change_begin(parent);
+
+	spin_lock_irq(&css_set_lock);
+	cset = task_css_set(parent);
+	get_css_set(cset);
+	spin_unlock_irq(&css_set_lock);
+
+	if (!(kargs->flags & CLONE_INTO_CGROUP)) {
+		kargs->cset = cset;
+		return 0;
+	}
+
+	f = fget_raw(kargs->cgroup);
+	if (!f) {
+		ret = -EBADF;
+		goto err;
+	}
+	sb = f->f_path.dentry->d_sb;
+
+	dst_cgrp = cgroup_get_from_file(f);
+	if (IS_ERR(dst_cgrp)) {
+		ret = PTR_ERR(dst_cgrp);
+		dst_cgrp = NULL;
+		goto err;
+	}
+
+	/*
+	 * Verify that we can the target cgroup is writable for us. This is
+	 * usally done by the vfs layer but since we're not going through the
+	 * vfs layer here we need to do it.
+	 */
+	ret = cgroup_may_write(dst_cgrp, sb);
+	if (ret)
+		goto err;
+
+	ret = cgroup_attach_permissions(cset->dfl_cgrp, dst_cgrp, sb,
+					!!(kargs->flags & CLONE_THREAD));
+	if (ret)
+		goto err;
+
+	kargs->cset = find_css_set(cset, dst_cgrp);
+	if (!kargs->cset) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	if (cgroup_is_dead(dst_cgrp)) {
+		ret = -ENODEV;
+		goto err;
+	}
+
+	put_css_set(cset);
+	fput(f);
+	kargs->cgrp = dst_cgrp;
+	return ret;
+
+err:
+	cgroup_threadgroup_change_end(parent);
+	mutex_unlock(&cgroup_mutex);
+	if (f)
+		fput(f);
+	if (dst_cgrp)
+		cgroup_put(dst_cgrp);
+	put_css_set(cset);
+	return ret;
+}
+
+/**
+ * cgroup_css_set_put_fork - drop references we took during fork
+ * @parent: the parent of the child process
+ * @kargs: the arguments passed to create the child process
+ *
+ * Drop references to the prepared css_set and target cgroup if
+ * CLONE_INTO_CGROUP was requested. This function can only be
+ * called before fork()'s point of no return.
+ */
+static void cgroup_css_set_put_fork(struct task_struct *parent,
+				    struct kernel_clone_args *kargs)
+	__releases(&cgroup_threadgroup_rwsem) __releases(&cgroup_mutex)
+{
+	cgroup_threadgroup_change_end(parent);
+
+	if (kargs->flags & CLONE_INTO_CGROUP) {
+		struct cgroup *cgrp = kargs->cgrp;
+		struct css_set *cset = kargs->cset;
+
+		mutex_unlock(&cgroup_mutex);
+
+		if (cset) {
+			put_css_set(cset);
+			kargs->cset = NULL;
+		}
+
+		if (cgrp) {
+			cgroup_put(cgrp);
+			kargs->cgrp = NULL;
+		}
+	}
+}
+
 /**
  * cgroup_can_fork - called on a new task before the process is exposed
  * @parent: the parent process of @child
  * @child: the child process of @parent
  * @kargs: the arguments passed to create the child process
  *
+ * This prepares a new css_set for the child process which the child will
+ * be attached to in cgroup_post_fork().
  * This calls the subsystem can_fork() callbacks. If the cgroup_can_fork()
  * callback returns an error, the fork aborts with that error code. This
  * allows for a cgroup subsystem to conditionally allow or deny new forks.
  */
-int cgroup_can_fork(struct task_struct *parent, struct task_struct *child)
-	__acquires(&cgroup_threadgroup_rwsem) __releases(&cgroup_threadgroup_rwsem)
+int cgroup_can_fork(struct task_struct *parent, struct task_struct *child,
+			struct kernel_clone_args *kargs)
 {
 	struct cgroup_subsys *ss;
 	int i, j, ret;
 
-	cgroup_threadgroup_change_begin(parent);
+	ret = cgroup_css_set_fork(parent, kargs);
+	if (ret)
+		return ret;
 
 	do_each_subsys_mask(ss, i, have_canfork_callback) {
-		ret = ss->can_fork(child);
+		ret = ss->can_fork(parent, child, kargs->cset);
 		if (ret)
 			goto out_revert;
 	} while_each_subsys_mask();
@@ -5941,34 +6074,35 @@ int cgroup_can_fork(struct task_struct *parent, struct task_struct *child)
 		if (j >= i)
 			break;
 		if (ss->cancel_fork)
-			ss->cancel_fork(child);
+			ss->cancel_fork(child, kargs->cset);
 	}
 
-	cgroup_threadgroup_change_end(parent);
+	cgroup_css_set_put_fork(parent, kargs);
 
 	return ret;
 }
 
 /**
-  * cgroup_cancel_fork - called if a fork failed after cgroup_can_fork()
-  * @parent: the parent process of @child
-  * @child: the child process of @parent
-  * @kargs: the arguments passed to create the child process
-  *
-  * This calls the cancel_fork() callbacks if a fork failed *after*
-  * cgroup_can_fork() succeded.
-  */
-void cgroup_cancel_fork(struct task_struct *parent, struct task_struct *child)
-	__releases(&cgroup_threadgroup_rwsem)
+ * cgroup_cancel_fork - called if a fork failed after cgroup_can_fork()
+ * @parent: the parent process of @child
+ * @child: the child process of @parent
+ * @kargs: the arguments passed to create the child process
+ *
+ * This calls the cancel_fork() callbacks if a fork failed *after*
+ * cgroup_can_fork() succeded and cleans up references we took to
+ * prepare a new css_set for the child process in cgroup_can_fork().
+ */
+void cgroup_cancel_fork(struct task_struct *parent, struct task_struct *child,
+			struct kernel_clone_args *kargs)
 {
 	struct cgroup_subsys *ss;
 	int i;
 
 	for_each_subsys(ss, i)
 		if (ss->cancel_fork)
-			ss->cancel_fork(child);
+			ss->cancel_fork(child, kargs->cset);
 
-	cgroup_threadgroup_change_end(parent);
+	cgroup_css_set_put_fork(parent, kargs);
 }
 
 /**
@@ -5980,18 +6114,17 @@ void cgroup_cancel_fork(struct task_struct *parent, struct task_struct *child)
  * Attach the child process to its css_set calling the subsystem fork()
  * callbacks.
  */
-void cgroup_post_fork(struct task_struct *parent, struct task_struct *child)
-	__releases(&cgroup_threadgroup_rwsem)
+void cgroup_post_fork(struct task_struct *parent, struct task_struct *child,
+		      struct kernel_clone_args *kargs)
+	__releases(&cgroup_threadgroup_rwsem) __releases(&cgroup_mutex)
 {
 	struct cgroup_subsys *ss;
-	struct css_set *cset;
+	struct css_set *cset = kargs->cset;
 	int i;
 
 	spin_lock_irq(&css_set_lock);
 
 	WARN_ON_ONCE(!list_empty(&child->cg_list));
-	cset = task_css_set(current); /* current is @child's parent */
-	get_css_set(cset);
 	cset->nr_tasks++;
 	css_set_move_task(child, NULL, cset, false);
 
@@ -6026,6 +6159,17 @@ void cgroup_post_fork(struct task_struct *parent, struct task_struct *child)
 	} while_each_subsys_mask();
 
 	cgroup_threadgroup_change_end(parent);
+
+	if (kargs->flags & CLONE_INTO_CGROUP) {
+		mutex_unlock(&cgroup_mutex);
+
+		cgroup_put(kargs->cgrp);
+		kargs->cgrp = NULL;
+	}
+
+	/* Make the new cset the root_cset of the new cgroup namespace. */
+	if (kargs->flags & CLONE_NEWCGROUP)
+		child->nsproxy->cgroup_ns->root_cset = cset;
 }
 
 /**
diff --git a/kernel/cgroup/pids.c b/kernel/cgroup/pids.c
index 138059eb730d..4e7c8819c8df 100644
--- a/kernel/cgroup/pids.c
+++ b/kernel/cgroup/pids.c
@@ -33,6 +33,7 @@
 #include <linux/atomic.h>
 #include <linux/cgroup.h>
 #include <linux/slab.h>
+#include <linux/sched/task.h>
 
 #define PIDS_MAX (PID_MAX_LIMIT + 1ULL)
 #define PIDS_MAX_STR "max"
@@ -214,13 +215,17 @@ static void pids_cancel_attach(struct cgroup_taskset *tset)
  * task_css_check(true) in pids_can_fork() and pids_cancel_fork() relies
  * on cgroup_threadgroup_change_begin() held by the copy_process().
  */
-static int pids_can_fork(struct task_struct *task)
+static int pids_can_fork(struct task_struct *parent, struct task_struct *child,
+			 struct css_set *cset)
 {
 	struct cgroup_subsys_state *css;
 	struct pids_cgroup *pids;
 	int err;
 
-	css = task_css_check(current, pids_cgrp_id, true);
+	if (cset)
+		css = cset->subsys[pids_cgrp_id];
+	else
+		css = task_css_check(current, pids_cgrp_id, true);
 	pids = css_pids(css);
 	err = pids_try_charge(pids, 1);
 	if (err) {
@@ -235,12 +240,15 @@ static int pids_can_fork(struct task_struct *task)
 	return err;
 }
 
-static void pids_cancel_fork(struct task_struct *task)
+static void pids_cancel_fork(struct task_struct *task, struct css_set *cset)
 {
 	struct cgroup_subsys_state *css;
 	struct pids_cgroup *pids;
 
-	css = task_css_check(current, pids_cgrp_id, true);
+	if (cset)
+		css = cset->subsys[pids_cgrp_id];
+	else
+		css = task_css_check(current, pids_cgrp_id, true);
 	pids = css_pids(css);
 	pids_uncharge(pids, 1);
 }
diff --git a/kernel/fork.c b/kernel/fork.c
index c76758dbd594..15d6576ad8c0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2171,7 +2171,7 @@ static __latent_entropy struct task_struct *copy_process(
 	 * between here and cgroup_post_fork() if an organisation operation is in
 	 * progress.
 	 */
-	retval = cgroup_can_fork(current, p);
+	retval = cgroup_can_fork(current, p, args);
 	if (retval)
 		goto bad_fork_put_pidfd;
 
@@ -2278,7 +2278,7 @@ static __latent_entropy struct task_struct *copy_process(
 	write_unlock_irq(&tasklist_lock);
 
 	proc_fork_connector(p);
-	cgroup_post_fork(current, p);
+	cgroup_post_fork(current, p, args);
 	perf_event_fork(p);
 
 	trace_task_newtask(p, clone_flags);
@@ -2289,7 +2289,7 @@ static __latent_entropy struct task_struct *copy_process(
 bad_fork_cancel_cgroup:
 	spin_unlock(&current->sighand->siglock);
 	write_unlock_irq(&tasklist_lock);
-	cgroup_cancel_fork(current, p);
+	cgroup_cancel_fork(current, p, args);
 bad_fork_put_pidfd:
 	if (clone_flags & CLONE_PIDFD) {
 		fput(pidfile);
@@ -2618,6 +2618,9 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
 		     !valid_signal(args.exit_signal)))
 		return -EINVAL;
 
+	if ((args.flags & CLONE_INTO_CGROUP) && args.cgroup < 0)
+		return -EINVAL;
+
 	*kargs = (struct kernel_clone_args){
 		.flags		= args.flags,
 		.pidfd		= u64_to_user_ptr(args.pidfd),
@@ -2628,6 +2631,7 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
 		.stack_size	= args.stack_size,
 		.tls		= args.tls,
 		.set_tid_size	= args.set_tid_size,
+		.cgroup		= args.cgroup,
 	};
 
 	if (args.set_tid &&
@@ -2671,7 +2675,8 @@ static inline bool clone3_stack_valid(struct kernel_clone_args *kargs)
 static bool clone3_args_valid(struct kernel_clone_args *kargs)
 {
 	/* Verify that no unknown flags are passed along. */
-	if (kargs->flags & ~(CLONE_LEGACY_FLAGS | CLONE_CLEAR_SIGHAND))
+	if (kargs->flags &
+	    ~(CLONE_LEGACY_FLAGS | CLONE_CLEAR_SIGHAND | CLONE_INTO_CGROUP))
 		return false;
 
 	/*
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 6/6] selftests/cgroup: add tests for cloning into cgroups
  2020-01-17 18:12 ` Christian Brauner
                   ` (5 preceding siblings ...)
  (?)
@ 2020-01-17 18:12 ` Christian Brauner
  2020-01-17 20:24   ` Roman Gushchin
  -1 siblings, 1 reply; 27+ messages in thread
From: Christian Brauner @ 2020-01-17 18:12 UTC (permalink / raw)
  To: linux-api, linux-kernel, Tejun Heo
  Cc: Oleg Nesterov, Christian Brauner, Roman Gushchin, Shuah Khan,
	cgroups, linux-kselftest

Expand the cgroup test-suite to include tests for CLONE_INTO_CGROUP.
This adds the following tests:
- CLONE_INTO_CGROUP manages to clone a process directly into a correctly
  delegated cgroup
- CLONE_INTO_CGROUP fails to clone a process into a cgroup that has been
  removed after we've opened an fd to it
- CLONE_INTO_CGROUP fails to clone a process into an invalid domain
  cgroup
- CLONE_INTO_CGROUP adheres to the no internal process constraint
- CLONE_INTO_CGROUP works with the freezer feature

Cc: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: cgroups@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v1 */
Link: https://lore.kernel.org/r/20191218173516.7875-4-christian.brauner@ubuntu.com

/* v2 */
Link: https://lore.kernel.org/r/20191223061504.28716-4-christian.brauner@ubuntu.com
unchanged

/* v3 */
Link: https://lore.kernel.org/r/20200117002143.15559-6-christian.brauner@ubuntu.com
unchanged

/* v3 */
unchanged
---
 tools/testing/selftests/cgroup/Makefile       |   6 +-
 tools/testing/selftests/cgroup/cgroup_util.c  | 126 ++++++++++++++++++
 tools/testing/selftests/cgroup/cgroup_util.h  |   4 +
 tools/testing/selftests/cgroup/test_core.c    |  64 +++++++++
 .../selftests/clone3/clone3_selftests.h       |  19 ++-
 5 files changed, 214 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/cgroup/Makefile b/tools/testing/selftests/cgroup/Makefile
index 66aafe1f5746..967f268fde74 100644
--- a/tools/testing/selftests/cgroup/Makefile
+++ b/tools/testing/selftests/cgroup/Makefile
@@ -11,6 +11,6 @@ TEST_GEN_PROGS += test_freezer
 
 include ../lib.mk
 
-$(OUTPUT)/test_memcontrol: cgroup_util.c
-$(OUTPUT)/test_core: cgroup_util.c
-$(OUTPUT)/test_freezer: cgroup_util.c
+$(OUTPUT)/test_memcontrol: cgroup_util.c ../clone3/clone3_selftests.h
+$(OUTPUT)/test_core: cgroup_util.c ../clone3/clone3_selftests.h
+$(OUTPUT)/test_freezer: cgroup_util.c ../clone3/clone3_selftests.h
diff --git a/tools/testing/selftests/cgroup/cgroup_util.c b/tools/testing/selftests/cgroup/cgroup_util.c
index 8f7131dcf1ff..8a637ca7d73a 100644
--- a/tools/testing/selftests/cgroup/cgroup_util.c
+++ b/tools/testing/selftests/cgroup/cgroup_util.c
@@ -15,6 +15,7 @@
 #include <unistd.h>
 
 #include "cgroup_util.h"
+#include "../clone3/clone3_selftests.h"
 
 static ssize_t read_text(const char *path, char *buf, size_t max_len)
 {
@@ -331,12 +332,112 @@ int cg_run(const char *cgroup,
 	}
 }
 
+pid_t clone_into_cgroup(int cgroup_fd)
+{
+#ifdef CLONE_ARGS_SIZE_VER2
+	pid_t pid;
+
+	struct clone_args args = {
+		.flags = CLONE_INTO_CGROUP,
+		.exit_signal = SIGCHLD,
+		.cgroup = cgroup_fd,
+	};
+
+	pid = sys_clone3(&args, sizeof(struct clone_args));
+	/*
+	 * Verify that this is a genuine test failure:
+	 * ENOSYS -> clone3() not available
+	 * E2BIG  -> CLONE_INTO_CGROUP not available
+	 */
+	if (pid < 0 && (errno == ENOSYS || errno == E2BIG))
+		goto pretend_enosys;
+
+	return pid;
+
+pretend_enosys:
+#endif
+	errno = ENOSYS;
+	return -ENOSYS;
+}
+
+int clone_reap(pid_t pid, int options)
+{
+	int ret;
+	siginfo_t info = {
+		.si_signo = 0,
+	};
+
+again:
+	ret = waitid(P_PID, pid, &info, options | __WALL | __WNOTHREAD);
+	if (ret < 0) {
+		if (errno == EINTR)
+			goto again;
+		return -1;
+	}
+
+	if (options & WEXITED) {
+		if (WIFEXITED(info.si_status))
+			return WEXITSTATUS(info.si_status);
+	}
+
+	if (options & WSTOPPED) {
+		if (WIFSTOPPED(info.si_status))
+			return WSTOPSIG(info.si_status);
+	}
+
+	if (options & WCONTINUED) {
+		if (WIFCONTINUED(info.si_status))
+			return 0;
+	}
+
+	return -1;
+}
+
+int dirfd_open_opath(const char *dir)
+{
+	return open(dir, O_DIRECTORY | O_CLOEXEC | O_NOFOLLOW | O_PATH);
+}
+
+#define close_prot_errno(fd)                                                   \
+	if (fd >= 0) {                                                         \
+		int _e_ = errno;                                               \
+		close(fd);                                                     \
+		errno = _e_;                                                   \
+	}
+
+static int clone_into_cgroup_run_nowait(const char *cgroup,
+					int (*fn)(const char *cgroup, void *arg),
+					void *arg)
+{
+	int cgroup_fd;
+	pid_t pid;
+
+	cgroup_fd =  dirfd_open_opath(cgroup);
+	if (cgroup_fd < 0)
+		return -1;
+
+	pid = clone_into_cgroup(cgroup_fd);
+	close_prot_errno(cgroup_fd);
+	if (pid == 0)
+		exit(fn(cgroup, arg));
+
+	return pid;
+}
+
 int cg_run_nowait(const char *cgroup,
 		  int (*fn)(const char *cgroup, void *arg),
 		  void *arg)
 {
 	int pid;
 
+	pid = clone_into_cgroup_run_nowait(cgroup, fn, arg);
+	if (pid > 0)
+		return pid;
+
+	/* Genuine test failure. */
+	if (pid < 0 && errno != ENOSYS)
+		return -1;
+
 	pid = fork();
 	if (pid == 0) {
 		char buf[64];
@@ -450,3 +551,28 @@ int proc_read_strstr(int pid, bool thread, const char *item, const char *needle)
 
 	return strstr(buf, needle) ? 0 : -1;
 }
+
+int clone_into_cgroup_run_wait(const char *cgroup)
+{
+	int cgroup_fd;
+	pid_t pid;
+
+	cgroup_fd =  dirfd_open_opath(cgroup);
+	if (cgroup_fd < 0)
+		return -1;
+
+	pid = clone_into_cgroup(cgroup_fd);
+	close_prot_errno(cgroup_fd);
+	if (pid < 0)
+		return -1;
+
+	if (pid == 0)
+		exit(EXIT_SUCCESS);
+
+	/*
+	 * We don't care whether this fails. We only care whether the initial
+	 * clone succeeded.
+	 */
+	(void)clone_reap(pid, WEXITED);
+	return 0;
+}
diff --git a/tools/testing/selftests/cgroup/cgroup_util.h b/tools/testing/selftests/cgroup/cgroup_util.h
index 49c54fbdb229..5a1305dd1f0b 100644
--- a/tools/testing/selftests/cgroup/cgroup_util.h
+++ b/tools/testing/selftests/cgroup/cgroup_util.h
@@ -50,3 +50,7 @@ extern int cg_wait_for_proc_count(const char *cgroup, int count);
 extern int cg_killall(const char *cgroup);
 extern ssize_t proc_read_text(int pid, bool thread, const char *item, char *buf, size_t size);
 extern int proc_read_strstr(int pid, bool thread, const char *item, const char *needle);
+extern pid_t clone_into_cgroup(int cgroup_fd);
+extern int clone_reap(pid_t pid, int options);
+extern int clone_into_cgroup_run_wait(const char *cgroup);
+extern int dirfd_open_opath(const char *dir);
diff --git a/tools/testing/selftests/cgroup/test_core.c b/tools/testing/selftests/cgroup/test_core.c
index c5ca669feb2b..96e016ccafe0 100644
--- a/tools/testing/selftests/cgroup/test_core.c
+++ b/tools/testing/selftests/cgroup/test_core.c
@@ -25,8 +25,11 @@
 static int test_cgcore_populated(const char *root)
 {
 	int ret = KSFT_FAIL;
+	int err;
 	char *cg_test_a = NULL, *cg_test_b = NULL;
 	char *cg_test_c = NULL, *cg_test_d = NULL;
+	int cgroup_fd = -EBADF;
+	pid_t pid;
 
 	cg_test_a = cg_name(root, "cg_test_a");
 	cg_test_b = cg_name(root, "cg_test_a/cg_test_b");
@@ -78,6 +81,52 @@ static int test_cgcore_populated(const char *root)
 	if (cg_read_strcmp(cg_test_d, "cgroup.events", "populated 0\n"))
 		goto cleanup;
 
+	/* Test that we can directly clone into a new cgroup. */
+	cgroup_fd = dirfd_open_opath(cg_test_d);
+	if (cgroup_fd < 0)
+		goto cleanup;
+
+	pid = clone_into_cgroup(cgroup_fd);
+	if (pid < 0) {
+		if (errno == ENOSYS)
+			goto cleanup_pass;
+		goto cleanup;
+	}
+
+	if (pid == 0) {
+		if (raise(SIGSTOP))
+			exit(EXIT_FAILURE);
+		exit(EXIT_SUCCESS);
+	}
+
+	err = cg_read_strcmp(cg_test_d, "cgroup.events", "populated 1\n");
+
+	(void)clone_reap(pid, WSTOPPED);
+	(void)kill(pid, SIGCONT);
+	(void)clone_reap(pid, WEXITED);
+
+	if (err)
+		goto cleanup;
+
+	if (cg_read_strcmp(cg_test_d, "cgroup.events", "populated 0\n"))
+		goto cleanup;
+
+	/* Remove cgroup. */
+	if (cg_test_d) {
+		cg_destroy(cg_test_d);
+		free(cg_test_d);
+		cg_test_d = NULL;
+	}
+
+	pid = clone_into_cgroup(cgroup_fd);
+	if (pid < 0)
+		goto cleanup_pass;
+	if (pid == 0)
+		exit(EXIT_SUCCESS);
+	(void)clone_reap(pid, WEXITED);
+	goto cleanup;
+
+cleanup_pass:
 	ret = KSFT_PASS;
 
 cleanup:
@@ -93,6 +142,8 @@ static int test_cgcore_populated(const char *root)
 	free(cg_test_c);
 	free(cg_test_b);
 	free(cg_test_a);
+	if (cgroup_fd >= 0)
+		close(cgroup_fd);
 	return ret;
 }
 
@@ -136,6 +187,16 @@ static int test_cgcore_invalid_domain(const char *root)
 	if (errno != EOPNOTSUPP)
 		goto cleanup;
 
+	if (!clone_into_cgroup_run_wait(child))
+		goto cleanup;
+
+	if (errno == ENOSYS)
+		goto cleanup_pass;
+
+	if (errno != EOPNOTSUPP)
+		goto cleanup;
+
+cleanup_pass:
 	ret = KSFT_PASS;
 
 cleanup:
@@ -345,6 +406,9 @@ static int test_cgcore_internal_process_constraint(const char *root)
 	if (!cg_enter_current(parent))
 		goto cleanup;
 
+	if (!clone_into_cgroup_run_wait(parent))
+		goto cleanup;
+
 	ret = KSFT_PASS;
 
 cleanup:
diff --git a/tools/testing/selftests/clone3/clone3_selftests.h b/tools/testing/selftests/clone3/clone3_selftests.h
index a3f2c8ad8bcc..91c1a78ddb39 100644
--- a/tools/testing/selftests/clone3/clone3_selftests.h
+++ b/tools/testing/selftests/clone3/clone3_selftests.h
@@ -5,12 +5,24 @@
 
 #define _GNU_SOURCE
 #include <sched.h>
+#include <linux/sched.h>
+#include <linux/types.h>
 #include <stdint.h>
 #include <syscall.h>
-#include <linux/types.h>
+#include <sys/wait.h>
+
+#include "../kselftest.h"
 
 #define ptr_to_u64(ptr) ((__u64)((uintptr_t)(ptr)))
 
+#ifndef CLONE_INTO_CGROUP
+#define CLONE_INTO_CGROUP 0x200000000ULL /* Clone into a specific cgroup given the right permissions. */
+#endif
+
+#ifndef CLONE_ARGS_SIZE_VER0
+#define CLONE_ARGS_SIZE_VER0 64
+#endif
+
 #ifndef __NR_clone3
 #define __NR_clone3 -1
 struct clone_args {
@@ -22,10 +34,13 @@ struct clone_args {
 	__aligned_u64 stack;
 	__aligned_u64 stack_size;
 	__aligned_u64 tls;
+#define CLONE_ARGS_SIZE_VER1 80
 	__aligned_u64 set_tid;
 	__aligned_u64 set_tid_size;
+#define CLONE_ARGS_SIZE_VER2 88
+	__aligned_u64 cgroup;
 };
-#endif
+#endif /* __NR_clone3 */
 
 static pid_t sys_clone3(struct clone_args *args, size_t size)
 {
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 6/6] selftests/cgroup: add tests for cloning into cgroups
  2020-01-17 18:12 ` [PATCH v4 6/6] selftests/cgroup: add tests for cloning " Christian Brauner
@ 2020-01-17 20:24   ` Roman Gushchin
  0 siblings, 0 replies; 27+ messages in thread
From: Roman Gushchin @ 2020-01-17 20:24 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-api, linux-kernel, Tejun Heo, Oleg Nesterov, Shuah Khan,
	cgroups, linux-kselftest

On Fri, Jan 17, 2020 at 07:12:19PM +0100, Christian Brauner wrote:
> Expand the cgroup test-suite to include tests for CLONE_INTO_CGROUP.
> This adds the following tests:
> - CLONE_INTO_CGROUP manages to clone a process directly into a correctly
>   delegated cgroup
> - CLONE_INTO_CGROUP fails to clone a process into a cgroup that has been
>   removed after we've opened an fd to it
> - CLONE_INTO_CGROUP fails to clone a process into an invalid domain
>   cgroup
> - CLONE_INTO_CGROUP adheres to the no internal process constraint
> - CLONE_INTO_CGROUP works with the freezer feature
> 
> Cc: Roman Gushchin <guro@fb.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Shuah Khan <shuah@kernel.org>
> Cc: cgroups@vger.kernel.org
> Cc: linux-kselftest@vger.kernel.org
> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

Acked-by: Roman Gushchin <guro@fb.com>

Thank you for adding tests!


> ---
> /* v1 */
> Link: https://lore.kernel.org/r/20191218173516.7875-4-christian.brauner@ubuntu.com
> 
> /* v2 */
> Link: https://lore.kernel.org/r/20191223061504.28716-4-christian.brauner@ubuntu.com
> unchanged
> 
> /* v3 */
> Link: https://lore.kernel.org/r/20200117002143.15559-6-christian.brauner@ubuntu.com
> unchanged
> 
> /* v3 */

v4?

> unchanged
> ---
>  tools/testing/selftests/cgroup/Makefile       |   6 +-
>  tools/testing/selftests/cgroup/cgroup_util.c  | 126 ++++++++++++++++++
>  tools/testing/selftests/cgroup/cgroup_util.h  |   4 +
>  tools/testing/selftests/cgroup/test_core.c    |  64 +++++++++
>  .../selftests/clone3/clone3_selftests.h       |  19 ++-
>  5 files changed, 214 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/testing/selftests/cgroup/Makefile b/tools/testing/selftests/cgroup/Makefile
> index 66aafe1f5746..967f268fde74 100644
> --- a/tools/testing/selftests/cgroup/Makefile
> +++ b/tools/testing/selftests/cgroup/Makefile
> @@ -11,6 +11,6 @@ TEST_GEN_PROGS += test_freezer
>  
>  include ../lib.mk
>  
> -$(OUTPUT)/test_memcontrol: cgroup_util.c
> -$(OUTPUT)/test_core: cgroup_util.c
> -$(OUTPUT)/test_freezer: cgroup_util.c
> +$(OUTPUT)/test_memcontrol: cgroup_util.c ../clone3/clone3_selftests.h
> +$(OUTPUT)/test_core: cgroup_util.c ../clone3/clone3_selftests.h
> +$(OUTPUT)/test_freezer: cgroup_util.c ../clone3/clone3_selftests.h
> diff --git a/tools/testing/selftests/cgroup/cgroup_util.c b/tools/testing/selftests/cgroup/cgroup_util.c
> index 8f7131dcf1ff..8a637ca7d73a 100644
> --- a/tools/testing/selftests/cgroup/cgroup_util.c
> +++ b/tools/testing/selftests/cgroup/cgroup_util.c
> @@ -15,6 +15,7 @@
>  #include <unistd.h>
>  
>  #include "cgroup_util.h"
> +#include "../clone3/clone3_selftests.h"
>  
>  static ssize_t read_text(const char *path, char *buf, size_t max_len)
>  {
> @@ -331,12 +332,112 @@ int cg_run(const char *cgroup,
>  	}
>  }
>  
> +pid_t clone_into_cgroup(int cgroup_fd)
> +{
> +#ifdef CLONE_ARGS_SIZE_VER2
> +	pid_t pid;
> +
> +	struct clone_args args = {
> +		.flags = CLONE_INTO_CGROUP,
> +		.exit_signal = SIGCHLD,
> +		.cgroup = cgroup_fd,
> +	};
> +
> +	pid = sys_clone3(&args, sizeof(struct clone_args));
> +	/*
> +	 * Verify that this is a genuine test failure:
> +	 * ENOSYS -> clone3() not available
> +	 * E2BIG  -> CLONE_INTO_CGROUP not available
> +	 */
> +	if (pid < 0 && (errno == ENOSYS || errno == E2BIG))
> +		goto pretend_enosys;
> +
> +	return pid;
> +
> +pretend_enosys:
> +#endif
> +	errno = ENOSYS;
> +	return -ENOSYS;
> +}
> +
> +int clone_reap(pid_t pid, int options)
> +{
> +	int ret;
> +	siginfo_t info = {
> +		.si_signo = 0,
> +	};
> +
> +again:
> +	ret = waitid(P_PID, pid, &info, options | __WALL | __WNOTHREAD);
> +	if (ret < 0) {
> +		if (errno == EINTR)
> +			goto again;
> +		return -1;
> +	}
> +
> +	if (options & WEXITED) {
> +		if (WIFEXITED(info.si_status))
> +			return WEXITSTATUS(info.si_status);
> +	}
> +
> +	if (options & WSTOPPED) {
> +		if (WIFSTOPPED(info.si_status))
> +			return WSTOPSIG(info.si_status);
> +	}
> +
> +	if (options & WCONTINUED) {
> +		if (WIFCONTINUED(info.si_status))
> +			return 0;
> +	}
> +
> +	return -1;
> +}
> +
> +int dirfd_open_opath(const char *dir)
> +{
> +	return open(dir, O_DIRECTORY | O_CLOEXEC | O_NOFOLLOW | O_PATH);
> +}
> +
> +#define close_prot_errno(fd)                                                   \
> +	if (fd >= 0) {                                                         \
> +		int _e_ = errno;                                               \
> +		close(fd);                                                     \
> +		errno = _e_;                                                   \
> +	}
> +
> +static int clone_into_cgroup_run_nowait(const char *cgroup,
> +					int (*fn)(const char *cgroup, void *arg),
> +					void *arg)
> +{
> +	int cgroup_fd;
> +	pid_t pid;
> +
> +	cgroup_fd =  dirfd_open_opath(cgroup);
> +	if (cgroup_fd < 0)
> +		return -1;
> +
> +	pid = clone_into_cgroup(cgroup_fd);
> +	close_prot_errno(cgroup_fd);
> +	if (pid == 0)
> +		exit(fn(cgroup, arg));
> +
> +	return pid;
> +}
> +
>  int cg_run_nowait(const char *cgroup,
>  		  int (*fn)(const char *cgroup, void *arg),
>  		  void *arg)
>  {
>  	int pid;
>  
> +	pid = clone_into_cgroup_run_nowait(cgroup, fn, arg);
> +	if (pid > 0)
> +		return pid;
> +
> +	/* Genuine test failure. */
> +	if (pid < 0 && errno != ENOSYS)
> +		return -1;
> +
>  	pid = fork();
>  	if (pid == 0) {
>  		char buf[64];
> @@ -450,3 +551,28 @@ int proc_read_strstr(int pid, bool thread, const char *item, const char *needle)
>  
>  	return strstr(buf, needle) ? 0 : -1;
>  }
> +
> +int clone_into_cgroup_run_wait(const char *cgroup)
> +{
> +	int cgroup_fd;
> +	pid_t pid;
> +
> +	cgroup_fd =  dirfd_open_opath(cgroup);
> +	if (cgroup_fd < 0)
> +		return -1;
> +
> +	pid = clone_into_cgroup(cgroup_fd);
> +	close_prot_errno(cgroup_fd);
> +	if (pid < 0)
> +		return -1;
> +
> +	if (pid == 0)
> +		exit(EXIT_SUCCESS);
> +
> +	/*
> +	 * We don't care whether this fails. We only care whether the initial
> +	 * clone succeeded.
> +	 */
> +	(void)clone_reap(pid, WEXITED);
> +	return 0;
> +}
> diff --git a/tools/testing/selftests/cgroup/cgroup_util.h b/tools/testing/selftests/cgroup/cgroup_util.h
> index 49c54fbdb229..5a1305dd1f0b 100644
> --- a/tools/testing/selftests/cgroup/cgroup_util.h
> +++ b/tools/testing/selftests/cgroup/cgroup_util.h
> @@ -50,3 +50,7 @@ extern int cg_wait_for_proc_count(const char *cgroup, int count);
>  extern int cg_killall(const char *cgroup);
>  extern ssize_t proc_read_text(int pid, bool thread, const char *item, char *buf, size_t size);
>  extern int proc_read_strstr(int pid, bool thread, const char *item, const char *needle);
> +extern pid_t clone_into_cgroup(int cgroup_fd);
> +extern int clone_reap(pid_t pid, int options);
> +extern int clone_into_cgroup_run_wait(const char *cgroup);
> +extern int dirfd_open_opath(const char *dir);
> diff --git a/tools/testing/selftests/cgroup/test_core.c b/tools/testing/selftests/cgroup/test_core.c
> index c5ca669feb2b..96e016ccafe0 100644
> --- a/tools/testing/selftests/cgroup/test_core.c
> +++ b/tools/testing/selftests/cgroup/test_core.c
> @@ -25,8 +25,11 @@
>  static int test_cgcore_populated(const char *root)
>  {
>  	int ret = KSFT_FAIL;
> +	int err;
>  	char *cg_test_a = NULL, *cg_test_b = NULL;
>  	char *cg_test_c = NULL, *cg_test_d = NULL;
> +	int cgroup_fd = -EBADF;
> +	pid_t pid;
>  
>  	cg_test_a = cg_name(root, "cg_test_a");
>  	cg_test_b = cg_name(root, "cg_test_a/cg_test_b");
> @@ -78,6 +81,52 @@ static int test_cgcore_populated(const char *root)
>  	if (cg_read_strcmp(cg_test_d, "cgroup.events", "populated 0\n"))
>  		goto cleanup;
>  
> +	/* Test that we can directly clone into a new cgroup. */
> +	cgroup_fd = dirfd_open_opath(cg_test_d);
> +	if (cgroup_fd < 0)
> +		goto cleanup;
> +
> +	pid = clone_into_cgroup(cgroup_fd);
> +	if (pid < 0) {
> +		if (errno == ENOSYS)
> +			goto cleanup_pass;
> +		goto cleanup;
> +	}
> +
> +	if (pid == 0) {
> +		if (raise(SIGSTOP))
> +			exit(EXIT_FAILURE);
> +		exit(EXIT_SUCCESS);
> +	}
> +
> +	err = cg_read_strcmp(cg_test_d, "cgroup.events", "populated 1\n");
> +
> +	(void)clone_reap(pid, WSTOPPED);
> +	(void)kill(pid, SIGCONT);
> +	(void)clone_reap(pid, WEXITED);
> +
> +	if (err)
> +		goto cleanup;
> +
> +	if (cg_read_strcmp(cg_test_d, "cgroup.events", "populated 0\n"))
> +		goto cleanup;
> +
> +	/* Remove cgroup. */
> +	if (cg_test_d) {
> +		cg_destroy(cg_test_d);
> +		free(cg_test_d);
> +		cg_test_d = NULL;
> +	}
> +
> +	pid = clone_into_cgroup(cgroup_fd);
> +	if (pid < 0)
> +		goto cleanup_pass;
> +	if (pid == 0)
> +		exit(EXIT_SUCCESS);
> +	(void)clone_reap(pid, WEXITED);
> +	goto cleanup;
> +
> +cleanup_pass:
>  	ret = KSFT_PASS;
>  
>  cleanup:
> @@ -93,6 +142,8 @@ static int test_cgcore_populated(const char *root)
>  	free(cg_test_c);
>  	free(cg_test_b);
>  	free(cg_test_a);
> +	if (cgroup_fd >= 0)
> +		close(cgroup_fd);
>  	return ret;
>  }
>  
> @@ -136,6 +187,16 @@ static int test_cgcore_invalid_domain(const char *root)
>  	if (errno != EOPNOTSUPP)
>  		goto cleanup;
>  
> +	if (!clone_into_cgroup_run_wait(child))
> +		goto cleanup;
> +
> +	if (errno == ENOSYS)
> +		goto cleanup_pass;
> +
> +	if (errno != EOPNOTSUPP)
> +		goto cleanup;
> +
> +cleanup_pass:
>  	ret = KSFT_PASS;
>  
>  cleanup:
> @@ -345,6 +406,9 @@ static int test_cgcore_internal_process_constraint(const char *root)
>  	if (!cg_enter_current(parent))
>  		goto cleanup;
>  
> +	if (!clone_into_cgroup_run_wait(parent))
> +		goto cleanup;
> +
>  	ret = KSFT_PASS;
>  
>  cleanup:
> diff --git a/tools/testing/selftests/clone3/clone3_selftests.h b/tools/testing/selftests/clone3/clone3_selftests.h
> index a3f2c8ad8bcc..91c1a78ddb39 100644
> --- a/tools/testing/selftests/clone3/clone3_selftests.h
> +++ b/tools/testing/selftests/clone3/clone3_selftests.h
> @@ -5,12 +5,24 @@
>  
>  #define _GNU_SOURCE
>  #include <sched.h>
> +#include <linux/sched.h>
> +#include <linux/types.h>
>  #include <stdint.h>
>  #include <syscall.h>
> -#include <linux/types.h>
> +#include <sys/wait.h>
> +
> +#include "../kselftest.h"
>  
>  #define ptr_to_u64(ptr) ((__u64)((uintptr_t)(ptr)))
>  
> +#ifndef CLONE_INTO_CGROUP
> +#define CLONE_INTO_CGROUP 0x200000000ULL /* Clone into a specific cgroup given the right permissions. */
> +#endif
> +
> +#ifndef CLONE_ARGS_SIZE_VER0
> +#define CLONE_ARGS_SIZE_VER0 64
> +#endif
> +
>  #ifndef __NR_clone3
>  #define __NR_clone3 -1
>  struct clone_args {
> @@ -22,10 +34,13 @@ struct clone_args {
>  	__aligned_u64 stack;
>  	__aligned_u64 stack_size;
>  	__aligned_u64 tls;
> +#define CLONE_ARGS_SIZE_VER1 80
>  	__aligned_u64 set_tid;
>  	__aligned_u64 set_tid_size;
> +#define CLONE_ARGS_SIZE_VER2 88
> +	__aligned_u64 cgroup;
>  };
> -#endif
> +#endif /* __NR_clone3 */
>  
>  static pid_t sys_clone3(struct clone_args *args, size_t size)
>  {
> -- 
> 2.25.0
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 3/6] cgroup: refactor fork helpers
  2020-01-17 18:12   ` Christian Brauner
  (?)
@ 2020-01-20 14:00   ` Oleg Nesterov
  2020-01-20 14:04     ` Christian Brauner
  -1 siblings, 1 reply; 27+ messages in thread
From: Oleg Nesterov @ 2020-01-20 14:00 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-api, linux-kernel, Tejun Heo, Johannes Weiner, Li Zefan, cgroups

This is probably the only patch in series I can understand ;)

To me it looks like a good cleanup regardless, but

On 01/17, Christian Brauner wrote:
>
> The patch just passes in the parent task_struct

For what? "parent" is always "current", no?

Oleg.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 3/6] cgroup: refactor fork helpers
  2020-01-20 14:00   ` Oleg Nesterov
@ 2020-01-20 14:04     ` Christian Brauner
  2020-01-20 14:22       ` Oleg Nesterov
  0 siblings, 1 reply; 27+ messages in thread
From: Christian Brauner @ 2020-01-20 14:04 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-api, linux-kernel, Tejun Heo, Johannes Weiner, Li Zefan, cgroups

On Mon, Jan 20, 2020 at 03:00:30PM +0100, Oleg Nesterov wrote:
> This is probably the only patch in series I can understand ;)
> 
> To me it looks like a good cleanup regardless, but
> 
> On 01/17, Christian Brauner wrote:
> >
> > The patch just passes in the parent task_struct
> 
> For what? "parent" is always "current", no?

Yes. What exactly are you hinting at? :) Would you prefer that the
commit message speaks of "current" instead of "parent"?

Christian

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 3/6] cgroup: refactor fork helpers
  2020-01-20 14:04     ` Christian Brauner
@ 2020-01-20 14:22       ` Oleg Nesterov
  2020-01-20 14:33           ` Christian Brauner
  0 siblings, 1 reply; 27+ messages in thread
From: Oleg Nesterov @ 2020-01-20 14:22 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-api, linux-kernel, Tejun Heo, Johannes Weiner, Li Zefan, cgroups

On 01/20, Christian Brauner wrote:
>
> On Mon, Jan 20, 2020 at 03:00:30PM +0100, Oleg Nesterov wrote:
> > This is probably the only patch in series I can understand ;)
> >
> > To me it looks like a good cleanup regardless, but
> >
> > On 01/17, Christian Brauner wrote:
> > >
> > > The patch just passes in the parent task_struct
> >
> > For what? "parent" is always "current", no?
>
> Yes. What exactly are you hinting at? :) Would you prefer that the
> commit message speaks of "current" instead of "parent"?

I meant, I don't understand why did you add the new "parent" arg,
cgroup_xxx_fork() can simply use "current" ?

Oleg.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 3/6] cgroup: refactor fork helpers
@ 2020-01-20 14:33           ` Christian Brauner
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Brauner @ 2020-01-20 14:33 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-api, linux-kernel, Tejun Heo, Johannes Weiner, Li Zefan, cgroups

On Mon, Jan 20, 2020 at 03:22:03PM +0100, Oleg Nesterov wrote:
> On 01/20, Christian Brauner wrote:
> >
> > On Mon, Jan 20, 2020 at 03:00:30PM +0100, Oleg Nesterov wrote:
> > > This is probably the only patch in series I can understand ;)
> > >
> > > To me it looks like a good cleanup regardless, but
> > >
> > > On 01/17, Christian Brauner wrote:
> > > >
> > > > The patch just passes in the parent task_struct
> > >
> > > For what? "parent" is always "current", no?
> >
> > Yes. What exactly are you hinting at? :) Would you prefer that the
> > commit message speaks of "current" instead of "parent"?
> 
> I meant, I don't understand why did you add the new "parent" arg,
> cgroup_xxx_fork() can simply use "current" ?

There's no specific reason behind it. I don't care much whether it's an
arg or we just grab current in each helper directly.

Christian

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 3/6] cgroup: refactor fork helpers
@ 2020-01-20 14:33           ` Christian Brauner
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Brauner @ 2020-01-20 14:33 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo, Johannes Weiner,
	Li Zefan, cgroups-u79uwXL29TY76Z2rM5mHXA

On Mon, Jan 20, 2020 at 03:22:03PM +0100, Oleg Nesterov wrote:
> On 01/20, Christian Brauner wrote:
> >
> > On Mon, Jan 20, 2020 at 03:00:30PM +0100, Oleg Nesterov wrote:
> > > This is probably the only patch in series I can understand ;)
> > >
> > > To me it looks like a good cleanup regardless, but
> > >
> > > On 01/17, Christian Brauner wrote:
> > > >
> > > > The patch just passes in the parent task_struct
> > >
> > > For what? "parent" is always "current", no?
> >
> > Yes. What exactly are you hinting at? :) Would you prefer that the
> > commit message speaks of "current" instead of "parent"?
> 
> I meant, I don't understand why did you add the new "parent" arg,
> cgroup_xxx_fork() can simply use "current" ?

There's no specific reason behind it. I don't care much whether it's an
arg or we just grab current in each helper directly.

Christian

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 1/6] cgroup: unify attach permission checking
@ 2020-01-20 14:42     ` Oleg Nesterov
  0 siblings, 0 replies; 27+ messages in thread
From: Oleg Nesterov @ 2020-01-20 14:42 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-api, linux-kernel, Tejun Heo, Li Zefan, Johannes Weiner, cgroups

I guess I am totally confused, but...

On 01/17, Christian Brauner wrote:
>
> +static inline bool cgroup_same_domain(const struct cgroup *src_cgrp,
> +				      const struct cgroup *dst_cgrp)
> +{
> +	return src_cgrp->dom_cgrp == dst_cgrp->dom_cgrp;
> +}
> +
> +static int cgroup_attach_permissions(struct cgroup *src_cgrp,
> +				     struct cgroup *dst_cgrp,
> +				     struct super_block *sb, bool thread)
> +{
> +	int ret = 0;
> +
> +	ret = cgroup_procs_write_permission(src_cgrp, dst_cgrp, sb);
> +	if (ret)
> +		return ret;
> +
> +	ret = cgroup_migrate_vet_dst(dst_cgrp);
> +	if (ret)
> +		return ret;
> +
> +	if (thread &&
> +	    !cgroup_same_domain(src_cgrp->dom_cgrp, dst_cgrp->dom_cgrp))
                                        ^^^^^^^^^^          ^^^^^^^^^^

             cgroup_same_domain(src_cgrp, dst_cgrp)

no?

And given that cgroup_same_domain() has no other users, perhaps it can
simply check

	     src_cgrp->dom_cgrp != dst_cgrp->dom_cgrp

?

Oleg.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 1/6] cgroup: unify attach permission checking
@ 2020-01-20 14:42     ` Oleg Nesterov
  0 siblings, 0 replies; 27+ messages in thread
From: Oleg Nesterov @ 2020-01-20 14:42 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo, Li Zefan,
	Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA

I guess I am totally confused, but...

On 01/17, Christian Brauner wrote:
>
> +static inline bool cgroup_same_domain(const struct cgroup *src_cgrp,
> +				      const struct cgroup *dst_cgrp)
> +{
> +	return src_cgrp->dom_cgrp == dst_cgrp->dom_cgrp;
> +}
> +
> +static int cgroup_attach_permissions(struct cgroup *src_cgrp,
> +				     struct cgroup *dst_cgrp,
> +				     struct super_block *sb, bool thread)
> +{
> +	int ret = 0;
> +
> +	ret = cgroup_procs_write_permission(src_cgrp, dst_cgrp, sb);
> +	if (ret)
> +		return ret;
> +
> +	ret = cgroup_migrate_vet_dst(dst_cgrp);
> +	if (ret)
> +		return ret;
> +
> +	if (thread &&
> +	    !cgroup_same_domain(src_cgrp->dom_cgrp, dst_cgrp->dom_cgrp))
                                        ^^^^^^^^^^          ^^^^^^^^^^

             cgroup_same_domain(src_cgrp, dst_cgrp)

no?

And given that cgroup_same_domain() has no other users, perhaps it can
simply check

	     src_cgrp->dom_cgrp != dst_cgrp->dom_cgrp

?

Oleg.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 1/6] cgroup: unify attach permission checking
@ 2020-01-20 14:46       ` Christian Brauner
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Brauner @ 2020-01-20 14:46 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-api, linux-kernel, Tejun Heo, Li Zefan, Johannes Weiner, cgroups

On Mon, Jan 20, 2020 at 03:42:45PM +0100, Oleg Nesterov wrote:
> I guess I am totally confused, but...
> 
> On 01/17, Christian Brauner wrote:
> >
> > +static inline bool cgroup_same_domain(const struct cgroup *src_cgrp,
> > +				      const struct cgroup *dst_cgrp)
> > +{
> > +	return src_cgrp->dom_cgrp == dst_cgrp->dom_cgrp;
> > +}
> > +
> > +static int cgroup_attach_permissions(struct cgroup *src_cgrp,
> > +				     struct cgroup *dst_cgrp,
> > +				     struct super_block *sb, bool thread)
> > +{
> > +	int ret = 0;
> > +
> > +	ret = cgroup_procs_write_permission(src_cgrp, dst_cgrp, sb);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = cgroup_migrate_vet_dst(dst_cgrp);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (thread &&
> > +	    !cgroup_same_domain(src_cgrp->dom_cgrp, dst_cgrp->dom_cgrp))
>                                         ^^^^^^^^^^          ^^^^^^^^^^
> 
>              cgroup_same_domain(src_cgrp, dst_cgrp)
> 
> no?
> 
> And given that cgroup_same_domain() has no other users, perhaps it can
> simply check
> 
> 	     src_cgrp->dom_cgrp != dst_cgrp->dom_cgrp

Yeah, I just added it because the helper is very descriptive given its
name. Maybe too descriptive given my braino.
I'll just remove it in favor of this check and give it a small comment.

Thanks!
Christian

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 1/6] cgroup: unify attach permission checking
@ 2020-01-20 14:46       ` Christian Brauner
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Brauner @ 2020-01-20 14:46 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo, Li Zefan,
	Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA

On Mon, Jan 20, 2020 at 03:42:45PM +0100, Oleg Nesterov wrote:
> I guess I am totally confused, but...
> 
> On 01/17, Christian Brauner wrote:
> >
> > +static inline bool cgroup_same_domain(const struct cgroup *src_cgrp,
> > +				      const struct cgroup *dst_cgrp)
> > +{
> > +	return src_cgrp->dom_cgrp == dst_cgrp->dom_cgrp;
> > +}
> > +
> > +static int cgroup_attach_permissions(struct cgroup *src_cgrp,
> > +				     struct cgroup *dst_cgrp,
> > +				     struct super_block *sb, bool thread)
> > +{
> > +	int ret = 0;
> > +
> > +	ret = cgroup_procs_write_permission(src_cgrp, dst_cgrp, sb);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = cgroup_migrate_vet_dst(dst_cgrp);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (thread &&
> > +	    !cgroup_same_domain(src_cgrp->dom_cgrp, dst_cgrp->dom_cgrp))
>                                         ^^^^^^^^^^          ^^^^^^^^^^
> 
>              cgroup_same_domain(src_cgrp, dst_cgrp)
> 
> no?
> 
> And given that cgroup_same_domain() has no other users, perhaps it can
> simply check
> 
> 	     src_cgrp->dom_cgrp != dst_cgrp->dom_cgrp

Yeah, I just added it because the helper is very descriptive given its
name. Maybe too descriptive given my braino.
I'll just remove it in favor of this check and give it a small comment.

Thanks!
Christian

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 5/6] clone3: allow spawning processes into cgroups
@ 2020-01-20 15:39     ` Oleg Nesterov
  0 siblings, 0 replies; 27+ messages in thread
From: Oleg Nesterov @ 2020-01-20 15:39 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-api, linux-kernel, Tejun Heo, Ingo Molnar, Johannes Weiner,
	Li Zefan, Peter Zijlstra, cgroups

On 01/17, Christian Brauner wrote:
>
> +static int cgroup_css_set_fork(struct task_struct *parent,
> +			       struct kernel_clone_args *kargs)
...
> +	kargs->cset = find_css_set(cset, dst_cgrp);
> +	if (!kargs->cset) {
> +		ret = -ENOMEM;
> +		goto err;
> +	}
> +
> +	if (cgroup_is_dead(dst_cgrp)) {
> +		ret = -ENODEV;
> +		goto err;
                ^^^^^^^^

this looks wrong... don't we need put_css_set(kargs->cset) before "goto err" ?

Oleg.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 5/6] clone3: allow spawning processes into cgroups
@ 2020-01-20 15:39     ` Oleg Nesterov
  0 siblings, 0 replies; 27+ messages in thread
From: Oleg Nesterov @ 2020-01-20 15:39 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo, Ingo Molnar,
	Johannes Weiner, Li Zefan, Peter Zijlstra,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On 01/17, Christian Brauner wrote:
>
> +static int cgroup_css_set_fork(struct task_struct *parent,
> +			       struct kernel_clone_args *kargs)
...
> +	kargs->cset = find_css_set(cset, dst_cgrp);
> +	if (!kargs->cset) {
> +		ret = -ENOMEM;
> +		goto err;
> +	}
> +
> +	if (cgroup_is_dead(dst_cgrp)) {
> +		ret = -ENODEV;
> +		goto err;
                ^^^^^^^^

this looks wrong... don't we need put_css_set(kargs->cset) before "goto err" ?

Oleg.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 5/6] clone3: allow spawning processes into cgroups
  2020-01-20 15:39     ` Oleg Nesterov
  (?)
@ 2020-01-20 21:38     ` Christian Brauner
  -1 siblings, 0 replies; 27+ messages in thread
From: Christian Brauner @ 2020-01-20 21:38 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-api, linux-kernel, Tejun Heo, Ingo Molnar, Johannes Weiner,
	Li Zefan, Peter Zijlstra, cgroups

On Mon, Jan 20, 2020 at 04:39:30PM +0100, Oleg Nesterov wrote:
> On 01/17, Christian Brauner wrote:
> >
> > +static int cgroup_css_set_fork(struct task_struct *parent,
> > +			       struct kernel_clone_args *kargs)
> ...
> > +	kargs->cset = find_css_set(cset, dst_cgrp);
> > +	if (!kargs->cset) {
> > +		ret = -ENOMEM;
> > +		goto err;
> > +	}
> > +
> > +	if (cgroup_is_dead(dst_cgrp)) {
> > +		ret = -ENODEV;
> > +		goto err;
>                 ^^^^^^^^
> 
> this looks wrong... don't we need put_css_set(kargs->cset) before "goto err" ?

Yeah, but we should rather do:

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 4d36255ef25f..482055d1e64a 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5994,6 +5994,8 @@ static int cgroup_css_set_fork(struct kernel_clone_args *kargs)
        if (dst_cgrp)
                cgroup_put(dst_cgrp);
        put_css_set(cset);
+       if (kargs->cset)
+               put_css_set(kargs->cset);
        return ret;
 }

Christian

^ permalink raw reply related	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2020-01-20 21:39 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-17 18:12 [PATCH v4 0/6] clone3 & cgroups: allow spawning processes into cgroups Christian Brauner
2020-01-17 18:12 ` Christian Brauner
2020-01-17 18:12 ` [PATCH v4 1/6] cgroup: unify attach permission checking Christian Brauner
2020-01-17 18:12   ` Christian Brauner
2020-01-20 14:42   ` Oleg Nesterov
2020-01-20 14:42     ` Oleg Nesterov
2020-01-20 14:46     ` Christian Brauner
2020-01-20 14:46       ` Christian Brauner
2020-01-17 18:12 ` [PATCH v4 2/6] cgroup: add cgroup_get_from_file() helper Christian Brauner
2020-01-17 18:12   ` Christian Brauner
2020-01-17 18:12 ` [PATCH v4 3/6] cgroup: refactor fork helpers Christian Brauner
2020-01-17 18:12   ` Christian Brauner
2020-01-20 14:00   ` Oleg Nesterov
2020-01-20 14:04     ` Christian Brauner
2020-01-20 14:22       ` Oleg Nesterov
2020-01-20 14:33         ` Christian Brauner
2020-01-20 14:33           ` Christian Brauner
2020-01-17 18:12 ` [PATCH v4 4/6] cgroup: add cgroup_may_write() helper Christian Brauner
2020-01-17 18:12   ` Christian Brauner
2020-01-17 18:12 ` [PATCH v4 5/6] clone3: allow spawning processes into cgroups Christian Brauner
2020-01-17 18:12   ` Christian Brauner
2020-01-17 18:12   ` Christian Brauner
2020-01-20 15:39   ` Oleg Nesterov
2020-01-20 15:39     ` Oleg Nesterov
2020-01-20 21:38     ` Christian Brauner
2020-01-17 18:12 ` [PATCH v4 6/6] selftests/cgroup: add tests for cloning " Christian Brauner
2020-01-17 20:24   ` Roman Gushchin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.